Add new SentenceTransformer model.

Files changed (9)

README.md +211 -89
config.json +2 -2
config_sentence_transformers.json +3 -3
model.safetensors +1 -1
modules.json +6 -0
sentence_bert_config.json +1 -1
special_tokens_map.json +2 -2
tokenizer.json +1 -1
tokenizer_config.json +8 -1

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-base_model: microsoft/mpnet-base
 library_name: sentence-transformers
 pipeline_tag: sentence-similarity
 tags:
@@ -7,62 +7,126 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:3059
-- loss:MultipleNegativesSymmetricRankingLoss
 widget:
-- source_sentence: Solve length problems involving similar shapes where the missing
-    length is the shorter one Rectangle P has been enlarged by a scale factor of 1.5
-    to give rectangle Q. What length should replace the star? ![Two rectangles, the
-    smaller labelled P and the larger labelled Q. Q has width 9cm and length 12cm.
-    The length of P is marked with a star.]() 9 cm
   sentences:
-  - Does not recognise the corresponding sides in similar shapes or enlargements
-  - Does not know what a cube number is
-  - When solving a problem that requires an inverse operation (e.g. missing number
-    problems), does the original operation
-- source_sentence: Recognise that the diameter is twice the radius If the diameter
-    of a circle is 5.4 cm, the radius is... 10.8 cm
   sentences:
-  - Believes you can add or subtract from inside brackets without expanding when solving
-    an equation
-  - Does not understand that shapes are congruent if they have the same size and shape
-  - Doubles the diameter when finding the radius
-- source_sentence: 'Multiply proper fractions in the form: Fraction × Fraction Calculate:
-    1/9×1/5 1/14'
   sentences:
-  - Believes the mode is the most common frequency rather than the highest frequency
-  - When multiplying fractions, multiplies the numerator and adds the denominator
-  - Converts a fraction to a decimal by using only the numerator after the decimal
-    point
-- source_sentence: Find missing angles using angles around a point What is the size
-    of angle x ? ![Angles around a point split into two parts, one is labelled 290
-    degrees and the other x]() 45^∘
   sentences:
-  - Does not know that angles around a point sum to 360
-  - Does not know how to find the next term in a sequence
-  - Added the values together instead of finding the percentage
-- source_sentence: Given a positive x value, find the corresponding y value for reciprocal
-    graphs This is a part of the table of values for the equation y=3/x x 3 y What
-    should replace the star? 0
   sentences:
-  - Believes that a fraction with equal numerator and denominator cancels to 0
-  - Mixes up squaring and multiplying by 2 or doubling
-  - Gives the vertex instead of the 3-letter angle notation
 ---
-# SentenceTransformer based on microsoft/mpnet-base
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) <!-- at revision 6996ce1e91bd2a9c7d7f61daec37463394f73f09 -->
-- **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 768 tokens
 - **Similarity Function:** Cosine Similarity
-<!-- - **Training Dataset:** Unknown -->
 <!-- - **Language:** Unknown -->
 <!-- - **License:** Unknown -->
@@ -76,8 +140,9 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [m
 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
 )
 ```
@@ -99,9 +164,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("Gurveer05/mpnet-base-eedi-2024")
 # Run inference
 sentences = [
-    'Given a positive x value, find the corresponding y value for reciprocal graphs This is a part of the table of values for the equation y=3/x x 3 y What should replace the star? 0',
-    'Believes that a fraction with equal numerator and denominator cancels to 0',
-    'Mixes up squaring and multiplying by 2 or doubling',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -153,23 +218,23 @@ You can finetune this model on your own dataset.
 ### Training Dataset
-#### Unnamed Dataset
-* Size: 3,059 training samples
-* Columns: <code>sentence_0</code> and <code>sentence_1</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | sentence_0                                                                          | sentence_1                                                                        |
-  |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
-  | type    | string                                                                              | string                                                                            |
-  | details | <ul><li>min: 13 tokens</li><li>mean: 56.34 tokens</li><li>max: 275 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.83 tokens</li><li>max: 39 tokens</li></ul> |
 * Samples:
-  | sentence_0                                                                                                                                                                         | sentence_1                                                                    |
-  |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------|
-  | <code>Simplify algebraic expressions to maintain equivalence by collecting like terms involving just one linear variable 3(4 x+6)-2(x-9) ≡ A x+B What is the value of A ? 3</code> | <code>Only multiplies the numerical terms when expanding a bracket</code>     |
-  | <code>Express pictorial representations of objects as a ratio ![A group of 8 squares and 5 circles]() What is the ratio of squares to circles? 8: 13</code>                        | <code>When writing ratio from diagram, writes total as one side</code>        |
-  | <code>Find 100 less than a given number What number is 100 less than 325,076 ? 3250.76</code>                                                                                      | <code>Divides rather than subtracts when given the command 'less than'</code> |
-* Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
   ```json
   {
       "scale": 20.0,
@@ -181,9 +246,16 @@ You can finetune this model on your own dataset.
 #### Non-Default Hyperparameters
 - `eval_strategy`: steps
-- `per_device_train_batch_size`: 16
-- `per_device_eval_batch_size`: 16
-- `multi_dataset_batch_sampler`: round_robin
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
@@ -192,24 +264,24 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`: 16
-- `per_device_eval_batch_size`: 16
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
-- `learning_rate`: 5e-05
-- `weight_decay`: 0.0
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
-- `max_grad_norm`: 1
-- `num_train_epochs`: 3
 - `max_steps`: -1
-- `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.0
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
@@ -227,7 +299,7 @@ You can finetune this model on your own dataset.
 - `jit_mode_eval`: False
 - `use_ipex`: False
 - `bf16`: False
-- `fp16`: False
 - `fp16_opt_level`: O1
 - `half_precision_backend`: auto
 - `bf16_full_eval`: False
@@ -245,7 +317,7 @@ You can finetune this model on your own dataset.
 - `disable_tqdm`: False
 - `remove_unused_columns`: True
 - `label_names`: None
-- `load_best_model_at_end`: False
 - `ignore_data_skip`: False
 - `fsdp`: []
 - `fsdp_min_num_params`: 0
@@ -297,29 +369,67 @@ You can finetune this model on your own dataset.
 - `batch_eval_metrics`: False
 - `eval_on_start`: False
 - `eval_use_gather_object`: False
-- `batch_sampler`: batch_sampler
-- `multi_dataset_batch_sampler`: round_robin
 </details>
 ### Training Logs
-| Epoch  | Step | Training Loss |
-|:------:|:----:|:-------------:|
-| 0.5    | 96   | -             |
-| 1.0    | 192  | -             |
-| 1.5    | 288  | -             |
-| 2.0    | 384  | -             |
-| 2.5    | 480  | -             |
-| 2.6042 | 500  | 0.8125        |
-| 3.0    | 576  | -             |
 ### Framework Versions
-- Python: 3.10.12
-- Sentence Transformers: 3.1.0
-- Transformers: 4.44.2
-- PyTorch: 2.4.0+cu121
-- Accelerate: 0.34.2
 - Datasets: 2.19.2
 - Tokenizers: 0.19.1
@@ -340,6 +450,18 @@ You can finetune this model on your own dataset.
 }
 ```
 <!--
 ## Glossary

 ---
+base_model: sentence-transformers/all-mpnet-base-v2
 library_name: sentence-transformers
 pipeline_tag: sentence-similarity
 tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
+- dataset_size:2940
+- loss:MultipleNegativesRankingLoss
 widget:
+- source_sentence: 'Question: Write algebraic expressions with correct algebraic convention
+    involving powers. Simplify, if possible:
+    (
+    a^2 x a
+    ).
+    Options:
+    A. 2 a^2
+    B. 3 a
+    C. a^3
+    D. Does not simplify
+    Answer: Does not simplify'
   sentences:
+  - Does not understand power notation
+  - Does not understand how to multiply algebraic terms
+  - Adds instead of multiplying when expanding bracket
+- source_sentence: 'Question: Recognise other roots of numbers. 4th root of (16)=?
+    Options:
+    A. 64
+    B. 16
+    C. 4
+    D. 2
+    Answer: 16'
   sentences:
+  - Believes the decimal point button writes a fraction
+  - Thinks that square root is found by dividing by 4
+  - Does not understand the root power of 4
+- source_sentence: 'Question: Add algebraic fractions with the same denominator. Write
+    this as a single fraction as simply as possible
+    (
+    (2 / x)+(3 / x)
+    ).
+    Options:
+    A. (5 x / x^2)
+    B. (5 / x)
+    C. (5 / 2 x)
+    D. (6 / x^2)
+    Answer: (5 / 2 x)'
   sentences:
+  - When adding fractions with identical numerators, leaves the numerator and adds
+    the denominators
+  - When there are two modes, finds the mean of these values and gives that as the
+    mode
+  - When adding fractions, adds the numerators and denominators
+- source_sentence: 'Question: Recognise perpendicular lines. These two lines are ...
+    Two lines on a graph meeting at a right angle.
+    Options:
+    A. parallelogram
+    B. perpendicular
+    C. parallel
+    D. particular
+    Answer: parallel'
   sentences:
+  - Believes perpendicular is the term used to describe two lines that are parallel
+  - Believes parallel is the term used to describe two lines at right angles
+  - When multiplying a decimal by an integer, ignores decimal point and just multiplies
+    the digits
+- source_sentence: "Question: Round numbers greater than 1 to one significant figure.\
+    \ Round this number to  1  significant figure:\n 400099.\n\nOptions:\nA. 400000\n\
+    B. 500000\nC. 400100\nD. 400099\n\nAnswer: 400100"
   sentences:
+  - When asked for a specific term in a sequence gives the term after
+  - Rounds up rather than to one significant figure
+  - Rounded to nearest 100 instead of 1sf
 ---
+# SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 84f2bcc00d77236f9e89c8a360a00fb1139bf47d -->
+- **Maximum Sequence Length:** 384 tokens
 - **Output Dimensionality:** 768 tokens
 - **Similarity Function:** Cosine Similarity
+- **Training Dataset:**
+    - csv
 <!-- - **Language:** Unknown -->
 <!-- - **License:** Unknown -->
 ```
 SentenceTransformer(
+  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
 )
 ```
 model = SentenceTransformer("Gurveer05/mpnet-base-eedi-2024")
 # Run inference
 sentences = [
+    'Question: Round numbers greater than 1 to one significant figure. Round this number to  1  significant figure:\n 400099.\n\nOptions:\nA. 400000\nB. 500000\nC. 400100\nD. 400099\n\nAnswer: 400100',
+    'Rounded to nearest 100 instead of 1sf',
+    'Rounds up rather than to one significant figure',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
 ### Training Dataset
+#### csv
+* Dataset: csv
+* Size: 2,940 training samples
+* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
 * Approximate statistics based on the first 1000 samples:
+  |         | anchor                                                                              | positive                                                                          | negative                                                                          |
+  |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+  | type    | string                                                                              | string                                                                            | string                                                                            |
+  | details | <ul><li>min: 33 tokens</li><li>mean: 89.65 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.71 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.04 tokens</li><li>max: 39 tokens</li></ul> |
 * Samples:
+  | anchor                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | positive                                                                                                            | negative                                                                                           |
+  |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|
+  | <code>Question: Add algebraic fractions where the denominators are single terms and are not multiples of each other. Express the following as a single fraction, writing your answer as simply as possible:  (t / s)+(2 s / t).<br><br>Options:<br>A. (t^2+4 s^2 / s t)<br>B. (t+2 s / s+t)<br>C. (2 s t / s+t)<br>D. (t^2+2 s^2 / s t)<br><br>Answer: (2 s t / s+t)</code>                                                                                                                                                                                                                             | <code>When adding/subtracting fractions, adds/subtracts the denominators and multiplies the numerators</code>       | <code>When adding fractions, adds the numerators and denominators</code>                           |
+  | <code>Question: Calculate the volume of a cone where the dimensions are all given in the same units. STEP  2 <br><br>Jessica is trying to work out the volume of this cone. A cone with the slant height labelled 9cm, the perpendicular height labelled h and half the cone's base (forming a right angled triangle with the slant and perpendicular heights) is labelled 6cm. First she needs the perpendicular height.<br><br>Which of the following equations is true?<br><br>Options:<br>A. h^2=9^2+6^2<br>B. h^2=9^2-6^2<br>C. h^2=12^2+9^2<br>D. h^2=12^2-9^2<br><br>Answer: h^2=12^2-9^2</code> | <code>When using Pythagoras to find the height of an isosceles triangle, uses the whole base instead of half</code> | <code>Has used slant height and base to find area rather than perpendicular height and base</code> |
+  | <code>Question: Convert from hours to minutes. 3  hours is the same as ___________ minutes.<br><br>Options:<br>A. 180<br>B. 90<br>C. 30<br>D. 300<br><br>Answer: 90</code>                                                                                                                                                                                                                                                                                                                                                                                                                              | <code>Thinks there are 30 minutes in a hour</code>                                                                  | <code>Answers as if there are 100 minutes in an hour when changing from hours to minutes</code>    |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
       "scale": 20.0,
 #### Non-Default Hyperparameters
 - `eval_strategy`: steps
+- `per_device_train_batch_size`: 24
+- `per_device_eval_batch_size`: 24
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.01
+- `num_train_epochs`: 20
+- `lr_scheduler_type`: cosine_with_restarts
+- `warmup_ratio`: 0.1
+- `fp16`: True
+- `load_best_model_at_end`: True
+- `batch_sampler`: no_duplicates
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
+- `per_device_train_batch_size`: 24
+- `per_device_eval_batch_size`: 24
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.01
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 20
 - `max_steps`: -1
+- `lr_scheduler_type`: cosine_with_restarts
 - `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
 - `jit_mode_eval`: False
 - `use_ipex`: False
 - `bf16`: False
+- `fp16`: True
 - `fp16_opt_level`: O1
 - `half_precision_backend`: auto
 - `bf16_full_eval`: False
 - `disable_tqdm`: False
 - `remove_unused_columns`: True
 - `label_names`: None
+- `load_best_model_at_end`: True
 - `ignore_data_skip`: False
 - `fsdp`: []
 - `fsdp_min_num_params`: 0
 - `batch_eval_metrics`: False
 - `eval_on_start`: False
 - `eval_use_gather_object`: False
+- `batch_sampler`: no_duplicates
+- `multi_dataset_batch_sampler`: proportional
 </details>
 ### Training Logs
+| Epoch   | Step    | Training Loss |
+|:-------:|:-------:|:-------------:|
+| 0.2581  | 16      | 3.3287        |
+| 0.5     | 31      | -             |
+| 0.5161  | 32      | 2.7886        |
+| 0.7742  | 48      | 2.4706        |
+| 1.0     | 62      | -             |
+| 1.0323  | 64      | 2.1136        |
+| 1.2903  | 80      | 2.0489        |
+| 1.5     | 93      | -             |
+| 1.5484  | 96      | 1.8572        |
+| 1.8065  | 112     | 1.6209        |
+| 2.0     | 124     | -             |
+| 2.0645  | 128     | 1.4044        |
+| 2.3226  | 144     | 1.4125        |
+| 2.5     | 155     | -             |
+| 2.5806  | 160     | 1.2445        |
+| 2.8387  | 176     | 1.1282        |
+| 3.0     | 186     | -             |
+| 3.0968  | 192     | 0.9416        |
+| 3.3548  | 208     | 0.9882        |
+| 3.5     | 217     | -             |
+| 3.6129  | 224     | 0.8752        |
+| 3.8710  | 240     | 0.7814        |
+| 4.0     | 248     | -             |
+| 4.1290  | 256     | 0.681         |
+| 4.3871  | 272     | 0.7641        |
+| 4.5     | 279     | -             |
+| 4.6452  | 288     | 0.6145        |
+| 4.9032  | 304     | 0.5826        |
+| 5.0     | 310     | -             |
+| 5.1613  | 320     | 0.5234        |
+| 5.4194  | 336     | 0.5709        |
+| 5.5     | 341     | -             |
+| 5.6774  | 352     | 0.4848        |
+| 5.9355  | 368     | 0.4474        |
+| 6.0     | 372     | -             |
+| 6.1935  | 384     | 0.4027        |
+| 6.4516  | 400     | 0.4644        |
+| **6.5** | **403** | **-**         |
+| 6.7097  | 416     | 0.3946        |
+| 6.9677  | 432     | 0.3325        |
+| 7.0     | 434     | -             |
+| 7.2258  | 448     | 0.3746        |
+| 7.4839  | 464     | 0.364         |
+| 7.5     | 465     | -             |
+* The bold row denotes the saved checkpoint.
 ### Framework Versions
+- Python: 3.10.14
+- Sentence Transformers: 3.1.1
+- Transformers: 4.44.0
+- PyTorch: 2.4.0
+- Accelerate: 0.33.0
 - Datasets: 2.19.2
 - Tokenizers: 0.19.1
 }
 ```
+#### MultipleNegativesRankingLoss
+```bibtex
+@misc{henderson2017efficient,
+    title={Efficient Natural Language Response Suggestion for Smart Reply},
+    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+    year={2017},
+    eprint={1705.00652},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
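Review note: the new side of this README switches the loss to `MultipleNegativesRankingLoss` with `"scale": 20.0` and trains on `anchor`, `positive`, `negative` triplets. The sketch below is a minimal NumPy illustration of that idea, not the library's implementation. It assumes cosine similarity as the scoring function and treats every other row's positive and negative as an in-batch negative; the function names are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def multiple_negatives_ranking_loss(anchors, positives, negatives=None, scale=20.0):
    """Cross-entropy over scaled similarities; anchor i should pick candidate i.

    Assumption: candidates are all positives followed by all hard negatives,
    so each anchor sees one correct column and many in-batch negatives.
    """
    candidates = positives if negatives is None else np.vstack([positives, negatives])
    logits = scale * cosine_similarity_matrix(anchors, candidates)
    # Numerically stable log-softmax over the candidate axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())


# Toy embeddings (random, for illustration only).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
positives = anchors + 0.05 * rng.normal(size=(4, 8))  # close to their anchors
negatives = rng.normal(size=(4, 8))                    # unrelated vectors

aligned_loss = multiple_negatives_ranking_loss(anchors, positives, negatives)
shuffled_loss = multiple_negatives_ranking_loss(anchors, positives[::-1], negatives)
print(aligned_loss, shuffled_loss)
```

Pairing each anchor with the wrong positive (the reversed batch) should give a much larger loss than the aligned batch, which is the signal the trainer minimises.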
 <!--
 ## Glossary

config.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "microsoft/mpnet-base",
   "architectures": [
     "MPNetModel"
   ],
@@ -19,6 +19,6 @@
   "pad_token_id": 1,
   "relative_attention_num_buckets": 32,
   "torch_dtype": "float32",
-  "transformers_version": "4.44.2",
   "vocab_size": 30527
 }

 {
+  "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
   "architectures": [
     "MPNetModel"
   ],
   "pad_token_id": 1,
   "relative_attention_num_buckets": 32,
   "torch_dtype": "float32",
+  "transformers_version": "4.44.0",
   "vocab_size": 30527
 }

config_sentence_transformers.json CHANGED

@@ -1,8 +1,8 @@
 {
   "__version__": {
-    "sentence_transformers": "3.1.0",
-    "transformers": "4.44.2",
-    "pytorch": "2.4.0+cu121"
   },
   "prompts": {},
   "default_prompt_name": null,

 {
   "__version__": {
+    "sentence_transformers": "3.1.1",
+    "transformers": "4.44.0",
+    "pytorch": "2.4.0"
   },
   "prompts": {},
   "default_prompt_name": null,

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:57c233bb7d3527046e1742fea21d0da344f902a338c8871519dc244e97cbc679
 size 437967672

 version https://git-lfs.github.com/spec/v1
+oid sha256:7f34aac67b9b829bf1273699e6d541f4a1b3ada25c6db5220d352bd64abf40f2
 size 437967672

modules.json CHANGED

@@ -10,5 +10,11 @@
     "name": "1",
     "path": "1_Pooling",
     "type": "sentence_transformers.models.Pooling"
   }
 ]

     "name": "1",
     "path": "1_Pooling",
     "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
   }
 ]
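
Review note: this change appends a `Normalize` module after pooling. A minimal sketch of the practical effect, assuming the module L2-normalises each pooled vector: once embeddings have unit length, a plain dot product equals cosine similarity. The helper names and toy vectors below are hypothetical.

```python
import numpy as np


def l2_normalize(vectors):
    """Scale each row to unit L2 norm (guarding against zero-length rows)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def cosine_similarity(u, v):
    """Cosine similarity of two un-normalised vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy "pooled" embeddings standing in for the output of the Pooling module.
pooled = np.array([[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]])
unit = l2_normalize(pooled)

cos_before = cosine_similarity(pooled[0], pooled[1])
dot_after = float(unit[0] @ unit[1])
print(cos_before, dot_after)
```

Downstream code that ranks by dot product should therefore produce the same ordering as cosine similarity on the normalised embeddings.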

sentence_bert_config.json CHANGED

@@ -1,4 +1,4 @@
 {
-  "max_seq_length": 512,
   "do_lower_case": false
 }

 {
+  "max_seq_length": 384,
   "do_lower_case": false
 }
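
Review note: `max_seq_length` drops from 512 to 384, so longer inputs lose more of their tail. The sketch below illustrates right-side truncation in plain Python. It is a simplified stand-in for the tokenizer, under the assumption that the limit counts the two special tokens (`<s>` and `</s>`); `truncate_right` is a hypothetical helper.

```python
def truncate_right(token_ids, max_seq_length=384, num_special_tokens=2):
    """Keep the leading tokens that fit once special tokens are accounted for."""
    budget = max_seq_length - num_special_tokens
    if budget < 0:
        raise ValueError("max_seq_length is smaller than the special-token count")
    return token_ids[:budget]


# A made-up 500-token input, represented by dummy ids.
long_input = list(range(500))

kept_new = truncate_right(long_input, max_seq_length=384)
kept_old = truncate_right(long_input, max_seq_length=512)
print(len(kept_new), len(kept_old))
```

Under these assumptions, a 500-token question fits entirely under the old limit but loses its last 118 tokens under the new one, which matters here because the answer text sits at the end of each anchor.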

special_tokens_map.json CHANGED

@@ -9,7 +9,7 @@
   "cls_token": {
     "content": "<s>",
     "lstrip": false,
-    "normalized": true,
     "rstrip": false,
     "single_word": false
   },
@@ -37,7 +37,7 @@
   "sep_token": {
     "content": "</s>",
     "lstrip": false,
-    "normalized": true,
     "rstrip": false,
     "single_word": false
   },

   "cls_token": {
     "content": "<s>",
     "lstrip": false,
+    "normalized": false,
     "rstrip": false,
     "single_word": false
   },
   "sep_token": {
     "content": "</s>",
     "lstrip": false,
+    "normalized": false,
     "rstrip": false,
     "single_word": false
   },

tokenizer.json CHANGED

@@ -2,7 +2,7 @@
   "version": "1.0",
   "truncation": {
     "direction": "Right",
-    "max_length": 512,
     "strategy": "LongestFirst",
     "stride": 0
   },

   "version": "1.0",
   "truncation": {
     "direction": "Right",
+    "max_length": 384,
     "strategy": "LongestFirst",
     "stride": 0
   },

tokenizer_config.json CHANGED

@@ -55,11 +55,18 @@
   "do_lower_case": true,
   "eos_token": "</s>",
   "mask_token": "<mask>",
-  "model_max_length": 512,
   "pad_token": "<pad>",
   "sep_token": "</s>",
   "strip_accents": null,
   "tokenize_chinese_chars": true,
   "tokenizer_class": "MPNetTokenizer",
   "unk_token": "[UNK]"
 }

   "do_lower_case": true,
   "eos_token": "</s>",
   "mask_token": "<mask>",
+  "max_length": 128,
+  "model_max_length": 384,
+  "pad_to_multiple_of": null,
   "pad_token": "<pad>",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
   "sep_token": "</s>",
+  "stride": 0,
   "strip_accents": null,
   "tokenize_chinese_chars": true,
   "tokenizer_class": "MPNetTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
   "unk_token": "[UNK]"
 }