Gurveer05 committed on
Commit
4d23a8f
·
verified ·
1 Parent(s): 98b3558

Add new SentenceTransformer model.

README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- base_model: microsoft/mpnet-base
3
  library_name: sentence-transformers
4
  pipeline_tag: sentence-similarity
5
  tags:
@@ -7,62 +7,126 @@ tags:
7
  - sentence-similarity
8
  - feature-extraction
9
  - generated_from_trainer
10
- - dataset_size:3059
11
- - loss:MultipleNegativesSymmetricRankingLoss
12
  widget:
13
- - source_sentence: Solve length problems involving similar shapes where the missing
14
- length is the shorter one Rectangle P has been enlarged by a scale factor of 1.5
15
- to give rectangle Q. What length should replace the star? ![Two rectangles, the
16
- smaller labelled P and the larger labelled Q. Q has width 9cm and length 12cm.
17
- The length of P is marked with a star.]() 9 cm
18
  sentences:
19
- - Does not recognise the corresponding sides in similar shapes or enlargements
20
- - Does not know what a cube number is
21
- - When solving a problem that requires an inverse operation (e.g. missing number
22
- problems), does the original operation
23
- - source_sentence: Recognise that the diameter is twice the radius If the diameter
24
- of a circle is 5.4 cm, the radius is... 10.8 cm
25
  sentences:
26
- - Believes you can add or subtract from inside brackets without expanding when solving
27
- an equation
28
- - Does not understand that shapes are congruent if they have the same size and shape
29
- - Doubles the diameter when finding the radius
30
- - source_sentence: 'Multiply proper fractions in the form: Fraction × Fraction Calculate:
31
- 1/9×1/5 1/14'
32
  sentences:
33
- - Believes the mode is the most common frequency rather than the highest frequency
34
- - When multiplying fractions, multiplies the numerator and adds the denominator
35
- - Converts a fraction to a decimal by using only the numerator after the decimal
36
- point
37
- - source_sentence: Find missing angles using angles around a point What is the size
38
- of angle x ? ![Angles around a point split into two parts, one is labelled 290
39
- degrees and the other x]() 45^∘
40
  sentences:
41
- - Does not know that angles around a point sum to 360
42
- - Does not know how to find the next term in a sequence
43
- - Added the values together instead of finding the percentage
44
- - source_sentence: Given a positive x value, find the corresponding y value for reciprocal
45
- graphs This is a part of the table of values for the equation y=3/x x 3 y What
46
- should replace the star? 0
 
47
  sentences:
48
- - Believes that a fraction with equal numerator and denominator cancels to 0
49
- - Mixes up squaring and multiplying by 2 or doubling
50
- - Gives the vertex instead of the 3-letter angle notation
51
  ---
52
 
53
- # SentenceTransformer based on microsoft/mpnet-base
54
 
55
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
56
 
57
  ## Model Details
58
 
59
  ### Model Description
60
  - **Model Type:** Sentence Transformer
61
- - **Base model:** [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) <!-- at revision 6996ce1e91bd2a9c7d7f61daec37463394f73f09 -->
62
- - **Maximum Sequence Length:** 512 tokens
63
  - **Output Dimensionality:** 768 tokens
64
  - **Similarity Function:** Cosine Similarity
65
- <!-- - **Training Dataset:** Unknown -->
 
66
  <!-- - **Language:** Unknown -->
67
  <!-- - **License:** Unknown -->
68
 
@@ -76,8 +140,9 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [m
76
 
77
  ```
78
  SentenceTransformer(
79
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
80
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
 
81
  )
82
  ```
83
 
@@ -99,9 +164,9 @@ from sentence_transformers import SentenceTransformer
99
  model = SentenceTransformer("Gurveer05/mpnet-base-eedi-2024")
100
  # Run inference
101
  sentences = [
102
- 'Given a positive x value, find the corresponding y value for reciprocal graphs This is a part of the table of values for the equation y=3/x x 3 y What should replace the star? 0',
103
- 'Believes that a fraction with equal numerator and denominator cancels to 0',
104
- 'Mixes up squaring and multiplying by 2 or doubling',
105
  ]
106
  embeddings = model.encode(sentences)
107
  print(embeddings.shape)
@@ -153,23 +218,23 @@ You can finetune this model on your own dataset.
153
 
154
  ### Training Dataset
155
 
156
- #### Unnamed Dataset
157
-
158
 
159
- * Size: 3,059 training samples
160
- * Columns: <code>sentence_0</code> and <code>sentence_1</code>
 
161
  * Approximate statistics based on the first 1000 samples:
162
- | | sentence_0 | sentence_1 |
163
- |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
164
- | type | string | string |
165
- | details | <ul><li>min: 13 tokens</li><li>mean: 56.34 tokens</li><li>max: 275 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.83 tokens</li><li>max: 39 tokens</li></ul> |
166
  * Samples:
167
- | sentence_0 | sentence_1 |
168
- |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------|
169
- | <code>Simplify algebraic expressions to maintain equivalence by collecting like terms involving just one linear variable 3(4 x+6)-2(x-9) A x+B What is the value of A ? 3</code> | <code>Only multiplies the numerical terms when expanding a bracket</code> |
170
- | <code>Express pictorial representations of objects as a ratio ![A group of 8 squares and 5 circles]() What is the ratio of squares to circles? 8: 13</code> | <code>When writing ratio from diagram, writes total as one side</code> |
171
- | <code>Find 100 less than a given number What number is 100 less than 325,076 ? 3250.76</code> | <code>Divides rather than subtracts when given the command 'less than'</code> |
172
- * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
173
  ```json
174
  {
175
  "scale": 20.0,
@@ -181,9 +246,16 @@ You can finetune this model on your own dataset.
181
  #### Non-Default Hyperparameters
182
 
183
  - `eval_strategy`: steps
184
- - `per_device_train_batch_size`: 16
185
- - `per_device_eval_batch_size`: 16
186
- - `multi_dataset_batch_sampler`: round_robin
187
 
188
  #### All Hyperparameters
189
  <details><summary>Click to expand</summary>
@@ -192,24 +264,24 @@ You can finetune this model on your own dataset.
192
  - `do_predict`: False
193
  - `eval_strategy`: steps
194
  - `prediction_loss_only`: True
195
- - `per_device_train_batch_size`: 16
196
- - `per_device_eval_batch_size`: 16
197
  - `per_gpu_train_batch_size`: None
198
  - `per_gpu_eval_batch_size`: None
199
  - `gradient_accumulation_steps`: 1
200
  - `eval_accumulation_steps`: None
201
  - `torch_empty_cache_steps`: None
202
- - `learning_rate`: 5e-05
203
- - `weight_decay`: 0.0
204
  - `adam_beta1`: 0.9
205
  - `adam_beta2`: 0.999
206
  - `adam_epsilon`: 1e-08
207
- - `max_grad_norm`: 1
208
- - `num_train_epochs`: 3
209
  - `max_steps`: -1
210
- - `lr_scheduler_type`: linear
211
  - `lr_scheduler_kwargs`: {}
212
- - `warmup_ratio`: 0.0
213
  - `warmup_steps`: 0
214
  - `log_level`: passive
215
  - `log_level_replica`: warning
@@ -227,7 +299,7 @@ You can finetune this model on your own dataset.
227
  - `jit_mode_eval`: False
228
  - `use_ipex`: False
229
  - `bf16`: False
230
- - `fp16`: False
231
  - `fp16_opt_level`: O1
232
  - `half_precision_backend`: auto
233
  - `bf16_full_eval`: False
@@ -245,7 +317,7 @@ You can finetune this model on your own dataset.
245
  - `disable_tqdm`: False
246
  - `remove_unused_columns`: True
247
  - `label_names`: None
248
- - `load_best_model_at_end`: False
249
  - `ignore_data_skip`: False
250
  - `fsdp`: []
251
  - `fsdp_min_num_params`: 0
@@ -297,29 +369,67 @@ You can finetune this model on your own dataset.
297
  - `batch_eval_metrics`: False
298
  - `eval_on_start`: False
299
  - `eval_use_gather_object`: False
300
- - `batch_sampler`: batch_sampler
301
- - `multi_dataset_batch_sampler`: round_robin
302
 
303
  </details>
304
 
305
  ### Training Logs
306
- | Epoch | Step | Training Loss |
307
- |:------:|:----:|:-------------:|
308
- | 0.5 | 96 | - |
309
- | 1.0 | 192 | - |
310
- | 1.5 | 288 | - |
311
- | 2.0 | 384 | - |
312
- | 2.5 | 480 | - |
313
- | 2.6042 | 500 | 0.8125 |
314
- | 3.0 | 576 | - |
315
-
316
 
317
  ### Framework Versions
318
- - Python: 3.10.12
319
- - Sentence Transformers: 3.1.0
320
- - Transformers: 4.44.2
321
- - PyTorch: 2.4.0+cu121
322
- - Accelerate: 0.34.2
323
  - Datasets: 2.19.2
324
  - Tokenizers: 0.19.1
325
 
@@ -340,6 +450,18 @@ You can finetune this model on your own dataset.
340
  }
341
  ```
342
  
343
  <!--
344
  ## Glossary
345
 
 
1
  ---
2
+ base_model: sentence-transformers/all-mpnet-base-v2
3
  library_name: sentence-transformers
4
  pipeline_tag: sentence-similarity
5
  tags:
 
7
  - sentence-similarity
8
  - feature-extraction
9
  - generated_from_trainer
10
+ - dataset_size:2940
11
+ - loss:MultipleNegativesRankingLoss
12
  widget:
13
+ - source_sentence: 'Question: Write algebraic expressions with correct algebraic convention
14
+ involving powers. Simplify, if possible:
15
+
16
+ (
17
+
18
+ a^2 x a
19
+
20
+ ).
21
+
22
+
23
+ Options:
24
+
25
+ A. 2 a^2
26
+
27
+ B. 3 a
28
+
29
+ C. a^3
30
+
31
+ D. Does not simplify
32
+
33
+
34
+ Answer: Does not simplify'
35
  sentences:
36
+ - Does not understand power notation
37
+ - Does not understand how to multiply algebraic terms
38
+ - Adds instead of multiplying when expanding bracket
39
+ - source_sentence: 'Question: Recognise other roots of numbers. 4th root of (16)=?
40
+
41
+
42
+ Options:
43
+
44
+ A. 64
45
+
46
+ B. 16
47
+
48
+ C. 4
49
+
50
+ D. 2
51
+
52
+
53
+ Answer: 16'
54
  sentences:
55
+ - Believes the decimal point button writes a fraction
56
+ - Thinks that square root is found by dividing by 4
57
+ - Does not understand the root power of 4
58
+ - source_sentence: 'Question: Add algebraic fractions with the same denominator. Write
59
+ this as a single fraction as simply as possible
60
+
61
+ (
62
+
63
+ (2 / x)+(3 / x)
64
+
65
+ ).
66
+
67
+
68
+ Options:
69
+
70
+ A. (5 x / x^2)
71
+
72
+ B. (5 / x)
73
+
74
+ C. (5 / 2 x)
75
+
76
+ D. (6 / x^2)
77
+
78
+
79
+ Answer: (5 / 2 x)'
80
  sentences:
81
+ - When adding fractions with identical numerators, leaves the numerator and adds
82
+ the denominators
83
+ - When there are two modes, finds the mean of these values and gives that as the
84
+ mode
85
+ - When adding fractions, adds the numerators and denominators
86
+ - source_sentence: 'Question: Recognise perpendicular lines. These two lines are ...
87
+ Two lines on a graph meeting at a right angle.
88
+
89
+
90
+ Options:
91
+
92
+ A. parallelogram
93
+
94
+ B. perpendicular
95
+
96
+ C. parallel
97
+
98
+ D. particular
99
+
100
+
101
+ Answer: parallel'
102
  sentences:
103
+ - Believes perpendicular is the term used to describe two lines that are parallel
104
+ - Believes parallel is the term used to describe two lines at right angles
105
+ - When multiplying a decimal by an integer, ignores decimal point and just multiplies
106
+ the digits
107
+ - source_sentence: "Question: Round numbers greater than 1 to one significant figure.\
108
+ \ Round this number to 1 significant figure:\n 400099.\n\nOptions:\nA. 400000\n\
109
+ B. 500000\nC. 400100\nD. 400099\n\nAnswer: 400100"
110
  sentences:
111
+ - When asked for a specific term in a sequence gives the term after
112
+ - Rounds up rather than to one significant figure
113
+ - Rounded to nearest 100 instead of 1sf
114
  ---
115
 
116
+ # SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
117
 
118
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
119
 
120
  ## Model Details
121
 
122
  ### Model Description
123
  - **Model Type:** Sentence Transformer
124
+ - **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 84f2bcc00d77236f9e89c8a360a00fb1139bf47d -->
125
+ - **Maximum Sequence Length:** 384 tokens
126
  - **Output Dimensionality:** 768 tokens
127
  - **Similarity Function:** Cosine Similarity
128
+ - **Training Dataset:**
129
+ - csv
130
  <!-- - **Language:** Unknown -->
131
  <!-- - **License:** Unknown -->
132
 
 
140
 
141
  ```
142
  SentenceTransformer(
143
+ (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
144
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
145
+ (2): Normalize()
146
  )
147
  ```
148
 
 
164
  model = SentenceTransformer("Gurveer05/mpnet-base-eedi-2024")
165
  # Run inference
166
  sentences = [
167
+ 'Question: Round numbers greater than 1 to one significant figure. Round this number to 1 significant figure:\n 400099.\n\nOptions:\nA. 400000\nB. 500000\nC. 400100\nD. 400099\n\nAnswer: 400100',
168
+ 'Rounded to nearest 100 instead of 1sf',
169
+ 'Rounds up rather than to one significant figure',
170
  ]
171
  embeddings = model.encode(sentences)
172
  print(embeddings.shape)
 
218
 
219
  ### Training Dataset
220
 
221
+ #### csv
 
222
 
223
+ * Dataset: csv
224
+ * Size: 2,940 training samples
225
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
226
  * Approximate statistics based on the first 1000 samples:
227
+ | | anchor | positive | negative |
228
+ |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
229
+ | type | string | string | string |
230
+ | details | <ul><li>min: 33 tokens</li><li>mean: 89.65 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.71 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.04 tokens</li><li>max: 39 tokens</li></ul> |
231
  * Samples:
232
+ | anchor | positive | negative |
233
+ |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|
234
+ | <code>Question: Add algebraic fractions where the denominators are single terms and are not multiples of each other. Express the following as a single fraction, writing your answer as simply as possible: (t / s)+(2 s / t).<br><br>Options:<br>A. (t^2+4 s^2 / s t)<br>B. (t+2 s / s+t)<br>C. (2 s t / s+t)<br>D. (t^2+2 s^2 / s t)<br><br>Answer: (2 s t / s+t)</code> | <code>When adding/subtracting fractions, adds/subtracts the denominators and multiplies the numerators</code> | <code>When adding fractions, adds the numerators and denominators</code> |
235
+ | <code>Question: Calculate the volume of a cone where the dimensions are all given in the same units. STEP 2 <br><br>Jessica is trying to work out the volume of this cone. A cone with the slant height labelled 9cm, the perpendicular height labelled h and half the cone's base (forming a right angled triangle with the slant and perpendicular heights) is labelled 6cm. First she needs the perpendicular height.<br><br>Which of the following equations is true?<br><br>Options:<br>A. h^2=9^2+6^2<br>B. h^2=9^2-6^2<br>C. h^2=12^2+9^2<br>D. h^2=12^2-9^2<br><br>Answer: h^2=12^2-9^2</code> | <code>When using Pythagoras to find the height of an isosceles triangle, uses the whole base instead of half</code> | <code>Has used slant height and base to find area rather than perpendicular height and base</code> |
236
+ | <code>Question: Convert from hours to minutes. 3 hours is the same as ___________ minutes.<br><br>Options:<br>A. 180<br>B. 90<br>C. 30<br>D. 300<br><br>Answer: 90</code> | <code>Thinks there are 30 minutes in a hour</code> | <code>Answers as if there are 100 minutes in an hour when changing from hours to minutes</code> |
237
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
238
  ```json
239
  {
240
  "scale": 20.0,
 
246
  #### Non-Default Hyperparameters
247
 
248
  - `eval_strategy`: steps
249
+ - `per_device_train_batch_size`: 24
250
+ - `per_device_eval_batch_size`: 24
251
+ - `learning_rate`: 2e-05
252
+ - `weight_decay`: 0.01
253
+ - `num_train_epochs`: 20
254
+ - `lr_scheduler_type`: cosine_with_restarts
255
+ - `warmup_ratio`: 0.1
256
+ - `fp16`: True
257
+ - `load_best_model_at_end`: True
258
+ - `batch_sampler`: no_duplicates
259
 
260
  #### All Hyperparameters
261
  <details><summary>Click to expand</summary>
 
264
  - `do_predict`: False
265
  - `eval_strategy`: steps
266
  - `prediction_loss_only`: True
267
+ - `per_device_train_batch_size`: 24
268
+ - `per_device_eval_batch_size`: 24
269
  - `per_gpu_train_batch_size`: None
270
  - `per_gpu_eval_batch_size`: None
271
  - `gradient_accumulation_steps`: 1
272
  - `eval_accumulation_steps`: None
273
  - `torch_empty_cache_steps`: None
274
+ - `learning_rate`: 2e-05
275
+ - `weight_decay`: 0.01
276
  - `adam_beta1`: 0.9
277
  - `adam_beta2`: 0.999
278
  - `adam_epsilon`: 1e-08
279
+ - `max_grad_norm`: 1.0
280
+ - `num_train_epochs`: 20
281
  - `max_steps`: -1
282
+ - `lr_scheduler_type`: cosine_with_restarts
283
  - `lr_scheduler_kwargs`: {}
284
+ - `warmup_ratio`: 0.1
285
  - `warmup_steps`: 0
286
  - `log_level`: passive
287
  - `log_level_replica`: warning
 
299
  - `jit_mode_eval`: False
300
  - `use_ipex`: False
301
  - `bf16`: False
302
+ - `fp16`: True
303
  - `fp16_opt_level`: O1
304
  - `half_precision_backend`: auto
305
  - `bf16_full_eval`: False
 
317
  - `disable_tqdm`: False
318
  - `remove_unused_columns`: True
319
  - `label_names`: None
320
+ - `load_best_model_at_end`: True
321
  - `ignore_data_skip`: False
322
  - `fsdp`: []
323
  - `fsdp_min_num_params`: 0
 
369
  - `batch_eval_metrics`: False
370
  - `eval_on_start`: False
371
  - `eval_use_gather_object`: False
372
+ - `batch_sampler`: no_duplicates
373
+ - `multi_dataset_batch_sampler`: proportional
374
 
375
  </details>
376
 
377
  ### Training Logs
378
+ | Epoch | Step | Training Loss |
379
+ |:-------:|:-------:|:-------------:|
380
+ | 0.2581 | 16 | 3.3287 |
381
+ | 0.5 | 31 | - |
382
+ | 0.5161 | 32 | 2.7886 |
383
+ | 0.7742 | 48 | 2.4706 |
384
+ | 1.0 | 62 | - |
385
+ | 1.0323 | 64 | 2.1136 |
386
+ | 1.2903 | 80 | 2.0489 |
387
+ | 1.5 | 93 | - |
388
+ | 1.5484 | 96 | 1.8572 |
389
+ | 1.8065 | 112 | 1.6209 |
390
+ | 2.0 | 124 | - |
391
+ | 2.0645 | 128 | 1.4044 |
392
+ | 2.3226 | 144 | 1.4125 |
393
+ | 2.5 | 155 | - |
394
+ | 2.5806 | 160 | 1.2445 |
395
+ | 2.8387 | 176 | 1.1282 |
396
+ | 3.0 | 186 | - |
397
+ | 3.0968 | 192 | 0.9416 |
398
+ | 3.3548 | 208 | 0.9882 |
399
+ | 3.5 | 217 | - |
400
+ | 3.6129 | 224 | 0.8752 |
401
+ | 3.8710 | 240 | 0.7814 |
402
+ | 4.0 | 248 | - |
403
+ | 4.1290 | 256 | 0.681 |
404
+ | 4.3871 | 272 | 0.7641 |
405
+ | 4.5 | 279 | - |
406
+ | 4.6452 | 288 | 0.6145 |
407
+ | 4.9032 | 304 | 0.5826 |
408
+ | 5.0 | 310 | - |
409
+ | 5.1613 | 320 | 0.5234 |
410
+ | 5.4194 | 336 | 0.5709 |
411
+ | 5.5 | 341 | - |
412
+ | 5.6774 | 352 | 0.4848 |
413
+ | 5.9355 | 368 | 0.4474 |
414
+ | 6.0 | 372 | - |
415
+ | 6.1935 | 384 | 0.4027 |
416
+ | 6.4516 | 400 | 0.4644 |
417
+ | **6.5** | **403** | **-** |
418
+ | 6.7097 | 416 | 0.3946 |
419
+ | 6.9677 | 432 | 0.3325 |
420
+ | 7.0 | 434 | - |
421
+ | 7.2258 | 448 | 0.3746 |
422
+ | 7.4839 | 464 | 0.364 |
423
+ | 7.5 | 465 | - |
424
+
425
+ * The bold row denotes the saved checkpoint.
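Note on the log above: the step counts can be cross-checked against the hyperparameters with a little arithmetic. The sketch below is illustrative only. It reads 62 steps per epoch from the `1.0 | 62` row, assumes the schedule spans all 20 configured epochs, and assumes warm-up is the ceiling of `warmup_ratio` times the total step count (computed here in integer arithmetic); neither assumption is confirmed by this diff.

```python
# Values read from the model card in this diff.
steps_per_epoch = 62      # log row "1.0 | 62"
num_train_epochs = 20     # `num_train_epochs`: 20
saved_step = 403          # bold row (saved checkpoint)
last_logged_step = 465    # final row shown in the log

# Assumption: the scheduler is sized for all configured epochs.
total_steps = steps_per_epoch * num_train_epochs

# Assumption: warm-up = ceil(0.1 * total_steps), written as integer ceiling division.
warmup_steps = -(-total_steps // 10)

saved_epoch = saved_step / steps_per_epoch
last_logged_epoch = last_logged_step / steps_per_epoch

print(total_steps, warmup_steps, saved_epoch, last_logged_epoch)
```

Under these assumptions the saved checkpoint (step 403) falls after the warm-up phase, and the log shown here stops well short of the configured 20 epochs.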
426
 
427
  ### Framework Versions
428
+ - Python: 3.10.14
429
+ - Sentence Transformers: 3.1.1
430
+ - Transformers: 4.44.0
431
+ - PyTorch: 2.4.0
432
+ - Accelerate: 0.33.0
433
  - Datasets: 2.19.2
434
  - Tokenizers: 0.19.1
435
 
 
450
  }
451
  ```
452
 
453
+ #### MultipleNegativesRankingLoss
454
+ ```bibtex
455
+ @misc{henderson2017efficient,
456
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
457
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
458
+ year={2017},
459
+ eprint={1705.00652},
460
+ archivePrefix={arXiv},
461
+ primaryClass={cs.CL}
462
+ }
463
+ ```
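Note on the loss change: the card now trains on (anchor, positive, negative) triplets with `MultipleNegativesRankingLoss` and `"scale": 20.0`. As a rough, dependency-free sketch of what that objective computes, each anchor is scored against every positive and negative in the batch, and cross-entropy pushes it toward its own positive. The cosine similarity function is an assumption here (the `similarity_fct` line is not visible in this hunk), the helper names are hypothetical, and the toy vectors are made up rather than model output.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must select positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives first, then hard negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # negative log-probability of the true positive
    return total / len(anchors)

# Toy 2-D "embeddings" (hypothetical values).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[0.7, 0.7], [0.6, 0.8]]

aligned = mnr_loss(anchors, positives, negatives)
swapped = mnr_loss(anchors, positives[::-1], negatives)
print(aligned < swapped)
```

Because every other row's positive also acts as a negative, duplicate texts within a batch would be penalised as false negatives, which is presumably why the card sets `batch_sampler` to `no_duplicates`.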
464
+
465
  <!--
466
  ## Glossary
467
 
config.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "_name_or_path": "microsoft/mpnet-base",
3
  "architectures": [
4
  "MPNetModel"
5
  ],
@@ -19,6 +19,6 @@
19
  "pad_token_id": 1,
20
  "relative_attention_num_buckets": 32,
21
  "torch_dtype": "float32",
22
- "transformers_version": "4.44.2",
23
  "vocab_size": 30527
24
  }
 
1
  {
2
+ "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
3
  "architectures": [
4
  "MPNetModel"
5
  ],
 
19
  "pad_token_id": 1,
20
  "relative_attention_num_buckets": 32,
21
  "torch_dtype": "float32",
22
+ "transformers_version": "4.44.0",
23
  "vocab_size": 30527
24
  }
config_sentence_transformers.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
  "__version__": {
3
- "sentence_transformers": "3.1.0",
4
- "transformers": "4.44.2",
5
- "pytorch": "2.4.0+cu121"
6
  },
7
  "prompts": {},
8
  "default_prompt_name": null,
 
1
  {
2
  "__version__": {
3
+ "sentence_transformers": "3.1.1",
4
+ "transformers": "4.44.0",
5
+ "pytorch": "2.4.0"
6
  },
7
  "prompts": {},
8
  "default_prompt_name": null,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:57c233bb7d3527046e1742fea21d0da344f902a338c8871519dc244e97cbc679
3
  size 437967672
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7f34aac67b9b829bf1273699e6d541f4a1b3ada25c6db5220d352bd64abf40f2
3
  size 437967672
modules.json CHANGED
@@ -10,5 +10,11 @@
10
  "name": "1",
11
  "path": "1_Pooling",
12
  "type": "sentence_transformers.models.Pooling"
13
  }
14
  ]
 
10
  "name": "1",
11
  "path": "1_Pooling",
12
  "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
  }
20
  ]
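Note on the new `2_Normalize` module: it appends an L2-normalisation step after pooling, so every output embedding has unit length. A minimal sketch of the practical consequence, using plain Python lists and hypothetical helper names rather than the library's own code: once vectors are normalised, a plain dot product gives the same value as cosine similarity.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]

def cosine(u, v):
    """Cosine similarity computed on the raw (unnormalised) vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Toy vectors standing in for pooled embeddings (hypothetical values).
a = [3.0, 4.0]
b = [1.0, 2.0]
a_n, b_n = l2_normalize(a), l2_normalize(b)

print(math.isclose(dot(a_n, a_n), 1.0))            # unit length after normalisation
print(math.isclose(dot(a_n, b_n), cosine(a, b)))   # dot product matches cosine
```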
sentence_bert_config.json CHANGED
@@ -1,4 +1,4 @@
1
  {
2
- "max_seq_length": 512,
3
  "do_lower_case": false
4
  }
 
1
  {
2
+ "max_seq_length": 384,
3
  "do_lower_case": false
4
  }
special_tokens_map.json CHANGED
@@ -9,7 +9,7 @@
9
  "cls_token": {
10
  "content": "<s>",
11
  "lstrip": false,
12
- "normalized": true,
13
  "rstrip": false,
14
  "single_word": false
15
  },
@@ -37,7 +37,7 @@
37
  "sep_token": {
38
  "content": "</s>",
39
  "lstrip": false,
40
- "normalized": true,
41
  "rstrip": false,
42
  "single_word": false
43
  },
 
9
  "cls_token": {
10
  "content": "<s>",
11
  "lstrip": false,
12
+ "normalized": false,
13
  "rstrip": false,
14
  "single_word": false
15
  },
 
37
  "sep_token": {
38
  "content": "</s>",
39
  "lstrip": false,
40
+ "normalized": false,
41
  "rstrip": false,
42
  "single_word": false
43
  },
tokenizer.json CHANGED
@@ -2,7 +2,7 @@
2
  "version": "1.0",
3
  "truncation": {
4
  "direction": "Right",
5
- "max_length": 512,
6
  "strategy": "LongestFirst",
7
  "stride": 0
8
  },
 
2
  "version": "1.0",
3
  "truncation": {
4
  "direction": "Right",
5
+ "max_length": 384,
6
  "strategy": "LongestFirst",
7
  "stride": 0
8
  },
tokenizer_config.json CHANGED
@@ -55,11 +55,18 @@
55
  "do_lower_case": true,
56
  "eos_token": "</s>",
57
  "mask_token": "<mask>",
58
- "model_max_length": 512,
 
 
59
  "pad_token": "<pad>",
 
 
60
  "sep_token": "</s>",
 
61
  "strip_accents": null,
62
  "tokenize_chinese_chars": true,
63
  "tokenizer_class": "MPNetTokenizer",
 
 
64
  "unk_token": "[UNK]"
65
  }
 
55
  "do_lower_case": true,
56
  "eos_token": "</s>",
57
  "mask_token": "<mask>",
58
+ "max_length": 128,
59
+ "model_max_length": 384,
60
+ "pad_to_multiple_of": null,
61
  "pad_token": "<pad>",
62
+ "pad_token_type_id": 0,
63
+ "padding_side": "right",
64
  "sep_token": "</s>",
65
+ "stride": 0,
66
  "strip_accents": null,
67
  "tokenize_chinese_chars": true,
68
  "tokenizer_class": "MPNetTokenizer",
69
+ "truncation_side": "right",
70
+ "truncation_strategy": "longest_first",
71
  "unk_token": "[UNK]"
72
  }