bobox committed on
Commit
24c4f80
·
verified ·
1 Parent(s): 518eada

Training in progress, step 718, checkpoint

checkpoint-718/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1536,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
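Reviewer note on the pooling flags: with `pooling_mode_mean_tokens` set to `true`, the sentence embedding is the average of the token embeddings, with padding positions left out. The block below is a minimal pure-Python sketch of that idea, not the library's implementation. `mean_pool` and the toy inputs are hypothetical names and values chosen for illustration; in the real model each token vector has 1536 entries.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over positions where attention_mask is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                total[k] += vector[k]
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy example (invented values): three token positions, the last is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
```

The padded position does not influence the result, which is the point of passing the attention mask into the pooling step.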
checkpoint-718/README.md ADDED
@@ -0,0 +1,1168 @@
1
+ ---
2
+ base_model: BXresearch/DeBERTa2-0.9B-ST-v2
3
+ datasets:
4
+ - sentence-transformers/stsb
5
+ language:
6
+ - en
7
+ library_name: sentence-transformers
8
+ metrics:
9
+ - pearson_cosine
10
+ - spearman_cosine
11
+ - pearson_manhattan
12
+ - spearman_manhattan
13
+ - pearson_euclidean
14
+ - spearman_euclidean
15
+ - pearson_dot
16
+ - spearman_dot
17
+ - pearson_max
18
+ - spearman_max
19
+ - cosine_accuracy
20
+ - cosine_accuracy_threshold
21
+ - cosine_f1
22
+ - cosine_f1_threshold
23
+ - cosine_precision
24
+ - cosine_recall
25
+ - cosine_ap
26
+ - dot_accuracy
27
+ - dot_accuracy_threshold
28
+ - dot_f1
29
+ - dot_f1_threshold
30
+ - dot_precision
31
+ - dot_recall
32
+ - dot_ap
33
+ - manhattan_accuracy
34
+ - manhattan_accuracy_threshold
35
+ - manhattan_f1
36
+ - manhattan_f1_threshold
37
+ - manhattan_precision
38
+ - manhattan_recall
39
+ - manhattan_ap
40
+ - euclidean_accuracy
41
+ - euclidean_accuracy_threshold
42
+ - euclidean_f1
43
+ - euclidean_f1_threshold
44
+ - euclidean_precision
45
+ - euclidean_recall
46
+ - euclidean_ap
47
+ - max_accuracy
48
+ - max_accuracy_threshold
49
+ - max_f1
50
+ - max_f1_threshold
51
+ - max_precision
52
+ - max_recall
53
+ - max_ap
54
+ pipeline_tag: sentence-similarity
55
+ tags:
56
+ - sentence-transformers
57
+ - sentence-similarity
58
+ - feature-extraction
59
+ - generated_from_trainer
60
+ - dataset_size:5749
61
+ - loss:AnglELoss
62
+ widget:
63
+ - source_sentence: Left side of a silver train engine.
64
+ sentences:
65
+ - A close-up of a black train engine.
66
+ - Two boys are in midair jumping into an inground pool.
67
+ - An older Asian couple poses with a newborn baby at the dinner table.
68
+ - source_sentence: Four girls in swimsuits are playing volleyball at the beach.
69
+ sentences:
70
+ - A little girl is walking down a hallway.
71
+ - The man is erasing the chalk board.
72
+ - Four women in bikinis are playing volleyball on the beach.
73
+ - source_sentence: A woman is cooking meat.
74
+ sentences:
75
+ - The dogs are alone in the forest.
76
+ - A man is speaking.
77
+ - A dog jumps through a hoop.
78
+ - source_sentence: A person is folding a square paper piece.
79
+ sentences:
80
+ - A woman is carrying her baby.
81
+ - A person folds a piece of paper.
82
+ - A dog is trying to get through his dog door.
83
+ - source_sentence: The boy is playing the piano.
84
+ sentences:
85
+ - The woman is pouring oil into the pan.
86
+ - A small black and white dog is swimming in water.
87
+ - Two brown dogs are playing with each other in the snow.
88
+ model-index:
89
+ - name: SentenceTransformer based on BXresearch/DeBERTa2-0.9B-ST-v2
90
+ results:
91
+ - task:
92
+ type: semantic-similarity
93
+ name: Semantic Similarity
94
+ dataset:
95
+ name: sts test
96
+ type: sts-test
97
+ metrics:
98
+ - type: pearson_cosine
99
+ value: 0.9166868414572735
100
+ name: Pearson Cosine
101
+ - type: spearman_cosine
102
+ value: 0.9288517457757245
103
+ name: Spearman Cosine
104
+ - type: pearson_manhattan
105
+ value: 0.9280078114609798
106
+ name: Pearson Manhattan
107
+ - type: spearman_manhattan
108
+ value: 0.9298778921764931
109
+ name: Spearman Manhattan
110
+ - type: pearson_euclidean
111
+ value: 0.9284572778345226
112
+ name: Pearson Euclidean
113
+ - type: spearman_euclidean
114
+ value: 0.9303868482609214
115
+ name: Spearman Euclidean
116
+ - type: pearson_dot
117
+ value: 0.8999594910673557
118
+ name: Pearson Dot
119
+ - type: spearman_dot
120
+ value: 0.9052810779226765
121
+ name: Spearman Dot
122
+ - type: pearson_max
123
+ value: 0.9284572778345226
124
+ name: Pearson Max
125
+ - type: spearman_max
126
+ value: 0.9303868482609214
127
+ name: Spearman Max
128
+ - task:
129
+ type: binary-classification
130
+ name: Binary Classification
131
+ dataset:
132
+ name: allNLI dev
133
+ type: allNLI-dev
134
+ metrics:
135
+ - type: cosine_accuracy
136
+ value: 0.7578125
137
+ name: Cosine Accuracy
138
+ - type: cosine_accuracy_threshold
139
+ value: 0.7991553544998169
140
+ name: Cosine Accuracy Threshold
141
+ - type: cosine_f1
142
+ value: 0.6228070175438596
143
+ name: Cosine F1
144
+ - type: cosine_f1_threshold
145
+ value: 0.6627379059791565
146
+ name: Cosine F1 Threshold
147
+ - type: cosine_precision
148
+ value: 0.48299319727891155
149
+ name: Cosine Precision
150
+ - type: cosine_recall
151
+ value: 0.8765432098765432
152
+ name: Cosine Recall
153
+ - type: cosine_ap
154
+ value: 0.5935429428248785
155
+ name: Cosine Ap
156
+ - type: dot_accuracy
157
+ value: 0.75
158
+ name: Dot Accuracy
159
+ - type: dot_accuracy_threshold
160
+ value: 843.883544921875
161
+ name: Dot Accuracy Threshold
162
+ - type: dot_f1
163
+ value: 0.6079295154185022
164
+ name: Dot F1
165
+ - type: dot_f1_threshold
166
+ value: 691.1240234375
167
+ name: Dot F1 Threshold
168
+ - type: dot_precision
169
+ value: 0.4726027397260274
170
+ name: Dot Precision
171
+ - type: dot_recall
172
+ value: 0.8518518518518519
173
+ name: Dot Recall
174
+ - type: dot_ap
175
+ value: 0.5773443196804586
176
+ name: Dot Ap
177
+ - type: manhattan_accuracy
178
+ value: 0.75390625
179
+ name: Manhattan Accuracy
180
+ - type: manhattan_accuracy_threshold
181
+ value: 638.1295166015625
182
+ name: Manhattan Accuracy Threshold
183
+ - type: manhattan_f1
184
+ value: 0.6244343891402715
185
+ name: Manhattan F1
186
+ - type: manhattan_f1_threshold
187
+ value: 805.15966796875
188
+ name: Manhattan F1 Threshold
189
+ - type: manhattan_precision
190
+ value: 0.4928571428571429
191
+ name: Manhattan Precision
192
+ - type: manhattan_recall
193
+ value: 0.8518518518518519
194
+ name: Manhattan Recall
195
+ - type: manhattan_ap
196
+ value: 0.5966657639990006
197
+ name: Manhattan Ap
198
+ - type: euclidean_accuracy
199
+ value: 0.7578125
200
+ name: Euclidean Accuracy
201
+ - type: euclidean_accuracy_threshold
202
+ value: 20.63375473022461
203
+ name: Euclidean Accuracy Threshold
204
+ - type: euclidean_f1
205
+ value: 0.6272727272727272
206
+ name: Euclidean F1
207
+ - type: euclidean_f1_threshold
208
+ value: 25.803028106689453
209
+ name: Euclidean F1 Threshold
210
+ - type: euclidean_precision
211
+ value: 0.49640287769784175
212
+ name: Euclidean Precision
213
+ - type: euclidean_recall
214
+ value: 0.8518518518518519
215
+ name: Euclidean Recall
216
+ - type: euclidean_ap
217
+ value: 0.5937801624670859
218
+ name: Euclidean Ap
219
+ - type: max_accuracy
220
+ value: 0.7578125
221
+ name: Max Accuracy
222
+ - type: max_accuracy_threshold
223
+ value: 843.883544921875
224
+ name: Max Accuracy Threshold
225
+ - type: max_f1
226
+ value: 0.6272727272727272
227
+ name: Max F1
228
+ - type: max_f1_threshold
229
+ value: 805.15966796875
230
+ name: Max F1 Threshold
231
+ - type: max_precision
232
+ value: 0.49640287769784175
233
+ name: Max Precision
234
+ - type: max_recall
235
+ value: 0.8765432098765432
236
+ name: Max Recall
237
+ - type: max_ap
238
+ value: 0.5966657639990006
239
+ name: Max Ap
240
+ - task:
241
+ type: binary-classification
242
+ name: Binary Classification
243
+ dataset:
244
+ name: Qnli dev
245
+ type: Qnli-dev
246
+ metrics:
247
+ - type: cosine_accuracy
248
+ value: 0.72265625
249
+ name: Cosine Accuracy
250
+ - type: cosine_accuracy_threshold
251
+ value: 0.6946593523025513
252
+ name: Cosine Accuracy Threshold
253
+ - type: cosine_f1
254
+ value: 0.7279693486590038
255
+ name: Cosine F1
256
+ - type: cosine_f1_threshold
257
+ value: 0.6946593523025513
258
+ name: Cosine F1 Threshold
259
+ - type: cosine_precision
260
+ value: 0.6884057971014492
261
+ name: Cosine Precision
262
+ - type: cosine_recall
263
+ value: 0.7723577235772358
264
+ name: Cosine Recall
265
+ - type: cosine_ap
266
+ value: 0.7556080779923782
267
+ name: Cosine Ap
268
+ - type: dot_accuracy
269
+ value: 0.6953125
270
+ name: Dot Accuracy
271
+ - type: dot_accuracy_threshold
272
+ value: 729.351806640625
273
+ name: Dot Accuracy Threshold
274
+ - type: dot_f1
275
+ value: 0.7050847457627119
276
+ name: Dot F1
277
+ - type: dot_f1_threshold
278
+ value: 687.4432983398438
279
+ name: Dot F1 Threshold
280
+ - type: dot_precision
281
+ value: 0.6046511627906976
282
+ name: Dot Precision
283
+ - type: dot_recall
284
+ value: 0.8455284552845529
285
+ name: Dot Recall
286
+ - type: dot_ap
287
+ value: 0.6970235044462813
288
+ name: Dot Ap
289
+ - type: manhattan_accuracy
290
+ value: 0.7265625
291
+ name: Manhattan Accuracy
292
+ - type: manhattan_accuracy_threshold
293
+ value: 757.5811767578125
294
+ name: Manhattan Accuracy Threshold
295
+ - type: manhattan_f1
296
+ value: 0.7307692307692308
297
+ name: Manhattan F1
298
+ - type: manhattan_f1_threshold
299
+ value: 795.5022583007812
300
+ name: Manhattan F1 Threshold
301
+ - type: manhattan_precision
302
+ value: 0.6934306569343066
303
+ name: Manhattan Precision
304
+ - type: manhattan_recall
305
+ value: 0.7723577235772358
306
+ name: Manhattan Recall
307
+ - type: manhattan_ap
308
+ value: 0.7716838426822531
309
+ name: Manhattan Ap
310
+ - type: euclidean_accuracy
311
+ value: 0.73046875
312
+ name: Euclidean Accuracy
313
+ - type: euclidean_accuracy_threshold
314
+ value: 25.439434051513672
315
+ name: Euclidean Accuracy Threshold
316
+ - type: euclidean_f1
317
+ value: 0.7315175097276265
318
+ name: Euclidean F1
319
+ - type: euclidean_f1_threshold
320
+ value: 25.521507263183594
321
+ name: Euclidean F1 Threshold
322
+ - type: euclidean_precision
323
+ value: 0.7014925373134329
324
+ name: Euclidean Precision
325
+ - type: euclidean_recall
326
+ value: 0.7642276422764228
327
+ name: Euclidean Recall
328
+ - type: euclidean_ap
329
+ value: 0.7677200801958495
330
+ name: Euclidean Ap
331
+ - type: max_accuracy
332
+ value: 0.73046875
333
+ name: Max Accuracy
334
+ - type: max_accuracy_threshold
335
+ value: 757.5811767578125
336
+ name: Max Accuracy Threshold
337
+ - type: max_f1
338
+ value: 0.7315175097276265
339
+ name: Max F1
340
+ - type: max_f1_threshold
341
+ value: 795.5022583007812
342
+ name: Max F1 Threshold
343
+ - type: max_precision
344
+ value: 0.7014925373134329
345
+ name: Max Precision
346
+ - type: max_recall
347
+ value: 0.8455284552845529
348
+ name: Max Recall
349
+ - type: max_ap
350
+ value: 0.7716838426822531
351
+ name: Max Ap
352
+ ---
353
+
354
+ # SentenceTransformer based on BXresearch/DeBERTa2-0.9B-ST-v2
355
+
356
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BXresearch/DeBERTa2-0.9B-ST-v2](https://huggingface.co/BXresearch/DeBERTa2-0.9B-ST-v2) on the [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) dataset. It maps sentences & paragraphs to a 1536-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
357
+
358
+ ## Model Details
359
+
360
+ ### Model Description
361
+ - **Model Type:** Sentence Transformer
362
+ - **Base model:** [BXresearch/DeBERTa2-0.9B-ST-v2](https://huggingface.co/BXresearch/DeBERTa2-0.9B-ST-v2) <!-- at revision cafb66e82995a2b6620420a30de218463a1cdafd -->
363
+ - **Maximum Sequence Length:** 512 tokens
364
+ - **Output Dimensionality:** 1536 dimensions
365
+ - **Similarity Function:** Cosine Similarity
366
+ - **Training Dataset:**
367
+ - [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb)
368
+ - **Language:** en
369
+ <!-- - **License:** Unknown -->
370
+
371
+ ### Model Sources
372
+
373
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
374
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
375
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
376
+
377
+ ### Full Model Architecture
378
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
+   (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
385
+
386
+ ## Usage
387
+
388
+ ### Direct Usage (Sentence Transformers)
389
+
390
+ First install the Sentence Transformers library:
391
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
395
+
396
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("bobox/DeBERTa2-0.9B-ST-stsb-checkpoints-tmp")
+ # Run inference
+ sentences = [
+     'The boy is playing the piano.',
+     'The woman is pouring oil into the pan.',
+     'A small black and white dog is swimming in water.',
+ ]
+ # encode() returns a NumPy array with one row per sentence
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 1536)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
417
+
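Reviewer note: the model card lists cosine similarity as the similarity function, so the 3 × 3 result above holds one cosine score per sentence pair. The following is a small pure-Python sketch of such a pairwise matrix, written without the library so the arithmetic is visible. `cosine_similarity`, `similarity_matrix`, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine scores: entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy 2-dimensional "embeddings" (invented values).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy)
```

The diagonal is 1 because every vector is compared with itself, and the matrix is symmetric.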
418
+ <!--
419
+ ### Direct Usage (Transformers)
420
+
421
+ <details><summary>Click to see the direct usage in Transformers</summary>
422
+
423
+ </details>
424
+ -->
425
+
426
+ <!--
427
+ ### Downstream Usage (Sentence Transformers)
428
+
429
+ You can finetune this model on your own dataset.
430
+
431
+ <details><summary>Click to expand</summary>
432
+
433
+ </details>
434
+ -->
435
+
436
+ <!--
437
+ ### Out-of-Scope Use
438
+
439
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
440
+ -->
441
+
442
+ ## Evaluation
443
+
444
+ ### Metrics
445
+
446
+ #### Semantic Similarity
447
+ * Dataset: `sts-test`
448
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
449
+
450
+ | Metric | Value |
451
+ |:--------------------|:-----------|
452
+ | pearson_cosine | 0.9167 |
453
+ | **spearman_cosine** | **0.9289** |
454
+ | pearson_manhattan | 0.928 |
455
+ | spearman_manhattan | 0.9299 |
456
+ | pearson_euclidean | 0.9285 |
457
+ | spearman_euclidean | 0.9304 |
458
+ | pearson_dot | 0.9 |
459
+ | spearman_dot | 0.9053 |
460
+ | pearson_max | 0.9285 |
461
+ | spearman_max | 0.9304 |
462
+
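Reviewer note: `spearman_cosine` is the Spearman rank correlation between the model's cosine similarities and the gold similarity scores. It rewards getting the ordering of pairs right, regardless of the absolute values. Below is a minimal pure-Python sketch of that computation, assuming average ranks for ties. The function names and the toy numbers are hypothetical and are not taken from the evaluation run.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: predicted cosine scores vs. gold scores for four pairs.
predicted = [0.91, 0.35, 0.62, 0.10]
gold = [1.0, 0.4, 0.76, 0.0]
rho = spearman(predicted, gold)
```

Here the predicted scores order the four pairs exactly as the gold scores do, so the correlation is perfect even though the values differ.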
463
+ #### Binary Classification
464
+ * Dataset: `allNLI-dev`
465
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
466
+
467
+ | Metric | Value |
468
+ |:-----------------------------|:-----------|
469
+ | cosine_accuracy | 0.7578 |
470
+ | cosine_accuracy_threshold | 0.7992 |
471
+ | cosine_f1 | 0.6228 |
472
+ | cosine_f1_threshold | 0.6627 |
473
+ | cosine_precision | 0.483 |
474
+ | cosine_recall | 0.8765 |
475
+ | cosine_ap | 0.5935 |
476
+ | dot_accuracy | 0.75 |
477
+ | dot_accuracy_threshold | 843.8835 |
478
+ | dot_f1 | 0.6079 |
479
+ | dot_f1_threshold | 691.124 |
480
+ | dot_precision | 0.4726 |
481
+ | dot_recall | 0.8519 |
482
+ | dot_ap | 0.5773 |
483
+ | manhattan_accuracy | 0.7539 |
484
+ | manhattan_accuracy_threshold | 638.1295 |
485
+ | manhattan_f1 | 0.6244 |
486
+ | manhattan_f1_threshold | 805.1597 |
487
+ | manhattan_precision | 0.4929 |
488
+ | manhattan_recall | 0.8519 |
489
+ | manhattan_ap | 0.5967 |
490
+ | euclidean_accuracy | 0.7578 |
491
+ | euclidean_accuracy_threshold | 20.6338 |
492
+ | euclidean_f1 | 0.6273 |
493
+ | euclidean_f1_threshold | 25.803 |
494
+ | euclidean_precision | 0.4964 |
495
+ | euclidean_recall | 0.8519 |
496
+ | euclidean_ap | 0.5938 |
497
+ | max_accuracy | 0.7578 |
498
+ | max_accuracy_threshold | 843.8835 |
499
+ | max_f1 | 0.6273 |
500
+ | max_f1_threshold | 805.1597 |
501
+ | max_precision | 0.4964 |
502
+ | max_recall | 0.8765 |
503
+ | **max_ap** | **0.5967** |
504
+
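Reviewer note: each `*_accuracy_threshold` row is the score cut-off that produced the accuracy reported next to it; pairs scoring at or above the cut-off are predicted positive. The sketch below shows one brute-force way to find such a cut-off. It is an illustration under that assumption, not the evaluator's actual search, and `best_accuracy_threshold` plus the toy data are hypothetical.

```python
def best_accuracy_threshold(scores, labels):
    """Return (accuracy, threshold) for the rule: score >= threshold => positive."""
    best_acc, best_thr = -1.0, None
    # Every distinct score is a candidate cut-off.
    for thr in sorted(set(scores)):
        correct = sum((s >= thr) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:  # keep the lowest threshold on ties
            best_acc, best_thr = acc, thr
    return best_acc, best_thr


# Invented similarity scores and binary labels for five pairs.
scores = [0.92, 0.81, 0.55, 0.40, 0.30]
labels = [1, 1, 0, 1, 0]
accuracy, threshold = best_accuracy_threshold(scores, labels)
```

This also explains why the dot, Manhattan, and Euclidean thresholds in the table sit on very different scales: each is expressed in the units of its own score.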
505
+ #### Binary Classification
506
+ * Dataset: `Qnli-dev`
507
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
508
+
509
+ | Metric | Value |
510
+ |:-----------------------------|:-----------|
511
+ | cosine_accuracy | 0.7227 |
512
+ | cosine_accuracy_threshold | 0.6947 |
513
+ | cosine_f1 | 0.728 |
514
+ | cosine_f1_threshold | 0.6947 |
515
+ | cosine_precision | 0.6884 |
516
+ | cosine_recall | 0.7724 |
517
+ | cosine_ap | 0.7556 |
518
+ | dot_accuracy | 0.6953 |
519
+ | dot_accuracy_threshold | 729.3518 |
520
+ | dot_f1 | 0.7051 |
521
+ | dot_f1_threshold | 687.4433 |
522
+ | dot_precision | 0.6047 |
523
+ | dot_recall | 0.8455 |
524
+ | dot_ap | 0.697 |
525
+ | manhattan_accuracy | 0.7266 |
526
+ | manhattan_accuracy_threshold | 757.5812 |
527
+ | manhattan_f1 | 0.7308 |
528
+ | manhattan_f1_threshold | 795.5023 |
529
+ | manhattan_precision | 0.6934 |
530
+ | manhattan_recall | 0.7724 |
531
+ | manhattan_ap | 0.7717 |
532
+ | euclidean_accuracy | 0.7305 |
533
+ | euclidean_accuracy_threshold | 25.4394 |
534
+ | euclidean_f1 | 0.7315 |
535
+ | euclidean_f1_threshold | 25.5215 |
536
+ | euclidean_precision | 0.7015 |
537
+ | euclidean_recall | 0.7642 |
538
+ | euclidean_ap | 0.7677 |
539
+ | max_accuracy | 0.7305 |
540
+ | max_accuracy_threshold | 757.5812 |
541
+ | max_f1 | 0.7315 |
542
+ | max_f1_threshold | 795.5023 |
543
+ | max_precision | 0.7015 |
544
+ | max_recall | 0.8455 |
545
+ | **max_ap** | **0.7717** |
546
+
547
+ <!--
548
+ ## Bias, Risks and Limitations
549
+
550
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
551
+ -->
552
+
553
+ <!--
554
+ ### Recommendations
555
+
556
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
557
+ -->
558
+
559
+ ## Training Details
560
+
561
+ ### Training Dataset
562
+
563
+ #### sentence-transformers/stsb
564
+
565
+ * Dataset: [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) at [ab7a5ac](https://huggingface.co/datasets/sentence-transformers/stsb/tree/ab7a5ac0e35aa22088bdcf23e7fd99b220e53308)
566
+ * Size: 5,749 training samples
567
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
568
+ * Approximate statistics based on the first 1000 samples:
569
+ | | sentence1 | sentence2 | score |
570
+ |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------|
571
+ | type | string | string | float |
572
+ | details | <ul><li>min: 6 tokens</li><li>mean: 9.81 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 9.74 tokens</li><li>max: 25 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.54</li><li>max: 1.0</li></ul> |
573
+ * Samples:
574
+ | sentence1 | sentence2 | score |
575
+ |:-----------------------------------------------------------|:----------------------------------------------------------------------|:------------------|
576
+ | <code>A plane is taking off.</code> | <code>An air plane is taking off.</code> | <code>1.0</code> |
577
+ | <code>A man is playing a large flute.</code> | <code>A man is playing a flute.</code> | <code>0.76</code> |
578
+ | <code>A man is spreading shreded cheese on a pizza.</code> | <code>A man is spreading shredded cheese on an uncooked pizza.</code> | <code>0.76</code> |
579
+ * Loss: [<code>AnglELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#angleloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "pairwise_angle_sim"
+   }
+   ```
586
+
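Reviewer note on the loss parameters: the sketch below assumes that `AnglELoss` uses a CoSENT-style ranking objective, loss = log(1 + Σ exp(scale · (s_i − s_j))) summed over all pairs where item i has the lower gold score, with `scale` being the 20.0 shown above. It takes precomputed similarity values as input and does not implement `pairwise_angle_sim`. `ranking_loss` and the toy numbers are hypothetical.

```python
import math


def ranking_loss(similarities, gold_scores, scale=20.0):
    """CoSENT-style loss over precomputed similarities (illustrative only)."""
    # The leading 0.0 turns logsumexp into log(1 + sum(exp(...))).
    exponents = [0.0]
    for s_i, y_i in zip(similarities, gold_scores):
        for s_j, y_j in zip(similarities, gold_scores):
            if y_i < y_j:
                # Penalise item i being scored above the more similar item j.
                exponents.append(scale * (s_i - s_j))
    # Numerically stable logsumexp.
    peak = max(exponents)
    return peak + math.log(sum(math.exp(e - peak) for e in exponents))


# Invented gold scores and two sets of predicted similarities.
gold = [1.0, 0.76, 0.2]
well_ordered = ranking_loss([0.9, 0.6, 0.1], gold)
mis_ordered = ranking_loss([0.1, 0.6, 0.9], gold)
```

Predictions that rank the pairs in the same order as the gold scores give a loss close to zero, while reversed rankings are penalised heavily; a larger `scale` sharpens that penalty.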
587
+ ### Evaluation Dataset
588
+
589
+ #### sentence-transformers/stsb
590
+
591
+ * Dataset: [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) at [ab7a5ac](https://huggingface.co/datasets/sentence-transformers/stsb/tree/ab7a5ac0e35aa22088bdcf23e7fd99b220e53308)
592
+ * Size: 512 evaluation samples
593
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
594
+ * Approximate statistics based on the first 1000 samples (the evaluation set has only 512):
595
+ | | sentence1 | sentence2 | score |
596
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
597
+ | type | string | string | float |
598
+ | details | <ul><li>min: 6 tokens</li><li>mean: 11.16 tokens</li><li>max: 26 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.17 tokens</li><li>max: 23 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.47</li><li>max: 1.0</li></ul> |
599
+ * Samples:
600
+ | sentence1 | sentence2 | score |
601
+ |:--------------------------------------------------|:------------------------------------------------------|:------------------|
602
+ | <code>A man with a hard hat is dancing.</code> | <code>A man wearing a hard hat is dancing.</code> | <code>1.0</code> |
603
+ | <code>A young child is riding a horse.</code> | <code>A child is riding a horse.</code> | <code>0.95</code> |
604
+ | <code>A man is feeding a mouse to a snake.</code> | <code>The man is feeding a mouse to the snake.</code> | <code>1.0</code> |
605
+ * Loss: [<code>AnglELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#angleloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "pairwise_angle_sim"
+   }
+   ```
612
+
613
+ ### Training Hyperparameters
614
+ #### Non-Default Hyperparameters
615
+
616
+ - `eval_strategy`: steps
617
+ - `per_device_eval_batch_size`: 256
618
+ - `gradient_accumulation_steps`: 2
619
+ - `learning_rate`: 1.5e-05
620
+ - `weight_decay`: 5e-05
621
+ - `num_train_epochs`: 2
622
+ - `lr_scheduler_type`: cosine_with_min_lr
623
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 2e-06}
624
+ - `warmup_ratio`: 0.2
625
+ - `save_safetensors`: False
626
+ - `fp16`: True
627
+ - `push_to_hub`: True
628
+ - `hub_model_id`: bobox/DeBERTa2-0.9B-ST-stsb-checkpoints-tmp
629
+ - `hub_strategy`: all_checkpoints
630
+ - `batch_sampler`: no_duplicates
631
+
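Reviewer note on the scheduler settings: `warmup_ratio`, `cosine_with_min_lr`, and `min_lr` together describe a linear warmup followed by a cosine decay that stops at a floor. The sketch below is an approximation of that shape, not the Trainer's code. It assumes 718 total optimizer steps (taken from the checkpoint name in this commit) and a warmup length of ceil(0.2 × 718) = 144 steps; `lr_at` is a hypothetical helper.

```python
import math

BASE_LR = 1.5e-05   # learning_rate
MIN_LR = 2e-06      # lr_scheduler_kwargs['min_lr']
NUM_CYCLES = 0.5    # lr_scheduler_kwargs['num_cycles']
TOTAL_STEPS = 718   # assumption: final step, from the checkpoint name
WARMUP_STEPS = math.ceil(0.2 * TOTAL_STEPS)  # warmup_ratio = 0.2 -> 144


def lr_at(step):
    """Approximate learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * NUM_CYCLES * 2.0 * progress))
    floor = MIN_LR / BASE_LR
    # Rescale the cosine factor so it decays to the floor instead of zero.
    return BASE_LR * (cosine * (1.0 - floor) + floor)
```

Under these assumptions the rate peaks at 1.5e-05 at step 144 and falls to 2e-06 at step 718. With `gradient_accumulation_steps` of 2 and the per-device batch size of 8 listed under All Hyperparameters, each optimizer step covers 16 samples on a single device.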
632
+ #### All Hyperparameters
633
+ <details><summary>Click to expand</summary>
634
+
635
+ - `overwrite_output_dir`: False
636
+ - `do_predict`: False
637
+ - `eval_strategy`: steps
638
+ - `prediction_loss_only`: True
639
+ - `per_device_train_batch_size`: 8
640
+ - `per_device_eval_batch_size`: 256
641
+ - `per_gpu_train_batch_size`: None
642
+ - `per_gpu_eval_batch_size`: None
643
+ - `gradient_accumulation_steps`: 2
644
+ - `eval_accumulation_steps`: None
645
+ - `learning_rate`: 1.5e-05
646
+ - `weight_decay`: 5e-05
647
+ - `adam_beta1`: 0.9
648
+ - `adam_beta2`: 0.999
649
+ - `adam_epsilon`: 1e-08
650
+ - `max_grad_norm`: 1.0
651
+ - `num_train_epochs`: 2
652
+ - `max_steps`: -1
653
+ - `lr_scheduler_type`: cosine_with_min_lr
654
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 2e-06}
655
+ - `warmup_ratio`: 0.2
656
+ - `warmup_steps`: 0
657
+ - `log_level`: passive
658
+ - `log_level_replica`: warning
659
+ - `log_on_each_node`: True
660
+ - `logging_nan_inf_filter`: True
661
+ - `save_safetensors`: False
662
+ - `save_on_each_node`: False
663
+ - `save_only_model`: False
664
+ - `restore_callback_states_from_checkpoint`: False
665
+ - `no_cuda`: False
666
+ - `use_cpu`: False
667
+ - `use_mps_device`: False
668
+ - `seed`: 42
669
+ - `data_seed`: None
670
+ - `jit_mode_eval`: False
671
+ - `use_ipex`: False
672
+ - `bf16`: False
673
+ - `fp16`: True
674
+ - `fp16_opt_level`: O1
675
+ - `half_precision_backend`: auto
676
+ - `bf16_full_eval`: False
677
+ - `fp16_full_eval`: False
678
+ - `tf32`: None
679
+ - `local_rank`: 0
680
+ - `ddp_backend`: None
681
+ - `tpu_num_cores`: None
682
+ - `tpu_metrics_debug`: False
683
+ - `debug`: []
684
+ - `dataloader_drop_last`: False
685
+ - `dataloader_num_workers`: 0
686
+ - `dataloader_prefetch_factor`: None
687
+ - `past_index`: -1
688
+ - `disable_tqdm`: False
689
+ - `remove_unused_columns`: True
690
+ - `label_names`: None
691
+ - `load_best_model_at_end`: False
692
+ - `ignore_data_skip`: False
693
+ - `fsdp`: []
694
+ - `fsdp_min_num_params`: 0
695
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
696
+ - `fsdp_transformer_layer_cls_to_wrap`: None
697
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
698
+ - `deepspeed`: None
699
+ - `label_smoothing_factor`: 0.0
700
+ - `optim`: adamw_torch
701
+ - `optim_args`: None
702
+ - `adafactor`: False
703
+ - `group_by_length`: False
704
+ - `length_column_name`: length
705
+ - `ddp_find_unused_parameters`: None
706
+ - `ddp_bucket_cap_mb`: None
707
+ - `ddp_broadcast_buffers`: False
708
+ - `dataloader_pin_memory`: True
709
+ - `dataloader_persistent_workers`: False
710
+ - `skip_memory_metrics`: True
711
+ - `use_legacy_prediction_loop`: False
712
+ - `push_to_hub`: True
713
+ - `resume_from_checkpoint`: None
714
+ - `hub_model_id`: bobox/DeBERTa2-0.9B-ST-stsb-checkpoints-tmp
715
+ - `hub_strategy`: all_checkpoints
716
+ - `hub_private_repo`: False
717
+ - `hub_always_push`: False
718
+ - `gradient_checkpointing`: False
719
+ - `gradient_checkpointing_kwargs`: None
720
+ - `include_inputs_for_metrics`: False
721
+ - `eval_do_concat_batches`: True
722
+ - `fp16_backend`: auto
723
+ - `push_to_hub_model_id`: None
724
+ - `push_to_hub_organization`: None
725
+ - `mp_parameters`:
726
+ - `auto_find_batch_size`: False
727
+ - `full_determinism`: False
728
+ - `torchdynamo`: None
729
+ - `ray_scope`: last
730
+ - `ddp_timeout`: 1800
731
+ - `torch_compile`: False
732
+ - `torch_compile_backend`: None
733
+ - `torch_compile_mode`: None
734
+ - `dispatch_batches`: None
735
+ - `split_batches`: None
736
+ - `include_tokens_per_second`: False
737
+ - `include_num_input_tokens_seen`: False
738
+ - `neftune_noise_alpha`: None
739
+ - `optim_target_modules`: None
740
+ - `batch_eval_metrics`: False
741
+ - `eval_on_start`: False
742
+ - `batch_sampler`: no_duplicates
743
+ - `multi_dataset_batch_sampler`: proportional
744
+
745
+ </details>
746
+
747
+ ### Training Logs
748
+ <details><summary>Click to expand</summary>
749
+
750
+ | Epoch | Step | Training Loss | loss | Qnli-dev_max_ap | allNLI-dev_max_ap | sts-test_spearman_cosine |
751
+ |:------:|:----:|:-------------:|:------:|:---------------:|:-----------------:|:------------------------:|
752
+ | 0.0056 | 2 | 2.6549 | - | - | - | - |
753
+ | 0.0111 | 4 | 2.7355 | - | - | - | - |
754
+ | 0.0167 | 6 | 3.6211 | - | - | - | - |
755
+ | 0.0223 | 8 | 3.0686 | - | - | - | - |
756
+ | 0.0278 | 10 | 3.4113 | - | - | - | - |
757
+ | 0.0334 | 12 | 2.4857 | - | - | - | - |
758
+ | 0.0389 | 14 | 2.4288 | - | - | - | - |
759
+ | 0.0445 | 16 | 2.6203 | - | - | - | - |
760
+ | 0.0501 | 18 | 2.7441 | - | - | - | - |
761
+ | 0.0556 | 20 | 3.4263 | - | - | - | - |
762
+ | 0.0612 | 22 | 2.3565 | - | - | - | - |
763
+ | 0.0668 | 24 | 2.5596 | - | - | - | - |
764
+ | 0.0723 | 26 | 3.0866 | - | - | - | - |
765
+ | 0.0779 | 28 | 3.223 | - | - | - | - |
766
+ | 0.0834 | 30 | 2.012 | - | - | - | - |
767
+ | 0.0890 | 32 | 3.2829 | - | - | - | - |
768
+ | 0.0946 | 34 | 3.9277 | - | - | - | - |
769
+ | 0.1001 | 36 | 2.785 | 2.6652 | 0.7960 | 0.6275 | 0.9294 |
770
+ | 0.1057 | 38 | 3.4966 | - | - | - | - |
771
+ | 0.1113 | 40 | 2.5923 | - | - | - | - |
772
+ | 0.1168 | 42 | 3.4418 | - | - | - | - |
773
+ | 0.1224 | 44 | 2.6519 | - | - | - | - |
774
+ | 0.1280 | 46 | 3.7746 | - | - | - | - |
775
+ | 0.1335 | 48 | 2.6736 | - | - | - | - |
776
+ | 0.1391 | 50 | 3.6764 | - | - | - | - |
777
+ | 0.1446 | 52 | 3.5311 | - | - | - | - |
778
+ | 0.1502 | 54 | 2.5869 | - | - | - | - |
779
+ | 0.1558 | 56 | 3.183 | - | - | - | - |
780
+ | 0.1613 | 58 | 2.747 | - | - | - | - |
781
+ | 0.1669 | 60 | 1.965 | - | - | - | - |
782
+ | 0.1725 | 62 | 2.1785 | - | - | - | - |
783
+ | 0.1780 | 64 | 2.5788 | - | - | - | - |
784
+ | 0.1836 | 66 | 3.1776 | - | - | - | - |
785
+ | 0.1892 | 68 | 2.6464 | - | - | - | - |
786
+ | 0.1947 | 70 | 2.7619 | - | - | - | - |
787
+ | 0.2003 | 72 | 3.0911 | 2.6171 | 0.7923 | 0.6295 | 0.9276 |
788
+ | 0.2058 | 74 | 2.4308 | - | - | - | - |
789
+ | 0.2114 | 76 | 3.2068 | - | - | - | - |
790
+ | 0.2170 | 78 | 2.4081 | - | - | - | - |
791
+ | 0.2225 | 80 | 2.3257 | - | - | - | - |
792
+ | 0.2281 | 82 | 3.0499 | - | - | - | - |
793
+ | 0.2337 | 84 | 3.2518 | - | - | - | - |
794
+ | 0.2392 | 86 | 2.7876 | - | - | - | - |
795
+ | 0.2448 | 88 | 2.7898 | - | - | - | - |
796
+ | 0.2503 | 90 | 2.7116 | - | - | - | - |
797
+ | 0.2559 | 92 | 3.0505 | - | - | - | - |
798
+ | 0.2615 | 94 | 2.5901 | - | - | - | - |
799
+ | 0.2670 | 96 | 1.9563 | - | - | - | - |
800
+ | 0.2726 | 98 | 2.1006 | - | - | - | - |
801
+ | 0.2782 | 100 | 2.1853 | - | - | - | - |
802
+ | 0.2837 | 102 | 2.327 | - | - | - | - |
803
+ | 0.2893 | 104 | 1.9937 | - | - | - | - |
804
+ | 0.2949 | 106 | 2.543 | - | - | - | - |
805
+ | 0.3004 | 108 | 1.9826 | 2.4596 | 0.7919 | 0.6329 | 0.9341 |
806
+ | 0.3060 | 110 | 3.0746 | - | - | - | - |
807
+ | 0.3115 | 112 | 2.4145 | - | - | - | - |
808
+ | 0.3171 | 114 | 2.244 | - | - | - | - |
809
+ | 0.3227 | 116 | 2.78 | - | - | - | - |
810
+ | 0.3282 | 118 | 2.8323 | - | - | - | - |
811
+ | 0.3338 | 120 | 2.4639 | - | - | - | - |
812
+ | 0.3394 | 122 | 2.9216 | - | - | - | - |
813
+ | 0.3449 | 124 | 2.0747 | - | - | - | - |
814
+ | 0.3505 | 126 | 2.7573 | - | - | - | - |
815
+ | 0.3561 | 128 | 3.7019 | - | - | - | - |
816
+ | 0.3616 | 130 | 3.3155 | - | - | - | - |
817
+ | 0.3672 | 132 | 3.625 | - | - | - | - |
818
+ | 0.3727 | 134 | 3.2889 | - | - | - | - |
819
+ | 0.3783 | 136 | 3.5936 | - | - | - | - |
820
+ | 0.3839 | 138 | 3.5932 | - | - | - | - |
821
+ | 0.3894 | 140 | 3.0457 | - | - | - | - |
822
+ | 0.3950 | 142 | 3.093 | - | - | - | - |
823
+ | 0.4006 | 144 | 2.7189 | 2.4698 | 0.7752 | 0.5896 | 0.9346 |
824
+ | 0.4061 | 146 | 3.2578 | - | - | - | - |
825
+ | 0.4117 | 148 | 3.3581 | - | - | - | - |
826
+ | 0.4172 | 150 | 2.9734 | - | - | - | - |
827
+ | 0.4228 | 152 | 3.0514 | - | - | - | - |
828
+ | 0.4284 | 154 | 3.1983 | - | - | - | - |
829
+ | 0.4339 | 156 | 2.9033 | - | - | - | - |
830
+ | 0.4395 | 158 | 2.9279 | - | - | - | - |
831
+ | 0.4451 | 160 | 3.1336 | - | - | - | - |
832
+ | 0.4506 | 162 | 3.1467 | - | - | - | - |
833
+ | 0.4562 | 164 | 3.0381 | - | - | - | - |
834
+ | 0.4618 | 166 | 3.068 | - | - | - | - |
835
+ | 0.4673 | 168 | 3.0261 | - | - | - | - |
836
+ | 0.4729 | 170 | 3.2867 | - | - | - | - |
837
+ | 0.4784 | 172 | 2.8474 | - | - | - | - |
838
+ | 0.4840 | 174 | 2.7982 | - | - | - | - |
839
+ | 0.4896 | 176 | 2.7945 | - | - | - | - |
840
+ | 0.4951 | 178 | 3.1312 | - | - | - | - |
841
+ | 0.5007 | 180 | 2.9704 | 2.4640 | 0.7524 | 0.6033 | 0.9242 |
842
+ | 0.5063 | 182 | 2.9856 | - | - | - | - |
843
+ | 0.5118 | 184 | 3.014 | - | - | - | - |
844
+ | 0.5174 | 186 | 3.0125 | - | - | - | - |
845
+ | 0.5229 | 188 | 2.8149 | - | - | - | - |
846
+ | 0.5285 | 190 | 2.7954 | - | - | - | - |
847
+ | 0.5341 | 192 | 3.078 | - | - | - | - |
848
+ | 0.5396 | 194 | 2.955 | - | - | - | - |
849
+ | 0.5452 | 196 | 2.9468 | - | - | - | - |
850
+ | 0.5508 | 198 | 3.0791 | - | - | - | - |
851
+ | 0.5563 | 200 | 2.998 | - | - | - | - |
852
+ | 0.5619 | 202 | 2.9068 | - | - | - | - |
853
+ | 0.5675 | 204 | 2.8283 | - | - | - | - |
854
+ | 0.5730 | 206 | 2.9216 | - | - | - | - |
855
+ | 0.5786 | 208 | 3.3441 | - | - | - | - |
856
+ | 0.5841 | 210 | 3.0 | - | - | - | - |
857
+ | 0.5897 | 212 | 2.9023 | - | - | - | - |
858
+ | 0.5953 | 214 | 2.8177 | - | - | - | - |
859
+ | 0.6008 | 216 | 2.8262 | 2.4979 | 0.7899 | 0.6037 | 0.9260 |
860
+ | 0.6064 | 218 | 2.7832 | - | - | - | - |
861
+ | 0.6120 | 220 | 3.0085 | - | - | - | - |
862
+ | 0.6175 | 222 | 2.8762 | - | - | - | - |
863
+ | 0.6231 | 224 | 3.147 | - | - | - | - |
864
+ | 0.6287 | 226 | 3.4262 | - | - | - | - |
865
+ | 0.6342 | 228 | 2.8271 | - | - | - | - |
866
+ | 0.6398 | 230 | 2.4024 | - | - | - | - |
867
+ | 0.6453 | 232 | 2.7556 | - | - | - | - |
868
+ | 0.6509 | 234 | 3.4652 | - | - | - | - |
869
+ | 0.6565 | 236 | 2.7235 | - | - | - | - |
870
+ | 0.6620 | 238 | 2.6498 | - | - | - | - |
871
+ | 0.6676 | 240 | 3.0933 | - | - | - | - |
872
+ | 0.6732 | 242 | 3.1193 | - | - | - | - |
873
+ | 0.6787 | 244 | 2.7249 | - | - | - | - |
874
+ | 0.6843 | 246 | 2.8931 | - | - | - | - |
875
+ | 0.6898 | 248 | 2.7913 | - | - | - | - |
876
+ | 0.6954 | 250 | 2.6933 | - | - | - | - |
877
+ | 0.7010 | 252 | 2.5632 | 2.4585 | 0.7700 | 0.6065 | 0.9298 |
878
+ | 0.7065 | 254 | 2.8347 | - | - | - | - |
879
+ | 0.7121 | 256 | 2.3827 | - | - | - | - |
880
+ | 0.7177 | 258 | 2.9065 | - | - | - | - |
881
+ | 0.7232 | 260 | 2.8162 | - | - | - | - |
882
+ | 0.7288 | 262 | 2.5485 | - | - | - | - |
883
+ | 0.7344 | 264 | 2.5751 | - | - | - | - |
884
+ | 0.7399 | 266 | 2.9056 | - | - | - | - |
885
+ | 0.7455 | 268 | 3.1397 | - | - | - | - |
886
+ | 0.7510 | 270 | 3.3107 | - | - | - | - |
887
+ | 0.7566 | 272 | 2.9024 | - | - | - | - |
888
+ | 0.7622 | 274 | 2.2307 | - | - | - | - |
889
+ | 0.7677 | 276 | 3.0097 | - | - | - | - |
890
+ | 0.7733 | 278 | 3.1406 | - | - | - | - |
891
+ | 0.7789 | 280 | 2.6786 | - | - | - | - |
892
+ | 0.7844 | 282 | 2.8882 | - | - | - | - |
893
+ | 0.7900 | 284 | 2.7215 | - | - | - | - |
894
+ | 0.7955 | 286 | 3.4188 | - | - | - | - |
895
+ | 0.8011 | 288 | 2.9901 | 2.4414 | 0.7665 | 0.6023 | 0.9288 |
896
+ | 0.8067 | 290 | 2.5144 | - | - | - | - |
897
+ | 0.8122 | 292 | 3.1932 | - | - | - | - |
898
+ | 0.8178 | 294 | 2.9733 | - | - | - | - |
899
+ | 0.8234 | 296 | 2.6895 | - | - | - | - |
900
+ | 0.8289 | 298 | 2.678 | - | - | - | - |
901
+ | 0.8345 | 300 | 2.5462 | - | - | - | - |
902
+ | 0.8401 | 302 | 2.6911 | - | - | - | - |
903
+ | 0.8456 | 304 | 2.8404 | - | - | - | - |
904
+ | 0.8512 | 306 | 2.5358 | - | - | - | - |
905
+ | 0.8567 | 308 | 3.1245 | - | - | - | - |
906
+ | 0.8623 | 310 | 2.3404 | - | - | - | - |
907
+ | 0.8679 | 312 | 3.0751 | - | - | - | - |
908
+ | 0.8734 | 314 | 2.7005 | - | - | - | - |
909
+ | 0.8790 | 316 | 2.7387 | - | - | - | - |
910
+ | 0.8846 | 318 | 2.7227 | - | - | - | - |
911
+ | 0.8901 | 320 | 2.9085 | - | - | - | - |
912
+ | 0.8957 | 322 | 3.3239 | - | - | - | - |
913
+ | 0.9013 | 324 | 2.4256 | 2.4106 | 0.7644 | 0.6087 | 0.9304 |
914
+ | 0.9068 | 326 | 2.5059 | - | - | - | - |
915
+ | 0.9124 | 328 | 2.5387 | - | - | - | - |
916
+ | 0.9179 | 330 | 2.899 | - | - | - | - |
917
+ | 0.9235 | 332 | 2.7256 | - | - | - | - |
918
+ | 0.9291 | 334 | 2.4862 | - | - | - | - |
919
+ | 0.9346 | 336 | 3.0014 | - | - | - | - |
920
+ | 0.9402 | 338 | 2.4164 | - | - | - | - |
921
+ | 0.9458 | 340 | 2.3148 | - | - | - | - |
922
+ | 0.9513 | 342 | 2.9414 | - | - | - | - |
923
+ | 0.9569 | 344 | 2.4435 | - | - | - | - |
924
+ | 0.9624 | 346 | 2.6286 | - | - | - | - |
925
+ | 0.9680 | 348 | 2.1744 | - | - | - | - |
926
+ | 0.9736 | 350 | 2.5866 | - | - | - | - |
927
+ | 0.9791 | 352 | 2.8333 | - | - | - | - |
928
+ | 0.9847 | 354 | 2.3544 | - | - | - | - |
929
+ | 0.9903 | 356 | 2.5397 | - | - | - | - |
930
+ | 0.9958 | 358 | 3.4058 | - | - | - | - |
931
+ | 1.0014 | 360 | 2.2904 | 2.4089 | 0.7888 | 0.6104 | 0.9338 |
932
+ | 1.0070 | 362 | 2.7925 | - | - | - | - |
933
+ | 1.0125 | 364 | 2.6415 | - | - | - | - |
934
+ | 1.0181 | 366 | 2.724 | - | - | - | - |
935
+ | 1.0236 | 368 | 2.569 | - | - | - | - |
936
+ | 1.0292 | 370 | 2.808 | - | - | - | - |
937
+ | 1.0348 | 372 | 2.4672 | - | - | - | - |
938
+ | 1.0403 | 374 | 2.3964 | - | - | - | - |
939
+ | 1.0459 | 376 | 2.3518 | - | - | - | - |
940
+ | 1.0515 | 378 | 2.7617 | - | - | - | - |
941
+ | 1.0570 | 380 | 2.5651 | - | - | - | - |
942
+ | 1.0626 | 382 | 2.2623 | - | - | - | - |
943
+ | 1.0682 | 384 | 2.2048 | - | - | - | - |
944
+ | 1.0737 | 386 | 2.1426 | - | - | - | - |
945
+ | 1.0793 | 388 | 1.8182 | - | - | - | - |
946
+ | 1.0848 | 390 | 2.3166 | - | - | - | - |
947
+ | 1.0904 | 392 | 2.4101 | - | - | - | - |
948
+ | 1.0960 | 394 | 2.8932 | - | - | - | - |
949
+ | 1.1015 | 396 | 3.0201 | 2.4217 | 0.7851 | 0.6205 | 0.9301 |
950
+ | 1.1071 | 398 | 2.6101 | - | - | - | - |
951
+ | 1.1127 | 400 | 2.3627 | - | - | - | - |
952
+ | 1.1182 | 402 | 2.5402 | - | - | - | - |
953
+ | 1.1238 | 404 | 2.695 | - | - | - | - |
954
+ | 1.1293 | 406 | 3.0563 | - | - | - | - |
955
+ | 1.1349 | 408 | 2.2296 | - | - | - | - |
956
+ | 1.1405 | 410 | 3.057 | - | - | - | - |
957
+ | 1.1460 | 412 | 2.8023 | - | - | - | - |
958
+ | 1.1516 | 414 | 2.6492 | - | - | - | - |
959
+ | 1.1572 | 416 | 2.2406 | - | - | - | - |
960
+ | 1.1627 | 418 | 1.7195 | - | - | - | - |
961
+ | 1.1683 | 420 | 2.2773 | - | - | - | - |
962
+ | 1.1739 | 422 | 2.3639 | - | - | - | - |
963
+ | 1.1794 | 424 | 2.3348 | - | - | - | - |
964
+ | 1.1850 | 426 | 2.6791 | - | - | - | - |
965
+ | 1.1905 | 428 | 2.3621 | - | - | - | - |
966
+ | 1.1961 | 430 | 2.5224 | - | - | - | - |
967
+ | 1.2017 | 432 | 2.4063 | 2.4724 | 0.7628 | 0.6043 | 0.9270 |
968
+ | 1.2072 | 434 | 1.9713 | - | - | - | - |
969
+ | 1.2128 | 436 | 2.4265 | - | - | - | - |
970
+ | 1.2184 | 438 | 2.0827 | - | - | - | - |
971
+ | 1.2239 | 440 | 2.0696 | - | - | - | - |
972
+ | 1.2295 | 442 | 2.7507 | - | - | - | - |
973
+ | 1.2350 | 444 | 2.5436 | - | - | - | - |
974
+ | 1.2406 | 446 | 2.4039 | - | - | - | - |
975
+ | 1.2462 | 448 | 2.4229 | - | - | - | - |
976
+ | 1.2517 | 450 | 2.323 | - | - | - | - |
977
+ | 1.2573 | 452 | 2.6099 | - | - | - | - |
978
+ | 1.2629 | 454 | 2.0329 | - | - | - | - |
979
+ | 1.2684 | 456 | 1.8797 | - | - | - | - |
980
+ | 1.2740 | 458 | 1.4485 | - | - | - | - |
981
+ | 1.2796 | 460 | 1.6794 | - | - | - | - |
982
+ | 1.2851 | 462 | 2.0934 | - | - | - | - |
983
+ | 1.2907 | 464 | 1.9579 | - | - | - | - |
984
+ | 1.2962 | 466 | 1.9288 | - | - | - | - |
985
+ | 1.3018 | 468 | 1.5874 | 2.5056 | 0.7833 | 0.5948 | 0.9345 |
986
+ | 1.3074 | 470 | 1.8715 | - | - | - | - |
987
+ | 1.3129 | 472 | 1.3778 | - | - | - | - |
988
+ | 1.3185 | 474 | 2.2242 | - | - | - | - |
989
+ | 1.3241 | 476 | 2.4031 | - | - | - | - |
990
+ | 1.3296 | 478 | 1.924 | - | - | - | - |
991
+ | 1.3352 | 480 | 1.7895 | - | - | - | - |
992
+ | 1.3408 | 482 | 2.0349 | - | - | - | - |
993
+ | 1.3463 | 484 | 1.8116 | - | - | - | - |
994
+ | 1.3519 | 486 | 2.353 | - | - | - | - |
995
+ | 1.3574 | 488 | 3.4263 | - | - | - | - |
996
+ | 1.3630 | 490 | 4.0606 | - | - | - | - |
997
+ | 1.3686 | 492 | 2.7423 | - | - | - | - |
998
+ | 1.3741 | 494 | 2.8461 | - | - | - | - |
999
+ | 1.3797 | 496 | 3.0742 | - | - | - | - |
1000
+ | 1.3853 | 498 | 2.2054 | - | - | - | - |
1001
+ | 1.3908 | 500 | 2.6009 | - | - | - | - |
1002
+ | 1.3964 | 502 | 2.242 | - | - | - | - |
1003
+ | 1.4019 | 504 | 2.9416 | 2.5288 | 0.7969 | 0.6010 | 0.9323 |
1004
+ | 1.4075 | 506 | 3.8179 | - | - | - | - |
1005
+ | 1.4131 | 508 | 3.0147 | - | - | - | - |
1006
+ | 1.4186 | 510 | 2.2185 | - | - | - | - |
1007
+ | 1.4242 | 512 | 3.0323 | - | - | - | - |
1008
+ | 1.4298 | 514 | 2.6922 | - | - | - | - |
1009
+ | 1.4353 | 516 | 2.6219 | - | - | - | - |
1010
+ | 1.4409 | 518 | 2.4365 | - | - | - | - |
1011
+ | 1.4465 | 520 | 3.1643 | - | - | - | - |
1012
+ | 1.4520 | 522 | 2.5548 | - | - | - | - |
1013
+ | 1.4576 | 524 | 2.3798 | - | - | - | - |
1014
+ | 1.4631 | 526 | 2.6361 | - | - | - | - |
1015
+ | 1.4687 | 528 | 2.6859 | - | - | - | - |
1016
+ | 1.4743 | 530 | 2.6071 | - | - | - | - |
1017
+ | 1.4798 | 532 | 2.2565 | - | - | - | - |
1018
+ | 1.4854 | 534 | 2.2415 | - | - | - | - |
1019
+ | 1.4910 | 536 | 2.4591 | - | - | - | - |
1020
+ | 1.4965 | 538 | 2.6729 | - | - | - | - |
1021
+ | 1.5021 | 540 | 2.3898 | 2.5025 | 0.7881 | 0.5978 | 0.9300 |
1022
+ | 1.5076 | 542 | 2.4614 | - | - | - | - |
1023
+ | 1.5132 | 544 | 2.5447 | - | - | - | - |
1024
+ | 1.5188 | 546 | 2.502 | - | - | - | - |
1025
+ | 1.5243 | 548 | 2.1892 | - | - | - | - |
1026
+ | 1.5299 | 550 | 2.7081 | - | - | - | - |
1027
+ | 1.5355 | 552 | 2.5523 | - | - | - | - |
1028
+ | 1.5410 | 554 | 2.3571 | - | - | - | - |
1029
+ | 1.5466 | 556 | 2.7694 | - | - | - | - |
1030
+ | 1.5522 | 558 | 2.2 | - | - | - | - |
1031
+ | 1.5577 | 560 | 2.4179 | - | - | - | - |
1032
+ | 1.5633 | 562 | 2.3914 | - | - | - | - |
1033
+ | 1.5688 | 564 | 2.1722 | - | - | - | - |
1034
+ | 1.5744 | 566 | 2.345 | - | - | - | - |
1035
+ | 1.5800 | 568 | 3.0069 | - | - | - | - |
1036
+ | 1.5855 | 570 | 2.4231 | - | - | - | - |
1037
+ | 1.5911 | 572 | 2.3597 | - | - | - | - |
1038
+ | 1.5967 | 574 | 2.143 | - | - | - | - |
1039
+ | 1.6022 | 576 | 2.6288 | 2.5368 | 0.7943 | 0.6048 | 0.9265 |
1040
+ | 1.6078 | 578 | 2.3905 | - | - | - | - |
1041
+ | 1.6134 | 580 | 2.1823 | - | - | - | - |
1042
+ | 1.6189 | 582 | 2.367 | - | - | - | - |
1043
+ | 1.6245 | 584 | 2.8189 | - | - | - | - |
1044
+ | 1.6300 | 586 | 2.6536 | - | - | - | - |
1045
+ | 1.6356 | 588 | 2.2134 | - | - | - | - |
1046
+ | 1.6412 | 590 | 1.6949 | - | - | - | - |
1047
+ | 1.6467 | 592 | 2.2029 | - | - | - | - |
1048
+ | 1.6523 | 594 | 3.0223 | - | - | - | - |
1049
+ | 1.6579 | 596 | 2.239 | - | - | - | - |
1050
+ | 1.6634 | 598 | 2.3388 | - | - | - | - |
1051
+ | 1.6690 | 600 | 2.3066 | - | - | - | - |
1052
+ | 1.6745 | 602 | 2.4762 | - | - | - | - |
1053
+ | 1.6801 | 604 | 1.9503 | - | - | - | - |
1054
+ | 1.6857 | 606 | 2.1252 | - | - | - | - |
1055
+ | 1.6912 | 608 | 1.8253 | - | - | - | - |
1056
+ | 1.6968 | 610 | 2.2938 | - | - | - | - |
1057
+ | 1.7024 | 612 | 1.9489 | 2.5747 | 0.7675 | 0.5964 | 0.9267 |
1058
+ | 1.7079 | 614 | 1.9238 | - | - | - | - |
1059
+ | 1.7135 | 616 | 1.8171 | - | - | - | - |
1060
+ | 1.7191 | 618 | 2.2371 | - | - | - | - |
1061
+ | 1.7246 | 620 | 2.4901 | - | - | - | - |
1062
+ | 1.7302 | 622 | 1.8503 | - | - | - | - |
1063
+ | 1.7357 | 624 | 2.017 | - | - | - | - |
1064
+ | 1.7413 | 626 | 2.3069 | - | - | - | - |
1065
+ | 1.7469 | 628 | 2.444 | - | - | - | - |
1066
+ | 1.7524 | 630 | 1.9606 | - | - | - | - |
1067
+ | 1.7580 | 632 | 2.2364 | - | - | - | - |
1068
+ | 1.7636 | 634 | 1.8711 | - | - | - | - |
1069
+ | 1.7691 | 636 | 2.4233 | - | - | - | - |
1070
+ | 1.7747 | 638 | 2.4065 | - | - | - | - |
1071
+ | 1.7803 | 640 | 2.0725 | - | - | - | - |
1072
+ | 1.7858 | 642 | 2.0578 | - | - | - | - |
1073
+ | 1.7914 | 644 | 2.2066 | - | - | - | - |
1074
+ | 1.7969 | 646 | 1.7767 | - | - | - | - |
1075
+ | 1.8025 | 648 | 2.7388 | 2.5685 | 0.7663 | 0.5959 | 0.9292 |
1076
+ | 1.8081 | 650 | 1.854 | - | - | - | - |
1077
+ | 1.8136 | 652 | 2.7337 | - | - | - | - |
1078
+ | 1.8192 | 654 | 2.4477 | - | - | - | - |
1079
+ | 1.8248 | 656 | 2.4818 | - | - | - | - |
1080
+ | 1.8303 | 658 | 1.8592 | - | - | - | - |
1081
+ | 1.8359 | 660 | 1.8396 | - | - | - | - |
1082
+ | 1.8414 | 662 | 2.3893 | - | - | - | - |
1083
+ | 1.8470 | 664 | 2.0139 | - | - | - | - |
1084
+ | 1.8526 | 666 | 2.8837 | - | - | - | - |
1085
+ | 1.8581 | 668 | 2.0342 | - | - | - | - |
1086
+ | 1.8637 | 670 | 1.8857 | - | - | - | - |
1087
+ | 1.8693 | 672 | 2.1147 | - | - | - | - |
1088
+ | 1.8748 | 674 | 1.6263 | - | - | - | - |
1089
+ | 1.8804 | 676 | 2.2987 | - | - | - | - |
1090
+ | 1.8860 | 678 | 1.9678 | - | - | - | - |
1091
+ | 1.8915 | 680 | 1.9999 | - | - | - | - |
1092
+ | 1.8971 | 682 | 2.2802 | - | - | - | - |
1093
+ | 1.9026 | 684 | 1.9666 | 2.5536 | 0.7717 | 0.5967 | 0.9289 |
1094
+ | 1.9082 | 686 | 1.8156 | - | - | - | - |
1095
+ | 1.9138 | 688 | 1.9542 | - | - | - | - |
1096
+ | 1.9193 | 690 | 1.859 | - | - | - | - |
1097
+ | 1.9249 | 692 | 1.6237 | - | - | - | - |
1098
+ | 1.9305 | 694 | 2.3085 | - | - | - | - |
1099
+ | 1.9360 | 696 | 2.1461 | - | - | - | - |
1100
+ | 1.9416 | 698 | 1.7024 | - | - | - | - |
1101
+ | 1.9471 | 700 | 2.2181 | - | - | - | - |
1102
+ | 1.9527 | 702 | 2.4782 | - | - | - | - |
1103
+ | 1.9583 | 704 | 1.7378 | - | - | - | - |
1104
+ | 1.9638 | 706 | 2.0422 | - | - | - | - |
1105
+ | 1.9694 | 708 | 1.7577 | - | - | - | - |
1106
+ | 1.9750 | 710 | 2.0209 | - | - | - | - |
1107
+ | 1.9805 | 712 | 2.0372 | - | - | - | - |
1108
+ | 1.9861 | 714 | 2.0915 | - | - | - | - |
1109
+ | 1.9917 | 716 | 1.603 | - | - | - | - |
1110
+ | 1.9972 | 718 | 1.7111 | - | - | - | - |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.42.4
+ - PyTorch: 2.4.0+cu121
+ - Accelerate: 0.32.1
+ - Datasets: 2.21.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### AnglELoss
+ ```bibtex
+ @misc{li2023angleoptimized,
+     title={AnglE-optimized Text Embeddings},
+     author={Xianming Li and Jing Li},
+     year={2023},
+     eprint={2309.12871},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
checkpoint-718/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "[MASK]": 128000
+ }
checkpoint-718/config.json ADDED
@@ -0,0 +1,38 @@
+ {
+   "_name_or_path": "BXresearch/DeBERTa2-0.9B-ST-v2",
+   "architectures": [
+     "DebertaV2Model"
+   ],
+   "attention_head_size": 64,
+   "attention_probs_dropout_prob": 0.1,
+   "conv_act": "gelu",
+   "conv_kernel_size": 3,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1536,
+   "initializer_range": 0.02,
+   "intermediate_size": 6144,
+   "layer_norm_eps": 1e-07,
+   "max_position_embeddings": 512,
+   "max_relative_positions": -1,
+   "model_type": "deberta-v2",
+   "norm_rel_ebd": "layer_norm",
+   "num_attention_heads": 24,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "pooler_dropout": 0,
+   "pooler_hidden_act": "gelu",
+   "pooler_hidden_size": 1536,
+   "pos_att_type": [
+     "p2c",
+     "c2p"
+   ],
+   "position_biased_input": false,
+   "position_buckets": 256,
+   "relative_attention": true,
+   "share_att_key": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.42.4",
+   "type_vocab_size": 0,
+   "vocab_size": 128100
+ }
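As a quick sanity check on the config above, the sketch below verifies that the attention heads tile the hidden size and makes a rough parameter estimate. The dictionary values are copied from the hunk; the `rough_param_count` helper is hypothetical and deliberately ignores biases, LayerNorm weights, the convolution layer, and relative-position embeddings, so it is only an approximation and not how any library counts parameters.

```python
# Values copied from the config.json hunk above.
config = {
    "hidden_size": 1536,
    "intermediate_size": 6144,
    "num_attention_heads": 24,
    "attention_head_size": 64,
    "num_hidden_layers": 24,
    "vocab_size": 128100,
}


def rough_param_count(cfg):
    """Hypothetical helper: count only the large weight matrices.

    Ignores biases, LayerNorm, the conv layer, and relative-position
    embeddings, so the result is a lower-bound style estimate.
    """
    h = cfg["hidden_size"]
    embeddings = cfg["vocab_size"] * h
    attention = 4 * h * h  # query, key, value, and output projections
    feed_forward = 2 * h * cfg["intermediate_size"]  # up and down projections
    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward)


# 24 heads of size 64 should exactly cover the 1536-dim hidden state.
heads_ok = (
    config["num_attention_heads"] * config["attention_head_size"]
    == config["hidden_size"]
)
params = rough_param_count(config)
print(heads_ok, f"{params / 1e9:.2f}B parameters (rough)")
```

At 4 bytes per float32 weight, this estimate lands within about 1% of the 3,538,506,546-byte `pytorch_model.bin` pointer recorded in this commit, which is consistent with the "0.9B" in the base model name.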
checkpoint-718/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.42.4",
+     "pytorch": "2.4.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
checkpoint-718/modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
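`modules.json` chains a Transformer module into a Pooling module, and the `1_Pooling/config.json` in this commit enables only `pooling_mode_mean_tokens`. The sketch below illustrates what mean pooling over non-padding tokens computes. It is a pure-Python toy with 2-dimensional vectors standing in for the 1536-dimensional token embeddings, and `mean_pool` is a hypothetical name, not the Sentence Transformers implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention_mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [value / count for value in total]


# Three token vectors; the last one is padding and must not affect the mean.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

Masking matters here because the tokenizer pads on the right (`padding_side` is `right`), so shorter inputs in a batch would otherwise have their embeddings diluted by padding vectors.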
checkpoint-718/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2077b4aeeb8c639a99538fe925b2e299b7ef4a93d62809ca3beb499168daaad5
+ size 2719986554
checkpoint-718/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dfd030d87bcc5de69abef9c28299d42eac6e5f9c6dd9f22ac79c1d3cf08c8fd4
+ size 3538506546
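The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the object hash, and the size in bytes. A minimal sketch of reading one follows, using the `pytorch_model.bin` pointer above as input. `parse_lfs_pointer` is a hypothetical helper that assumes exactly these three keys are present and does no validation beyond that.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the pytorch_model.bin hunk above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:dfd030d87bcc5de69abef9c28299d42eac6e5f9c6dd9f22ac79c1d3cf08c8fd4
size 3538506546
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["algorithm"], f"{pointer['size'] / 2**30:.2f} GiB")
```

The `size` field is the size of the real object, not of the pointer, so it can be used to estimate download and disk requirements before fetching the checkpoint.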
checkpoint-718/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:df15c7833a5d7ec12ab626d2bf370d0017077affcec1fc782211ebb9259277d8
+ size 14244
checkpoint-718/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3ba34c689a5c7f6911a61d4eb7fbe1c72f890b3ee53957def810608f19e046c
+ size 1064
checkpoint-718/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
checkpoint-718/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-718/spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5598d5e96f339a8d980c15f9afd405a2e5e1be7db41de3ed13b0f03fac1e8c17
+ size 2447305
checkpoint-718/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-718/tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128000": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "[CLS]",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "eos_token": "[SEP]",
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "sp_model_kwargs": {},
+   "split_by_punct": false,
+   "stride": 0,
+   "tokenizer_class": "DebertaV2Tokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]",
+   "vocab_type": "spm"
+ }
checkpoint-718/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-718/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a34278fc03504ad1fef70b2881926396d8466261a43c216e270dc5c005083d1c
+ size 5560