radoslavralev committed on
Commit 22d3177 · verified · 1 Parent(s): 0d756ea

Add new SentenceTransformer model
1_Pooling/config.json ADDED
{
  "word_embedding_dimension": 384,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
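For reference, `pooling_mode_mean_tokens` is the only mode enabled above: the sentence embedding is the average of the token embeddings, counting only non-padding positions. A minimal numpy sketch of that operation (illustrative only, not this repository's code):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, counting only non-padding tokens.

    token_embeddings: (seq_len, dim); attention_mask: (seq_len,) of 0/1.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                 # (dim,)
    counts = np.clip(mask.sum(), 1e-9, None)                       # avoid div by zero
    return summed / counts

# Two real tokens plus one padding token; padding must not affect the mean.
tokens = np.array([[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]])
mask = np.array([1, 1, 0])
print(mean_pool(tokens, mask))  # [2. 4.]
```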
README.md ADDED
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:21470
- loss:MultipleNegativesRankingLoss
base_model: thenlper/gte-small
widget:
- source_sentence: This positive resistance model is a different way of analyzing
    feedback oscillator operation.
  sentences:
  - This positive resistance model is a different way of analyzing feedback oscillator
    operation.
  - This negative resistance model is an alternate way of analyzing feedback oscillator
    operation.
  - I am BE 8th sem. CSE student. Which path should I choose as a career or which
    course I should do to get a good job in future within my country?
- source_sentence: Danny Danny Kortchmar played guitar , Charles Larkey played bass
    and Gordon played drums producing with Lou Adler .
  sentences:
  - What is the main reason for all the problems within India?
  - Gordon played guitar , Danny Kortchmar played bass and Lou Adler played drums
    with Charles Larkey producing .
  - Danny Danny Kortchmar played guitar , Charles Larkey played bass and Gordon played
    drums producing with Lou Adler .
- source_sentence: The Ngage isn't still lacking in earbuds.
  sentences:
  - What is Queen's University's acceptance rate for international students on campus?
  - The Ngage is still lacking in earbuds.
  - The Ngage isn't still lacking in earbuds.
- source_sentence: Previously reported figures were consistently revised down.
  sentences:
  - Previously reported figures were consistently revised down.
  - What are the side effects for using Proactiv on the face? How are the side effects
    treated?
  - Previously reported numbers were infrequently revised down.
- source_sentence: What is the fastest way to get a PAN card within India?
  sentences:
  - He has also used the OpenMusic software (designed at IRCAM ) to create computer-generated
    music.
  - What is the fastest way to get a PAN card outside India?
  - What is the fastest way to get a PAN card within India?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on thenlper/gte-small
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: NanoMSMARCO
      type: NanoMSMARCO
    metrics:
    - type: cosine_accuracy@1
      value: 0.28
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.48
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.52
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.58
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.28
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.15999999999999998
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.10400000000000001
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.057999999999999996
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.28
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.48
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.52
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.58
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.4281391945817123
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.3795238095238095
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.39018847344323304
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: NanoNQ
      type: NanoNQ
    metrics:
    - type: cosine_accuracy@1
      value: 0.32
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.6
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.66
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.74
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.32
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.132
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.07400000000000001
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.3
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.55
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.61
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.68
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.5108521344166539
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.4791904761904762
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.452598225251627
      name: Cosine Map@100
  - task:
      type: nano-beir
      name: Nano BEIR
    dataset:
      name: NanoBEIR mean
      type: NanoBEIR_mean
    metrics:
    - type: cosine_accuracy@1
      value: 0.30000000000000004
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.54
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.5900000000000001
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6599999999999999
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.30000000000000004
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.18
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.11800000000000001
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.066
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.29000000000000004
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.515
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.565
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.63
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.4694956644991831
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.4293571428571429
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.42139334934743
      name: Cosine Map@100
---

# SentenceTransformer based on thenlper/gte-small

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [thenlper/gte-small](https://huggingface.co/thenlper/gte-small). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [thenlper/gte-small](https://huggingface.co/thenlper/gte-small) <!-- at revision 17e1f347d17fe144873b1201da91788898c639cd -->
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("redis/unified-negatives")
# Run inference
sentences = [
    'What is the fastest way to get a PAN card within India?',
    'What is the fastest way to get a PAN card within India?',
    'What is the fastest way to get a PAN card outside India?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 0.2943],
#         [1.0000, 1.0000, 0.2943],
#         [0.2943, 0.2943, 1.0000]])
```

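Because the model ends in a `Normalize()` module, the vectors returned by `encode` are unit-length, so the cosine similarities above are just dot products. A small numpy sketch with random stand-in vectors (not actual model outputs):

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length, mirroring the model's Normalize module."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Random stand-ins for model outputs (the real model returns (n, 384) arrays).
rng = np.random.default_rng(0)
emb = l2_normalize(rng.normal(size=(3, 384)))

# On unit vectors, cosine similarity is a plain dot product, so the whole
# similarity matrix is one matmul, with ones on the diagonal.
sims = emb @ emb.T
print(np.allclose(np.diag(sims), 1.0))  # True
```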
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Datasets: `NanoMSMARCO` and `NanoNQ`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | NanoMSMARCO | NanoNQ     |
|:--------------------|:------------|:-----------|
| cosine_accuracy@1   | 0.28        | 0.32       |
| cosine_accuracy@3   | 0.48        | 0.6        |
| cosine_accuracy@5   | 0.52        | 0.66       |
| cosine_accuracy@10  | 0.58        | 0.74       |
| cosine_precision@1  | 0.28        | 0.32       |
| cosine_precision@3  | 0.16        | 0.2        |
| cosine_precision@5  | 0.104       | 0.132      |
| cosine_precision@10 | 0.058       | 0.074      |
| cosine_recall@1     | 0.28        | 0.3        |
| cosine_recall@3     | 0.48        | 0.55       |
| cosine_recall@5     | 0.52        | 0.61       |
| cosine_recall@10    | 0.58        | 0.68       |
| **cosine_ndcg@10**  | **0.4281**  | **0.5109** |
| cosine_mrr@10       | 0.3795      | 0.4792     |
| cosine_map@100      | 0.3902      | 0.4526     |

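These metrics follow the standard rank-cutoff definitions. As a self-contained sketch of how such numbers are computed for a single query (written from the standard definitions, not from `InformationRetrievalEvaluator`'s source):

```python
def ir_metrics_at_k(ranked: list[str], relevant: set[str], k: int) -> dict[str, float]:
    """Rank-cutoff metrics for one query; corpus-level scores average these."""
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    rr = 0.0
    for rank, doc in enumerate(top_k, start=1):
        if doc in relevant:
            rr = 1.0 / rank  # reciprocal rank of the first relevant hit
            break
    return {
        f"accuracy@{k}": float(hits > 0),       # any relevant doc in the top k?
        f"precision@{k}": hits / k,
        f"recall@{k}": hits / max(len(relevant), 1),
        f"mrr@{k}": rr,
    }

print(ir_metrics_at_k(["d3", "d1", "d7"], {"d1"}, k=3))
# {'accuracy@3': 1.0, 'precision@3': 0.3333333333333333, 'recall@3': 1.0, 'mrr@3': 0.5}
```

Note that when each query has exactly one relevant document, accuracy@k and recall@k coincide, which is why the NanoMSMARCO columns above match row for row.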
#### Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>NanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.NanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nq"
      ],
      "dataset_id": "lightonai/NanoBEIR-en"
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.3        |
| cosine_accuracy@3   | 0.54       |
| cosine_accuracy@5   | 0.59       |
| cosine_accuracy@10  | 0.66       |
| cosine_precision@1  | 0.3        |
| cosine_precision@3  | 0.18       |
| cosine_precision@5  | 0.118      |
| cosine_precision@10 | 0.066      |
| cosine_recall@1     | 0.29       |
| cosine_recall@3     | 0.515      |
| cosine_recall@5     | 0.565      |
| cosine_recall@10    | 0.63       |
| **cosine_ndcg@10**  | **0.4695** |
| cosine_mrr@10       | 0.4294     |
| cosine_map@100      | 0.4214     |

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 21,470 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 5 tokens</li><li>mean: 19.91 tokens</li><li>max: 101 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 19.91 tokens</li><li>max: 101 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 19.91 tokens</li><li>max: 101 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>The pale coloration provides camouflage for the beetle on the light sand.</code> | <code>The pale coloration provides camouflage for the beetle on the light sand.</code> | <code>The pale coloration helps the beetle stand out on the light sand.</code> |
  | <code>It is found from Fennoscandinavia to the Pyrenees , Italy and Greece and from Britain to Russia and Ukraine .</code> | <code>It is found from Fennoscandinavia to the Pyrenees , Italy and Greece and from Britain to Russia and Ukraine .</code> | <code>It is located from Fennoscandinavia to the Pyrenees , Great Britain and Greece and from Italy to Russia and Ukraine .</code> |
  | <code>Is Swami Vivekananda's speech at parliament of world's religions, Chicago overrated in Chicago?</code> | <code>Is Swami Vivekananda's speech at parliament of world's religions, Chicago overrated in Chicago?</code> | <code>Is Swami Vivekananda's speech at parliament of world's religions, Chicago overrated outside Chicago?</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 7.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false
  }
  ```

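MultipleNegativesRankingLoss treats every other in-batch candidate as a negative: each anchor must rank its own positive first under a scaled-cosine softmax. A numpy sketch of the underlying math, using the `scale` of 7.0 configured above (illustrative, not the library's implementation):

```python
import numpy as np

def mnrl_loss(anchors: np.ndarray, candidates: np.ndarray, scale: float = 7.0) -> float:
    """In-batch-negatives ranking loss with cosine similarity.

    candidates[i] is anchors[i]'s positive; every other candidate row
    (other positives plus any appended hard negatives) acts as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = scale * (a @ c.T)                    # (batch, n_candidates)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())     # cross-entropy, target i for row i

anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
print(mnrl_loss(anchors, anchors))        # near 0: each anchor ranks its positive first
print(mnrl_loss(anchors, anchors[::-1]))  # large: positives swapped
```

The `scale` factor sharpens the softmax; larger values penalize near-misses more aggressively.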
### Evaluation Dataset

#### Unnamed Dataset

* Size: 2,386 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 6 tokens</li><li>mean: 19.42 tokens</li><li>max: 74 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 19.42 tokens</li><li>max: 74 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 19.41 tokens</li><li>max: 74 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>He died at Fort Edward on August 18 , 1861 , and was buried at the Union Cemetery in Sandy Hill .</code> | <code>He died at Fort Edward on August 18 , 1861 , and was buried at the Union Cemetery in Sandy Hill .</code> | <code>He died at Sandy Hill on August 18 , 1861 , and was buried at the Union Cemetery in Fort Edward .</code> |
  | <code>It was this cooperation which led to the development of the satellite AIS system.</code> | <code>It was this cooperation which led to the development of the satellite AIS system.</code> | <code>It was this cooperation which led to the halting of development of the satellite AIS system.</code> |
  | <code>What is the best field of engineering on campus?</code> | <code>What is the best field of engineering on campus?</code> | <code>What is the best field of engineering off campus?</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 7.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `learning_rate`: 1e-06
- `weight_decay`: 0.001
- `max_steps`: 3000
- `warmup_ratio`: 0.1
- `fp16`: True
- `dataloader_drop_last`: True
- `dataloader_num_workers`: 1
- `dataloader_prefetch_factor`: 1
- `load_best_model_at_end`: True
- `optim`: adamw_torch
- `ddp_find_unused_parameters`: False
- `push_to_hub`: True
- `hub_model_id`: redis/unified-negatives
- `eval_on_start`: True

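With `lr_scheduler_type: linear` and `warmup_ratio: 0.1` over `max_steps: 3000`, the learning rate ramps up for the first 300 steps and then decays linearly to zero. A sketch of that schedule (approximating, not reproducing, the Hugging Face scheduler):

```python
def linear_schedule_lr(step: int, base_lr: float = 1e-06,
                       max_steps: int = 3000, warmup_ratio: float = 0.1) -> float:
    """Linear warmup then linear decay, mirroring the settings listed above."""
    warmup_steps = int(max_steps * warmup_ratio)  # 300 steps here
    if step < warmup_steps:
        return base_lr * step / max(warmup_steps, 1)
    return base_lr * max(0.0, (max_steps - step) / max(max_steps - warmup_steps, 1))

print(linear_schedule_lr(150))   # 5e-07: halfway through warmup
print(linear_schedule_lr(300))   # 1e-06: peak learning rate
print(linear_schedule_lr(3000))  # 0.0: fully decayed
```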
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 1e-06
- `weight_decay`: 0.001
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3.0
- `max_steps`: 3000
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: True
- `dataloader_num_workers`: 1
- `dataloader_prefetch_factor`: 1
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: False
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: redis/unified-negatives
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: True
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
| Epoch   | Step | Training Loss | Validation Loss | NanoMSMARCO_cosine_ndcg@10 | NanoNQ_cosine_ndcg@10 | NanoBEIR_mean_cosine_ndcg@10 |
|:-------:|:----:|:-------------:|:---------------:|:--------------------------:|:---------------------:|:----------------------------:|
| 0       | 0    | -             | 3.6734          | 0.6259                     | 0.6583                | 0.6421                       |
| 1.4970  | 250  | 3.8677        | 3.3900          | 0.6334                     | 0.6510                | 0.6422                       |
| 2.9940  | 500  | 3.188         | 1.8654          | 0.5772                     | 0.6252                | 0.6012                       |
| 4.4910  | 750  | 1.4714        | 0.6890          | 0.4032                     | 0.5437                | 0.4735                       |
| 5.9880  | 1000 | 0.8535        | 0.5511          | 0.3617                     | 0.5197                | 0.4407                       |
| 7.4850  | 1250 | 0.7547        | 0.5268          | 0.3469                     | 0.5346                | 0.4407                       |
| 8.9820  | 1500 | 0.716         | 0.5123          | 0.3684                     | 0.5223                | 0.4454                       |
| 10.4790 | 1750 | 0.6939        | 0.5039          | 0.3846                     | 0.5179                | 0.4512                       |
| 11.9760 | 2000 | 0.6789        | 0.4986          | 0.4120                     | 0.5280                | 0.4700                       |
| 13.4731 | 2250 | 0.6681        | 0.4953          | 0.4148                     | 0.5189                | 0.4669                       |
| 14.9701 | 2500 | 0.662         | 0.4918          | 0.4224                     | 0.5109                | 0.4666                       |
| 16.4671 | 2750 | 0.6575        | 0.4905          | 0.4224                     | 0.5109                | 0.4666                       |
| 17.9641 | 3000 | 0.6555        | 0.4900          | 0.4281                     | 0.5109                | 0.4695                       |

### Framework Versions
- Python: 3.10.18
- Sentence Transformers: 5.2.0
- Transformers: 4.57.3
- PyTorch: 2.9.1+cu128
- Accelerate: 1.12.0
- Datasets: 2.21.0
- Tokenizers: 0.22.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config_sentence_transformers.json ADDED
{
  "model_type": "SentenceTransformer",
  "__version__": {
    "sentence_transformers": "5.2.0",
    "transformers": "4.57.3",
    "pytorch": "2.9.1+cu128"
  },
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
modules.json ADDED
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
sentence_bert_config.json ADDED
{
  "max_seq_length": 128,
  "do_lower_case": false
}