ashercn97 committed
Commit 311e0e3 · verified · 1 Parent(s): 9a1f826

Upload folder using huggingface_hub

Files changed (6)
  1. README.md +1944 -0
  2. config.json +43 -0
  3. model.safetensors +3 -0
  4. special_tokens_map.json +35 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +48 -0
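This upload adds the weights (`model.safetensors`) and tokenizer files for a Linq-Embed-Mistral repackage (see the README front matter below). Embedding models in this family encode queries with a task-instruction prefix while passages are encoded as-is. A minimal sketch of that prompt format, following the `Instruct:`/`Query:` convention of E5-Mistral-style models; the exact task string is illustrative, not taken from this commit:

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    # Queries get an instruction prefix; passages/documents are embedded without one.
    return f"Instruct: {task_description}\nQuery: {query}"

# Example task wording (illustrative): a retrieval-style instruction.
task = "Given a question, retrieve Wikipedia passages that answer the question"
prompt = get_detailed_instruct(task, "how are airplane wings designed?")
print(prompt)
```

The resulting string is what gets tokenized and embedded on the query side; documents skip the prefix entirely.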
README.md ADDED
@@ -0,0 +1,1944 @@
---
base_model:
- Linq-AI-Research/Linq-Embed-Mistral
tags:
- bnb-my-repo
- mteb
- transformers
- sentence-transformers
model-index:
- name: Linq-Embed-Mistral
  results:
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_counterfactual
      name: MTEB AmazonCounterfactualClassification (en)
      config: en
      split: test
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
    metrics:
    - type: accuracy
      value: 84.43283582089552
    - type: ap
      value: 50.39222584035829
    - type: f1
      value: 78.47906270064071
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_polarity
      name: MTEB AmazonPolarityClassification
      config: default
      split: test
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
    metrics:
    - type: accuracy
      value: 95.70445
    - type: ap
      value: 94.28273900595173
    - type: f1
      value: 95.70048412173735
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (en)
      config: en
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 57.644000000000005
    - type: f1
      value: 56.993648296704876
  - task:
      type: Retrieval
    dataset:
      type: mteb/arguana
      name: MTEB ArguAna
      config: default
      split: test
      revision: c22ab2a51041ffd869aaddef7af8d8215647e41a
    metrics:
    - type: map_at_1
      value: 45.804
    - type: map_at_10
      value: 61.742
    - type: map_at_100
      value: 62.07899999999999
    - type: map_at_1000
      value: 62.08
    - type: map_at_3
      value: 57.717
    - type: map_at_5
      value: 60.27
    - type: mrr_at_1
      value: 47.226
    - type: mrr_at_10
      value: 62.256
    - type: mrr_at_100
      value: 62.601
    - type: mrr_at_1000
      value: 62.601
    - type: mrr_at_3
      value: 58.203
    - type: mrr_at_5
      value: 60.767
    - type: ndcg_at_1
      value: 45.804
    - type: ndcg_at_10
      value: 69.649
    - type: ndcg_at_100
      value: 70.902
    - type: ndcg_at_1000
      value: 70.91199999999999
    - type: ndcg_at_3
      value: 61.497
    - type: ndcg_at_5
      value: 66.097
    - type: precision_at_1
      value: 45.804
    - type: precision_at_10
      value: 9.452
    - type: precision_at_100
      value: 0.996
    - type: precision_at_1000
      value: 0.1
    - type: precision_at_3
      value: 24.135
    - type: precision_at_5
      value: 16.714000000000002
    - type: recall_at_1
      value: 45.804
    - type: recall_at_10
      value: 94.523
    - type: recall_at_100
      value: 99.57300000000001
    - type: recall_at_1000
      value: 99.644
    - type: recall_at_3
      value: 72.404
    - type: recall_at_5
      value: 83.57
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-p2p
      name: MTEB ArxivClusteringP2P
      config: default
      split: test
      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
    metrics:
    - type: v_measure
      value: 51.47612678878609
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-s2s
      name: MTEB ArxivClusteringS2S
      config: default
      split: test
      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
    metrics:
    - type: v_measure
      value: 47.2977392340418
  - task:
      type: Reranking
    dataset:
      type: mteb/askubuntudupquestions-reranking
      name: MTEB AskUbuntuDupQuestions
      config: default
      split: test
      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
    metrics:
    - type: map
      value: 66.82016765243456
    - type: mrr
      value: 79.55227982236292
  - task:
      type: STS
    dataset:
      type: mteb/biosses-sts
      name: MTEB BIOSSES
      config: default
      split: test
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
    metrics:
    - type: cos_sim_pearson
      value: 89.15068664186332
    - type: cos_sim_spearman
      value: 86.4013663041054
    - type: euclidean_pearson
      value: 87.36391302921588
    - type: euclidean_spearman
      value: 86.4013663041054
    - type: manhattan_pearson
      value: 87.46116676558589
    - type: manhattan_spearman
      value: 86.78149544753352
  - task:
      type: Classification
    dataset:
      type: mteb/banking77
      name: MTEB Banking77Classification
      config: default
      split: test
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
    metrics:
    - type: accuracy
      value: 87.88311688311688
    - type: f1
      value: 87.82368154811464
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-p2p
      name: MTEB BiorxivClusteringP2P
      config: default
      split: test
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
    metrics:
    - type: v_measure
      value: 42.72860396750569
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-s2s
      name: MTEB BiorxivClusteringS2S
      config: default
      split: test
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
    metrics:
    - type: v_measure
      value: 39.58412067938718
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack
      name: MTEB CQADupstackRetrieval
      config: default
      split: test
      revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4
    metrics:
    - type: map_at_1
      value: 30.082666666666665
    - type: map_at_10
      value: 41.13875
    - type: map_at_100
      value: 42.45525
    - type: map_at_1000
      value: 42.561249999999994
    - type: map_at_3
      value: 37.822750000000006
    - type: map_at_5
      value: 39.62658333333333
    - type: mrr_at_1
      value: 35.584
    - type: mrr_at_10
      value: 45.4675
    - type: mrr_at_100
      value: 46.31016666666667
    - type: mrr_at_1000
      value: 46.35191666666666
    - type: mrr_at_3
      value: 42.86674999999999
    - type: mrr_at_5
      value: 44.31341666666666
    - type: ndcg_at_1
      value: 35.584
    - type: ndcg_at_10
      value: 47.26516666666667
    - type: ndcg_at_100
      value: 52.49108333333332
    - type: ndcg_at_1000
      value: 54.24575
    - type: ndcg_at_3
      value: 41.83433333333334
    - type: ndcg_at_5
      value: 44.29899999999999
    - type: precision_at_1
      value: 35.584
    - type: precision_at_10
      value: 8.390333333333334
    - type: precision_at_100
      value: 1.2941666666666667
    - type: precision_at_1000
      value: 0.16308333333333336
    - type: precision_at_3
      value: 19.414583333333333
    - type: precision_at_5
      value: 13.751
    - type: recall_at_1
      value: 30.082666666666665
    - type: recall_at_10
      value: 60.88875
    - type: recall_at_100
      value: 83.35141666666667
    - type: recall_at_1000
      value: 95.0805
    - type: recall_at_3
      value: 45.683749999999996
    - type: recall_at_5
      value: 52.08208333333333
  - task:
      type: Retrieval
    dataset:
      type: mteb/climate-fever
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380
    metrics:
    - type: map_at_1
      value: 16.747
    - type: map_at_10
      value: 29.168
    - type: map_at_100
      value: 31.304
    - type: map_at_1000
      value: 31.496000000000002
    - type: map_at_3
      value: 24.57
    - type: map_at_5
      value: 26.886
    - type: mrr_at_1
      value: 37.524
    - type: mrr_at_10
      value: 50.588
    - type: mrr_at_100
      value: 51.28
    - type: mrr_at_1000
      value: 51.29899999999999
    - type: mrr_at_3
      value: 47.438
    - type: mrr_at_5
      value: 49.434
    - type: ndcg_at_1
      value: 37.524
    - type: ndcg_at_10
      value: 39.11
    - type: ndcg_at_100
      value: 46.373999999999995
    - type: ndcg_at_1000
      value: 49.370999999999995
    - type: ndcg_at_3
      value: 32.964
    - type: ndcg_at_5
      value: 35.028
    - type: precision_at_1
      value: 37.524
    - type: precision_at_10
      value: 12.137
    - type: precision_at_100
      value: 1.9929999999999999
    - type: precision_at_1000
      value: 0.256
    - type: precision_at_3
      value: 24.886
    - type: precision_at_5
      value: 18.762
    - type: recall_at_1
      value: 16.747
    - type: recall_at_10
      value: 45.486
    - type: recall_at_100
      value: 69.705
    - type: recall_at_1000
      value: 86.119
    - type: recall_at_3
      value: 30.070999999999998
    - type: recall_at_5
      value: 36.565
  - task:
      type: Retrieval
    dataset:
      type: mteb/dbpedia
      name: MTEB DBPedia
      config: default
      split: test
      revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659
    metrics:
    - type: map_at_1
      value: 10.495000000000001
    - type: map_at_10
      value: 24.005000000000003
    - type: map_at_100
      value: 34.37
    - type: map_at_1000
      value: 36.268
    - type: map_at_3
      value: 16.694
    - type: map_at_5
      value: 19.845
    - type: mrr_at_1
      value: 75.5
    - type: mrr_at_10
      value: 82.458
    - type: mrr_at_100
      value: 82.638
    - type: mrr_at_1000
      value: 82.64
    - type: mrr_at_3
      value: 81.25
    - type: mrr_at_5
      value: 82.125
    - type: ndcg_at_1
      value: 64.625
    - type: ndcg_at_10
      value: 51.322
    - type: ndcg_at_100
      value: 55.413999999999994
    - type: ndcg_at_1000
      value: 62.169
    - type: ndcg_at_3
      value: 56.818999999999996
    - type: ndcg_at_5
      value: 54.32900000000001
    - type: precision_at_1
      value: 75.5
    - type: precision_at_10
      value: 40.849999999999994
    - type: precision_at_100
      value: 12.882
    - type: precision_at_1000
      value: 2.394
    - type: precision_at_3
      value: 59.667
    - type: precision_at_5
      value: 52.2
    - type: recall_at_1
      value: 10.495000000000001
    - type: recall_at_10
      value: 29.226000000000003
    - type: recall_at_100
      value: 59.614
    - type: recall_at_1000
      value: 81.862
    - type: recall_at_3
      value: 17.97
    - type: recall_at_5
      value: 22.438
  - task:
      type: Classification
    dataset:
      type: mteb/emotion
      name: MTEB EmotionClassification
      config: default
      split: test
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
    metrics:
    - type: accuracy
      value: 51.82
    - type: f1
      value: 47.794956731921054
  - task:
      type: Retrieval
    dataset:
      type: mteb/fever
      name: MTEB FEVER
      config: default
      split: test
      revision: bea83ef9e8fb933d90a2f1d5515737465d613e12
    metrics:
    - type: map_at_1
      value: 82.52199999999999
    - type: map_at_10
      value: 89.794
    - type: map_at_100
      value: 89.962
    - type: map_at_1000
      value: 89.972
    - type: map_at_3
      value: 88.95100000000001
    - type: map_at_5
      value: 89.524
    - type: mrr_at_1
      value: 88.809
    - type: mrr_at_10
      value: 93.554
    - type: mrr_at_100
      value: 93.577
    - type: mrr_at_1000
      value: 93.577
    - type: mrr_at_3
      value: 93.324
    - type: mrr_at_5
      value: 93.516
    - type: ndcg_at_1
      value: 88.809
    - type: ndcg_at_10
      value: 92.419
    - type: ndcg_at_100
      value: 92.95
    - type: ndcg_at_1000
      value: 93.10000000000001
    - type: ndcg_at_3
      value: 91.45299999999999
    - type: ndcg_at_5
      value: 92.05
    - type: precision_at_1
      value: 88.809
    - type: precision_at_10
      value: 10.911999999999999
    - type: precision_at_100
      value: 1.143
    - type: precision_at_1000
      value: 0.117
    - type: precision_at_3
      value: 34.623
    - type: precision_at_5
      value: 21.343999999999998
    - type: recall_at_1
      value: 82.52199999999999
    - type: recall_at_10
      value: 96.59400000000001
    - type: recall_at_100
      value: 98.55699999999999
    - type: recall_at_1000
      value: 99.413
    - type: recall_at_3
      value: 94.02199999999999
    - type: recall_at_5
      value: 95.582
  - task:
      type: Retrieval
    dataset:
      type: mteb/fiqa
      name: MTEB FiQA2018
      config: default
      split: test
      revision: 27a168819829fe9bcd655c2df245fb19452e8e06
    metrics:
    - type: map_at_1
      value: 32.842
    - type: map_at_10
      value: 53.147
    - type: map_at_100
      value: 55.265
    - type: map_at_1000
      value: 55.37
    - type: map_at_3
      value: 46.495
    - type: map_at_5
      value: 50.214999999999996
    - type: mrr_at_1
      value: 61.574
    - type: mrr_at_10
      value: 68.426
    - type: mrr_at_100
      value: 68.935
    - type: mrr_at_1000
      value: 68.95400000000001
    - type: mrr_at_3
      value: 66.307
    - type: mrr_at_5
      value: 67.611
    - type: ndcg_at_1
      value: 61.574
    - type: ndcg_at_10
      value: 61.205
    - type: ndcg_at_100
      value: 67.25999999999999
    - type: ndcg_at_1000
      value: 68.657
    - type: ndcg_at_3
      value: 56.717
    - type: ndcg_at_5
      value: 58.196999999999996
    - type: precision_at_1
      value: 61.574
    - type: precision_at_10
      value: 16.852
    - type: precision_at_100
      value: 2.33
    - type: precision_at_1000
      value: 0.256
    - type: precision_at_3
      value: 37.5
    - type: precision_at_5
      value: 27.468999999999998
    - type: recall_at_1
      value: 32.842
    - type: recall_at_10
      value: 68.157
    - type: recall_at_100
      value: 89.5
    - type: recall_at_1000
      value: 97.68599999999999
    - type: recall_at_3
      value: 50.783
    - type: recall_at_5
      value: 58.672000000000004
  - task:
      type: Retrieval
    dataset:
      type: mteb/hotpotqa
      name: MTEB HotpotQA
      config: default
      split: test
      revision: ab518f4d6fcca38d87c25209f94beba119d02014
    metrics:
    - type: map_at_1
      value: 39.068000000000005
    - type: map_at_10
      value: 69.253
    - type: map_at_100
      value: 70.036
    - type: map_at_1000
      value: 70.081
    - type: map_at_3
      value: 65.621
    - type: map_at_5
      value: 67.976
    - type: mrr_at_1
      value: 78.13600000000001
    - type: mrr_at_10
      value: 84.328
    - type: mrr_at_100
      value: 84.515
    - type: mrr_at_1000
      value: 84.52300000000001
    - type: mrr_at_3
      value: 83.52199999999999
    - type: mrr_at_5
      value: 84.019
    - type: ndcg_at_1
      value: 78.13600000000001
    - type: ndcg_at_10
      value: 76.236
    - type: ndcg_at_100
      value: 78.891
    - type: ndcg_at_1000
      value: 79.73400000000001
    - type: ndcg_at_3
      value: 71.258
    - type: ndcg_at_5
      value: 74.129
    - type: precision_at_1
      value: 78.13600000000001
    - type: precision_at_10
      value: 16.347
    - type: precision_at_100
      value: 1.839
    - type: precision_at_1000
      value: 0.19499999999999998
    - type: precision_at_3
      value: 47.189
    - type: precision_at_5
      value: 30.581999999999997
    - type: recall_at_1
      value: 39.068000000000005
    - type: recall_at_10
      value: 81.735
    - type: recall_at_100
      value: 91.945
    - type: recall_at_1000
      value: 97.44800000000001
    - type: recall_at_3
      value: 70.783
    - type: recall_at_5
      value: 76.455
  - task:
      type: Classification
    dataset:
      type: mteb/imdb
      name: MTEB ImdbClassification
      config: default
      split: test
      revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
    metrics:
    - type: accuracy
      value: 94.7764
    - type: ap
      value: 92.67841294818406
    - type: f1
      value: 94.77375157383646
  - task:
      type: Retrieval
    dataset:
      type: mteb/msmarco
      name: MTEB MSMARCO
      config: default
      split: dev
      revision: c5a29a104738b98a9e76336939199e264163d4a0
    metrics:
    - type: map_at_1
      value: 24.624
    - type: map_at_10
      value: 37.861
    - type: map_at_100
      value: 39.011
    - type: map_at_1000
      value: 39.052
    - type: map_at_3
      value: 33.76
    - type: map_at_5
      value: 36.153
    - type: mrr_at_1
      value: 25.358000000000004
    - type: mrr_at_10
      value: 38.5
    - type: mrr_at_100
      value: 39.572
    - type: mrr_at_1000
      value: 39.607
    - type: mrr_at_3
      value: 34.491
    - type: mrr_at_5
      value: 36.83
    - type: ndcg_at_1
      value: 25.358000000000004
    - type: ndcg_at_10
      value: 45.214999999999996
    - type: ndcg_at_100
      value: 50.56
    - type: ndcg_at_1000
      value: 51.507999999999996
    - type: ndcg_at_3
      value: 36.925999999999995
    - type: ndcg_at_5
      value: 41.182
    - type: precision_at_1
      value: 25.358000000000004
    - type: precision_at_10
      value: 7.090000000000001
    - type: precision_at_100
      value: 0.9740000000000001
    - type: precision_at_1000
      value: 0.106
    - type: precision_at_3
      value: 15.697
    - type: precision_at_5
      value: 11.599
    - type: recall_at_1
      value: 24.624
    - type: recall_at_10
      value: 67.78699999999999
    - type: recall_at_100
      value: 92.11200000000001
    - type: recall_at_1000
      value: 99.208
    - type: recall_at_3
      value: 45.362
    - type: recall_at_5
      value: 55.58
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (en)
      config: en
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 96.83310533515733
    - type: f1
      value: 96.57069781347995
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (en)
      config: en
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 89.5690834473324
    - type: f1
      value: 73.7275204564728
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (en)
      config: en
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 82.67316745124411
    - type: f1
      value: 79.70626515721662
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (en)
      config: en
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 85.01344989912575
    - type: f1
      value: 84.45181022816965
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-p2p
      name: MTEB MedrxivClusteringP2P
      config: default
      split: test
      revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
    metrics:
    - type: v_measure
      value: 37.843426126777295
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-s2s
      name: MTEB MedrxivClusteringS2S
      config: default
      split: test
      revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
    metrics:
    - type: v_measure
      value: 36.651728547241476
  - task:
      type: Reranking
    dataset:
      type: mteb/mind_small
      name: MTEB MindSmallReranking
      config: default
      split: test
      revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
    metrics:
    - type: map
      value: 32.05750522793288
    - type: mrr
      value: 33.28067556869468
  - task:
      type: Retrieval
    dataset:
      type: mteb/nfcorpus
      name: MTEB NFCorpus
      config: default
      split: test
      revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
    metrics:
    - type: map_at_1
      value: 6.744
    - type: map_at_10
      value: 16.235
    - type: map_at_100
      value: 20.767
    - type: map_at_1000
      value: 22.469
    - type: map_at_3
      value: 11.708
    - type: map_at_5
      value: 13.924
    - type: mrr_at_1
      value: 55.728
    - type: mrr_at_10
      value: 63.869
    - type: mrr_at_100
      value: 64.322
    - type: mrr_at_1000
      value: 64.342
    - type: mrr_at_3
      value: 62.022999999999996
    - type: mrr_at_5
      value: 63.105999999999995
    - type: ndcg_at_1
      value: 53.096
    - type: ndcg_at_10
      value: 41.618
    - type: ndcg_at_100
      value: 38.562999999999995
    - type: ndcg_at_1000
      value: 47.006
    - type: ndcg_at_3
      value: 47.657
    - type: ndcg_at_5
      value: 45.562999999999995
    - type: precision_at_1
      value: 55.108000000000004
    - type: precision_at_10
      value: 30.464000000000002
    - type: precision_at_100
      value: 9.737
    - type: precision_at_1000
      value: 2.2720000000000002
    - type: precision_at_3
      value: 44.376
    - type: precision_at_5
      value: 39.505
    - type: recall_at_1
      value: 6.744
    - type: recall_at_10
      value: 21.11
    - type: recall_at_100
      value: 39.69
    - type: recall_at_1000
      value: 70.44
    - type: recall_at_3
      value: 13.120000000000001
    - type: recall_at_5
      value: 16.669
  - task:
      type: Retrieval
    dataset:
      type: mteb/nq
      name: MTEB NQ
      config: default
      split: test
      revision: b774495ed302d8c44a3a7ea25c90dbce03968f31
    metrics:
    - type: map_at_1
      value: 46.263
    - type: map_at_10
      value: 63.525
    - type: map_at_100
      value: 64.142
    - type: map_at_1000
      value: 64.14800000000001
    - type: map_at_3
      value: 59.653
    - type: map_at_5
      value: 62.244
    - type: mrr_at_1
      value: 51.796
    - type: mrr_at_10
      value: 65.764
    - type: mrr_at_100
      value: 66.155
    - type: mrr_at_1000
      value: 66.158
    - type: mrr_at_3
      value: 63.05500000000001
    - type: mrr_at_5
      value: 64.924
    - type: ndcg_at_1
      value: 51.766999999999996
    - type: ndcg_at_10
      value: 70.626
    - type: ndcg_at_100
      value: 72.905
    - type: ndcg_at_1000
      value: 73.021
    - type: ndcg_at_3
      value: 63.937999999999995
    - type: ndcg_at_5
      value: 68.00699999999999
    - type: precision_at_1
      value: 51.766999999999996
    - type: precision_at_10
      value: 10.768
    - type: precision_at_100
      value: 1.203
    - type: precision_at_1000
      value: 0.121
    - type: precision_at_3
      value: 28.409000000000002
    - type: precision_at_5
      value: 19.502
    - type: recall_at_1
      value: 46.263
    - type: recall_at_10
      value: 89.554
    - type: recall_at_100
      value: 98.914
    - type: recall_at_1000
      value: 99.754
    - type: recall_at_3
      value: 72.89999999999999
    - type: recall_at_5
      value: 82.1
  - task:
      type: Retrieval
    dataset:
      type: mteb/quora
      name: MTEB QuoraRetrieval
      config: default
      split: test
      revision: e4e08e0b7dbe3c8700f0daef558ff32256715259
    metrics:
    - type: map_at_1
      value: 72.748
    - type: map_at_10
      value: 86.87700000000001
    - type: map_at_100
      value: 87.46199999999999
    - type: map_at_1000
      value: 87.47399999999999
    - type: map_at_3
      value: 83.95700000000001
    - type: map_at_5
      value: 85.82300000000001
    - type: mrr_at_1
      value: 83.62
    - type: mrr_at_10
      value: 89.415
    - type: mrr_at_100
      value: 89.484
    - type: mrr_at_1000
      value: 89.484
    - type: mrr_at_3
      value: 88.633
    - type: mrr_at_5
      value: 89.176
    - type: ndcg_at_1
      value: 83.62
    - type: ndcg_at_10
      value: 90.27
    - type: ndcg_at_100
      value: 91.23599999999999
    - type: ndcg_at_1000
      value: 91.293
    - type: ndcg_at_3
      value: 87.69500000000001
    - type: ndcg_at_5
      value: 89.171
    - type: precision_at_1
      value: 83.62
    - type: precision_at_10
      value: 13.683
    - type: precision_at_100
      value: 1.542
    - type: precision_at_1000
      value: 0.157
    - type: precision_at_3
      value: 38.363
    - type: precision_at_5
      value: 25.196
    - type: recall_at_1
      value: 72.748
    - type: recall_at_10
      value: 96.61699999999999
    - type: recall_at_100
      value: 99.789
    - type: recall_at_1000
      value: 99.997
    - type: recall_at_3
      value: 89.21
    - type: recall_at_5
      value: 93.418
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering
      name: MTEB RedditClustering
      config: default
      split: test
      revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
    metrics:
    - type: v_measure
      value: 61.51909029379199
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering-p2p
      name: MTEB RedditClusteringP2P
      config: default
      split: test
      revision: 385e3cb46b4cfa89021f56c4380204149d0efe33
    metrics:
    - type: v_measure
      value: 68.24483162045645
  - task:
      type: Retrieval
    dataset:
      type: mteb/scidocs
      name: MTEB SCIDOCS
      config: default
      split: test
      revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88
    metrics:
    - type: map_at_1
      value: 4.793
    - type: map_at_10
      value: 13.092
    - type: map_at_100
      value: 15.434000000000001
    - type: map_at_1000
      value: 15.748999999999999
    - type: map_at_3
      value: 9.139
    - type: map_at_5
      value: 11.033
    - type: mrr_at_1
      value: 23.599999999999998
    - type: mrr_at_10
      value: 35.892
    - type: mrr_at_100
      value: 36.962
    - type: mrr_at_1000
      value: 37.009
    - type: mrr_at_3
      value: 32.550000000000004
    - type: mrr_at_5
      value: 34.415
    - type: ndcg_at_1
      value: 23.599999999999998
    - type: ndcg_at_10
      value: 21.932
    - type: ndcg_at_100
      value: 30.433
    - type: ndcg_at_1000
      value: 35.668
    - type: ndcg_at_3
      value: 20.483999999999998
    - type: ndcg_at_5
      value: 17.964
    - type: precision_at_1
      value: 23.599999999999998
    - type: precision_at_10
      value: 11.63
    - type: precision_at_100
      value: 2.383
    - type: precision_at_1000
      value: 0.363
    - type: precision_at_3
      value: 19.567
    - type: precision_at_5
      value: 16.06
    - type: recall_at_1
      value: 4.793
    - type: recall_at_10
      value: 23.558
    - type: recall_at_100
      value: 48.376999999999995
    - type: recall_at_1000
      value: 73.75699999999999
    - type: recall_at_3
      value: 11.903
    - type: recall_at_5
      value: 16.278000000000002
  - task:
      type: STS
    dataset:
      type: mteb/sickr-sts
      name: MTEB SICK-R
      config: default
      split: test
      revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
    metrics:
    - type: cos_sim_pearson
      value: 87.31937967632581
    - type: cos_sim_spearman
      value: 84.30523596401186
    - type: euclidean_pearson
      value: 84.19537987069458
    - type: euclidean_spearman
      value: 84.30522052876
    - type: manhattan_pearson
      value: 84.16420807244911
    - type: manhattan_spearman
      value: 84.28515410219309
  - task:
      type: STS
    dataset:
      type: mteb/sts12-sts
      name: MTEB STS12
      config: default
      split: test
      revision: a0d554a64d88156834ff5ae9920b964011b16384
    metrics:
    - type: cos_sim_pearson
      value: 86.17180810119646
    - type: cos_sim_spearman
      value: 78.44413657529002
    - type: euclidean_pearson
      value: 81.69054139101816
    - type: euclidean_spearman
      value: 78.44412412142488
    - type: manhattan_pearson
      value: 82.04975789626462
    - type: manhattan_spearman
      value: 78.78390856857253
  - task:
      type: STS
    dataset:
      type: mteb/sts13-sts
      name: MTEB STS13
      config: default
      split: test
      revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
    metrics:
    - type: cos_sim_pearson
      value: 88.35737871089687
    - type: cos_sim_spearman
      value: 88.26850223126127
    - type: euclidean_pearson
      value: 87.44100858335746
    - type: euclidean_spearman
      value: 88.26850223126127
    - type: manhattan_pearson
      value: 87.61572015772133
    - type: manhattan_spearman
      value: 88.56229552813319
  - task:
      type: STS
    dataset:
      type: mteb/sts14-sts
      name: MTEB STS14
      config: default
      split: test
      revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
1182
+ metrics:
1183
+ - type: cos_sim_pearson
1184
+ value: 86.8395966764906
1185
+ - type: cos_sim_spearman
1186
+ value: 84.49441798385489
1187
+ - type: euclidean_pearson
1188
+ value: 85.3259176121388
1189
+ - type: euclidean_spearman
1190
+ value: 84.49442124804686
1191
+ - type: manhattan_pearson
1192
+ value: 85.35153862806513
1193
+ - type: manhattan_spearman
1194
+ value: 84.60094577432503
1195
+ - task:
1196
+ type: STS
1197
+ dataset:
1198
+ type: mteb/sts15-sts
1199
+ name: MTEB STS15
1200
+ config: default
1201
+ split: test
1202
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
1203
+ metrics:
1204
+ - type: cos_sim_pearson
1205
+ value: 90.14048269057345
1206
+ - type: cos_sim_spearman
1207
+ value: 90.27866978947013
1208
+ - type: euclidean_pearson
1209
+ value: 89.35308361940393
1210
+ - type: euclidean_spearman
1211
+ value: 90.27866978947013
1212
+ - type: manhattan_pearson
1213
+ value: 89.37601244066997
1214
+ - type: manhattan_spearman
1215
+ value: 90.42707449698062
1216
+ - task:
1217
+ type: STS
1218
+ dataset:
1219
+ type: mteb/sts16-sts
1220
+ name: MTEB STS16
1221
+ config: default
1222
+ split: test
1223
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
1224
+ metrics:
1225
+ - type: cos_sim_pearson
1226
+ value: 86.8522678865688
1227
+ - type: cos_sim_spearman
1228
+ value: 87.37396401580446
1229
+ - type: euclidean_pearson
1230
+ value: 86.37219665505377
1231
+ - type: euclidean_spearman
1232
+ value: 87.37396385867791
1233
+ - type: manhattan_pearson
1234
+ value: 86.44628823799896
1235
+ - type: manhattan_spearman
1236
+ value: 87.49116026788859
1237
+ - task:
1238
+ type: STS
1239
+ dataset:
1240
+ type: mteb/sts17-crosslingual-sts
1241
+ name: MTEB STS17 (en-en)
1242
+ config: en-en
1243
+ split: test
1244
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
1245
+ metrics:
1246
+ - type: cos_sim_pearson
1247
+ value: 92.94248481968916
1248
+ - type: cos_sim_spearman
1249
+ value: 92.68185242943188
1250
+ - type: euclidean_pearson
1251
+ value: 92.33802342092979
1252
+ - type: euclidean_spearman
1253
+ value: 92.68185242943188
1254
+ - type: manhattan_pearson
1255
+ value: 92.2011323340474
1256
+ - type: manhattan_spearman
1257
+ value: 92.43364757640346
1258
+ - task:
1259
+ type: STS
1260
+ dataset:
1261
+ type: mteb/sts22-crosslingual-sts
1262
+ name: MTEB STS22 (en)
1263
+ config: en
1264
+ split: test
1265
+ revision: eea2b4fe26a775864c896887d910b76a8098ad3f
1266
+ metrics:
1267
+ - type: cos_sim_pearson
1268
+ value: 70.2918782293091
1269
+ - type: cos_sim_spearman
1270
+ value: 68.61986257003369
1271
+ - type: euclidean_pearson
1272
+ value: 70.51920905899138
1273
+ - type: euclidean_spearman
1274
+ value: 68.61986257003369
1275
+ - type: manhattan_pearson
1276
+ value: 70.64673843811433
1277
+ - type: manhattan_spearman
1278
+ value: 68.86711466517345
1279
+ - task:
1280
+ type: STS
1281
+ dataset:
1282
+ type: mteb/stsbenchmark-sts
1283
+ name: MTEB STSBenchmark
1284
+ config: default
1285
+ split: test
1286
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
1287
+ metrics:
1288
+ - type: cos_sim_pearson
1289
+ value: 88.62956838105524
1290
+ - type: cos_sim_spearman
1291
+ value: 88.80650007123052
1292
+ - type: euclidean_pearson
1293
+ value: 88.37976252122822
1294
+ - type: euclidean_spearman
1295
+ value: 88.80650007123052
1296
+ - type: manhattan_pearson
1297
+ value: 88.49866938476616
1298
+ - type: manhattan_spearman
1299
+ value: 89.02489665452616
1300
+ - task:
1301
+ type: Reranking
1302
+ dataset:
1303
+ type: mteb/scidocs-reranking
1304
+ name: MTEB SciDocsRR
1305
+ config: default
1306
+ split: test
1307
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
1308
+ metrics:
1309
+ - type: map
1310
+ value: 86.40175229911527
1311
+ - type: mrr
1312
+ value: 96.61958230585682
1313
+ - task:
1314
+ type: Retrieval
1315
+ dataset:
1316
+ type: mteb/scifact
1317
+ name: MTEB SciFact
1318
+ config: default
1319
+ split: test
1320
+ revision: 0228b52cf27578f30900b9e5271d331663a030d7
1321
+ metrics:
1322
+ - type: map_at_1
1323
+ value: 63.05
1324
+ - type: map_at_10
1325
+ value: 73.844
1326
+ - type: map_at_100
1327
+ value: 74.313
1328
+ - type: map_at_1000
1329
+ value: 74.321
1330
+ - type: map_at_3
1331
+ value: 71.17999999999999
1332
+ - type: map_at_5
1333
+ value: 72.842
1334
+ - type: mrr_at_1
1335
+ value: 65.667
1336
+ - type: mrr_at_10
1337
+ value: 74.772
1338
+ - type: mrr_at_100
1339
+ value: 75.087
1340
+ - type: mrr_at_1000
1341
+ value: 75.095
1342
+ - type: mrr_at_3
1343
+ value: 72.944
1344
+ - type: mrr_at_5
1345
+ value: 74.078
1346
+ - type: ndcg_at_1
1347
+ value: 65.667
1348
+ - type: ndcg_at_10
1349
+ value: 78.31700000000001
1350
+ - type: ndcg_at_100
1351
+ value: 79.969
1352
+ - type: ndcg_at_1000
1353
+ value: 80.25
1354
+ - type: ndcg_at_3
1355
+ value: 74.099
1356
+ - type: ndcg_at_5
1357
+ value: 76.338
1358
+ - type: precision_at_1
1359
+ value: 65.667
1360
+ - type: precision_at_10
1361
+ value: 10.233
1362
+ - type: precision_at_100
1363
+ value: 1.107
1364
+ - type: precision_at_1000
1365
+ value: 0.11299999999999999
1366
+ - type: precision_at_3
1367
+ value: 28.889
1368
+ - type: precision_at_5
1369
+ value: 19.0
1370
+ - type: recall_at_1
1371
+ value: 63.05
1372
+ - type: recall_at_10
1373
+ value: 90.822
1374
+ - type: recall_at_100
1375
+ value: 97.667
1376
+ - type: recall_at_1000
1377
+ value: 100.0
1378
+ - type: recall_at_3
1379
+ value: 79.489
1380
+ - type: recall_at_5
1381
+ value: 85.161
1382
+ - task:
1383
+ type: PairClassification
1384
+ dataset:
1385
+ type: mteb/sprintduplicatequestions-pairclassification
1386
+ name: MTEB SprintDuplicateQuestions
1387
+ config: default
1388
+ split: test
1389
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
1390
+ metrics:
1391
+ - type: cos_sim_accuracy
1392
+ value: 99.83564356435643
1393
+ - type: cos_sim_ap
1394
+ value: 96.10619363017767
1395
+ - type: cos_sim_f1
1396
+ value: 91.61225514816677
1397
+ - type: cos_sim_precision
1398
+ value: 92.02825428859738
1399
+ - type: cos_sim_recall
1400
+ value: 91.2
1401
+ - type: dot_accuracy
1402
+ value: 99.83564356435643
1403
+ - type: dot_ap
1404
+ value: 96.10619363017767
1405
+ - type: dot_f1
1406
+ value: 91.61225514816677
1407
+ - type: dot_precision
1408
+ value: 92.02825428859738
1409
+ - type: dot_recall
1410
+ value: 91.2
1411
+ - type: euclidean_accuracy
1412
+ value: 99.83564356435643
1413
+ - type: euclidean_ap
1414
+ value: 96.10619363017769
1415
+ - type: euclidean_f1
1416
+ value: 91.61225514816677
1417
+ - type: euclidean_precision
1418
+ value: 92.02825428859738
1419
+ - type: euclidean_recall
1420
+ value: 91.2
1421
+ - type: manhattan_accuracy
1422
+ value: 99.84158415841584
1423
+ - type: manhattan_ap
1424
+ value: 96.27527798658713
1425
+ - type: manhattan_f1
1426
+ value: 92.0
1427
+ - type: manhattan_precision
1428
+ value: 92.0
1429
+ - type: manhattan_recall
1430
+ value: 92.0
1431
+ - type: max_accuracy
1432
+ value: 99.84158415841584
1433
+ - type: max_ap
1434
+ value: 96.27527798658713
1435
+ - type: max_f1
1436
+ value: 92.0
1437
+ - task:
1438
+ type: Clustering
1439
+ dataset:
1440
+ type: mteb/stackexchange-clustering
1441
+ name: MTEB StackExchangeClustering
1442
+ config: default
1443
+ split: test
1444
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
1445
+ metrics:
1446
+ - type: v_measure
1447
+ value: 76.93753872885304
1448
+ - task:
1449
+ type: Clustering
1450
+ dataset:
1451
+ type: mteb/stackexchange-clustering-p2p
1452
+ name: MTEB StackExchangeClusteringP2P
1453
+ config: default
1454
+ split: test
1455
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
1456
+ metrics:
1457
+ - type: v_measure
1458
+ value: 46.044085080870126
1459
+ - task:
1460
+ type: Reranking
1461
+ dataset:
1462
+ type: mteb/stackoverflowdupquestions-reranking
1463
+ name: MTEB StackOverflowDupQuestions
1464
+ config: default
1465
+ split: test
1466
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
1467
+ metrics:
1468
+ - type: map
1469
+ value: 55.885129730227256
1470
+ - type: mrr
1471
+ value: 56.95062494694848
1472
+ - task:
1473
+ type: Summarization
1474
+ dataset:
1475
+ type: mteb/summeval
1476
+ name: MTEB SummEval
1477
+ config: default
1478
+ split: test
1479
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
1480
+ metrics:
1481
+ - type: cos_sim_pearson
1482
+ value: 31.202047940935508
1483
+ - type: cos_sim_spearman
1484
+ value: 30.984832035722228
1485
+ - type: dot_pearson
1486
+ value: 31.20204247226978
1487
+ - type: dot_spearman
1488
+ value: 30.984832035722228
1489
+ - task:
1490
+ type: Retrieval
1491
+ dataset:
1492
+ type: mteb/trec-covid
1493
+ name: MTEB TRECCOVID
1494
+ config: default
1495
+ split: test
1496
+ revision: bb9466bac8153a0349341eb1b22e06409e78ef4e
1497
+ metrics:
1498
+ - type: map_at_1
1499
+ value: 0.245
1500
+ - type: map_at_10
1501
+ value: 2.249
1502
+ - type: map_at_100
1503
+ value: 14.85
1504
+ - type: map_at_1000
1505
+ value: 36.596000000000004
1506
+ - type: map_at_3
1507
+ value: 0.717
1508
+ - type: map_at_5
1509
+ value: 1.18
1510
+ - type: mrr_at_1
1511
+ value: 94.0
1512
+ - type: mrr_at_10
1513
+ value: 96.167
1514
+ - type: mrr_at_100
1515
+ value: 96.167
1516
+ - type: mrr_at_1000
1517
+ value: 96.167
1518
+ - type: mrr_at_3
1519
+ value: 95.667
1520
+ - type: mrr_at_5
1521
+ value: 96.167
1522
+ - type: ndcg_at_1
1523
+ value: 91.0
1524
+ - type: ndcg_at_10
1525
+ value: 87.09700000000001
1526
+ - type: ndcg_at_100
1527
+ value: 69.637
1528
+ - type: ndcg_at_1000
1529
+ value: 62.257
1530
+ - type: ndcg_at_3
1531
+ value: 90.235
1532
+ - type: ndcg_at_5
1533
+ value: 89.51400000000001
1534
+ - type: precision_at_1
1535
+ value: 94.0
1536
+ - type: precision_at_10
1537
+ value: 90.60000000000001
1538
+ - type: precision_at_100
1539
+ value: 71.38
1540
+ - type: precision_at_1000
1541
+ value: 27.400000000000002
1542
+ - type: precision_at_3
1543
+ value: 94.0
1544
+ - type: precision_at_5
1545
+ value: 93.2
1546
+ - type: recall_at_1
1547
+ value: 0.245
1548
+ - type: recall_at_10
1549
+ value: 2.366
1550
+ - type: recall_at_100
1551
+ value: 17.491
1552
+ - type: recall_at_1000
1553
+ value: 58.772999999999996
1554
+ - type: recall_at_3
1555
+ value: 0.7270000000000001
1556
+ - type: recall_at_5
1557
+ value: 1.221
1558
+ - task:
1559
+ type: Retrieval
1560
+ dataset:
1561
+ type: mteb/touche2020
1562
+ name: MTEB Touche2020
1563
+ config: default
1564
+ split: test
1565
+ revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f
1566
+ metrics:
1567
+ - type: map_at_1
1568
+ value: 3.435
1569
+ - type: map_at_10
1570
+ value: 12.147
1571
+ - type: map_at_100
1572
+ value: 18.724
1573
+ - type: map_at_1000
1574
+ value: 20.426
1575
+ - type: map_at_3
1576
+ value: 6.526999999999999
1577
+ - type: map_at_5
1578
+ value: 9.198
1579
+ - type: mrr_at_1
1580
+ value: 48.980000000000004
1581
+ - type: mrr_at_10
1582
+ value: 62.970000000000006
1583
+ - type: mrr_at_100
1584
+ value: 63.288999999999994
1585
+ - type: mrr_at_1000
1586
+ value: 63.288999999999994
1587
+ - type: mrr_at_3
1588
+ value: 59.184000000000005
1589
+ - type: mrr_at_5
1590
+ value: 61.224000000000004
1591
+ - type: ndcg_at_1
1592
+ value: 46.939
1593
+ - type: ndcg_at_10
1594
+ value: 30.61
1595
+ - type: ndcg_at_100
1596
+ value: 41.683
1597
+ - type: ndcg_at_1000
1598
+ value: 53.144000000000005
1599
+ - type: ndcg_at_3
1600
+ value: 36.284
1601
+ - type: ndcg_at_5
1602
+ value: 34.345
1603
+ - type: precision_at_1
1604
+ value: 48.980000000000004
1605
+ - type: precision_at_10
1606
+ value: 26.122
1607
+ - type: precision_at_100
1608
+ value: 8.204
1609
+ - type: precision_at_1000
1610
+ value: 1.6019999999999999
1611
+ - type: precision_at_3
1612
+ value: 35.374
1613
+ - type: precision_at_5
1614
+ value: 32.653
1615
+ - type: recall_at_1
1616
+ value: 3.435
1617
+ - type: recall_at_10
1618
+ value: 18.953
1619
+ - type: recall_at_100
1620
+ value: 50.775000000000006
1621
+ - type: recall_at_1000
1622
+ value: 85.858
1623
+ - type: recall_at_3
1624
+ value: 7.813000000000001
1625
+ - type: recall_at_5
1626
+ value: 11.952
1627
+ - task:
1628
+ type: Classification
1629
+ dataset:
1630
+ type: mteb/toxic_conversations_50k
1631
+ name: MTEB ToxicConversationsClassification
1632
+ config: default
1633
+ split: test
1634
+ revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de
1635
+ metrics:
1636
+ - type: accuracy
1637
+ value: 71.2938
1638
+ - type: ap
1639
+ value: 15.090139095602268
1640
+ - type: f1
1641
+ value: 55.23862650598296
1642
+ - task:
1643
+ type: Classification
1644
+ dataset:
1645
+ type: mteb/tweet_sentiment_extraction
1646
+ name: MTEB TweetSentimentExtractionClassification
1647
+ config: default
1648
+ split: test
1649
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
1650
+ metrics:
1651
+ - type: accuracy
1652
+ value: 64.7623089983022
1653
+ - type: f1
1654
+ value: 65.07617131099336
1655
+ - task:
1656
+ type: Clustering
1657
+ dataset:
1658
+ type: mteb/twentynewsgroups-clustering
1659
+ name: MTEB TwentyNewsgroupsClustering
1660
+ config: default
1661
+ split: test
1662
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
1663
+ metrics:
1664
+ - type: v_measure
1665
+ value: 57.2988222684939
1666
+ - task:
1667
+ type: PairClassification
1668
+ dataset:
1669
+ type: mteb/twittersemeval2015-pairclassification
1670
+ name: MTEB TwitterSemEval2015
1671
+ config: default
1672
+ split: test
1673
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
1674
+ metrics:
1675
+ - type: cos_sim_accuracy
1676
+ value: 88.6034451928235
1677
+ - type: cos_sim_ap
1678
+ value: 81.51815279166863
1679
+ - type: cos_sim_f1
1680
+ value: 74.43794671864849
1681
+ - type: cos_sim_precision
1682
+ value: 73.34186939820742
1683
+ - type: cos_sim_recall
1684
+ value: 75.56728232189973
1685
+ - type: dot_accuracy
1686
+ value: 88.6034451928235
1687
+ - type: dot_ap
1688
+ value: 81.51816956866841
1689
+ - type: dot_f1
1690
+ value: 74.43794671864849
1691
+ - type: dot_precision
1692
+ value: 73.34186939820742
1693
+ - type: dot_recall
1694
+ value: 75.56728232189973
1695
+ - type: euclidean_accuracy
1696
+ value: 88.6034451928235
1697
+ - type: euclidean_ap
1698
+ value: 81.51817015121485
1699
+ - type: euclidean_f1
1700
+ value: 74.43794671864849
1701
+ - type: euclidean_precision
1702
+ value: 73.34186939820742
1703
+ - type: euclidean_recall
1704
+ value: 75.56728232189973
1705
+ - type: manhattan_accuracy
1706
+ value: 88.5736424867378
1707
+ - type: manhattan_ap
1708
+ value: 81.37610101292196
1709
+ - type: manhattan_f1
1710
+ value: 74.2504182215931
1711
+ - type: manhattan_precision
1712
+ value: 72.46922883697563
1713
+ - type: manhattan_recall
1714
+ value: 76.12137203166228
1715
+ - type: max_accuracy
1716
+ value: 88.6034451928235
1717
+ - type: max_ap
1718
+ value: 81.51817015121485
1719
+ - type: max_f1
1720
+ value: 74.43794671864849
1721
+ - task:
1722
+ type: PairClassification
1723
+ dataset:
1724
+ type: mteb/twitterurlcorpus-pairclassification
1725
+ name: MTEB TwitterURLCorpus
1726
+ config: default
1727
+ split: test
1728
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
1729
+ metrics:
1730
+ - type: cos_sim_accuracy
1731
+ value: 89.53118329646446
1732
+ - type: cos_sim_ap
1733
+ value: 87.41972033060013
1734
+ - type: cos_sim_f1
1735
+ value: 79.4392523364486
1736
+ - type: cos_sim_precision
1737
+ value: 75.53457372951958
1738
+ - type: cos_sim_recall
1739
+ value: 83.7696335078534
1740
+ - type: dot_accuracy
1741
+ value: 89.53118329646446
1742
+ - type: dot_ap
1743
+ value: 87.41971646088945
1744
+ - type: dot_f1
1745
+ value: 79.4392523364486
1746
+ - type: dot_precision
1747
+ value: 75.53457372951958
1748
+ - type: dot_recall
1749
+ value: 83.7696335078534
1750
+ - type: euclidean_accuracy
1751
+ value: 89.53118329646446
1752
+ - type: euclidean_ap
1753
+ value: 87.41972415605997
1754
+ - type: euclidean_f1
1755
+ value: 79.4392523364486
1756
+ - type: euclidean_precision
1757
+ value: 75.53457372951958
1758
+ - type: euclidean_recall
1759
+ value: 83.7696335078534
1760
+ - type: manhattan_accuracy
1761
+ value: 89.5855163581325
1762
+ - type: manhattan_ap
1763
+ value: 87.51158697451964
1764
+ - type: manhattan_f1
1765
+ value: 79.54455087655883
1766
+ - type: manhattan_precision
1767
+ value: 74.96763643796416
1768
+ - type: manhattan_recall
1769
+ value: 84.71666153372344
1770
+ - type: max_accuracy
1771
+ value: 89.5855163581325
1772
+ - type: max_ap
1773
+ value: 87.51158697451964
1774
+ - type: max_f1
1775
+ value: 79.54455087655883
1776
+ language:
1777
+ - en
1778
+ license: cc-by-nc-4.0
1779
+ ---
1780
+ # Linq-AI-Research/Linq-Embed-Mistral (Quantized)
1781
+
1782
+ ## Description
1783
+ This model is a quantized version of the original model [`Linq-AI-Research/Linq-Embed-Mistral`](https://huggingface.co/Linq-AI-Research/Linq-Embed-Mistral).
1784
+
1785
+ It was quantized to 4-bit with the BitsAndBytes library via the [bnb-my-repo](https://huggingface.co/spaces/bnb-community/bnb-my-repo) space.
1786
+
1787
+ ## Quantization Details
1788
+ - **Quantization Type**: int4
1789
+ - **bnb_4bit_quant_type**: nf4
1790
+ - **bnb_4bit_use_double_quant**: True
1791
+ - **bnb_4bit_compute_dtype**: bfloat16
1792
+ - **bnb_4bit_quant_storage**: uint8
1793
+
1794
+
1795
+
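Under these settings, the checkpoint's `config.json` carries a `quantization_config` block that `transformers` reads automatically at load time. A minimal sketch of what that block contains (the loading call is left as a comment, since it requires `transformers`, `bitsandbytes`, and enough memory for the weights; the repo id is a placeholder):

```python
# The BitsAndBytes settings listed above, as they appear under
# "quantization_config" in this repository's config.json.
quantization_config = {
    "load_in_4bit": True,                  # int4 weights
    "bnb_4bit_quant_type": "nf4",          # NormalFloat4 quantization
    "bnb_4bit_use_double_quant": True,     # quantize the quantization constants too
    "bnb_4bit_compute_dtype": "bfloat16",  # matmuls run in bf16
    "bnb_4bit_quant_storage": "uint8",     # packed storage dtype
    "quant_method": "bitsandbytes",
}

# Because this config ships with the checkpoint, loading needs no extra arguments:
#   from transformers import AutoModel
#   model = AutoModel.from_pretrained("<this-repo-id>")  # placeholder repo id
print(quantization_config["bnb_4bit_quant_type"])  # nf4
```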
1796
+ # 📄 Original Model Information
1797
+
1798
+
1799
+ <h1 align="center">Linq-AI-Research/Linq-Embed-Mistral</h1>
1800
+
1801
+ **Linq-Embed-Mistral**
1802
+
1803
+ Linq-Embed-Mistral was developed by building on the [E5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) models. We focus on improving text retrieval using advanced data refinement methods, including sophisticated data crafting, data filtering, and negative mining guided by teacher models, each highly tailored to its task, to improve the quality of LLM-generated synthetic data. These methods are applied both to existing benchmark datasets and to highly tailored synthetic datasets generated via LLMs. Our efforts primarily aim to create high-quality triplet datasets (query, positive example, negative example), significantly improving text retrieval performance.
1804
+
1805
+ Linq-Embed-Mistral performs well on the MTEB benchmarks (as of May 29, 2024). The model excels in retrieval tasks, ranking <ins>**`1st`**</ins> among all models listed on the MTEB leaderboard with a retrieval score of <ins>**`60.2`**</ins>. This outstanding performance underscores its capability to enhance search precision and reliability. The model achieves an average score of <ins>**`68.2`**</ins> across the 56 datasets in the MTEB benchmarks, making it the highest-ranking publicly accessible model and third overall. (Please note that [NV-Embed-v1](https://huggingface.co/nvidia/NV-Embed-v1) and [voyage-large-2-instruct](https://docs.voyageai.com/embeddings/), ranked 1st and 2nd on the leaderboard as of May 29, reported their performance without releasing their models.)
1806
+
1807
+
1808
+ This project is for research purposes only. Third-party datasets may be subject to additional terms and conditions under their associated licenses. Please refer to specific papers for more details:
1809
+
1810
+ - [MTEB benchmark](https://arxiv.org/abs/2210.07316)
1811
+ - [Mistral](https://arxiv.org/abs/2310.06825)
1812
+ - [E5-mistral-7b-instruct](https://arxiv.org/pdf/2401.00368.pdf)
1813
+
1814
+ For more details, refer to [this blog post](https://getlinq.com/blog/linq-embed-mistral/) and [this report](https://huggingface.co/Linq-AI-Research/Linq-Embed-Mistral/blob/main/LinqAIResearch2024_Linq-Embed-Mistral.pdf).
1815
+
1816
+ ## How to use
1817
+
1818
+ Here is an example of how to encode queries and passages from the Mr. TyDi training dataset, both with Sentence Transformers and with Transformers directly.
1819
+
1820
+ ### Sentence Transformers
1821
+
1822
+ ```python
1823
+ from sentence_transformers import SentenceTransformer
1824
+
1825
+ # Load the model
1826
+ model = SentenceTransformer("Linq-AI-Research/Linq-Embed-Mistral")
1827
+
1828
+ # Each query must come with a one-sentence instruction that describes the task
1829
+ task = 'Given a question, retrieve Wikipedia passages that answer the question'
1830
+ prompt = f"Instruct: {task}\nQuery: "
1831
+ queries = [
1832
+ "최초의 원자력 발전소는 무엇인가?",
1833
+ "Who invented Hangul?"
1834
+ ]
1835
+ passages = [
1836
+ "현재 사용되는 핵분열 방식을 이용한 전력생산은 1948년 9월 미국 테네시주 오크리지에 설치된 X-10 흑연원자로에서 전구의 불을 밝히는 데 사용되면서 시작되었다. 그리고 1954년 6월에 구소련의 오브닌스크에 건설된 흑연감속 비등경수 압력관형 원자로를 사용한 오브닌스크 원자력 발전소가 시험적으로 전력생산을 시작하였고, 최초의 상업용 원자력 엉더이로를 사용한 영국 셀라필드 원자력 단지에 위치한 콜더 홀(Calder Hall) 원자력 발전소로, 1956년 10월 17일 상업 운전을 시작하였다.",
1837
+ "Hangul was personally created and promulgated by the fourth king of the Joseon dynasty, Sejong the Great.[1][2] Sejong's scholarly institute, the Hall of Worthies, is often credited with the work, and at least one of its scholars was heavily involved in its creation, but it appears to have also been a personal project of Sejong."
1838
+ ]
1839
+
1840
+ # Encode the queries and passages. We only use the prompt for the queries
1841
+ query_embeddings = model.encode(queries, prompt=prompt)
1842
+ passage_embeddings = model.encode(passages)
1843
+
1844
+ # Compute the (cosine) similarity scores
1845
+ scores = model.similarity(query_embeddings, passage_embeddings) * 100
1846
+ print(scores.tolist())
1847
+ # [[73.72908782958984, 30.122787475585938], [29.15508460998535, 79.25375366210938]]
1848
+ ```
1849
+
1850
+ ### Transformers
1851
+
1852
+ ```python
1853
+ import torch
1854
+ import torch.nn.functional as F
1855
+ from torch import Tensor
1856
+ from transformers import AutoTokenizer, AutoModel
1857
+
1858
+ def last_token_pool(last_hidden_states: Tensor,
1859
+ attention_mask: Tensor) -> Tensor:
1860
+ left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
1861
+ if left_padding:
1862
+ return last_hidden_states[:, -1]
1863
+ else:
1864
+ sequence_lengths = attention_mask.sum(dim=1) - 1
1865
+ batch_size = last_hidden_states.shape[0]
1866
+ return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]
1867
+
1868
+ def get_detailed_instruct(task_description: str, query: str) -> str:
1869
+ return f'Instruct: {task_description}\nQuery: {query}'
1870
+
1871
+ # Each query must come with a one-sentence instruction that describes the task
1872
+ task = 'Given a question, retrieve Wikipedia passages that answer the question'
1873
+ queries = [
1874
+ get_detailed_instruct(task, '최초의 원자력 발전소는 무엇인가?'),
1875
+ get_detailed_instruct(task, 'Who invented Hangul?')
1876
+ ]
1877
+ # No need to add instruction for retrieval documents
1878
+ passages = [
1879
+ "현재 사용되는 핵분열 방식을 이용한 전력생산은 1948년 9월 미국 테네시주 오크리지에 설치된 X-10 흑연원자로에서 전구의 불을 밝히는 데 사용되면서 시작되었다. 그리고 1954년 6월에 구소련의 오브닌스크에 건설된 흑연감속 비등경수 압력관형 원자로를 사용한 오브닌스크 원자력 발전소가 시험적으로 전력생산을 시작하였고, 최초의 상업용 원자력 엉더이로를 사용한 영국 셀라필드 원자력 단지에 위치한 콜더 홀(Calder Hall) 원자력 발전소로, 1956년 10월 17일 상업 운전을 시작하였다.",
1880
+ "Hangul was personally created and promulgated by the fourth king of the Joseon dynasty, Sejong the Great.[1][2] Sejong's scholarly institute, the Hall of Worthies, is often credited with the work, and at least one of its scholars was heavily involved in its creation, but it appears to have also been a personal project of Sejong."
1881
+ ]
1882
+
1883
+ # Load model and tokenizer
1884
+ tokenizer = AutoTokenizer.from_pretrained('Linq-AI-Research/Linq-Embed-Mistral')
1885
+ model = AutoModel.from_pretrained('Linq-AI-Research/Linq-Embed-Mistral')
1886
+
1887
+ max_length = 4096
1888
+ input_texts = [*queries, *passages]
1889
+ # Tokenize the input texts
1890
+ batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors="pt")
1891
+ outputs = model(**batch_dict)
1892
+ embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
1893
+
1894
+ # Normalize embeddings
1895
+ embeddings = F.normalize(embeddings, p=2, dim=1)
1896
+ scores = (embeddings[:2] @ embeddings[2:].T) * 100
1897
+ print(scores.tolist())
1898
+ # [[73.72909545898438, 30.122783660888672], [29.155078887939453, 79.25374603271484]]
1899
+ ```
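The `last_token_pool` helper above selects the hidden state of each sequence's final non-padding token. Its index arithmetic for the right-padding branch can be checked in isolation with plain Python lists (a toy sketch; no torch required):

```python
# Toy check of last_token_pool's right-padding branch: for each row of the
# attention mask, the pooled position is (number of attended tokens) - 1.
attention_mask = [
    [1, 1, 1, 0, 0],  # 3 real tokens -> pool index 2
    [1, 1, 1, 1, 1],  # 5 real tokens -> pool index 4
]
sequence_lengths = [sum(row) - 1 for row in attention_mask]
print(sequence_lengths)  # [2, 4]
```

With left padding (as the tokenizer may produce in batched inference), every sequence ends at the last position, which is why the helper can simply return `last_hidden_states[:, -1]` in that branch.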
1900
+
1901
+ ### MTEB Benchmark Evaluation
1902
+
1903
+ Check out [unilm/e5](https://github.com/microsoft/unilm/tree/master/e5) to reproduce evaluation results on the [BEIR](https://arxiv.org/abs/2104.08663) and [MTEB](https://arxiv.org/abs/2210.07316) benchmark.
1904
+
1905
+ ## Evaluation Result
1906
+
1907
+ ### MTEB (as of May 29, 2024)
1908
+
1909
+ | Model Name | Retrieval (15) | Average (56) |
1910
+ | :------------------------------------------------------------------------------: | :------------: | :----------: |
1911
+ | [Linq-Embed-Mistral](https://huggingface.co/Linq-AI-Research/Linq-Embed-Mistral) | 60.2 | 68.2 |
1912
+ | [NV-Embed-v1](https://huggingface.co/nvidia/NV-Embed-v1) | 59.4 | 69.3 |
1913
+ | [SFR-Embedding-Mistral](https://huggingface.co/Salesforce/SFR-Embedding-Mistral) | 59.0 | 67.6 |
1914
+ | [voyage-large-2-instruct](https://docs.voyageai.com/docs/embeddings) | 58.3 | 68.3 |
1915
+ | [GritLM-7B](https://huggingface.co/GritLM/GritLM-7B) | 57.4 | 66.8 |
1916
+ | [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) | 56.9 | 66.6 |
1917
+ | [voyage-lite-02-instruct](https://docs.voyageai.com/docs/embeddings) | 56.6 | 67.1 |
1918
+ | [gte-Qwen1.5-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct) | 56.2 | 67.3 |
1919
+ |[google-gecko.text-embedding-preview-0409](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings?hl=ko#latest_models)| 55.7 | 66.3 |
1920
+ |[text-embedding-3-large](https://openai.com/index/new-embedding-models-and-api-updates/)| 55.4 | 64.6 |
1921
+ |[Cohere-embed-english-v3.0](https://huggingface.co/Cohere/Cohere-embed-english-v3.0)| 55.0 | 64.5 |
1922
+
1923
+ # Linq Research Team
1924
+
1925
+ - [Junseong Kim](https://huggingface.co/Junseong)
1926
+ - [Seolhwa Lee](https://huggingface.co/Seolhwa)
1927
+ - [Jihoon Kwon](https://huggingface.co/Mayfull)
1928
+ - [Sangmo Gu](https://huggingface.co/karma-os)
1929
+ - Yejin Kim
1930
+ - Minkyung Cho
1931
+ - [Jy-yong Sohn](https://itml.yonsei.ac.kr/professor)
1932
+ - [Chanyeol Choi](https://www.linkedin.com/in/chanyeolchoi)
1933
+
1934
+ # Citation
1935
+
1936
+ ```bibtex
1937
+ @misc{LinqAIResearch2024,
1938
+ title={Linq-Embed-Mistral: Elevating Text Retrieval with Improved GPT Data Through Task-Specific Control and Quality Refinement},
1939
+ author={Junseong Kim and Seolhwa Lee and Jihoon Kwon and Sangmo Gu and Yejin Kim and Minkyung Cho and Jy-yong Sohn and Chanyeol Choi},
1940
+ howpublished={Linq AI Research Blog},
1941
+ year={2024},
1942
+ url={https://getlinq.com/blog/linq-embed-mistral/}
1943
+ }
1944
+ ```
config.json ADDED
@@ -0,0 +1,43 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "Linq-AI-Research/Linq-Embed-Mistral",
3
+ "architectures": [
4
+ "MistralModel"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 1,
8
+ "eos_token_id": 2,
9
+ "head_dim": 128,
10
+ "hidden_act": "silu",
11
+ "hidden_size": 4096,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 14336,
14
+ "max_position_embeddings": 32768,
15
+ "model_type": "mistral",
16
+ "num_attention_heads": 32,
17
+ "num_hidden_layers": 32,
18
+ "num_key_value_heads": 8,
19
+ "pad_token_id": 2,
20
+ "quantization_config": {
21
+ "_load_in_4bit": true,
22
+ "_load_in_8bit": false,
23
+ "bnb_4bit_compute_dtype": "bfloat16",
24
+ "bnb_4bit_quant_storage": "uint8",
25
+ "bnb_4bit_quant_type": "nf4",
26
+ "bnb_4bit_use_double_quant": true,
27
+ "llm_int8_enable_fp32_cpu_offload": false,
28
+ "llm_int8_has_fp16_weight": false,
29
+ "llm_int8_skip_modules": null,
30
+ "llm_int8_threshold": 6.0,
31
+ "load_in_4bit": true,
32
+ "load_in_8bit": false,
33
+ "quant_method": "bitsandbytes"
34
+ },
35
+ "rms_norm_eps": 1e-05,
36
+ "rope_theta": 10000.0,
37
+ "sliding_window": 4096,
38
+ "tie_word_embeddings": false,
39
+ "torch_dtype": "float16",
40
+ "transformers_version": "4.49.0",
41
+ "use_cache": false,
42
+ "vocab_size": 32000
43
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:14256c99f4caa01de57298e548530a6536de5ca91bd9f29284fad86b6ccdce5a
3
+ size 3863534977
special_tokens_map.json ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "additional_special_tokens": [
3
+ "<unk>",
4
+ "<s>",
5
+ "</s>"
6
+ ],
7
+ "bos_token": {
8
+ "content": "<s>",
9
+ "lstrip": false,
10
+ "normalized": false,
11
+ "rstrip": false,
12
+ "single_word": false
13
+ },
14
+ "eos_token": {
15
+ "content": "</s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false
20
+ },
21
+ "pad_token": {
22
+ "content": "</s>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false
27
+ },
28
+ "unk_token": {
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false
34
+ }
35
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,48 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": true,
4
+ "add_prefix_space": null,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ }
30
+ },
31
+ "additional_special_tokens": [
32
+ "<unk>",
33
+ "<s>",
34
+ "</s>"
35
+ ],
36
+ "bos_token": "<s>",
37
+ "clean_up_tokenization_spaces": false,
38
+ "eos_token": "</s>",
39
+ "extra_special_tokens": {},
40
+ "legacy": true,
41
+ "model_max_length": 1000000000000000019884624838656,
42
+ "pad_token": "</s>",
43
+ "sp_model_kwargs": {},
44
+ "spaces_between_special_tokens": false,
45
+ "tokenizer_class": "LlamaTokenizer",
46
+ "unk_token": "<unk>",
47
+ "use_default_system_prompt": false
48
+ }