fangxq committed on
Commit
ae284a3
·
verified ·
1 Parent(s): 05ad3b5
README.md ADDED
@@ -0,0 +1,1234 @@
1
+ ---
2
+ model-index:
3
+ - name: XYZ-embedding
4
+ results:
5
+ - dataset:
6
+ config: default
7
+ name: MTEB AFQMC
8
+ revision: None
9
+ split: validation
10
+ type: C-MTEB/AFQMC
11
+ metrics:
12
+ - type: cos_sim_pearson
13
+ value: 55.51799059309076
14
+ - type: cos_sim_spearman
15
+ value: 58.407433584137806
16
+ - type: manhattan_pearson
17
+ value: 57.17473672145622
18
+ - type: manhattan_spearman
19
+ value: 58.389018054159955
20
+ - type: euclidean_pearson
21
+ value: 57.19483956761451
22
+ - type: euclidean_spearman
23
+ value: 58.407433584137806
24
+ - type: main_score
25
+ value: 58.407433584137806
26
+ task:
27
+ type: STS
28
+ - dataset:
29
+ config: default
30
+ name: MTEB ATEC
31
+ revision: None
32
+ split: test
33
+ type: C-MTEB/ATEC
34
+ metrics:
35
+ - type: cos_sim_pearson
36
+ value: 57.31078155367183
37
+ - type: cos_sim_spearman
38
+ value: 57.59782762324478
39
+ - type: manhattan_pearson
40
+ value: 62.525487007985035
41
+ - type: manhattan_spearman
42
+ value: 57.591139966303615
43
+ - type: euclidean_pearson
44
+ value: 62.53702437760052
45
+ - type: euclidean_spearman
46
+ value: 57.597828749091384
47
+ - type: main_score
48
+ value: 57.59782762324478
49
+ task:
50
+ type: STS
51
+ - dataset:
52
+ config: zh
53
+ name: MTEB AmazonReviewsClassification (zh)
54
+ revision: 1399c76144fd37290681b995c656ef9b2e06e26d
55
+ split: test
56
+ type: mteb/amazon_reviews_multi
57
+ metrics:
58
+ - type: accuracy
59
+ value: 49.374
60
+ - type: accuracy_stderr
61
+ value: 1.436636349254743
62
+ - type: f1
63
+ value: 47.115240601017774
64
+ - type: f1_stderr
65
+ value: 1.5642799356594534
66
+ - type: main_score
67
+ value: 49.374
68
+ task:
69
+ type: Classification
70
+ - dataset:
71
+ config: default
72
+ name: MTEB BQ
73
+ revision: None
74
+ split: test
75
+ type: C-MTEB/BQ
76
+ metrics:
77
+ - type: cos_sim_pearson
78
+ value: 71.49514309404829
79
+ - type: cos_sim_spearman
80
+ value: 72.66161713021279
81
+ - type: manhattan_pearson
82
+ value: 71.03443640254005
83
+ - type: manhattan_spearman
84
+ value: 72.63439621980275
85
+ - type: euclidean_pearson
86
+ value: 71.06830370642658
87
+ - type: euclidean_spearman
88
+ value: 72.66161713043078
89
+ - type: main_score
90
+ value: 72.66161713021279
91
+ task:
92
+ type: STS
93
+ - dataset:
94
+ config: default
95
+ name: MTEB CLSClusteringP2P
96
+ revision: None
97
+ split: test
98
+ type: C-MTEB/CLSClusteringP2P
99
+ metrics:
100
+ - type: v_measure
101
+ value: 57.237692641281
102
+ - type: v_measure_std
103
+ value: 1.2777768354339174
104
+ - type: main_score
105
+ value: 57.237692641281
106
+ task:
107
+ type: Clustering
108
+ - dataset:
109
+ config: default
110
+ name: MTEB CLSClusteringS2S
111
+ revision: None
112
+ split: test
113
+ type: C-MTEB/CLSClusteringS2S
114
+ metrics:
115
+ - type: v_measure
116
+ value: 48.41686666939331
117
+ - type: v_measure_std
118
+ value: 1.7663118461900793
119
+ - type: main_score
120
+ value: 48.41686666939331
121
+ task:
122
+ type: Clustering
123
+ - dataset:
124
+ config: default
125
+ name: MTEB CMedQAv1
126
+ revision: None
127
+ split: test
128
+ type: C-MTEB/CMedQAv1-reranking
129
+ metrics:
130
+ - type: map
131
+ value: 89.9766367822762
132
+ - type: mrr
133
+ value: 91.88896825396824
134
+ - type: main_score
135
+ value: 89.9766367822762
136
+ task:
137
+ type: Reranking
138
+ - dataset:
139
+ config: default
140
+ name: MTEB CMedQAv2
141
+ revision: None
142
+ split: test
143
+ type: C-MTEB/CMedQAv2-reranking
144
+ metrics:
145
+ - type: map
146
+ value: 89.04628340075982
147
+ - type: mrr
148
+ value: 91.21702380952381
149
+ - type: main_score
150
+ value: 89.04628340075982
151
+ task:
152
+ type: Reranking
153
+ - dataset:
154
+ config: default
155
+ name: MTEB CmedqaRetrieval
156
+ revision: None
157
+ split: dev
158
+ type: C-MTEB/CmedqaRetrieval
159
+ metrics:
160
+ - type: map_at_1
161
+ value: 27.796
162
+ - type: map_at_10
163
+ value: 41.498000000000005
164
+ - type: map_at_100
165
+ value: 43.332
166
+ - type: map_at_1000
167
+ value: 43.429
168
+ - type: map_at_3
169
+ value: 37.172
170
+ - type: map_at_5
171
+ value: 39.617000000000004
172
+ - type: mrr_at_1
173
+ value: 42.111
174
+ - type: mrr_at_10
175
+ value: 50.726000000000006
176
+ - type: mrr_at_100
177
+ value: 51.632
178
+ - type: mrr_at_1000
179
+ value: 51.67
180
+ - type: mrr_at_3
181
+ value: 48.429
182
+ - type: mrr_at_5
183
+ value: 49.662
184
+ - type: ndcg_at_1
185
+ value: 42.111
186
+ - type: ndcg_at_10
187
+ value: 48.294
188
+ - type: ndcg_at_100
189
+ value: 55.135999999999996
190
+ - type: ndcg_at_1000
191
+ value: 56.818000000000005
192
+ - type: ndcg_at_3
193
+ value: 43.185
194
+ - type: ndcg_at_5
195
+ value: 45.266
196
+ - type: precision_at_1
197
+ value: 42.111
198
+ - type: precision_at_10
199
+ value: 10.635
200
+ - type: precision_at_100
201
+ value: 1.619
202
+ - type: precision_at_1000
203
+ value: 0.183
204
+ - type: precision_at_3
205
+ value: 24.539
206
+ - type: precision_at_5
207
+ value: 17.644000000000002
208
+ - type: recall_at_1
209
+ value: 27.796
210
+ - type: recall_at_10
211
+ value: 59.034
212
+ - type: recall_at_100
213
+ value: 86.991
214
+ - type: recall_at_1000
215
+ value: 98.304
216
+ - type: recall_at_3
217
+ value: 43.356
218
+ - type: recall_at_5
219
+ value: 49.998
220
+ - type: main_score
221
+ value: 48.294
222
+ task:
223
+ type: Retrieval
224
+ - dataset:
225
+ config: default
226
+ name: MTEB Cmnli
227
+ revision: None
228
+ split: validation
229
+ type: C-MTEB/CMNLI
230
+ metrics:
231
+ - type: cos_sim_accuracy
232
+ value: 82.8983764281419
233
+ - type: cos_sim_accuracy_threshold
234
+ value: 56.05731010437012
235
+ - type: cos_sim_ap
236
+ value: 90.23156362696572
237
+ - type: cos_sim_f1
238
+ value: 83.83207278307574
239
+ - type: cos_sim_f1_threshold
240
+ value: 52.05453634262085
241
+ - type: cos_sim_precision
242
+ value: 78.91044160132068
243
+ - type: cos_sim_recall
244
+ value: 89.40846387654898
245
+ - type: dot_accuracy
246
+ value: 82.8983764281419
247
+ - type: dot_accuracy_threshold
248
+ value: 56.05730414390564
249
+ - type: dot_ap
250
+ value: 90.20952356258861
251
+ - type: dot_f1
252
+ value: 83.83207278307574
253
+ - type: dot_f1_threshold
254
+ value: 52.054524421691895
255
+ - type: dot_precision
256
+ value: 78.91044160132068
257
+ - type: dot_recall
258
+ value: 89.40846387654898
259
+ - type: euclidean_accuracy
260
+ value: 82.8983764281419
261
+ - type: euclidean_accuracy_threshold
262
+ value: 93.74719858169556
263
+ - type: euclidean_ap
264
+ value: 90.23156283510565
265
+ - type: euclidean_f1
266
+ value: 83.83207278307574
267
+ - type: euclidean_f1_threshold
268
+ value: 97.92392253875732
269
+ - type: euclidean_precision
270
+ value: 78.91044160132068
271
+ - type: euclidean_recall
272
+ value: 89.40846387654898
273
+ - type: manhattan_accuracy
274
+ value: 82.85027059530968
275
+ - type: manhattan_accuracy_threshold
276
+ value: 3164.584159851074
277
+ - type: manhattan_ap
278
+ value: 90.23178004516869
279
+ - type: manhattan_f1
280
+ value: 83.82157123834887
281
+ - type: manhattan_f1_threshold
282
+ value: 3273.5992431640625
283
+ - type: manhattan_precision
284
+ value: 79.76768743400211
285
+ - type: manhattan_recall
286
+ value: 88.30956277764788
287
+ - type: max_accuracy
288
+ value: 82.8983764281419
289
+ - type: max_ap
290
+ value: 90.23178004516869
291
+ - type: max_f1
292
+ value: 83.83207278307574
293
+ task:
294
+ type: PairClassification
295
+ - dataset:
296
+ config: default
297
+ name: MTEB CovidRetrieval
298
+ revision: None
299
+ split: dev
300
+ type: C-MTEB/CovidRetrieval
301
+ metrics:
302
+ - type: map_at_1
303
+ value: 80.479
304
+ - type: map_at_10
305
+ value: 87.984
306
+ - type: map_at_100
307
+ value: 88.036
308
+ - type: map_at_1000
309
+ value: 88.03699999999999
310
+ - type: map_at_3
311
+ value: 87.083
312
+ - type: map_at_5
313
+ value: 87.694
314
+ - type: mrr_at_1
315
+ value: 80.927
316
+ - type: mrr_at_10
317
+ value: 88.046
318
+ - type: mrr_at_100
319
+ value: 88.099
320
+ - type: mrr_at_1000
321
+ value: 88.1
322
+ - type: mrr_at_3
323
+ value: 87.215
324
+ - type: mrr_at_5
325
+ value: 87.768
326
+ - type: ndcg_at_1
327
+ value: 80.927
328
+ - type: ndcg_at_10
329
+ value: 90.756
330
+ - type: ndcg_at_100
331
+ value: 90.96
332
+ - type: ndcg_at_1000
333
+ value: 90.975
334
+ - type: ndcg_at_3
335
+ value: 89.032
336
+ - type: ndcg_at_5
337
+ value: 90.106
338
+ - type: precision_at_1
339
+ value: 80.927
340
+ - type: precision_at_10
341
+ value: 10.011000000000001
342
+ - type: precision_at_100
343
+ value: 1.009
344
+ - type: precision_at_1000
345
+ value: 0.101
346
+ - type: precision_at_3
347
+ value: 31.752999999999997
348
+ - type: precision_at_5
349
+ value: 19.6
350
+ - type: recall_at_1
351
+ value: 80.479
352
+ - type: recall_at_10
353
+ value: 99.05199999999999
354
+ - type: recall_at_100
355
+ value: 99.895
356
+ - type: recall_at_1000
357
+ value: 100.0
358
+ - type: recall_at_3
359
+ value: 94.494
360
+ - type: recall_at_5
361
+ value: 97.102
362
+ - type: main_score
363
+ value: 90.756
364
+ task:
365
+ type: Retrieval
366
+ - dataset:
367
+ config: default
368
+ name: MTEB DuRetrieval
369
+ revision: None
370
+ split: dev
371
+ type: C-MTEB/DuRetrieval
372
+ metrics:
373
+ - type: map_at_1
374
+ value: 27.853
375
+ - type: map_at_10
376
+ value: 85.13199999999999
377
+ - type: map_at_100
378
+ value: 87.688
379
+ - type: map_at_1000
380
+ value: 87.712
381
+ - type: map_at_3
382
+ value: 59.705
383
+ - type: map_at_5
384
+ value: 75.139
385
+ - type: mrr_at_1
386
+ value: 93.65
387
+ - type: mrr_at_10
388
+ value: 95.682
389
+ - type: mrr_at_100
390
+ value: 95.722
391
+ - type: mrr_at_1000
392
+ value: 95.724
393
+ - type: mrr_at_3
394
+ value: 95.467
395
+ - type: mrr_at_5
396
+ value: 95.612
397
+ - type: ndcg_at_1
398
+ value: 93.65
399
+ - type: ndcg_at_10
400
+ value: 91.155
401
+ - type: ndcg_at_100
402
+ value: 93.183
403
+ - type: ndcg_at_1000
404
+ value: 93.38499999999999
405
+ - type: ndcg_at_3
406
+ value: 90.648
407
+ - type: ndcg_at_5
408
+ value: 89.47699999999999
409
+ - type: precision_at_1
410
+ value: 93.65
411
+ - type: precision_at_10
412
+ value: 43.11
413
+ - type: precision_at_100
414
+ value: 4.854
415
+ - type: precision_at_1000
416
+ value: 0.49100000000000005
417
+ - type: precision_at_3
418
+ value: 81.11699999999999
419
+ - type: precision_at_5
420
+ value: 68.28999999999999
421
+ - type: recall_at_1
422
+ value: 27.853
423
+ - type: recall_at_10
424
+ value: 91.678
425
+ - type: recall_at_100
426
+ value: 98.553
427
+ - type: recall_at_1000
428
+ value: 99.553
429
+ - type: recall_at_3
430
+ value: 61.381
431
+ - type: recall_at_5
432
+ value: 78.605
433
+ - type: main_score
434
+ value: 91.155
435
+ task:
436
+ type: Retrieval
437
+ - dataset:
438
+ config: default
439
+ name: MTEB EcomRetrieval
440
+ revision: None
441
+ split: dev
442
+ type: C-MTEB/EcomRetrieval
443
+ metrics:
444
+ - type: map_at_1
445
+ value: 54.50000000000001
446
+ - type: map_at_10
447
+ value: 65.167
448
+ - type: map_at_100
449
+ value: 65.664
450
+ - type: map_at_1000
451
+ value: 65.67399999999999
452
+ - type: map_at_3
453
+ value: 62.633
454
+ - type: map_at_5
455
+ value: 64.208
456
+ - type: mrr_at_1
457
+ value: 54.50000000000001
458
+ - type: mrr_at_10
459
+ value: 65.167
460
+ - type: mrr_at_100
461
+ value: 65.664
462
+ - type: mrr_at_1000
463
+ value: 65.67399999999999
464
+ - type: mrr_at_3
465
+ value: 62.633
466
+ - type: mrr_at_5
467
+ value: 64.208
468
+ - type: ndcg_at_1
469
+ value: 54.50000000000001
470
+ - type: ndcg_at_10
471
+ value: 70.294
472
+ - type: ndcg_at_100
473
+ value: 72.564
474
+ - type: ndcg_at_1000
475
+ value: 72.841
476
+ - type: ndcg_at_3
477
+ value: 65.128
478
+ - type: ndcg_at_5
479
+ value: 67.96799999999999
480
+ - type: precision_at_1
481
+ value: 54.50000000000001
482
+ - type: precision_at_10
483
+ value: 8.64
484
+ - type: precision_at_100
485
+ value: 0.967
486
+ - type: precision_at_1000
487
+ value: 0.099
488
+ - type: precision_at_3
489
+ value: 24.099999999999998
490
+ - type: precision_at_5
491
+ value: 15.840000000000002
492
+ - type: recall_at_1
493
+ value: 54.50000000000001
494
+ - type: recall_at_10
495
+ value: 86.4
496
+ - type: recall_at_100
497
+ value: 96.7
498
+ - type: recall_at_1000
499
+ value: 98.9
500
+ - type: recall_at_3
501
+ value: 72.3
502
+ - type: recall_at_5
503
+ value: 79.2
504
+ - type: main_score
505
+ value: 70.294
506
+ task:
507
+ type: Retrieval
508
+ - dataset:
509
+ config: default
510
+ name: MTEB IFlyTek
511
+ revision: None
512
+ split: validation
513
+ type: C-MTEB/IFlyTek-classification
514
+ metrics:
515
+ - type: accuracy
516
+ value: 52.743362831858406
517
+ - type: accuracy_stderr
518
+ value: 0.23768288128480788
519
+ - type: f1
520
+ value: 41.1548855278405
521
+ - type: f1_stderr
522
+ value: 0.4088759842813554
523
+ - type: main_score
524
+ value: 52.743362831858406
525
+ task:
526
+ type: Classification
527
+ - dataset:
528
+ config: default
529
+ name: MTEB JDReview
530
+ revision: None
531
+ split: test
532
+ type: C-MTEB/JDReview-classification
533
+ metrics:
534
+ - type: accuracy
535
+ value: 89.08067542213884
536
+ - type: accuracy_stderr
537
+ value: 0.9559278951487445
538
+ - type: ap
539
+ value: 60.875320104586564
540
+ - type: ap_stderr
541
+ value: 2.137806661565934
542
+ - type: f1
543
+ value: 84.39314192399665
544
+ - type: f1_stderr
545
+ value: 1.132407155321657
546
+ - type: main_score
547
+ value: 89.08067542213884
548
+ task:
549
+ type: Classification
550
+ - dataset:
551
+ config: default
552
+ name: MTEB LCQMC
553
+ revision: None
554
+ split: test
555
+ type: C-MTEB/LCQMC
556
+ metrics:
557
+ - type: cos_sim_pearson
558
+ value: 73.3633875566899
559
+ - type: cos_sim_spearman
560
+ value: 79.27679599527615
561
+ - type: manhattan_pearson
562
+ value: 79.12061667088273
563
+ - type: manhattan_spearman
564
+ value: 79.26989882781706
565
+ - type: euclidean_pearson
566
+ value: 79.12871362068391
567
+ - type: euclidean_spearman
568
+ value: 79.27679377557219
569
+ - type: main_score
570
+ value: 79.27679599527615
571
+ task:
572
+ type: STS
573
+ - dataset:
574
+ config: default
575
+ name: MTEB MMarcoReranking
576
+ revision: None
577
+ split: dev
578
+ type: C-MTEB/Mmarco-reranking
579
+ metrics:
580
+ - type: map
581
+ value: 37.68251937316638
582
+ - type: mrr
583
+ value: 36.61746031746032
584
+ - type: main_score
585
+ value: 37.68251937316638
586
+ task:
587
+ type: Reranking
588
+ - dataset:
589
+ config: default
590
+ name: MTEB MMarcoRetrieval
591
+ revision: None
592
+ split: dev
593
+ type: C-MTEB/MMarcoRetrieval
594
+ metrics:
595
+ - type: map_at_1
596
+ value: 69.401
597
+ - type: map_at_10
598
+ value: 78.8
599
+ - type: map_at_100
600
+ value: 79.077
601
+ - type: map_at_1000
602
+ value: 79.081
603
+ - type: map_at_3
604
+ value: 76.97
605
+ - type: map_at_5
606
+ value: 78.185
607
+ - type: mrr_at_1
608
+ value: 71.719
609
+ - type: mrr_at_10
610
+ value: 79.327
611
+ - type: mrr_at_100
612
+ value: 79.56400000000001
613
+ - type: mrr_at_1000
614
+ value: 79.56800000000001
615
+ - type: mrr_at_3
616
+ value: 77.736
617
+ - type: mrr_at_5
618
+ value: 78.782
619
+ - type: ndcg_at_1
620
+ value: 71.719
621
+ - type: ndcg_at_10
622
+ value: 82.505
623
+ - type: ndcg_at_100
624
+ value: 83.673
625
+ - type: ndcg_at_1000
626
+ value: 83.786
627
+ - type: ndcg_at_3
628
+ value: 79.07600000000001
629
+ - type: ndcg_at_5
630
+ value: 81.122
631
+ - type: precision_at_1
632
+ value: 71.719
633
+ - type: precision_at_10
634
+ value: 9.924
635
+ - type: precision_at_100
636
+ value: 1.049
637
+ - type: precision_at_1000
638
+ value: 0.106
639
+ - type: precision_at_3
640
+ value: 29.742
641
+ - type: precision_at_5
642
+ value: 18.937
643
+ - type: recall_at_1
644
+ value: 69.401
645
+ - type: recall_at_10
646
+ value: 93.349
647
+ - type: recall_at_100
648
+ value: 98.492
649
+ - type: recall_at_1000
650
+ value: 99.384
651
+ - type: recall_at_3
652
+ value: 84.385
653
+ - type: recall_at_5
654
+ value: 89.237
655
+ - type: main_score
656
+ value: 82.505
657
+ task:
658
+ type: Retrieval
659
+ - dataset:
660
+ config: zh-CN
661
+ name: MTEB MassiveIntentClassification (zh-CN)
662
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
663
+ split: test
664
+ type: mteb/amazon_massive_intent
665
+ metrics:
666
+ - type: accuracy
667
+ value: 77.9388029589778
668
+ - type: accuracy_stderr
669
+ value: 1.416192788478398
670
+ - type: f1
671
+ value: 74.77765701086211
672
+ - type: f1_stderr
673
+ value: 1.254859698486085
674
+ - type: main_score
675
+ value: 77.9388029589778
676
+ task:
677
+ type: Classification
678
+ - dataset:
679
+ config: zh-CN
680
+ name: MTEB MassiveScenarioClassification (zh-CN)
681
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
682
+ split: test
683
+ type: mteb/amazon_massive_scenario
684
+ metrics:
685
+ - type: accuracy
686
+ value: 83.8231338264963
687
+ - type: accuracy_stderr
688
+ value: 0.6973305760755886
689
+ - type: f1
690
+ value: 83.13105322628088
691
+ - type: f1_stderr
692
+ value: 0.600506118139685
693
+ - type: main_score
694
+ value: 83.8231338264963
695
+ task:
696
+ type: Classification
697
+ - dataset:
698
+ config: default
699
+ name: MTEB MedicalRetrieval
700
+ revision: None
701
+ split: dev
702
+ type: C-MTEB/MedicalRetrieval
703
+ metrics:
704
+ - type: map_at_1
705
+ value: 57.8
706
+ - type: map_at_10
707
+ value: 64.696
708
+ - type: map_at_100
709
+ value: 65.294
710
+ - type: map_at_1000
711
+ value: 65.328
712
+ - type: map_at_3
713
+ value: 62.949999999999996
714
+ - type: map_at_5
715
+ value: 64.095
716
+ - type: mrr_at_1
717
+ value: 58.099999999999994
718
+ - type: mrr_at_10
719
+ value: 64.85
720
+ - type: mrr_at_100
721
+ value: 65.448
722
+ - type: mrr_at_1000
723
+ value: 65.482
724
+ - type: mrr_at_3
725
+ value: 63.1
726
+ - type: mrr_at_5
727
+ value: 64.23
728
+ - type: ndcg_at_1
729
+ value: 57.8
730
+ - type: ndcg_at_10
731
+ value: 68.041
732
+ - type: ndcg_at_100
733
+ value: 71.074
734
+ - type: ndcg_at_1000
735
+ value: 71.919
736
+ - type: ndcg_at_3
737
+ value: 64.584
738
+ - type: ndcg_at_5
739
+ value: 66.625
740
+ - type: precision_at_1
741
+ value: 57.8
742
+ - type: precision_at_10
743
+ value: 7.85
744
+ - type: precision_at_100
745
+ value: 0.9289999999999999
746
+ - type: precision_at_1000
747
+ value: 0.099
748
+ - type: precision_at_3
749
+ value: 23.1
750
+ - type: precision_at_5
751
+ value: 14.84
752
+ - type: recall_at_1
753
+ value: 57.8
754
+ - type: recall_at_10
755
+ value: 78.5
756
+ - type: recall_at_100
757
+ value: 92.9
758
+ - type: recall_at_1000
759
+ value: 99.4
760
+ - type: recall_at_3
761
+ value: 69.3
762
+ - type: recall_at_5
763
+ value: 74.2
764
+ - type: main_score
765
+ value: 68.041
766
+ task:
767
+ type: Retrieval
768
+ - dataset:
769
+ config: default
770
+ name: MTEB MultilingualSentiment
771
+ revision: None
772
+ split: validation
773
+ type: C-MTEB/MultilingualSentiment-classification
774
+ metrics:
775
+ - type: accuracy
776
+ value: 78.60333333333334
777
+ - type: accuracy_stderr
778
+ value: 0.3331499495555859
779
+ - type: f1
780
+ value: 78.4814340961856
781
+ - type: f1_stderr
782
+ value: 0.45721454672060496
783
+ - type: main_score
784
+ value: 78.60333333333334
785
+ task:
786
+ type: Classification
787
+ - dataset:
788
+ config: default
789
+ name: MTEB Ocnli
790
+ revision: None
791
+ split: validation
792
+ type: C-MTEB/OCNLI
793
+ metrics:
794
+ - type: cos_sim_accuracy
795
+ value: 80.5630752571738
796
+ - type: cos_sim_accuracy_threshold
797
+ value: 53.72971296310425
798
+ - type: cos_sim_ap
799
+ value: 85.61885910463258
800
+ - type: cos_sim_f1
801
+ value: 82.40469208211144
802
+ - type: cos_sim_f1_threshold
803
+ value: 50.07883310317993
804
+ - type: cos_sim_precision
805
+ value: 76.70609645131938
806
+ - type: cos_sim_recall
807
+ value: 89.01795142555439
808
+ - type: dot_accuracy
809
+ value: 80.5630752571738
810
+ - type: dot_accuracy_threshold
811
+ value: 53.7297248840332
812
+ - type: dot_ap
813
+ value: 85.61885910463258
814
+ - type: dot_f1
815
+ value: 82.40469208211144
816
+ - type: dot_f1_threshold
817
+ value: 50.07884502410889
818
+ - type: dot_precision
819
+ value: 76.70609645131938
820
+ - type: dot_recall
821
+ value: 89.01795142555439
822
+ - type: euclidean_accuracy
823
+ value: 80.5630752571738
824
+ - type: euclidean_accuracy_threshold
825
+ value: 96.19801044464111
826
+ - type: euclidean_ap
827
+ value: 85.61885910463258
828
+ - type: euclidean_f1
829
+ value: 82.40469208211144
830
+ - type: euclidean_f1_threshold
831
+ value: 99.92111921310425
832
+ - type: euclidean_precision
833
+ value: 76.70609645131938
834
+ - type: euclidean_recall
835
+ value: 89.01795142555439
836
+ - type: manhattan_accuracy
837
+ value: 80.67135896047645
838
+ - type: manhattan_accuracy_threshold
839
+ value: 3323.1739044189453
840
+ - type: manhattan_ap
841
+ value: 85.55348220886658
842
+ - type: manhattan_f1
843
+ value: 82.26744186046511
844
+ - type: manhattan_f1_threshold
845
+ value: 3389.273452758789
846
+ - type: manhattan_precision
847
+ value: 76.00716204118174
848
+ - type: manhattan_recall
849
+ value: 89.65153115100317
850
+ - type: max_accuracy
851
+ value: 80.67135896047645
852
+ - type: max_ap
853
+ value: 85.61885910463258
854
+ - type: max_f1
855
+ value: 82.40469208211144
856
+ task:
857
+ type: PairClassification
858
+ - dataset:
859
+ config: default
860
+ name: MTEB OnlineShopping
861
+ revision: None
862
+ split: test
863
+ type: C-MTEB/OnlineShopping-classification
864
+ metrics:
865
+ - type: accuracy
866
+ value: 94.94
867
+ - type: accuracy_stderr
868
+ value: 0.49030602688525093
869
+ - type: ap
870
+ value: 93.0785841977823
871
+ - type: ap_stderr
872
+ value: 0.5447383082750599
873
+ - type: f1
874
+ value: 94.92765777406245
875
+ - type: f1_stderr
876
+ value: 0.4891510966106189
877
+ - type: main_score
878
+ value: 94.94
879
+ task:
880
+ type: Classification
881
+ - dataset:
882
+ config: default
883
+ name: MTEB PAWSX
884
+ revision: None
885
+ split: test
886
+ type: C-MTEB/PAWSX
887
+ metrics:
888
+ - type: cos_sim_pearson
889
+ value: 36.564307811370654
890
+ - type: cos_sim_spearman
891
+ value: 42.44208208349051
892
+ - type: manhattan_pearson
893
+ value: 42.099358471578306
894
+ - type: manhattan_spearman
895
+ value: 42.50283181486304
896
+ - type: euclidean_pearson
897
+ value: 42.07954956675317
898
+ - type: euclidean_spearman
899
+ value: 42.453014115018554
900
+ - type: main_score
901
+ value: 42.44208208349051
902
+ task:
903
+ type: STS
904
+ - dataset:
905
+ config: default
906
+ name: MTEB QBQTC
907
+ revision: None
908
+ split: test
909
+ type: C-MTEB/QBQTC
910
+ metrics:
911
+ - type: cos_sim_pearson
912
+ value: 39.19092968089104
913
+ - type: cos_sim_spearman
914
+ value: 41.5174661348832
915
+ - type: manhattan_pearson
916
+ value: 37.91587646684523
917
+ - type: manhattan_spearman
918
+ value: 41.536668677987194
919
+ - type: euclidean_pearson
920
+ value: 37.91079973901135
921
+ - type: euclidean_spearman
922
+ value: 41.51833855501128
923
+ - type: main_score
924
+ value: 41.5174661348832
925
+ task:
926
+ type: STS
927
+ - dataset:
928
+ config: zh
929
+ name: MTEB STS22 (zh)
930
+ revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
931
+ split: test
932
+ type: mteb/sts22-crosslingual-sts
933
+ metrics:
934
+ - type: cos_sim_pearson
935
+ value: 62.029449510721605
936
+ - type: cos_sim_spearman
937
+ value: 66.31935471251364
938
+ - type: manhattan_pearson
939
+ value: 63.63179975157496
940
+ - type: manhattan_spearman
941
+ value: 66.3007950466125
942
+ - type: euclidean_pearson
943
+ value: 63.59752734041086
944
+ - type: euclidean_spearman
945
+ value: 66.31935471251364
946
+ - type: main_score
947
+ value: 66.31935471251364
948
+ task:
949
+ type: STS
950
+ - dataset:
951
+ config: default
952
+ name: MTEB STSB
953
+ revision: None
954
+ split: test
955
+ type: C-MTEB/STSB
956
+ metrics:
957
+ - type: cos_sim_pearson
958
+ value: 81.81459862563769
959
+ - type: cos_sim_spearman
960
+ value: 82.15323953301453
961
+ - type: manhattan_pearson
962
+ value: 81.61904305126016
963
+ - type: manhattan_spearman
964
+ value: 82.1361073852468
965
+ - type: euclidean_pearson
966
+ value: 81.60799063723992
967
+ - type: euclidean_spearman
968
+ value: 82.15405405083231
969
+ - type: main_score
970
+ value: 82.15323953301453
971
+ task:
972
+ type: STS
973
+ - dataset:
974
+ config: default
975
+ name: MTEB T2Reranking
976
+ revision: None
977
+ split: dev
978
+ type: C-MTEB/T2Reranking
979
+ metrics:
980
+ - type: map
981
+ value: 69.13560834260383
982
+ - type: mrr
983
+ value: 79.95749642669074
984
+ - type: main_score
985
+ value: 69.13560834260383
986
+ task:
987
+ type: Reranking
988
+ - dataset:
989
+ config: default
990
+ name: MTEB T2Retrieval
991
+ revision: None
992
+ split: dev
993
+ type: C-MTEB/T2Retrieval
994
+ metrics:
995
+ - type: map_at_1
996
+ value: 28.041
997
+ - type: map_at_10
998
+ value: 78.509
999
+ - type: map_at_100
1000
+ value: 82.083
1001
+ - type: map_at_1000
1002
+ value: 82.143
1003
+ - type: map_at_3
1004
+ value: 55.345
1005
+ - type: map_at_5
1006
+ value: 67.899
1007
+ - type: mrr_at_1
1008
+ value: 90.86
1009
+ - type: mrr_at_10
1010
+ value: 93.31
1011
+ - type: mrr_at_100
1012
+ value: 93.388
1013
+ - type: mrr_at_1000
1014
+ value: 93.391
1015
+ - type: mrr_at_3
1016
+ value: 92.92200000000001
1017
+ - type: mrr_at_5
1018
+ value: 93.167
1019
+ - type: ndcg_at_1
1020
+ value: 90.86
1021
+ - type: ndcg_at_10
1022
+ value: 85.875
1023
+ - type: ndcg_at_100
1024
+ value: 89.269
1025
+ - type: ndcg_at_1000
1026
+ value: 89.827
1027
+ - type: ndcg_at_3
1028
+ value: 87.254
1029
+ - type: ndcg_at_5
1030
+ value: 85.855
1031
+ - type: precision_at_1
1032
+ value: 90.86
1033
+ - type: precision_at_10
1034
+ value: 42.488
1035
+ - type: precision_at_100
1036
+ value: 5.029
1037
+ - type: precision_at_1000
1038
+ value: 0.516
1039
+ - type: precision_at_3
1040
+ value: 76.172
1041
+ - type: precision_at_5
1042
+ value: 63.759
1043
+ - type: recall_at_1
1044
+ value: 28.041
1045
+ - type: recall_at_10
1046
+ value: 84.829
1047
+ - type: recall_at_100
1048
+ value: 95.89999999999999
1049
+ - type: recall_at_1000
1050
+ value: 98.665
1051
+ - type: recall_at_3
1052
+ value: 57.009
1053
+ - type: recall_at_5
1054
+ value: 71.188
1055
+ - type: main_score
1056
+ value: 85.875
1057
+ task:
1058
+ type: Retrieval
1059
+ - dataset:
1060
+ config: default
1061
+ name: MTEB TNews
1062
+ revision: None
1063
+ split: validation
1064
+ type: C-MTEB/TNews-classification
1065
+ metrics:
1066
+ - type: accuracy
1067
+ value: 54.309000000000005
1068
+ - type: accuracy_stderr
1069
+ value: 0.4694347665011627
1070
+ - type: f1
1071
+ value: 52.598803987889255
1072
+ - type: f1_stderr
1073
+ value: 0.5191189533227434
1074
+ - type: main_score
1075
+ value: 54.309000000000005
1076
+ task:
1077
+ type: Classification
1078
+ - dataset:
1079
+ config: default
1080
+ name: MTEB ThuNewsClusteringP2P
1081
+ revision: None
1082
+ split: test
1083
+ type: C-MTEB/ThuNewsClusteringP2P
1084
+ metrics:
1085
+ - type: v_measure
1086
+ value: 76.64191229011249
1087
+ - type: v_measure_std
1088
+ value: 2.807206940615986
1089
+ - type: main_score
1090
+ value: 76.64191229011249
1091
+ task:
1092
+ type: Clustering
1093
+ - dataset:
1094
+ config: default
1095
+ name: MTEB ThuNewsClusteringS2S
1096
+ revision: None
1097
+ split: test
1098
+ type: C-MTEB/ThuNewsClusteringS2S
1099
+ metrics:
1100
+ - type: v_measure
1101
+ value: 71.02529199411326
1102
+ - type: v_measure_std
1103
+ value: 2.0547855888165945
1104
+ - type: main_score
1105
+ value: 71.02529199411326
1106
+ task:
1107
+ type: Clustering
+   - dataset:
+       config: default
+       name: MTEB VideoRetrieval
+       revision: None
+       split: dev
+       type: C-MTEB/VideoRetrieval
+     metrics:
+     - type: map_at_1
+       value: 67.30000000000001
+     - type: map_at_10
+       value: 76.819
+     - type: map_at_100
+       value: 77.141
+     - type: map_at_1000
+       value: 77.142
+     - type: map_at_3
+       value: 75.233
+     - type: map_at_5
+       value: 76.163
+     - type: mrr_at_1
+       value: 67.30000000000001
+     - type: mrr_at_10
+       value: 76.819
+     - type: mrr_at_100
+       value: 77.141
+     - type: mrr_at_1000
+       value: 77.142
+     - type: mrr_at_3
+       value: 75.233
+     - type: mrr_at_5
+       value: 76.163
+     - type: ndcg_at_1
+       value: 67.30000000000001
+     - type: ndcg_at_10
+       value: 80.93599999999999
+     - type: ndcg_at_100
+       value: 82.311
+     - type: ndcg_at_1000
+       value: 82.349
+     - type: ndcg_at_3
+       value: 77.724
+     - type: ndcg_at_5
+       value: 79.406
+     - type: precision_at_1
+       value: 67.30000000000001
+     - type: precision_at_10
+       value: 9.36
+     - type: precision_at_100
+       value: 0.996
+     - type: precision_at_1000
+       value: 0.1
+     - type: precision_at_3
+       value: 28.299999999999997
+     - type: precision_at_5
+       value: 17.8
+     - type: recall_at_1
+       value: 67.30000000000001
+     - type: recall_at_10
+       value: 93.60000000000001
+     - type: recall_at_100
+       value: 99.6
+     - type: recall_at_1000
+       value: 99.9
+     - type: recall_at_3
+       value: 84.89999999999999
+     - type: recall_at_5
+       value: 89.0
+     - type: main_score
+       value: 80.93599999999999
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB Waimai
+       revision: None
+       split: test
+       type: C-MTEB/waimai-classification
+     metrics:
+     - type: accuracy
+       value: 89.47
+     - type: accuracy_stderr
+       value: 0.26476404589747476
+     - type: ap
+       value: 75.49555223825388
+     - type: ap_stderr
+       value: 0.596040511982105
+     - type: f1
+       value: 88.01797939221065
+     - type: f1_stderr
+       value: 0.27168216797281214
+     - type: main_score
+       value: 89.47
+     task:
+       type: Classification
+ tags:
+ - mteb
+ ---
1205
+ <h2 align="left">XYZ-embedding-zh-v2</h2>
1206
+
1207
+ ## Usage (Sentence Transformers)
1208
+
1209
+ First install the Sentence Transformers library:
1210
+
1211
+ ```bash
1212
+ pip install -U sentence-transformers
1213
+ ```
1214
+ Then you can load this model and run inference.
1215
+ ```python
1216
+ from sentence_transformers import SentenceTransformer
1217
+
1218
+ # Download from the 🤗 Hub
1219
+ model = SentenceTransformer("fangxq/XYZ-embedding-zh-v2")
1220
+ # Run inference
1221
+ sentences = [
1222
+ 'The weather is lovely today.',
1223
+ "It's so sunny outside!",
1224
+ 'He drove to the stadium.',
1225
+ ]
1226
+ embeddings = model.encode(sentences)
1227
+ print(embeddings.shape)
1228
+ # [3, 1792]
1229
+
1230
+ # Get the similarity scores for the embeddings
1231
+ similarities = model.similarity(embeddings, embeddings)
1232
+ print(similarities.shape)
1233
+ # [3, 3]
1234
+ ```
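Review note: `similarity_fn_name` is `null` in `config_sentence_transformers.json`, so `model.similarity` presumably falls back to the library default, which to my understanding is cosine similarity in Sentence Transformers 3.x. A minimal, dependency-free sketch of that computation follows. The helper name `cosine_similarity_matrix` and the 3-dimensional toy vectors are illustrative assumptions standing in for the real `(3, 1792)` embeddings; this is not the library's implementation.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity matrix for a list of vectors."""
    # Euclidean norm of each vector, computed once up front
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (nu * nv)
            for v, nv in zip(vectors, norms)
        ]
        for u, nu in zip(vectors, norms)
    ]


# Toy stand-ins (hypothetical values) for the output of model.encode
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
similarities = cosine_similarity_matrix(toy_embeddings)
print(len(similarities), len(similarities[0]))
```

The result is a square matrix with ones on the diagonal, matching the `[3, 3]` shape shown in the README snippet.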
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "directionality": "bidi",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "pooler_fc_size": 768,
+   "pooler_num_attention_heads": 12,
+   "pooler_num_fc_layers": 3,
+   "pooler_size_per_head": 128,
+   "pooler_type": "first_token_transform",
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.0",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 21128
+ }
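Review note: the sizes in `config.json` can be sanity-checked against the `pytorch_model.bin` LFS pointer (1302216550 bytes). The sketch below counts weights for a standard BERT encoder with a pooler, which is an assumption about what the checkpoint contains; the function name `bert_param_count` is hypothetical, and the estimate ignores serialization overhead and any non-parameter buffers.

```python
def bert_param_count(vocab_size, hidden_size, num_hidden_layers,
                     intermediate_size, max_position_embeddings, type_vocab_size):
    """Estimate the parameter count of a standard BERT encoder with pooler."""
    layer_norm = 2 * hidden_size  # scale + bias
    # Word, position, and token-type embeddings, followed by one LayerNorm
    embeddings = (
        (vocab_size + max_position_embeddings + type_vocab_size) * hidden_size
        + layer_norm
    )
    # Q, K, V, and output projections (weights + biases), then LayerNorm
    attention = 4 * (hidden_size * hidden_size + hidden_size) + layer_norm
    # Two feed-forward projections (weights + biases), then LayerNorm
    feed_forward = (
        hidden_size * intermediate_size + intermediate_size
        + intermediate_size * hidden_size + hidden_size
        + layer_norm
    )
    # Dense layer applied to the first token
    pooler = hidden_size * hidden_size + hidden_size
    return embeddings + num_hidden_layers * (attention + feed_forward) + pooler


# Values taken from config.json in this commit
params = bert_param_count(
    vocab_size=21128,
    hidden_size=1024,
    num_hidden_layers=24,
    intermediate_size=4096,
    max_position_embeddings=512,
    type_vocab_size=2,
)
approx_bytes = params * 4  # float32, per "torch_dtype"
print(params, approx_bytes)
```

Under these assumptions the estimate is 325,522,432 parameters, or 1,302,089,728 bytes in float32, which is within about 127 KB of the size recorded in the LFS pointer. The small remainder would be consistent with serialization overhead, though that is not verified here.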
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.0",
+     "pytorch": "2.2.2+cu118"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Dense",
+     "type": "sentence_transformers.models.Dense"
+   }
+ ]
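Review note: `modules.json` lists the stages that Sentence Transformers applies in `idx` order. A small sketch that reads the same entries and prints the resulting pipeline is given below; the helper name `pipeline_order` is hypothetical. The `Dense` stage is presumably what maps the 1024-dimensional encoder output to the 1792-dimensional embeddings mentioned in the README, but `2_Dense/config.json` is not shown in this excerpt, so treat that as an assumption.

```python
import json

# Entries copied from modules.json in this commit
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Dense", "type": "sentence_transformers.models.Dense"}
]
"""


def pipeline_order(modules_json):
    """Return the module class names in the order given by their idx field."""
    modules = sorted(json.loads(modules_json), key=lambda m: m["idx"])
    # Keep only the class name, e.g. "Pooling" from "sentence_transformers.models.Pooling"
    return [m["type"].rsplit(".", 1)[-1] for m in modules]


stages = pipeline_order(MODULES_JSON)
print(" -> ".join(stages))
```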
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8090436280027987a24ffb67f66976b4069d4812c580f271ef7fe4720a037bcf
+ size 1302216550
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 1000000000000000019884624838656,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
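Review note: `model_max_length` here is not a real limit. It equals `int(1e30)`, which to my knowledge is the "very large" placeholder that `transformers` writes when no maximum is configured. The practical cap is therefore presumably the 512 found in `config.json` (`max_position_embeddings`) and `sentence_bert_config.json` (`max_seq_length`). The sketch below only illustrates that reading; the constant names and `effective_max_tokens` are hypothetical and do not mirror any library function.

```python
# Values copied from the three config files in this commit
TOKENIZER_MODEL_MAX_LENGTH = 1000000000000000019884624838656  # tokenizer_config.json
MAX_POSITION_EMBEDDINGS = 512  # config.json
MAX_SEQ_LENGTH = 512  # sentence_bert_config.json

# True when the tokenizer value is the int(1e30) placeholder rather than a real cap
is_sentinel = TOKENIZER_MODEL_MAX_LENGTH == int(1e30)


def effective_max_tokens(*limits):
    """Return the tightest of the configured limits (assumed interpretation)."""
    return min(limits)


limit = effective_max_tokens(
    TOKENIZER_MODEL_MAX_LENGTH, MAX_POSITION_EMBEDDINGS, MAX_SEQ_LENGTH
)
print(is_sentinel, limit)
```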
vocab.txt ADDED
The diff for this file is too large to render. See raw diff