Commit 47f2116 (verified) by thanhpham1 · Parent: 5e83fc4

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
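For context (a sketch, not part of the committed file): `pooling_mode_mean_tokens: true` means the sentence embedding is the average of the token embeddings over non-padding positions, as selected by the attention mask. In plain Python:

```python
def mean_pooling(token_embeddings, attention_mask):
    """Average token vectors where attention_mask is 1 (pooling_mode_mean_tokens)."""
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                sums[i] += value
    return [s / max(count, 1) for s in sums]

# Two real tokens and one padding token: the padded vector is ignored.
tokens = [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pooling(tokens, mask))  # [2.0, 4.0]
```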
README.md ADDED
@@ -0,0 +1,1025 @@
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5146
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
- source_sentence: 'import subprocess

    zen_of_python = subprocess.check_output(["python", "-c", "import this"])

    corpus = zen_of_python.split()


    num_partitions = 3

    chunk = len(corpus) // num_partitions

    partitions = [

    corpus[i * chunk: (i + 1) * chunk] for i in range(num_partitions)

    ]


    Mapping Data#

    To determine the map phase, we require a map function to use on each document.

    The output is the pair (word, 1) for every word found in a document.

    For basic text documents we load as Python strings, the process is as follows:


    def map_function(document):

    for word in document.lower().split():

    yield word, 1


    We use the apply_map function on a large collection of documents by marking it
    as a task in Ray using the @ray.remote decorator.

    When we call apply_map, we apply it to three sets of document data (num_partitions=3).

    The apply_map function returns three lists, one for each partition so that Ray
    can rearrange the results of the map phase and distribute them to the appropriate
    nodes.


    import ray'
  sentences:
  - What does the map_function yield for each word in a document?
  - What does PBT do differently from traditional hyperparameter tuning methods?
  - What is returned by task_with_static_multiple_returns_good in the Actor class?
- source_sentence: '192.168.0.15 7241 Worker ffffffffffffffffffffffffffffffffffffffff0100000001000000
    10 MiB PINNED_IN_MEMORY (deserialize task arg)

    __main__.f


    192.168.0.15 7207 Driver ffffffffffffffffffffffffffffffffffffffff0100000001000000
    15 MiB USED_BY_PENDING_TASK (put object)

    test.py:

    <module>:28


    While the task is running, we see that ray memory shows both a LOCAL_REFERENCE
    and a USED_BY_PENDING_TASK reference for the object in the driver process. The
    worker process also holds a reference to the object because the Python arg is
    directly referencing the memory in the plasma, so it can’t be evicted; therefore
    it is PINNED_IN_MEMORY.

    4. Serialized ObjectRef references

    @ray.remote

    def f(arg):

    while True:

    pass


    a = ray.put(None)

    b = f.remote([a])'
  sentences:
  - How can a dataset be created from in-memory data?
  - What does Algorithm.training_step return for the new API stack?
  - Why can't the object be evicted while the worker process holds a reference?
- source_sentence: 'For distributed systems engineers, Ray automatically handles key
    processes:


    Orchestration–Managing the various components of a distributed system.

    Scheduling–Coordinating when and where tasks are executed.

    Fault tolerance–Ensuring tasks complete regardless of inevitable points of failure.

    Auto-scaling–Adjusting the number of resources allocated to dynamic demand.



    What you can do with Ray#

    These are some common ML workloads that individuals, organizations, and companies
    leverage Ray to build their AI applications:


    Batch inference on CPUs and GPUs

    Model serving

    Distributed training of large models

    Parallel hyperparameter tuning experiments

    Reinforcement learning

    ML platform




    Ray framework#




    Stack of Ray libraries - unified toolkit for ML workloads.



    Ray’s unified compute framework consists of three layers:'
  sentences:
  - What does remote_worker_envs control when num_envs_per_env_runner > 1?
  - How is the learning rate set in the config?
  - According to the excerpt, what does Ray automatically handle for distributed systems
    engineers?
- source_sentence: 'RLlib component tree#

    The following is the structure of the RLlib component tree, showing under which
    name you can

    access a subcomponent’s own checkpoint within the higher-level checkpoint. At
    the highest level

    is the Algorithm class:

    algorithm/

    learner_group/

    learner/

    rl_module/

    default_policy/ # <- single-agent case

    [module ID 1]/ # <- multi-agent case

    [module ID 2]/ # ...

    env_runner/

    env_to_module_connector/

    module_to_env_connector/


    Note

    The env_runner/ subcomponent currently doesn’t hold a copy of the RLModule

    checkpoint because it’s already saved under learner/. The Ray team is working
    on resolving

    this issue, probably through soft-linking to avoid duplicate files and unnecessary
    disk usage.


    Creating instances from a checkpoint with from_checkpoint#

    Once you have a checkpoint of either a trained Algorithm or

    any of its subcomponents, you can recreate new objects directly

    from this checkpoint.

    The following are two examples:'
  sentences:
  - Why does RLlib convert each row into a single-step episode by default?
  - What is at the highest level of the RLlib component tree?
  - What is recommended regarding AOF when using storage options that do not support
    append operations?
- source_sentence: 'Option 2: Manually Create URL (slower to implement, but recommended
    for production environments)#

    The second option is to manually create this URL by pattern-matching your specific
    use case with one of the following examples.

    This is recommended because it provides finer-grained control over which repository
    branch and commit to use when generating your dependency zip file.

    These options prevent consistency issues on Ray Clusters (see the warning above
    for more info).

    To create the URL, pick a URL template below that fits your use case, and fill
    in all parameters in brackets (e.g. [username], [repository], etc.) with the specific
    values from your repository.

    For instance, suppose your GitHub username is example_user, the repository’s name
    is example_repository, and the desired commit hash is abcdefg.

    If example_repository is public and you want to retrieve the abcdefg commit (which
    matches the first example use case), the URL would be:'
  sentences:
  - What can Ray Train and Ray Tune be used together for?
  - How do you create the URL for Option 2?
  - Which function can you use to read a CSV file for batch processing in Ray?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: Fine-tune-all-mpnet-base-v2
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.5874125874125874
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.6818181818181818
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7954545454545454
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8863636363636364
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5874125874125874
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.5180652680652681
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.3944055944055945
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.23199300699300698
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.263986013986014
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.6073717948717948
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.7521853146853147
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8780594405594405
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7386606603331115
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.6635614385614379
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6988731642119342
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.5734265734265734
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.666083916083916
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8006993006993007
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8811188811188811
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5734265734265734
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.5052447552447552
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.39370629370629373
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.23094405594405593
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.26005244755244755
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5914918414918414
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.7543706293706294
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8726689976689977
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7303335650898982
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.652235958485958
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.689387057080973
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.5664335664335665
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.666083916083916
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7797202797202797
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8583916083916084
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5664335664335665
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.5011655011655011
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.38636363636363635
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.22534965034965035
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.2577214452214452
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5893065268065268
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.7354312354312353
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8487762237762237
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7167871578299232
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.6432942057942053
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6823584299690649
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.5402097902097902
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.6398601398601399
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.743006993006993
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8304195804195804
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5402097902097902
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.47960372960372966
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.3678321678321678
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.2181818181818182
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.24519230769230768
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5623543123543123
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.701048951048951
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8228438228438228
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6886328428362513
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.6146582584082584
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6543671947827556
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.4353146853146853
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5332167832167832
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6311188811188811
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.7622377622377622
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.4353146853146853
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3945221445221445
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.3094405594405594
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.19825174825174827
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.19842657342657344
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.46547202797202797
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.5910547785547785
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7467948717948718
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.5953015131317417
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.5138784826284825
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.559206100539383
      name: Cosine Map@100
---

# Fine-tune-all-mpnet-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 12e86a3c702fc3c50205a8db88f0ec7c0b6b94a0 -->
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("thanhpham1/Fine-tune-all-mpnet-base-v2")
# Run inference
sentences = [
    'Option 2: Manually Create URL (slower to implement, but recommended for production environments)#\nThe second option is to manually create this URL by pattern-matching your specific use case with one of the following examples.\nThis is recommended because it provides finer-grained control over which repository branch and commit to use when generating your dependency zip file.\nThese options prevent consistency issues on Ray Clusters (see the warning above for more info).\nTo create the URL, pick a URL template below that fits your use case, and fill in all parameters in brackets (e.g. [username], [repository], etc.) with the specific values from your repository.\nFor instance, suppose your GitHub username is example_user, the repository’s name is example_repository, and the desired commit hash is abcdefg.\nIf example_repository is public and you want to retrieve the abcdefg commit (which matches the first example use case), the URL would be:',
    'How do you create the URL for Option 2?',
    'What can Ray Train and Ray Tune be used together for?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
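Because the model was trained with MatryoshkaLoss, you can keep only a prefix of each 768-dimensional embedding and re-normalize it, trading some quality for smaller vectors (the dim_512 through dim_64 metrics in the Evaluation section quantify the drop). Recent versions of Sentence Transformers expose this via the `truncate_dim` argument of `SentenceTransformer`; the step itself is just the following sketch (plain Python, illustrative values):

```python
import math

def truncate_and_renormalize(embedding, dim):
    """Keep the first `dim` components of a Matryoshka embedding, then re-normalize."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def cosine(a, b):
    # For unit-length vectors, cosine similarity is the plain dot product.
    return sum(x * y for x, y in zip(a, b))

embedding = [0.6, 0.8, 0.0, 0.1]                # stand-in for a 768-dim embedding
small = truncate_and_renormalize(embedding, 2)  # ≈ [0.6, 0.8], unit length again
print(cosine(small, small))                     # ≈ 1.0
```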

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 768
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.5874     |
| cosine_accuracy@3   | 0.6818     |
| cosine_accuracy@5   | 0.7955     |
| cosine_accuracy@10  | 0.8864     |
| cosine_precision@1  | 0.5874     |
| cosine_precision@3  | 0.5181     |
| cosine_precision@5  | 0.3944     |
| cosine_precision@10 | 0.232      |
| cosine_recall@1     | 0.264      |
| cosine_recall@3     | 0.6074     |
| cosine_recall@5     | 0.7522     |
| cosine_recall@10    | 0.8781     |
| **cosine_ndcg@10**  | **0.7387** |
| cosine_mrr@10       | 0.6636     |
| cosine_map@100      | 0.6989     |

#### Information Retrieval

* Dataset: `dim_512`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 512
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.5734     |
| cosine_accuracy@3   | 0.6661     |
| cosine_accuracy@5   | 0.8007     |
| cosine_accuracy@10  | 0.8811     |
| cosine_precision@1  | 0.5734     |
| cosine_precision@3  | 0.5052     |
| cosine_precision@5  | 0.3937     |
| cosine_precision@10 | 0.2309     |
| cosine_recall@1     | 0.2601     |
| cosine_recall@3     | 0.5915     |
| cosine_recall@5     | 0.7544     |
| cosine_recall@10    | 0.8727     |
| **cosine_ndcg@10**  | **0.7303** |
| cosine_mrr@10       | 0.6522     |
| cosine_map@100      | 0.6894     |

#### Information Retrieval

* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 256
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.5664     |
| cosine_accuracy@3   | 0.6661     |
| cosine_accuracy@5   | 0.7797     |
| cosine_accuracy@10  | 0.8584     |
| cosine_precision@1  | 0.5664     |
| cosine_precision@3  | 0.5012     |
| cosine_precision@5  | 0.3864     |
| cosine_precision@10 | 0.2253     |
| cosine_recall@1     | 0.2577     |
| cosine_recall@3     | 0.5893     |
| cosine_recall@5     | 0.7354     |
| cosine_recall@10    | 0.8488     |
| **cosine_ndcg@10**  | **0.7168** |
| cosine_mrr@10       | 0.6433     |
| cosine_map@100      | 0.6824     |

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 128
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.5402     |
| cosine_accuracy@3   | 0.6399     |
| cosine_accuracy@5   | 0.743      |
| cosine_accuracy@10  | 0.8304     |
| cosine_precision@1  | 0.5402     |
| cosine_precision@3  | 0.4796     |
| cosine_precision@5  | 0.3678     |
| cosine_precision@10 | 0.2182     |
| cosine_recall@1     | 0.2452     |
| cosine_recall@3     | 0.5624     |
| cosine_recall@5     | 0.701      |
| cosine_recall@10    | 0.8228     |
| **cosine_ndcg@10**  | **0.6886** |
| cosine_mrr@10       | 0.6147     |
| cosine_map@100      | 0.6544     |

#### Information Retrieval

* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 64
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.4353     |
| cosine_accuracy@3   | 0.5332     |
| cosine_accuracy@5   | 0.6311     |
| cosine_accuracy@10  | 0.7622     |
| cosine_precision@1  | 0.4353     |
| cosine_precision@3  | 0.3945     |
| cosine_precision@5  | 0.3094     |
| cosine_precision@10 | 0.1983     |
| cosine_recall@1     | 0.1984     |
| cosine_recall@3     | 0.4655     |
| cosine_recall@5     | 0.5911     |
| cosine_recall@10    | 0.7468     |
| **cosine_ndcg@10**  | **0.5953** |
| cosine_mrr@10       | 0.5139     |
| cosine_map@100      | 0.5592     |

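For reference, here is how the accuracy@k and recall@k rows in these tables are defined; a minimal sketch assuming a ranked list of document ids per query and a set of relevant ids (both names are hypothetical):

```python
def accuracy_at_k(ranked_ids, relevant_ids, k):
    """1 if any relevant document appears among the top-k results, else 0."""
    return int(any(doc_id in relevant_ids for doc_id in ranked_ids[:k]))

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear among the top-k results."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

ranked = ["d3", "d1", "d7", "d2"]   # retrieval order for one query
relevant = {"d1", "d2"}             # ground-truth positives for that query
print(accuracy_at_k(ranked, relevant, 1))  # 0 (top-1 result d3 is not relevant)
print(recall_at_k(ranked, relevant, 2))    # 0.5 (only d1 of {d1, d2} is in the top 2)
```

The reported numbers are these per-query values averaged over all evaluation queries.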
753
+ <!--
754
+ ## Bias, Risks and Limitations
755
+
756
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
757
+ -->
758
+
759
+ <!--
760
+ ### Recommendations
761
+
762
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
763
+ -->
764
+
765
+ ## Training Details
766
+
767
+ ### Training Dataset
768
+
769
+ #### json
770
+
771
+ * Dataset: json
772
+ * Size: 5,146 training samples
773
+ * Columns: <code>anchor</code> and <code>positive</code>
774
+ * Approximate statistics based on the first 1000 samples:
775
+ | | anchor | positive |
776
+ |:--------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
777
+ | type | string | string |
778
+ | details | <ul><li>min: 8 tokens</li><li>mean: 17.8 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 66 tokens</li><li>mean: 225.02 tokens</li><li>max: 384 tokens</li></ul> |
779
+ * Samples:
780
+ | anchor | positive |
781
+ |:----------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
782
+ | <code>Does Ray Train work with vanilla TensorFlow in addition to TensorFlow with Keras?</code> | <code>Get Started with Distributed Training using TensorFlow/Keras#<br>Ray Train’s TensorFlow integration enables you<br>to scale your TensorFlow and Keras training functions to many machines and GPUs.<br>On a technical level, Ray Train schedules your training workers<br>and configures TF_CONFIG for you, allowing you to run<br>your MultiWorkerMirroredStrategy training script. See Distributed<br>training with TensorFlow<br>for more information.<br>Most of the examples in this guide use TensorFlow with Keras, but<br>Ray Train also works with vanilla TensorFlow.<br><br>Quickstart#<br>import ray<br>import tensorflow as tf<br><br>from ray import train<br>from ray.train import ScalingConfig<br>from ray.train.tensorflow import TensorflowTrainer<br>from ray.train.tensorflow.keras import ReportCheckpointCallback<br><br># If using GPUs, set this to True.<br>use_gpu = False<br><br>a = 5<br>b = 10<br>size = 100</code> |
783
+ | <code>What type of failure can Ray automatically recover from?</code> | <code>Ray can automatically recover from data loss but not owner failure.<br><br>Recovering from data loss#<br>When an object value is lost from the object store, such as during node<br>failures, Ray will use lineage reconstruction to recover the object.<br>Ray will first automatically attempt to recover the value by looking<br>for copies of the same object on other nodes. If none are found, then Ray will<br>automatically recover the value by re-executing<br>the task that previously created the value. Arguments to the task are<br>recursively reconstructed through the same mechanism.<br>Lineage reconstruction currently has the following limitations:</code> |
784
+ | <code>From which directory should you run the zip command to ensure the proper zip file structure?</code> | <code>Suppose instead you want to host your files in your /some_path/example_dir directory remotely and provide a remote URI.<br>You would need to first compress the example_dir directory into a zip file.<br>There should be no other files or directories at the top level of the zip file, other than example_dir.<br>You can use the following command in the Terminal to do this:<br>cd /some_path<br>zip -r zip_file_name.zip example_dir<br><br>Note that this command must be run from the parent directory of the desired working_dir to ensure that the resulting zip file contains a single top-level directory.<br>In general, the zip file’s name and the top-level directory’s name can be anything.<br>The top-level directory’s contents will be used as the working_dir (or py_module).<br>You can check that the zip file contains a single top-level directory by running the following command in the Terminal:<br>zipinfo -1 zip_file_name.zip<br># example_dir/<br># example_dir/my_file_1.txt<br># example_dir/subdir/my_file_2.txt</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+ ```json
+ {
+ "loss": "MultipleNegativesRankingLoss",
+ "matryoshka_dims": [
+ 768,
+ 512,
+ 256,
+ 128,
+ 64
+ ],
+ "matryoshka_weights": [
+ 1,
+ 1,
+ 1,
+ 1,
+ 1
+ ],
+ "n_dims_per_step": -1
+ }
+ ```
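Because the model was trained with MatryoshkaLoss over the dimensions listed above, its 768-dim embeddings can be truncated to 512, 256, 128, or 64 dimensions and re-normalized before computing cosine similarity. A minimal NumPy-only sketch of that truncation step (the random vectors stand in for real model outputs; `truncate_and_normalize` is an illustrative helper, not part of the library):

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions, then L2-normalize again."""
    truncated = embeddings[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Stand-in for model output: two already-normalized 768-dim embeddings.
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

for dim in (768, 512, 256, 128, 64):
    small = truncate_and_normalize(full, dim)
    # Each truncated embedding is unit-length again, so a dot product
    # is directly the cosine similarity at the smaller dimension.
    cosine = float(small[0] @ small[1])
```

Smaller dimensions trade a little retrieval quality (see the `dim_*` metrics in this card) for proportionally lower storage and faster similarity search.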
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `tf32`: False
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: False
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
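As a quick sanity check on the settings above, gradient accumulation multiplies the per-device batch into a single optimizer step. A minimal sketch of the arithmetic (assuming a single training device, which this card does not state):

```python
# Values from the hyperparameter list above.
per_device_train_batch_size = 32
gradient_accumulation_steps = 16

# With gradient accumulation, 16 micro-batches of 32 are accumulated
# before each optimizer step, so one step sees 512 examples per device.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 512
```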
+ ### Training Logs
+ | Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
+ |:-------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
+ | 0.9938 | 10 | 44.0311 | - | - | - | - | - |
+ | 1.0 | 11 | - | 0.6797 | 0.6651 | 0.6439 | 0.6180 | 0.4996 |
+ | 0.9938 | 10 | 14.5908 | - | - | - | - | - |
+ | 1.0 | 11 | - | 0.7179 | 0.7034 | 0.6927 | 0.6658 | 0.5720 |
+ | 1.8944 | 20 | 8.5538 | - | - | - | - | - |
+ | 2.0 | 22 | - | 0.7295 | 0.7209 | 0.7109 | 0.6793 | 0.5942 |
+ | 2.7950 | 30 | 6.916 | - | - | - | - | - |
+ | **3.0** | **33** | **-** | **0.7382** | **0.7293** | **0.7149** | **0.6916** | **0.5939** |
+ | 3.6957 | 40 | 6.5704 | - | - | - | - | - |
+ | 4.0 | 44 | - | 0.7387 | 0.7303 | 0.7168 | 0.6886 | 0.5953 |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.11.12
+ - Sentence Transformers: 4.1.0
+ - Transformers: 4.52.3
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.7.0
+ - Datasets: 3.6.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+ title={Matryoshka Representation Learning},
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+ year={2024},
+ eprint={2205.13147},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,23 @@
+ {
+ "architectures": [
+ "MPNetModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 514,
+ "model_type": "mpnet",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "relative_attention_num_buckets": 32,
+ "torch_dtype": "float32",
+ "transformers_version": "4.52.3",
+ "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "4.1.0",
+ "transformers": "4.52.3",
+ "pytorch": "2.6.0+cu124"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ac5fd8d14b8c8509a1708bad750086732938e2b9a4c60527eac35a92e247911e
+ size 437967672
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
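The three modules above form the inference pipeline: the Transformer produces per-token embeddings, the Pooling module mean-pools them (per the `1_Pooling` config, `pooling_mode_mean_tokens` is enabled), and the Normalize module L2-normalizes the result. A NumPy sketch of the last two stages (the toy batch and helper names are illustrative, not library code):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions (module 1_Pooling)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each sentence vector to unit length (module 2_Normalize)."""
    return x / np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12, None)

# Toy batch: 2 sequences of 4 tokens with hidden size 768, matching the config;
# the second sequence has no padding, the first has two padded positions.
tokens = np.ones((2, 4, 768))
mask = np.array([[1, 1, 0, 0], [1, 1, 1, 1]])
pooled = l2_normalize(mean_pool(tokens, mask))
```

Masking before averaging matters: without it, padding tokens would dilute the sentence embedding for shorter inputs.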
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 384,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,73 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "104": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "30526": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<s>",
+ "do_lower_case": true,
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "max_length": 128,
+ "model_max_length": 384,
+ "pad_to_multiple_of": null,
+ "pad_token": "<pad>",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "</s>",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "MPNetTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff