AryehRotberg committed (verified) · Commit 2b07ffa · 1 parent: ceaaf13

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "word_embedding_dimension": 384,
+     "pooling_mode_cls_token": false,
+     "pooling_mode_mean_tokens": true,
+     "pooling_mode_max_tokens": false,
+     "pooling_mode_mean_sqrt_len_tokens": false,
+     "pooling_mode_weightedmean_tokens": false,
+     "pooling_mode_lasttoken": false,
+     "include_prompt": true
+ }
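This config enables mean-token pooling only: token embeddings are averaged over non-padding positions, using the attention mask. A minimal NumPy sketch of that pooling step (the token vectors and mask below are illustrative placeholders, not real model output):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, counting only non-padding positions.

    token_embeddings: (seq_len, dim); attention_mask: (seq_len,) of 0/1.
    """
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum over real tokens only
    count = np.clip(mask.sum(), 1e-9, None)         # guard against empty mask
    return summed / count

# Illustrative: 3 token vectors of dimension 4; the last position is padding.
tokens = np.array([[1.0, 2.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0, 0.0],
                   [9.0, 9.0, 9.0, 9.0]])  # padding row, ignored via the mask
mask = np.array([1, 1, 0])
print(mean_pool(tokens, mask))  # [2. 1. 0. 0.]
```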
README.md ADDED
@@ -0,0 +1,556 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:203040
+ - loss:MultipleNegativesRankingLoss
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ widget:
+ - source_sentence: Organizing contests, sweeptakes and surveys -Name -Contact details
+     -Marketing preferences information about unsubscribing (if you unsubscribe from
+     our mailing list) -Data provided on the registration or survey form
+   sentences:
+   - Extra data may be collected about you through promotions
+   - Your personal information is used for many different purposes
+   - Your data is processed and stored in a country that is friendlier to user privacy
+     protection
+ - source_sentence: or visit a third-party service that includes content from our Services,
+     we may receive information about you, or combine such information with other personal
+     information.
+   sentences:
+   - Your feedback is invited regarding changes to the terms.
+   - This service tracks you on other websites
+   - Your information is only shared with third parties when given specific consent
+ - source_sentence: Changes to Terms of Use ADT reserves the right to update or revise
+     the Terms of Use governing this site, or any part thereof, at any time, at its
+     sole discretion, without prior notice. Such changes, modifications, additions,
+     or deletions shall be effective immediately upon notice thereof, which may be
+     given by any means including posting on this site or by other electronic or conventional
+     means.
+   sentences:
+   - The terms may be changed at any time, but you will receive notification of the
+     changes
+   - Spidering, crawling, or accessing the site through any automated means is not
+     allowed
+   - You are prohibited from sending chain letters, junk mail, spam or any unsolicited
+     messages
+ - source_sentence: We also collect information when you make use of the Site, including
+     your browsing history.
+   sentences:
+   - Your browsing history can be viewed by the service
+   - The service informs you that its privacy policy does not apply to third party
+     websites
+   - Promises will be kept after a merger or acquisition
+ - source_sentence: Each customer may register only one Coinbase account.
+   sentences:
+   - You can scrape the site, as long as it doesn't impact the server too much
+   - Usernames can be rejected or changed for any reason
+   - Alternative accounts are not allowed
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ model-index:
+ - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: all nli dev
+       type: all-nli-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.9993498921394348
+       name: Cosine Accuracy
+ ---
+
+ # SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
+ - **Maximum Sequence Length:** 256 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("AryehRotberg/ToS-Sentence-Transformers-V2")
+ # Run inference
+ sentences = [
+     'Each customer may register only one Coinbase account.',
+     'Alternative accounts are not allowed',
+     'Usernames can be rejected or changed for any reason',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
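Because the final `Normalize()` module L2-normalizes each embedding, cosine similarity between this model's outputs reduces to a plain dot product. A sketch of that equivalence in NumPy, using placeholder vectors in place of real `model.encode` output (no model download needed):

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length, as the model's Normalize() module does."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Placeholder "embeddings" standing in for model.encode(sentences).
raw = np.array([[3.0, 4.0],
                [6.0, 8.0],
                [-4.0, 3.0]])
emb = l2_normalize(raw)

# For unit vectors, the cosine similarity matrix is just a matrix product.
similarities = emb @ emb.T
print(similarities.shape)             # (3, 3)
print(round(similarities[0, 1], 4))   # 1.0 — rows 0 and 1 point the same way
```

The same reduction is why `model.similarity` with the configured cosine function can be computed efficiently at scale.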
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Dataset: `all-nli-dev`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.9993** |
170
+
171
+ <!--
172
+ ## Bias, Risks and Limitations
173
+
174
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
175
+ -->
176
+
177
+ <!--
178
+ ### Recommendations
179
+
180
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
181
+ -->
182
+
183
+ ## Training Details
184
+
185
+ ### Training Dataset
186
+
187
+ #### Unnamed Dataset
188
+
189
+ * Size: 203,040 training samples
190
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
191
+ * Approximate statistics based on the first 1000 samples:
192
+ | | anchor | positive | negative |
193
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
194
+ | type | string | string | string |
195
+ | details | <ul><li>min: 5 tokens</li><li>mean: 47.01 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.08 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.45 tokens</li><li>max: 29 tokens</li></ul> |
196
+ * Samples:
197
+ | anchor | positive | negative |
198
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------|
199
+ | <code>but remains subject to the promises made in any pre-existing Privacy Policy (unless, of course, the customer consents otherwise).</code> | <code>Promises will be kept after a merger or acquisition</code> | <code>When the service wants to change its terms, you are notified a week or more in advance.</code> |
200
+ | <code>Visits are logged by the Web server. These logs are only used for maintenance purposes and to generate anonymous access statistics.</code> | <code>Only necessary logs are kept by the service to ensure quality</code> | <code>An onion site accessible over Tor is provided</code> |
201
+ | <code>You affirm that you are over the age of 13, as the FanFiction.Net Service is not intended for children under 13.</code> | <code>This service is only available to users over a certain age</code> | <code>No need to register</code> |
202
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
203
+ ```json
204
+ {
205
+ "scale": 20.0,
206
+ "similarity_fct": "cos_sim"
207
+ }
208
+ ```
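MultipleNegativesRankingLoss treats each anchor's own positive as the correct "class" and every other positive in the batch as a negative: scaled cosine similarities go through a softmax cross-entropy, with the `scale` of 20.0 configured above. A hedged NumPy sketch of that objective (a naive log-sum-exp without overflow guards, adequate for cosine similarities at this scale; the embeddings are placeholder unit vectors):

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """In-batch-negatives loss: row i's target is column i of the sim matrix.

    anchors, positives: (n, dim) unit-normalized embeddings, so the cosine
    similarity matrix is a single matrix product.
    """
    sims = scale * (anchors @ positives.T)  # (n, n) scaled cosine similarities
    # Log-softmax over each row, evaluated on the diagonal (the true positive).
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Perfectly matched placeholder pairs give a near-zero loss; mismatched pairs
# are heavily penalized.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[1.0, 0.0], [0.0, 1.0]])
print(multiple_negatives_ranking_loss(a, p))        # ~0 for matched pairs
print(multiple_negatives_ranking_loss(a, p[::-1]))  # large for swapped pairs
```

The batch size doubles as the number of negatives per anchor, which is why the `no_duplicates` batch sampler below matters: duplicate positives in a batch would be treated as false negatives.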
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 50,760 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+
+ |         | anchor | positive | negative |
+ |:--------|:-------|:---------|:---------|
+ | type    | string | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 45.97 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.82 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.36 tokens</li><li>max: 29 tokens</li></ul> |
+
+ * Samples:
+
+ | anchor | positive | negative |
+ |:-------|:---------|:---------|
+ | <code>HP is not required to host, display, or distribute any User Submissions on or through This Website and may remove at any time or refuse any User Submissions for any reason.</code> | <code>User-generated content can be blocked or censored for any reason</code> | <code>The service will only respond to government requests that are reasonable</code> |
+ | <code>How we use information we collect</code> | <code>Information is provided about how your personal data is used</code> | <code>The service does not index or open files that you upload</code> |
+ | <code>your use of the LYKA Service is solely for your own personal use and you therefore must not, nor attempt to, resell or charge others for use of or access to the LYKA Service or for any business purposes;</code> | <code>This service is only available for use individually and non-commercially.</code> | <code>You cannot opt out of promotional communications</code> |
+
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: no_duplicates
+
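These hyperparameters pin down the learning-rate schedule. A quick arithmetic check (assuming a single device and no gradient accumulation, consistent with the defaults listed below; the step count agrees with the training logs, which end near step 12,600 at epoch 0.99):

```python
import math

num_samples = 203_040   # training pairs
batch_size = 16         # per_device_train_batch_size
num_epochs = 1
warmup_ratio = 0.1

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(total_steps * warmup_ratio)  # linear warmup, then linear decay

print(total_steps, warmup_steps)  # 12690 1269
```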
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch  | Step  | Training Loss | Validation Loss | all-nli-dev_cosine_accuracy |
+ |:------:|:-----:|:-------------:|:---------------:|:---------------------------:|
+ | -1     | -1    | -             | -               | 0.9547                      |
+ | 0.0079 | 100   | 1.3098        | 1.1250          | 0.9618                      |
+ | 0.0158 | 200   | 1.0671        | 0.9039          | 0.9726                      |
+ | 0.0236 | 300   | 0.8861        | 0.7616          | 0.9788                      |
+ | 0.0315 | 400   | 0.7625        | 0.6672          | 0.9824                      |
+ | 0.0394 | 500   | 0.7217        | 0.5984          | 0.9852                      |
+ | 0.0473 | 600   | 0.6612        | 0.5432          | 0.9875                      |
+ | 0.0552 | 700   | 0.5484        | 0.5048          | 0.9884                      |
+ | 0.0630 | 800   | 0.5435        | 0.4699          | 0.9898                      |
+ | 0.0709 | 900   | 0.522         | 0.4319          | 0.9909                      |
+ | 0.0788 | 1000  | 0.4715        | 0.4152          | 0.9915                      |
+ | 0.0867 | 1100  | 0.4495        | 0.3909          | 0.9923                      |
+ | 0.0946 | 1200  | 0.4552        | 0.3741          | 0.9929                      |
+ | 0.1024 | 1300  | 0.4159        | 0.3559          | 0.9934                      |
+ | 0.1103 | 1400  | 0.4095        | 0.3404          | 0.9937                      |
+ | 0.1182 | 1500  | 0.3849        | 0.3267          | 0.9936                      |
+ | 0.1261 | 1600  | 0.3357        | 0.3208          | 0.9941                      |
+ | 0.1340 | 1700  | 0.4029        | 0.2989          | 0.9946                      |
+ | 0.1418 | 1800  | 0.3413        | 0.2882          | 0.9949                      |
+ | 0.1497 | 1900  | 0.3254        | 0.2842          | 0.9952                      |
+ | 0.1576 | 2000  | 0.3123        | 0.2817          | 0.9950                      |
+ | 0.1655 | 2100  | 0.3003        | 0.2652          | 0.9955                      |
+ | 0.1734 | 2200  | 0.3117        | 0.2559          | 0.9959                      |
+ | 0.1812 | 2300  | 0.332         | 0.2504          | 0.9959                      |
+ | 0.1891 | 2400  | 0.2923        | 0.2481          | 0.9962                      |
+ | 0.1970 | 2500  | 0.2747        | 0.2389          | 0.9961                      |
+ | 0.2049 | 2600  | 0.2507        | 0.2355          | 0.9962                      |
+ | 0.2128 | 2700  | 0.2563        | 0.2294          | 0.9965                      |
+ | 0.2206 | 2800  | 0.2512        | 0.2228          | 0.9967                      |
+ | 0.2285 | 2900  | 0.2622        | 0.2201          | 0.9967                      |
+ | 0.2364 | 3000  | 0.234         | 0.2183          | 0.9968                      |
+ | 0.2443 | 3100  | 0.2607        | 0.2158          | 0.9969                      |
+ | 0.2522 | 3200  | 0.2221        | 0.2077          | 0.9973                      |
+ | 0.2600 | 3300  | 0.2559        | 0.2037          | 0.9971                      |
+ | 0.2679 | 3400  | 0.2261        | 0.2044          | 0.9969                      |
+ | 0.2758 | 3500  | 0.2453        | 0.1985          | 0.9969                      |
+ | 0.2837 | 3600  | 0.2251        | 0.1927          | 0.9975                      |
+ | 0.2916 | 3700  | 0.2716        | 0.1913          | 0.9976                      |
+ | 0.2994 | 3800  | 0.1949        | 0.1894          | 0.9975                      |
+ | 0.3073 | 3900  | 0.2361        | 0.1868          | 0.9973                      |
+ | 0.3152 | 4000  | 0.223         | 0.1812          | 0.9974                      |
+ | 0.3231 | 4100  | 0.1846        | 0.1788          | 0.9974                      |
+ | 0.3310 | 4200  | 0.2143        | 0.1771          | 0.9974                      |
+ | 0.3388 | 4300  | 0.2063        | 0.1705          | 0.9976                      |
+ | 0.3467 | 4400  | 0.2207        | 0.1693          | 0.9977                      |
+ | 0.3546 | 4500  | 0.2053        | 0.1608          | 0.9980                      |
+ | 0.3625 | 4600  | 0.1705        | 0.1603          | 0.9981                      |
+ | 0.3704 | 4700  | 0.2085        | 0.1597          | 0.9980                      |
+ | 0.3783 | 4800  | 0.2034        | 0.1561          | 0.9981                      |
+ | 0.3861 | 4900  | 0.1765        | 0.1562          | 0.9981                      |
+ | 0.3940 | 5000  | 0.1955        | 0.1497          | 0.9982                      |
+ | 0.4019 | 5100  | 0.1843        | 0.1487          | 0.9981                      |
+ | 0.4098 | 5200  | 0.186         | 0.1479          | 0.9981                      |
+ | 0.4177 | 5300  | 0.1631        | 0.1498          | 0.9980                      |
+ | 0.4255 | 5400  | 0.1719        | 0.1468          | 0.9980                      |
+ | 0.4334 | 5500  | 0.1916        | 0.1436          | 0.9983                      |
+ | 0.4413 | 5600  | 0.1706        | 0.1421          | 0.9982                      |
+ | 0.4492 | 5700  | 0.1512        | 0.1372          | 0.9984                      |
+ | 0.4571 | 5800  | 0.1626        | 0.1357          | 0.9984                      |
+ | 0.4649 | 5900  | 0.1652        | 0.1332          | 0.9985                      |
+ | 0.4728 | 6000  | 0.146         | 0.1325          | 0.9986                      |
+ | 0.4807 | 6100  | 0.1487        | 0.1308          | 0.9986                      |
+ | 0.4886 | 6200  | 0.1565        | 0.1290          | 0.9985                      |
+ | 0.4965 | 6300  | 0.1567        | 0.1281          | 0.9985                      |
+ | 0.5043 | 6400  | 0.1678        | 0.1264          | 0.9985                      |
+ | 0.5122 | 6500  | 0.1203        | 0.1261          | 0.9986                      |
+ | 0.5201 | 6600  | 0.1572        | 0.1245          | 0.9985                      |
+ | 0.5280 | 6700  | 0.1539        | 0.1221          | 0.9985                      |
+ | 0.5359 | 6800  | 0.1546        | 0.1226          | 0.9986                      |
+ | 0.5437 | 6900  | 0.1216        | 0.1185          | 0.9987                      |
+ | 0.5516 | 7000  | 0.1272        | 0.1193          | 0.9986                      |
+ | 0.5595 | 7100  | 0.1321        | 0.1179          | 0.9988                      |
+ | 0.5674 | 7200  | 0.1305        | 0.1144          | 0.9988                      |
+ | 0.5753 | 7300  | 0.1558        | 0.1151          | 0.9987                      |
+ | 0.5831 | 7400  | 0.1282        | 0.1133          | 0.9986                      |
+ | 0.5910 | 7500  | 0.1442        | 0.1113          | 0.9986                      |
+ | 0.5989 | 7600  | 0.1529        | 0.1094          | 0.9988                      |
+ | 0.6068 | 7700  | 0.1254        | 0.1086          | 0.9987                      |
+ | 0.6147 | 7800  | 0.1158        | 0.1061          | 0.9988                      |
+ | 0.6225 | 7900  | 0.1127        | 0.1063          | 0.9988                      |
+ | 0.6304 | 8000  | 0.1253        | 0.1052          | 0.9988                      |
+ | 0.6383 | 8100  | 0.1542        | 0.1050          | 0.9989                      |
+ | 0.6462 | 8200  | 0.1237        | 0.1038          | 0.9990                      |
+ | 0.6541 | 8300  | 0.1307        | 0.1029          | 0.9988                      |
+ | 0.6619 | 8400  | 0.1231        | 0.1022          | 0.9989                      |
+ | 0.6698 | 8500  | 0.1573        | 0.1002          | 0.9990                      |
+ | 0.6777 | 8600  | 0.1257        | 0.0990          | 0.9990                      |
+ | 0.6856 | 8700  | 0.103         | 0.0986          | 0.9990                      |
+ | 0.6935 | 8800  | 0.1143        | 0.0983          | 0.9990                      |
+ | 0.7013 | 8900  | 0.1138        | 0.0965          | 0.9991                      |
+ | 0.7092 | 9000  | 0.1158        | 0.0962          | 0.9990                      |
+ | 0.7171 | 9100  | 0.1104        | 0.0960          | 0.9991                      |
+ | 0.7250 | 9200  | 0.1054        | 0.0967          | 0.9991                      |
+ | 0.7329 | 9300  | 0.1194        | 0.0946          | 0.9991                      |
+ | 0.7407 | 9400  | 0.1245        | 0.0936          | 0.9991                      |
+ | 0.7486 | 9500  | 0.126         | 0.0926          | 0.9991                      |
+ | 0.7565 | 9600  | 0.1059        | 0.0913          | 0.9992                      |
+ | 0.7644 | 9700  | 0.1101        | 0.0906          | 0.9992                      |
+ | 0.7723 | 9800  | 0.1192        | 0.0898          | 0.9993                      |
+ | 0.7801 | 9900  | 0.1241        | 0.0886          | 0.9993                      |
+ | 0.7880 | 10000 | 0.1134        | 0.0876          | 0.9993                      |
+ | 0.7959 | 10100 | 0.1071        | 0.0868          | 0.9993                      |
+ | 0.8038 | 10200 | 0.1043        | 0.0869          | 0.9993                      |
+ | 0.8117 | 10300 | 0.1191        | 0.0864          | 0.9993                      |
+ | 0.8195 | 10400 | 0.1188        | 0.0853          | 0.9993                      |
+ | 0.8274 | 10500 | 0.1014        | 0.0847          | 0.9993                      |
+ | 0.8353 | 10600 | 0.0878        | 0.0846          | 0.9993                      |
+ | 0.8432 | 10700 | 0.0952        | 0.0839          | 0.9993                      |
+ | 0.8511 | 10800 | 0.1169        | 0.0841          | 0.9993                      |
+ | 0.8589 | 10900 | 0.1032        | 0.0825          | 0.9993                      |
+ | 0.8668 | 11000 | 0.1086        | 0.0823          | 0.9993                      |
+ | 0.8747 | 11100 | 0.1058        | 0.0820          | 0.9993                      |
+ | 0.8826 | 11200 | 0.0973        | 0.0818          | 0.9993                      |
+ | 0.8905 | 11300 | 0.1166        | 0.0811          | 0.9993                      |
+ | 0.8983 | 11400 | 0.0965        | 0.0807          | 0.9993                      |
+ | 0.9062 | 11500 | 0.0974        | 0.0805          | 0.9993                      |
+ | 0.9141 | 11600 | 0.0984        | 0.0803          | 0.9993                      |
+ | 0.9220 | 11700 | 0.1199        | 0.0798          | 0.9993                      |
+ | 0.9299 | 11800 | 0.0854        | 0.0794          | 0.9993                      |
+ | 0.9377 | 11900 | 0.1004        | 0.0798          | 0.9993                      |
+ | 0.9456 | 12000 | 0.1119        | 0.0792          | 0.9993                      |
+ | 0.9535 | 12100 | 0.1171        | 0.0790          | 0.9993                      |
+ | 0.9614 | 12200 | 0.1045        | 0.0787          | 0.9993                      |
+ | 0.9693 | 12300 | 0.1116        | 0.0784          | 0.9993                      |
+ | 0.9771 | 12400 | 0.091         | 0.0781          | 0.9993                      |
+ | 0.9850 | 12500 | 0.083         | 0.0781          | 0.9993                      |
+ | 0.9929 | 12600 | 0.1146        | 0.0779          | 0.9993                      |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.11.12
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.51.3
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.5.2
+ - Datasets: 3.5.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+     "architectures": [
+         "BertModel"
+     ],
+     "attention_probs_dropout_prob": 0.1,
+     "classifier_dropout": null,
+     "gradient_checkpointing": false,
+     "hidden_act": "gelu",
+     "hidden_dropout_prob": 0.1,
+     "hidden_size": 384,
+     "initializer_range": 0.02,
+     "intermediate_size": 1536,
+     "layer_norm_eps": 1e-12,
+     "max_position_embeddings": 512,
+     "model_type": "bert",
+     "num_attention_heads": 12,
+     "num_hidden_layers": 6,
+     "pad_token_id": 0,
+     "position_embedding_type": "absolute",
+     "torch_dtype": "float32",
+     "transformers_version": "4.51.3",
+     "type_vocab_size": 2,
+     "use_cache": true,
+     "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "__version__": {
+         "sentence_transformers": "4.1.0",
+         "transformers": "4.51.3",
+         "pytorch": "2.4.1+cu124"
+     },
+     "prompts": {},
+     "default_prompt_name": null,
+     "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:adf80f15a757f3c8a60c16ee9d35cc316d28ac6ebc962893bd7cd6d3fdab16e0
+ size 90864192
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+     {
+         "idx": 0,
+         "name": "0",
+         "path": "",
+         "type": "sentence_transformers.models.Transformer"
+     },
+     {
+         "idx": 1,
+         "name": "1",
+         "path": "1_Pooling",
+         "type": "sentence_transformers.models.Pooling"
+     },
+     {
+         "idx": 2,
+         "name": "2",
+         "path": "2_Normalize",
+         "type": "sentence_transformers.models.Normalize"
+     }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+     "max_seq_length": 256,
+     "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+     "cls_token": {
+         "content": "[CLS]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "mask_token": {
+         "content": "[MASK]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "pad_token": {
+         "content": "[PAD]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "sep_token": {
+         "content": "[SEP]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "unk_token": {
+         "content": "[UNK]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+     "added_tokens_decoder": {
+         "0": {
+             "content": "[PAD]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "100": {
+             "content": "[UNK]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "101": {
+             "content": "[CLS]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "102": {
+             "content": "[SEP]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "103": {
+             "content": "[MASK]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         }
+     },
+     "clean_up_tokenization_spaces": false,
+     "cls_token": "[CLS]",
+     "do_basic_tokenize": true,
+     "do_lower_case": true,
+     "extra_special_tokens": {},
+     "mask_token": "[MASK]",
+     "max_length": 128,
+     "model_max_length": 256,
+     "never_split": null,
+     "pad_to_multiple_of": null,
+     "pad_token": "[PAD]",
+     "pad_token_type_id": 0,
+     "padding_side": "right",
+     "sep_token": "[SEP]",
+     "stride": 0,
+     "strip_accents": null,
+     "tokenize_chinese_chars": true,
+     "tokenizer_class": "BertTokenizer",
+     "truncation_side": "right",
+     "truncation_strategy": "longest_first",
+     "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff