AryehRotberg committed on
Commit 7161d26 · verified · 1 Parent(s): a0d3a3f

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 384,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,532 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:167508
+ - loss:MultipleNegativesRankingLoss
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ widget:
+ - source_sentence: We operate globally and may transfer your personal information
+   sentences:
+   - Content you post may be edited by the service for any reason
+   - You can retrieve an archive of your data
+   - Your data may be processed and stored anywhere in the world
+ - source_sentence: These pages, the content, and infrastructure of these pages and
+     the online reservation service (including the facilitation of payment service)
+     provided by us on these pages and through the website are owned, operated, and
+     provided by Booking.com B.V. and are provided for your personal, non-commercial
+     (B2C) use only, subject to the terms and conditions set out below.
+   sentences:
+   - The service will only respond to government requests that are reasonable
+   - Two factor authentication is provided for your account
+   - This service is only available for use individually and non-commercially.
+ - source_sentence: If you do not want to receive email or other communications from
+     us, please adjust your Customer Communication Preferences
+   sentences:
+   - You can opt out of promotional communications
+   - This Service provides a list of Third Parties involved in its operation.
+   - Terms may be changed at any time
+ - source_sentence: irrevocably and unconditionally waive any moral rights or similar
+     rights you have in any Content pursuant to the Copyright, Designs and Patents
+     Act 1988 (as amended, superseded or replaced from time to time) (the “Act”) or
+     equivalent legislation anywhere in the World.
+   sentences:
+   - User-generated content can be blocked or censored for any reason
+   - User suspension from the service will be fair and proportionate.
+   - You waive your moral rights
+ - source_sentence: You agree that regardless of any statute or law to the contrary,
+     any claim or cause of action arising out of or related to use of the Desmos Services
+     or these Terms must be filed within one (1) year after such claim or cause of
+     action arose or be forever barred.
+   sentences:
+   - The data retention period is kept to the minimum necessary for fulfilling its
+     purposes
+   - You are not allowed to use pseudonyms, as trust and transparency between users
+     regarding their identities is relevant to the service.
+   - You have a reduced time period to take legal action against the service
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ model-index:
+ - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: all nli dev
+       type: all-nli-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.9990686774253845
+       name: Cosine Accuracy
+ ---
67
+
+ # SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
+ - **Maximum Sequence Length:** 256 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
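The Pooling and Normalize modules above amount to masked mean pooling over token embeddings followed by L2 normalization. A minimal numpy sketch of those two steps, using toy arrays in place of real BertModel token embeddings (not the actual sentence-transformers code):

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Masked mean pooling over tokens (module 1), then L2 normalization (module 2)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # padding contributes nothing
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # real tokens per sentence
    mean = summed / counts
    return mean / np.linalg.norm(mean, axis=1, keepdims=True)        # unit length

# Toy batch: 2 sentences, 4 token positions, 384-dim token embeddings
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 384))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # second sentence is shorter
pooled = mean_pool_and_normalize(tokens, mask)
print(pooled.shape)                     # (2, 384)
print(np.linalg.norm(pooled, axis=1))   # each row has unit norm
```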
99
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("AryehRotberg/ToS-Sentence-Transformers-V3")
+ # Run inference
+ sentences = [
+     'You agree that regardless of any statute or law to the contrary, any claim or cause of action arising out of or related to use of the Desmos Services or these Terms must be filed within one (1) year after such claim or cause of action arose or be forever barred.',
+     'You have a reduced time period to take legal action against the service',
+     'The data retention period is kept to the minimum necessary for fulfilling its purposes',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
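Because the Normalize module makes every output vector unit-length, the cosine similarity matrix computed by `model.similarity` reduces to a plain dot product. A small numpy sketch with hypothetical 2-d vectors standing in for real embeddings:

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity; for unit-norm rows this is just embeddings @ embeddings.T."""
    e = np.asarray(embeddings, dtype=np.float64)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # no-op if already normalized
    return e @ e.T

emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy 2-d "embeddings"
sims = cosine_similarity_matrix(emb)
print(sims.shape)             # (3, 3)
print(round(sims[0, 2], 4))   # 0.7071
```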
131
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Dataset: `all-nli-dev`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.9991** |
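Cosine accuracy here is the fraction of (anchor, positive, negative) triplets for which the anchor is more cosine-similar to its positive than to its negative. A small numpy sketch of that metric with toy 2-d vectors (an illustration of the definition, not the actual `TripletEvaluator` code):

```python
import numpy as np

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def cos(a, b):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return (a * b).sum(axis=1)
    return float(np.mean(cos(anchors, positives) > cos(anchors, negatives)))

# Toy triplets: in the first two the positive is closer, in the third the negative is.
a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
p = np.array([[1.0, 0.1], [0.1, 1.0], [0.0, 1.0]])
n = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.1]])
print(triplet_cosine_accuracy(a, p, n))  # 2 of 3 triplets ranked correctly
```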
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 167,508 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type | string | string | string |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 48.65 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.89 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.24 tokens</li><li>max: 29 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>This websites also uses Google Analytics, a web analysis service provided by Google Inc. ("Google"). Google Inc. is an enterprise of the holding company Alphabet Inc., domiciled in the USA</code> | <code>Third-party cookies are used for statistics</code> | <code>The service is open-source</code> |
+   | <code>Terms of Use This Agreement was last revised on Dec 6, 2017.</code> | <code>There is a date of the last update of the agreements</code> | <code>Many third parties are involved in operating the service</code> |
+   | <code>We reserve the right, at Our sole discretion, to modify or replace these Terms at any time. If a revision is material We will make reasonable efforts to provide at least 30 days' notice prior to any new terms taking effect.</code> | <code>When the service wants to make a material change to its terms, you are notified at least 30 days in advance</code> | <code>User-generated content is encrypted, and this service cannot decrypt it</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
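MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor: it is a cross-entropy over scaled cosine similarities where the diagonal entries are the correct pairs. The real implementation lives in sentence-transformers (PyTorch); this numpy sketch only illustrates the objective with the `scale=20.0` and `cos_sim` settings above:

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Sketch of MNRL with in-batch negatives: cross-entropy over scaled cosine
    similarities, where anchor i's correct "class" is positive i (the diagonal)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                    # (batch, batch), cos_sim * scale
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability for softmax
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # NLL of each anchor's own positive

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 32))
positives = anchors + 0.05 * rng.normal(size=(4, 32))  # near-duplicates: an easy batch
print(multiple_negatives_ranking_loss(anchors, positives))  # small: correct pairs dominate
```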
207
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 41,877 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type | string | string | string |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 47.12 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.92 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.56 tokens</li><li>max: 29 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>(c) access or search or attempt to access or search the Services by any means (automated or otherwise) other than through our currently available, published interfaces that are provided by Podyssey (and only pursuant to those terms and conditions) or unless permitted by Podyssey’s robots.txt file or other robot exclusion mechanisms. (d) scrape the Services, and particularly scrape Content (as defined below) from the Services.</code> | <code>Spidering, crawling, or accessing the site through any automated means is not allowed</code> | <code>User-generated content is encrypted, and this service cannot decrypt it</code> |
+   | <code>License by Customer to Use Feedback. Customer grants to SFDC and its Affiliates a worldwide, perpetual, irrevocable, royalty-free license to use and incorporate into its services any suggestion, enhancement request, recommendation, correction or other feedback provided by Customer or Users relating to the operation of SFDC’s or its Affiliates’ services.</code> | <code>If you offer suggestions to the service, they may use that without your approval or compensation, but they do not become the owner</code> | <code>You can opt out of providing personal information to third parties</code> |
+   | <code>OVPN does not log any activity when connected to our VPN service.</code> | <code>Only necessary logs are kept by the service to ensure quality</code> | <code>You agree to defend, indemnify, and hold the service harmless in case of a claim related to your use of the service</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: no_duplicates
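With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate ramps up linearly over the first 10% of steps and then decays linearly to zero. A sketch of that schedule in plain Python; the total-step count of 10,470 is an assumption derived from 167,508 samples at batch size 16 for one epoch (consistent with the training log ending near step 10,400 at epoch 0.9933), not a value stated in the card:

```python
def linear_schedule_with_warmup(step, total_steps, warmup_ratio=0.1, base_lr=2e-05):
    """Linear warmup for the first warmup_ratio of steps, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)          # ramp up
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 10470  # assumed: ~167,508 samples / batch size 16, 1 epoch
for s in (0, 523, 1047, 5000, 10470):
    print(s, linear_schedule_with_warmup(s, total))
```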
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
364
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss | Validation Loss | all-nli-dev_cosine_accuracy |
+ |:------:|:-----:|:-------------:|:---------------:|:---------------------------:|
+ | -1 | -1 | - | - | 0.9478 |
+ | 0.0096 | 100 | 1.34 | 1.1442 | 0.9598 |
+ | 0.0191 | 200 | 1.1161 | 0.9002 | 0.9725 |
+ | 0.0287 | 300 | 0.8731 | 0.7618 | 0.9786 |
+ | 0.0382 | 400 | 0.739 | 0.6587 | 0.9835 |
+ | 0.0478 | 500 | 0.6753 | 0.5901 | 0.9860 |
+ | 0.0573 | 600 | 0.6199 | 0.5277 | 0.9877 |
+ | 0.0669 | 700 | 0.5434 | 0.4952 | 0.9890 |
+ | 0.0764 | 800 | 0.4781 | 0.4602 | 0.9901 |
+ | 0.0860 | 900 | 0.4852 | 0.4351 | 0.9905 |
+ | 0.0955 | 1000 | 0.4329 | 0.4114 | 0.9910 |
+ | 0.1051 | 1100 | 0.4432 | 0.3804 | 0.9919 |
+ | 0.1146 | 1200 | 0.4224 | 0.3649 | 0.9928 |
+ | 0.1242 | 1300 | 0.3697 | 0.3488 | 0.9930 |
+ | 0.1337 | 1400 | 0.3724 | 0.3338 | 0.9936 |
+ | 0.1433 | 1500 | 0.3467 | 0.3246 | 0.9938 |
+ | 0.1528 | 1600 | 0.3728 | 0.3045 | 0.9945 |
+ | 0.1624 | 1700 | 0.3281 | 0.2952 | 0.9943 |
+ | 0.1719 | 1800 | 0.3187 | 0.2907 | 0.9946 |
+ | 0.1815 | 1900 | 0.336 | 0.2707 | 0.9952 |
+ | 0.1910 | 2000 | 0.2957 | 0.2667 | 0.9952 |
+ | 0.2006 | 2100 | 0.2787 | 0.2650 | 0.9955 |
+ | 0.2101 | 2200 | 0.2698 | 0.2534 | 0.9954 |
+ | 0.2197 | 2300 | 0.2741 | 0.2562 | 0.9956 |
+ | 0.2292 | 2400 | 0.2736 | 0.2477 | 0.9957 |
+ | 0.2388 | 2500 | 0.2936 | 0.2400 | 0.9960 |
+ | 0.2483 | 2600 | 0.2513 | 0.2321 | 0.9962 |
+ | 0.2579 | 2700 | 0.2564 | 0.2301 | 0.9965 |
+ | 0.2674 | 2800 | 0.245 | 0.2277 | 0.9965 |
+ | 0.2770 | 2900 | 0.2406 | 0.2156 | 0.9967 |
+ | 0.2865 | 3000 | 0.2074 | 0.2125 | 0.9966 |
+ | 0.2961 | 3100 | 0.2544 | 0.2081 | 0.9965 |
+ | 0.3056 | 3200 | 0.2333 | 0.2034 | 0.9968 |
+ | 0.3152 | 3300 | 0.2311 | 0.1998 | 0.9971 |
+ | 0.3247 | 3400 | 0.2294 | 0.1931 | 0.9972 |
+ | 0.3343 | 3500 | 0.2289 | 0.1877 | 0.9973 |
+ | 0.3438 | 3600 | 0.2291 | 0.1843 | 0.9974 |
+ | 0.3534 | 3700 | 0.2406 | 0.1748 | 0.9977 |
+ | 0.3629 | 3800 | 0.1851 | 0.1754 | 0.9974 |
+ | 0.3725 | 3900 | 0.2172 | 0.1691 | 0.9976 |
+ | 0.3820 | 4000 | 0.1885 | 0.1677 | 0.9979 |
+ | 0.3916 | 4100 | 0.2041 | 0.1662 | 0.9977 |
+ | 0.4011 | 4200 | 0.2052 | 0.1671 | 0.9977 |
+ | 0.4107 | 4300 | 0.1739 | 0.1626 | 0.9980 |
+ | 0.4202 | 4400 | 0.1721 | 0.1598 | 0.9979 |
+ | 0.4298 | 4500 | 0.1682 | 0.1575 | 0.9980 |
+ | 0.4394 | 4600 | 0.2076 | 0.1518 | 0.9980 |
+ | 0.4489 | 4700 | 0.1657 | 0.1549 | 0.9978 |
+ | 0.4585 | 4800 | 0.1827 | 0.1456 | 0.9981 |
+ | 0.4680 | 4900 | 0.1577 | 0.1412 | 0.9984 |
+ | 0.4776 | 5000 | 0.1869 | 0.1400 | 0.9983 |
+ | 0.4871 | 5100 | 0.1437 | 0.1400 | 0.9983 |
+ | 0.4967 | 5200 | 0.1806 | 0.1372 | 0.9982 |
+ | 0.5062 | 5300 | 0.1457 | 0.1358 | 0.9982 |
+ | 0.5158 | 5400 | 0.1529 | 0.1339 | 0.9983 |
+ | 0.5253 | 5500 | 0.1732 | 0.1300 | 0.9982 |
+ | 0.5349 | 5600 | 0.1563 | 0.1270 | 0.9984 |
+ | 0.5444 | 5700 | 0.1411 | 0.1267 | 0.9985 |
+ | 0.5540 | 5800 | 0.149 | 0.1270 | 0.9985 |
+ | 0.5635 | 5900 | 0.1492 | 0.1264 | 0.9985 |
+ | 0.5731 | 6000 | 0.1466 | 0.1200 | 0.9986 |
+ | 0.5826 | 6100 | 0.1423 | 0.1190 | 0.9986 |
+ | 0.5922 | 6200 | 0.1389 | 0.1204 | 0.9985 |
+ | 0.6017 | 6300 | 0.1287 | 0.1153 | 0.9984 |
+ | 0.6113 | 6400 | 0.1307 | 0.1139 | 0.9986 |
+ | 0.6208 | 6500 | 0.1383 | 0.1129 | 0.9987 |
+ | 0.6304 | 6600 | 0.1332 | 0.1105 | 0.9987 |
+ | 0.6399 | 6700 | 0.1228 | 0.1090 | 0.9988 |
+ | 0.6495 | 6800 | 0.119 | 0.1093 | 0.9987 |
+ | 0.6590 | 6900 | 0.1459 | 0.1076 | 0.9987 |
+ | 0.6686 | 7000 | 0.1162 | 0.1058 | 0.9988 |
+ | 0.6781 | 7100 | 0.1105 | 0.1054 | 0.9988 |
+ | 0.6877 | 7200 | 0.1379 | 0.1044 | 0.9988 |
+ | 0.6972 | 7300 | 0.1555 | 0.1017 | 0.9989 |
+ | 0.7068 | 7400 | 0.1471 | 0.0982 | 0.9989 |
+ | 0.7163 | 7500 | 0.1308 | 0.0983 | 0.9988 |
+ | 0.7259 | 7600 | 0.1095 | 0.0965 | 0.9988 |
+ | 0.7354 | 7700 | 0.1321 | 0.0956 | 0.9989 |
+ | 0.7450 | 7800 | 0.1108 | 0.0938 | 0.9987 |
+ | 0.7545 | 7900 | 0.1151 | 0.0918 | 0.9989 |
+ | 0.7641 | 8000 | 0.1179 | 0.0920 | 0.9990 |
+ | 0.7736 | 8100 | 0.117 | 0.0910 | 0.9991 |
+ | 0.7832 | 8200 | 0.1426 | 0.0895 | 0.9989 |
+ | 0.7927 | 8300 | 0.122 | 0.0891 | 0.9990 |
+ | 0.8023 | 8400 | 0.1136 | 0.0888 | 0.9989 |
+ | 0.8118 | 8500 | 0.0935 | 0.0882 | 0.9989 |
+ | 0.8214 | 8600 | 0.1143 | 0.0872 | 0.9989 |
+ | 0.8309 | 8700 | 0.0982 | 0.0873 | 0.9989 |
+ | 0.8405 | 8800 | 0.1171 | 0.0857 | 0.9989 |
+ | 0.8500 | 8900 | 0.1091 | 0.0844 | 0.9989 |
+ | 0.8596 | 9000 | 0.1046 | 0.0840 | 0.9989 |
+ | 0.8691 | 9100 | 0.0897 | 0.0836 | 0.9990 |
+ | 0.8787 | 9200 | 0.0804 | 0.0832 | 0.9991 |
+ | 0.8883 | 9300 | 0.0967 | 0.0827 | 0.9991 |
+ | 0.8978 | 9400 | 0.0897 | 0.0820 | 0.9991 |
+ | 0.9074 | 9500 | 0.0968 | 0.0813 | 0.9990 |
+ | 0.9169 | 9600 | 0.1108 | 0.0814 | 0.9991 |
+ | 0.9265 | 9700 | 0.1058 | 0.0806 | 0.9991 |
+ | 0.9360 | 9800 | 0.0871 | 0.0800 | 0.9990 |
+ | 0.9456 | 9900 | 0.1079 | 0.0797 | 0.9991 |
+ | 0.9551 | 10000 | 0.1064 | 0.0794 | 0.9991 |
+ | 0.9647 | 10100 | 0.1095 | 0.0792 | 0.9991 |
+ | 0.9742 | 10200 | 0.0858 | 0.0791 | 0.9991 |
+ | 0.9838 | 10300 | 0.0997 | 0.0791 | 0.9991 |
+ | 0.9933 | 10400 | 0.0888 | 0.0791 | 0.9991 |
+
+ </details>
477
+
+ ### Framework Versions
+ - Python: 3.12.7
+ - Sentence Transformers: 4.1.0
+ - Transformers: 4.51.3
+ - PyTorch: 2.4.1+cu124
+ - Accelerate: 1.6.0
+ - Datasets: 3.5.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 384,
+ "initializer_range": 0.02,
+ "intermediate_size": 1536,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 6,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.51.3",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "4.1.0",
+ "transformers": "4.51.3",
+ "pytorch": "2.4.1+cu124"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:36bdda015f64391c116e2a6af074bcd0e679bfe36732493dfe9fd2a9a9a37a0f
+ size 90864192
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 256,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "max_length": 128,
+ "model_max_length": 256,
+ "never_split": null,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff