msugimura committed on
Commit
a9b868f
·
verified ·
1 Parent(s): e940b84

Upload checkpoint-54/README.md

Files changed (1)
  1. checkpoint-54/README.md +448 -0
checkpoint-54/README.md ADDED
@@ -0,0 +1,448 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:849
8
+ - loss:MultipleNegativesRankingLoss
9
+ base_model: sentence-transformers/all-MiniLM-L12-v2
10
+ widget:
11
+ - source_sentence: Graphic designer who specializes in creating visual content for
12
+ brands, including logos, marketing materials, and user interfaces. Focuses on
13
+ aesthetics, user experience, and brand identity.
14
+ sentences:
15
+ - 'user_1: I''m looking to refresh my company''s brand image but don''t know where
16
+ to start.
17
+
18
+ user_2: You should consult a brand manager.'
19
+ - 'user_1: I need help designing a logo for my new business.
20
+
21
+ user_2: Have you thought about hiring a graphic designer?
22
+
23
+ user_1: Yes, I want something that really represents my brand.'
24
+ - 'user_1: My car''s making a weird noise, and I don''t know what to do.
25
+
26
+ user_2: You should take it to a mechanic.'
27
+ - source_sentence: Nutritionist who specializes in dietary planning and nutritional
28
+ counseling. Helps clients achieve their health goals through personalized meal
29
+ plans and education.
30
+ sentences:
31
+ - 'user_1: I''m trying to lose weight but I don''t know what to eat.
32
+
33
+ user_2: Have you considered talking to a nutritionist?'
34
+ - 'user_1: Our database is running slow, and I don''t know why.
35
+
36
+ user_2: Have you checked the indexing?'
37
+ - 'user_1: I need help fixing my car''s engine; it''s making a weird noise.
38
+
39
+ user_2: Have you checked the oil level?'
40
+ - source_sentence: 'user_2: Sure, what problem are you working on?'
41
+ sentences:
42
+ - Gardening expert specializing in vegetable gardening techniques and plant care.
43
+ - Event planner focusing on corporate events and wedding coordination.
44
+ - Math tutor specializing in teaching and clarifying mathematical concepts and problem-solving.
45
+ - source_sentence: 'user_2: Have you thought about getting some storage bins?'
46
+ sentences:
47
+ - Web developer focused on software engineering and application design.
48
+ - Professional organizer specializing in home organization and decluttering strategies.
49
+ - Pet behavior specialist who provides advice on dog breeds and training for small
50
+ living spaces.
51
+ - source_sentence: 'user_1: Maybe the national parks, I want to see some nature.'
52
+ sentences:
53
+ - Mental health counselor specializing in stress management and coping strategies.
54
+ - Data analyst focusing on market trends and business intelligence.
55
+ - Travel consultant specializing in road trip planning and national park itineraries.
56
+ pipeline_tag: sentence-similarity
57
+ library_name: sentence-transformers
58
+ ---
59
+
60
+ # SentenceTransformer based on sentence-transformers/all-MiniLM-L12-v2
61
+
62
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) on the semantic_triplets_round1 and inverse_semantic_triplets datasets. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
63
+
64
+ ## Model Details
65
+
66
+ ### Model Description
67
+ - **Model Type:** Sentence Transformer
68
+ - **Base model:** [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) <!-- at revision c004d8e3e901237d8fa7e9fff12774962e391ce5 -->
69
+ - **Maximum Sequence Length:** 128 tokens
70
+ - **Output Dimensionality:** 384 dimensions
71
+ - **Similarity Function:** Cosine Similarity
72
+ - **Training Datasets:**
73
+ - semantic_triplets_round1
74
+ - inverse_semantic_triplets
75
+ <!-- - **Language:** Unknown -->
76
+ <!-- - **License:** Unknown -->
77
+
78
+ ### Model Sources
79
+
80
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
81
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
82
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
83
+
84
+ ### Full Model Architecture
85
+
86
+ ```
87
+ SentenceTransformer(
88
+ (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
89
+ (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
90
+ (2): Normalize()
91
+ )
92
+ ```
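
The architecture listing above ends with mean pooling (`pooling_mode_mean_tokens: True`) followed by `Normalize()`. As a rough illustration of what those two stages compute, here is a minimal pure-Python sketch. The helper names `mean_pool` and `l2_normalize` and the toy 2-dimensional vectors are assumptions made for illustration; they are not the library's implementation.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, so padding is ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: three token vectors, the last one is padding (mask 0).
tokens = [[3.0, 0.0], [1.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)   # mean of the first two rows only
embedding = l2_normalize(pooled)   # unit-length sentence embedding
print(pooled, embedding)
```

Because the output is unit length, the dot product of two embeddings equals their cosine similarity, which is the similarity function listed in the model description.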
93
+
94
+ ## Usage
95
+
96
+ ### Direct Usage (Sentence Transformers)
97
+
98
+ First install the Sentence Transformers library:
99
+
100
+ ```bash
101
+ pip install -U sentence-transformers
102
+ ```
103
+
104
+ Then you can load this model and run inference.
105
+ ```python
106
+ from sentence_transformers import SentenceTransformer
107
+
108
+ # Download from the 🤗 Hub
109
+ model = SentenceTransformer("sentence_transformers_model_id")
110
+ # Run inference
111
+ sentences = [
112
+ 'user_1: Maybe the national parks, I want to see some nature.',
113
+ 'Travel consultant specializing in road trip planning and national park itineraries.',
114
+ 'Data analyst focusing on market trends and business intelligence.',
115
+ ]
116
+ embeddings = model.encode(sentences)
117
+ print(embeddings.shape)
118
+ # [3, 384]
119
+
120
+ # Get the similarity scores for the embeddings
121
+ similarities = model.similarity(embeddings, embeddings)
122
+ print(similarities.shape)
123
+ # [3, 3]
124
+ ```
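
The training samples pair conversation turns with expert profile descriptions, so one plausible use is ranking candidate profiles for a given turn. The sketch below shows only the ranking step, using cosine similarity on toy vectors. The function `rank_candidates` and the 3-dimensional vectors are hypothetical stand-ins; in practice the vectors would come from `model.encode(...)`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings standing in for model.encode(...) output.
query = [0.9, 0.1, 0.0]        # e.g. a conversation turn
candidates = [
    [0.0, 1.0, 0.0],           # e.g. an unrelated profile
    [1.0, 0.0, 0.0],           # e.g. a matching profile
]

ranking = rank_candidates(query, candidates)
best_index = ranking[0][0]
print(ranking)
```

With real embeddings from this model, which are already normalized, the cosine reduces to a plain dot product.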
125
+
126
+ <!--
127
+ ### Direct Usage (Transformers)
128
+
129
+ <details><summary>Click to see the direct usage in Transformers</summary>
130
+
131
+ </details>
132
+ -->
133
+
134
+ <!--
135
+ ### Downstream Usage (Sentence Transformers)
136
+
137
+ You can finetune this model on your own dataset.
138
+
139
+ <details><summary>Click to expand</summary>
140
+
141
+ </details>
142
+ -->
143
+
144
+ <!--
145
+ ### Out-of-Scope Use
146
+
147
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
148
+ -->
149
+
150
+ <!--
151
+ ## Bias, Risks and Limitations
152
+
153
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
154
+ -->
155
+
156
+ <!--
157
+ ### Recommendations
158
+
159
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
160
+ -->
161
+
162
+ ## Training Details
163
+
164
+ ### Training Datasets
165
+
166
+ #### semantic_triplets_round1
167
+
168
+ * Dataset: semantic_triplets_round1
169
+ * Size: 422 training samples
170
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
171
+ * Approximate statistics based on the first 422 samples:
172
+ | | anchor | positive | negative |
173
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
174
+ | type | string | string | string |
175
+ | details | <ul><li>min: 10 tokens</li><li>mean: 17.44 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 14.17 tokens</li><li>max: 26 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 12.49 tokens</li><li>max: 20 tokens</li></ul> |
176
+ * Samples:
177
+ | anchor | positive | negative |
178
+ |:--------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|
179
+ | <code>user_1: Can anyone recommend a good app for tracking my expenses?</code> | <code>Personal finance advisor specializing in budgeting tools and expense tracking applications.</code> | <code>Fitness instructor focusing on workout plans and nutrition.</code> |
180
+ | <code>user_1: Can anyone recommend a good workout routine for beginners?</code> | <code>Fitness trainer who specializes in creating beginner workout plans and exercise coaching.</code> | <code>Financial advisor focused on investment strategies and retirement planning.</code> |
181
+ | <code>user_2: What kind of vegetables are you thinking of planting?</code> | <code>Gardening expert who provides guidance on vegetable gardening techniques and plant care.</code> | <code>Investment advisor specializing in stock market strategies and financial planning.</code> |
182
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
183
+ ```json
184
+ {
185
+ "scale": 20.0,
186
+ "similarity_fct": "cos_sim"
187
+ }
188
+ ```
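
To make the `scale` and `similarity_fct` parameters concrete, here is a minimal pure-Python sketch of the idea behind MultipleNegativesRankingLoss on (anchor, positive, negative) triplets. Each anchor is scored against every positive and negative in the batch, the cosine scores are multiplied by `scale`, and cross-entropy is taken with the anchor's own positive as the target. The function `mnr_loss` and the toy 2-dimensional vectors are assumptions for illustration, not the library code.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i should pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives plus explicit negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

# Positives aligned with their anchors: loss is close to zero.
good_loss = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]], negatives)
# Positives swapped between anchors: loss is large.
bad_loss = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]], negatives)
print(good_loss, bad_loss)
```

A larger `scale` sharpens the softmax, so a small gap in cosine similarity translates into a larger gap in the loss.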
189
+
190
+ #### inverse_semantic_triplets
191
+
192
+ * Dataset: inverse_semantic_triplets
193
+ * Size: 427 training samples
194
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
195
+ * Approximate statistics based on the first 427 samples:
196
+ | | anchor | positive | negative |
197
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
198
+ | type | string | string | string |
199
+ | details | <ul><li>min: 18 tokens</li><li>mean: 28.42 tokens</li><li>max: 46 tokens</li></ul> | <ul><li>min: 19 tokens</li><li>mean: 40.04 tokens</li><li>max: 72 tokens</li></ul> | <ul><li>min: 13 tokens</li><li>mean: 27.66 tokens</li><li>max: 62 tokens</li></ul> |
200
+ * Samples:
201
+ | anchor | positive | negative |
202
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
203
+ | <code>UX researcher specializing in user experience design and user testing. Conducts research to understand user needs and improve product usability.</code> | <code>user_1: I'm looking for ways to improve the usability of our app.<br>user_2: Have you considered conducting user interviews?</code> | <code>user_1: I need to plan a trip to Europe next summer.<br>user_2: What countries are you thinking about visiting?</code> |
204
+ | <code>Software developer specializing in web applications, proficient in various programming languages and frameworks. I design, develop, and maintain software solutions, focusing on user experience and functionality.</code> | <code>user_1: I'm trying to build a web application, but I'm stuck on how to integrate the backend with the frontend.<br>user_2: What technologies are you using for both?<br>user_1: I’m using Node.js for the backend and React for the frontend.</code> | <code>user_1: I'm looking for a good recipe for chocolate chip cookies.<br>user_2: I can share my favorite one!</code> |
205
+ | <code>Marketing strategist who focuses on developing comprehensive marketing plans to drive brand engagement and sales growth. Specializes in digital marketing and content strategy.</code> | <code>user_1: I'm launching a new product and need a marketing strategy.<br>user_2: Have you set any goals for your campaign?</code> | <code>user_1: I'm looking for a new pair of running shoes.<br>user_2: What brand do you prefer?</code> |
206
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
207
+ ```json
208
+ {
209
+ "scale": 20.0,
210
+ "similarity_fct": "cos_sim"
211
+ }
212
+ ```
213
+
214
+ ### Evaluation Datasets
215
+
216
+ #### semantic_triplets_round1
217
+
218
+ * Dataset: semantic_triplets_round1
219
+ * Size: 47 evaluation samples
220
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
221
+ * Approximate statistics based on the first 47 samples:
222
+ | | anchor | positive | negative |
223
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
224
+ | type | string | string | string |
225
+ | details | <ul><li>min: 12 tokens</li><li>mean: 17.87 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 14.32 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 12.49 tokens</li><li>max: 16 tokens</li></ul> |
226
+ * Samples:
227
+ | anchor | positive | negative |
228
+ |:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|
229
+ | <code>user_1: What's the best way to train my puppy to stop barking?</code> | <code>Dog training specialist focused on behavioral issues and obedience training.</code> | <code>Financial advisor who specializes in investment strategies and wealth management.</code> |
230
+ | <code>user_2: What vegetables do you want to grow?</code> | <code>Gardening expert specializing in vegetable gardening and sustainable practices.</code> | <code>Real estate agent focusing on home buying and selling.</code> |
231
+ | <code>user_1: Anyone have tips on how to improve my running time for a 5k?</code> | <code>Running coach specializing in training plans and performance improvement.</code> | <code>Financial advisor focusing on investment strategies and retirement planning.</code> |
232
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
233
+ ```json
234
+ {
235
+ "scale": 20.0,
236
+ "similarity_fct": "cos_sim"
237
+ }
238
+ ```
239
+
240
+ #### inverse_semantic_triplets
241
+
242
+ * Dataset: inverse_semantic_triplets
243
+ * Size: 48 evaluation samples
244
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
245
+ * Approximate statistics based on the first 48 samples:
246
+ | | anchor | positive | negative |
247
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
248
+ | type | string | string | string |
249
+ | details | <ul><li>min: 20 tokens</li><li>mean: 28.42 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 23 tokens</li><li>mean: 39.71 tokens</li><li>max: 65 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 28.4 tokens</li><li>max: 52 tokens</li></ul> |
250
+ * Samples:
251
+ | anchor | positive | negative |
252
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------|
253
+ | <code>Graphic designer who specializes in creating visual content for brands, including logos, marketing materials, and user interfaces. Focuses on aesthetics, user experience, and brand identity.</code> | <code>user_1: I need help designing a logo for my new business.<br>user_2: Have you thought about hiring a graphic designer?<br>user_1: Yes, I want something that really represents my brand.</code> | <code>user_1: My car's making a weird noise, and I don't know what to do.<br>user_2: You should take it to a mechanic.</code> |
254
+ | <code>Physical therapist specializing in rehabilitation for sports injuries, pain management, and improving mobility through tailored exercise programs.</code> | <code>user_1: I twisted my ankle playing basketball, and it's really swollen.<br>user_2: Have you seen a doctor about it?</code> | <code>user_1: I'm thinking of redecorating my living room.<br>user_2: What style are you going for?</code> |
255
+ | <code>An accountant who specializes in financial record-keeping, tax preparation, and business consulting. Provides services to help clients manage their finances effectively and ensure compliance with tax regulations.</code> | <code>user_1: I need help with my taxes this year.<br>user_2: Are you looking for someone to prepare them for you?</code> | <code>user_1: I'm thinking about getting a puppy.</code> |
256
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
257
+ ```json
258
+ {
259
+ "scale": 20.0,
260
+ "similarity_fct": "cos_sim"
261
+ }
262
+ ```
263
+
264
+ ### Training Hyperparameters
265
+ #### Non-Default Hyperparameters
266
+
267
+ - `eval_strategy`: steps
268
+ - `per_device_train_batch_size`: 16
269
+ - `per_device_eval_batch_size`: 16
270
+ - `learning_rate`: 2e-05
271
+ - `num_train_epochs`: 1
272
+ - `warmup_ratio`: 0.1
273
+ - `batch_sampler`: no_duplicates
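
As a sanity check on these settings, the arithmetic below estimates the number of optimizer steps and warmup steps from the dataset sizes reported above. It assumes batches are drawn per dataset, the last partial batch is kept, the `no_duplicates` sampler discards no samples, and warmup steps are rounded up; each of these is an assumption about the trainer rather than something stated in this card.

```python
import math

# Training set sizes as reported in the Training Datasets section.
train_sizes = {"semantic_triplets_round1": 422, "inverse_semantic_triplets": 427}
batch_size = 16        # per_device_train_batch_size (single device assumed)
warmup_ratio = 0.1
num_train_epochs = 1

# One batch count per dataset, keeping the final partial batch.
batches_per_dataset = {
    name: math.ceil(size / batch_size) for name, size in train_sizes.items()
}
steps_per_epoch = sum(batches_per_dataset.values())
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(batches_per_dataset)   # 27 batches for each dataset
print(total_steps)           # 54
print(warmup_steps)          # 6
```

Under these assumptions the estimate of 54 steps is consistent with the `checkpoint-54` directory name in this commit.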
274
+
275
+ #### All Hyperparameters
276
+ <details><summary>Click to expand</summary>
277
+
278
+ - `overwrite_output_dir`: False
279
+ - `do_predict`: False
280
+ - `eval_strategy`: steps
281
+ - `prediction_loss_only`: True
282
+ - `per_device_train_batch_size`: 16
283
+ - `per_device_eval_batch_size`: 16
284
+ - `per_gpu_train_batch_size`: None
285
+ - `per_gpu_eval_batch_size`: None
286
+ - `gradient_accumulation_steps`: 1
287
+ - `eval_accumulation_steps`: None
288
+ - `torch_empty_cache_steps`: None
289
+ - `learning_rate`: 2e-05
290
+ - `weight_decay`: 0.0
291
+ - `adam_beta1`: 0.9
292
+ - `adam_beta2`: 0.999
293
+ - `adam_epsilon`: 1e-08
294
+ - `max_grad_norm`: 1.0
295
+ - `num_train_epochs`: 1
296
+ - `max_steps`: -1
297
+ - `lr_scheduler_type`: linear
298
+ - `lr_scheduler_kwargs`: {}
299
+ - `warmup_ratio`: 0.1
300
+ - `warmup_steps`: 0
301
+ - `log_level`: passive
302
+ - `log_level_replica`: warning
303
+ - `log_on_each_node`: True
304
+ - `logging_nan_inf_filter`: True
305
+ - `save_safetensors`: True
306
+ - `save_on_each_node`: False
307
+ - `save_only_model`: False
308
+ - `restore_callback_states_from_checkpoint`: False
309
+ - `no_cuda`: False
310
+ - `use_cpu`: False
311
+ - `use_mps_device`: False
312
+ - `seed`: 42
313
+ - `data_seed`: None
314
+ - `jit_mode_eval`: False
315
+ - `use_ipex`: False
316
+ - `bf16`: False
317
+ - `fp16`: False
318
+ - `fp16_opt_level`: O1
319
+ - `half_precision_backend`: auto
320
+ - `bf16_full_eval`: False
321
+ - `fp16_full_eval`: False
322
+ - `tf32`: None
323
+ - `local_rank`: 0
324
+ - `ddp_backend`: None
325
+ - `tpu_num_cores`: None
326
+ - `tpu_metrics_debug`: False
327
+ - `debug`: []
328
+ - `dataloader_drop_last`: False
329
+ - `dataloader_num_workers`: 0
330
+ - `dataloader_prefetch_factor`: None
331
+ - `past_index`: -1
332
+ - `disable_tqdm`: False
333
+ - `remove_unused_columns`: True
334
+ - `label_names`: None
335
+ - `load_best_model_at_end`: False
336
+ - `ignore_data_skip`: False
337
+ - `fsdp`: []
338
+ - `fsdp_min_num_params`: 0
339
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
340
+ - `fsdp_transformer_layer_cls_to_wrap`: None
341
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
342
+ - `deepspeed`: None
343
+ - `label_smoothing_factor`: 0.0
344
+ - `optim`: adamw_torch
345
+ - `optim_args`: None
346
+ - `adafactor`: False
347
+ - `group_by_length`: False
348
+ - `length_column_name`: length
349
+ - `ddp_find_unused_parameters`: None
350
+ - `ddp_bucket_cap_mb`: None
351
+ - `ddp_broadcast_buffers`: False
352
+ - `dataloader_pin_memory`: True
353
+ - `dataloader_persistent_workers`: False
354
+ - `skip_memory_metrics`: True
355
+ - `use_legacy_prediction_loop`: False
356
+ - `push_to_hub`: False
357
+ - `resume_from_checkpoint`: None
358
+ - `hub_model_id`: None
359
+ - `hub_strategy`: every_save
360
+ - `hub_private_repo`: None
361
+ - `hub_always_push`: False
362
+ - `gradient_checkpointing`: False
363
+ - `gradient_checkpointing_kwargs`: None
364
+ - `include_inputs_for_metrics`: False
365
+ - `include_for_metrics`: []
366
+ - `eval_do_concat_batches`: True
367
+ - `fp16_backend`: auto
368
+ - `push_to_hub_model_id`: None
369
+ - `push_to_hub_organization`: None
370
+ - `mp_parameters`:
371
+ - `auto_find_batch_size`: False
372
+ - `full_determinism`: False
373
+ - `torchdynamo`: None
374
+ - `ray_scope`: last
375
+ - `ddp_timeout`: 1800
376
+ - `torch_compile`: False
377
+ - `torch_compile_backend`: None
378
+ - `torch_compile_mode`: None
379
+ - `include_tokens_per_second`: False
380
+ - `include_num_input_tokens_seen`: False
381
+ - `neftune_noise_alpha`: None
382
+ - `optim_target_modules`: None
383
+ - `batch_eval_metrics`: False
384
+ - `eval_on_start`: False
385
+ - `use_liger_kernel`: False
386
+ - `eval_use_gather_object`: False
387
+ - `average_tokens_across_devices`: False
388
+ - `prompts`: None
389
+ - `batch_sampler`: no_duplicates
390
+ - `multi_dataset_batch_sampler`: proportional
391
+
392
+ </details>
393
+
394
+ ### Framework Versions
395
+ - Python: 3.12.9
396
+ - Sentence Transformers: 4.1.0
397
+ - Transformers: 4.52.4
398
+ - PyTorch: 2.7.1
399
+ - Accelerate: 1.8.1
400
+ - Datasets: 3.6.0
401
+ - Tokenizers: 0.21.1
402
+
403
+ ## Citation
404
+
405
+ ### BibTeX
406
+
407
+ #### Sentence Transformers
408
+ ```bibtex
409
+ @inproceedings{reimers-2019-sentence-bert,
410
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
411
+ author = "Reimers, Nils and Gurevych, Iryna",
412
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
413
+ month = "11",
414
+ year = "2019",
415
+ publisher = "Association for Computational Linguistics",
416
+ url = "https://arxiv.org/abs/1908.10084",
417
+ }
418
+ ```
419
+
420
+ #### MultipleNegativesRankingLoss
421
+ ```bibtex
422
+ @misc{henderson2017efficient,
423
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
424
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
425
+ year={2017},
426
+ eprint={1705.00652},
427
+ archivePrefix={arXiv},
428
+ primaryClass={cs.CL}
429
+ }
430
+ ```
431
+
432
+ <!--
433
+ ## Glossary
434
+
435
+ *Clearly define terms in order to be accessible across audiences.*
436
+ -->
437
+
438
+ <!--
439
+ ## Model Card Authors
440
+
441
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
442
+ -->
443
+
444
+ <!--
445
+ ## Model Card Contact
446
+
447
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
448
+ -->