KhaledReda committed verified commit c57f589 (1 parent: 79d01c0): "Upload folder using huggingface_hub"
1_Pooling/config.json ADDED
```json
{
    "word_embedding_dimension": 384,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
```
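Only the mean-token mode is enabled in this pooling config. A quick self-contained check of which mode is active (plain Python over a local copy of the JSON above; the `config` variable is just that copy, not a library API):

```python
import json

# local copy of 1_Pooling/config.json from this repo
config = json.loads("""
{
    "word_embedding_dimension": 384,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
""")

# collect every pooling_mode_* flag that is turned on
active = [k for k, v in config.items() if k.startswith("pooling_mode_") and v]
print(active)  # ['pooling_mode_mean_tokens']
```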
README.md ADDED
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:43059870
- loss:CoSENTLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: baladi qeshta
  sentences:
  - espresso kahwah blend
  - printed jersi
  - baladi saucisse
- source_sentence: mug
  sentences:
  - moodapex 50 mg 30 tablets, moodapex, pharmacies form tablets units 0.05 gram
  - dark clover, long lasting flowers clover flowers dark clover flowers flowers,
    clover flowers dark clover flowers flowers, carefully crafted with attention to
    detail. its realistic appearance and durable materials provide a long-lasting
    decoration for any occasion. made to enhance any space snow adds a touch of elegance
    and beauty to your home or event.
  - floral with belt, women dress belt dress dress floral dress, belt dress dress
    floral dress, gender women aeilin generic dress s features belt types of fashion
    styles casual multicolor floral
- source_sentence: chopped vegetable dressing
  sentences:
  - the ski jersi
  - carrot salad
  - leafy green salad
- source_sentence: monomak
  sentences:
  - ipad portfolio 12.9-inch size, inch ipad portfolio ipad portfolio case ipad portfolio
    organizer ipad portfolio with size inch inch ipad carrying case ipad portfolio
    for large screens ipad portfolio for professionals inch ipad storage case ipad
    portfolio for business use ipad portfolio with ample space inch ipad case ipad
    inch cover ipad inch sleeve ipad portfolio, inch ipad case ipad inch cover ipad
    inch sleeve ipad portfolio ipad portfolio case, numeric 12.9 - inch, size 12.9-inch
  - fine line lightening serum
  - islamic prayer wear
- source_sentence: classic shoes
  sentences:
  - forever skin cleansing device, silicone ultrasonic facial cleanser facial electric
    cleanser ultrasonic face wash brush mini sonic face brush electric face cleanser
    facial cleansing tool forever skin device skin cleansing device, electric face
    cleanser facial cleansing tool forever skin device skin cleansing device, forever
    silicone ultrasonic facial cleanser face wash brush facial electric cleanser all
    skin type - forever offers all the benefits of deep cleansing in one compact palm-sized
    device. the t-sonic pulsations deliver the unique ability to remove 99.5 of dirt
    and oil as well as makeup residue and dead skin cells and exfoliate without irritating
    the skin. just 1 minute of use twice daily cleanses and transforms the skin by
    removing blemish-causing impurities. the mini sonic face brush is made from highly
    durable body-safe hypoallergenic silicone and is non-porous to resist bacteria
    build-up making it 35 x more hygienic than nylon-bristled brushes and never requiring
    any replacement brush heads. lightweight completely waterproof for use in the
    bath or shower and with 2 speed settings the mini is designed around your life
    with each full charge lasting up to 300 uses. specification type skin cleansing
    exfoliation. system power source battery. brand forever. package 1 x forever silicone
    ultrasonic facial cleanser.
  - v 60 ethiopian filter coffee, coffee ethiopian coffee filter coffee v 60 coffee
    ahwa ethiopian ahwa ethiopian kahwa ethiopian kahwah ethiopian qahwa filter ahwa
    filter kahwa filter kahwah filter qahwa kahwa kahwah qahwa v 60 ahwa v 60 ethiopian
    filter ahwa v 60 ethiopian filter kahwa v 60 ethiopian filter kahwah v 60 ethiopian
    filter qahwa v 60 kahwa v 60 kahwah v 60 qahwa, ethiopian filter coffee.
  - polynomial equations calculator
datasets:
- KhaledReda/pairs_with_scores_v32
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# all-MiniLM-L6-v38-pair_score

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on the [pairs_with_scores_v32](https://huggingface.co/datasets/KhaledReda/pairs_with_scores_v32) dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [pairs_with_scores_v32](https://huggingface.co/datasets/KhaledReda/pairs_with_scores_v32)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
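The Pooling module above averages token embeddings over non-padding positions, and the trailing Normalize() makes every sentence embedding unit length. A minimal NumPy sketch of that mean-pooling-plus-normalize step (the `mean_pool` helper and the toy batch are illustrative, not code from this repository):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over real (non-padding) tokens, then L2-normalize.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    pooled = summed / counts
    # Normalize() step: unit vectors, so cosine similarity becomes a dot product
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# toy batch: 2 sentences, 3 token positions, dim 4; second sentence has 1 pad token
emb = np.ones((2, 3, 4))
mask = np.array([[1, 1, 1], [1, 1, 0]])
print(mean_pool(emb, mask).shape)  # (2, 4)
```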

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'classic shoes',
    'forever skin cleansing device, silicone ultrasonic facial cleanser facial electric cleanser ultrasonic face wash brush mini sonic face brush electric face cleanser facial cleansing tool forever skin device skin cleansing device, electric face cleanser facial cleansing tool forever skin device skin cleansing device, forever silicone ultrasonic facial cleanser face wash brush facial electric cleanser all skin type - forever offers all the benefits of deep cleansing in one compact palm-sized device. the t-sonic pulsations deliver the unique ability to remove 99.5 of dirt and oil as well as makeup residue and dead skin cells and exfoliate without irritating the skin. just 1 minute of use twice daily cleanses and transforms the skin by removing blemish-causing impurities. the mini sonic face brush is made from highly durable body-safe hypoallergenic silicone and is non-porous to resist bacteria build-up making it 35 x more hygienic than nylon-bristled brushes and never requiring any replacement brush heads. lightweight completely waterproof for use in the bath or shower and with 2 speed settings the mini is designed around your life with each full charge lasting up to 300 uses. specification type skin cleansing exfoliation. system power source battery. brand forever. package 1 x forever silicone ultrasonic facial cleanser.',
    'polynomial equations calculator',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4272, 0.5532],
#         [0.4272, 1.0000, 0.5415],
#         [0.5532, 0.5415, 1.0000]])
```
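Because the model ends with a Normalize() module, the similarity matrix above is just the matrix product of the (unit-length) embeddings. A small NumPy sketch of that equivalence, using random stand-in vectors rather than real model output:

```python
import numpy as np

# stand-in embeddings: 3 "sentences", 384 dims, as encode() would return
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 384))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # Normalize() step

# cosine similarity of unit vectors is a plain dot product
similarities = embeddings @ embeddings.T
print(similarities.shape)                        # (3, 3)
print(np.allclose(np.diag(similarities), 1.0))   # True: each vector vs itself
```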

## Training Details

### Training Dataset

#### pairs_with_scores_v32

* Dataset: [pairs_with_scores_v32](https://huggingface.co/datasets/KhaledReda/pairs_with_scores_v32) at [d05ef20](https://huggingface.co/datasets/KhaledReda/pairs_with_scores_v32/tree/d05ef20215d7229707966f04c9d3c5b3322d831e)
* Size: 43,059,870 training samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 | score |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | float |
  | details | <ul><li>min: 3 tokens</li><li>mean: 6.09 tokens</li><li>max: 35 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 41.16 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.28</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence1 | sentence2 | score |
  |:----------|:----------|:------|
  | <code>ovenware</code> | <code>linen shirt with button down and stand up collar, men shirt long sleeves shirt men s tops button down shirt linen shirt shirt stand up collar shirt, button down shirt linen shirt shirt stand up collar shirt, gender men mix and match generic shirt s types of fashion styles casual neckline stand up collar closure style button down sleeve style long sleeves fit regular fit linen white solid occasion casual season spring summer, linen shirt with button down long sleeves and stand up collar</code> | <code>0.0</code> |
  | <code>fries antipastoes</code> | <code>tealight candle holder, home and garden home decor home decor accessory home decor accessory, rings organizer coins organizer ceramic powder holder paper holder sand holder candle holder holder home decor tealight candle holder, candle holder holder home decor tealight candle holder, create a cozy atmosphere with this tealight candle holder. not just for candles this compact holder doubles as a convenient organizer for small items like rings coins or office supplies. all our products are made of our own mixture of ceramic powder paper sand and other sustainable materials to ensure its strength and sustainability. weight 120 gm</code> | <code>0.0</code> |
  | <code>adults bikes hybrid</code> | <code>sea salt body exfoliate and polish</code> | <code>0.0</code> |
* Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```
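CoSENTLoss with `pairwise_cos_sim` and scale 20 ranks every pair of examples in a batch: whenever one pair has a higher gold score than another, the lower-scored pair's cosine should not exceed the higher-scored pair's. A minimal pure-Python sketch of that objective (the `cosent_loss` helper and the toy numbers are illustrative, not the library implementation):

```python
import math

def cosent_loss(cos_sims, labels, scale=20.0):
    """CoSENT-style loss sketch.

    For every (i, j) with labels[i] > labels[j], add exp(scale * (cos_j - cos_i));
    the loss is log(1 + sum of those terms), so it is near 0 when the
    model's cosines already respect the gold ordering.
    """
    terms = [
        math.exp(scale * (cos_sims[j] - cos_sims[i]))
        for i in range(len(labels))
        for j in range(len(labels))
        if labels[i] > labels[j]
    ]
    return math.log1p(sum(terms))

# toy batch: predicted pair cosines and the dataset's gold scores
cos = [0.9, 0.2, 0.6]
gold = [1.0, 0.0, 0.25]
print(cosent_loss(cos, gold))  # small: the cosine ranking agrees with gold
```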

### Evaluation Dataset

#### pairs_with_scores_v32

* Dataset: [pairs_with_scores_v32](https://huggingface.co/datasets/KhaledReda/pairs_with_scores_v32) at [d05ef20](https://huggingface.co/datasets/KhaledReda/pairs_with_scores_v32/tree/d05ef20215d7229707966f04c9d3c5b3322d831e)
* Size: 216,382 evaluation samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 | score |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | float |
  | details | <ul><li>min: 3 tokens</li><li>mean: 6.12 tokens</li><li>max: 31 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 41.67 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.27</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence1 | sentence2 | score |
  |:----------|:----------|:------|
  | <code>cheese sauce fries</code> | <code>printed cotton scarf with fabric tassels, women scarf voile scarf shawls fabric scarf printed scarf scarf tassels scarf, fabric scarf printed scarf scarf tassels scarf, gender women mix and match generic scarf cotton black printed, printed voile scarf with fabric tassels</code> | <code>0.0</code> |
  | <code>camel bag</code> | <code>camel tank top</code> | <code>0.25</code> |
  | <code>scrunchie</code> | <code>spoiled babe set</code> | <code>0.75</code> |
* Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
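With `warmup_ratio`: 0.1 and the default linear scheduler, the learning rate ramps from 0 up to 2e-05 over the first 10% of optimizer steps and then decays linearly back to 0. A small pure-Python sketch of that schedule (the function name and the 1,000-step total are illustrative; the actual run logged well over 300,000 steps):

```python
def linear_schedule_with_warmup(step, total_steps, peak_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total = 1000  # illustrative step count
print(linear_schedule_with_warmup(0, total))     # 0.0 (start of warmup)
print(linear_schedule_with_warmup(100, total))   # 2e-05 (peak, end of warmup)
print(linear_schedule_with_warmup(1000, total))  # 0.0 (fully decayed)
```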

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss |
|:------:|:------:|:-------------:|
| 0.8505 | 286100 | 6.328 |
| 0.8508 | 286200 | 6.4337 |
| 0.8511 | 286300 | 6.3625 |
| 0.8514 | 286400 | 6.3524 |
| 0.8516 | 286500 | 6.324 |
| 0.8519 | 286600 | 6.3453 |
| 0.8522 | 286700 | 6.4266 |
| 0.8525 | 286800 | 6.3666 |
| 0.8528 | 286900 | 6.376 |
| 0.8531 | 287000 | 6.396 |
| 0.8534 | 287100 | 6.3725 |
| 0.8537 | 287200 | 6.3696 |
| 0.8540 | 287300 | 6.4024 |
| 0.8543 | 287400 | 6.3841 |
| 0.8546 | 287500 | 6.3344 |
| 0.8549 | 287600 | 6.4528 |
| 0.8552 | 287700 | 6.4161 |
| 0.8555 | 287800 | 6.3852 |
| 0.8558 | 287900 | 6.3908 |
| 0.8561 | 288000 | 6.3747 |
| 0.8564 | 288100 | 6.3385 |
| 0.8567 | 288200 | 6.3625 |
| 0.8570 | 288300 | 6.4054 |
| 0.8573 | 288400 | 6.3758 |
| 0.8576 | 288500 | 6.3604 |
| 0.8579 | 288600 | 6.3866 |
| 0.8582 | 288700 | 6.4301 |
| 0.8585 | 288800 | 6.4232 |
| 0.8588 | 288900 | 6.3781 |
| 0.8591 | 289000 | 6.4106 |
| 0.8594 | 289100 | 6.3579 |
| 0.8597 | 289200 | 6.3691 |
| 0.8600 | 289300 | 6.4222 |
| 0.8603 | 289400 | 6.3994 |
| 0.8606 | 289500 | 6.3615 |
| 0.8609 | 289600 | 6.406 |
| 0.8612 | 289700 | 6.3942 |
| 0.8615 | 289800 | 6.3811 |
| 0.8618 | 289900 | 6.3702 |
| 0.8621 | 290000 | 6.3925 |
| 0.8624 | 290100 | 6.4173 |
| 0.8626 | 290200 | 6.4267 |
| 0.8629 | 290300 | 6.3989 |
| 0.8632 | 290400 | 6.3715 |
| 0.8635 | 290500 | 6.3582 |
| 0.8638 | 290600 | 6.3659 |
| 0.8641 | 290700 | 6.3671 |
| 0.8644 | 290800 | 6.3837 |
| 0.8647 | 290900 | 6.4486 |
| 0.8650 | 291000 | 6.3993 |
| 0.8653 | 291100 | 6.3985 |
| 0.8656 | 291200 | 6.3982 |
| 0.8659 | 291300 | 6.3297 |
| 0.8662 | 291400 | 6.3726 |
| 0.8665 | 291500 | 6.3452 |
| 0.8668 | 291600 | 6.3704 |
| 0.8671 | 291700 | 6.3002 |
| 0.8674 | 291800 | 6.4093 |
| 0.8677 | 291900 | 6.4129 |
| 0.8680 | 292000 | 6.4081 |
| 0.8683 | 292100 | 6.4361 |
| 0.8686 | 292200 | 6.4205 |
| 0.8689 | 292300 | 6.4255 |
| 0.8692 | 292400 | 6.4122 |
| 0.8695 | 292500 | 6.4621 |
| 0.8698 | 292600 | 6.364 |
| 0.8701 | 292700 | 6.4073 |
| 0.8704 | 292800 | 6.3409 |
| 0.8707 | 292900 | 6.3107 |
| 0.8710 | 293000 | 6.3727 |
| 0.8713 | 293100 | 6.3447 |
| 0.8716 | 293200 | 6.4191 |
| 0.8719 | 293300 | 6.3492 |
| 0.8722 | 293400 | 6.3553 |
| 0.8725 | 293500 | 6.3768 |
| 0.8728 | 293600 | 6.3378 |
| 0.8731 | 293700 | 6.3998 |
| 0.8733 | 293800 | 6.438 |
| 0.8736 | 293900 | 6.34 |
| 0.8739 | 294000 | 6.4061 |
| 0.8742 | 294100 | 6.4552 |
| 0.8745 | 294200 | 6.2997 |
| 0.8748 | 294300 | 6.4018 |
| 0.8751 | 294400 | 6.412 |
| 0.8754 | 294500 | 6.3756 |
| 0.8757 | 294600 | 6.3983 |
| 0.8760 | 294700 | 6.3758 |
| 0.8763 | 294800 | 6.3707 |
| 0.8766 | 294900 | 6.3802 |
| 0.8769 | 295000 | 6.3767 |
| 0.8772 | 295100 | 6.4037 |
| 0.8775 | 295200 | 6.3425 |
| 0.8778 | 295300 | 6.3655 |
| 0.8781 | 295400 | 6.4575 |
| 0.8784 | 295500 | 6.4242 |
| 0.8787 | 295600 | 6.365 |
| 0.8790 | 295700 | 6.373 |
| 0.8793 | 295800 | 6.3766 |
| 0.8796 | 295900 | 6.3835 |
| 0.8799 | 296000 | 6.4327 |
| 0.8802 | 296100 | 6.3799 |
| 0.8805 | 296200 | 6.41 |
| 0.8808 | 296300 | 6.3092 |
| 0.8811 | 296400 | 6.4133 |
| 0.8814 | 296500 | 6.3952 |
| 0.8817 | 296600 | 6.3937 |
| 0.8820 | 296700 | 6.3204 |
| 0.8823 | 296800 | 6.4072 |
| 0.8826 | 296900 | 6.3577 |
| 0.8829 | 297000 | 6.3966 |
| 0.8832 | 297100 | 6.3906 |
| 0.8835 | 297200 | 6.3871 |
| 0.8838 | 297300 | 6.3546 |
| 0.8841 | 297400 | 6.3874 |
| 0.8843 | 297500 | 6.4042 |
| 0.8846 | 297600 | 6.3963 |
| 0.8849 | 297700 | 6.3708 |
| 0.8852 | 297800 | 6.3269 |
| 0.8855 | 297900 | 6.3554 |
| 0.8858 | 298000 | 6.3884 |
| 0.8861 | 298100 | 6.3645 |
| 0.8864 | 298200 | 6.4203 |
| 0.8867 | 298300 | 6.3827 |
| 0.8870 | 298400 | 6.3947 |
| 0.8873 | 298500 | 6.3989 |
| 0.8876 | 298600 | 6.3454 |
| 0.8879 | 298700 | 6.4956 |
| 0.8882 | 298800 | 6.3975 |
| 0.8885 | 298900 | 6.3643 |
| 0.8888 | 299000 | 6.3606 |
| 0.8891 | 299100 | 6.4184 |
| 0.8894 | 299200 | 6.3975 |
| 0.8897 | 299300 | 6.3836 |
| 0.8900 | 299400 | 6.3696 |
| 0.8903 | 299500 | 6.3567 |
| 0.8906 | 299600 | 6.3142 |
| 0.8909 | 299700 | 6.3703 |
| 0.8912 | 299800 | 6.3126 |
| 0.8915 | 299900 | 6.3847 |
| 0.8918 | 300000 | 6.3761 |
| 0.8921 | 300100 | 6.3673 |
| 0.8924 | 300200 | 6.3426 |
| 0.8927 | 300300 | 6.4366 |
| 0.8930 | 300400 | 6.3626 |
| 0.8933 | 300500 | 6.3549 |
| 0.8936 | 300600 | 6.3696 |
| 0.8939 | 300700 | 6.4061 |
| 0.8942 | 300800 | 6.4622 |
| 0.8945 | 300900 | 6.3447 |
| 0.8948 | 301000 | 6.386 |
| 0.8950 | 301100 | 6.3719 |
| 0.8953 | 301200 | 6.4033 |
| 0.8956 | 301300 | 6.3635 |
| 0.8959 | 301400 | 6.3179 |
| 0.8962 | 301500 | 6.3273 |
| 0.8965 | 301600 | 6.4156 |
| 0.8968 | 301700 | 6.3601 |
| 0.8971 | 301800 | 6.3754 |
| 0.8974 | 301900 | 6.4151 |
| 0.8977 | 302000 | 6.3435 |
| 0.8980 | 302100 | 6.3745 |
| 0.8983 | 302200 | 6.3563 |
| 0.8986 | 302300 | 6.3999 |
| 0.8989 | 302400 | 6.349 |
| 0.8992 | 302500 | 6.3886 |
| 0.8995 | 302600 | 6.387 |
| 0.8998 | 302700 | 6.3786 |
| 0.9001 | 302800 | 6.4126 |
| 0.9004 | 302900 | 6.3439 |
| 0.9007 | 303000 | 6.3376 |
| 0.9010 | 303100 | 6.3512 |
| 0.9013 | 303200 | 6.4281 |
| 0.9016 | 303300 | 6.3999 |
| 0.9019 | 303400 | 6.3757 |
| 0.9022 | 303500 | 6.3297 |
| 0.9025 | 303600 | 6.4042 |
| 0.9028 | 303700 | 6.3001 |
| 0.9031 | 303800 | 6.3028 |
| 0.9034 | 303900 | 6.3969 |
| 0.9037 | 304000 | 6.2983 |
| 0.9040 | 304100 | 6.3043 |
| 0.9043 | 304200 | 6.4063 |
| 0.9046 | 304300 | 6.3829 |
| 0.9049 | 304400 | 6.3786 |
| 0.9052 | 304500 | 6.4584 |
| 0.9055 | 304600 | 6.4324 |
| 0.9058 | 304700 | 6.4425 |
| 0.9060 | 304800 | 6.3995 |
| 0.9063 | 304900 | 6.3952 |
| 0.9066 | 305000 | 6.4232 |
| 0.9069 | 305100 | 6.3573 |
| 0.9072 | 305200 | 6.3585 |
| 0.9075 | 305300 | 6.4424 |
| 0.9078 | 305400 | 6.2995 |
| 0.9081 | 305500 | 6.3571 |
| 0.9084 | 305600 | 6.3175 |
| 0.9087 | 305700 | 6.3624 |
| 0.9090 | 305800 | 6.3954 |
| 0.9093 | 305900 | 6.4152 |
| 0.9096 | 306000 | 6.4059 |
| 0.9099 | 306100 | 6.4016 |
| 0.9102 | 306200 | 6.3976 |
| 0.9105 | 306300 | 6.3498 |
| 0.9108 | 306400 | 6.3638 |
| 0.9111 | 306500 | 6.4264 |
| 0.9114 | 306600 | 6.3982 |
| 0.9117 | 306700 | 6.3428 |
| 0.9120 | 306800 | 6.3601 |
| 0.9123 | 306900 | 6.3875 |
| 0.9126 | 307000 | 6.4401 |
| 0.9129 | 307100 | 6.3931 |
| 0.9132 | 307200 | 6.3875 |
| 0.9135 | 307300 | 6.3293 |
| 0.9138 | 307400 | 6.3539 |
| 0.9141 | 307500 | 6.3619 |
| 0.9144 | 307600 | 6.364 |
| 0.9147 | 307700 | 6.4567 |
| 0.9150 | 307800 | 6.393 |
| 0.9153 | 307900 | 6.4153 |
| 0.9156 | 308000 | 6.3644 |
| 0.9159 | 308100 | 6.3899 |
| 0.9162 | 308200 | 6.3986 |
| 0.9165 | 308300 | 6.3766 |
| 0.9167 | 308400 | 6.4279 |
| 0.9170 | 308500 | 6.3578 |
| 0.9173 | 308600 | 6.3891 |
| 0.9176 | 308700 | 6.3029 |
| 0.9179 | 308800 | 6.3688 |
| 0.9182 | 308900 | 6.3787 |
| 0.9185 | 309000 | 6.3935 |
| 0.9188 | 309100 | 6.4319 |
| 0.9191 | 309200 | 6.2945 |
| 0.9194 | 309300 | 6.3871 |
| 0.9197 | 309400 | 6.3338 |
| 0.9200 | 309500 | 6.3654 |
| 0.9203 | 309600 | 6.4207 |
| 0.9206 | 309700 | 6.3809 |
| 0.9209 | 309800 | 6.3798 |
| 0.9212 | 309900 | 6.3974 |
| 0.9215 | 310000 | 6.334 |
| 0.9218 | 310100 | 6.376 |
| 0.9221 | 310200 | 6.3939 |
| 0.9224 | 310300 | 6.4144 |
| 0.9227 | 310400 | 6.4375 |
| 0.9230 | 310500 | 6.316 |
| 0.9233 | 310600 | 6.3346 |
| 0.9236 | 310700 | 6.3766 |
| 0.9239 | 310800 | 6.3564 |
| 0.9242 | 310900 | 6.3643 |
| 0.9245 | 311000 | 6.3627 |
| 0.9248 | 311100 | 6.4283 |
| 0.9251 | 311200 | 6.3179 |
| 0.9254 | 311300 | 6.4113 |
| 0.9257 | 311400 | 6.3703 |
| 0.9260 | 311500 | 6.3388 |
| 0.9263 | 311600 | 6.3997 |
| 0.9266 | 311700 | 6.3813 |
| 0.9269 | 311800 | 6.3723 |
| 0.9272 | 311900 | 6.3556 |
| 0.9275 | 312000 | 6.3522 |
| 0.9277 | 312100 | 6.3661 |
| 0.9280 | 312200 | 6.405 |
| 0.9283 | 312300 | 6.4031 |
| 0.9286 | 312400 | 6.4125 |
| 0.9289 | 312500 | 6.3225 |
| 0.9292 | 312600 | 6.3887 |
| 0.9295 | 312700 | 6.3368 |
| 0.9298 | 312800 | 6.3323 |
| 0.9301 | 312900 | 6.4433 |
| 0.9304 | 313000 | 6.4155 |
| 0.9307 | 313100 | 6.3448 |
| 0.9310 | 313200 | 6.3775 |
| 0.9313 | 313300 | 6.3736 |
| 0.9316 | 313400 | 6.3611 |
| 0.9319 | 313500 | 6.3988 |
| 0.9322 | 313600 | 6.3243 |
| 0.9325 | 313700 | 6.4137 |
| 0.9328 | 313800 | 6.3663 |
| 0.9331 | 313900 | 6.3742 |
| 0.9334 | 314000 | 6.4021 |
| 0.9337 | 314100 | 6.4171 |
| 0.9340 | 314200 | 6.3948 |
| 0.9343 | 314300 | 6.3916 |
| 0.9346 | 314400 | 6.365 |
| 0.9349 | 314500 | 6.3479 |
| 0.9352 | 314600 | 6.3588 |
| 0.9355 | 314700 | 6.3247 |
| 0.9358 | 314800 | 6.3584 |
| 0.9361 | 314900 | 6.3436 |
| 0.9364 | 315000 | 6.3958 |
| 0.9367 | 315100 | 6.3424 |
| 0.9370 | 315200 | 6.3814 |
| 0.9373 | 315300 | 6.3612 |
| 0.9376 | 315400 | 6.3889 |
| 0.9379 | 315500 | 6.3591 |
| 0.9382 | 315600 | 6.3856 |
| 0.9384 | 315700 | 6.3594 |
| 0.9387 | 315800 | 6.3737 |
| 0.9390 | 315900 | 6.4489 |
| 0.9393 | 316000 | 6.2902 |
| 0.9396 | 316100 | 6.3517 |
| 0.9399 | 316200 | 6.4662 |
| 0.9402 | 316300 | 6.3684 |
| 0.9405 | 316400 | 6.362 |
| 0.9408 | 316500 | 6.3492 |
| 0.9411 | 316600 | 6.4018 |
| 0.9414 | 316700 | 6.3709 |
| 0.9417 | 316800 | 6.4048 |
| 0.9420 | 316900 | 6.3547 |
| 0.9423 | 317000 | 6.2638 |
| 0.9426 | 317100 | 6.435 |
| 0.9429 | 317200 | 6.4028 |
| 0.9432 | 317300 | 6.39 |
| 0.9435 | 317400 | 6.3688 |
| 0.9438 | 317500 | 6.3801 |
| 0.9441 | 317600 | 6.3609 |
| 0.9444 | 317700 | 6.3583 |
| 0.9447 | 317800 | 6.3339 |
| 0.9450 | 317900 | 6.3804 |
| 0.9453 | 318000 | 6.3718 |
| 0.9456 | 318100 | 6.3434 |
| 0.9459 | 318200 | 6.3765 |
| 0.9462 | 318300 | 6.3468 |
| 0.9465 | 318400 | 6.3253 |
| 0.9468 | 318500 | 6.3868 |
| 0.9471 | 318600 | 6.3906 |
| 0.9474 | 318700 | 6.4371 |
| 0.9477 | 318800 | 6.3737 |
| 0.9480 | 318900 | 6.3332 |
| 0.9483 | 319000 | 6.3698 |
| 0.9486 | 319100 | 6.3748 |
| 0.9489 | 319200 | 6.4309 |
| 0.9492 | 319300 | 6.3757 |
| 0.9494 | 319400 | 6.3615 |
| 0.9497 | 319500 | 6.366 |
| 0.9500 | 319600 | 6.3574 |
| 0.9503 | 319700 | 6.3742 |
| 0.9506 | 319800 | 6.3461 |
711
+ | 0.9509 | 319900 | 6.3063 |
712
+ | 0.9512 | 320000 | 6.3504 |
713
+ | 0.9515 | 320100 | 6.4292 |
714
+ | 0.9518 | 320200 | 6.3603 |
715
+ | 0.9521 | 320300 | 6.3664 |
716
+ | 0.9524 | 320400 | 6.4065 |
717
+ | 0.9527 | 320500 | 6.3696 |
718
+ | 0.9530 | 320600 | 6.4512 |
719
+ | 0.9533 | 320700 | 6.3765 |
720
+ | 0.9536 | 320800 | 6.319 |
721
+ | 0.9539 | 320900 | 6.3873 |
722
+ | 0.9542 | 321000 | 6.4429 |
723
+ | 0.9545 | 321100 | 6.4334 |
724
+ | 0.9548 | 321200 | 6.3168 |
725
+ | 0.9551 | 321300 | 6.4112 |
726
+ | 0.9554 | 321400 | 6.4135 |
727
+ | 0.9557 | 321500 | 6.3718 |
728
+ | 0.9560 | 321600 | 6.393 |
729
+ | 0.9563 | 321700 | 6.331 |
730
+ | 0.9566 | 321800 | 6.3811 |
731
+ | 0.9569 | 321900 | 6.3748 |
732
+ | 0.9572 | 322000 | 6.4013 |
733
+ | 0.9575 | 322100 | 6.3281 |
734
+ | 0.9578 | 322200 | 6.3634 |
735
+ | 0.9581 | 322300 | 6.3473 |
736
+ | 0.9584 | 322400 | 6.3429 |
737
+ | 0.9587 | 322500 | 6.3837 |
738
+ | 0.9590 | 322600 | 6.3855 |
739
+ | 0.9593 | 322700 | 6.3825 |
740
+ | 0.9596 | 322800 | 6.4182 |
741
+ | 0.9599 | 322900 | 6.3611 |
742
+ | 0.9601 | 323000 | 6.4276 |
743
+ | 0.9604 | 323100 | 6.3329 |
744
+ | 0.9607 | 323200 | 6.3764 |
745
+ | 0.9610 | 323300 | 6.3382 |
746
+ | 0.9613 | 323400 | 6.3084 |
747
+ | 0.9616 | 323500 | 6.3884 |
748
+ | 0.9619 | 323600 | 6.3733 |
749
+ | 0.9622 | 323700 | 6.3145 |
750
+ | 0.9625 | 323800 | 6.4082 |
751
+ | 0.9628 | 323900 | 6.2616 |
752
+ | 0.9631 | 324000 | 6.3564 |
753
+ | 0.9634 | 324100 | 6.4159 |
754
+ | 0.9637 | 324200 | 6.3898 |
755
+ | 0.9640 | 324300 | 6.3522 |
756
+ | 0.9643 | 324400 | 6.3905 |
757
+ | 0.9646 | 324500 | 6.3628 |
758
+ | 0.9649 | 324600 | 6.3219 |
759
+ | 0.9652 | 324700 | 6.4094 |
760
+ | 0.9655 | 324800 | 6.4043 |
761
+ | 0.9658 | 324900 | 6.405 |
762
+ | 0.9661 | 325000 | 6.3272 |
763
+ | 0.9664 | 325100 | 6.3852 |
764
+ | 0.9667 | 325200 | 6.4279 |
765
+ | 0.9670 | 325300 | 6.385 |
766
+ | 0.9673 | 325400 | 6.432 |
767
+ | 0.9676 | 325500 | 6.4317 |
768
+ | 0.9679 | 325600 | 6.3754 |
769
+ | 0.9682 | 325700 | 6.4305 |
770
+ | 0.9685 | 325800 | 6.313 |
771
+ | 0.9688 | 325900 | 6.3338 |
772
+ | 0.9691 | 326000 | 6.4271 |
773
+ | 0.9694 | 326100 | 6.4092 |
774
+ | 0.9697 | 326200 | 6.3071 |
775
+ | 0.9700 | 326300 | 6.3712 |
776
+ | 0.9703 | 326400 | 6.3486 |
777
+ | 0.9706 | 326500 | 6.3041 |
778
+ | 0.9709 | 326600 | 6.3464 |
779
+ | 0.9711 | 326700 | 6.3351 |
780
+ | 0.9714 | 326800 | 6.3166 |
781
+ | 0.9717 | 326900 | 6.3343 |
782
+ | 0.9720 | 327000 | 6.403 |
783
+ | 0.9723 | 327100 | 6.3923 |
784
+ | 0.9726 | 327200 | 6.4203 |
785
+ | 0.9729 | 327300 | 6.3716 |
786
+ | 0.9732 | 327400 | 6.3341 |
787
+ | 0.9735 | 327500 | 6.3253 |
788
+ | 0.9738 | 327600 | 6.3648 |
789
+ | 0.9741 | 327700 | 6.4148 |
790
+ | 0.9744 | 327800 | 6.3431 |
791
+ | 0.9747 | 327900 | 6.3149 |
792
+ | 0.9750 | 328000 | 6.3697 |
793
+ | 0.9753 | 328100 | 6.3777 |
794
+ | 0.9756 | 328200 | 6.3446 |
795
+ | 0.9759 | 328300 | 6.3484 |
796
+ | 0.9762 | 328400 | 6.3118 |
797
+ | 0.9765 | 328500 | 6.3657 |
798
+ | 0.9768 | 328600 | 6.4045 |
799
+ | 0.9771 | 328700 | 6.3776 |
800
+ | 0.9774 | 328800 | 6.3609 |
801
+ | 0.9777 | 328900 | 6.3024 |
802
+ | 0.9780 | 329000 | 6.4298 |
803
+ | 0.9783 | 329100 | 6.3598 |
804
+ | 0.9786 | 329200 | 6.3555 |
805
+ | 0.9789 | 329300 | 6.3915 |
806
+ | 0.9792 | 329400 | 6.3807 |
807
+ | 0.9795 | 329500 | 6.2983 |
808
+ | 0.9798 | 329600 | 6.371 |
809
+ | 0.9801 | 329700 | 6.3647 |
810
+ | 0.9804 | 329800 | 6.3892 |
811
+ | 0.9807 | 329900 | 6.3543 |
812
+ | 0.9810 | 330000 | 6.4178 |
813
+ | 0.9813 | 330100 | 6.3228 |
814
+ | 0.9816 | 330200 | 6.3684 |
815
+ | 0.9818 | 330300 | 6.3711 |
816
+ | 0.9821 | 330400 | 6.3717 |
817
+ | 0.9824 | 330500 | 6.3976 |
818
+ | 0.9827 | 330600 | 6.3483 |
819
+ | 0.9830 | 330700 | 6.335 |
820
+ | 0.9833 | 330800 | 6.385 |
821
+ | 0.9836 | 330900 | 6.3772 |
822
+ | 0.9839 | 331000 | 6.3027 |
823
+ | 0.9842 | 331100 | 6.3634 |
824
+ | 0.9845 | 331200 | 6.3261 |
825
+ | 0.9848 | 331300 | 6.3708 |
826
+ | 0.9851 | 331400 | 6.3993 |
827
+ | 0.9854 | 331500 | 6.3759 |
828
+ | 0.9857 | 331600 | 6.3485 |
829
+ | 0.9860 | 331700 | 6.3717 |
830
+ | 0.9863 | 331800 | 6.3776 |
831
+ | 0.9866 | 331900 | 6.4366 |
832
+ | 0.9869 | 332000 | 6.4023 |
833
+ | 0.9872 | 332100 | 6.3978 |
834
+ | 0.9875 | 332200 | 6.3382 |
835
+ | 0.9878 | 332300 | 6.3474 |
836
+ | 0.9881 | 332400 | 6.4122 |
837
+ | 0.9884 | 332500 | 6.3809 |
838
+ | 0.9887 | 332600 | 6.322 |
839
+ | 0.9890 | 332700 | 6.344 |
840
+ | 0.9893 | 332800 | 6.2637 |
841
+ | 0.9896 | 332900 | 6.4016 |
842
+ | 0.9899 | 333000 | 6.3826 |
843
+ | 0.9902 | 333100 | 6.4467 |
844
+ | 0.9905 | 333200 | 6.4596 |
845
+ | 0.9908 | 333300 | 6.3065 |
846
+ | 0.9911 | 333400 | 6.4057 |
847
+ | 0.9914 | 333500 | 6.435 |
848
+ | 0.9917 | 333600 | 6.3398 |
849
+ | 0.9920 | 333700 | 6.3741 |
850
+ | 0.9923 | 333800 | 6.3069 |
851
+ | 0.9926 | 333900 | 6.3457 |
852
+ | 0.9928 | 334000 | 6.3884 |
853
+ | 0.9931 | 334100 | 6.4078 |
854
+ | 0.9934 | 334200 | 6.3242 |
855
+ | 0.9937 | 334300 | 6.3621 |
856
+ | 0.9940 | 334400 | 6.3515 |
857
+ | 0.9943 | 334500 | 6.4017 |
858
+ | 0.9946 | 334600 | 6.4629 |
859
+ | 0.9949 | 334700 | 6.3686 |
860
+ | 0.9952 | 334800 | 6.3224 |
861
+ | 0.9955 | 334900 | 6.386 |
862
+ | 0.9958 | 335000 | 6.3899 |
863
+ | 0.9961 | 335100 | 6.3488 |
864
+ | 0.9964 | 335200 | 6.4117 |
865
+ | 0.9967 | 335300 | 6.3988 |
866
+ | 0.9970 | 335400 | 6.3536 |
867
+ | 0.9973 | 335500 | 6.3861 |
868
+ | 0.9976 | 335600 | 6.3383 |
869
+ | 0.9979 | 335700 | 6.3848 |
870
+ | 0.9982 | 335800 | 6.4582 |
871
+ | 0.9985 | 335900 | 6.3452 |
872
+ | 0.9988 | 336000 | 6.3651 |
873
+ | 0.9991 | 336100 | 6.3704 |
874
+ | 0.9994 | 336200 | 6.3801 |
875
+ | 0.9997 | 336300 | 6.3701 |
876
+ | 1.0000 | 336400 | 6.4452 |
877
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.12.3
+ - Sentence Transformers: 5.1.0
+ - Transformers: 4.55.4
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.10.1
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.4
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### CoSENTLoss
+ ```bibtex
+ @online{kexuefm-8847,
+ title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
+ author={Su Jianlin},
+ year={2022},
+ month={Jan},
+ url={https://kexue.fm/archives/8847},
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 384,
+ "initializer_range": 0.02,
+ "intermediate_size": 1536,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 6,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.55.4",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "__version__": {
+ "sentence_transformers": "5.1.0",
+ "transformers": "4.55.4",
+ "pytorch": "2.6.0+cu124"
+ },
+ "model_type": "SentenceTransformer",
+ "prompts": {
+ "query": "",
+ "document": ""
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
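
Since `similarity_fn_name` is `"cosine"`, scores between this model's embeddings are cosine similarities. A minimal stdlib-only sketch of that computation (the 4-d vectors below are hypothetical placeholders; real embeddings from this model are 384-d):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-d "embeddings" purely for illustration.
u = [1.0, 0.0, 1.0, 0.0]
v = [1.0, 1.0, 0.0, 0.0]
print(cosine_similarity(u, v))  # 0.5
```

Note that because the pipeline ends in a Normalize module, embeddings are already unit-length, so cosine similarity reduces to a plain dot product at query time.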
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b728441a78d9a166a21e80d1e6df0783bcdd569889dabf6d66c4b18b057b6a07
+ size 90864192
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
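
The three modules above define the inference pipeline: transformer token embeddings → mean pooling (per `1_Pooling/config.json`, which sets `pooling_mode_mean_tokens: true`) → L2 normalization. A rough stdlib-only sketch of the pooling and normalization steps on dummy token vectors (3-d for illustration; the real `word_embedding_dimension` is 384):

```python
import math

def mean_pool(token_embeddings, attention_mask):
    # Average token vectors, counting only non-padding positions.
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for i, x in enumerate(vec):
                sums[i] += x
    return [s / count for s in sums]

def l2_normalize(vec):
    # Scale the pooled vector to unit length (the Normalize module).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Two real tokens plus one padding token that the mask excludes.
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 0.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)              # [2.0, 2.0, 1.0]
sentence_embedding = l2_normalize(pooled)     # unit-length vector
```

This mirrors what `SentenceTransformer.encode` does internally for this module stack; in practice you would just call `encode` rather than reimplement it.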
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 256,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "max_length": 128,
+ "model_max_length": 256,
+ "never_split": null,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff