IrvinTopi committed
Commit 2f467ca · verified · 1 Parent(s): 1f52183

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,805 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:12753278
- loss:MarginMSELoss
base_model: PaDaS-Lab/xlm-roberta-base-msmarco
widget:
- source_sentence: Hur gammal måste jag vara för att betta på Melodifestivalen?
  sentences:
  - För att registrera ett spelkonto och betta online måste du vara över 18 år i Sverige.
  - Ja, för att delta i PrisPicks plattformen måste man vara minst 18 år gammal. Denna
    åldersgräns kan vara högre i vissa jurisdiktioner, så det rekommenderas att potentiella
    användare kontrollerar de specifika ålderskraven för deras plats.
  - Bu sorunun cevabı barındırdığınız sistemin büyüklüğüne, trafiğine ve optimizasyonuna
    göre değişmektedir. Çok iyi optimize edilmiş bir sisteminiz (script vb.) ve gelen
    trafiği optimum düzeyde karşılayacak donanımınız varsa minimum düzeyde bir sunucu
    yeterli olacaktır. Ancak iyi optimize edilmemiş bir sistem ve sunucu için farklı
    alternatifler aramanız gerekebilir. En iyi sunucu nedir sorusunun cevabı, sisteminize
    ve trafiğinize göre değişebilir.
- source_sentence: क्या लॉजिकल रीजनिंग यूजीसी नेट परीक्षा का हिस्सा है?
  sentences:
  - हां, लॉजिकल रीजनिंग यूजीसी नेट परीक्षा का हिस्सा है।
  - 'Per il momento, non sono ancora entrate in vigore sul massimale minimo per le
    polizze rc professionale medici. Teniamo conto però di una cosa: se si lavora
    (e si è lavorato nei dieci anni precedenti) esclusivamente come dipendenti o specializzandi
    presso l’SSN, dobbiamo sapere che la rivalsa massima dell’SSN sarà plafonata al
    triplo del reddito annuo lordo del medico.

    Se invece si lavora in libera professione, non c’è alcun limite. Consigliamo comunque
    di scegliere massimali non inferiori al milione di euro.'
  - यूजीसी नेट की परीक्षा साल में दो बार आयोजित की जाती है। प्रथम परीक्षा जून में
    और द्वितीय परीक्षा दिसंबर महीने में नेशनल टेस्टिंग एजेंसी द्वारा आयोजित की जाती
    है।
- source_sentence: Car Accident Lawyer in Denver, CO
  sentences:
  - A Vinsa Telêmaco Borba é uma empresa com sede no município de e fica localizada
    na Al. Washington Luiz, 490 – Alto das Oliveiras – Telêmaco Borba – PR.
  - When you are in need of a skilled car accident lawyer, a lawyer in the Denver,
    CO area, don’t wait to talk to a lawyer from The Law Offices of Cliff Enten. With
    years of legal experience, they have provided excellent results for their clients
    in many types of personal injury areas. They are knowledgeable about the tactics
    the other party will use to get the highest possible compensation amount. For
    more information set up an appointment with them now.
  - After seeking immediate medical attention for your injuries, you may have your
    case evaluated by a professional Denver personal injury lawyer at Mintz Law Firm
    and obtain expert legal advice. Regardless of the type of accident or event and
    the apparent extent of your injuries, a lawyer can help you pursue the compensation
    owed to you because the negligent parties are responsible for the damage. After
    any car accident, slip and fall, or animal bite, don’t delay contacting our legal
    team to help you strategize and take care of your case.
- source_sentence: Wie wird Blood Suckers gespielt?
  sentences:
  - Ja, die erste Version von Blood Suckers war so erfolgreich, dass es mittlerweile
    sogar eine zweite Variante gibt. Der Automat Blood Suckers 2 wurde 2017 veröffentlicht.
    Hier gibt es das gleiche Motto und 25 Gewinnlinien. Zwar wird die Geschichte diesmal
    weiter erzählt.
  - La bozza del prodotto ordinato ti arriverà entro 24 ore dal tuo acquisto tramite
    e-mail. Se non riesci a visualizzarla ti consigliamo di controllare nello spam
    della tua posta elettronica. Non ti preoccupare puoi trovare la tua bozza anche
    nella sezione i miei ordini. In corrispondenza del prodotto acquistato troverai
    un pulsante
  - Sie spielen auf einem Raster von 5 Walzen und drei Reihen mit 25 Gewinnlinien.
    Beim Blood Suckers Spiel gewinnen Sie von links nach rechts.
- source_sentence: Kiedy rozpoczyna się kurs?
  sentences:
  - 'Para convertirte en representante de ventas debes entender lo que son los productos
    de tecnología o del sector turístico para vender de manera efectiva. Utiliza tus
    conexiones y conocimiento, obtén clientes y genera ingresos. Debes familiarizarte
    con los procesos de ventas y marketing de productos de tecnología y/o de viaje,
    y estar activamente preparado para promover la plataforma moonstride.

    Para unirte a nuestro programa, tu negocio debe ser legítimo y estar bien establecido.

    moonstride realizará un proceso de selección para asegurar la calidad de nuestros
    representantes'
  - Nowe kursy grupowe zaczynają się w połowie lutego. Dokładna data jest przekazana
    kursantom tydzień przed rozpoczęciem kursu.
  - Kurs startuje dla Ciebie w momencie, kiedy się na niego zapiszesz. Chwilę później
    otrzymasz pierwszą lekcję oraz obiecany do niej BONUS w postaci e-booka „100+
    nieoczywistych pomysłów na e-sklep”. Kolejne lekcje będziemy wysyłać co 3 dni.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on PaDaS-Lab/xlm-roberta-base-msmarco

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [PaDaS-Lab/xlm-roberta-base-msmarco](https://huggingface.co/PaDaS-Lab/xlm-roberta-base-msmarco). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [PaDaS-Lab/xlm-roberta-base-msmarco](https://huggingface.co/PaDaS-Lab/xlm-roberta-base-msmarco) <!-- at revision cd02f4c38b71baa0dc6b3fcdd86a3b6bd407ef55 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
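
The Pooling module above is configured for mean-token pooling (`pooling_mode_mean_tokens: True`): token embeddings are averaged, skipping padding positions indicated by the attention mask. A minimal pure-Python sketch of that operation, using tiny made-up 2-dimensional embeddings rather than the model's real 768-dimensional outputs:

```python
# Sketch of mean-token pooling as configured in the Pooling module above.
# Positions whose attention-mask entry is 0 (padding) are excluded from the mean.
# Pure-Python illustration; the actual module operates on torch tensors.

def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for emb, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for i in range(dim):
                total[i] += emb[i]
    return [t / count for t in total]

# Three tokens, the last one padding; embedding dimension 2 for brevity
embs = [[1.0, 3.0], [3.0, 5.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(embs, mask))  # [2.0, 4.0]
```

The padding token's embedding does not influence the result, which is why the pooled sentence vector is stable across different amounts of padding in a batch.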

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Kiedy rozpoczyna się kurs?',
    'Kurs startuje dla Ciebie w momencie, kiedy się na niego zapiszesz. Chwilę później otrzymasz pierwszą lekcję oraz obiecany do niej BONUS w postaci e-booka „100+ nieoczywistych pomysłów na e-sklep”. Kolejne lekcje będziemy wysyłać co 3 dni.',
    'Nowe kursy grupowe zaczynają się w połowie lutego. Dokładna data jest przekazana kursantom tydzień przed rozpoczęciem kursu.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9920, 0.9929],
#         [0.9920, 1.0000, 0.9964],
#         [0.9929, 0.9964, 1.0000]])
```
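
Since the model's similarity function is cosine similarity, ranking passages against a query reduces to sorting by cosine score. A minimal sketch of that ranking step with hypothetical toy vectors standing in for the 768-dimensional embeddings:

```python
import math

# Sketch of cosine-similarity ranking over embeddings. The 2-dimensional
# vectors below are made-up stand-ins for real sentence embeddings.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query = [1.0, 0.0]
passages = {
    "relevant": [0.9, 0.1],
    "off-topic": [0.0, 1.0],
}
# Sort passage keys by descending similarity to the query
ranked = sorted(passages, key=lambda p: cosine(query, passages[p]), reverse=True)
print(ranked)  # ['relevant', 'off-topic']
```

In practice `model.similarity` performs the same comparison in a batched tensor operation rather than one pair at a time.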

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 12,753,278 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, <code>sentence_2</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0 | sentence_1 | sentence_2 | label |
  |:--------|:-----------|:-----------|:-----------|:------|
  | type    | string | string | string | float |
  | details | <ul><li>min: 6 tokens</li><li>mean: 15.49 tokens</li><li>max: 119 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 74.63 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 102.27 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: -0.97</li><li>mean: 0.43</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence_0 | sentence_1 | sentence_2 | label |
  |:-----------|:-----------|:-----------|:------|
  | <code>Как найти актуальное зеркало Betwinner?</code> | <code>Букмекер старается обеспечить доступ к сайту, поэтому ссылки на зеркала обновляются ежедневно. Чтобы быть в курсе всех новостей, рекомендуется подписаться на почтовую рассылку и соцсети.</code> | <code>Зеркало BetWinner можно найти в интернете</code> | <code>-0.9661979675292969</code> |
  | <code>Jakie są minimalne zakłady w bakaracie?</code> | <code>Minimalny zakład w bakaracie zależy od konkretnej gry, w którą grasz. Mini Baccarat ma zwykle niskie limity zakładów, co czyni go atrakcyjnym dla nowych graczy. Istnieją też wersje gry w bakarata dla high-rollerów, które nakładają wyższy minimalny zakład.</code> | <code>Bakarat to jedna z popularniejszych gier hazardowych. To także najprostsza karciana gra kasynowa. Bakarat charakteryzuje się także niską przewagą kasyna nad graczem. Co za tym idzie stawki wygranych w niej nie są wysokie (najlepiej wyceniany jest zakład na remis, ale wtedy przewaga kasyna bardzo wzrasta - 14,44 proc. przy grze 6 taliami kart i 14,36 proc. przy grze 8 taliami kart). Przy zakładzie na gracza kasyno ma przewagę na poziomie 1,24 proc., a przy zakładzie na bankiera - na poziomie 1,06 proc.</code> | <code>0.579345703125</code> |
  | <code>Come scegliere il massimale assicurazione professionale medici?</code> | <code>Per il momento, non sono ancora entrate in vigore sul massimale minimo per le polizze rc professionale medici. Teniamo conto però di una cosa: se si lavora (e si è lavorato nei dieci anni precedenti) esclusivamente come dipendenti o specializzandi presso l’SSN, dobbiamo sapere che la rivalsa massima dell’SSN sarà plafonata al triplo del reddito annuo lordo del medico.<br>Se invece si lavora in libera professione, non c’è alcun limite. Consigliamo comunque di scegliere massimali non inferiori al milione di euro.</code> | <code>È un modello alternativo al modello standard dell’assicurazione malattia di base. Usufruite delle stesse prestazioni del modello standard dell’assicurazione malattia di base, ma pagate un premio inferiore. In cambio, accettate di consultare in primo luogo il medico di famiglia che avete scelto. Il medico di famiglia, chiamato anche «medico di primo ricorso (MPR)», vi cura e, se necessario, vi indirizza verso uno specialista. Ciò permette di evitare inutili consulti e contribuisce a ridurre i costi sanitari. Se il vostro medico di famiglia o un altro medico vi indirizza verso uno specialista, dovete chiedere al medico di rilasciarvi un attestato, chiamato anche «buono di delega». Alcuni medici ce lo inviano elettronicamente. In caso contrario, potete chiederlo al medico che vi ha raccomandato lo specialista (basta una semplice annotazione firmata, con indicati il tipo di specialista raccomandato e la durata di validità dell’attestato). Potete inviarci tale documento per posta o tramite ...</code> | <code>0.9451904296875</code> |
* Loss: [<code>MarginMSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#marginmseloss)

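
MarginMSELoss trains the student model so that the *margin* between its query–positive and query–negative similarity scores matches a teacher's margin (here distilled into the float `label` column). A minimal sketch of the objective under assumed toy scores, not the actual training code:

```python
# Sketch of the MarginMSE objective: mean squared error between the student's
# margin, sim(query, positive) - sim(query, negative), and the teacher's margin.
# All score values below are made-up toy numbers.

def margin_mse(student_pos, student_neg, teacher_margin):
    """MSE between student margins and teacher margins over a batch."""
    losses = [
        (sp - sn - tm) ** 2
        for sp, sn, tm in zip(student_pos, student_neg, teacher_margin)
    ]
    return sum(losses) / len(losses)

# First example: student margin (0.25) already matches the teacher (0.25).
# Second example: student margin (0.5) overshoots a teacher margin of 0.0.
student_pos = [0.75, 1.0]
student_neg = [0.5, 0.5]
teacher_margin = [0.25, 0.0]
print(margin_mse(student_pos, student_neg, teacher_margin))  # 0.125
```

Because only the margin is constrained, the absolute scores are free to drift; this is why the model card's example similarities can all sit close to 1.0 while the ranking still reflects the teacher.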
### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `num_train_epochs`: 1
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss |
|:------:|:------:|:-------------:|
| 0.0025 | 500 | 3.2897 |
| 0.0050 | 1000 | 0.1515 |
| 0.0075 | 1500 | 0.1374 |
| 0.0100 | 2000 | 0.1319 |
| 0.0125 | 2500 | 0.1322 |
| 0.0151 | 3000 | 0.1294 |
| 0.0176 | 3500 | 0.1254 |
| 0.0201 | 4000 | 0.1234 |
| 0.0226 | 4500 | 0.1201 |
| 0.0251 | 5000 | 0.1196 |
| 0.0276 | 5500 | 0.1215 |
| 0.0301 | 6000 | 0.1174 |
| 0.0326 | 6500 | 0.1184 |
| 0.0351 | 7000 | 0.1176 |
| 0.0376 | 7500 | 0.1152 |
| 0.0401 | 8000 | 0.1141 |
| 0.0427 | 8500 | 0.1137 |
| 0.0452 | 9000 | 0.1144 |
| 0.0477 | 9500 | 0.1132 |
| 0.0502 | 10000 | 0.1123 |
| 0.0527 | 10500 | 0.1117 |
| 0.0552 | 11000 | 0.1117 |
| 0.0577 | 11500 | 0.1102 |
| 0.0602 | 12000 | 0.109 |
| 0.0627 | 12500 | 0.1101 |
| 0.0652 | 13000 | 0.1079 |
| 0.0677 | 13500 | 0.1106 |
| 0.0703 | 14000 | 0.1097 |
| 0.0728 | 14500 | 0.1075 |
| 0.0753 | 15000 | 0.1046 |
| 0.0778 | 15500 | 0.1078 |
| 0.0803 | 16000 | 0.1061 |
| 0.0828 | 16500 | 0.1057 |
| 0.0853 | 17000 | 0.1054 |
| 0.0878 | 17500 | 0.1067 |
| 0.0903 | 18000 | 0.1048 |
| 0.0928 | 18500 | 0.1033 |
| 0.0953 | 19000 | 0.104 |
| 0.0979 | 19500 | 0.102 |
| 0.1004 | 20000 | 0.1023 |
| 0.1029 | 20500 | 0.101 |
| 0.1054 | 21000 | 0.1035 |
| 0.1079 | 21500 | 0.102 |
| 0.1104 | 22000 | 0.1018 |
| 0.1129 | 22500 | 0.1015 |
| 0.1154 | 23000 | 0.1003 |
| 0.1179 | 23500 | 0.1005 |
| 0.1204 | 24000 | 0.0998 |
| 0.1229 | 24500 | 0.099 |
| 0.1255 | 25000 | 0.1001 |
| 0.1280 | 25500 | 0.0979 |
| 0.1305 | 26000 | 0.1001 |
| 0.1330 | 26500 | 0.0995 |
| 0.1355 | 27000 | 0.0992 |
| 0.1380 | 27500 | 0.098 |
| 0.1405 | 28000 | 0.0986 |
| 0.1430 | 28500 | 0.0987 |
| 0.1455 | 29000 | 0.0972 |
| 0.1480 | 29500 | 0.0964 |
| 0.1505 | 30000 | 0.0967 |
| 0.1531 | 30500 | 0.0969 |
| 0.1556 | 31000 | 0.0954 |
| 0.1581 | 31500 | 0.0972 |
| 0.1606 | 32000 | 0.0973 |
| 0.1631 | 32500 | 0.096 |
| 0.1656 | 33000 | 0.0952 |
| 0.1681 | 33500 | 0.0974 |
| 0.1706 | 34000 | 0.0945 |
| 0.1731 | 34500 | 0.0936 |
| 0.1756 | 35000 | 0.0945 |
| 0.1782 | 35500 | 0.0946 |
| 0.1807 | 36000 | 0.0942 |
| 0.1832 | 36500 | 0.0955 |
| 0.1857 | 37000 | 0.0948 |
| 0.1882 | 37500 | 0.0925 |
| 0.1907 | 38000 | 0.0929 |
| 0.1932 | 38500 | 0.0934 |
| 0.1957 | 39000 | 0.0939 |
| 0.1982 | 39500 | 0.0933 |
| 0.2007 | 40000 | 0.0937 |
| 0.2032 | 40500 | 0.0916 |
| 0.2058 | 41000 | 0.0932 |
| 0.2083 | 41500 | 0.0921 |
| 0.2108 | 42000 | 0.0912 |
| 0.2133 | 42500 | 0.0906 |
| 0.2158 | 43000 | 0.0905 |
| 0.2183 | 43500 | 0.09 |
| 0.2208 | 44000 | 0.0906 |
| 0.2233 | 44500 | 0.092 |
| 0.2258 | 45000 | 0.0906 |
| 0.2283 | 45500 | 0.0908 |
| 0.2308 | 46000 | 0.0916 |
| 0.2334 | 46500 | 0.0907 |
| 0.2359 | 47000 | 0.0899 |
| 0.2384 | 47500 | 0.089 |
| 0.2409 | 48000 | 0.0909 |
| 0.2434 | 48500 | 0.0889 |
| 0.2459 | 49000 | 0.0896 |
| 0.2484 | 49500 | 0.088 |
| 0.2509 | 50000 | 0.09 |
| 0.2534 | 50500 | 0.0879 |
| 0.2559 | 51000 | 0.0885 |
| 0.2584 | 51500 | 0.0886 |
| 0.2610 | 52000 | 0.0896 |
| 0.2635 | 52500 | 0.0886 |
| 0.2660 | 53000 | 0.0876 |
| 0.2685 | 53500 | 0.0881 |
| 0.2710 | 54000 | 0.0886 |
| 0.2735 | 54500 | 0.0865 |
| 0.2760 | 55000 | 0.0874 |
| 0.2785 | 55500 | 0.0878 |
| 0.2810 | 56000 | 0.0874 |
| 0.2835 | 56500 | 0.0872 |
| 0.2860 | 57000 | 0.0866 |
| 0.2886 | 57500 | 0.0875 |
| 0.2911 | 58000 | 0.0876 |
| 0.2936 | 58500 | 0.0872 |
| 0.2961 | 59000 | 0.0857 |
| 0.2986 | 59500 | 0.0867 |
| 0.3011 | 60000 | 0.0862 |
| 0.3036 | 60500 | 0.0849 |
| 0.3061 | 61000 | 0.0863 |
| 0.3086 | 61500 | 0.0849 |
| 0.3111 | 62000 | 0.0857 |
| 0.3136 | 62500 | 0.084 |
| 0.3162 | 63000 | 0.0857 |
| 0.3187 | 63500 | 0.0853 |
| 0.3212 | 64000 | 0.0849 |
| 0.3237 | 64500 | 0.0842 |
| 0.3262 | 65000 | 0.0851 |
| 0.3287 | 65500 | 0.085 |
| 0.3312 | 66000 | 0.0837 |
| 0.3337 | 66500 | 0.0839 |
| 0.3362 | 67000 | 0.0836 |
| 0.3387 | 67500 | 0.0845 |
| 0.3412 | 68000 | 0.0844 |
| 0.3438 | 68500 | 0.0844 |
| 0.3463 | 69000 | 0.0839 |
| 0.3488 | 69500 | 0.084 |
| 0.3513 | 70000 | 0.083 |
| 0.3538 | 70500 | 0.0843 |
| 0.3563 | 71000 | 0.082 |
| 0.3588 | 71500 | 0.0834 |
| 0.3613 | 72000 | 0.0826 |
| 0.3638 | 72500 | 0.0833 |
| 0.3663 | 73000 | 0.0843 |
| 0.3688 | 73500 | 0.0821 |
| 0.3714 | 74000 | 0.0822 |
| 0.3739 | 74500 | 0.0823 |
| 0.3764 | 75000 | 0.0818 |
| 0.3789 | 75500 | 0.0836 |
| 0.3814 | 76000 | 0.0813 |
| 0.3839 | 76500 | 0.0829 |
| 0.3864 | 77000 | 0.0828 |
| 0.3889 | 77500 | 0.0799 |
| 0.3914 | 78000 | 0.0819 |
| 0.3939 | 78500 | 0.0815 |
| 0.3964 | 79000 | 0.0812 |
| 0.3990 | 79500 | 0.0803 |
| 0.4015 | 80000 | 0.0819 |
| 0.4040 | 80500 | 0.081 |
| 0.4065 | 81000 | 0.0798 |
| 0.4090 | 81500 | 0.0811 |
| 0.4115 | 82000 | 0.0806 |
| 0.4140 | 82500 | 0.0812 |
| 0.4165 | 83000 | 0.0801 |
| 0.4190 | 83500 | 0.0803 |
| 0.4215 | 84000 | 0.0812 |
| 0.4240 | 84500 | 0.0809 |
| 0.4266 | 85000 | 0.0802 |
| 0.4291 | 85500 | 0.0801 |
| 0.4316 | 86000 | 0.08 |
| 0.4341 | 86500 | 0.079 |
| 0.4366 | 87000 | 0.0803 |
| 0.4391 | 87500 | 0.08 |
| 0.4416 | 88000 | 0.0802 |
| 0.4441 | 88500 | 0.0799 |
| 0.4466 | 89000 | 0.0795 |
| 0.4491 | 89500 | 0.0787 |
| 0.4516 | 90000 | 0.0784 |
| 0.4542 | 90500 | 0.0781 |
| 0.4567 | 91000 | 0.0802 |
| 0.4592 | 91500 | 0.0781 |
| 0.4617 | 92000 | 0.0796 |
| 0.4642 | 92500 | 0.0774 |
| 0.4667 | 93000 | 0.0794 |
| 0.4692 | 93500 | 0.0786 |
| 0.4717 | 94000 | 0.079 |
| 0.4742 | 94500 | 0.0786 |
| 0.4767 | 95000 | 0.0778 |
| 0.4792 | 95500 | 0.0782 |
| 0.4818 | 96000 | 0.0777 |
| 0.4843 | 96500 | 0.0773 |
| 0.4868 | 97000 | 0.0762 |
| 0.4893 | 97500 | 0.0774 |
| 0.4918 | 98000 | 0.0796 |
| 0.4943 | 98500 | 0.0764 |
| 0.4968 | 99000 | 0.0781 |
| 0.4993 | 99500 | 0.0778 |
| 0.5018 | 100000 | 0.0774 |
| 0.5043 | 100500 | 0.0767 |
| 0.5069 | 101000 | 0.0769 |
| 0.5094 | 101500 | 0.0784 |
| 0.5119 | 102000 | 0.0769 |
| 0.5144 | 102500 | 0.0773 |
| 0.5169 | 103000 | 0.0776 |
| 0.5194 | 103500 | 0.0761 |
| 0.5219 | 104000 | 0.0768 |
| 0.5244 | 104500 | 0.0763 |
| 0.5269 | 105000 | 0.0772 |
| 0.5294 | 105500 | 0.076 |
| 0.5319 | 106000 | 0.0776 |
| 0.5345 | 106500 | 0.0768 |
| 0.5370 | 107000 | 0.0754 |
| 0.5395 | 107500 | 0.0759 |
| 0.5420 | 108000 | 0.0764 |
| 0.5445 | 108500 | 0.0764 |
| 0.5470 | 109000 | 0.0766 |
| 0.5495 | 109500 | 0.0762 |
| 0.5520 | 110000 | 0.0749 |
| 0.5545 | 110500 | 0.075 |
| 0.5570 | 111000 | 0.0754 |
| 0.5595 | 111500 | 0.0755 |
| 0.5621 | 112000 | 0.0753 |
| 0.5646 | 112500 | 0.0747 |
| 0.5671 | 113000 | 0.0754 |
| 0.5696 | 113500 | 0.0756 |
| 0.5721 | 114000 | 0.074 |
| 0.5746 | 114500 | 0.0759 |
| 0.5771 | 115000 | 0.0755 |
| 0.5796 | 115500 | 0.0757 |
| 0.5821 | 116000 | 0.0744 |
| 0.5846 | 116500 | 0.0732 |
| 0.5871 | 117000 | 0.0745 |
| 0.5897 | 117500 | 0.0748 |
| 0.5922 | 118000 | 0.0724 |
| 0.5947 | 118500 | 0.0739 |
| 0.5972 | 119000 | 0.0749 |
| 0.5997 | 119500 | 0.0755 |
| 0.6022 | 120000 | 0.0735 |
| 0.6047 | 120500 | 0.0742 |
| 0.6072 | 121000 | 0.0738 |
| 0.6097 | 121500 | 0.0733 |
| 0.6122 | 122000 | 0.0728 |
| 0.6147 | 122500 | 0.0745 |
| 0.6173 | 123000 | 0.0741 |
| 0.6198 | 123500 | 0.0726 |
| 0.6223 | 124000 | 0.0744 |
| 0.6248 | 124500 | 0.0743 |
| 0.6273 | 125000 | 0.0732 |
| 0.6298 | 125500 | 0.0731 |
| 0.6323 | 126000 | 0.0729 |
| 0.6348 | 126500 | 0.0737 |
| 0.6373 | 127000 | 0.0735 |
| 0.6398 | 127500 | 0.0738 |
| 0.6423 | 128000 | 0.0731 |
| 0.6449 | 128500 | 0.0736 |
| 0.6474 | 129000 | 0.0728 |
| 0.6499 | 129500 | 0.073 |
| 0.6524 | 130000 | 0.0733 |
| 0.6549 | 130500 | 0.073 |
| 0.6574 | 131000 | 0.073 |
| 0.6599 | 131500 | 0.0732 |
| 0.6624 | 132000 | 0.0723 |
| 0.6649 | 132500 | 0.0732 |
| 0.6674 | 133000 | 0.0724 |
| 0.6699 | 133500 | 0.0722 |
| 0.6725 | 134000 | 0.0724 |
| 0.6750 | 134500 | 0.0726 |
| 0.6775 | 135000 | 0.0728 |
| 0.6800 | 135500 | 0.0717 |
| 0.6825 | 136000 | 0.0722 |
| 0.6850 | 136500 | 0.0729 |
| 0.6875 | 137000 | 0.0715 |
| 0.6900 | 137500 | 0.072 |
| 0.6925 | 138000 | 0.072 |
| 0.6950 | 138500 | 0.0722 |
| 0.6975 | 139000 | 0.0718 |
| 0.7001 | 139500 | 0.0728 |
| 0.7026 | 140000 | 0.0718 |
| 0.7051 | 140500 | 0.0726 |
| 0.7076 | 141000 | 0.0707 |
| 0.7101 | 141500 | 0.072 |
| 0.7126 | 142000 | 0.0706 |
| 0.7151 | 142500 | 0.0706 |
| 0.7176 | 143000 | 0.0708 |
| 0.7201 | 143500 | 0.0717 |
| 0.7226 | 144000 | 0.0713 |
| 0.7251 | 144500 | 0.0723 |
| 0.7277 | 145000 | 0.0709 |
| 0.7302 | 145500 | 0.0709 |
| 0.7327 | 146000 | 0.0706 |
| 0.7352 | 146500 | 0.0713 |
| 0.7377 | 147000 | 0.0709 |
| 0.7402 | 147500 | 0.0703 |
| 0.7427 | 148000 | 0.0709 |
| 0.7452 | 148500 | 0.0702 |
| 0.7477 | 149000 | 0.0705 |
| 0.7502 | 149500 | 0.0707 |
| 0.7527 | 150000 | 0.0702 |
| 0.7553 | 150500 | 0.0696 |
| 0.7578 | 151000 | 0.0701 |
| 0.7603 | 151500 | 0.0707 |
| 0.7628 | 152000 | 0.0703 |
| 0.7653 | 152500 | 0.0703 |
| 0.7678 | 153000 | 0.0711 |
| 0.7703 | 153500 | 0.0706 |
| 0.7728 | 154000 | 0.0701 |
| 0.7753 | 154500 | 0.0699 |
| 0.7778 | 155000 | 0.0704 |
| 0.7803 | 155500 | 0.07 |
| 0.7829 | 156000 | 0.0701 |
| 0.7854 | 156500 | 0.0697 |
| 0.7879 | 157000 | 0.0698 |
| 0.7904 | 157500 | 0.0699 |
| 0.7929 | 158000 | 0.069 |
| 0.7954 | 158500 | 0.0703 |
| 0.7979 | 159000 | 0.0696 |
| 0.8004 | 159500 | 0.0701 |
| 0.8029 | 160000 | 0.069 |
| 0.8054 | 160500 | 0.0687 |
| 0.8079 | 161000 | 0.069 |
| 0.8105 | 161500 | 0.0692 |
| 0.8130 | 162000 | 0.069 |
| 0.8155 | 162500 | 0.0688 |
| 0.8180 | 163000 | 0.0681 |
676
+ | 0.8205 | 163500 | 0.0688 |
677
+ | 0.8230 | 164000 | 0.0699 |
678
+ | 0.8255 | 164500 | 0.0677 |
679
+ | 0.8280 | 165000 | 0.0687 |
680
+ | 0.8305 | 165500 | 0.0696 |
681
+ | 0.8330 | 166000 | 0.0686 |
682
+ | 0.8355 | 166500 | 0.069 |
683
+ | 0.8381 | 167000 | 0.0692 |
684
+ | 0.8406 | 167500 | 0.0698 |
685
+ | 0.8431 | 168000 | 0.0684 |
686
+ | 0.8456 | 168500 | 0.0681 |
687
+ | 0.8481 | 169000 | 0.0683 |
688
+ | 0.8506 | 169500 | 0.0701 |
689
+ | 0.8531 | 170000 | 0.0697 |
690
+ | 0.8556 | 170500 | 0.0688 |
691
+ | 0.8581 | 171000 | 0.0689 |
692
+ | 0.8606 | 171500 | 0.0689 |
693
+ | 0.8632 | 172000 | 0.0687 |
694
+ | 0.8657 | 172500 | 0.0693 |
695
+ | 0.8682 | 173000 | 0.0678 |
696
+ | 0.8707 | 173500 | 0.0688 |
697
+ | 0.8732 | 174000 | 0.0686 |
698
+ | 0.8757 | 174500 | 0.0695 |
699
+ | 0.8782 | 175000 | 0.0679 |
700
+ | 0.8807 | 175500 | 0.0686 |
701
+ | 0.8832 | 176000 | 0.0683 |
702
+ | 0.8857 | 176500 | 0.068 |
703
+ | 0.8882 | 177000 | 0.0688 |
704
+ | 0.8908 | 177500 | 0.0696 |
705
+ | 0.8933 | 178000 | 0.0682 |
706
+ | 0.8958 | 178500 | 0.0686 |
707
+ | 0.8983 | 179000 | 0.0679 |
708
+ | 0.9008 | 179500 | 0.0687 |
709
+ | 0.9033 | 180000 | 0.0677 |
710
+ | 0.9058 | 180500 | 0.0693 |
711
+ | 0.9083 | 181000 | 0.0685 |
712
+ | 0.9108 | 181500 | 0.0682 |
713
+ | 0.9133 | 182000 | 0.0689 |
714
+ | 0.9158 | 182500 | 0.0682 |
715
+ | 0.9184 | 183000 | 0.0679 |
716
+ | 0.9209 | 183500 | 0.0682 |
717
+ | 0.9234 | 184000 | 0.0678 |
718
+ | 0.9259 | 184500 | 0.0685 |
719
+ | 0.9284 | 185000 | 0.0673 |
720
+ | 0.9309 | 185500 | 0.0676 |
721
+ | 0.9334 | 186000 | 0.068 |
722
+ | 0.9359 | 186500 | 0.0678 |
723
+ | 0.9384 | 187000 | 0.0679 |
724
+ | 0.9409 | 187500 | 0.0674 |
725
+ | 0.9434 | 188000 | 0.068 |
726
+ | 0.9460 | 188500 | 0.0679 |
727
+ | 0.9485 | 189000 | 0.0673 |
728
+ | 0.9510 | 189500 | 0.0663 |
729
+ | 0.9535 | 190000 | 0.068 |
730
+ | 0.9560 | 190500 | 0.0672 |
731
+ | 0.9585 | 191000 | 0.0668 |
732
+ | 0.9610 | 191500 | 0.0665 |
733
+ | 0.9635 | 192000 | 0.0679 |
734
+ | 0.9660 | 192500 | 0.0678 |
735
+ | 0.9685 | 193000 | 0.0667 |
736
+ | 0.9710 | 193500 | 0.068 |
737
+ | 0.9736 | 194000 | 0.0669 |
738
+ | 0.9761 | 194500 | 0.0686 |
739
+ | 0.9786 | 195000 | 0.0682 |
740
+ | 0.9811 | 195500 | 0.0673 |
741
+ | 0.9836 | 196000 | 0.0682 |
742
+ | 0.9861 | 196500 | 0.0675 |
743
+ | 0.9886 | 197000 | 0.0669 |
744
+ | 0.9911 | 197500 | 0.0669 |
745
+ | 0.9936 | 198000 | 0.0686 |
746
+ | 0.9961 | 198500 | 0.068 |
747
+ | 0.9986 | 199000 | 0.0667 |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.4
+ - Sentence Transformers: 5.2.0
+ - Transformers: 4.57.3
+ - PyTorch: 2.9.1+cu128
+ - Accelerate: 1.12.0
+ - Datasets: 2.21.0
+ - Tokenizers: 0.22.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MarginMSELoss
+ ```bibtex
+ @misc{hofstätter2021improving,
+ title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
+ author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
+ year={2021},
+ eprint={2010.02666},
+ archivePrefix={arXiv},
+ primaryClass={cs.IR}
+ }
+ ```
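The MarginMSELoss cited above trains the bi-encoder student so that its positive-vs-negative score margin matches the margin produced by a cross-encoder teacher. A minimal NumPy sketch of that objective (illustrative only, not the library's implementation):

```python
import numpy as np

def margin_mse_loss(s_pos, s_neg, t_pos, t_neg):
    """Mean squared error between the student's score margin
    (s_pos - s_neg) and the teacher's margin (t_pos - t_neg)."""
    student_margin = s_pos - s_neg
    teacher_margin = t_pos - t_neg
    return float(np.mean((student_margin - teacher_margin) ** 2))

# Toy batch of two (query, positive, negative) triplets:
# student margins are [2.0, 1.0], teacher margins are [3.0, 1.0]
s_pos = np.array([5.0, 4.0]); s_neg = np.array([3.0, 3.0])
t_pos = np.array([8.0, 6.0]); t_neg = np.array([5.0, 5.0])
loss = margin_mse_loss(s_pos, s_neg, t_pos, t_neg)  # ((2-3)^2 + (1-1)^2)/2 = 0.5
```

Only the *difference* between positive and negative scores is supervised, so the student's absolute score scale is free to differ from the teacher's.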
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+ "architectures": [
+ "XLMRobertaModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "classifier_dropout": null,
+ "dtype": "float32",
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 514,
+ "model_type": "xlm-roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "output_past": true,
+ "pad_token_id": 1,
+ "position_embedding_type": "absolute",
+ "transformers_version": "4.57.3",
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 250002
+ }
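As a sanity check (a back-of-the-envelope sketch, not from the upload itself), the config values above imply roughly 278M parameters for an XLM-RoBERTa-base encoder, which at float32 (4 bytes per weight) is consistent with the ~1.11 GB `model.safetensors` file:

```python
# Rough parameter count implied by the config above (biases and LayerNorms included).
vocab_size, hidden, layers, intermediate, max_pos = 250002, 768, 12, 3072, 514

embeddings = vocab_size * hidden + max_pos * hidden + 1 * hidden  # word + position + token-type
emb_norm = 2 * hidden                                             # embedding LayerNorm (weight + bias)

attention = 4 * (hidden * hidden + hidden)                        # Q, K, V and output projections
ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
per_layer = attention + ffn + 2 * (2 * hidden)                    # plus two LayerNorms per layer

total = embeddings + emb_norm + layers * per_layer
total += hidden * hidden + hidden                                  # pooler dense layer (XLMRobertaModel)

print(total)       # ~278M parameters
print(total * 4)   # ~1.11e9 bytes at float32, close to the 1112197096-byte safetensors file
```

The small remaining gap (tens of kilobytes) is the safetensors header metadata.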
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "__version__": {
+ "sentence_transformers": "5.2.0",
+ "transformers": "4.57.3",
+ "pytorch": "2.9.1+cu128"
+ },
+ "prompts": {
+ "query": "",
+ "document": ""
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine",
+ "model_type": "SentenceTransformer"
+ }
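`similarity_fn_name` is set to `cosine`, so downstream scoring compares the pooled embeddings by cosine similarity. A minimal NumPy sketch of that scoring function:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```

Because cosine similarity normalizes both vectors, only embedding direction matters, not magnitude.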
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1ff3b11660b1042786c3e11b41af42f014efd5c98607962e064fb2e4dae9172
+ size 1112197096
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
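The module list above chains a Transformer encoder with a Pooling head, and `1_Pooling/config.json` enables `pooling_mode_mean_tokens`: the sentence embedding is the attention-mask-aware mean of the token embeddings. A NumPy sketch of that pooling step (shapes assumed):

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings over real (non-padding) tokens only.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len) of 0/1
    """
    mask = attention_mask[..., None].astype(float)   # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

# One sequence of length 3 where the last position is padding:
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling(emb, mask))  # [[2.0, 3.0]] -- padding is excluded from the average
```

Masking before averaging matters: without it, padding tokens would drag the sentence embedding toward zero for short inputs.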
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:883b037111086fd4dfebbbc9b7cee11e1517b5e0c0514879478661440f137085
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "250001": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "max_length": 512,
+ "model_max_length": 512,
+ "pad_to_multiple_of": null,
+ "pad_token": "<pad>",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "</s>",
+ "stride": 0,
+ "tokenizer_class": "XLMRobertaTokenizerFast",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "<unk>"
+ }
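`truncation_strategy: longest_first` with `model_max_length: 512` means that when a sequence pair exceeds the length budget, tokens are dropped one at a time from whichever sequence is currently longer. A pure-Python sketch of that policy (illustrative only; it ignores the special-token overhead the real tokenizer also reserves):

```python
def truncate_longest_first(seq_a, seq_b, max_length):
    """Trim tokens from the end of the longer sequence until the pair fits."""
    seq_a, seq_b = list(seq_a), list(seq_b)
    while len(seq_a) + len(seq_b) > max_length:
        if len(seq_a) >= len(seq_b):
            seq_a.pop()   # truncation_side: right -> drop from the end
        else:
            seq_b.pop()
    return seq_a, seq_b

a, b = truncate_longest_first(list(range(6)), list(range(3)), 7)
print(len(a), len(b))  # 4 3 -- only the longer sequence was trimmed
```

This keeps both sequences as balanced as possible, instead of discarding one of them entirely when the pair is too long.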