cyberbabooshka committed
Commit 192b5c5 · verified · 1 Parent(s): 11d81c9

End of training

Files changed (1)
  1. README.md +29 -95

README.md CHANGED
@@ -35,20 +35,20 @@ dataset_prepared_path: last_run_prepared
chat_template: jinja
chat_template_jinja: >-
  {%- for message in messages %}
- {{- '<|im_start|>' + message.role + '\n' + message.content.lstrip('\n') + '<|im_end|>' + '\n' }}
  {%- endfor %}
- {%- if add_generation_prompt %}
- {{- '<|im_start|>assistant\n' }}
- {%- endif %}

datasets:
  - path: cyberbabooshka/MNLP_M2_mcqa_dataset
    name: cooldown
    split: train
    type: chat_template
    field_messages: messages
-   train_on_eos: turn
-   train_on_eot: turn
    message_property_mappings:
      role: role
      content: content
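For reference, the removed `chat_template_jinja` wraps every message in a role-tagged `<|im_start|>…<|im_end|>` block and optionally opens an assistant turn. A minimal plain-Python sketch of what that template renders (illustrative mirror, not the trainer's actual code; the sample messages are made up):

```python
def render_removed_template(messages, add_generation_prompt=False):
    """Plain-Python mirror of the removed chat_template_jinja above."""
    text = ""
    for m in messages:
        # Each turn gets a role header and an explicit end-of-turn token.
        text += "<|im_start|>" + m["role"] + "\n" + m["content"].lstrip("\n") + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Open an assistant turn for generation.
        text += "<|im_start|>assistant\n"
    return text

# Example (hypothetical conversation):
sample = render_removed_template(
    [{"role": "user", "content": "Q?"}, {"role": "assistant", "content": "A."}]
)
```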
@@ -63,9 +63,10 @@ test_datasets:
    name: mcqa
    split: test
    type: chat_template
    field_messages: messages
-   train_on_eos: turn
-   train_on_eot: turn
    message_property_mappings:
      role: role
      content: content
@@ -138,7 +139,7 @@ plugins:

This model is a fine-tuned version of [cyberbabooshka/base_noreasoning](https://huggingface.co/cyberbabooshka/base_noreasoning) on the cyberbabooshka/MNLP_M2_mcqa_dataset dataset.
It achieves the following results on the evaluation set:
- - Loss: 0.6948

## Model description
 
@@ -169,97 +170,30 @@ The following hyperparameters were used during training:
No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- - training_steps: 8427

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
- | No log | 0.0001 | 1 | 1.6287 |
- | 0.8833 | 0.0119 | 100 | 1.0481 |
- | 0.9171 | 0.0237 | 200 | 0.8743 |
- | 0.8829 | 0.0356 | 300 | 0.8209 |
- | 0.8754 | 0.0475 | 400 | 0.7994 |
- | 0.9141 | 0.0593 | 500 | 0.7838 |
- | 0.846 | 0.0712 | 600 | 0.7706 |
- | 0.8654 | 0.0831 | 700 | 0.7624 |
- | 0.8746 | 0.0949 | 800 | 0.7596 |
- | 0.8095 | 0.1068 | 900 | 0.7537 |
- | 0.8326 | 0.1187 | 1000 | 0.7486 |
- | 0.7802 | 0.1305 | 1100 | 0.7441 |
- | 0.7945 | 0.1424 | 1200 | 0.7413 |
- | 0.8447 | 0.1543 | 1300 | 0.7383 |
- | 0.8657 | 0.1661 | 1400 | 0.7370 |
- | 0.8529 | 0.1780 | 1500 | 0.7328 |
- | 0.8627 | 0.1899 | 1600 | 0.7317 |
- | 0.9215 | 0.2017 | 1700 | 0.7311 |
- | 0.8398 | 0.2136 | 1800 | 0.7288 |
- | 0.7294 | 0.2255 | 1900 | 0.7262 |
- | 0.8668 | 0.2373 | 2000 | 0.7248 |
- | 0.8546 | 0.2492 | 2100 | 0.7237 |
- | 0.8681 | 0.2611 | 2200 | 0.7245 |
- | 0.8425 | 0.2729 | 2300 | 0.7212 |
- | 0.8625 | 0.2848 | 2400 | 0.7192 |
- | 0.8138 | 0.2967 | 2500 | 0.7197 |
- | 0.8737 | 0.3085 | 2600 | 0.7173 |
- | 0.8683 | 0.3204 | 2700 | 0.7158 |
- | 0.8583 | 0.3323 | 2800 | 0.7169 |
- | 0.9352 | 0.3441 | 2900 | 0.7148 |
- | 0.8032 | 0.3560 | 3000 | 0.7139 |
- | 0.7456 | 0.3679 | 3100 | 0.7138 |
- | 0.8462 | 0.3797 | 3200 | 0.7166 |
- | 0.8454 | 0.3916 | 3300 | 0.7120 |
- | 0.8931 | 0.4035 | 3400 | 0.7112 |
- | 0.841 | 0.4153 | 3500 | 0.7105 |
- | 0.8027 | 0.4272 | 3600 | 0.7108 |
- | 0.8887 | 0.4391 | 3700 | 0.7094 |
- | 0.8851 | 0.4509 | 3800 | 0.7084 |
- | 0.8999 | 0.4628 | 3900 | 0.7098 |
- | 0.9079 | 0.4747 | 4000 | 0.7079 |
- | 0.9058 | 0.4865 | 4100 | 0.7065 |
- | 0.8883 | 0.4984 | 4200 | 0.7064 |
- | 0.8636 | 0.5103 | 4300 | 0.7057 |
- | 0.8513 | 0.5221 | 4400 | 0.7054 |
- | 0.8058 | 0.5340 | 4500 | 0.7058 |
- | 0.8477 | 0.5459 | 4600 | 0.7049 |
- | 0.8483 | 0.5577 | 4700 | 0.7045 |
- | 0.8862 | 0.5696 | 4800 | 0.7034 |
- | 0.737 | 0.5815 | 4900 | 0.7042 |
- | 0.8815 | 0.5933 | 5000 | 0.7025 |
- | 0.8401 | 0.6052 | 5100 | 0.7020 |
- | 0.8101 | 0.6171 | 5200 | 0.7020 |
- | 0.8407 | 0.6289 | 5300 | 0.7011 |
- | 0.8468 | 0.6408 | 5400 | 0.7004 |
- | 0.9202 | 0.6527 | 5500 | 0.7006 |
- | 0.8086 | 0.6645 | 5600 | 0.7009 |
- | 0.7938 | 0.6764 | 5700 | 0.6993 |
- | 0.8017 | 0.6883 | 5800 | 0.6999 |
- | 0.8412 | 0.7001 | 5900 | 0.6988 |
- | 1.0098 | 0.7120 | 6000 | 0.6986 |
- | 0.9157 | 0.7239 | 6100 | 0.6980 |
- | 0.8587 | 0.7357 | 6200 | 0.6978 |
- | 0.8509 | 0.7476 | 6300 | 0.6980 |
- | 0.8622 | 0.7595 | 6400 | 0.6969 |
- | 0.8177 | 0.7713 | 6500 | 0.6967 |
- | 0.78 | 0.7832 | 6600 | 0.6971 |
- | 0.9008 | 0.7951 | 6700 | 0.6967 |
- | 0.8658 | 0.8069 | 6800 | 0.6957 |
- | 0.8972 | 0.8188 | 6900 | 0.6960 |
- | 0.9381 | 0.8307 | 7000 | 0.6955 |
- | 0.8473 | 0.8425 | 7100 | 0.6954 |
- | 0.8018 | 0.8544 | 7200 | 0.6951 |
- | 0.8809 | 0.8663 | 7300 | 0.6953 |
- | 0.8334 | 0.8781 | 7400 | 0.6951 |
- | 0.8557 | 0.8900 | 7500 | 0.6951 |
- | 0.8457 | 0.9019 | 7600 | 0.6949 |
- | 0.8905 | 0.9137 | 7700 | 0.6949 |
- | 0.7979 | 0.9256 | 7800 | 0.6951 |
- | 0.8879 | 0.9375 | 7900 | 0.6950 |
- | 0.8433 | 0.9493 | 8000 | 0.6947 |
- | 0.8765 | 0.9612 | 8100 | 0.6949 |
- | 0.8428 | 0.9731 | 8200 | 0.6950 |
- | 0.813 | 0.9849 | 8300 | 0.6948 |
- | 0.8144 | 0.9968 | 8400 | 0.6948 |

### Framework versions
 
chat_template: jinja
chat_template_jinja: >-
  {%- for message in messages %}
+ {{- message.content.strip('\n') + '\n' }}
  {%- endfor %}
+ {{- '<|im_end|>' }}
+

datasets:
  - path: cyberbabooshka/MNLP_M2_mcqa_dataset
    name: cooldown
    split: train
    type: chat_template
+   chat_template: tokenizer_default
    field_messages: messages
+   train_on_eos: all
+   train_on_eot: all
    message_property_mappings:
      role: role
      content: content

    name: mcqa
    split: test
    type: chat_template
+   chat_template: tokenizer_default
    field_messages: messages
+   train_on_eos: all
+   train_on_eot: all
    message_property_mappings:
      role: role
      content: content
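The added template drops the per-turn role headers: message contents are concatenated as plain text and a single `<|im_end|>` closes the whole sample. A minimal plain-Python sketch of the rendered output (illustrative mirror, not the tokenizer's actual code):

```python
def render_new_template(messages):
    """Plain-Python mirror of the added chat_template_jinja:
    no role headers, one <|im_end|> closing the whole sample."""
    text = ""
    for m in messages:
        # Trim surrounding newlines from each message, then concatenate.
        text += m["content"].strip("\n") + "\n"
    return text + "<|im_end|>"

# Example (hypothetical conversation):
sample = render_new_template(
    [{"role": "user", "content": "Q?"}, {"role": "assistant", "content": "A."}]
)
```

As I read the accompanying axolotl options, `train_on_eos`/`train_on_eot: all` keeps the loss on every end token rather than only the final turn's, which matches a template that emits one terminal `<|im_end|>`.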
 
This model is a fine-tuned version of [cyberbabooshka/base_noreasoning](https://huggingface.co/cyberbabooshka/base_noreasoning) on the cyberbabooshka/MNLP_M2_mcqa_dataset dataset.
It achieves the following results on the evaluation set:
+ - Loss: 0.6772

## Model description
 
No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
+ - training_steps: 8438

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
+ | No log | 0.0001 | 1 | 2.2371 |
+ | 0.8956 | 0.0593 | 500 | 0.7674 |
+ | 0.9093 | 0.1185 | 1000 | 0.7335 |
+ | 0.8544 | 0.1778 | 1500 | 0.7159 |
+ | 0.8503 | 0.2370 | 2000 | 0.7074 |
+ | 0.8781 | 0.2963 | 2500 | 0.7016 |
+ | 0.8171 | 0.3555 | 3000 | 0.6968 |
+ | 0.9179 | 0.4148 | 3500 | 0.6930 |
+ | 0.845 | 0.4740 | 4000 | 0.6895 |
+ | 0.8885 | 0.5333 | 4500 | 0.6865 |
+ | 0.9432 | 0.5926 | 5000 | 0.6844 |
+ | 0.7451 | 0.6518 | 5500 | 0.6825 |
+ | 0.8675 | 0.7111 | 6000 | 0.6811 |
+ | 0.8606 | 0.7703 | 6500 | 0.6793 |
+ | 0.8602 | 0.8000 | 6750 | 0.6793 |
+ | 0.8458 | 0.8296 | 7000 | 0.6778 |
+ | 0.9051 | 0.8888 | 7500 | 0.6772 |
+ | 0.8589 | 0.9481 | 8000 | 0.6772 |
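The schedule named in the hyperparameters (cosine decay with 100 linear warmup steps over 8438 training steps) can be sketched as below; the peak learning rate is not shown in this hunk, so `base_lr` is a placeholder argument, and this is only an illustration of the schedule's shape, not the trainer's exact implementation:

```python
import math

def cosine_lr_with_warmup(step, base_lr, warmup_steps=100, total_steps=8438):
    """Linear warmup to base_lr, then cosine decay toward zero.

    base_lr is a placeholder; the diff above does not show the actual value.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```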

### Framework versions