End of training

README.md CHANGED

@@ -35,20 +35,20 @@ dataset_prepared_path: last_run_prepared
 chat_template: jinja
 chat_template_jinja: >-
   {%- for message in messages %}
-  {{-
+  {{- message.content.strip('\n') + '\n' }}
   {%- endfor %}
-  {
-
-  {%- endif %}
+  {{- '<|im_end|>' }}
+
 
 datasets:
   - path: cyberbabooshka/MNLP_M2_mcqa_dataset
     name: cooldown
     split: train
     type: chat_template
+    chat_template: tokenizer_default
     field_messages: messages
-    train_on_eos:
-    train_on_eot:
+    train_on_eos: all
+    train_on_eot: all
     message_property_mappings:
       role: role
       content: content
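
The added lines assemble into the following template block (a sketch of the new config; indentation is assumed, since the diff view drops leading whitespace, and the removed lines above are truncated in the original). The template emits each message's content with surrounding newlines stripped, one message per line, then closes the sequence with a literal `<|im_end|>` token:

```yaml
chat_template: jinja
chat_template_jinja: >-
  {%- for message in messages %}
  {{- message.content.strip('\n') + '\n' }}
  {%- endfor %}
  {{- '<|im_end|>' }}
```
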
@@ -63,9 +63,10 @@ test_datasets:
     name: mcqa
     split: test
     type: chat_template
+    chat_template: tokenizer_default
     field_messages: messages
-    train_on_eos:
-    train_on_eot:
+    train_on_eos: all
+    train_on_eot: all
     message_property_mappings:
       role: role
       content: content
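
The same three keys are added to both the train and test dataset entries. Reassembled from the right-hand side of the diff (indentation assumed), the train entry now reads:

```yaml
datasets:
  - path: cyberbabooshka/MNLP_M2_mcqa_dataset
    name: cooldown
    split: train
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    train_on_eos: all
    train_on_eot: all
    message_property_mappings:
      role: role
      content: content
```

As I read axolotl's chat_template options, `train_on_eos: all` and `train_on_eot: all` keep the EOS and end-of-turn tokens of every turn in the loss rather than masking them out; treat that reading as an assumption, not something the diff itself states.
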
@@ -138,7 +139,7 @@ plugins:
 
 This model is a fine-tuned version of [cyberbabooshka/base_noreasoning](https://huggingface.co/cyberbabooshka/base_noreasoning) on the cyberbabooshka/MNLP_M2_mcqa_dataset dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.6772
 
 ## Model description
 
@@ -169,97 +170,30 @@ The following hyperparameters were used during training:
 No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
-- training_steps:
+- training_steps: 8438
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| No log | 0.0001 | 1 |
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.8398 | 0.2136 | 1800 | 0.7288 |
-| 0.7294 | 0.2255 | 1900 | 0.7262 |
-| 0.8668 | 0.2373 | 2000 | 0.7248 |
-| 0.8546 | 0.2492 | 2100 | 0.7237 |
-| 0.8681 | 0.2611 | 2200 | 0.7245 |
-| 0.8425 | 0.2729 | 2300 | 0.7212 |
-| 0.8625 | 0.2848 | 2400 | 0.7192 |
-| 0.8138 | 0.2967 | 2500 | 0.7197 |
-| 0.8737 | 0.3085 | 2600 | 0.7173 |
-| 0.8683 | 0.3204 | 2700 | 0.7158 |
-| 0.8583 | 0.3323 | 2800 | 0.7169 |
-| 0.9352 | 0.3441 | 2900 | 0.7148 |
-| 0.8032 | 0.3560 | 3000 | 0.7139 |
-| 0.7456 | 0.3679 | 3100 | 0.7138 |
-| 0.8462 | 0.3797 | 3200 | 0.7166 |
-| 0.8454 | 0.3916 | 3300 | 0.7120 |
-| 0.8931 | 0.4035 | 3400 | 0.7112 |
-| 0.841 | 0.4153 | 3500 | 0.7105 |
-| 0.8027 | 0.4272 | 3600 | 0.7108 |
-| 0.8887 | 0.4391 | 3700 | 0.7094 |
-| 0.8851 | 0.4509 | 3800 | 0.7084 |
-| 0.8999 | 0.4628 | 3900 | 0.7098 |
-| 0.9079 | 0.4747 | 4000 | 0.7079 |
-| 0.9058 | 0.4865 | 4100 | 0.7065 |
-| 0.8883 | 0.4984 | 4200 | 0.7064 |
-| 0.8636 | 0.5103 | 4300 | 0.7057 |
-| 0.8513 | 0.5221 | 4400 | 0.7054 |
-| 0.8058 | 0.5340 | 4500 | 0.7058 |
-| 0.8477 | 0.5459 | 4600 | 0.7049 |
-| 0.8483 | 0.5577 | 4700 | 0.7045 |
-| 0.8862 | 0.5696 | 4800 | 0.7034 |
-| 0.737 | 0.5815 | 4900 | 0.7042 |
-| 0.8815 | 0.5933 | 5000 | 0.7025 |
-| 0.8401 | 0.6052 | 5100 | 0.7020 |
-| 0.8101 | 0.6171 | 5200 | 0.7020 |
-| 0.8407 | 0.6289 | 5300 | 0.7011 |
-| 0.8468 | 0.6408 | 5400 | 0.7004 |
-| 0.9202 | 0.6527 | 5500 | 0.7006 |
-| 0.8086 | 0.6645 | 5600 | 0.7009 |
-| 0.7938 | 0.6764 | 5700 | 0.6993 |
-| 0.8017 | 0.6883 | 5800 | 0.6999 |
-| 0.8412 | 0.7001 | 5900 | 0.6988 |
-| 1.0098 | 0.7120 | 6000 | 0.6986 |
-| 0.9157 | 0.7239 | 6100 | 0.6980 |
-| 0.8587 | 0.7357 | 6200 | 0.6978 |
-| 0.8509 | 0.7476 | 6300 | 0.6980 |
-| 0.8622 | 0.7595 | 6400 | 0.6969 |
-| 0.8177 | 0.7713 | 6500 | 0.6967 |
-| 0.78 | 0.7832 | 6600 | 0.6971 |
-| 0.9008 | 0.7951 | 6700 | 0.6967 |
-| 0.8658 | 0.8069 | 6800 | 0.6957 |
-| 0.8972 | 0.8188 | 6900 | 0.6960 |
-| 0.9381 | 0.8307 | 7000 | 0.6955 |
-| 0.8473 | 0.8425 | 7100 | 0.6954 |
-| 0.8018 | 0.8544 | 7200 | 0.6951 |
-| 0.8809 | 0.8663 | 7300 | 0.6953 |
-| 0.8334 | 0.8781 | 7400 | 0.6951 |
-| 0.8557 | 0.8900 | 7500 | 0.6951 |
-| 0.8457 | 0.9019 | 7600 | 0.6949 |
-| 0.8905 | 0.9137 | 7700 | 0.6949 |
-| 0.7979 | 0.9256 | 7800 | 0.6951 |
-| 0.8879 | 0.9375 | 7900 | 0.6950 |
-| 0.8433 | 0.9493 | 8000 | 0.6947 |
-| 0.8765 | 0.9612 | 8100 | 0.6949 |
-| 0.8428 | 0.9731 | 8200 | 0.6950 |
-| 0.813 | 0.9849 | 8300 | 0.6948 |
-| 0.8144 | 0.9968 | 8400 | 0.6948 |
+| No log | 0.0001 | 1 | 2.2371 |
+| 0.8956 | 0.0593 | 500 | 0.7674 |
+| 0.9093 | 0.1185 | 1000 | 0.7335 |
+| 0.8544 | 0.1778 | 1500 | 0.7159 |
+| 0.8503 | 0.2370 | 2000 | 0.7074 |
+| 0.8781 | 0.2963 | 2500 | 0.7016 |
+| 0.8171 | 0.3555 | 3000 | 0.6968 |
+| 0.9179 | 0.4148 | 3500 | 0.6930 |
+| 0.845 | 0.4740 | 4000 | 0.6895 |
+| 0.8885 | 0.5333 | 4500 | 0.6865 |
+| 0.9432 | 0.5926 | 5000 | 0.6844 |
+| 0.7451 | 0.6518 | 5500 | 0.6825 |
+| 0.8675 | 0.7111 | 6000 | 0.6811 |
+| 0.8606 | 0.7703 | 6500 | 0.6793 |
+| 0.8602 | 0.8000 | 6750 | 0.6793 |
+| 0.8458 | 0.8296 | 7000 | 0.6778 |
+| 0.9051 | 0.8888 | 7500 | 0.6772 |
+| 0.8589 | 0.9481 | 8000 | 0.6772 |
 
 
 ### Framework versions
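
The new `training_steps: 8438` is consistent with the logged schedule: step 8000 corresponds to epoch 0.9481, and 8000 / 0.9481 ≈ 8438, so the cap amounts to one full pass over the cooldown split.
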