6609d0bf53122d36d71855333b8a0b23
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B on the stsb (STS-B) configuration of the nyu-mll/glue dataset. It achieves the following results on the evaluation set, which match the final logged epoch (epoch 14) in the training results:
- Loss: 3.9567
- Data Size: 1.0
- Epoch Runtime: 192.4977
- MSE: 0.9895
- MAE: 0.8095
- R²: 0.5573
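For reference, the three regression metrics above have standard definitions: MSE is the mean squared error, MAE is the mean absolute error, and R² is 1 minus the residual sum of squares divided by the total sum of squares around the mean gold score. The following is a minimal pure-Python sketch of those definitions. The function name regression_metrics is hypothetical, and the card does not show the evaluation code actually used, so treat this as an illustration rather than the exact computation behind these numbers.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and R^2 for paired lists of gold and predicted scores.

    Hypothetical helper for illustration; not taken from the training code.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # Residual sum of squares, shared by MSE and R^2.
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    # Total sum of squares around the gold mean (zero if all gold scores are equal).
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all gold scores are identical")
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

Predicting the mean gold score for every pair yields R² = 0, so the epoch-2 row in the training results (R² 0.0020) sits close to that trivial baseline.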
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
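The total batch sizes listed above follow from the per-device batch size times the number of devices (8 × 4 = 32), assuming no gradient accumulation, which the card does not list. A constant scheduler keeps the learning rate at its initial value for every step. The sketch below encodes both; total_batch_size and constant_lr are hypothetical helper names, not part of any library.

```python
def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Samples consumed per optimizer step across all devices.

    grad_accum_steps defaults to 1 because the card lists no accumulation (assumption).
    """
    return per_device * num_devices * grad_accum_steps


def constant_lr(step: int, base_lr: float = 5e-05) -> float:
    """Learning rate under a constant schedule: no warmup, no decay, step is ignored."""
    return base_lr


if __name__ == "__main__":
    print(total_batch_size(per_device=8, num_devices=4))  # train and eval both use 8 per device
    print(constant_lr(0), constant_lr(2506))
```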
Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | MSE | MAE | R² |
|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 115.6284 | 0 | 5.8382 | 28.9074 | 4.8139 | -11.9313 |
| No log | 1 | 179 | 1602.5880 | 0.0078 | 6.7554 | 400.6450 | 17.8926 | -178.2229 |
| No log | 2 | 358 | 8.9211 | 0.0156 | 12.3620 | 2.2311 | 1.2819 | 0.0020 |
| No log | 3 | 537 | 9.4214 | 0.0312 | 28.3318 | 2.3562 | 1.3232 | -0.0540 |
| No log | 4 | 716 | 10.6515 | 0.0625 | 40.3602 | 2.6636 | 1.3415 | -0.1915 |
| No log | 5 | 895 | 9.8013 | 0.125 | 59.3143 | 2.4510 | 1.3016 | -0.0964 |
| 16.7373 | 6 | 1074 | 5.6460 | 0.25 | 93.2231 | 1.4121 | 0.9905 | 0.3683 |
| 5.7196 | 7 | 1253 | 4.3240 | 0.5 | 140.8782 | 1.0814 | 0.8347 | 0.5163 |
| 3.9842 | 8 | 1432 | 4.6427 | 1.0 | 226.9675 | 1.1611 | 0.8728 | 0.4806 |
| 2.7011 | 9 | 1611 | 4.6092 | 1.0 | 194.4204 | 1.1528 | 0.8548 | 0.4843 |
| 2.2374 | 10 | 1790 | 3.4777 | 1.0 | 197.2961 | 0.8699 | 0.7507 | 0.6109 |
| 1.6668 | 11 | 1969 | 4.9638 | 1.0 | 197.2742 | 1.2414 | 0.8617 | 0.4447 |
| 1.5425 | 12 | 2148 | 5.2001 | 1.0 | 190.3786 | 1.3006 | 0.9018 | 0.4182 |
| 1.3228 | 13 | 2327 | 5.1050 | 1.0 | 197.3229 | 1.2765 | 0.9036 | 0.4290 |
| 1.0798 | 14 | 2506 | 3.9567 | 1.0 | 192.4977 | 0.9895 | 0.8095 | 0.5573 |
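The headline evaluation results match the final logged row (epoch 14), not the row with the lowest validation loss (epoch 10). The card does not say which checkpoint was uploaded. As a small illustration, the sketch below picks the best and final epochs from the full-data (Data Size 1.0) rows of the table; the values are copied from the table, and the helper names are hypothetical.

```python
# (epoch, validation_loss, r2) copied from the Data Size 1.0 rows of the table above.
ROWS = [
    (8, 4.6427, 0.4806),
    (9, 4.6092, 0.4843),
    (10, 3.4777, 0.6109),
    (11, 4.9638, 0.4447),
    (12, 5.2001, 0.4182),
    (13, 5.1050, 0.4290),
    (14, 3.9567, 0.5573),
]


def best_epoch(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


def final_epoch(rows):
    """Row with the highest epoch number, i.e. the last one logged."""
    return max(rows, key=lambda row: row[0])


if __name__ == "__main__":
    print("best by validation loss:", best_epoch(ROWS))
    print("final logged epoch:", final_epoch(ROWS))
```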
Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1
Model tree for contemmcm/6609d0bf53122d36d71855333b8a0b23
- Base model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B