# train_gsm8k_42_1767887013

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the gsm8k dataset. It achieves the following results on the evaluation set:
- Loss: 0.4860
- Num Input Tokens Seen: 15257168
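
The checkpoint was trained with PEFT (see the framework versions below), so it most likely ships as an adapter on top of the base model rather than full weights. Below is a minimal loading-and-inference sketch under that assumption; the repo id `rbelanec/train_gsm8k_42_1767887013` and the generation settings are illustrative, so adjust them if the adapter lives elsewhere.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_42_1767887013"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned PEFT adapter to the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# A GSM8K-style word problem, phrased as a single chat turn.
messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends in April, and then "
               "she sold half as many clips in May. How many clips did "
               "Natalia sell altogether in April and May?",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding is used here because GSM8K answers are deterministic; sampling parameters can be swapped in as needed.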
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
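
For reference, the list above maps onto Hugging Face `TrainingArguments` roughly as follows. This is a reconstruction from the card, not the actual training script; `output_dir` and any field not listed above are illustrative.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_gsm8k_42_1767887013",  # illustrative path
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```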
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.5935 | 0.5001 | 1682 | 0.6109 | 759696 |
| 0.5817 | 1.0003 | 3364 | 0.5479 | 1526904 |
| 0.6792 | 1.5004 | 5046 | 0.5285 | 2285816 |
| 0.5452 | 2.0006 | 6728 | 0.5159 | 3053328 |
| 0.481 | 2.5007 | 8410 | 0.5097 | 3817312 |
| 0.459 | 3.0009 | 10092 | 0.5050 | 4575728 |
| 0.4884 | 3.5010 | 11774 | 0.5004 | 5338880 |
| 0.6733 | 4.0012 | 13456 | 0.4964 | 6102824 |
| 0.4547 | 4.5013 | 15138 | 0.4958 | 6869672 |
| 0.5045 | 5.0015 | 16820 | 0.4923 | 7627984 |
| 0.4488 | 5.5016 | 18502 | 0.4908 | 8390160 |
| 0.4681 | 6.0018 | 20184 | 0.4897 | 9156000 |
| 0.5548 | 6.5019 | 21866 | 0.4884 | 9922384 |
| 0.4557 | 7.0021 | 23548 | 0.4883 | 10685720 |
| 0.4885 | 7.5022 | 25230 | 0.4873 | 11449192 |
| 0.6288 | 8.0024 | 26912 | 0.4869 | 12210264 |
| 0.4648 | 8.5025 | 28594 | 0.4862 | 12971896 |
| 0.4467 | 9.0027 | 30276 | 0.4860 | 13739344 |
| 0.5898 | 9.5028 | 31958 | 0.4860 | 14497808 |
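
The step counts in the table imply roughly 3,364 optimizer steps per epoch, i.e. about 33,640 steps over 10 epochs, so `warmup_ratio: 0.1` corresponds to roughly the first 3,364 steps. A small sketch of that schedule, assuming the standard `get_cosine_schedule_with_warmup` from transformers (the dummy optimizer exists only to drive the scheduler):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 3364 * 10               # steps/epoch taken from the table above
warmup_steps = int(0.1 * total_steps) # warmup_ratio=0.1 -> ~3,364 steps

# Dummy single-parameter optimizer, for illustration only.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=5e-05, betas=(0.9, 0.999), eps=1e-08)
scheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)

lrs = []
for _ in range(total_steps):
    optimizer.step()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])

# LR rises linearly to its 5e-05 peak at the end of warmup,
# then decays along a cosine curve toward ~0 at the final step.
peak = max(lrs)
print(f"peak lr {peak:.2e} at step {lrs.index(peak) + 1}; final lr {lrs[-1]:.2e}")
```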
### Framework versions

- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4