train_gsm8k_789_1760637939

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset (a hedged loading sketch follows the results list below). It achieves the following results on the evaluation set:

  • Loss: 0.4606
  • Num Input Tokens Seen: 34722248
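Since the framework list below includes PEFT, the checkpoint can be loaded as an adapter on top of the base model. A minimal inference sketch, assuming the adapter is published as rbelanec/train_gsm8k_789_1760637939 and that you have access to the gated Meta-Llama-3 base weights; the example question and generation settings are illustrative:

```python
# Minimal inference sketch. Assumptions: the adapter repo id below, access to
# the gated Meta-Llama-3 base weights, and a GPU with enough memory for 8B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_789_1760637939"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# Example GSM8K-style question; prompt and decoding settings are illustrative.
messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends in April, and then "
               "she sold half as many clips in May. How many clips did "
               "Natalia sell altogether in April and May?",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256,
                            pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the adapter type supports it, model.merge_and_unload() folds the adapter into the base weights for deployment-time inference without the PEFT wrapper.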

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
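For reference, here is a hedged reconstruction of that configuration with the Hugging Face Trainer API; the output directory and anything not in the list above are assumptions, not taken from this card:

```python
# Hedged reconstruction of the hyperparameters listed above using the
# standard Hugging Face Trainer API.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_gsm8k_789_1760637939",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",          # betas=(0.9, 0.999), eps=1e-08 are the AdamW defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```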

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|--------------:|------:|------:|----------------:|------------------:|
| 0.5183        | 1.0   | 1682  | 0.5123          | 1739480           |
| 0.4214        | 2.0   | 3364  | 0.4863          | 3478568           |
| 0.4153        | 3.0   | 5046  | 0.4658          | 5217760           |
| 0.3497        | 4.0   | 6728  | 0.4606          | 6949888           |
| 0.4135        | 5.0   | 8410  | 0.4619          | 8687904           |
| 0.3944        | 6.0   | 10092 | 0.4797          | 10421288          |
| 0.2726        | 7.0   | 11774 | 0.5065          | 12155264          |
| 0.2978        | 8.0   | 13456 | 0.5409          | 13889536          |
| 0.2498        | 9.0   | 15138 | 0.6085          | 15631248          |
| 0.2111        | 10.0  | 16820 | 0.6649          | 17370104          |
| 0.1538        | 11.0  | 18502 | 0.7406          | 19100344          |
| 0.12          | 12.0  | 20184 | 0.8414          | 20834120          |
| 0.0464        | 13.0  | 21866 | 0.9916          | 22566752          |
| 0.0451        | 14.0  | 23548 | 1.0680          | 24305592          |
| 0.0349        | 15.0  | 25230 | 1.1824          | 26037952          |
| 0.0359        | 16.0  | 26912 | 1.2705          | 27770056          |
| 0.0154        | 17.0  | 28594 | 1.3853          | 29506864          |
| 0.0174        | 18.0  | 30276 | 1.4368          | 31245432          |
| 0.0103        | 19.0  | 31958 | 1.4876          | 32980080          |
| 0.012         | 20.0  | 33640 | 1.4942          | 34722248          |
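Validation loss bottoms out at 0.4606 at epoch 4 and rises steadily afterwards while training loss keeps falling, a standard overfitting pattern over a 20-epoch run; the headline eval loss above corresponds to that epoch-4 checkpoint. A sketch of Trainer options that would retain the best checkpoint and stop once validation loss stops improving (these settings are assumptions, not taken from this card):

```python
# Sketch: keep the best checkpoint by eval loss and stop early once it
# stops improving. All settings here are assumptions, not from this card.
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="train_gsm8k_789_1760637939",  # assumed output directory
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,        # restores the lowest-eval-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Pass via Trainer(..., callbacks=[stopper]) to halt after 3 non-improving epochs.
stopper = EarlyStoppingCallback(early_stopping_patience=3)
```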

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4