train_gsm8k_42_1760637597

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set (a loading sketch follows these results):

  • Loss: 0.5047
  • Num Input Tokens Seen: 34797032
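
Since the framework versions below list PEFT and the model tree identifies this repository as an adapter, the checkpoint presumably must be loaded on top of the base model rather than standalone. A minimal, untested sketch, assuming a standard PEFT adapter and access to the gated Llama 3 weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_42_1760637597"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)

# A GSM8K-style word problem; the instruct base model expects the
# Llama 3 chat template.
messages = [{
    "role": "user",
    "content": (
        "Natalia sold clips to 48 of her friends in April, and then she "
        "sold half as many clips in May. How many clips did Natalia sell "
        "altogether in April and May?"
    ),
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```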

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
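
No further details are documented, but the summary above names the gsm8k dataset. A minimal sketch of loading it with 🤗 Datasets, assuming the canonical openai/gsm8k Hub dataset and its standard "main" configuration:

```python
from datasets import load_dataset

# "gsm8k" on the Hub resolves to openai/gsm8k; the "main" config holds
# grade-school math word problems with step-by-step reference answers.
dataset = load_dataset("openai/gsm8k", "main")
print(dataset["train"][0]["question"])
print(dataset["train"][0]["answer"])
```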

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
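
As a hedged reconstruction (not the author's actual training script), these settings map onto 🤗 Transformers TrainingArguments roughly as follows; anything not listed above, such as precision, gradient accumulation, or the PEFT/LoRA configuration, is unknown and omitted:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_gsm8k_42_1760637597",  # placeholder name
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",          # AdamW implemented in plain PyTorch
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,             # first 10% of steps used for warmup
    num_train_epochs=20,
)
```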

Training results

Training Loss  Epoch  Step   Validation Loss  Input Tokens Seen
0.9545         1.0    1682   0.9514           1738904
0.696          2.0    3364   0.6189           3481872
0.6072         3.0    5046   0.5705           5222160
0.4677         4.0    6728   0.5498           6964040
0.5573         5.0    8410   0.5374           8703920
0.4639         6.0    10092  0.5292           10444208
0.5226         7.0    11774  0.5232           12184024
0.6014         8.0    13456  0.5187           13925976
0.5909         9.0    15138  0.5149           15667472
0.5348         10.0   16820  0.5122           17407120
0.4864         11.0   18502  0.5102           19145240
0.4748         12.0   20184  0.5086           20882704
0.5335         13.0   21866  0.5073           22623936
0.525          14.0   23548  0.5063           24361736
0.5153         15.0   25230  0.5057           26106632
0.4356         16.0   26912  0.5052           27845256
0.515          17.0   28594  0.5049           29579704
0.5429         18.0   30276  0.5047           31322600
0.4554         19.0   31958  0.5047           33057576
0.5521         20.0   33640  0.5047           34797032

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_gsm8k_42_1760637597

This model is a PEFT adapter of meta-llama/Meta-Llama-3-8B-Instruct.