train_gsm8k_456_1760637827

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5208
  • Num Input Tokens Seen: 34715672

Model description

This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on gsm8k (grade-school math word problems). At inference time the adapter weights are loaded on top of the frozen base model; the base weights themselves are not modified.
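
A minimal loading sketch with transformers and peft, assuming the adapter is published under the repository id rbelanec/train_gsm8k_456_1760637827 (the id shown on this card) and that you have access to the gated base model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_456_1760637827"  # assumed: repo id of this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)

# Load the PEFT adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```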

Intended uses & limitations

The adapter is intended for GSM8K-style arithmetic word problems solved with short chains of step-by-step reasoning. It has only been evaluated on the gsm8k evaluation set reported below; behavior on other domains, languages, or adversarial inputs has not been assessed.
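
A usage sketch continuing from the loading example above; the question text is an illustrative word problem, not taken from gsm8k:

```python
question = (
    "A baker makes 24 muffins per tray and bakes 7 trays. "
    "She sells 150 muffins. How many are left?"
)

# Format the question with the base model's chat template.
messages = [{"role": "user", "content": question}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```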

Training and evaluation data

The model was fine-tuned on the gsm8k dataset of grade-school math word problems with step-by-step solutions (7,473 train and 1,319 test examples in the main configuration). At a batch size of 4, the 1,682 steps per epoch reported below correspond to roughly 6,728 training examples, consistent with holding out about 10% of the train split for evaluation.
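
A sketch of loading the dataset with the datasets library; the 90/10 split shown here is an assumption inferred from the step counts above, not confirmed by the card:

```python
from datasets import load_dataset

# gsm8k "main" config: 7,473 train and 1,319 test examples.
dataset = load_dataset("gsm8k", "main")

# Assumed: carve a ~10% validation split out of the train set, matching the
# ~6,728 training examples implied by 1,682 steps per epoch at batch size 4.
splits = dataset["train"].train_test_split(test_size=0.1, seed=456)
train_data, eval_data = splits["train"], splits["test"]
```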

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
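
For reference, a sketch of how these values map onto transformers.TrainingArguments; output_dir and the evaluation schedule are assumptions, everything else mirrors the list above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_gsm8k_456_1760637827",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",  # assumed: the results table reports per-epoch eval
)
```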

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 1.0371        | 1.0   | 1682  | 0.9566          | 1733152           |
| 0.5626        | 2.0   | 3364  | 0.6308          | 3469824           |
| 0.6303        | 3.0   | 5046  | 0.5850          | 5206688           |
| 0.5709        | 4.0   | 6728  | 0.5655          | 6940592           |
| 0.5423        | 5.0   | 8410  | 0.5536          | 8670720           |
| 0.5046        | 6.0   | 10092 | 0.5453          | 10405952          |
| 0.5171        | 7.0   | 11774 | 0.5393          | 12141496          |
| 0.5568        | 8.0   | 13456 | 0.5347          | 13881648          |
| 0.4835        | 9.0   | 15138 | 0.5312          | 15618544          |
| 0.5408        | 10.0  | 16820 | 0.5283          | 17357848          |
| 0.5287        | 11.0  | 18502 | 0.5262          | 19092400          |
| 0.6044        | 12.0  | 20184 | 0.5246          | 20828128          |
| 0.5055        | 13.0  | 21866 | 0.5234          | 22562912          |
| 0.5093        | 14.0  | 23548 | 0.5226          | 24304376          |
| 0.476         | 15.0  | 25230 | 0.5216          | 26038152          |
| 0.4612        | 16.0  | 26912 | 0.5214          | 27771064          |
| 0.5536        | 17.0  | 28594 | 0.5210          | 29508400          |
| 0.5091        | 18.0  | 30276 | 0.5209          | 31244288          |
| 0.5361        | 19.0  | 31958 | 0.5210          | 32976960          |
| 0.5375        | 20.0  | 33640 | 0.5208          | 34715672          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4