train_gsm8k_1756729617

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7543
  • Num Input Tokens Seen: 15155440
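Since the card lists PEFT among the framework versions, this checkpoint is presumably a lightweight adapter for the base model rather than a full set of weights. Below is a minimal usage sketch under that assumption; the generation settings are illustrative and not taken from this card, and loading the base model requires access to the gated meta-llama repository.

```python
# Sketch: load the adapter on top of the base model and run one
# GSM8K-style prompt. Assumes this repo hosts a PEFT adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_1756729617"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```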

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
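
These hyperparameters map directly onto transformers' TrainingArguments. A minimal sketch of that mapping follows; the output_dir and any fields not listed above (logging, saving, evaluation cadence) are assumptions, not taken from this card.

```python
# Sketch: TrainingArguments mirroring the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_gsm8k_1756729617",  # assumed name, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```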

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 0.5784        | 0.5001 | 1682  | 0.5330          | 759008            |
| 0.6186        | 1.0003 | 3364  | 0.5055          | 1517864           |
| 0.3864        | 1.5004 | 5046  | 0.4863          | 2273400           |
| 0.5165        | 2.0006 | 6728  | 0.4738          | 3037160           |
| 0.4022        | 2.5007 | 8410  | 0.4690          | 3795592           |
| 0.4049        | 3.0009 | 10092 | 0.4641          | 4555528           |
| 0.3764        | 3.5010 | 11774 | 0.4717          | 5314760           |
| 0.3987        | 4.0012 | 13456 | 0.4671          | 6070808           |
| 0.3137        | 4.5013 | 15138 | 0.4844          | 6830920           |
| 0.3333        | 5.0015 | 16820 | 0.4907          | 7584184           |
| 0.386         | 5.5016 | 18502 | 0.5128          | 8338312           |
| 0.2978        | 6.0018 | 20184 | 0.5198          | 9097632           |
| 0.1936        | 6.5019 | 21866 | 0.5751          | 9855664           |
| 0.19          | 7.0021 | 23548 | 0.5805          | 10613216          |
| 0.1894        | 7.5022 | 25230 | 0.6503          | 11365376          |
| 0.1987        | 8.0024 | 26912 | 0.6465          | 12128624          |
| 0.2069        | 8.5025 | 28594 | 0.7107          | 12889056          |
| 0.1638        | 9.0027 | 30276 | 0.7147          | 13643432          |
| 0.1469        | 9.5028 | 31958 | 0.7518          | 14398584          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
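
To reproduce the environment, the installed packages can be checked against the versions reported above. A small standard-library sketch; the version strings are taken from the list above.

```python
# Sketch: verify installed package versions against those reported on the card.
import importlib.metadata as md

expected = {
    "peft": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.8.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}
for package, wanted in expected.items():
    installed = md.version(package)
    note = "OK" if installed == wanted else f"mismatch (installed {installed})"
    print(f"{package}=={wanted}: {note}")
```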