train_gsm8k_42_1760637596

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3668
  • Num Input Tokens Seen: 34797032
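
This checkpoint is published as a PEFT adapter on top of the base model (see the framework versions below). A minimal inference sketch follows, assuming a standard adapter loadable with peft; the prompt format used during training is not documented on this card, so the dtype, device placement, and generation settings here are illustrative, not the author's setup:

```python
# Minimal sketch: load the base model and apply this PEFT adapter.
# Assumptions: the adapter loads via peft.PeftModel, bfloat16 fits on your
# device, and a plain text prompt works (the training prompt format is not
# documented on this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_42_1760637596"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

question = "A pen costs $2 and a notebook costs $3. How much do 4 pens and 2 notebooks cost?"
inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```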

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
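
The card names gsm8k as the training set but gives no preprocessing details. For reference, a sketch of loading the dataset from the Hub; the openai/gsm8k repo id and "main" config are assumptions about which copy was used:

```python
from datasets import load_dataset

# GSM8K grade-school math word problems; each example has a "question"
# and an "answer" field. (Which config/split this model was tuned on is
# not stated on the card.)
dataset = load_dataset("openai/gsm8k", "main")
example = dataset["train"][0]
print(example["question"])
print(example["answer"])
```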

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
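
The training script itself is not included with this card. As a sketch, the settings above map onto transformers TrainingArguments roughly as follows; output_dir and anything not in the list above are assumptions:

```python
from transformers import TrainingArguments

# Sketch only: field names follow the transformers API; values are copied
# from the hyperparameter list above. output_dir is assumed.
args = TrainingArguments(
    output_dir="train_gsm8k_42_1760637596",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```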

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 1.359         | 1.0   | 1682  | 1.3778          | 1738904           |
| 1.3101        | 2.0   | 3364  | 1.3702          | 3481872           |
| 1.4164        | 3.0   | 5046  | 1.3674          | 5222160           |
| 1.3298        | 4.0   | 6728  | 1.3674          | 6964040           |
| 1.4647        | 5.0   | 8410  | 1.3673          | 8703920           |
| 1.4088        | 6.0   | 10092 | 1.3674          | 10444208          |
| 1.247         | 7.0   | 11774 | 1.3677          | 12184024          |
| 1.4788        | 8.0   | 13456 | 1.3670          | 13925976          |
| 1.4338        | 9.0   | 15138 | 1.3675          | 15667472          |
| 1.2503        | 10.0  | 16820 | 1.3668          | 17407120          |
| 1.398         | 11.0  | 18502 | 1.3673          | 19145240          |
| 1.2493        | 12.0  | 20184 | 1.3671          | 20882704          |
| 1.4611        | 13.0  | 21866 | 1.3674          | 22623936          |
| 1.3645        | 14.0  | 23548 | 1.3670          | 24361736          |
| 1.511         | 15.0  | 25230 | 1.3672          | 26106632          |
| 1.4059        | 16.0  | 26912 | 1.3672          | 27845256          |
| 1.3993        | 17.0  | 28594 | 1.3672          | 29579704          |
| 1.4476        | 18.0  | 30276 | 1.3672          | 31322600          |
| 1.3274        | 19.0  | 31958 | 1.3672          | 33057576          |
| 1.5706        | 20.0  | 33640 | 1.3672          | 34797032          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4