train_gsm8k_1754652177

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0554
  • Num Input Tokens Seen: 17277648
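
Since this repository contains a PEFT adapter (see Framework versions below) rather than full model weights, it has to be loaded on top of the base model. Below is a minimal loading and inference sketch, assuming Hub access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights and a recent transformers/peft install; the prompt format and generation settings are illustrative, not taken from the training setup:

```python
# Minimal sketch: load the PEFT adapter on top of the Llama-3-8B-Instruct base.
# Assumes GPU access and an accepted Meta-Llama-3 license on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_1754652177"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter weights

# A GSM8K-style word problem; the exact prompt template used in training
# is not documented in this card, so the chat template is an assumption.
messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends in April, and then "
               "she sold half as many clips in May. How many clips did "
               "Natalia sell altogether in April and May?",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```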

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
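
For reference, these settings map onto transformers TrainingArguments roughly as follows. This is a sketch reconstructed from the list above, not the actual training script; options not recorded in this card (gradient accumulation, sequence length, PEFT/LoRA configuration, etc.) are omitted:

```python
# Approximate reconstruction of the hyperparameters listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_gsm8k_1754652177",  # assumed; not stated in the card
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",        # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```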

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.6555        | 0.5   | 841   | 0.5746          | 865376            |
| 0.6015        | 1.0   | 1682  | 0.5439          | 1731768           |
| 0.4986        | 1.5   | 2523  | 0.5237          | 2596664           |
| 0.471         | 2.0   | 3364  | 0.5140          | 3464008           |
| 0.5184        | 2.5   | 4205  | 0.5065          | 4329160           |
| 0.546         | 3.0   | 5046  | 0.4999          | 5197240           |
| 0.5276        | 3.5   | 5887  | 0.4964          | 6061624           |
| 0.5402        | 4.0   | 6728  | 0.4954          | 6920632           |
| 0.495         | 4.5   | 7569  | 0.4891          | 7784408           |
| 0.4388        | 5.0   | 8410  | 0.4841          | 8646936           |
| 0.4923        | 5.5   | 9251  | 0.4819          | 9505560           |
| 0.3781        | 6.0   | 10092 | 0.4783          | 10374192          |
| 0.5345        | 6.5   | 10933 | 0.4764          | 11237008          |
| 0.465         | 7.0   | 11774 | 0.4744          | 12101200          |
| 0.5092        | 7.5   | 12615 | 0.4737          | 12959728          |
| 0.3985        | 8.0   | 13456 | 0.4723          | 13828800          |
| 0.5009        | 8.5   | 14297 | 0.4714          | 14696832          |
| 0.4596        | 9.0   | 15138 | 0.4710          | 15552184          |
| 0.4526        | 9.5   | 15979 | 0.4710          | 16413528          |
| 0.3702        | 10.0  | 16820 | 0.4707          | 17277648          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1