train_gsm8k_1755694509

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7482
  • Num Input Tokens Seen: 15155440

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
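The learning-rate schedule above (cosine decay with a 10% linear warmup) can be sketched as plain Python. The total step count of 33,640 is an assumption inferred from the results table below (~3,364 optimizer steps per epoch over 10 epochs); the function itself mirrors the standard warmup-then-cosine shape, not any particular library's internals.

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the first 10% of training.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 33640  # assumption: ~3,364 steps/epoch x 10 epochs, per the table below
print(lr_at_step(0, total))      # 0.0 at the start of warmup
print(lr_at_step(3364, total))   # peak LR (5e-05) at the end of warmup
print(lr_at_step(total, total))  # decays to ~0 by the final step
```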

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 0.5743        | 0.5001 | 1682  | 0.5314          | 759008            |
| 0.6211        | 1.0003 | 3364  | 0.5054          | 1517864           |
| 0.3877        | 1.5004 | 5046  | 0.4875          | 2273400           |
| 0.5209        | 2.0006 | 6728  | 0.4729          | 3037160           |
| 0.4078        | 2.5007 | 8410  | 0.4679          | 3795592           |
| 0.4010        | 3.0009 | 10092 | 0.4644          | 4555528           |
| 0.3750        | 3.5010 | 11774 | 0.4729          | 5314760           |
| 0.3951        | 4.0012 | 13456 | 0.4656          | 6070808           |
| 0.3148        | 4.5013 | 15138 | 0.4866          | 6830920           |
| 0.3418        | 5.0015 | 16820 | 0.4912          | 7584184           |
| 0.3706        | 5.5016 | 18502 | 0.5232          | 8338312           |
| 0.2833        | 6.0018 | 20184 | 0.5268          | 9097632           |
| 0.1896        | 6.5019 | 21866 | 0.5774          | 9855664           |
| 0.2015        | 7.0021 | 23548 | 0.5712          | 10613216          |
| 0.1840        | 7.5022 | 25230 | 0.6563          | 11365376          |
| 0.2018        | 8.0024 | 26912 | 0.6483          | 12128624          |
| 0.2047        | 8.5025 | 28594 | 0.7077          | 12889056          |
| 0.1674        | 9.0027 | 30276 | 0.7110          | 13643432          |
| 0.1423        | 9.5028 | 31958 | 0.7519          | 14398584          |
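Validation loss bottoms out around epoch 3 (0.4644 at step 10092) and climbs steadily afterward while training loss keeps falling, which is consistent with overfitting in the later epochs. Selecting the best checkpoint from the eval history above is a one-liner; the `(step, loss)` pairs below are copied directly from the table.

```python
# (step, validation loss) pairs from the training results table
eval_history = [
    (1682, 0.5314), (3364, 0.5054), (5046, 0.4875), (6728, 0.4729),
    (8410, 0.4679), (10092, 0.4644), (11774, 0.4729), (13456, 0.4656),
    (15138, 0.4866), (16820, 0.4912), (18502, 0.5232), (20184, 0.5268),
    (21866, 0.5774), (23548, 0.5712), (25230, 0.6563), (26912, 0.6483),
    (28594, 0.7077), (30276, 0.7110), (31958, 0.7519),
]

# The checkpoint with the lowest validation loss, not the final one.
best_step, best_loss = min(eval_history, key=lambda pair: pair[1])
print(best_step, best_loss)  # → 10092 0.4644
```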

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
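Since this is a PEFT adapter rather than full model weights, inference requires loading it on top of the base model. A minimal sketch using the Transformers and PEFT versions listed above; the repo id `rbelanec/train_gsm8k_1755694509` is an assumption based on the card title, and the base model requires gated-access approval on the Hub.

```python
# Sketch only: downloads ~8B base weights; assumes Hub access to the
# gated meta-llama repo and that this adapter is published under the
# repo id matching the card title.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = PeftModel.from_pretrained(base, "rbelanec/train_gsm8k_1755694509")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
```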