train_gsm8k_42_1767887013

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4860
  • Num Input Tokens Seen: 15257168
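
Since the framework versions below list PEFT, the checkpoint is presumably a standard PEFT adapter on top of the base model. Below is a minimal loading sketch, assuming the adapter repo id from the model tree, a bfloat16 base, and access to the gated Llama 3 weights; the GSM8K-style prompt is illustrative only, since the card does not document the training prompt format:

```python
# Minimal sketch: load the adapter on top of the base model with PEFT.
# Assumptions: standard PEFT adapter layout, bfloat16 fits your hardware,
# and you have access to the gated Llama 3 base weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_42_1767887013"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Illustrative GSM8K-style question; the exact prompt template used in
# training is not documented on this card.
question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```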

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
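
These settings map directly onto transformers.TrainingArguments, as sketched below. The output_dir is a placeholder and the PEFT/LoRA adapter configuration is omitted, since the card does not record it:

```python
# Sketch of the reported hyperparameters as transformers TrainingArguments.
# betas=(0.9, 0.999) and epsilon=1e-08 are the adamw_torch defaults, so
# they need no extra arguments here; output_dir is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_gsm8k_42_1767887013",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```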

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 0.5935        | 0.5001 | 1682  | 0.6109          | 759696            |
| 0.5817        | 1.0003 | 3364  | 0.5479          | 1526904           |
| 0.6792        | 1.5004 | 5046  | 0.5285          | 2285816           |
| 0.5452        | 2.0006 | 6728  | 0.5159          | 3053328           |
| 0.481         | 2.5007 | 8410  | 0.5097          | 3817312           |
| 0.459         | 3.0009 | 10092 | 0.5050          | 4575728           |
| 0.4884        | 3.5010 | 11774 | 0.5004          | 5338880           |
| 0.6733        | 4.0012 | 13456 | 0.4964          | 6102824           |
| 0.4547        | 4.5013 | 15138 | 0.4958          | 6869672           |
| 0.5045        | 5.0015 | 16820 | 0.4923          | 7627984           |
| 0.4488        | 5.5016 | 18502 | 0.4908          | 8390160           |
| 0.4681        | 6.0018 | 20184 | 0.4897          | 9156000           |
| 0.5548        | 6.5019 | 21866 | 0.4884          | 9922384           |
| 0.4557        | 7.0021 | 23548 | 0.4883          | 10685720          |
| 0.4885        | 7.5022 | 25230 | 0.4873          | 11449192          |
| 0.6288        | 8.0024 | 26912 | 0.4869          | 12210264          |
| 0.4648        | 8.5025 | 28594 | 0.4862          | 12971896          |
| 0.4467        | 9.0027 | 30276 | 0.4860          | 13739344          |
| 0.5898        | 9.5028 | 31958 | 0.4860          | 14497808          |
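
From the table, one epoch corresponds to roughly 3364 optimizer steps (step 3364 is logged at epoch 1.0003), so 10 epochs give about 33640 steps and warmup_ratio=0.1 implies roughly 3364 warmup steps, i.e. approximately the first epoch. A small sketch of the implied schedule, using transformers' cosine-with-warmup helper on a dummy optimizer:

```python
# Sketch of the implied learning-rate schedule; the dummy parameter exists
# only so a scheduler can be constructed and inspected.
import torch
from transformers import get_cosine_schedule_with_warmup

steps_per_epoch = 3364                 # from the table: step 3364 ~ epoch 1.0
total_steps = steps_per_epoch * 10     # num_epochs = 10
warmup_steps = int(0.1 * total_steps)  # warmup_ratio = 0.1 -> 3364 steps

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=5e-5)
scheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)

for _ in range(warmup_steps):
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr()[0])  # peak lr ~ 5e-05 at the end of warmup
```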

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_gsm8k_42_1767887013

  • Adapter of meta-llama/Meta-Llama-3-8B-Instruct (this model)