train_gsm8k_42_1760637592

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.5109
  • Num Input Tokens Seen: 30997232
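
Because this is a PEFT adapter rather than a full model, it can be loaded with AutoPeftModelForCausalLM, which resolves the base model from the adapter config. A minimal sketch, assuming access to the gated meta-llama/Meta-Llama-3-8B-Instruct base model; the word problem is illustrative, and the exact prompt format used during fine-tuning is not documented in this card:

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load the adapter; PEFT resolves the base model
# (meta-llama/Meta-Llama-3-8B-Instruct) from the adapter config.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_gsm8k_42_1760637592",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative GSM8K-style word problem; the prompt template used
# during fine-tuning is an assumption here.
messages = [{
    "role": "user",
    "content": "A baker bakes 12 trays of 8 muffins each and sells 70 muffins. How many muffins are left?",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```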

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
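
For reference, the settings above map roughly onto transformers TrainingArguments as sketched below. This is a reconstruction, not the actual training script: output_dir is a hypothetical name, and the PEFT adapter configuration used for training is not shown.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is hypothetical,
# and the PEFT/adapter setup is not documented in this card.
training_args = TrainingArguments(
    output_dir="train_gsm8k_42_1760637592",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```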

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.458         | 2.0   | 2990  | 0.5270          | 3099904           |
| 0.5182        | 4.0   | 5980  | 0.4897          | 6201744           |
| 0.4238        | 6.0   | 8970  | 0.4812          | 9301328           |
| 0.373         | 8.0   | 11960 | 0.4764          | 12395616          |
| 0.4743        | 10.0  | 14950 | 0.4776          | 15495968          |
| 0.4045        | 12.0  | 17940 | 0.4830          | 18605040          |
| 0.4969        | 14.0  | 20930 | 0.4932          | 21708432          |
| 0.3155        | 16.0  | 23920 | 0.5015          | 24806352          |
| 0.3953        | 18.0  | 26910 | 0.5096          | 27897344          |
| 0.391         | 20.0  | 29900 | 0.5109          | 30997232          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
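
To approximate this environment, the versions above can be pinned as below. Note that 2.9.0+cu128 denotes a CUDA 12.8 build of PyTorch, which is typically installed from the PyTorch wheel index rather than PyPI.

```
peft==0.17.1
transformers==4.51.3
torch==2.9.0
datasets==4.0.0
tokenizers==0.21.4
```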