train_gsm8k_456_1760637823

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4582
  • Num Input Tokens Seen: 34715672

Model description

A PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. The specific adapter method (e.g., LoRA or prompt tuning) is not documented in this card.
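A minimal loading sketch with transformers and peft, assuming the adapter is published as rbelanec/train_gsm8k_456_1760637823 (the repository named in this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_456_1760637823"  # repository named in this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

# A GSM8K-style word problem (truncated here for brevity).
prompt = "Natalia sold clips to 48 of her friends in April ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the base model is instruction-tuned, wrapping the prompt with tokenizer.apply_chat_template may give better results than raw text, depending on how the adapter was trained.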

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned and evaluated on the gsm8k dataset (grade-school math word problems with worked solutions). Split and preprocessing details are not documented in this card.
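As a point of reference, GSM8K loads from the Hub as shown below (the openai/gsm8k repository and its "main" config are the standard ones, assumed here). Note that 1682 steps per epoch at batch size 4 covers roughly 6728 examples, fewer than GSM8K's 7473 training examples, which suggests a held-out validation split was carved from the train set, though this is not documented.

```python
from datasets import load_dataset

# GSM8K ships two configs on the Hub: "main" and "socratic".
ds = load_dataset("openai/gsm8k", "main")
print(ds)                          # train: 7473 examples, test: 1319 examples
print(ds["train"][0]["question"])  # inspect one word problem
```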

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
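As a sketch, these settings map onto the standard transformers TrainingArguments roughly as follows; the actual training script for this run is not shown in the card:

```python
from transformers import TrainingArguments

# A sketch mapping the documented hyperparameters onto the HF Trainer API;
# output_dir is assumed from the run name.
args = TrainingArguments(
    output_dir="train_gsm8k_456_1760637823",
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```

The unusually high learning rate (0.03, versus the ~1e-5 typical of full fine-tuning) is consistent with a PEFT method that updates only a small number of adapter parameters.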

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.5763        | 1.0   | 1682  | 0.5152          | 1733152           |
| 0.4466        | 2.0   | 3364  | 0.4911          | 3469824           |
| 0.494         | 3.0   | 5046  | 0.4777          | 5206688           |
| 0.49          | 4.0   | 6728  | 0.4710          | 6940592           |
| 0.4609        | 5.0   | 8410  | 0.4642          | 8670720           |
| 0.4278        | 6.0   | 10092 | 0.4620          | 10405952          |
| 0.4217        | 7.0   | 11774 | 0.4605          | 12141496          |
| 0.4437        | 8.0   | 13456 | 0.4582          | 13881648          |
| 0.3743        | 9.0   | 15138 | 0.4598          | 15618544          |
| 0.4087        | 10.0  | 16820 | 0.4619          | 17357848          |
| 0.4177        | 11.0  | 18502 | 0.4599          | 19092400          |
| 0.4703        | 12.0  | 20184 | 0.4680          | 20828128          |
| 0.3937        | 13.0  | 21866 | 0.4674          | 22562912          |
| 0.386         | 14.0  | 23548 | 0.4706          | 24304376          |
| 0.3788        | 15.0  | 25230 | 0.4742          | 26038152          |
| 0.3471        | 16.0  | 26912 | 0.4795          | 27771064          |
| 0.4133        | 17.0  | 28594 | 0.4848          | 29508400          |
| 0.3481        | 18.0  | 30276 | 0.4869          | 31244288          |
| 0.3554        | 19.0  | 31958 | 0.4879          | 32976960          |
| 0.3729        | 20.0  | 33640 | 0.4878          | 34715672          |
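Validation loss bottoms out at epoch 8 (0.4582, matching the reported evaluation loss) and drifts upward afterward while training loss keeps falling, a typical overfitting pattern. A minimal sketch of how such a run could keep the best checkpoint with the HF Trainer; the card does not say how the final checkpoint was actually chosen:

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

# Hypothetical checkpoint-selection settings, not confirmed for this run.
args = TrainingArguments(
    output_dir="train_gsm8k_456_1760637823",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,        # restore the lowest-validation-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
trainer = Trainer(
    model=model,                        # model and datasets assumed defined elsewhere
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
```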

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4