train_gsm8k_123_1760637709

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4466
  • Num Input Tokens Seen: 34679720
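Since the checkpoint is a PEFT adapter rather than a full set of model weights, it has to be applied on top of the base model at inference time. Below is a minimal loading sketch; it assumes the adapter is published under the repo id rbelanec/train_gsm8k_123_1760637709, and the prompt is just an illustrative GSM8K-style question.

```python
# Sketch: load the base model, then apply this PEFT adapter on top of it.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_123_1760637709"  # assumed HF repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach adapter weights

# Illustrative GSM8K-style prompt (not from the training set).
prompt = "A baker sells 12 loaves a day for 5 days. How many loaves is that?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```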

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
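For reference, these settings map directly onto the Hugging Face Trainer API. The sketch below expresses them as TrainingArguments; the output_dir is an assumption, and with 33,640 total optimizer steps (see the results table below) the warmup_ratio of 0.1 corresponds to roughly 3,364 warmup steps.

```python
# Sketch only: the hyperparameters above as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_gsm8k_123_1760637709",  # assumption, not from the card
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,  # ~3,364 of 33,640 steps spent warming up
    num_train_epochs=20,
)
```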

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.5609        | 1.0   | 1682  | 0.4951          | 1738584           |
| 0.444         | 2.0   | 3364  | 0.4797          | 3477544           |
| 0.5094        | 3.0   | 5046  | 0.4648          | 5217560           |
| 0.5101        | 4.0   | 6728  | 0.4575          | 6947896           |
| 0.4052        | 5.0   | 8410  | 0.4518          | 8680664           |
| 0.3402        | 6.0   | 10092 | 0.4502          | 10414768          |
| 0.4342        | 7.0   | 11774 | 0.4504          | 12148592          |
| 0.3496        | 8.0   | 13456 | 0.4466          | 13883456          |
| 0.4198        | 9.0   | 15138 | 0.4483          | 15613432          |
| 0.3061        | 10.0  | 16820 | 0.4502          | 17345296          |
| 0.3443        | 11.0  | 18502 | 0.4494          | 19082288          |
| 0.4635        | 12.0  | 20184 | 0.4531          | 20812896          |
| 0.3567        | 13.0  | 21866 | 0.4589          | 22544592          |
| 0.3681        | 14.0  | 23548 | 0.4605          | 24280408          |
| 0.3171        | 15.0  | 25230 | 0.4641          | 26007520          |
| 0.3373        | 16.0  | 26912 | 0.4712          | 27741272          |
| 0.3833        | 17.0  | 28594 | 0.4749          | 29470344          |
| 0.3457        | 18.0  | 30276 | 0.4776          | 31206528          |
| 0.3426        | 19.0  | 31958 | 0.4782          | 32940696          |
| 0.3324        | 20.0  | 33640 | 0.4784          | 34679720          |
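Note that the reported evaluation loss of 0.4466 corresponds to the epoch-8 checkpoint; validation loss drifts upward from epoch 9 onward even as training loss keeps falling, which suggests the later epochs overfit and the best checkpoint was retained. A quick sanity check over the values in the table above:

```python
# Find the epoch with the lowest validation loss (values copied from the table).
val_loss = [0.4951, 0.4797, 0.4648, 0.4575, 0.4518, 0.4502, 0.4504,
            0.4466, 0.4483, 0.4502, 0.4494, 0.4531, 0.4589, 0.4605,
            0.4641, 0.4712, 0.4749, 0.4776, 0.4782, 0.4784]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch, val_loss[best_epoch - 1])  # -> 8 0.4466
```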

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4