train_gsm8k_456_1760637826

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3670
  • Num Input Tokens Seen: 34715672
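
Since the checkpoint is a PEFT adapter rather than a full set of model weights, it is loaded on top of the base model. A minimal loading sketch, assuming the adapter is hosted under the repository id shown in this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_gsm8k_456_1760637826"

# Load the frozen base model, then attach the fine-tuned adapter weights.
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```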

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
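
The summary line names gsm8k as the training dataset. Below is a minimal loading sketch with the datasets library; the "main" config and the way the validation split was derived are assumptions, since the card does not document them:

```python
from datasets import load_dataset

# "main" is the standard gsm8k config (train: 7473, test: 1319 examples).
# Whether this run used it, or how a validation set was held out, is
# not documented on this card.
dataset = load_dataset("gsm8k", "main")
```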

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a code sketch of the equivalent TrainingArguments follows the list:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
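
As a rough mapping into code, these settings correspond to transformers.TrainingArguments as sketched below. The output_dir value is a placeholder, and settings the card does not report (precision, gradient accumulation) are left at their library defaults:

```python
from transformers import TrainingArguments

# Hedged sketch of the reported hyperparameters. output_dir is a
# placeholder; unreported settings are left at library defaults.
training_args = TrainingArguments(
    output_dir="train_gsm8k_456_1760637826",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```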

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 1.4368        | 1.0   | 1682  | 1.3780          | 1733152           |
| 1.2597        | 2.0   | 3364  | 1.3700          | 3469824           |
| 1.4624        | 3.0   | 5046  | 1.3677          | 5206688           |
| 1.3455        | 4.0   | 6728  | 1.3674          | 6940592           |
| 1.2476        | 5.0   | 8410  | 1.3671          | 8670720           |
| 1.4678        | 6.0   | 10092 | 1.3673          | 10405952          |
| 1.2484        | 7.0   | 11774 | 1.3674          | 12141496          |
| 1.2448        | 8.0   | 13456 | 1.3672          | 13881648          |
| 1.3418        | 9.0   | 15138 | 1.3672          | 15618544          |
| 1.3421        | 10.0  | 16820 | 1.3670          | 17357848          |
| 1.3243        | 11.0  | 18502 | 1.3672          | 19092400          |
| 1.5747        | 12.0  | 20184 | 1.3674          | 20828128          |
| 1.3693        | 13.0  | 21866 | 1.3671          | 22562912          |
| 1.3084        | 14.0  | 23548 | 1.3673          | 24304376          |
| 1.3728        | 15.0  | 25230 | 1.3676          | 26038152          |
| 1.2376        | 16.0  | 26912 | 1.3670          | 27771064          |
| 1.3527        | 17.0  | 28594 | 1.3671          | 29508400          |
| 1.3263        | 18.0  | 30276 | 1.3671          | 31244288          |
| 1.3912        | 19.0  | 31958 | 1.3671          | 32976960          |
| 1.3835        | 20.0  | 33640 | 1.3671          | 34715672          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_gsm8k_456_1760637826

This model is an adapter for meta-llama/Meta-Llama-3-8B-Instruct.