train_hellaswag_456_1760637854

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4622
  • Num Input Tokens Seen: 218351424
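
Since the framework versions below list PEFT, this checkpoint is an adapter rather than full model weights. The following is a minimal sketch of loading it for inference; the repository id is taken from this card, and access to the gated Llama 3 base weights on the Hub is assumed:

```python
# Sketch: load the Meta-Llama-3-8B-Instruct base model and apply this
# PEFT adapter. Assumes the adapter repo is
# rbelanec/train_hellaswag_456_1760637854 and that the Llama 3 license
# has been accepted on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_hellaswag_456_1760637854"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

# Usage example: greedy continuation of a HellaSwag-style context.
prompt = "A man is sitting on a roof. He"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```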

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
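
These hyperparameters map directly onto standard transformers.TrainingArguments fields. A hedged reconstruction follows; the output_dir name is illustrative, and the actual training script is not part of this card:

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_456_1760637854",  # illustrative name
    learning_rate=0.03,  # high for weight updates; typical for prompt-style PEFT
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```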

Training results

Training Loss   Epoch   Step     Validation Loss   Input Tokens Seen
0.46            1.0     8979     0.4625            10917968
0.4624          2.0     17958    0.4628            21834304
0.4626          3.0     26937    0.4622            32747296
0.464           4.0     35916    0.4624            43666592
0.4626          5.0     44895    0.4625            54575648
0.4623          6.0     53874    0.4626            65491248
0.4597          7.0     62853    0.4622            76405264
0.4664          8.0     71832    0.4625            87319216
0.4624          9.0     80811    0.4626            98235568
0.4639          10.0    89790    0.4623            109159872
0.4629          11.0    98769    0.4626            120071152
0.4637          12.0    107748   0.4626            130995232
0.4658          13.0    116727   0.4625            141910672
0.4597          14.0    125706   0.4624            152831088
0.4629          15.0    134685   0.4626            163756480
0.464           16.0    143664   0.4626            174682064
0.4633          17.0    152643   0.4625            185591248
0.4624          18.0    161622   0.4626            196510528
0.461           19.0    170601   0.4627            207424736
0.4632          20.0    179580   0.4623            218351424

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
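
A quick sketch for checking a local environment against the versions listed above; exact matches matter only for strict reproducibility:

```python
# Compare installed package versions with those listed in this card.
import importlib.metadata as md

expected = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}
for pkg, want in expected.items():
    try:
        have = md.version(pkg)
    except md.PackageNotFoundError:
        have = "not installed"
    status = "OK" if have == want else "MISMATCH"
    print(f"{pkg}: installed {have}, card lists {want} [{status}]")
```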