train_hellaswag_456_1760637857

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7093
  • Num Input Tokens Seen: 218351424
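
Because the framework versions below list PEFT, this checkpoint is an adapter on top of the base model rather than a full set of weights. A minimal loading sketch, assuming the adapter repository id from this card (rbelanec/train_hellaswag_456_1760637857) and hardware that can hold the 8B base model; adjust dtype and device_map for your setup:

```python
# Sketch: load the PEFT adapter on top of Meta-Llama-3-8B-Instruct.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_456_1760637857"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach adapter weights
model.eval()

# Illustrative prompt in the spirit of hellaswag sentence completion.
inputs = tokenizer("The chef put the pizza in the oven and", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```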

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments (see the configuration sketch after this list)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
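
A sketch of how these settings map onto `transformers.TrainingArguments`. Only the values listed above are taken from this card; everything else (the output directory name in particular) is an assumption for illustration:

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_456_1760637857",  # assumed, not stated in the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```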

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 1.0742        | 1.0   | 8979   | 1.0175          | 10917968          |
| 0.8217        | 2.0   | 17958  | 0.7741          | 21834304          |
| 0.4915        | 3.0   | 26937  | 0.7171          | 32747296          |
| 0.5894        | 4.0   | 35916  | 0.7116          | 43666592          |
| 0.5193        | 5.0   | 44895  | 0.7111          | 54575648          |
| 0.7684        | 6.0   | 53874  | 0.7108          | 65491248          |
| 0.8571        | 7.0   | 62853  | 0.7188          | 76405264          |
| 0.9533        | 8.0   | 71832  | 0.7093          | 87319216          |
| 0.6646        | 9.0   | 80811  | 0.7220          | 98235568          |
| 0.7685        | 10.0  | 89790  | 0.7222          | 109159872         |
| 0.6413        | 11.0  | 98769  | 0.7191          | 120071152         |
| 0.8741        | 12.0  | 107748 | 0.7190          | 130995232         |
| 0.6947        | 13.0  | 116727 | 0.7254          | 141910672         |
| 1.1252        | 14.0  | 125706 | 0.7254          | 152831088         |
| 0.8754        | 15.0  | 134685 | 0.7254          | 163756480         |
| 0.7325        | 16.0  | 143664 | 0.7254          | 174682064         |
| 0.7487        | 17.0  | 152643 | 0.7254          | 185591248         |
| 0.7894        | 18.0  | 161622 | 0.7254          | 196510528         |
| 0.6409        | 19.0  | 170601 | 0.7254          | 207424736         |
| 0.6316        | 20.0  | 179580 | 0.7254          | 218351424         |
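
Validation loss bottoms out at 0.7093 at epoch 8, matching the evaluation result reported at the top of this card, so the published adapter likely corresponds to that checkpoint. From epoch 13 onward the validation loss is flat at 0.7254, consistent with the cosine schedule having decayed the learning rate to near zero.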

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
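
To reproduce this environment, the pinned versions above can be installed directly. Note that the PyTorch build (2.9.0+cu128) is CUDA-specific, so it typically comes from the PyTorch wheel index rather than PyPI; the index URL below follows PyTorch's usual cuXYZ pattern and is an assumption about your CUDA setup:

```
pip install "peft==0.17.1" "transformers==4.51.3" "datasets==4.0.0" "tokenizers==0.21.4"
pip install "torch==2.9.0" --index-url https://download.pytorch.org/whl/cu128
```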