train_hellaswag_456_1760637856

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0580
  • Num Input Tokens Seen: 218351424
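This repository contains a PEFT adapter rather than full model weights, so it must be loaded on top of the base model. The snippet below is a minimal loading sketch: the repository id comes from this card, but the example prompt and generation settings are assumptions, since the card does not document the prompt format used during fine-tuning.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_456_1760637856"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Hypothetical HellaSwag-style continuation prompt; the exact template
# used during fine-tuning is not documented in this card.
prompt = "A man is sitting on a roof. He"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```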

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
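For reference, the listed values map onto Hugging Face `TrainingArguments` roughly as follows. This is a hedged sketch: the actual training script is not included in this card, and the PEFT-specific setup (adapter type and config) is unknown, so treat it as an approximation rather than the exact configuration used.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the hyperparameters listed above.
# output_dir is assumed from the model name; the other values mirror the list.
training_args = TrainingArguments(
    output_dir="train_hellaswag_456_1760637856",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```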

Training results

The reported evaluation loss (0.0580) corresponds to the epoch-2 checkpoint; validation loss trends upward in later epochs while training loss approaches zero, which suggests overfitting beyond that point.

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.0282        | 1.0   | 8979   | 0.0639          | 10917968          |
| 0.0277        | 2.0   | 17958  | 0.0580          | 21834304          |
| 0.0007        | 3.0   | 26937  | 0.1041          | 32747296          |
| 0.0777        | 4.0   | 35916  | 0.1005          | 43666592          |
| 0.0001        | 5.0   | 44895  | 0.1140          | 54575648          |
| 0.0003        | 6.0   | 53874  | 0.1220          | 65491248          |
| 0.0001        | 7.0   | 62853  | 0.1111          | 76405264          |
| 0.0           | 8.0   | 71832  | 0.1342          | 87319216          |
| 0.0           | 9.0   | 80811  | 0.1392          | 98235568          |
| 0.0001        | 10.0  | 89790  | 0.1392          | 109159872         |
| 0.0           | 11.0  | 98769  | 0.1611          | 120071152         |
| 0.0           | 12.0  | 107748 | 0.1659          | 130995232         |
| 0.0           | 13.0  | 116727 | 0.2099          | 141910672         |
| 0.0           | 14.0  | 125706 | 0.1669          | 152831088         |
| 0.0           | 15.0  | 134685 | 0.2066          | 163756480         |
| 0.0           | 16.0  | 143664 | 0.2414          | 174682064         |
| 0.0           | 17.0  | 152643 | 0.2472          | 185591248         |
| 0.0           | 18.0  | 161622 | 0.2462          | 196510528         |
| 0.0           | 19.0  | 170601 | 0.2463          | 207424736         |
| 0.0           | 20.0  | 179580 | 0.2471          | 218351424         |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4