train_hellaswag_456_1760637859

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0748
  • Num Input Tokens Seen: 218351424
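
This repository contains a PEFT adapter rather than full model weights (see the framework versions below). A minimal loading sketch, assuming access to the gated meta-llama/Meta-Llama-3-8B-Instruct base model and an installed `accelerate` for `device_map="auto"`:

```python
# Minimal sketch: load the base model, then attach this adapter with PEFT.
# Assumes transformers, peft, accelerate, and access to the gated base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "rbelanec/train_hellaswag_456_1760637859"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER)

# Quick smoke test: complete a sentence, hellaswag-style.
inputs = tokenizer(
    "She poured the batter into the pan, then", return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```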

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
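
The training script itself is not published with this card; as a rough sketch, the values above map onto a transformers.TrainingArguments configuration along these lines (the output_dir is illustrative):

```python
# Illustrative only: the card lists hyperparameters, not the training code.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_hellaswag_456_1760637859",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```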

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1228        | 1.0   | 8979   | 0.1702          | 10917968          |
| 0.0768        | 2.0   | 17958  | 0.1169          | 21834304          |
| 0.0285        | 3.0   | 26937  | 0.0974          | 32747296          |
| 0.0757        | 4.0   | 35916  | 0.0888          | 43666592          |
| 0.038         | 5.0   | 44895  | 0.0840          | 54575648          |
| 0.0734        | 6.0   | 53874  | 0.0783          | 65491248          |
| 0.0625        | 7.0   | 62853  | 0.0769          | 76405264          |
| 0.1263        | 8.0   | 71832  | 0.0764          | 87319216          |
| 0.1133        | 9.0   | 80811  | 0.0766          | 98235568          |
| 0.0037        | 10.0  | 89790  | 0.0758          | 109159872         |
| 0.0199        | 11.0  | 98769  | 0.0748          | 120071152         |
| 0.0819        | 12.0  | 107748 | 0.0788          | 130995232         |
| 0.018         | 13.0  | 116727 | 0.0777          | 141910672         |
| 0.0472        | 14.0  | 125706 | 0.0795          | 152831088         |
| 0.0686        | 15.0  | 134685 | 0.0781          | 163756480         |
| 0.037         | 16.0  | 143664 | 0.0783          | 174682064         |
| 0.0912        | 17.0  | 152643 | 0.0801          | 185591248         |
| 0.0014        | 18.0  | 161622 | 0.0800          | 196510528         |
| 0.0433        | 19.0  | 170601 | 0.0803          | 207424736         |
| 0.0904        | 20.0  | 179580 | 0.0806          | 218351424         |
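
Validation loss reaches its minimum of 0.0748 at epoch 11 and drifts slightly upward thereafter; this matches the evaluation loss reported at the top of the card, suggesting the epoch-11 checkpoint was retained.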

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4