train_hellaswag_42_1760637628

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset (a hedged loading sketch follows the results below). It achieves the following results on the evaluation set:

  • Loss: 0.7116 (the lowest validation loss of the run, reached at epoch 4; see Training results)
  • Num Input Tokens Seen: 218263888
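
Since the framework versions below list PEFT and the model tree identifies this checkpoint as an adapter, it must be loaded on top of the base model. The following is a minimal usage sketch, not part of the original card: it assumes the standard transformers + peft loading pattern and takes the repo id rbelanec/train_hellaswag_42_1760637628 from the model tree; the prompt is purely illustrative.

```python
# Hedged usage sketch (not from the original card): attach the PEFT adapter
# to the base model and run a short generation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_42_1760637628"  # from the model tree below

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach fine-tuned adapter weights

prompt = "A man is sitting on a roof. He"  # illustrative HellaSwag-style continuation prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```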

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
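
For readers who want to reproduce this configuration, here is a hedged sketch of how the hyperparameters above map onto Hugging Face TrainingArguments. Only the listed values come from the card; the output_dir name and the per-epoch evaluation/logging cadence are assumptions (the per-epoch validation losses below suggest epoch-level evaluation), and the surrounding Trainer, dataset, and LoRA setup are omitted.

```python
# Hedged reproduction sketch: hyperparameter values are from this card;
# output_dir and the eval/logging strategies are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_42_1760637628",  # assumed name
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",     # assumed from the per-epoch validation losses below
    logging_strategy="epoch",  # assumed from the per-epoch training losses below
)
```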

Training results

Training Loss  Epoch  Step    Validation Loss  Input Tokens Seen
0.8732         1.0    8979    1.0233           10917120
0.7955         2.0    17958   0.7851           21836032
0.7456         3.0    26937   0.7186           32746560
0.7660         4.0    35916   0.7116           43661424
0.6636         5.0    44895   0.7137           54578912
0.6306         6.0    53874   0.7167           65488016
0.8581         7.0    62853   0.7175           76410304
0.7264         8.0    71832   0.7148           87327296
0.7082         9.0    80811   0.7139           98229232
0.5871         10.0   89790   0.7167           109127968
0.6326         11.0   98769   0.7145           120042688
0.8029         12.0   107748  0.7155           130954720
0.7298         13.0   116727  0.7155           141874656
0.6079         14.0   125706  0.7155           152783392
0.5391         15.0   134685  0.7155           163694096
0.7002         16.0   143664  0.7155           174604544
0.5754         17.0   152643  0.7155           185523328
0.7789         18.0   161622  0.7155           196433472
0.5864         19.0   170601  0.7155           207345200
0.7214         20.0   179580  0.7155           218263888

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
Model tree for rbelanec/train_hellaswag_42_1760637628

  • Adapter of meta-llama/Meta-Llama-3-8B-Instruct