train_hellaswag_42_1760637629

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0726
  • Num Input Tokens Seen: 218263888
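Since this checkpoint is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, inference requires attaching the adapter to the base model. Below is a minimal loading sketch; the repo ids come from this card, while the dtype and device settings are assumptions:

```python
# Minimal sketch: load the base model and attach the PEFT adapter.
# Repo ids are from this card; torch_dtype and device_map are assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_hellaswag_42_1760637629"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,  # assumption; not recorded on this card
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()
```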

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (mirrored in the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
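These settings map onto a transformers TrainingArguments configuration roughly as follows. This is a sketch, not the exact training script: output_dir is a placeholder, and details such as gradient accumulation and the PEFT adapter configuration are not recorded on this card.

```python
# Sketch of a TrainingArguments setup matching the hyperparameters above.
# output_dir is a placeholder; gradient accumulation and the PEFT/adapter
# config are not recorded on this card and are omitted here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_42_1760637629",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```

With warmup_ratio=0.1 over the full run of 179580 steps (see the table below), the cosine schedule warms up for roughly the first 17958 steps before decaying.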

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1113        | 1.0   | 8979   | 0.1684          | 10917120          |
| 0.1152        | 2.0   | 17958  | 0.1126          | 21836032          |
| 0.0899        | 3.0   | 26937  | 0.0936          | 32746560          |
| 0.0937        | 4.0   | 35916  | 0.0841          | 43661424          |
| 0.1188        | 5.0   | 44895  | 0.0806          | 54578912          |
| 0.0533        | 6.0   | 53874  | 0.0765          | 65488016          |
| 0.1069        | 7.0   | 62853  | 0.0745          | 76410304          |
| 0.033         | 8.0   | 71832  | 0.0738          | 87327296          |
| 0.0927        | 9.0   | 80811  | 0.0726          | 98229232          |
| 0.0786        | 10.0  | 89790  | 0.0734          | 109127968         |
| 0.0082        | 11.0  | 98769  | 0.0738          | 120042688         |
| 0.1095        | 12.0  | 107748 | 0.0750          | 130954720         |
| 0.1746        | 13.0  | 116727 | 0.0731          | 141874656         |
| 0.0093        | 14.0  | 125706 | 0.0746          | 152783392         |
| 0.0103        | 15.0  | 134685 | 0.0773          | 163694096         |
| 0.0435        | 16.0  | 143664 | 0.0763          | 174604544         |
| 0.0049        | 17.0  | 152643 | 0.0771          | 185523328         |
| 0.0213        | 18.0  | 161622 | 0.0772          | 196433472         |
| 0.0049        | 19.0  | 170601 | 0.0775          | 207345200         |
| 0.0021        | 20.0  | 179580 | 0.0772          | 218263888         |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4