train_hellaswag_123_1768397592

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0957
  • Num Input Tokens Seen: 99752256
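
Since usage details are not documented below, here is a minimal inference sketch that loads the adapter on top of the base model. The repo ids come from this card; the dtype, device placement, and prompt are illustrative assumptions, not documented settings.

```python
# Minimal inference sketch (hedged): load the PEFT adapter on top of the base
# model. Repo ids are taken from this card; dtype, device_map, and the prompt
# are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_123_1768397592"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption; requires a bf16-capable GPU
    device_map="auto",           # requires accelerate
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "A man is sitting on a roof. He"  # illustrative hellaswag-style context
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```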

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
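
While the exact split, prompt format, and preprocessing used for this run are not documented here, the underlying dataset is public. Below is a hedged sketch that loads it from the Hub, assuming the standard Rowan/hellaswag copy.

```python
# Hedged sketch: load the public hellaswag dataset from the Hub.
# "Rowan/hellaswag" is the standard Hub copy, assumed here; the preprocessing
# used for this particular run is not documented on the card.
from datasets import load_dataset

ds = load_dataset("Rowan/hellaswag")
example = ds["validation"][0]
print(example["ctx"])      # the context to be completed
print(example["endings"])  # four candidate continuations
print(example["label"])    # index of the correct ending (stored as a string)
```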

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
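
The sketch below expresses the hyperparameters above as a transformers TrainingArguments object. The argument names match transformers 4.51.x; the output directory is illustrative, and settings not listed on this card (gradient accumulation, sequence length, the PEFT adapter config) are omitted.

```python
# Hedged reproduction of the hyperparameters listed above.
# output_dir is illustrative; undocumented settings are left at their defaults.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_hellaswag_123_1768397592",  # illustrative
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```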

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|---------------|--------|--------|-----------------|-------------------|
| 0.1492        | 0.5000 | 8979   | 0.1932          | 4982272           |
| 0.1247        | 1.0001 | 17958  | 0.1141          | 9982144           |
| 0.0003        | 1.5001 | 26937  | 0.1143          | 14966208          |
| 0.1381        | 2.0001 | 35916  | 0.1120          | 19955568          |
| 0.057         | 2.5001 | 44895  | 0.1184          | 24935680          |
| 0.0005        | 3.0002 | 53874  | 0.0957          | 29935424          |
| 0.2342        | 3.5002 | 62853  | 0.1080          | 34914400          |
| 0.1786        | 4.0002 | 71832  | 0.1074          | 39901200          |
| 0.0001        | 4.5003 | 80811  | 0.1236          | 44885728          |
| 0.0002        | 5.0003 | 89790  | 0.1097          | 49884080          |
| 0.0003        | 5.5003 | 98769  | 0.1193          | 54873456          |
| 0.0001        | 6.0003 | 107748 | 0.1186          | 59861808          |
| 0.001         | 6.5004 | 116727 | 0.1335          | 64855584          |
| 0.0           | 7.0004 | 125706 | 0.1356          | 69848320          |
| 0.0           | 7.5004 | 134685 | 0.1572          | 74839776          |
| 0.2546        | 8.0004 | 143664 | 0.1441          | 79831776          |
| 0.0001        | 8.5005 | 152643 | 0.1576          | 84820208          |
| 0.0           | 9.0005 | 161622 | 0.1536          | 89803776          |
| 0.0001        | 9.5005 | 170601 | 0.1575          | 94782720          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4