train_hellaswag_123_1760637744

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7105
  • Num Input Tokens Seen: 218506144
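
No usage example ships with this card, so here is a minimal, hedged sketch of loading the adapter on top of the base model with PEFT. It assumes gated access to meta-llama/Meta-Llama-3-8B-Instruct and that this repo (rbelanec/train_hellaswag_123_1760637744) contains a standard PEFT adapter; the prompt is arbitrary.

```python
# Minimal sketch: load the PEFT adapter on top of the base Llama 3 model.
# Assumes access to the gated base checkpoint and a standard adapter layout
# (adapter_config.json + adapter weights) in this repo.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_123_1760637744"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Arbitrary prompt, just to confirm the adapter loads and generates.
inputs = tokenizer("A man is sitting on a roof. He", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```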

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
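
The data preparation is not documented here. As a loosely hedged pointer only: the hellaswag dataset named in the title is available on the Hugging Face Hub (the Rowan/hellaswag id below is an assumption), and its raw fields can be inspected as in the sketch that follows; whatever prompt formatting was actually used for fine-tuning is unknown.

```python
# Sketch for inspecting raw hellaswag examples. This is NOT the author's
# preprocessing pipeline; "Rowan/hellaswag" is an assumed Hub dataset id.
from datasets import load_dataset

ds = load_dataset("Rowan/hellaswag")
print(ds)             # train / validation / test splits
ex = ds["train"][0]
print(ex["ctx"])      # context to be completed
print(ex["endings"])  # four candidate endings
print(ex["label"])    # index of the correct ending (stored as a string)
```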

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
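
As a reference point, here is a minimal sketch of how these settings map onto transformers.TrainingArguments. The output_dir is a placeholder; the actual training script, PEFT configuration, and data pipeline are not documented on this card.

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_123_1760637744",  # placeholder, not the author's path
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```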

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.9379        | 1.0   | 8979   | 1.0204          | 10932896          |
| 0.8461        | 2.0   | 17958  | 0.7757          | 21856400          |
| 0.9398        | 3.0   | 26937  | 0.7126          | 32797696          |
| 0.4784        | 4.0   | 35916  | 0.7125          | 43715520          |
| 0.8555        | 5.0   | 44895  | 0.7155          | 54639040          |
| 0.932         | 6.0   | 53874  | 0.7154          | 65562352          |
| 0.6653        | 7.0   | 62853  | 0.7138          | 76495264          |
| 0.7713        | 8.0   | 71832  | 0.7160          | 87424000          |
| 0.7152        | 9.0   | 80811  | 0.7209          | 98355744          |
| 0.8747        | 10.0  | 89790  | 0.7105          | 109279616         |
| 0.5919        | 11.0  | 98769  | 0.7139          | 120190896         |
| 0.5049        | 12.0  | 107748 | 0.7132          | 131118336         |
| 0.6564        | 13.0  | 116727 | 0.7163          | 142033584         |
| 0.5045        | 14.0  | 125706 | 0.7132          | 152960704         |
| 0.5756        | 15.0  | 134685 | 0.7132          | 163884192         |
| 0.6997        | 16.0  | 143664 | 0.7132          | 174816592         |
| 0.7721        | 17.0  | 152643 | 0.7132          | 185740864         |
| 0.6682        | 18.0  | 161622 | 0.7132          | 196657440         |
| 0.877         | 19.0  | 170601 | 0.7132          | 207581424         |
| 0.8934        | 20.0  | 179580 | 0.7132          | 218506144         |

The evaluation loss reported above (0.7105) matches the epoch-10 checkpoint, the lowest validation loss of the run.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4