train_hellaswag_42_1760637625

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0506
  • Num Input Tokens Seen: 218263888
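
This checkpoint is a PEFT adapter rather than a full set of model weights, so it has to be attached to the base model at load time. Below is a minimal loading sketch, assuming the adapter repo id is rbelanec/train_hellaswag_42_1760637625 and that you have access to the gated Meta-Llama-3 base weights; it is not the authors' own inference script.

```python
# Minimal PEFT-adapter loading sketch (assumed repo id, not an official script).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "rbelanec/train_hellaswag_42_1760637625"  # assumed adapter repo id

# Load the frozen base model first, then attach the fine-tuned adapter on top.
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
tokenizer = AutoTokenizer.from_pretrained(BASE)
model.eval()
```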

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the hedged TrainingArguments sketch after this list):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
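
For reference, the sketch below shows how these settings would map onto Hugging Face TrainingArguments. It is an approximation: the actual training script, dataset preprocessing, and PEFT configuration are not documented in this card, and output_dir is a placeholder.

```python
# Approximate TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_42_1760637625",  # placeholder output path
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",        # AdamW, PyTorch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,           # fraction of steps used for LR warmup
    num_train_epochs=20,
)
```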

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.4621        | 1.0   | 8979   | 0.4625          | 10917120          |
| 0.0788        | 2.0   | 17958  | 0.0731          | 21836032          |
| 0.0643        | 3.0   | 26937  | 0.0575          | 32746560          |
| 0.4636        | 4.0   | 35916  | 0.4627          | 43661424          |
| 0.4607        | 5.0   | 44895  | 0.4626          | 54578912          |
| 0.465         | 6.0   | 53874  | 0.4625          | 65488016          |
| 0.467         | 7.0   | 62853  | 0.4628          | 76410304          |
| 0.4664        | 8.0   | 71832  | 0.4625          | 87327296          |
| 0.4627        | 9.0   | 80811  | 0.4623          | 98229232          |
| 0.0294        | 10.0  | 89790  | 0.0517          | 109127968         |
| 0.0184        | 11.0  | 98769  | 0.0506          | 120042688         |
| 0.0492        | 12.0  | 107748 | 0.0529          | 130954720         |
| 0.0642        | 13.0  | 116727 | 0.0507          | 141874656         |
| 0.0047        | 14.0  | 125706 | 0.0540          | 152783392         |
| 0.0084        | 15.0  | 134685 | 0.0607          | 163694096         |
| 0.0131        | 16.0  | 143664 | 0.0624          | 174604544         |
| 0.0037        | 17.0  | 152643 | 0.0652          | 185523328         |
| 0.0023        | 18.0  | 161622 | 0.0654          | 196433472         |
| 0.0018        | 19.0  | 170601 | 0.0653          | 207345200         |
| 0.0006        | 20.0  | 179580 | 0.0654          | 218263888         |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4