# train_hellaswag_1755694505

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

- Loss: 0.4628
- Num input tokens seen: 99399984
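Since this checkpoint is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct (see the framework versions below), inference requires loading the base model first and then attaching the adapter. A minimal sketch, assuming the repo id `rbelanec/train_hellaswag_1755694505` and access to the gated base model:

```python
# Sketch: load the PEFT adapter on top of its base model for inference.
# Assumes access to the gated Llama-3 weights and the `accelerate` package
# for device_map="auto"; the prompt below is only an illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated; requires HF access
adapter_id = "rbelanec/train_hellaswag_1755694505"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter

inputs = tokenizer("A man is sitting on a roof. He", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

`PeftModel.merge_and_unload()` can fold the adapter weights into the base model if adapter-free deployment is preferred.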

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
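The learning-rate curve these hyperparameters imply can be sketched in plain Python: linear warmup over the first 10% of steps to the peak of 5e-05, then cosine decay to zero. The total step count below is inferred from the training-results table (~8979 steps per half epoch, so roughly 179,580 optimizer steps over 10 epochs); the trainer's exact schedule may differ slightly.

```python
import math

# Assumed constants, derived from the hyperparameters and the step counts
# in the training-results table (not taken from the trainer itself).
PEAK_LR = 5e-05
TOTAL_STEPS = 179_580
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # warmup_ratio = 0.1

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

This mirrors what `transformers.get_cosine_schedule_with_warmup` produces for these settings, up to the exact step count.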

### Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.4699        | 0.5000 | 8979   | 0.4644          | 4965184           |
| 0.4693        | 1.0001 | 17958  | 0.4654          | 9947520           |
| 0.4692        | 1.5001 | 26937  | 0.4633          | 14913744          |
| 0.4586        | 2.0001 | 35916  | 0.4630          | 19885840          |
| 0.4646        | 2.5001 | 44895  | 0.4629          | 24848464          |
| 0.4647        | 3.0002 | 53874  | 0.4629          | 29830256          |
| 0.4622        | 3.5002 | 62853  | 0.4625          | 34790608          |
| 0.4649        | 4.0002 | 71832  | 0.4629          | 39759792          |
| 0.4638        | 4.5003 | 80811  | 0.4635          | 44726624          |
| 0.453         | 5.0003 | 89790  | 0.4627          | 49707056          |
| 0.4681        | 5.5003 | 98769  | 0.4628          | 54679744          |
| 0.4631        | 6.0003 | 107748 | 0.4630          | 59650096          |
| 0.4642        | 6.5004 | 116727 | 0.4624          | 64626528          |
| 0.4601        | 7.0004 | 125706 | 0.4629          | 69601680          |
| 0.4524        | 7.5004 | 134685 | 0.4630          | 74575408          |
| 0.4561        | 8.0004 | 143664 | 0.4629          | 79549952          |
| 0.4605        | 8.5005 | 152643 | 0.4626          | 84520688          |
| 0.4683        | 9.0005 | 161622 | 0.4630          | 89486320          |
| 0.4643        | 9.5005 | 170601 | 0.4632          | 94447840          |

### Framework versions

- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1
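The pinned versions above can be installed directly; a sketch, assuming a CUDA 12.8 machine (the `cu128` wheel index for PyTorch is platform-dependent):

```shell
pip install "peft==0.15.2" "transformers==4.51.3" "datasets==3.6.0" "tokenizers==0.21.1"
# PyTorch 2.8.0 built against CUDA 12.8 comes from the PyTorch wheel index:
pip install "torch==2.8.0" --index-url https://download.pytorch.org/whl/cu128
```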