train_mmlu_1754751378

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mmlu dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2521 (the best validation loss of the run, reached at epoch 1.5; see the training results table below)
  • Num Input Tokens Seen: 488118104
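
Since PEFT appears under the framework versions, this checkpoint is presumably a PEFT adapter on top of the base model rather than a full set of weights. A minimal loading sketch, assuming the standard PEFT adapter layout (the prompt string is a placeholder, and access to the gated base model must first be granted on the Hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mmlu_1754751378"

# Load the gated base model, then attach the fine-tuned adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Placeholder MMLU-style prompt; the exact prompt format used during
# training is not documented on this card.
prompt = "Question: ...\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```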

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
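
A hedged sketch of how these settings map onto transformers.TrainingArguments; the output directory is an assumption, and the PEFT adapter configuration (method, rank, target modules) is not recorded on this card:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; anything not listed there
# (e.g. output_dir) is an assumption for illustration only.
training_args = TrainingArguments(
    output_dir="train_mmlu_1754751378",  # assumed, not stated on the card
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```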

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.2361        | 0.5000 | 11233  | 0.3173          | 24389728          |
| 0.0327        | 1.0000 | 22466  | 0.2957          | 48789280          |
| 0.1219        | 1.5001 | 33699  | 0.2521          | 73201984          |
| 0.0660        | 2.0001 | 44932  | 0.2651          | 97620120          |
| 0.2566        | 2.5001 | 56165  | 0.2855          | 122127480         |
| 0.2288        | 3.0001 | 67398  | 0.2779          | 146471872         |
| 0.1661        | 3.5002 | 78631  | 0.2676          | 170850208         |
| 0.2506        | 4.0002 | 89864  | 0.2578          | 195267312         |
| 0.1615        | 4.5002 | 101097 | 0.2810          | 219639056         |
| 0.3730        | 5.0002 | 112330 | 0.2782          | 244095744         |
| 0.3857        | 5.5002 | 123563 | 0.2833          | 268478944         |
| 0.3265        | 6.0003 | 134796 | 0.2790          | 292933144         |
| 0.0213        | 6.5003 | 146029 | 0.2818          | 317335480         |
| 0.2185        | 7.0003 | 157262 | 0.2768          | 341742832         |
| 0.2843        | 7.5003 | 168495 | 0.2796          | 366182192         |
| 0.1790        | 8.0004 | 179728 | 0.2794          | 390549264         |
| 0.0382        | 8.5004 | 190961 | 0.2805          | 414924016         |
| 0.2071        | 9.0004 | 202194 | 0.2795          | 439335448         |
| 0.3409        | 9.5004 | 213427 | 0.2805          | 463693944         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
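
To reproduce the environment, the listed versions can be pinned in a requirements file; a sketch (the +cu128 PyTorch build is served from PyTorch's own wheel index, hence the extra index URL):

```text
--extra-index-url https://download.pytorch.org/whl/cu128
torch==2.8.0+cu128
peft==0.15.2
transformers==4.51.3
datasets==3.6.0
tokenizers==0.21.1
```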