train_mmlu_1754507479

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mmlu dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1757
  • Num Input Tokens Seen: 488118104

Model description

More information needed

Intended uses & limitations

More information needed
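
Pending further documentation, here is a minimal inference sketch. It assumes (this card does not confirm it) that the repository holds a PEFT adapter for the gated meta-llama/Meta-Llama-3-8B-Instruct base model, consistent with the framework list below; the prompt is illustrative only:

```python
# Minimal inference sketch. Assumptions (not confirmed by this card): the repo
# is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, and you have
# access to that gated base model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mmlu_1754507479"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Illustrative MMLU-style multiple-choice prompt.
messages = [{"role": "user", "content": (
    "Which organelle is the main site of ATP synthesis?\n"
    "A. Nucleus\nB. Mitochondrion\nC. Ribosome\nD. Golgi apparatus\n"
    "Answer:"
)}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```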

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
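
These settings map onto transformers TrainingArguments roughly as follows; this is a hedged reconstruction, not the original training script, and output_dir plus the data/model wiring are assumed:

```python
# Hedged reconstruction of the configuration above; the actual training
# script is not published, and output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_mmlu_1754507479",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```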

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|--------------:|-------:|-------:|----------------:|------------------:|
| 0.0827        | 0.5000 | 11233  | 0.1879          | 24389728          |
| 0.1014        | 1.0000 | 22466  | 0.1757          | 48789280          |
| 0.1202        | 1.5001 | 33699  | 0.1851          | 73201984          |
| 0.0139        | 2.0001 | 44932  | 0.1832          | 97620120          |
| 0.1240        | 2.5001 | 56165  | 0.2316          | 122127480         |
| 0.1483        | 3.0001 | 67398  | 0.2331          | 146471872         |
| 0.0820        | 3.5002 | 78631  | 0.2339          | 170850208         |
| 0.0045        | 4.0002 | 89864  | 0.2496          | 195267312         |
| 0.0006        | 4.5002 | 101097 | 0.3019          | 219639056         |
| 0.0044        | 5.0002 | 112330 | 0.3395          | 244095744         |
| 0.0001        | 5.5002 | 123563 | 0.4161          | 268478944         |
| 0.0002        | 6.0003 | 134796 | 0.3811          | 292933144         |
| 0.0000        | 6.5003 | 146029 | 0.4440          | 317335480         |
| 0.0000        | 7.0003 | 157262 | 0.4781          | 341742832         |
| 0.0000        | 7.5003 | 168495 | 0.5315          | 366182192         |
| 0.0000        | 8.0004 | 179728 | 0.5296          | 390549264         |
| 0.0000        | 8.5004 | 190961 | 0.5551          | 414924016         |
| 0.0000        | 9.0004 | 202194 | 0.6271          | 439335448         |
| 0.0000        | 9.5004 | 213427 | 0.6736          | 463693944         |

Validation loss reaches its minimum of 0.1757 at epoch 1.0 (the value reported at the top of this card) and generally increases thereafter while training loss collapses toward zero, a typical overfitting pattern; the reported evaluation result thus appears to correspond to the epoch-1 checkpoint.
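
As a quick sanity check on the table, the step counts and batch size imply the following (illustrative arithmetic, not from the card; it assumes no gradient accumulation and single-device training):

```python
# Back-of-envelope numbers derived from the table and hyperparameters above.
steps_per_epoch = 22466            # step count at epoch 1.0000
train_batch_size = 4               # from the hyperparameters
examples_per_epoch = steps_per_epoch * train_batch_size
print(examples_per_epoch)          # 89864 training examples per epoch

tokens_at_epoch_1 = 48_789_280     # "Input Tokens Seen" at epoch 1.0000
print(round(tokens_at_epoch_1 / examples_per_epoch))  # ~543 input tokens/example
```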

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1