train_mmlu_1755681415

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mmlu dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1708
  • Num Input Tokens Seen: 488118104

Model description

More information needed. From the card itself: this is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the mmlu dataset.

Intended uses & limitations

More information needed. As an unverified starting point, the adapter can be loaded on top of the base model as sketched below.
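
The sketch assumes the repository id rbelanec/train_mmlu_1755681415 and standard PEFT adapter loading; the dtype, device placement, and prompt format are illustrative choices, not documented by the card.

```python
# Hedged sketch: apply this PEFT adapter to the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mmlu_1755681415"  # repository id of this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"  # illustrative settings
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Question: Which planet is known as the Red Planet?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```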

Training and evaluation data

More information needed. The card identifies the dataset only as mmlu; a hedged loading sketch follows.
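
The sketch assumes the Hub copy cais/mmlu with the "all" config; the split and preprocessing actually used for training are not documented.

```python
# Hedged sketch: inspect MMLU from the Hugging Face Hub.
# "cais/mmlu" and the "all" config are assumptions.
from datasets import load_dataset

mmlu = load_dataset("cais/mmlu", "all")
print(mmlu)            # auxiliary_train / dev / validation / test splits
print(mmlu["dev"][0])  # fields: question, subject, choices, answer
```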

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
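
For reference, a minimal sketch of how these settings map onto transformers.TrainingArguments; the output directory is a placeholder, and any PEFT/LoRA-specific configuration is not documented by the card.

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_mmlu_1755681415",  # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```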

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.1364        | 0.5000 | 11233  | 0.2498          | 24389728          |
| 0.0552        | 1.0000 | 22466  | 0.2182          | 48789280          |
| 0.2115        | 1.5001 | 33699  | 0.2011          | 73201984          |
| 0.0865        | 2.0001 | 44932  | 0.1919          | 97620120          |
| 0.0963        | 2.5001 | 56165  | 0.1872          | 122127480         |
| 0.1485        | 3.0001 | 67398  | 0.1822          | 146471872         |
| 0.1541        | 3.5002 | 78631  | 0.1787          | 170850208         |
| 0.1574        | 4.0002 | 89864  | 0.1760          | 195267312         |
| 0.0981        | 4.5002 | 101097 | 0.1755          | 219639056         |
| 0.1848        | 5.0002 | 112330 | 0.1736          | 244095744         |
| 0.1731        | 5.5002 | 123563 | 0.1730          | 268478944         |
| 0.1704        | 6.0003 | 134796 | 0.1713          | 292933144         |
| 0.0277        | 6.5003 | 146029 | 0.1724          | 317335480         |
| 0.1044        | 7.0003 | 157262 | 0.1709          | 341742832         |
| 0.1996        | 7.5003 | 168495 | 0.1710          | 366182192         |
| 0.1288        | 8.0004 | 179728 | 0.1710          | 390549264         |
| 0.0802        | 8.5004 | 190961 | 0.1711          | 414924016         |
| 0.0880        | 9.0004 | 202194 | 0.1708          | 439335448         |
| 0.1749        | 9.5004 | 213427 | 0.1710          | 463693944         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
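
A quick way to check an environment against these pins (a minimal sketch; exact version matching may not be required):

```python
# Compare installed versions against those listed above.
import datasets, peft, tokenizers, torch, transformers

expected = {
    peft: "0.15.2",
    transformers: "4.51.3",
    torch: "2.8.0+cu128",
    datasets: "3.6.0",
    tokenizers: "0.21.1",
}
for module, version in expected.items():
    print(f"{module.__name__}: installed {module.__version__}, card lists {version}")
```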
