train_mmlu_1755694502

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mmlu dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6919
  • Num Input Tokens Seen: 431580728

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
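The warmup/cosine schedule implied by the settings above can be sketched in plain Python (a minimal sketch, assuming linear warmup over the first 10% of steps, which is how `lr_scheduler_warmup_ratio: 0.1` combines with the cosine scheduler in HF Transformers; the total of ~449,300 optimizer steps is inferred from the results table below, where epoch 1.0 falls at step 44,930):

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Peak learning rate (5e-05) is reached at the end of warmup, step ~44,930.
print(cosine_lr_with_warmup(44930, 449300))
```

By the schedule's midpoint (around step 247,115 in the table) the learning rate has decayed to half its peak.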

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.6913        | 0.5000 | 22465  | 0.6950          | 21578576          |
| 0.6438        | 1.0000 | 44930  | 0.6946          | 43161200          |
| 0.682         | 1.5000 | 67395  | 0.6930          | 64739536          |
| 0.6658        | 2.0000 | 89860  | 0.6937          | 86326488          |
| 0.6823        | 2.5001 | 112325 | 0.6926          | 107942872         |
| 0.71          | 3.0001 | 134790 | 0.6973          | 129482000         |
| 0.709         | 3.5001 | 157255 | 0.7155          | 151041936         |
| 0.6879        | 4.0001 | 179720 | 0.6937          | 172640384         |
| 0.6909        | 4.5001 | 202185 | 0.6929          | 194176752         |
| 0.6842        | 5.0001 | 224650 | 0.6927          | 215777904         |
| 0.6954        | 5.5001 | 247115 | 0.6916          | 237357552         |
| 0.6841        | 6.0001 | 269580 | 0.6928          | 258952232         |
| 0.7005        | 6.5001 | 292045 | 0.6921          | 280508424         |
| 0.6848        | 7.0002 | 314510 | 0.6916          | 302103728         |
| 0.6768        | 7.5002 | 336975 | 0.6926          | 323714608         |
| 0.7113        | 8.0002 | 359440 | 0.6929          | 345271088         |
| 0.702         | 8.5002 | 381905 | 0.6928          | 366850992         |
| 0.7148        | 9.0002 | 404370 | 0.6916          | 388444744         |
| 0.716         | 9.5002 | 426835 | 0.6924          | 410013448         |
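For scale, the token counters imply roughly 960 input tokens per optimizer step, i.e. about 480 tokens per example at `train_batch_size: 2` (a back-of-the-envelope check, assuming the 431,580,728-token total covers all ten epochs and each half epoch is 22,465 steps, as the table indicates):

```python
tokens_seen = 431_580_728      # Num Input Tokens Seen for the full run
steps_per_half_epoch = 22_465  # step count at epoch 0.5 in the table
total_steps = steps_per_half_epoch * 2 * 10  # 10 epochs -> 449,300 steps

tokens_per_step = tokens_seen / total_steps  # ~960 tokens per step
tokens_per_example = tokens_per_step / 2     # train_batch_size = 2, ~480
print(total_steps, round(tokens_per_step), round(tokens_per_example))
```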

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
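Since the adapter was trained with PEFT 0.15.2, loading it for inference can be sketched as follows (a minimal sketch, assuming the adapter is published on the Hub as rbelanec/train_mmlu_1755694502 and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct base weights):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Loads the base model named in the adapter config, then applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_mmlu_1755694502",  # assumed Hub repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Downloading an 8B base model requires substantial memory; `device_map="auto"` lets Accelerate place weights across available devices.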
Model tree for rbelanec/train_mmlu_1755694502

Adapter
(2124)
this model

Evaluation results