train_multirc_789_1770226022

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1418
  • Num Input Tokens Seen: 264395536
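
Since this is a PEFT adapter rather than a full checkpoint, it has to be loaded on top of the base model. Below is a minimal loading and inference sketch; it assumes the adapter is published under rbelanec/train_multirc_789_1770226022 and that you have access to the gated Meta-Llama-3 base weights.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_789_1770226022"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Illustrative prompt only; MultiRC is a multi-sentence reading-comprehension task.
inputs = tokenizer("Example prompt", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```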

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
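
A sketch of how these settings map onto transformers.TrainingArguments (model, dataset, and PEFT wiring omitted; the Trainer API is an assumption based on the framework versions listed below):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_789_1770226022",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,  # 10% of the 122,600 total steps = 12,260 warmup steps
    num_train_epochs=20,
)
```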

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|--------------:|------:|-------:|----------------:|------------------:|
| 0.2742        | 1.0   | 6130   | 0.1656          | 13229504          |
| 0.2849        | 2.0   | 12260  | 0.1418          | 26459312          |
| 0.0059        | 3.0   | 18390  | 0.1697          | 39686560          |
| 0.0028        | 4.0   | 24520  | 0.1787          | 52924864          |
| 0.0023        | 5.0   | 30650  | 0.2422          | 66146528          |
| 0.0003        | 6.0   | 36780  | 0.2691          | 79364192          |
| 0.0001        | 7.0   | 42910  | 0.2888          | 92568704          |
| 0.0001        | 8.0   | 49040  | 0.3459          | 105788704         |
| 0.0002        | 9.0   | 55170  | 0.2941          | 119004560         |
| 0.1706        | 10.0  | 61300  | 0.3212          | 132223184         |
| 0.0           | 11.0  | 67430  | 0.3899          | 145445936         |
| 0.0733        | 12.0  | 73560  | 0.3885          | 158686320         |
| 0.0           | 13.0  | 79690  | 0.4526          | 171910720         |
| 0.0           | 14.0  | 85820  | 0.4464          | 185139040         |
| 0.0           | 15.0  | 91950  | 0.5905          | 198344384         |
| 0.0           | 16.0  | 98080  | 0.6026          | 211581728         |
| 0.0           | 17.0  | 104210 | 0.6137          | 224787008         |
| 0.0           | 18.0  | 110340 | 0.6311          | 238003072         |
| 0.0           | 19.0  | 116470 | 0.6385          | 251207552         |
| 0.0           | 20.0  | 122600 | 0.6409          | 264395536         |
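
Validation loss bottoms out at 0.1418 at epoch 2 and climbs steadily afterwards, a classic sign of overfitting; the evaluation loss reported above matches that epoch-2 checkpoint rather than the final one. Below is a sketch of keeping the best checkpoint with the Trainer's built-in tracking, extending the configuration above (early stopping is shown as an illustrative option, not something the card confirms was used):

```python
from transformers import EarlyStoppingCallback, TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_789_1770226022",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,        # reload the best (here epoch-2) weights at the end
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Passed to Trainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=3)])
```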

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
