train_multirc_789_1770179268

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2398
  • Num Input Tokens Seen: 264395536

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
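
For reference, these settings map roughly onto the following `transformers.TrainingArguments` configuration. This is a minimal sketch reconstructed only from the hyperparameters listed above; the actual training script is not published with this card, and `output_dir` is a placeholder.

```python
# Hypothetical reconstruction of the training configuration from the
# hyperparameters listed above (Transformers 4.51 API). The real
# training script for this run is not included with the model card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_789_1770179268",  # placeholder path
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,  # 10% of steps as warmup before cosine decay
    num_train_epochs=20,
)
```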

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|--------------:|------:|-------:|----------------:|------------------:|
| 0.2743        | 1.0   | 6130   | 0.1446          | 13229504          |
| 0.3508        | 2.0   | 12260  | 0.1319          | 26459312          |
| 0.0471        | 3.0   | 18390  | 0.1330          | 39686560          |
| 0.0523        | 4.0   | 24520  | 0.1273          | 52924864          |
| 0.0214        | 5.0   | 30650  | 0.1310          | 66146528          |
| 0.0673        | 6.0   | 36780  | 0.1295          | 79364192          |
| 0.0444        | 7.0   | 42910  | 0.1361          | 92568704          |
| 0.1785        | 8.0   | 49040  | 0.1272          | 105788704         |
| 0.1028        | 9.0   | 55170  | 0.1462          | 119004560         |
| 0.1022        | 10.0  | 61300  | 0.1377          | 132223184         |
| 0.128         | 11.0  | 67430  | 0.1476          | 145445936         |
| 0.0337        | 12.0  | 73560  | 0.1638          | 158686320         |
| 0.0081        | 13.0  | 79690  | 0.1614          | 171910720         |
| 0.005         | 14.0  | 85820  | 0.1706          | 185139040         |
| 0.0011        | 15.0  | 91950  | 0.2127          | 198344384         |
| 0.0468        | 16.0  | 98080  | 0.2110          | 211581728         |
| 0.0088        | 17.0  | 104210 | 0.2478          | 224787008         |
| 0.0057        | 18.0  | 110340 | 0.2599          | 238003072         |
| 0.0047        | 19.0  | 116470 | 0.2635          | 251207552         |
| 0.0801        | 20.0  | 122600 | 0.2689          | 264395536         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
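
Since this is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, it can be loaded roughly as shown below. This is a sketch rather than an official usage snippet; it assumes you have access to the gated base model on the Hugging Face Hub.

```python
# Minimal sketch of loading this adapter for inference with PEFT.
# Repo ids are taken from this card; dtype and device placement
# are assumptions, adjust for your hardware.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "rbelanec/train_multirc_789_1770179268")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
```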