train_multirc_789_1770347547

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset, trained as a PEFT adapter (a loading sketch follows the results below). It achieves the following results on the evaluation set:

  • Loss: 0.1373 (the best validation loss, reached at epoch 8; see Training results)
  • Num Input Tokens Seen: 264395536
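
This checkpoint is a PEFT adapter rather than a full model. Below is a minimal loading sketch, assuming the adapter is hosted at rbelanec/train_multirc_789_1770347547 and that you have access to the gated base model; the prompt format used during training is not documented here, so the prompt below is a placeholder:

```python
# Minimal sketch: attach the PEFT adapter to its base model for inference.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rbelanec/train_multirc_789_1770347547"

# AutoPeftModelForCausalLM reads the adapter config, loads the base model it
# references (meta-llama/Meta-Llama-3-8B-Instruct), and attaches the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = "Passage: ...\nQuestion: ...\nAnswer:"  # placeholder, not the documented training format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```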

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
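
For reference, here is a minimal sketch of how these settings map onto transformers.TrainingArguments. The original training script is not included in this card, so the output_dir and the surrounding Trainer setup are assumptions:

```python
# Assumed reconstruction of the hyperparameters above as TrainingArguments;
# this is a sketch, not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_789_1770347547",  # assumed output directory
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```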

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.3515        | 1.0   | 6130   | 0.1758          | 13229504          |
| 0.4644        | 2.0   | 12260  | 0.1555          | 26459312          |
| 0.0673        | 3.0   | 18390  | 0.1501          | 39686560          |
| 0.0434        | 4.0   | 24520  | 0.1467          | 52924864          |
| 0.0446        | 5.0   | 30650  | 0.1434          | 66146528          |
| 0.1041        | 6.0   | 36780  | 0.1441          | 79364192          |
| 0.0612        | 7.0   | 42910  | 0.1468          | 92568704          |
| 0.1859        | 8.0   | 49040  | 0.1373          | 105788704         |
| 0.0601        | 9.0   | 55170  | 0.1433          | 119004560         |
| 0.1029        | 10.0  | 61300  | 0.1435          | 132223184         |
| 0.1188        | 11.0  | 67430  | 0.1397          | 145445936         |
| 0.0692        | 12.0  | 73560  | 0.1439          | 158686320         |
| 0.0733        | 13.0  | 79690  | 0.1437          | 171910720         |
| 0.0066        | 14.0  | 85820  | 0.1449          | 185139040         |
| 0.0955        | 15.0  | 91950  | 0.1477          | 198344384         |
| 0.2482        | 16.0  | 98080  | 0.1450          | 211581728         |
| 0.0487        | 17.0  | 104210 | 0.1476          | 224787008         |
| 0.1336        | 18.0  | 110340 | 0.1471          | 238003072         |
| 0.1346        | 19.0  | 116470 | 0.1484          | 251207552         |
| 0.1536        | 20.0  | 122600 | 0.1481          | 264395536         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
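
When reproducing results, a quick check of the local environment against these versions may help. Exact matching (especially the CUDA-specific PyTorch build) is likely stricter than necessary, so treat this as a sketch:

```python
# Sketch: compare installed library versions against those used for training.
import datasets
import peft
import tokenizers
import torch
import transformers

expected = {
    "peft": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.8.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"mismatch (expected {want})"
    print(f"{name} {have}: {status}")
```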