train_multirc_789_1770132513

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3186
  • Num Input Tokens Seen: 264395536
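
Below is a minimal inference sketch, not the authors' own code. It assumes the adapter is published as rbelanec/train_multirc_789_1770132513 (the id given in this card's model tree) and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct base weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_789_1770132513"  # repo id from this card's model tree

# Load the frozen base model, then attach the fine-tuned MultiRC adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

Note that the prompt template applied to multirc examples during fine-tuning is not documented on this card (see "Training and evaluation data" below), so inputs at inference time should follow whatever format the training script used.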

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
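
As a sketch, the values above map onto Hugging Face TrainingArguments roughly as follows. The PEFT configuration (adapter type, rank, target modules) and dataset preprocessing are not documented on this card, so anything beyond the listed values is an assumption.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the training configuration from the listed values.
# adam_beta1/adam_beta2/adam_epsilon spell out betas=(0.9, 0.999) and epsilon=1e-08,
# which are also the adamw_torch defaults.
training_args = TrainingArguments(
    output_dir="train_multirc_789_1770132513",  # hypothetical output path
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```

The comparatively high learning rate of 0.03 is more typical of prompt-style PEFT methods than of full fine-tuning, though the adapter type is not stated on this card.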

Training results

Training Loss   Epoch   Step     Validation Loss   Input Tokens Seen
0.3101          1.0     6130     0.3239            13229504
0.2580          2.0     12260    0.3212            26459312
0.2858          3.0     18390    0.3202            39686560
0.3227          4.0     24520    0.3193            52924864
0.3292          5.0     30650    0.3198            66146528
0.3253          6.0     36780    0.3195            79364192
0.4473          7.0     42910    0.3259            92568704
0.3240          8.0     49040    0.3208            105788704
0.2602          9.0     55170    0.3190            119004560
0.2671          10.0    61300    0.3194            132223184
0.3743          11.0    67430    0.3214            145445936
0.3852          12.0    73560    0.3194            158686320
0.3533          13.0    79690    0.3192            171910720
0.2744          14.0    85820    0.3186            185139040
0.2871          15.0    91950    0.3200            198344384
0.2853          16.0    98080    0.3240            211581728
0.4111          17.0    104210   0.3244            224787008
0.3044          18.0    110340   0.3247            238003072
0.2840          19.0    116470   0.3254            251207552
0.2732          20.0    122600   0.3258            264395536

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
