train_multirc_42_1762193638

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.2830
  • Num Input Tokens Seen: 264840880
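Since PEFT appears in the framework versions below and the base model is meta-llama/Meta-Llama-3-8B-Instruct, this repo presumably hosts an adapter rather than full model weights. A minimal loading sketch under that assumption (the repo id is taken from the card title; dtype handling is illustrative):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model named in the card, then attach this repo's adapter.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(base, "rbelanec/train_multirc_42_1762193638")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
```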

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
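A minimal sketch of how these values map onto Hugging Face TrainingArguments. This is a reconstruction, not the actual training script; the output_dir and anything not listed in the card (logging, evaluation strategy, the PEFT config itself) are assumptions:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters.
training_args = TrainingArguments(
    output_dir="train_multirc_42_1762193638",  # assumed from the card title
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",         # betas=(0.9, 0.999) and epsilon=1e-08 are the adamw_torch defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```

The unusually high learning rate (0.03) is consistent with soft-prompt-style PEFT methods, which typically tolerate much larger learning rates than full fine-tuning.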

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------|:------|:-------|:----------------|:------------------|
| 0.3298       | 1.0   | 6130   | 0.3221          | 13256608          |
| 0.2759       | 2.0   | 12260  | 0.3199          | 26510112          |
| 0.2767       | 3.0   | 18390  | 0.3203          | 39755376          |
| 0.3458       | 4.0   | 24520  | 0.3165          | 53010912          |
| 0.3172       | 5.0   | 30650  | 0.3173          | 66248576          |
| 0.3261       | 6.0   | 36780  | 0.3177          | 79495984          |
| 0.3408       | 7.0   | 42910  | 0.3164          | 92713360          |
| 0.3469       | 8.0   | 49040  | 0.3167          | 105934480         |
| 0.3889       | 9.0   | 55170  | 0.3194          | 119164864         |
| 0.2609       | 10.0  | 61300  | 0.3174          | 132392640         |
| 0.4061       | 11.0  | 67430  | 0.3280          | 145641920         |
| 0.2662       | 12.0  | 73560  | 0.3165          | 158902432         |
| 0.3582       | 13.0  | 79690  | 0.3145          | 172144032         |
| 0.3373       | 14.0  | 85820  | 0.3180          | 185378480         |
| 0.3475       | 15.0  | 91950  | 0.2939          | 198621168         |
| 0.2775       | 16.0  | 98080  | 0.2842          | 211855376         |
| 0.2925       | 17.0  | 104210 | 0.2841          | 225105296         |
| 0.3079       | 18.0  | 110340 | 0.2830          | 238352272         |
| 0.354        | 19.0  | 116470 | 0.2837          | 251594480         |
| 0.3419       | 20.0  | 122600 | 0.2838          | 264840880         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1