train_multirc_42_1762438236

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1316
  • Num Input Tokens Seen: 264840880
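
Since this checkpoint is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, it can be loaded by attaching the adapter to the base model. A minimal sketch, assuming access to the gated base model and that the adapter repo id is rbelanec/train_multirc_42_1762438236:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the gated base model (requires accepting its license on the Hub).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Attach the fine-tuned adapter weights from this repository.
model = PeftModel.from_pretrained(base, "rbelanec/train_multirc_42_1762438236")
model.eval()
```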

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
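
As a sketch, these values map onto transformers TrainingArguments roughly as follows; the original training script is not shown on this card, so this mapping and the output directory are assumptions:

```python
from transformers import TrainingArguments

# Sketch only: the hyperparameters above expressed as TrainingArguments.
# output_dir is hypothetical; the actual training script is not part of this card.
args = TrainingArguments(
    output_dir="train_multirc_42_1762438236",
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```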

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1685        | 1.0   | 6130   | 0.1692          | 13256608          |
| 0.0664        | 2.0   | 12260  | 0.1500          | 26510112          |
| 0.0817        | 3.0   | 18390  | 0.1415          | 39755376          |
| 0.2479        | 4.0   | 24520  | 0.1383          | 53010912          |
| 0.1819        | 5.0   | 30650  | 0.1354          | 66248576          |
| 0.1082        | 6.0   | 36780  | 0.1367          | 79495984          |
| 0.1441        | 7.0   | 42910  | 0.1316          | 92713360          |
| 0.0245        | 8.0   | 49040  | 0.1318          | 105934480         |
| 0.2177        | 9.0   | 55170  | 0.1329          | 119164864         |
| 0.0982        | 10.0  | 61300  | 0.1324          | 132392640         |
| 0.3213        | 11.0  | 67430  | 0.1341          | 145641920         |
| 0.1288        | 12.0  | 73560  | 0.1346          | 158902432         |
| 0.1889        | 13.0  | 79690  | 0.1392          | 172144032         |
| 0.3426        | 14.0  | 85820  | 0.1362          | 185378480         |
| 0.2370        | 15.0  | 91950  | 0.1357          | 198621168         |
| 0.1404        | 16.0  | 98080  | 0.1367          | 211855376         |
| 0.1051        | 17.0  | 104210 | 0.1377          | 225105296         |
| 0.2527        | 18.0  | 110340 | 0.1379          | 238352272         |
| 0.0182        | 19.0  | 116470 | 0.1388          | 251594480         |
| 0.0545        | 20.0  | 122600 | 0.1379          | 264840880         |
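
The evaluation loss reported at the top of this card (0.1316) matches the epoch-7 checkpoint, which has the lowest validation loss of the run, while the input-token count (264840880) matches the end of epoch 20; this is consistent with the lowest-loss checkpoint being kept after the full 20-epoch run.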

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1