train_multirc_456_1767559857

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.1269
  • Num Input Tokens Seen: 264580656
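
Because the released artifact is a PEFT adapter rather than full model weights (see Framework versions below), it must be loaded on top of the base model. A minimal loading sketch, assuming the adapter is published under the hub id rbelanec/train_multirc_456_1767559857 and that you have access to the gated meta-llama base checkpoint; the dtype and device settings are illustrative, not taken from this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_456_1767559857"  # assumed hub id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # illustrative; the card does not state an inference dtype
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```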

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an approximate TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
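
For reference, these settings map onto Hugging Face TrainingArguments roughly as below. This is an approximate reconstruction, not the original training script; anything not listed above (gradient accumulation, precision, logging, the PEFT config itself) is unknown:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_456_1767559857",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```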

Training results

Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen
------------- | ----- | ------ | --------------- | -----------------
0.225         | 1.0   | 6130   | 0.1736          | 13210560
0.1187        | 2.0   | 12260  | 0.1532          | 26427632
0.085         | 3.0   | 18390  | 0.1430          | 39656608
0.048         | 4.0   | 24520  | 0.1354          | 52911264
0.2157        | 5.0   | 30650  | 0.1320          | 66151456
0.1981        | 6.0   | 36780  | 0.1298          | 79368416
0.0619        | 7.0   | 42910  | 0.1307          | 92601264
0.0476        | 8.0   | 49040  | 0.1302          | 105825424
0.0639        | 9.0   | 55170  | 0.1283          | 119051808
0.0558        | 10.0  | 61300  | 0.1269          | 132282512
0.1441        | 11.0  | 67430  | 0.1269          | 145503424
0.0301        | 12.0  | 73560  | 0.1301          | 158718688
0.3464        | 13.0  | 79690  | 0.1273          | 171968272
0.0198        | 14.0  | 85820  | 0.1273          | 185192912
0.0466        | 15.0  | 91950  | 0.1328          | 198406496
0.1896        | 16.0  | 98080  | 0.1315          | 211654512
0.0075        | 17.0  | 104210 | 0.1332          | 224875552
0.2091        | 18.0  | 110340 | 0.1315          | 238097088
0.132         | 19.0  | 116470 | 0.1323          | 251336240
0.0651        | 20.0  | 122600 | 0.1315          | 264580656

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
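
To reproduce this environment, the pinned versions above can be installed with pip. A sketch, assuming a CUDA 12.8 setup for the +cu128 PyTorch build:

```bash
pip install "peft==0.15.2" "transformers==4.51.3" "datasets==3.6.0" "tokenizers==0.21.1"
# The +cu128 build of PyTorch 2.8.0 comes from the PyTorch wheel index:
pip install "torch==2.8.0" --index-url https://download.pytorch.org/whl/cu128
```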