train_winogrande_789_1760637959

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the winogrande dataset. It achieves the following results on the evaluation set:

  • Loss: 6.6388
  • Num Input Tokens Seen: 38393344

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding `TrainingArguments` follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
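
For reference, here is a minimal sketch of how these values map onto `TrainingArguments` from transformers. The actual training script is not included in this card, so the output directory and anything not in the list above are assumptions. With a warmup ratio of 0.1 over the 181800 total steps shown in the results table below, warmup would cover roughly the first 18180 steps.

```python
# Minimal sketch: mapping the listed hyperparameters onto transformers'
# TrainingArguments. The real training script is not published; output_dir
# and any argument not listed above are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_winogrande_789_1760637959",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",          # AdamW, torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,             # ~18180 of 181800 total steps
    num_train_epochs=20,
)
```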

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 7.0523        | 1.0   | 9090   | 7.1398          | 1919360           |
| 6.6441        | 2.0   | 18180  | 6.7344          | 3838064           |
| 6.5258        | 3.0   | 27270  | 6.6521          | 5755984           |
| 6.6584        | 4.0   | 36360  | 6.6504          | 7675760           |
| 6.5786        | 5.0   | 45450  | 6.6429          | 9596528           |
| 6.7721        | 6.0   | 54540  | 6.6517          | 11515248          |
| 6.3773        | 7.0   | 63630  | 6.6517          | 13435888          |
| 6.435         | 8.0   | 72720  | 6.6445          | 15356016          |
| 6.6438        | 9.0   | 81810  | 6.6479          | 17274448          |
| 6.5259        | 10.0  | 90900  | 6.6560          | 19194672          |
| 6.6663        | 11.0  | 99990  | 6.6626          | 21115984          |
| 6.68          | 12.0  | 109080 | 6.6448          | 23036144          |
| 6.5956        | 13.0  | 118170 | 6.6388          | 24955120          |
| 7.0557        | 14.0  | 127260 | 6.6388          | 26874400          |
| 6.8413        | 15.0  | 136350 | 6.6388          | 28793728          |
| 6.6219        | 16.0  | 145440 | 6.6388          | 30713760          |
| 6.633         | 17.0  | 154530 | 6.6388          | 32634016          |
| 6.7333        | 18.0  | 163620 | 6.6388          | 34554208          |
| 6.5471        | 19.0  | 172710 | 6.6388          | 36474880          |
| 6.6363        | 20.0  | 181800 | 6.6388          | 38393344          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
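
Since PEFT appears among the framework versions, this repository presumably holds a PEFT adapter rather than full model weights. A minimal loading sketch under that assumption (the base model is gated on the Hub and requires access approval):

```python
# Minimal sketch, assuming this repo is a PEFT adapter for the
# meta-llama/Meta-Llama-3-8B-Instruct base model named above.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_winogrande_789_1760637959"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)  # attach adapter weights
model.eval()
```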