train_winogrande_456_1760637846

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the winogrande dataset. It achieves the following results on the evaluation set (a loading sketch follows these results):

  • Loss: 0.0648
  • Num Input Tokens Seen: 38395408
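Since PEFT appears in the framework versions below, the checkpoint is an adapter that must be attached to the base model at load time. The following is a minimal loading and inference sketch, assuming the adapter repo id rbelanec/train_winogrande_456_1760637846 (taken from this card); the prompt format used during training is not documented, so the example prompt is a guess.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_winogrande_456_1760637846"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Illustrative prompt only; the card does not document the training format.
prompt = (
    "The trophy doesn't fit in the suitcase because it is too small. "
    "What is too small, the trophy or the suitcase?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```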

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
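The card does not state which winogrande config or prompt format was used. As a starting point, here is a minimal sketch of loading the dataset from the Hugging Face hub, assuming the standard allenai/winogrande repo and its winogrande_xl config (an assumption, not confirmed by this card):

```python
from datasets import load_dataset

# Illustrative only: the exact config used for this fine-tune is unspecified.
ds = load_dataset("allenai/winogrande", "winogrande_xl")
example = ds["train"][0]
# Each example pairs a sentence containing a "_" blank with two candidate
# fillers and a gold answer ("1" or "2").
print(example["sentence"], example["option1"], example["option2"], example["answer"])
```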

Training procedure

Training hyperparameters

The following hyperparameters were used during training (mirrored in the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
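These settings map naturally onto transformers.TrainingArguments. The sketch below is illustrative only: the card does not state which training script was used, and the output_dir name is a placeholder.

```python
from transformers import TrainingArguments

# Hyperparameters from this card; betas=(0.9, 0.999) and eps=1e-8 are the
# adamw_torch defaults, so they need no explicit arguments here.
args = TrainingArguments(
    output_dir="train_winogrande_456_1760637846",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```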

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1968        | 1.0   | 9090   | 0.1597          | 1919808           |
| 0.0788        | 2.0   | 18180  | 0.0975          | 3839104           |
| 0.0772        | 3.0   | 27270  | 0.0804          | 5758016           |
| 0.0358        | 4.0   | 36360  | 0.0748          | 7678560           |
| 0.0224        | 5.0   | 45450  | 0.0699          | 9598912           |
| 0.1415        | 6.0   | 54540  | 0.0685          | 11518656          |
| 0.0745        | 7.0   | 63630  | 0.0664          | 13438320          |
| 0.0214        | 8.0   | 72720  | 0.0658          | 15358064          |
| 0.1006        | 9.0   | 81810  | 0.0648          | 17278064          |
| 0.2738        | 10.0  | 90900  | 0.0699          | 19196144          |
| 0.0162        | 11.0  | 99990  | 0.0696          | 21117200          |
| 0.0013        | 12.0  | 109080 | 0.0718          | 23037584          |
| 0.0559        | 13.0  | 118170 | 0.0793          | 24956720          |
| 0.0916        | 14.0  | 127260 | 0.0801          | 26875344          |
| 0.0007        | 15.0  | 136350 | 0.0828          | 28793344          |
| 0.0226        | 16.0  | 145440 | 0.0860          | 30713568          |
| 0.0427        | 17.0  | 154530 | 0.0895          | 32635088          |
| 0.0093        | 18.0  | 163620 | 0.0888          | 34555376          |
| 0.1062        | 19.0  | 172710 | 0.0889          | 36474544          |
| 0.0049        | 20.0  | 181800 | 0.0879          | 38395408          |

The evaluation loss reported at the top of this card (0.0648) matches the epoch-9 checkpoint, the best validation loss of the run, which suggests the best checkpoint was restored at the end of training; the token count (38395408) reflects all 20 epochs.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4