train_winogrande_123_1760637728

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the winogrande dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

  • Loss: 0.2312
  • Num Input Tokens Seen: 38394016
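
Since the card lists PEFT under Framework versions, this checkpoint is a PEFT adapter rather than a full model and is loaded on top of the base model. Below is a minimal inference sketch, assuming the adapter is published as rbelanec/train_winogrande_123_1760637728; the prompt is an illustrative Winogrande-style example, not drawn from the dataset itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_winogrande_123_1760637728"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the fine-tuned adapter weights to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative Winogrande-style pronoun-resolution prompt.
prompt = (
    "The trophy doesn't fit in the brown suitcase because it is too large.\n"
    "What does 'it' refer to? Answer with 'trophy' or 'suitcase'."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```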

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
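
For reference, here is one way the listed values map onto transformers TrainingArguments. The original training script is not included in this card, so this mapping is an assumption, not the authors' actual configuration.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the training configuration from the
# hyperparameters listed above; the real script is not part of this card.
training_args = TrainingArguments(
    output_dir="train_winogrande_123_1760637728",
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```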

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|---------------|-------|--------|-----------------|-------------------|
| 0.2304        | 1.0   | 9090   | 0.2313          | 1918144           |
| 0.2276        | 2.0   | 18180  | 0.2321          | 3838192           |
| 0.2308        | 3.0   | 27270  | 0.2314          | 5757648           |
| 0.2309        | 4.0   | 36360  | 0.2313          | 7676976           |
| 0.2279        | 5.0   | 45450  | 0.2316          | 9596496           |
| 0.2284        | 6.0   | 54540  | 0.2313          | 11516256          |
| 0.2319        | 7.0   | 63630  | 0.2313          | 13435600          |
| 0.2319        | 8.0   | 72720  | 0.2314          | 15356752          |
| 0.2329        | 9.0   | 81810  | 0.2315          | 17276752          |
| 0.2313        | 10.0  | 90900  | 0.2313          | 19196064          |
| 0.2297        | 11.0  | 99990  | 0.2312          | 21115472          |
| 0.2297        | 12.0  | 109080 | 0.2313          | 23035440          |
| 0.2308        | 13.0  | 118170 | 0.2314          | 24955600          |
| 0.2308        | 14.0  | 127260 | 0.2313          | 26875344          |
| 0.2308        | 15.0  | 136350 | 0.2313          | 28795600          |
| 0.2303        | 16.0  | 145440 | 0.2313          | 30715008          |
| 0.2329        | 17.0  | 154530 | 0.2313          | 32634912          |
| 0.2314        | 18.0  | 163620 | 0.2313          | 34554080          |
| 0.2308        | 19.0  | 172710 | 0.2314          | 36472448          |
| 0.2314        | 20.0  | 181800 | 0.2313          | 38394016          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4