train_winogrande_1755694506

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the winogrande dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2307
  • Num Input Tokens Seen: 30120720

Model description

More information needed

Intended uses & limitations

More information needed
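Until the card is filled in, a minimal loading sketch may help. It assumes the adapter is published under the repo id rbelanec/train_winogrande_1755694506 (as listed in the model tree) and that `transformers` and `peft` are installed; this is an illustration, not an officially documented usage recipe.

```python
# Repo ids taken from this model card; treat as assumptions if the
# repos move or are renamed.
BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "rbelanec/train_winogrande_1755694506"


def load_model():
    """Load the base model and attach the PEFT adapter on top of it.

    Imports are kept inside the function so that merely importing this
    module does not download ~16 GB of weights.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto")
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(model, ADAPTER)
    return tokenizer, model
```

Call `load_model()` only on a machine with enough memory for the 8B base model; the adapter itself is small.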

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
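The cosine schedule with a 0.1 warmup ratio can be sketched in plain Python. The total step count of 181800 is inferred from the training log (9090 steps per half epoch × 10 epochs), so treat it as an assumption rather than a logged value:

```python
import math


def cosine_lr(step, total_steps=181800, base_lr=5e-05, warmup_ratio=0.1):
    """Cosine learning-rate schedule with linear warmup.

    Mirrors the usual transformers-style schedule: LR ramps linearly
    from 0 to base_lr over the warmup steps, then decays to 0 along
    a half cosine.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, the LR is 0 at step 0, peaks at 5e-05 when warmup ends (step 18180), and decays back toward 0 by step 181800.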

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|---------------|--------|--------|-----------------|-------------------|
| 0.2294        | 0.5000 | 9090   | 0.2313          | 1506080           |
| 0.2336        | 1.0001 | 18180  | 0.2312          | 3011568           |
| 0.2294        | 1.5001 | 27270  | 0.2312          | 4517568           |
| 0.2525        | 2.0001 | 36360  | 0.2334          | 6023712           |
| 0.2341        | 2.5001 | 45450  | 0.2320          | 7529008           |
| 0.2378        | 3.0002 | 54540  | 0.2319          | 9035904           |
| 0.2386        | 3.5002 | 63630  | 0.2334          | 10541968          |
| 0.2444        | 4.0002 | 72720  | 0.2311          | 12047824          |
| 0.2238        | 4.5002 | 81810  | 0.2324          | 13553584          |
| 0.2212        | 5.0003 | 90900  | 0.2319          | 15059504          |
| 0.2224        | 5.5003 | 99990  | 0.2314          | 16564784          |
| 0.2274        | 6.0003 | 109080 | 0.2313          | 18071824          |
| 0.2381        | 6.5004 | 118170 | 0.2310          | 19578608          |
| 0.2357        | 7.0004 | 127260 | 0.2316          | 21084064          |
| 0.2271        | 7.5004 | 136350 | 0.2311          | 22590736          |
| 0.2437        | 8.0004 | 145440 | 0.2315          | 24096816          |
| 0.2355        | 8.5005 | 154530 | 0.2310          | 25603904          |
| 0.2264        | 9.0005 | 163620 | 0.2309          | 27109968          |
| 0.2341        | 9.5005 | 172710 | 0.2303          | 28615376          |
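As a quick sanity check on the log above, the "Input Tokens Seen" column grows at a near-constant rate per optimizer step, which is what you would expect from a fixed batch size of 2:

```python
# First and last logged rows from the training-results table:
# (step, input_tokens_seen).
rows = [
    (9090, 1506080),
    (172710, 28615376),
]

# Average tokens consumed per optimizer step between the two rows.
steps = rows[-1][0] - rows[0][0]
tokens = rows[-1][1] - rows[0][1]
tokens_per_step = tokens / steps
print(f"~{tokens_per_step:.1f} tokens per step")
```

This works out to roughly 166 tokens per step, i.e. about 83 tokens per example at batch size 2.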

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
Model tree for rbelanec/train_winogrande_1755694506

  • Adapter of meta-llama/Meta-Llama-3-8B-Instruct