train_wsc_101112_1760361439

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3450
  • Num Input Tokens Seen: 1471184
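
Since this is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, it can be attached to the base model at inference time. Below is a minimal loading sketch, assuming the adapter is published on the Hugging Face Hub as rbelanec/train_wsc_101112_1760361439 and that you have access to the gated base model; the WSC-style prompt is illustrative only, since the exact prompt format used during training is not documented in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_101112_1760361439"  # assumption: Hub repo id of this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Illustrative WSC-style coreference question (hypothetical prompt format).
messages = [{"role": "user", "content": (
    "The trophy doesn't fit into the brown suitcase because it is too large. "
    "Does 'it' refer to the trophy? Answer yes or no."
)}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=8)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```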

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
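
These settings map directly onto transformers' TrainingArguments. A minimal sketch of the equivalent configuration follows; the actual training script is not included in this card, so treat field values outside the list above (such as output_dir) as assumptions.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the run configuration from the list above;
# output_dir is an assumption, and the PEFT-specific setup is omitted.
args = TrainingArguments(
    output_dir="train_wsc_101112_1760361439",  # assumption
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```

A learning rate of 0.03 would be far too high for full fine-tuning, but is consistent with a PEFT run in which only a small number of adapter parameters is updated.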

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 1.0817        | 1.504  | 188  | 0.8057          | 74288             |
| 0.3497        | 3.008  | 376  | 0.3515          | 147040            |
| 0.3625        | 4.512  | 564  | 0.3450          | 221408            |
| 0.343         | 6.016  | 752  | 0.3483          | 294736            |
| 0.3678        | 7.52   | 940  | 0.3569          | 368400            |
| 0.3504        | 9.024  | 1128 | 0.3474          | 441968            |
| 0.3431        | 10.528 | 1316 | 0.3493          | 514960            |
| 0.3332        | 12.032 | 1504 | 0.3504          | 588032            |
| 0.3361        | 13.536 | 1692 | 0.3494          | 662784            |
| 0.3525        | 15.04  | 1880 | 0.3474          | 735760            |
| 0.3521        | 16.544 | 2068 | 0.3508          | 809088            |
| 0.3428        | 18.048 | 2256 | 0.3512          | 883568            |
| 0.3558        | 19.552 | 2444 | 0.3522          | 958720            |
| 0.3381        | 21.056 | 2632 | 0.3522          | 1031776           |
| 0.3469        | 22.56  | 2820 | 0.3544          | 1105632           |
| 0.352         | 24.064 | 3008 | 0.3535          | 1179856           |
| 0.3525        | 25.568 | 3196 | 0.3525          | 1253280           |
| 0.3581        | 27.072 | 3384 | 0.3506          | 1327824           |
| 0.3318        | 28.576 | 3572 | 0.3541          | 1400944           |
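
The reported evaluation loss of 0.3450 corresponds to the checkpoint at step 564 (epoch 4.512), which suggests the best checkpoint by validation loss was the one retained.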

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
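
To reproduce the environment, the installed versions can be checked against the list above. This is a convenience sketch, not part of the original training setup.

```python
# Compare installed library versions against those reported in this card.
import peft, transformers, torch, datasets, tokenizers

expected = {
    "peft": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.8.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}
for name, mod in [("peft", peft), ("transformers", transformers),
                  ("torch", torch), ("datasets", datasets),
                  ("tokenizers", tokenizers)]:
    print(f"{name}: installed {mod.__version__}, card used {expected[name]}")
```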