train_wsc_101112_1760373109

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0793
  • Num Input Tokens Seen: 1471184

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
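
For reference, a hedged reconstruction of this configuration as Transformers TrainingArguments; the training script, dataset wiring, and PEFT setup are not documented on this card, so only the values listed above are filled in:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters only. output_dir is a placeholder, and
# anything not stated on this card (PEFT config, data collator, etc.) is omitted.
training_args = TrainingArguments(
    output_dir="train_wsc_101112_1760373109",  # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```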

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
0.3521          1.504     188   0.4224              74288
0.3545          3.008     376   0.3610             147040
0.3617          4.512     564   0.3474             221408
0.3523          6.016     752   0.3498             294736
0.3773          7.520     940   0.3696             368400
0.3547          9.024    1128   0.3556             441968
0.3446         10.528    1316   0.3625             514960
0.3445         12.032    1504   0.3630             588032
0.3342         13.536    1692   0.3680             662784
0.3396         15.040    1880   0.3744             735760
0.3547         16.544    2068   0.3644             809088
0.3341         18.048    2256   0.3969             883568
0.3488         19.552    2444   0.4179             958720
0.2865         21.056    2632   0.4510            1031776
0.2557         22.560    2820   0.5495            1105632
0.2666         24.064    3008   0.7044            1179856
0.2655         25.568    3196   0.8358            1253280
0.1862         27.072    3384   0.9530            1327824
0.2195         28.576    3572   1.0787            1400944

Validation loss reaches its minimum (0.3474) at step 564 and rises steadily thereafter while training loss continues to fall, suggesting overfitting past roughly epoch 5; earlier checkpoints likely generalize better than the final one.

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1