train_wsc_789_1760445502

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset (see the loading sketch after the results list). It achieves the following results on the evaluation set:

  • Loss: 0.4253
  • Num Input Tokens Seen: 1462816
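
Since this repository ships a PEFT adapter rather than full model weights, something like the following sketch can be used to load it. It assumes the adapter id rbelanec/train_wsc_789_1760445502, access to the gated meta-llama/Meta-Llama-3-8B-Instruct base weights, and an installed accelerate for device_map="auto"; the prompt format used during fine-tuning is not documented here, so the example prompt is purely illustrative.

```python
# Minimal loading sketch (assumptions: adapter id, gated base-model access,
# accelerate installed for device_map="auto").
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rbelanec/train_wsc_789_1760445502"

# AutoPeftModelForCausalLM reads the adapter config, loads the referenced
# base model, and applies the adapter weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative Winograd-style prompt; the training prompt template is unknown.
prompt = "The trophy doesn't fit into the brown suitcase because it is too large. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```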

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
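
A hedged sketch of how these values map onto transformers.TrainingArguments; the actual training script, dataset preprocessing, and the PEFT/LoRA configuration are not documented in this card, and output_dir is a placeholder.

```python
# Configuration sketch mirroring the hyperparameters above (not the original
# training script; output_dir is a placeholder).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_789_1760445502",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```

Given that validation loss in the table below bottoms out around epochs 3 to 4.5 and rises afterwards, enabling checkpoint selection (e.g. load_best_model_at_end with matching eval/save strategies) may be worth considering when reproducing the run.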

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 0.3839        | 1.504  | 188  | 0.3746          | 73440             |
| 0.4057        | 3.008  | 376  | 0.3742          | 147296            |
| 0.3465        | 4.512  | 564  | 0.3450          | 221744            |
| 0.3561        | 6.016  | 752  | 0.3477          | 293760            |
| 0.3112        | 7.52   | 940  | 0.3595          | 367808            |
| 0.3527        | 9.024  | 1128 | 0.3530          | 440288            |
| 0.3569        | 10.528 | 1316 | 0.3493          | 512864            |
| 0.3500        | 12.032 | 1504 | 0.3531          | 587120            |
| 0.3494        | 13.536 | 1692 | 0.3555          | 660352            |
| 0.3545        | 15.04  | 1880 | 0.3597          | 733072            |
| 0.3291        | 16.544 | 2068 | 0.3663          | 805824            |
| 0.3590        | 18.048 | 2256 | 0.3645          | 880160            |
| 0.3391        | 19.552 | 2444 | 0.3615          | 955488            |
| 0.3396        | 21.056 | 2632 | 0.3751          | 1028544           |
| 0.3387        | 22.56  | 2820 | 0.3819          | 1101536           |
| 0.3643        | 24.064 | 3008 | 0.3972          | 1174032           |
| 0.3289        | 25.568 | 3196 | 0.4119          | 1247296           |
| 0.3515        | 27.072 | 3384 | 0.4189          | 1320640           |
| 0.3240        | 28.576 | 3572 | 0.4278          | 1394256           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1