train_wsc_456_1768397601

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3353
  • Num Input Tokens Seen: 434784

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
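
For reference, here is a minimal sketch of how these hyperparameters map onto a transformers TrainingArguments object. The output_dir is a placeholder, and any other arguments used in the actual run are unknown:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_456_1768397601",  # placeholder path, not the actual run directory
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```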

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.5806        | 0.5020 | 125  | 0.3903          | 21856             |
| 0.3905        | 1.0040 | 250  | 0.3492          | 43616             |
| 0.3177        | 1.5060 | 375  | 0.3473          | 65728             |
| 0.3597        | 2.0080 | 500  | 0.3353          | 87264             |
| 0.3412        | 2.5100 | 625  | 0.3670          | 108800            |
| 0.3437        | 3.0120 | 750  | 0.3367          | 130752            |
| 0.3563        | 3.5141 | 875  | 0.3360          | 153248            |
| 0.3521        | 4.0161 | 1000 | 0.3649          | 174560            |
| 0.3514        | 4.5181 | 1125 | 0.3627          | 196896            |
| 0.3449        | 5.0201 | 1250 | 0.3473          | 218384            |
| 0.3636        | 5.5221 | 1375 | 0.3436          | 239584            |
| 0.3541        | 6.0241 | 1500 | 0.3560          | 261760            |
| 0.3572        | 6.5261 | 1625 | 0.3578          | 283744            |
| 0.4247        | 7.0281 | 1750 | 0.3530          | 305552            |
| 0.3511        | 7.5301 | 1875 | 0.3504          | 326736            |
| 0.3561        | 8.0321 | 2000 | 0.3551          | 349328            |
| 0.3485        | 8.5341 | 2125 | 0.3589          | 371040            |
| 0.3365        | 9.0361 | 2250 | 0.3598          | 392704            |
| 0.3674        | 9.5382 | 2375 | 0.3632          | 414608            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
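
Since this checkpoint is a PEFT adapter rather than a full model, it has to be loaded on top of the base model. A minimal loading sketch, assuming the adapter is hosted under the repo id rbelanec/train_wsc_456_1768397601 shown for this page and that you have access to the gated base model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer, then attach the adapter weights.
# Adjust device placement and dtype to your hardware.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = PeftModel.from_pretrained(base, "rbelanec/train_wsc_456_1768397601")
```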