train_wsc_101112_1760637996

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset (the Winograd Schema Challenge). It achieves the following results on the evaluation set:

  • Loss: 0.3487
  • Num Input Tokens Seen: 980224
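
The adapter can be loaded on top of the base model with PEFT. Below is a minimal sketch, assuming access to the gated meta-llama checkpoint and using the repository id from the model tree at the end of this card; the prompt is an illustrative Winograd-style example, not drawn from the evaluation set:

```python
# Sketch: load the PEFT adapter on top of Meta-Llama-3-8B-Instruct.
# Assumes the adapter repository id rbelanec/train_wsc_101112_1760637996
# (see the model tree below) and access to the gated base checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_101112_1760637996"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Illustrative Winograd-style coreference prompt.
prompt = "The trophy doesn't fit in the suitcase because it is too big. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```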

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
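
The exact training script is not included in this card; the following is a hedged sketch of how these values map onto a Transformers TrainingArguments object:

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
# The original training script is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_101112_1760637996",
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```

With 125 optimizer steps per epoch over 20 epochs (2500 steps total, matching the table below), a warmup ratio of 0.1 corresponds to roughly 250 warmup steps before the cosine decay begins.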

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.3582 | 1.0 | 125 | 0.3539 | 48944 |
| 0.3572 | 2.0 | 250 | 0.3494 | 98080 |
| 0.3497 | 3.0 | 375 | 0.3487 | 146624 |
| 0.5069 | 4.0 | 500 | 0.3682 | 196192 |
| 0.3625 | 5.0 | 625 | 0.3678 | 245216 |
| 0.4367 | 6.0 | 750 | 0.3667 | 294128 |
| 0.3587 | 7.0 | 875 | 0.3624 | 342416 |
| 0.3383 | 8.0 | 1000 | 0.3683 | 391552 |
| 0.3744 | 9.0 | 1125 | 0.3944 | 440848 |
| 0.3203 | 10.0 | 1250 | 0.6999 | 488816 |
| 0.3141 | 11.0 | 1375 | 0.6510 | 537840 |
| 0.3207 | 12.0 | 1500 | 1.2558 | 586624 |
| 0.4775 | 13.0 | 1625 | 1.5535 | 635776 |
| 0.1342 | 14.0 | 1750 | 1.7732 | 684416 |
| 0.2525 | 15.0 | 1875 | 2.2903 | 733488 |
| 0.1895 | 16.0 | 2000 | 2.8244 | 782688 |
| 0.1377 | 17.0 | 2125 | 3.0176 | 831792 |
| 0.0095 | 18.0 | 2250 | 3.2261 | 881328 |
| 0.1878 | 19.0 | 2375 | 3.2861 | 930656 |
| 0.1642 | 20.0 | 2500 | 3.3436 | 980224 |

Validation loss reaches its minimum of 0.3487 at epoch 3 (step 375) and never recovers, climbing to 3.3436 by epoch 20; the reported evaluation loss therefore corresponds to the epoch-3 checkpoint, and the widening gap between training and validation loss in later epochs indicates overfitting.
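
Given that divergence, anyone re-running this fine-tune may want to stop on validation loss and keep the best checkpoint. A minimal sketch using Transformers' EarlyStoppingCallback follows; this was not part of the original run:

```python
# Sketch: evaluate each epoch, keep the checkpoint with the lowest
# validation loss, and stop early once it stops improving. Not used in
# the original run, whose validation loss kept rising after epoch 3.
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="train_wsc_101112_1760637996",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,      # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    num_train_epochs=20,
)

# Passed to Trainer alongside the model and datasets, e.g.:
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=eval_ds,
#                   callbacks=[EarlyStoppingCallback(early_stopping_patience=3)])
```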

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_wsc_101112_1760637996

This model is a PEFT adapter of meta-llama/Meta-Llama-3-8B-Instruct.