train_wsc_123_1760637653

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3524
  • Num Input Tokens Seen: 977568
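
This repository contains a PEFT adapter rather than full model weights, so inference requires loading the base model first and attaching the adapter. Below is a minimal sketch, assuming the adapter is published under the id rbelanec/train_wsc_123_1760637653 and that you have (gated) access to meta-llama/Meta-Llama-3-8B-Instruct; the prompt is illustrative only, since the card does not document the training prompt template.

```python
# Minimal inference sketch for this PEFT adapter.
# Assumptions: adapter repo id rbelanec/train_wsc_123_1760637653, gated access
# to the base model, and the library versions listed under "Framework versions"
# (peft 0.17.1, transformers 4.51.3).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_123_1760637653"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter weights
model.eval()

# WSC is a pronoun-coreference task; this prompt is a stand-in, not the
# template used during fine-tuning.
prompt = "The trophy doesn't fit in the suitcase because it is too big. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```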

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
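
The training script itself is not included in this card, so the following is only a sketch of Hugging Face TrainingArguments that mirror the hyperparameters above; dataset loading, prompt formatting, and the LoRA configuration are not documented here and are omitted.

```python
# Sketch of TrainingArguments matching the listed hyperparameters.
# load_best_model_at_end is an assumption, motivated by the reported eval loss
# matching the epoch-6 minimum in the results table below.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wsc_123_1760637653",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",          # betas=(0.9, 0.999) and eps=1e-8 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,             # 0.1 * 2500 total steps = 250 warmup steps
    num_train_epochs=20,
    eval_strategy="epoch",        # the table below logs validation loss once per epoch
    save_strategy="epoch",
    logging_strategy="epoch",
    load_best_model_at_end=True,  # assumption, see comment above
)
```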

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|--------------:|------:|-----:|----------------:|------------------:|
| 0.5986        | 1.0   | 125  | 0.4532          | 49376             |
| 0.326         | 2.0   | 250  | 0.3678          | 98240             |
| 0.3663        | 3.0   | 375  | 0.3566          | 147648            |
| 0.3278        | 4.0   | 500  | 0.3679          | 197024            |
| 0.3373        | 5.0   | 625  | 0.3590          | 245472            |
| 0.3462        | 6.0   | 750  | 0.3524          | 293616            |
| 0.3101        | 7.0   | 875  | 0.3766          | 343040            |
| 0.3438        | 8.0   | 1000 | 0.3564          | 392080            |
| 0.3271        | 9.0   | 1125 | 0.3568          | 440848            |
| 0.3371        | 10.0  | 1250 | 0.3545          | 490000            |
| 0.3258        | 11.0  | 1375 | 0.3746          | 538944            |
| 0.3757        | 12.0  | 1500 | 0.4974          | 587536            |
| 0.2312        | 13.0  | 1625 | 0.6398          | 636208            |
| 0.2677        | 14.0  | 1750 | 0.8562          | 685120            |
| 0.2486        | 15.0  | 1875 | 1.2159          | 734352            |
| 0.1011        | 16.0  | 2000 | 1.8319          | 782368            |
| 0.0904        | 17.0  | 2125 | 2.1079          | 831888            |
| 0.0639        | 18.0  | 2250 | 2.3978          | 880112            |
| 0.0751        | 19.0  | 2375 | 2.5049          | 928992            |
| 0.0183        | 20.0  | 2500 | 2.5212          | 977568            |

The reported evaluation loss of 0.3524 corresponds to the epoch-6 checkpoint, where validation loss is lowest; from epoch 12 onward validation loss climbs steadily while training loss keeps falling, a typical overfitting pattern.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4