train_wsc_123_1760637652

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 6.9383
  • Num Input Tokens Seen: 977568

Model description

More information needed

Intended uses & limitations

More information needed
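
The card does not document intended usage, but the repository is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, so a minimal loading sketch might look like the following. This is an assumption-laden example, not the authors' documented workflow: the adapter id is taken from this card, and the WSC-style prompt is hypothetical since the training prompt format is not documented.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_123_1760637652"  # repo id taken from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# Hypothetical WSC-style prompt; the actual training prompt format is undocumented.
prompt = (
    "The trophy doesn't fit in the suitcase because it is too small. "
    "Does 'it' refer to the suitcase? Answer yes or no."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Alternatively, peft.AutoPeftModelForCausalLM.from_pretrained(adapter_id) loads the base model and adapter in a single call.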

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto transformers.TrainingArguments follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
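
For reference, here is a minimal sketch of these settings expressed as transformers.TrainingArguments. The output_dir is a placeholder, and the original run may have set additional options not listed on this card.

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters; anything not on the card is left at defaults.
args = TrainingArguments(
    output_dir="train_wsc_123_1760637652",  # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,    # lr_scheduler_warmup_ratio above
    num_train_epochs=20,
)
```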

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.3549        | 1.0   | 125  | 0.4709          | 49376             |
| 0.343         | 2.0   | 250  | 0.3660          | 98240             |
| 0.3706        | 3.0   | 375  | 0.3483          | 147648            |
| 0.3322        | 4.0   | 500  | 0.3553          | 197024            |
| 0.3417        | 5.0   | 625  | 0.3575          | 245472            |
| 0.3543        | 6.0   | 750  | 0.3492          | 293616            |
| 0.3127        | 7.0   | 875  | 0.3661          | 343040            |
| 0.346         | 8.0   | 1000 | 0.3488          | 392080            |
| 0.326         | 9.0   | 1125 | 0.3519          | 440848            |
| 0.3382        | 10.0  | 1250 | 0.3510          | 490000            |
| 0.3372        | 11.0  | 1375 | 0.3552          | 538944            |
| 0.3673        | 12.0  | 1500 | 0.3610          | 587536            |
| 0.3505        | 13.0  | 1625 | 0.3524          | 636208            |
| 0.3504        | 14.0  | 1750 | 0.3494          | 685120            |
| 0.34          | 15.0  | 1875 | 0.3535          | 734352            |
| 0.3549        | 16.0  | 2000 | 0.3510          | 782368            |
| 0.3345        | 17.0  | 2125 | 0.3498          | 831888            |
| 0.3504        | 18.0  | 2250 | 0.3540          | 880112            |
| 0.3479        | 19.0  | 2375 | 0.3544          | 928992            |
| 0.3351        | 20.0  | 2500 | 0.3543          | 977568            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
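
A quick way to confirm a matching environment (a convenience snippet, not part of the original card):

```python
import datasets, peft, tokenizers, torch, transformers

# Card reports: peft 0.17.1, transformers 4.51.3, torch 2.9.0+cu128,
# datasets 4.0.0, tokenizers 0.21.4.
for mod in (peft, transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__} {mod.__version__}")
```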