train_wsc_789_1760637880

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3484
  • Num Input Tokens Seen: 976592
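A minimal sketch for loading this adapter on top of the base model, assuming it was trained with PEFT (the framework versions below list PEFT 0.17.1) and that you have access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` weights:

```python
BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_789_1760637880"

def load_model():
    """Attach the PEFT adapter to the base model and return (tokenizer, model)."""
    # Imports are local so the sketch can be read without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_ID, torch_dtype="auto", device_map="auto"
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model
```

Downloading the base model requires accepting its license on the Hub and authenticating (e.g. via `huggingface-cli login`).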

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
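The learning-rate schedule these settings imply (a linear warmup over the first 10% of steps, then cosine decay; 2500 total optimizer steps per the results table) can be sketched as follows. This is a hand-written approximation of the usual cosine-with-warmup shape, not code from the training run:

```python
import math

def lr_at(step, total_steps=2500, base_lr=0.03, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # 250 steps here
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

print(lr_at(125))   # mid-warmup: half the peak rate
print(lr_at(250))   # warmup ends at the peak rate, 0.03
print(lr_at(2500))  # decayed to ~0 at the final step
```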

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 7.7829        | 1.0   | 125  | 5.3146          | 48896             |
| 0.4495        | 2.0   | 250  | 0.7327          | 97760             |
| 0.354         | 3.0   | 375  | 0.4015          | 146816            |
| 0.3678        | 4.0   | 500  | 0.3694          | 195376            |
| 0.4158        | 5.0   | 625  | 0.4583          | 244368            |
| 0.3997        | 6.0   | 750  | 0.3579          | 293088            |
| 0.3524        | 7.0   | 875  | 0.3502          | 341856            |
| 0.3817        | 8.0   | 1000 | 0.3707          | 390544            |
| 0.3803        | 9.0   | 1125 | 0.3625          | 439264            |
| 0.3775        | 10.0  | 1250 | 0.3508          | 487904            |
| 0.369         | 11.0  | 1375 | 0.3764          | 536960            |
| 0.3614        | 12.0  | 1500 | 0.3484          | 585712            |
| 0.3603        | 13.0  | 1625 | 0.3600          | 634464            |
| 0.3783        | 14.0  | 1750 | 0.3639          | 682800            |
| 0.3633        | 15.0  | 1875 | 0.3538          | 731376            |
| 0.3462        | 16.0  | 2000 | 0.3501          | 779936            |
| 0.3554        | 17.0  | 2125 | 0.3579          | 828880            |
| 0.3616        | 18.0  | 2250 | 0.3541          | 877920            |
| 0.3537        | 19.0  | 2375 | 0.3561          | 927488            |
| 0.3401        | 20.0  | 2500 | 0.3537          | 976592            |
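The headline loss of 0.3484 matches the epoch-12 validation loss, which is the minimum in the table. A quick sanity check over the logged (epoch, validation loss) pairs:

```python
# (epoch, validation loss) pairs copied from the training-results table
history = [
    (1, 5.3146), (2, 0.7327), (3, 0.4015), (4, 0.3694), (5, 0.4583),
    (6, 0.3579), (7, 0.3502), (8, 0.3707), (9, 0.3625), (10, 0.3508),
    (11, 0.3764), (12, 0.3484), (13, 0.3600), (14, 0.3639), (15, 0.3538),
    (16, 0.3501), (17, 0.3579), (18, 0.3541), (19, 0.3561), (20, 0.3537),
]

best_epoch, best_loss = min(history, key=lambda pair: pair[1])
print(best_epoch, best_loss)  # → 12 0.3484
```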

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
Model tree for rbelanec/train_wsc_789_1760637880

  • Adapter of meta-llama/Meta-Llama-3-8B-Instruct