train_wsc_101112_1760637993

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3499
  • Num Input Tokens Seen: 980224
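
How to use

Since the framework versions below list PEFT, this checkpoint is presumably a PEFT adapter trained on top of the base model rather than a full set of weights. The snippet below is a minimal loading sketch, assuming the adapter is published under rbelanec/train_wsc_101112_1760637993 and that transformers, peft, and accelerate are installed; it is illustrative, not the author's verified usage code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_101112_1760637993"  # assumed adapter repo ID

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # requires accelerate
)
# Attach the fine-tuned adapter weights to the frozen base model
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```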

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
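
For reference, here is a sketch of how these hyperparameters map onto transformers TrainingArguments. The actual training script is not included in this card, so details such as output_dir are illustrative assumptions, not the author's configuration.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_101112_1760637993",  # illustrative; actual path unknown
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```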

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.3854        | 1.0   | 125  | 0.3560          | 48944             |
| 0.378         | 2.0   | 250  | 0.3499          | 98080             |
| 0.3542        | 3.0   | 375  | 0.5215          | 146624            |
| 0.4084        | 4.0   | 500  | 0.4326          | 196192            |
| 0.3484        | 5.0   | 625  | 0.3583          | 245216            |
| 0.4506        | 6.0   | 750  | 0.3547          | 294128            |
| 0.3989        | 7.0   | 875  | 0.3950          | 342416            |
| 0.3869        | 8.0   | 1000 | 0.3655          | 391552            |
| 0.3819        | 9.0   | 1125 | 0.3625          | 440848            |
| 0.3661        | 10.0  | 1250 | 0.3595          | 488816            |
| 0.3635        | 11.0  | 1375 | 0.3560          | 537840            |
| 0.3325        | 12.0  | 1500 | 0.3659          | 586624            |
| 0.3585        | 13.0  | 1625 | 0.3602          | 635776            |
| 0.3502        | 14.0  | 1750 | 0.3591          | 684416            |
| 0.3623        | 15.0  | 1875 | 0.3555          | 733488            |
| 0.3458        | 16.0  | 2000 | 0.3504          | 782688            |
| 0.3402        | 17.0  | 2125 | 0.3567          | 831792            |
| 0.3469        | 18.0  | 2250 | 0.3544          | 881328            |
| 0.3362        | 19.0  | 2375 | 0.3563          | 930656            |
| 0.3384        | 20.0  | 2500 | 0.3544          | 980224            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4