train_wsc_42_1760607963

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4025
  • Num Input Tokens Seen: 1308280

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
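
For reference, here is a minimal sketch of how the hyperparameters above map onto Hugging Face TrainingArguments. The output_dir is illustrative, and the original training script may differ in other respects:

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
# output_dir is illustrative; the actual training script is not included here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760607963",  # illustrative
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```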

Training results

| Training Loss | Epoch   | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-------:|:----:|:---------------:|:-----------------:|
| 0.5193        | 1.5045  | 167  | 0.4603          | 65984             |
| 0.3582        | 3.0090  | 334  | 0.3512          | 131096            |
| 0.3353        | 4.5135  | 501  | 0.3546          | 196400            |
| 0.336         | 6.0180  | 668  | 0.3593          | 261392            |
| 0.3773        | 7.5225  | 835  | 0.3542          | 326864            |
| 0.3457        | 9.0270  | 1002 | 0.3513          | 391800            |
| 0.3425        | 10.5315 | 1169 | 0.3527          | 458568            |
| 0.3413        | 12.0360 | 1336 | 0.3508          | 523312            |
| 0.3708        | 13.5405 | 1503 | 0.3493          | 589824            |
| 0.3444        | 15.0450 | 1670 | 0.3611          | 655200            |
| 0.3376        | 16.5495 | 1837 | 0.3530          | 721016            |
| 0.3434        | 18.0541 | 2004 | 0.3523          | 787016            |
| 0.3607        | 19.5586 | 2171 | 0.3575          | 853744            |
| 0.3251        | 21.0631 | 2338 | 0.3595          | 918752            |
| 0.3562        | 22.5676 | 2505 | 0.3617          | 984472            |
| 0.3432        | 24.0721 | 2672 | 0.3722          | 1050088           |
| 0.3507        | 25.5766 | 2839 | 0.3856          | 1115632           |
| 0.3168        | 27.0811 | 3006 | 0.3961          | 1181344           |
| 0.3276        | 28.5856 | 3173 | 0.4033          | 1246872           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
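
A minimal usage sketch with these versions, assuming the adapter repository is rbelanec/train_wsc_42_1760607963 and that you have access to the gated Llama 3 base model:

```python
# Usage sketch: load the base model, then attach the PEFT adapter.
# The adapter repo id and the example prompt are illustrative assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760607963"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter weights
model.eval()

# Winograd-style example: the model should resolve what "it" refers to.
prompt = "The trophy doesn't fit in the suitcase because it is too large. What is too large?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```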