train_wsc_42_1760610259

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.3516
  • Num Input Tokens Seen: 1308280
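
Since this release is a PEFT adapter rather than full model weights, it is used by attaching the adapter to the base model. Below is a minimal loading sketch; the adapter repo id comes from this card, while the dtype and device placement are illustrative assumptions:

```python
# Minimal sketch: attach the PEFT adapter to the Meta-Llama-3-8B-Instruct base.
# bfloat16 and device_map="auto" are illustrative choices, not stated in the card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760610259"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```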

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
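
As a rough reproduction aid, these values map onto a Transformers TrainingArguments configuration as sketched below; output_dir is a hypothetical name, and the dataset pipeline, PEFT config, and Trainer call are omitted:

```python
# Sketch of a TrainingArguments configuration mirroring the hyperparameters
# above; output_dir is hypothetical and the rest of the training pipeline
# (dataset, PEFT config, Trainer) is omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760610259",  # hypothetical
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```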

Training results

| Training Loss | Epoch   | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-------:|:----:|:---------------:|:-----------------:|
| 0.4049        | 1.5045  | 167  | 0.3704          | 65984             |
| 1.0566        | 3.0090  | 334  | 0.3765          | 131096            |
| 0.3436        | 4.5135  | 501  | 0.3724          | 196400            |
| 0.3427        | 6.0180  | 668  | 0.3657          | 261392            |
| 0.3817        | 7.5225  | 835  | 0.3642          | 326864            |
| 0.3408        | 9.0270  | 1002 | 0.3554          | 391800            |
| 0.3607        | 10.5315 | 1169 | 0.3472          | 458568            |
| 0.3327        | 12.0360 | 1336 | 0.3611          | 523312            |
| 0.3368        | 13.5405 | 1503 | 0.3606          | 589824            |
| 0.3569        | 15.0450 | 1670 | 0.3554          | 655200            |
| 0.3574        | 16.5495 | 1837 | 0.3501          | 721016            |
| 0.3559        | 18.0541 | 2004 | 0.3501          | 787016            |
| 0.3614        | 19.5586 | 2171 | 0.3489          | 853744            |
| 0.3341        | 21.0631 | 2338 | 0.3523          | 918752            |
| 0.358         | 22.5676 | 2505 | 0.3522          | 984472            |
| 0.3428        | 24.0721 | 2672 | 0.3537          | 1050088           |
| 0.3811        | 25.5766 | 2839 | 0.3560          | 1115632           |
| 0.3353        | 27.0811 | 3006 | 0.3513          | 1181344           |
| 0.3558        | 28.5856 | 3173 | 0.3544          | 1246872           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
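
To recreate this environment, the pinned versions can be installed as sketched below; the cu128 wheel index URL is the standard PyTorch one and is an assumption, since the card only records the build string:

```bash
# Pin the versions listed above. The CUDA 12.8 index URL is assumed from the
# "+cu128" build tag; adjust for a different CUDA setup.
pip install peft==0.15.2 transformers==4.51.3 datasets==3.6.0 tokenizers==0.21.1
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
```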