train_wsc_42_1760443873

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3569
  • Num Input Tokens Seen: 1481040
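Since the framework versions below list PEFT, this checkpoint is an adapter rather than a full model. The following is a minimal loading sketch, assuming the adapter is published as rbelanec/train_wsc_42_1760443873 and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct base model; the prompt format is illustrative only, since the card does not document the training prompts.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rbelanec/train_wsc_42_1760443873"

# AutoPeftModelForCausalLM reads the adapter config, fetches the base model
# it was trained on, and attaches the adapter weights on top of it.
# device_map="auto" requires the accelerate package.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")

# If the adapter repo does not ship tokenizer files, load the tokenizer
# from the base model instead, as done here.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical Winograd-style prompt; the actual training format is unknown.
prompt = "The trophy didn't fit in the suitcase because it was too big. What was too big?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```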

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
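As a rough illustration only (the card does not include the actual training script), these values map onto transformers TrainingArguments as sketched below; the output_dir name is an assumption taken from the model name.

```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above onto TrainingArguments;
# not the card author's actual configuration.
training_args = TrainingArguments(
    output_dir="train_wsc_42_1760443873",  # assumed from the model name
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",         # betas=(0.9, 0.999), epsilon=1e-08 are the AdamW defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,            # first 10% of steps spent warming up
    num_train_epochs=30,
)
```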

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.4427        | 1.504  | 188  | 0.3598          | 73872             |
| 0.3941        | 3.008  | 376  | 0.3463          | 148192            |
| 0.3445        | 4.512  | 564  | 0.3619          | 221984            |
| 0.3308        | 6.016  | 752  | 0.3545          | 295616            |
| 0.3487        | 7.52   | 940  | 0.3585          | 370688            |
| 0.3494        | 9.024  | 1128 | 0.3646          | 444448            |
| 0.3451        | 10.528 | 1316 | 0.3590          | 519088            |
| 0.3591        | 12.032 | 1504 | 0.3540          | 592272            |
| 0.3523        | 13.536 | 1692 | 0.3514          | 667952            |
| 0.3234        | 15.04  | 1880 | 0.3507          | 741072            |
| 0.3398        | 16.544 | 2068 | 0.3577          | 815840            |
| 0.3573        | 18.048 | 2256 | 0.3503          | 889584            |
| 0.3381        | 19.552 | 2444 | 0.3558          | 964576            |
| 0.3371        | 21.056 | 2632 | 0.3589          | 1038032           |
| 0.3279        | 22.56  | 2820 | 0.3565          | 1112112           |
| 0.3365        | 24.064 | 3008 | 0.3550          | 1186496           |
| 0.3470        | 25.568 | 3196 | 0.3581          | 1261104           |
| 0.3420        | 27.072 | 3384 | 0.3601          | 1336384           |
| 0.3626        | 28.576 | 3572 | 0.3537          | 1410480           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1