train_wsc_42_1760363944

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6464
  • Num Input Tokens Seen: 1481040

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
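The hyperparameters above map naturally onto Hugging Face `TrainingArguments` fields. As a hedged sketch (the actual training script is not shown, so the field names are an assumption, written here as a plain config dict):

```python
# Hypothetical mapping of the reported hyperparameters onto TrainingArguments
# field names; the original training script is not included in this card.
hyperparameters = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,       # 10% of total steps spent warming up
    "num_train_epochs": 30,
}
```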

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 0.4135        | 1.504  | 188  | 0.3474          | 73872             |
| 0.3674        | 3.008  | 376  | 0.4268          | 148192            |
| 0.3441        | 4.5120 | 564  | 0.3681          | 221984            |
| 0.3382        | 6.016  | 752  | 0.3580          | 295616            |
| 0.3549        | 7.52   | 940  | 0.3638          | 370688            |
| 0.3539        | 9.024  | 1128 | 0.3563          | 444448            |
| 0.3447        | 10.528 | 1316 | 0.3541          | 519088            |
| 0.3618        | 12.032 | 1504 | 0.3523          | 592272            |
| 0.3498        | 13.536 | 1692 | 0.3542          | 667952            |
| 0.3222        | 15.04  | 1880 | 0.3531          | 741072            |
| 0.338         | 16.544 | 2068 | 0.3632          | 815840            |
| 0.3497        | 18.048 | 2256 | 0.3581          | 889584            |
| 0.3471        | 19.552 | 2444 | 0.3708          | 964576            |
| 0.2972        | 21.056 | 2632 | 0.4019          | 1038032           |
| 0.2949        | 22.56  | 2820 | 0.4555          | 1112112           |
| 0.3299        | 24.064 | 3008 | 0.4626          | 1186496           |
| 0.3598        | 25.568 | 3196 | 0.5001          | 1261104           |
| 0.2818        | 27.072 | 3384 | 0.5159          | 1336384           |
| 0.2656        | 28.576 | 3572 | 0.5256          | 1410480           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
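
Since this is a PEFT adapter rather than a full model, it is loaded on top of the base model. A minimal usage sketch (not part of the original card; assumes `transformers` and `peft` are installed and that you have access to the gated Llama 3 weights):

```python
# Repo identifiers from this model card.
BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_42_1760363944"

def load_model():
    """Load the base model and apply this PEFT adapter on top of it."""
    # Imports kept inside the function so the sketch can be read without
    # the heavy dependencies installed; loading downloads both repos.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach the adapter
    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    return model, tokenizer
```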