train_wsc_42_1760359134

This model, rbelanec/train_wsc_42_1760359134, is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3484
  • Num Input Tokens Seen: 1481040
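Because the checkpoint was trained with PEFT (see the framework versions below), it is an adapter rather than a full set of model weights, so it must be attached to the base model at load time. A minimal loading sketch, assuming the adapter is published on the Hugging Face Hub under the card's repo id and that you have access to the gated Llama 3 base model:

```python
BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_42_1760359134"  # repo id taken from this card

def load_model():
    # Imports are kept local so the snippet can be read without the
    # heavyweight dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    # Load the frozen base model, then attach the fine-tuned adapter weights.
    base = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
```

Downloading the 8B base model requires accepting the Meta Llama 3 license on the Hub and substantial GPU memory; `PeftModel.from_pretrained` then fetches only the small adapter weights.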

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
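The training log below records 188 optimizer steps per 1.504 epochs, i.e. 125 steps per epoch, so 30 epochs correspond to roughly 3750 steps. Under that assumption, the cosine schedule with a 0.1 warmup ratio from the hyperparameters above can be sketched as follows (this mirrors the usual linear-warmup-then-cosine-decay shape; the derived step count is an estimate, not stated on the card):

```python
import math

def lr_at(step: int, total_steps: int = 3750,
          base_lr: float = 0.03, warmup_ratio: float = 0.1) -> float:
    """Cosine decay with linear warmup, using this card's scheduler settings."""
    warmup_steps = int(total_steps * warmup_ratio)  # 375 steps here
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak back down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at(375)` returns the peak rate of 0.03 at the end of warmup, and `lr_at(3750)` returns 0.0 at the end of training.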

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
0.415            1.504    188   0.4330              73872
0.3796           3.008    376   0.4075             148192
0.3429           4.512    564   0.3568             221984
0.3465           6.016    752   0.3773             295616
0.3674           7.520    940   0.4207             370688
0.352            9.024   1128   0.3497             444448
0.3545          10.528   1316   0.3616             519088
0.3804          12.032   1504   0.3539             592272
0.3496          13.536   1692   0.3573             667952
0.3196          15.040   1880   0.3484             741072
0.3604          16.544   2068   0.3551             815840
0.3709          18.048   2256   0.3631             889584
0.3366          19.552   2444   0.3566             964576
0.3348          21.056   2632   0.3572            1038032
0.3218          22.560   2820   0.3534            1112112
0.3363          24.064   3008   0.3571            1186496
0.3564          25.568   3196   0.3548            1261104
0.3453          27.072   3384   0.3564            1336384
0.3606          28.576   3572   0.3562            1410480

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1