train_wsc_42_1760366705

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.3526
  • Num Input Tokens Seen: 1481040
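As a quick-start illustration (not part of the original card), the snippet below loads the PEFT adapter on top of the base model and runs a Winograd-style prompt. The dtype, device placement, and prompt template are assumptions; the card does not record how the wsc examples were formatted during training, so adapt the prompt to your setup.

```python
# Minimal loading sketch, assuming this repo hosts a PEFT adapter for
# meta-llama/Meta-Llama-3-8B-Instruct (dtype and prompt are illustrative).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rbelanec/train_wsc_42_1760366705"

# AutoPeftModelForCausalLM resolves the base model from the adapter config
# and attaches the adapter weights in a single call.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,  # assumed; use float16/float32 if preferred
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative Winograd-style coreference prompt; the actual training
# template is not recorded on this card.
prompt = "The trophy doesn't fit in the suitcase because it is too big. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```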

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
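As a rough reproduction aid, not the original training script, the listed values map onto transformers.TrainingArguments as shown below. The output directory is a placeholder, and the PEFT/LoRA adapter configuration is not recorded on this card, so it is omitted.

```python
# Configuration sketch mapping the hyperparameters above onto
# transformers.TrainingArguments; output_dir is a placeholder and the
# PEFT adapter settings are unknown, so only the recorded values appear.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760366705",  # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```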

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.3796        | 1.504  | 188  | 0.3541          | 73872             |
| 0.396         | 3.008  | 376  | 0.3562          | 148192            |
| 0.3494        | 4.512  | 564  | 0.3525          | 221984            |
| 0.3352        | 6.016  | 752  | 0.3456          | 295616            |
| 0.3512        | 7.52   | 940  | 0.3530          | 370688            |
| 0.3431        | 9.024  | 1128 | 0.3536          | 444448            |
| 0.3414        | 10.528 | 1316 | 0.3470          | 519088            |
| 0.3497        | 12.032 | 1504 | 0.3503          | 592272            |
| 0.3491        | 13.536 | 1692 | 0.3479          | 667952            |
| 0.327         | 15.04  | 1880 | 0.3459          | 741072            |
| 0.3528        | 16.544 | 2068 | 0.3497          | 815840            |
| 0.3509        | 18.048 | 2256 | 0.3495          | 889584            |
| 0.3418        | 19.552 | 2444 | 0.3467          | 964576            |
| 0.3447        | 21.056 | 2632 | 0.3517          | 1038032           |
| 0.3357        | 22.56  | 2820 | 0.3474          | 1112112           |
| 0.3435        | 24.064 | 3008 | 0.3512          | 1186496           |
| 0.3516        | 25.568 | 3196 | 0.3487          | 1261104           |
| 0.3483        | 27.072 | 3384 | 0.3479          | 1336384           |
| 0.3466        | 28.576 | 3572 | 0.3521          | 1410480           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
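To reduce surprises when loading the adapter, a quick environment check (illustrative, not from the original card) can confirm the installed packages match the versions above:

```python
# Illustrative environment check: compares installed package versions
# against the versions recorded on this card.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.8.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name in expected:
    status = "OK" if installed[name] == expected[name] else "MISMATCH"
    print(f"{name}: installed {installed[name]}, card {expected[name]} [{status}]")
```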