train_wsc_42_1760637536

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3549
  • Num Input Tokens Seen: 985952

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
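
The learning-rate schedule implied by these settings (cosine decay with a 10% linear warmup) can be sketched as follows. This is a standalone illustration of the schedule shape, not the exact Trainer implementation; the figure of 125 optimizer steps per epoch is taken from the results table below.

```python
import math

# Hyperparameters from the list above
BASE_LR = 0.03
NUM_EPOCHS = 20
STEPS_PER_EPOCH = 125                         # from the results table (125 steps = 1 epoch)
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH    # 2500
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)         # lr_scheduler_warmup_ratio 0.1 -> 250 steps

def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))       # 0.0  (start of warmup)
print(lr_at(250))     # 0.03 (peak, end of warmup)
print(lr_at(2500))    # 0.0  (fully decayed at the end of training)
```

This matches the shape produced by Hugging Face's `get_cosine_schedule_with_warmup`: the rate ramps linearly over the first 250 steps and then follows a half-cosine down to zero over the remaining 2250.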

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
| 6.1999        | 1.0   | 125  | 5.0939          | 49104             |
| 0.5007        | 2.0   | 250  | 0.5214          | 98400             |
| 0.4231        | 3.0   | 375  | 0.3933          | 147712            |
| 0.3710        | 4.0   | 500  | 0.3615          | 196320            |
| 0.3531        | 5.0   | 625  | 0.4567          | 245520            |
| 0.3797        | 6.0   | 750  | 0.4273          | 294976            |
| 0.3769        | 7.0   | 875  | 0.3809          | 344320            |
| 0.3729        | 8.0   | 1000 | 0.3906          | 393840            |
| 0.3446        | 9.0   | 1125 | 0.3553          | 443168            |
| 0.3702        | 10.0  | 1250 | 0.3585          | 492304            |
| 0.4168        | 11.0  | 1375 | 0.3591          | 541504            |
| 0.3828        | 12.0  | 1500 | 0.3549          | 590864            |
| 0.3636        | 13.0  | 1625 | 0.3585          | 640656            |
| 0.3249        | 14.0  | 1750 | 0.3583          | 689776            |
| 0.3689        | 15.0  | 1875 | 0.3554          | 739024            |
| 0.3544        | 16.0  | 2000 | 0.3621          | 788480            |
| 0.3224        | 17.0  | 2125 | 0.3574          | 837600            |
| 0.3574        | 18.0  | 2250 | 0.3581          | 887088            |
| 0.3582        | 19.0  | 2375 | 0.3589          | 936768            |
| 0.3613        | 20.0  | 2500 | 0.3601          | 985952            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_wsc_42_1760637536
