train_wsc_789_1760637881

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3476
  • Num Input Tokens Seen: 976592

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
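
With lr_scheduler_type cosine and warmup ratio 0.1 over the run's 2,500 total steps (125 steps/epoch × 20 epochs, per the results table below), the learning rate warms up linearly for the first 250 steps and then decays along a cosine curve to zero. A minimal sketch of that schedule in plain Python (the actual run uses the Transformers scheduler, so treat this as illustrative):

```python
import math

def cosine_lr_with_warmup(step, base_lr=0.001, total_steps=2500, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to 0,
    mirroring the hyperparameters listed above."""
    warmup_steps = int(total_steps * warmup_ratio)  # 250 steps here
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Peak LR at the end of warmup; zero at the final step:
print(cosine_lr_with_warmup(250))   # 0.001
print(cosine_lr_with_warmup(2500))  # 0.0
```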

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.941         | 1.0   | 125  | 0.3639          | 48896             |
| 0.3426        | 2.0   | 250  | 0.3880          | 97760             |
| 0.3768        | 3.0   | 375  | 0.3608          | 146816            |
| 0.3598        | 4.0   | 500  | 0.3508          | 195376            |
| 0.348         | 5.0   | 625  | 0.3473          | 244368            |
| 0.3456        | 6.0   | 750  | 0.3425          | 293088            |
| 0.3561        | 7.0   | 875  | 0.3527          | 341856            |
| 0.3491        | 8.0   | 1000 | 0.3487          | 390544            |
| 0.3501        | 9.0   | 1125 | 0.3447          | 439264            |
| 0.3527        | 10.0  | 1250 | 0.3474          | 487904            |
| 0.3496        | 11.0  | 1375 | 0.3489          | 536960            |
| 0.3402        | 12.0  | 1500 | 0.3493          | 585712            |
| 0.3486        | 13.0  | 1625 | 0.3470          | 634464            |
| 0.3431        | 14.0  | 1750 | 0.3461          | 682800            |
| 0.3508        | 15.0  | 1875 | 0.3475          | 731376            |
| 0.3501        | 16.0  | 2000 | 0.3478          | 779936            |
| 0.3494        | 17.0  | 2125 | 0.3522          | 828880            |
| 0.3543        | 18.0  | 2250 | 0.3491          | 877920            |
| 0.3342        | 19.0  | 2375 | 0.3469          | 927488            |
| 0.3416        | 20.0  | 2500 | 0.3469          | 976592            |
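
The token counts in the table are consistent with the hyperparameters above; a quick arithmetic check using the final-row totals (plain Python):

```python
total_steps = 2500       # final "Step" in the table
total_tokens = 976_592   # final "Input Tokens Seen"
batch_size = 4           # train_batch_size from the hyperparameters
num_epochs = 20

tokens_per_step = total_tokens / total_steps
tokens_per_example = tokens_per_step / batch_size
examples_per_epoch = (total_steps // num_epochs) * batch_size  # 125 steps/epoch × 4

print(f"{tokens_per_step:.1f} tokens/step")        # 390.6 tokens/step
print(f"{tokens_per_example:.1f} tokens/example")  # 97.7 tokens/example
print(examples_per_epoch)                          # 500
```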

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4