train_wsc_789_1760637884

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5816
  • Num Input Tokens Seen: 976592

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
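
To make the interaction of learning_rate, lr_scheduler_type and lr_scheduler_warmup_ratio concrete, the following is a minimal sketch of a linear-warmup, cosine-decay schedule. It assumes 2500 total optimizer steps (the final step in the training results table) and assumes the schedule follows the usual form of linear warmup to the peak rate followed by a half-cosine decay to zero. The names PEAK_LR, TOTAL_STEPS, WARMUP_STEPS and lr_at are illustrative and do not come from the training code.

```python
import math

PEAK_LR = 5e-05       # learning_rate from the hyperparameter list
TOTAL_STEPS = 2500    # assumption: final step reported in the training results
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # 250 under these assumptions


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (illustrative sketch only)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Half-cosine decay from the peak learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 125, 250, 1375, 2500):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers the first two epochs (250 steps at 125 steps per epoch), and the rate falls to half of its peak at step 1375, the midpoint of the decay phase.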

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
| 0.4207 | 1.0 | 125 | 0.6380 | 48896 |
| 0.5887 | 2.0 | 250 | 0.6308 | 97760 |
| 0.9575 | 3.0 | 375 | 0.6143 | 146816 |
| 0.5472 | 4.0 | 500 | 0.6036 | 195376 |
| 0.5809 | 5.0 | 625 | 0.6037 | 244368 |
| 0.761 | 6.0 | 750 | 0.5903 | 293088 |
| 0.6357 | 7.0 | 875 | 0.5836 | 341856 |
| 0.4514 | 8.0 | 1000 | 0.5884 | 390544 |
| 0.7934 | 9.0 | 1125 | 0.5911 | 439264 |
| 0.6754 | 10.0 | 1250 | 0.5825 | 487904 |
| 0.3472 | 11.0 | 1375 | 0.5829 | 536960 |
| 0.8128 | 12.0 | 1500 | 0.5905 | 585712 |
| 0.6479 | 13.0 | 1625 | 0.5839 | 634464 |
| 0.3784 | 14.0 | 1750 | 0.5882 | 682800 |
| 0.39 | 15.0 | 1875 | 0.5818 | 731376 |
| 0.5261 | 16.0 | 2000 | 0.5816 | 779936 |
| 0.6438 | 17.0 | 2125 | 0.5899 | 828880 |
| 0.5272 | 18.0 | 2250 | 0.5834 | 877920 |
| 0.5616 | 19.0 | 2375 | 0.5873 | 927488 |
| 0.5713 | 20.0 | 2500 | 0.5868 | 976592 |
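
The per-epoch results can be checked against the headline evaluation loss with a few lines of Python. The sketch below copies the rows above into a list and picks the epoch with the lowest validation loss. The headline loss of 0.5816 matches epoch 16 rather than the final epoch (0.5868), which suggests, but does not confirm, that the reported figure comes from the best checkpoint; the card itself does not say. The variable names are illustrative.

```python
# Rows copied from the training results:
# (epoch, step, training_loss, validation_loss, input_tokens_seen)
RESULTS = [
    (1, 125, 0.4207, 0.6380, 48896),
    (2, 250, 0.5887, 0.6308, 97760),
    (3, 375, 0.9575, 0.6143, 146816),
    (4, 500, 0.5472, 0.6036, 195376),
    (5, 625, 0.5809, 0.6037, 244368),
    (6, 750, 0.761, 0.5903, 293088),
    (7, 875, 0.6357, 0.5836, 341856),
    (8, 1000, 0.4514, 0.5884, 390544),
    (9, 1125, 0.7934, 0.5911, 439264),
    (10, 1250, 0.6754, 0.5825, 487904),
    (11, 1375, 0.3472, 0.5829, 536960),
    (12, 1500, 0.8128, 0.5905, 585712),
    (13, 1625, 0.6479, 0.5839, 634464),
    (14, 1750, 0.3784, 0.5882, 682800),
    (15, 1875, 0.39, 0.5818, 731376),
    (16, 2000, 0.5261, 0.5816, 779936),
    (17, 2125, 0.6438, 0.5899, 828880),
    (18, 2250, 0.5272, 0.5834, 877920),
    (19, 2375, 0.5616, 0.5873, 927488),
    (20, 2500, 0.5713, 0.5868, 976592),
]

# Epoch with the lowest validation loss (index 3 of each row).
best = min(RESULTS, key=lambda row: row[3])
final = RESULTS[-1]
steps_per_epoch = RESULTS[0][1]  # step count at the end of epoch 1

print("best epoch:", best[0], "validation loss:", best[3])
print("final epoch:", final[0], "validation loss:", final[3])
print("steps per epoch:", steps_per_epoch)
```

Validation loss flattens after roughly epoch 7 and moves within a narrow band (about 0.58 to 0.59) from then on, while the training loss stays noisy, so the later epochs add little measurable improvement on this evaluation set.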

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
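
With the libraries listed above installed, the adapter can probably be loaded on top of the base model roughly as follows. This is a hedged sketch, not the author's documented usage: it assumes the adapter is published under the repository ID rbelanec/train_wsc_789_1760637884, that access to the base model has been granted, and that the standard PEFT loading path applies. The function name load_model_and_tokenizer is hypothetical, and the prompt format used during fine-tuning is not documented here.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # base model named in this card
ADAPTER_ID = "rbelanec/train_wsc_789_1760637884"       # assumption: adapter repository ID


def load_model_and_tokenizer():
    """Load the base model and attach the PEFT adapter (illustrative sketch)."""
    # Heavy imports are kept inside the function so that importing this
    # file does not require the libraries or trigger any download.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
    )
    # PeftModel.from_pretrained reads the adapter config and wraps the base model.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return model, tokenizer
```

Calling load_model_and_tokenizer() downloads the weights of the 8B base model, so it needs network access and enough memory for the base model.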

Model tree for rbelanec/train_wsc_789_1760637884

This model is an adapter for the base model meta-llama/Meta-Llama-3-8B-Instruct.