train_wsc_101112_1760637994

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4448
  • Num Input Tokens Seen: 980224
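
To use the adapter, load it on top of the base model with PEFT. This is a minimal loading sketch, assuming the adapter is published under a Hub repo id matching the model name (rbelanec/train_wsc_101112_1760637994) and that you have access to the gated base model; dtype and device settings are illustrative:

```python
# Minimal loading sketch; the adapter repo id and the dtype/device settings
# are assumptions, not taken from the original training setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_101112_1760637994"  # assumed Hub path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```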

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
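
As a rough reconstruction, the hyperparameters above map onto a transformers TrainingArguments configuration like the one below. This is a hedged sketch of the settings only, not the original training script; the output_dir value and the surrounding Trainer wiring are assumptions.

```python
# Illustrative reconstruction of the hyperparameters above as
# TrainingArguments; not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_101112_1760637994",  # assumed output directory
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",         # AdamW defaults: betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```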

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.359         | 1.0   | 125  | 0.3505          | 48944             |
| 0.3577        | 2.0   | 250  | 0.3474          | 98080             |
| 0.3573        | 3.0   | 375  | 0.4423          | 146624            |
| 0.4117        | 4.0   | 500  | 0.3616          | 196192            |
| 0.3537        | 5.0   | 625  | 0.3593          | 245216            |
| 0.3432        | 6.0   | 750  | 0.3444          | 294128            |
| 0.359         | 7.0   | 875  | 0.3507          | 342416            |
| 0.3432        | 8.0   | 1000 | 0.3508          | 391552            |
| 0.3542        | 9.0   | 1125 | 0.3508          | 440848            |
| 0.3538        | 10.0  | 1250 | 0.3524          | 488816            |
| 0.343         | 11.0  | 1375 | 0.3504          | 537840            |
| 0.343         | 12.0  | 1500 | 0.3526          | 586624            |
| 0.3434        | 13.0  | 1625 | 0.3593          | 635776            |
| 0.3304        | 14.0  | 1750 | 0.3679          | 684416            |
| 0.3452        | 15.0  | 1875 | 0.3741          | 733488            |
| 0.3542        | 16.0  | 2000 | 0.3641          | 782688            |
| 0.349         | 17.0  | 2125 | 0.3981          | 831792            |
| 0.3058        | 18.0  | 2250 | 0.4066          | 881328            |
| 0.3421        | 19.0  | 2375 | 0.4308          | 930656            |
| 0.3058        | 20.0  | 2500 | 0.4317          | 980224            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4