train_wsc_1756729607

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3526
  • Num Input Tokens Seen: 437760

Model description

More information needed

Intended uses & limitations

More information needed
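
Detailed usage notes are not provided, but since this repository is a PEFT adapter, it can be loaded on top of the base model. Below is a minimal inference sketch, assuming the standard transformers/peft loading pattern; the prompt is purely illustrative, as the exact prompt format used during training is not documented here.

```python
# Minimal sketch of loading this PEFT adapter for inference.
# The exact prompt format used during training is not documented; the prompt below is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_1756729607"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# A WSC-style coreference question (illustrative only).
prompt = (
    "The trophy doesn't fit into the suitcase because it is too big. "
    "Does 'it' refer to the trophy? Answer True or False."
)
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```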

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
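
For reference, the hyperparameters above map onto a transformers TrainingArguments configuration roughly as follows. This is a sketch of that mapping, not the actual training script (which is not published here); output_dir is a placeholder.

```python
# Sketch of a TrainingArguments setup matching the hyperparameters listed above.
# output_dir is a placeholder; the original training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_1756729607",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```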

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.5349        | 0.5020 | 125  | 1.2114          | 22304             |
| 0.4306        | 1.0040 | 250  | 0.4652          | 44064             |
| 0.3766        | 1.5060 | 375  | 0.3868          | 65808             |
| 0.3159        | 2.0080 | 500  | 0.3830          | 88048             |
| 0.3958        | 2.5100 | 625  | 0.3676          | 109696            |
| 0.3521        | 3.0120 | 750  | 0.3507          | 131872            |
| 0.3677        | 3.5141 | 875  | 0.3535          | 154416            |
| 0.3426        | 4.0161 | 1000 | 0.3507          | 176048            |
| 0.3393        | 4.5181 | 1125 | 0.3546          | 198432            |
| 0.3601        | 5.0201 | 1250 | 0.3592          | 219680            |
| 0.3422        | 5.5221 | 1375 | 0.3506          | 241136            |
| 0.3609        | 6.0241 | 1500 | 0.3502          | 263616            |
| 0.3457        | 6.5261 | 1625 | 0.3554          | 285424            |
| 0.315         | 7.0281 | 1750 | 0.3651          | 307792            |
| 0.3149        | 7.5301 | 1875 | 0.3626          | 329840            |
| 0.3441        | 8.0321 | 2000 | 0.3485          | 351552            |
| 0.3574        | 8.5341 | 2125 | 0.3516          | 373424            |
| 0.3673        | 9.0361 | 2250 | 0.3545          | 395616            |
| 0.3419        | 9.5382 | 2375 | 0.3475          | 417520            |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1