train_wsc_42_1760637539

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5938
  • Num Input Tokens Seen: 985952
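
To run the adapter, load it on top of the base model with PEFT. Below is a minimal inference sketch, not an official usage snippet: the repo id `rbelanec/train_wsc_42_1760637539` is taken from this card, the prompt is an illustrative Winograd-style example (not drawn from the dataset), and access to the gated base model is assumed.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# AutoPeftModelForCausalLM reads the adapter config, resolves the base model
# (meta-llama/Meta-Llama-3-8B-Instruct), and attaches the adapter weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_wsc_42_1760637539",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative Winograd-schema-style prompt (hypothetical, not from the wsc set).
prompt = "The trophy doesn't fit in the brown suitcase because it is too large. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```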

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
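
The training script itself is not published with this card. As a rough sketch, the listed values map onto transformers.TrainingArguments as follows; everything beyond the listed values (the output directory and the per-epoch evaluation/logging strategy) is an assumption inferred from the results table below.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="train_wsc_42_1760637539",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",     # assumed: the results table logs one validation loss per epoch
    logging_strategy="epoch",  # assumed for the same reason
)
```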

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.8147        | 1.0   | 125  | 0.6524          | 49104             |
| 0.4967        | 2.0   | 250  | 0.6470          | 98400             |
| 0.5281        | 3.0   | 375  | 0.6317          | 147712            |
| 0.7148        | 4.0   | 500  | 0.6330          | 196320            |
| 0.3508        | 5.0   | 625  | 0.6176          | 245520            |
| 0.6425        | 6.0   | 750  | 0.6080          | 294976            |
| 0.6332        | 7.0   | 875  | 0.6174          | 344320            |
| 0.5817        | 8.0   | 1000 | 0.6056          | 393840            |
| 0.7317        | 9.0   | 1125 | 0.6044          | 443168            |
| 0.44          | 10.0  | 1250 | 0.6090          | 492304            |
| 0.436         | 11.0  | 1375 | 0.6092          | 541504            |
| 0.6191        | 12.0  | 1500 | 0.6039          | 590864            |
| 0.7263        | 13.0  | 1625 | 0.6131          | 640656            |
| 0.7171        | 14.0  | 1750 | 0.6160          | 689776            |
| 0.3567        | 15.0  | 1875 | 0.5938          | 739024            |
| 0.7266        | 16.0  | 2000 | 0.6085          | 788480            |
| 0.6884        | 17.0  | 2125 | 0.6029          | 837600            |
| 0.3534        | 18.0  | 2250 | 0.6025          | 887088            |
| 0.5975        | 19.0  | 2375 | 0.6147          | 936768            |
| 0.4996        | 20.0  | 2500 | 0.6148          | 985952            |
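
The reported evaluation loss of 0.5938 matches the epoch-15 checkpoint, the minimum over the 20 epochs; validation loss otherwise stays in a narrow band around 0.60, while roughly 49,300 input tokens are consumed per epoch (985952 / 20).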

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4