train_wic_101112_1760638033

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2493
  • Num Input Tokens Seen: 8443576
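Because this checkpoint is a PEFT adapter rather than full model weights, it loads on top of the base model. A minimal, illustrative sketch, assuming the adapter is published as rbelanec/train_wic_101112_1760638033 (the repo id on this card) and that you have access to the gated base model; the WiC-style prompt is made up for illustration:

```python
# Minimal sketch: load the PEFT adapter on top of the base model.
# The base model is resolved automatically from the adapter config.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_wic_101112_1760638033",  # adapter repo id from this card
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative WiC-style query: does the target word keep its sense across sentences?
prompt = (
    'Do the following two sentences use the word "bank" with the same meaning?\n'
    "1. She sat on the bank of the river.\n"
    "2. He deposited money at the bank.\n"
    "Answer yes or no:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```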

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
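
For reference, the hyperparameters above map onto transformers.TrainingArguments roughly as follows. This is a reconstruction from the list, not the author's actual training script; the output_dir and the per-epoch evaluation/logging strategy are assumptions (the latter inferred from the one-row-per-epoch results table below):

```python
# Sketch of TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_101112_1760638033",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",           # AdamW with default betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",         # inferred: the results table reports one eval per epoch
    logging_strategy="epoch",
)
```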

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---:|---:|---:|---:|---:|
| 0.3019 | 1.0 | 1222 | 0.2607 | 421736 |
| 0.3062 | 2.0 | 2444 | 0.2493 | 844280 |
| 0.2214 | 3.0 | 3666 | 0.2933 | 1266928 |
| 0.2227 | 4.0 | 4888 | 0.3356 | 1689112 |
| 0.056 | 5.0 | 6110 | 0.4715 | 2111392 |
| 0.0007 | 6.0 | 7332 | 0.5103 | 2533592 |
| 0.0016 | 7.0 | 8554 | 0.6017 | 2955304 |
| 0.0016 | 8.0 | 9776 | 0.7782 | 3377216 |
| 0.0001 | 9.0 | 10998 | 0.7219 | 3799208 |
| 0.0001 | 10.0 | 12220 | 0.7402 | 4221160 |
| 0.0 | 11.0 | 13442 | 0.8109 | 4643512 |
| 0.0 | 12.0 | 14664 | 0.8759 | 5066080 |
| 0.0 | 13.0 | 15886 | 0.7955 | 5487840 |
| 0.0 | 14.0 | 17108 | 0.9195 | 5910224 |
| 0.0 | 15.0 | 18330 | 1.0363 | 6332064 |
| 0.0 | 16.0 | 19552 | 1.0897 | 6754408 |
| 0.0 | 17.0 | 20774 | 1.1226 | 7176696 |
| 0.0 | 18.0 | 21996 | 1.1441 | 7598912 |
| 0.0 | 19.0 | 23218 | 1.1630 | 8021160 |
| 0.0 | 20.0 | 24440 | 1.1596 | 8443576 |
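
Validation loss bottoms out at epoch 2 (0.2493, the figure reported at the top of this card) and climbs steadily afterwards while training loss collapses toward zero, a textbook overfitting curve. The card does not say how the final checkpoint was selected, but the standard Transformers way to keep the best epoch rather than the last one is sketched below; these flags and the patience value are illustrative, not something the card states was used:

```python
# Keep the checkpoint with the lowest eval loss instead of the overfit final one.
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="train_wic_101112_1760638033",  # assumed
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,       # restore the best checkpoint when training ends
    metric_for_best_model="eval_loss",
    greater_is_better=False,           # lower eval loss is better
)
# Stop once eval loss has not improved for 3 consecutive epochs.
early_stop = EarlyStoppingCallback(early_stopping_patience=3)
# trainer = Trainer(..., args=args, callbacks=[early_stop])
```

With a patience of 3, a run shaped like the table above would have stopped around epoch 5 rather than training for all 20 epochs.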

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4