train_wic_42_1760637577

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the list below):

  • Loss: 0.4767
  • Num Input Tokens Seen: 8436720
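As a quick-start aid, here is a minimal sketch of loading this adapter on top of the base model with Transformers and PEFT. It assumes access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights; the adapter repo id is taken from this card, and the WiC-style prompt is purely illustrative, since the exact prompt format used for this run is not documented here.

```python
# Minimal sketch: attach the rbelanec/train_wic_42_1760637577 adapter
# to the base model. Assumes access to the gated Meta-Llama-3 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_42_1760637577"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Illustrative WiC-style query; the actual training prompt format is unknown.
prompt = (
    'Do the words "bank" in "river bank" and "bank account" '
    "have the same meaning? Answer yes or no."
)
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```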

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The adapter was fine-tuned and evaluated on the wic dataset (see above); split and preprocessing details are not documented here.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TrainingArguments mapping is sketched after the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
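For orientation, the hyperparameters above map roughly onto transformers.TrainingArguments as sketched below. This is a hedged reconstruction, not the actual training script: the output_dir is hypothetical, and the PEFT/adapter configuration (e.g., LoRA settings) is omitted because it is not documented on this card.

```python
# Hedged reconstruction of the hyperparameters above as TrainingArguments.
# Not the actual script used for this run; output_dir is hypothetical and
# the PEFT adapter config is omitted (not documented on the card).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_42_1760637577",  # hypothetical
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```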

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.5077        | 1.0   | 1222  | 0.5073          | 421584            |
| 0.6103        | 2.0   | 2444  | 0.4966          | 843680            |
| 0.347         | 3.0   | 3666  | 0.4791          | 1265424           |
| 0.4852        | 4.0   | 4888  | 0.4831          | 1687040           |
| 0.5648        | 5.0   | 6110  | 0.4832          | 2108784           |
| 0.4625        | 6.0   | 7332  | 0.4819          | 2530720           |
| 0.3368        | 7.0   | 8554  | 0.4861          | 2952752           |
| 0.8262        | 8.0   | 9776  | 0.4816          | 3374632           |
| 0.3016        | 9.0   | 10998 | 0.4842          | 3795800           |
| 0.3625        | 10.0  | 12220 | 0.4823          | 4217096           |
| 0.5306        | 11.0  | 13442 | 0.4826          | 4639384           |
| 0.5803        | 12.0  | 14664 | 0.4796          | 5061608           |
| 0.4927        | 13.0  | 15886 | 0.4767          | 5483544           |
| 0.5782        | 14.0  | 17108 | 0.4826          | 5905728           |
| 0.4342        | 15.0  | 18330 | 0.4843          | 6327600           |
| 0.4527        | 16.0  | 19552 | 0.4849          | 6749312           |
| 0.3874        | 17.0  | 20774 | 0.4866          | 7171032           |
| 0.6285        | 18.0  | 21996 | 0.4860          | 7592384           |
| 0.484         | 19.0  | 23218 | 0.4860          | 8014496           |
| 0.5019        | 20.0  | 24440 | 0.4860          | 8436720           |
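The evaluation loss reported at the top of this card (0.4767) matches the epoch 13 entry above, the lowest validation loss reached during training.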

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4