train_wic_789_1760637921

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4803
  • Num Input Tokens Seen: 8431032

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
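
The combination of lr_scheduler_type: cosine and lr_scheduler_warmup_ratio: 0.1 implies a linear warmup over the first 10% of steps followed by cosine decay. A minimal sketch of that schedule, assuming it matches the usual linear-warmup + cosine formulation (step counts taken from the results table below; this is an illustration, not the training code):

```python
import math

# Assumed schedule parameters, derived from the hyperparameters above:
total_steps = 24440                    # 20 epochs x 1222 steps/epoch
warmup_steps = int(0.1 * total_steps)  # warmup_ratio = 0.1 -> 2444 steps
base_lr = 5e-5                         # learning_rate

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))             # 0.0 at the start of warmup
print(lr_at(warmup_steps))  # peak learning rate: 5e-05
print(lr_at(total_steps))   # decayed to ~0 by the final step
```
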

Training results

Training Loss  Epoch  Step   Validation Loss  Input Tokens Seen
0.4524         1.0    1222   0.5212           421768
0.7725         2.0    2444   0.4988           843296
0.5933         3.0    3666   0.4898           1265072
0.7239         4.0    4888   0.4816           1687136
0.3341         5.0    6110   0.4887           2108680
0.7349         6.0    7332   0.4809           2530168
0.3534         7.0    8554   0.4803           2951208
0.5553         8.0    9776   0.4807           3372504
0.4469         9.0    10998  0.4881           3793768
0.3666         10.0   12220  0.4859           4214928
0.5461         11.0   13442  0.4833           4636520
0.3518         12.0   14664  0.4861           5057560
0.5114         13.0   15886  0.4805           5479248
0.3864         14.0   17108  0.4860           5901056
0.5046         15.0   18330  0.4898           6323016
0.4585         16.0   19552  0.4831           6744792
0.4453         17.0   20774  0.4874           7165960
0.5392         18.0   21996  0.4807           7587872
0.5631         19.0   23218  0.4824           8009040
0.3830         20.0   24440  0.4824           8431032

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
