train_wic_789_1760637920

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wic (Word-in-Context) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3007
  • Num Input Tokens Seen: 8431032

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
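The learning-rate schedule implied by the hyperparameters above (cosine decay with a 0.1 warmup ratio over 20 epochs of 1,222 steps each) can be sketched in plain Python. This is a minimal illustration of the standard linear-warmup/cosine-decay shape, not the exact trainer internals:

```python
import math

BASE_LR = 5e-5
TOTAL_STEPS = 24440                      # 20 epochs x 1222 steps/epoch
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)    # warmup_ratio 0.1 -> 2444 steps

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate ramps from 0 to 5e-05 over the first 2,444 steps, then decays smoothly to 0 by step 24,440.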

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|--------------:|------:|------:|----------------:|------------------:|
| 0.0713        | 1.0   | 1222  | 0.3007          | 421768            |
| 0.1869        | 2.0   | 2444  | 0.3900          | 843296            |
| 0.3132        | 3.0   | 3666  | 0.3101          | 1265072           |
| 0.0152        | 4.0   | 4888  | 0.3829          | 1687136           |
| 0.0033        | 5.0   | 6110  | 0.5815          | 2108680           |
| 0.0003        | 6.0   | 7332  | 0.6520          | 2530168           |
| 0.001         | 7.0   | 8554  | 0.5526          | 2951208           |
| 0.0002        | 8.0   | 9776  | 0.7611          | 3372504           |
| 0.0001        | 9.0   | 10998 | 0.7282          | 3793768           |
| 0.0           | 10.0  | 12220 | 1.1743          | 4214928           |
| 0.0001        | 11.0  | 13442 | 0.8818          | 4636520           |
| 0.0           | 12.0  | 14664 | 1.2020          | 5057560           |
| 0.0           | 13.0  | 15886 | 0.9998          | 5479248           |
| 0.0           | 14.0  | 17108 | 1.3617          | 5901056           |
| 0.0           | 15.0  | 18330 | 1.4182          | 6323016           |
| 0.0           | 16.0  | 19552 | 1.4640          | 6744792           |
| 0.0           | 17.0  | 20774 | 1.4988          | 7165960           |
| 0.0           | 18.0  | 21996 | 1.5350          | 7587872           |
| 0.0           | 19.0  | 23218 | 1.5408          | 8009040           |
| 0.0           | 20.0  | 24440 | 1.5400          | 8431032           |
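The reported evaluation loss (0.3007) corresponds to epoch 1, the epoch with the lowest validation loss in the table; after that, training loss collapses toward zero while validation loss climbs, a typical overfitting pattern. Picking the best epoch from the table is straightforward:

```python
# Validation loss per epoch, transcribed from the training results table above.
val_loss = {
    1: 0.3007, 2: 0.3900, 3: 0.3101, 4: 0.3829, 5: 0.5815,
    6: 0.6520, 7: 0.5526, 8: 0.7611, 9: 0.7282, 10: 1.1743,
    11: 0.8818, 12: 1.2020, 13: 0.9998, 14: 1.3617, 15: 1.4182,
    16: 1.4640, 17: 1.4988, 18: 1.5350, 19: 1.5408, 20: 1.5400,
}

# Best checkpoint = epoch with the lowest validation loss.
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # -> 1 0.3007
```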

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4