train_wic_123_1760637692

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic (WiC, Word-in-Context) dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.4619
  • Num Input Tokens Seen: 8429424
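
Because the framework versions below include PEFT, this checkpoint is an adapter rather than a full set of model weights. Below is a minimal loading sketch, assuming the adapter is published as rbelanec/train_wic_123_1760637692 (the Hugging Face repo id matching this card's name); the dtype choice is illustrative:

```python
# Minimal sketch: load the base model, then attach this PEFT adapter.
# The adapter repo id is an assumption matching this card's name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_123_1760637692"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```

For LoRA-style adapters, model.merge_and_unload() can fold the adapter into the base weights if a standalone model is preferred.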

Model description

This checkpoint is a PEFT adapter trained on top of meta-llama/Meta-Llama-3-8B-Instruct (see the framework versions below); further details of the adapter configuration are not documented here.

Intended uses & limitations

The adapter targets the WiC word-in-context task it was fine-tuned on; intended uses beyond that task, and its limitations, are not documented.

Training and evaluation data

The model was fine-tuned and evaluated on the wic dataset; the exact splits and preprocessing are not documented here.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
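
As a reproducibility aid, here is a sketch of how the values above map onto transformers.TrainingArguments; the output_dir is a placeholder, and the dataset and PEFT wiring are omitted since they are not documented on this card:

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# Only the values from the list above come from the card;
# output_dir is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_123_1760637692",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```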

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.319         | 1.0   | 1222  | 0.5033          | 421528            |
| 0.3012        | 2.0   | 2444  | 0.4831          | 843368            |
| 0.6519        | 3.0   | 3666  | 0.4750          | 1264408           |
| 0.3036        | 4.0   | 4888  | 0.4651          | 1685768           |
| 0.4584        | 5.0   | 6110  | 0.4697          | 2106968           |
| 0.4144        | 6.0   | 7332  | 0.4658          | 2528648           |
| 0.5252        | 7.0   | 8554  | 0.4675          | 2949592           |
| 0.3243        | 8.0   | 9776  | 0.4662          | 3371056           |
| 0.4506        | 9.0   | 10998 | 0.4687          | 3792672           |
| 0.5059        | 10.0  | 12220 | 0.4680          | 4213808           |
| 0.3959        | 11.0  | 13442 | 0.4619          | 4634936           |
| 0.4658        | 12.0  | 14664 | 0.4668          | 5056144           |
| 0.2719        | 13.0  | 15886 | 0.4651          | 5477344           |
| 0.387         | 14.0  | 17108 | 0.4630          | 5898504           |
| 0.4383        | 15.0  | 18330 | 0.4668          | 6320560           |
| 0.5965        | 16.0  | 19552 | 0.4665          | 6741824           |
| 0.6455        | 17.0  | 20774 | 0.4636          | 7163512           |
| 0.3475        | 18.0  | 21996 | 0.4631          | 7585736           |
| 0.6931        | 19.0  | 23218 | 0.4675          | 8007456           |
| 0.4062        | 20.0  | 24440 | 0.4681          | 8429424           |

The evaluation loss reported at the top of this card (0.4619) matches epoch 11, the lowest validation loss in the run, suggesting that the best checkpoint rather than the final-epoch checkpoint was retained.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4