train_wic_123_1760637693

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic (Word-in-Context) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2284
  • Num Input Tokens Seen: 8429424

Model description

This repository contains a PEFT adapter (see Framework versions below) trained on top of meta-llama/Meta-Llama-3-8B-Instruct. Further details about the adapter configuration and prompt format are not documented.
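
Since usage is not documented, below is a minimal, untested sketch of loading the adapter with PEFT. It assumes the adapter is published under this repository's id (rbelanec/train_wic_123_1760637693) and that you have been granted access to the gated Llama 3 base model.

```python
# Minimal loading sketch (assumptions: the adapter id matches this repository,
# and you have access to the gated Llama 3 base model; bf16 is an assumption,
# since the training/inference dtype is not documented).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_123_1760637693"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```

The prompt format used during fine-tuning is not documented here; absent other information, inputs would typically be formatted with the base model's chat template via tokenizer.apply_chat_template.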

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned and evaluated on the wic dataset; the exact splits and preprocessing are not documented.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
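
Expressed as Transformers TrainingArguments, these settings correspond roughly to the following sketch. Only the values listed above come from this card; everything else (e.g. output_dir) is a placeholder, since the actual training script is not included.

```python
# Rough reconstruction of the reported hyperparameters as TrainingArguments.
# output_dir is a placeholder; all other values are taken from the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wic_123_1760637693",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```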

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.2387        | 1.0   | 1222  | 0.3101          | 421528            |
| 0.1879        | 2.0   | 2444  | 0.2852          | 843368            |
| 0.2551        | 3.0   | 3666  | 0.2655          | 1264408           |
| 0.2067        | 4.0   | 4888  | 0.2605          | 1685768           |
| 0.3442        | 5.0   | 6110  | 0.2457          | 2106968           |
| 0.2354        | 6.0   | 7332  | 0.2408          | 2528648           |
| 0.2938        | 7.0   | 8554  | 0.2383          | 2949592           |
| 0.2550        | 8.0   | 9776  | 0.2341          | 3371056           |
| 0.1832        | 9.0   | 10998 | 0.2344          | 3792672           |
| 0.1302        | 10.0  | 12220 | 0.2376          | 4213808           |
| 0.1552        | 11.0  | 13442 | 0.2286          | 4634936           |
| 0.2650        | 12.0  | 14664 | 0.2315          | 5056144           |
| 0.0872        | 13.0  | 15886 | 0.2300          | 5477344           |
| 0.2469        | 14.0  | 17108 | 0.2316          | 5898504           |
| 0.2520        | 15.0  | 18330 | 0.2284          | 6320560           |
| 0.2086        | 16.0  | 19552 | 0.2287          | 6741824           |
| 0.2649        | 17.0  | 20774 | 0.2290          | 7163512           |
| 0.1695        | 18.0  | 21996 | 0.2304          | 7585736           |
| 0.2066        | 19.0  | 23218 | 0.2294          | 8007456           |
| 0.2122        | 20.0  | 24440 | 0.2289          | 8429424           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
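
To help reproduce this environment, the snippet below is a hypothetical runtime check against the versions listed above; it is not part of the original training setup.

```python
# Hypothetical environment check against the versions listed above.
# Install matching versions first, e.g.:
#   pip install peft==0.17.1 transformers==4.51.3 datasets==4.0.0 tokenizers==0.21.4
import datasets
import peft
import tokenizers
import torch
import transformers

for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("PyTorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```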