train_wic_123_1760637689

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the WiC (Word-in-Context) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3416
  • Num Input Tokens Seen: 8,429,424
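
This repository holds a PEFT adapter on top of the base model (see the framework versions below). A minimal loading sketch, assuming the adapter is published under rbelanec/train_wic_123_1760637689 and that you have access to the gated meta-llama base checkpoint; the dtype and device settings are assumptions, not from this card:

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Downloads the base Meta-Llama-3-8B-Instruct weights and applies this adapter.
# torch_dtype/device_map are illustrative defaults, not taken from the card.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_wic_123_1760637689",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
```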

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
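
For reference, the list above maps onto a transformers TrainingArguments configuration roughly as follows. This is an illustrative sketch: output_dir is a placeholder, and nothing beyond the listed hyperparameters is taken from this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wic_123_1760637689",  # placeholder, not from the card
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```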

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.3396        | 1.0   | 1222  | 0.3548          | 421528            |
| 0.3385        | 2.0   | 2444  | 0.3526          | 843368            |
| 0.3211        | 3.0   | 3666  | 0.3498          | 1264408           |
| 0.3135        | 4.0   | 4888  | 0.3572          | 1685768           |
| 0.3798        | 5.0   | 6110  | 0.3446          | 2106968           |
| 0.3411        | 6.0   | 7332  | 0.3428          | 2528648           |
| 0.3306        | 7.0   | 8554  | 0.3425          | 2949592           |
| 0.3200        | 8.0   | 9776  | 0.3468          | 3371056           |
| 0.3454        | 9.0   | 10998 | 0.3426          | 3792672           |
| 0.3271        | 10.0  | 12220 | 0.3426          | 4213808           |
| 0.3266        | 11.0  | 13442 | 0.3428          | 4634936           |
| 0.3310        | 12.0  | 14664 | 0.3434          | 5056144           |
| 0.3425        | 13.0  | 15886 | 0.3424          | 5477344           |
| 0.3324        | 14.0  | 17108 | 0.3427          | 5898504           |
| 0.3515        | 15.0  | 18330 | 0.3421          | 6320560           |
| 0.3282        | 16.0  | 19552 | 0.3426          | 6741824           |
| 0.3291        | 17.0  | 20774 | 0.3428          | 7163512           |
| 0.3477        | 18.0  | 21996 | 0.3422          | 7585736           |
| 0.3292        | 19.0  | 23218 | 0.3416          | 8007456           |
| 0.3436        | 20.0  | 24440 | 0.3431          | 8429424           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4