train_wic_1753094168

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2399
  • Num Input Tokens Seen: 4213808
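WiC (Word-in-Context, from SuperGLUE) is a binary classification task: given two sentences that both contain the same target word, decide whether the word is used in the same sense in both. As a sketch only, one instance might be rendered into an instruction prompt like the hypothetical template below; the actual template used during fine-tuning is not documented in this card:

```python
# Hypothetical WiC prompt template -- the real formatting used for
# training this adapter is not documented here.
def format_wic_prompt(word: str, sentence1: str, sentence2: str) -> str:
    return (
        f'Does the word "{word}" have the same meaning in both sentences?\n'
        f"Sentence 1: {sentence1}\n"
        f"Sentence 2: {sentence2}\n"
        "Answer (yes/no):"
    )

prompt = format_wic_prompt(
    "bank",
    "He sat on the bank of the river.",
    "She deposited the check at the bank.",
)
print(prompt)
```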

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
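The schedule above (cosine decay with a 10% linear warmup) can be sketched in plain Python. This mirrors the shape of the scheduler, not the exact transformers implementation:

```python
import math

def lr_at_step(step: int, total_steps: int, peak_lr: float = 5e-5,
               warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay back to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 12220  # total optimizer steps, per the training-results table
print(lr_at_step(0, total))      # start of warmup: 0.0
print(lr_at_step(1222, total))   # end of warmup: peak 5e-05
print(lr_at_step(12220, total))  # end of training: decayed to ~0
```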

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 0.4151        | 0.5   | 611   | 0.3295          | 210240            |
| 0.2389        | 1.0   | 1222  | 0.2997          | 421528            |
| 0.2716        | 1.5   | 1833  | 0.2854          | 632632            |
| 0.1819        | 2.0   | 2444  | 0.2773          | 843368            |
| 0.3153        | 2.5   | 3055  | 0.2658          | 1054024           |
| 0.252         | 3.0   | 3666  | 0.2634          | 1264408           |
| 0.2181        | 3.5   | 4277  | 0.2576          | 1475000           |
| 0.2114        | 4.0   | 4888  | 0.2570          | 1685768           |
| 0.1929        | 4.5   | 5499  | 0.2561          | 1895752           |
| 0.3225        | 5.0   | 6110  | 0.2448          | 2106968           |
| 0.3734        | 5.5   | 6721  | 0.2439          | 2318136           |
| 0.2293        | 6.0   | 7332  | 0.2432          | 2528648           |
| 0.2122        | 6.5   | 7943  | 0.2424          | 2739720           |
| 0.3024        | 7.0   | 8554  | 0.2423          | 2949592           |
| 0.3562        | 7.5   | 9165  | 0.2420          | 3160056           |
| 0.2596        | 8.0   | 9776  | 0.2408          | 3371056           |
| 0.2664        | 8.5   | 10387 | 0.2399          | 3581616           |
| 0.2097        | 9.0   | 10998 | 0.2412          | 3792672           |
| 0.283         | 9.5   | 11609 | 0.2402          | 4003136           |
| 0.1575        | 10.0  | 12220 | 0.2404          | 4213808           |
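The reported evaluation loss of 0.2399 matches the epoch-8.5 checkpoint (step 10387), the minimum validation loss in the table, which suggests the best checkpoint was kept rather than the final one. A quick sketch selecting that row:

```python
# (epoch, step, validation_loss) triples transcribed from the table above
rows = [
    (0.5, 611, 0.3295), (1.0, 1222, 0.2997), (1.5, 1833, 0.2854),
    (2.0, 2444, 0.2773), (2.5, 3055, 0.2658), (3.0, 3666, 0.2634),
    (3.5, 4277, 0.2576), (4.0, 4888, 0.2570), (4.5, 5499, 0.2561),
    (5.0, 6110, 0.2448), (5.5, 6721, 0.2439), (6.0, 7332, 0.2432),
    (6.5, 7943, 0.2424), (7.0, 8554, 0.2423), (7.5, 9165, 0.2420),
    (8.0, 9776, 0.2408), (8.5, 10387, 0.2399), (9.0, 10998, 0.2412),
    (9.5, 11609, 0.2402), (10.0, 12220, 0.2404),
]
# Pick the checkpoint with the lowest validation loss
best = min(rows, key=lambda r: r[2])
print(best)  # (8.5, 10387, 0.2399)
```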

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1