train_wic_456_1760637803

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the WiC (Word-in-Context) dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1970
  • Num Input Tokens Seen: 7492784
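A minimal loading-and-inference sketch, assuming (per the framework versions below) that this repo hosts a PEFT adapter rather than full model weights; the prompt format used during training is not documented, so the query shown is illustrative only:

```python
# Load the adapter on top of the base model; the "adapter" assumption comes
# from the PEFT entry under "Framework versions" below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "rbelanec/train_wic_456_1760637803")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative WiC-style query; the actual training prompt template is unknown.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": (
        'Does "bank" mean the same thing in both sentences? Answer yes or no.\n'
        "1. She sat on the bank of the river.\n"
        "2. He deposited the check at the bank."
    )}],
    add_generation_prompt=True,
    return_tensors="pt",
)
out = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```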

Model description

This repository appears to contain a PEFT adapter (see the framework versions below) for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on WiC. The adapter type, target modules, and other configuration details are not documented.

Intended uses & limitations

No usage guidance is documented. Given the training data, the adapter is presumably intended for WiC-style word-sense disambiguation (deciding whether a word carries the same sense in two contexts). Note the overfitting trend visible in the training results below.

Training and evaluation data

The model was trained and evaluated on WiC (Word-in-Context), a binary classification task from the SuperGLUE benchmark: given a target word and two sentences containing it, predict whether the word is used in the same sense in both. The exact splits and preprocessing for this run are not documented.
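A minimal sketch of loading the data, assuming WiC is pulled from the SuperGLUE benchmark on the Hugging Face Hub; the dataset id and any preprocessing are assumptions, not details recorded in this card:

```python
# Assumption: WiC obtained from the SuperGLUE benchmark on the Hub.
# Depending on your `datasets` version, a Parquet mirror such as
# "aps/super_glue" may be needed instead of the script-based "super_glue".
from datasets import load_dataset

wic = load_dataset("super_glue", "wic")
ex = wic["train"][0]
# Fields include 'word', 'sentence1', 'sentence2', and 'label'
# (1 = same sense in both sentences, 0 = different senses).
print(ex["word"], ex["label"])
```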

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
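The same settings can be expressed as transformers TrainingArguments; this is a hedged reconstruction, with the output directory hypothetical and the PEFT adapter config and data pipeline omitted because the card does not record them:

```python
# Maps the reported hyperparameters onto transformers.TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_456_1760637803",  # hypothetical name
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```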

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.3472        | 2.0   | 2172  | 0.3455          | 748208            |
| 0.3119        | 4.0   | 4344  | 0.3360          | 1497968           |
| 0.3486        | 6.0   | 6516  | 0.3234          | 2247488           |
| 0.2924        | 8.0   | 8688  | 0.3515          | 2997264           |
| 0.3148        | 10.0  | 10860 | 0.4060          | 3745952           |
| 0.1728        | 12.0  | 13032 | 0.5276          | 4495360           |
| 0.0991        | 14.0  | 15204 | 0.7679          | 5244688           |
| 0.1849        | 16.0  | 17376 | 1.0419          | 5993648           |
| 0.4071        | 18.0  | 19548 | 1.1777          | 6743280           |
| 0.0315        | 20.0  | 21720 | 1.1970          | 7492784           |

Validation loss reaches its minimum (0.3234) at epoch 6 and rises steadily thereafter, while training loss trends downward overall; this is a classic overfitting pattern, and the headline loss of 1.1970 reflects the final checkpoint rather than the best one.
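Given that trend, anyone re-running this configuration may want to keep the best checkpoint rather than the last one; the following flags are a suggested mitigation, not part of the documented setup:

```python
# Best-checkpoint selection on validation loss; a suggested mitigation,
# not something the original run is documented to have used.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_456_1760637803",  # hypothetical name
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```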

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4