train_mnli_1754652132

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mnli dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2827
  • Num Input Tokens Seen: 347859920
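
This repository packages a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct (see Framework versions below). A minimal loading sketch, assuming the adapter lives in this repo (rbelanec/train_mnli_1754652132) and that you have access to the gated base model; the prompt template used during training is not documented here:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mnli_1754652132"  # this repository

# Load the base model and tokenizer, then attach the fine-tuned adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
# NOTE: the MNLI prompt/label format used for fine-tuning is not specified
# in this card, so inference formatting is left to the user.
```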

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reconstructing them follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
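
The training script itself is not included, so the following is only a hypothetical reconstruction of the listed values as transformers TrainingArguments; output_dir and any unlisted options are assumptions:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_mnli_1754652132",   # assumed; not stated in the card
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```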

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.3408        | 0.5   | 44179  | 0.3464          | 17403808          |
| 0.3304        | 1.0   | 88358  | 0.3557          | 34786008          |
| 0.3219        | 1.5   | 132537 | 0.3367          | 52165240          |
| 0.3141        | 2.0   | 176716 | 0.3337          | 69564424          |
| 0.3159        | 2.5   | 220895 | 0.3378          | 86951080          |
| 0.2659        | 3.0   | 265074 | 0.3170          | 104352808         |
| 0.3612        | 3.5   | 309253 | 0.3060          | 121746504         |
| 0.2903        | 4.0   | 353432 | 0.3037          | 139123792         |
| 0.247         | 4.5   | 397611 | 0.2995          | 156526672         |
| 0.2781        | 5.0   | 441790 | 0.2941          | 173916408         |
| 0.337         | 5.5   | 485969 | 0.2924          | 191309592         |
| 0.2202        | 6.0   | 530148 | 0.2921          | 208701328         |
| 0.3108        | 6.5   | 574327 | 0.2903          | 226098768         |
| 0.2501        | 7.0   | 618506 | 0.2863          | 243493272         |
| 0.2648        | 7.5   | 662685 | 0.2848          | 260881240         |
| 0.2872        | 8.0   | 706864 | 0.2836          | 278276232         |
| 0.2503        | 8.5   | 751043 | 0.2834          | 295687496         |
| 0.2881        | 9.0   | 795222 | 0.2829          | 313062872         |
| 0.3381        | 9.5   | 839401 | 0.2827          | 330444056         |
| 0.2779        | 10.0  | 883580 | 0.2828          | 347859920         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
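
A quick sanity check that a local environment matches these pinned versions (all of these packages expose a __version__ attribute):

```python
import datasets
import peft
import tokenizers
import torch
import transformers

# Expected values, per the list above.
print(peft.__version__)          # 0.15.2
print(transformers.__version__)  # 4.51.3
print(torch.__version__)         # 2.8.0+cu128
print(datasets.__version__)      # 3.6.0
print(tokenizers.__version__)    # 0.21.1
```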