train_rte_101112_1760638011

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the rte (Recognizing Textual Entailment) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4810
  • Num Input Tokens Seen: 6208288

Model description

This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, trained on the rte dataset (see Framework versions below for the PEFT release used).

Intended uses & limitations

The adapter is intended for textual entailment on RTE-style premise/hypothesis pairs; other uses and limitations have not been documented. A minimal loading sketch follows.
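The sketch below shows one way to load the adapter with PEFT, assuming it is published under rbelanec/train_rte_101112_1760638011 (the repo id from this card). The prompt format is illustrative, not necessarily the template used during training.

```python
# Minimal sketch: loading the PEFT adapter onto its base model.
# Requires transformers, peft, and accelerate (for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_rte_101112_1760638011"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# RTE-style premise/hypothesis query (example text and format are made up).
prompt = (
    "Premise: A man is playing a guitar on stage.\n"
    "Hypothesis: A man is performing music.\n"
    "Does the premise entail the hypothesis? Answer yes or no.\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```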

Training and evaluation data

The model was trained and evaluated on the rte dataset; a loading sketch follows.
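Assuming "rte" refers to the GLUE rte configuration (an assumption; the card does not name the hub id), the data can be inspected with the datasets library:

```python
# Sketch: inspecting the RTE data, assuming the GLUE rte configuration.
from datasets import load_dataset

rte = load_dataset("glue", "rte")
print(rte)                                   # train/validation/test splits
print(rte["train"][0])                       # a premise/hypothesis pair with a label
print(rte["train"].features["label"].names)  # ['entailment', 'not_entailment']
```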

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
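
As a non-authoritative illustration, the list above maps onto transformers' TrainingArguments roughly as follows; output_dir is a placeholder, and anything not listed is left at its default.

```python
# Sketch: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_rte_101112_1760638011",  # placeholder, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```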

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|--------------:|------:|-----:|----------------:|------------------:|
| 0.1369        | 2.0   | 996  | 0.1828          | 620096            |
| 0.165         | 4.0   | 1992 | 0.1808          | 1243872           |
| 0.141         | 6.0   | 2988 | 0.1576          | 1862496           |
| 0.133         | 8.0   | 3984 | 0.1837          | 2484128           |
| 0.1104        | 10.0  | 4980 | 0.2542          | 3105632           |
| 0.0877        | 12.0  | 5976 | 0.3164          | 3727872           |
| 0.0181        | 14.0  | 6972 | 0.3950          | 4349504           |
| 0.0004        | 16.0  | 7968 | 0.4406          | 4968192           |
| 0.0002        | 18.0  | 8964 | 0.4750          | 5589504           |
| 0.0002        | 20.0  | 9960 | 0.4810          | 6208288           |

Validation loss bottoms out at epoch 6 (0.1576) and climbs steadily thereafter, so the final-epoch loss of 0.4810 reported above is well above the best checkpoint.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4