train_rte_123_1760637673

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the RTE dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2627
  • Num Input Tokens Seen: 6958720
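
How to use

No inference example is included in this card, so the snippet below is a minimal sketch: it loads the adapter from this repository (rbelanec/train_rte_123_1760637673) on top of the gated meta-llama/Meta-Llama-3-8B-Instruct base model. The RTE prompt format shown is an illustrative assumption, not the template documented for training.

```python
# Minimal usage sketch. Assumptions (not documented in this card):
# the prompt format below, and that you have access to the gated
# meta-llama/Meta-Llama-3-8B-Instruct base model.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "rbelanec/train_rte_123_1760637673"

# AutoPeftModelForCausalLM reads the base model name from the adapter
# config, loads the base weights, then attaches the adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# RTE is a binary entailment task; this prompt template is an assumption.
prompt = (
    "Premise: The cat sat on the mat.\n"
    "Hypothesis: A cat is on the mat.\n"
    "Does the premise entail the hypothesis? Answer entailment or not_entailment: "
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```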

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
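
For illustration, here is how these settings would map onto transformers.TrainingArguments. The actual training script is not part of this card, so output_dir is a placeholder and any option not listed above keeps its default.

```python
# Rough mapping of the listed hyperparameters onto TrainingArguments;
# output_dir is a placeholder, unlisted options keep their defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_rte_123_1760637673",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```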

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.3537        | 1.0   | 561   | 0.2899          | 348144            |
| 0.0515        | 2.0   | 1122  | 0.2784          | 697760            |
| 0.1832        | 3.0   | 1683  | 0.2735          | 1046680           |
| 0.2625        | 4.0   | 2244  | 0.2674          | 1394776           |
| 0.5942        | 5.0   | 2805  | 0.2645          | 1743216           |
| 0.2322        | 6.0   | 3366  | 0.2649          | 2088384           |
| 0.2246        | 7.0   | 3927  | 0.2632          | 2437304           |
| 0.2593        | 8.0   | 4488  | 0.2632          | 2785744           |
| 0.2941        | 9.0   | 5049  | 0.2647          | 3132040           |
| 0.2292        | 10.0  | 5610  | 0.2656          | 3481336           |
| 0.12          | 11.0  | 6171  | 0.2627          | 3829824           |
| 0.4426        | 12.0  | 6732  | 0.2646          | 4180088           |
| 0.4547        | 13.0  | 7293  | 0.2633          | 4527216           |
| 0.2654        | 14.0  | 7854  | 0.2646          | 4875496           |
| 0.3182        | 15.0  | 8415  | 0.2658          | 5222072           |
| 0.2825        | 16.0  | 8976  | 0.2647          | 5571288           |
| 0.222         | 17.0  | 9537  | 0.2627          | 5918280           |
| 0.2607        | 18.0  | 10098 | 0.2649          | 6268760           |
| 0.293         | 19.0  | 10659 | 0.2658          | 6614344           |
| 0.1571        | 20.0  | 11220 | 0.2647          | 6958720           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
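
To check whether your environment matches the versions listed above before loading the adapter, a quick sanity check (assuming a standard pip install of these libraries):

```python
# Print installed versions to compare against the list above.
import datasets, peft, tokenizers, torch, transformers

for name, mod in [("PEFT", peft), ("Transformers", transformers),
                  ("PyTorch", torch), ("Datasets", datasets),
                  ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```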