train_siqa_123_1760637714

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset (see the loading sketch below the results). It achieves the following results on the evaluation set:

  • Loss: 1.1770
  • Num Input Tokens Seen: 53571360
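
Since the model is published as a PEFT adapter rather than full weights, it is loaded on top of the base model. Below is a minimal sketch, assuming the adapter repo id matches this card (rbelanec/train_siqa_123_1760637714) and that you have access to the gated base model:

```python
# Minimal loading sketch for this PEFT adapter (assumes Hub access to the
# gated base model and that the adapter repo id matches this card).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)

# Apply the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, "rbelanec/train_siqa_123_1760637714")
model.eval()
```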

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
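
The original training script is not published, so the following is only a hedged sketch of how the settings above map onto Transformers `TrainingArguments`; the field names and the `output_dir` value are assumptions mirroring the list:

```python
# Hedged sketch: the reported hyperparameters expressed as TrainingArguments.
# The actual training script is not published; this only mirrors the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_siqa_123_1760637714",  # assumed; not stated on the card
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",        # betas=(0.9, 0.999) and epsilon=1e-08 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```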

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|--------------:|------:|-------:|----------------:|------------------:|
| 0.3864        |   2.0 |  13364 |          0.4281 |           5355648 |
| 0.1761        |   4.0 |  26728 |          0.2439 |          10712224 |
| 0.0805        |   6.0 |  40092 |          0.2658 |          16069856 |
| 0.0835        |   8.0 |  53456 |          0.3968 |          21425728 |
| 0.1285        |  10.0 |  66820 |          0.5146 |          26781984 |
| 0.0009        |  12.0 |  80184 |          0.6194 |          32139680 |
| 0.0           |  14.0 |  93548 |          0.8041 |          37497632 |
| 0.1594        |  16.0 | 106912 |          0.9499 |          42854976 |
| 0.0           |  18.0 | 120276 |          1.0988 |          48213248 |
| 0.0           |  20.0 | 133640 |          1.1770 |          53571360 |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4