train_siqa_42_1760637601

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1848
  • Num Input Tokens Seen: 60302568
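
Because this checkpoint is a PEFT adapter rather than a full set of model weights, it must be attached to the base model at load time. The snippet below is a minimal sketch using the standard transformers and peft APIs (the repository id rbelanec/train_siqa_42_1760637601 is taken from this card); it is not taken from the author's training or inference code.

```python
# Minimal loading sketch (assumption: standard PEFT adapter layout,
# not the author's own code). Requires transformers, peft, and accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_siqa_42_1760637601"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter
model.eval()
```

If the adapter is LoRA-based, `model.merge_and_unload()` can optionally fold the adapter weights into the base model for adapter-free inference.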

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hypothetical mapping onto the Trainer API follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
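
For readers who want to reproduce this configuration, here is a hypothetical transformers.TrainingArguments snippet that mirrors the values above. The actual training script is not included in this card, so every field below is an assumption about how these hyperparameters map onto the Trainer API.

```python
# Hypothetical mapping of the listed hyperparameters onto TrainingArguments;
# the card does not include the real training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_siqa_42_1760637601",  # assumed output directory name
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```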

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.2173        | 1.0   | 7518   | 0.1936          | 3016248           |
| 0.1537        | 2.0   | 15036  | 0.1848          | 6032368           |
| 0.0881        | 3.0   | 22554  | 0.1877          | 9049000           |
| 0.2016        | 4.0   | 30072  | 0.3158          | 12063104          |
| 0.0187        | 5.0   | 37590  | 0.3972          | 15078392          |
| 0.0003        | 6.0   | 45108  | 0.4899          | 18094200          |
| 0.0734        | 7.0   | 52626  | 0.5379          | 21109936          |
| 0.0016        | 8.0   | 60144  | 0.4953          | 24124456          |
| 0.0007        | 9.0   | 67662  | 0.5203          | 27139488          |
| 0.0           | 10.0  | 75180  | 0.7133          | 30155824          |
| 0.1875        | 11.0  | 82698  | 0.7045          | 33169800          |
| 0.0           | 12.0  | 90216  | 0.7003          | 36184296          |
| 0.0           | 13.0  | 97734  | 0.7969          | 39199224          |
| 0.0           | 14.0  | 105252 | 0.8078          | 42213984          |
| 0.0           | 15.0  | 112770 | 0.7945          | 45227616          |
| 0.0           | 16.0  | 120288 | 0.9663          | 48242336          |
| 0.0           | 17.0  | 127806 | 0.9757          | 51258152          |
| 0.0           | 18.0  | 135324 | 1.0241          | 54272896          |
| 0.0           | 19.0  | 142842 | 1.0365          | 57288368          |
| 0.0           | 20.0  | 150360 | 1.0399          | 60302568          |

Note that the evaluation loss reported at the top of this card (0.1848) corresponds to the epoch-2 checkpoint: validation loss bottoms out at epoch 2 and climbs steadily afterward while the logged training loss approaches zero, indicating overfitting from epoch 3 onward.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4