train_siqa_42_1760637600

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.3242
  • Num Input Tokens Seen: 60302568
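Because the framework list below includes PEFT, the checkpoint is an adapter rather than a full set of weights. A minimal loading sketch, assuming the adapter is hosted at rbelanec/train_siqa_42_1760637600 and that you have access to the gated base model:

```python
# Minimal sketch: load the PEFT adapter on top of the base model.
# The repo id is taken from this card; access to the gated
# meta-llama/Meta-Llama-3-8B-Instruct weights is assumed.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_siqa_42_1760637600",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical SIQA-style prompt, for illustration only.
prompt = "Why might someone apologize after bumping into a stranger?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```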

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
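Assuming "siqa" refers to the Social IQa benchmark, the raw data could be inspected as sketched below; the Hub dataset id and any preprocessing are assumptions, since the card does not document them.

```python
# Sketch, assuming "siqa" means the Social IQa dataset. The exact Hub id and
# the preprocessing used for training are assumptions, not from this card.
from datasets import load_dataset

siqa = load_dataset("allenai/social_i_qa")
print(siqa["train"][0])  # fields: context, question, answerA/B/C, label
```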

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
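
As a rough guide, these settings map onto transformers.TrainingArguments as sketched below. Everything not listed above, including the output directory and the PEFT/LoRA configuration, is an assumption.

```python
# Sketch: the hyperparameters above expressed as transformers.TrainingArguments.
# Only the values listed on this card come from the training run; output_dir
# and anything omitted here (LoRA config, data pipeline) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_siqa_42_1760637600",  # assumed; matches the model name
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```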

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.5513        | 1.0   | 7518   | 0.5516          | 3016248           |
| 0.3912        | 2.0   | 15036  | 0.4717          | 6032368           |
| 0.2175        | 3.0   | 22554  | 0.1947          | 9049000           |
| 0.249         | 4.0   | 30072  | 0.1910          | 12063104          |
| 0.1141        | 5.0   | 37590  | 0.1794          | 15078392          |
| 0.1228        | 6.0   | 45108  | 0.1795          | 18094200          |
| 0.2971        | 7.0   | 52626  | 0.1823          | 21109936          |
| 0.1088        | 8.0   | 60144  | 0.1789          | 24124456          |
| 0.1078        | 9.0   | 67662  | 0.1822          | 27139488          |
| 0.092         | 10.0  | 75180  | 0.1864          | 30155824          |
| 0.0864        | 11.0  | 82698  | 0.1988          | 33169800          |
| 0.1477        | 12.0  | 90216  | 0.2067          | 36184296          |
| 0.0384        | 13.0  | 97734  | 0.2242          | 39199224          |
| 0.0201        | 14.0  | 105252 | 0.2517          | 42213984          |
| 0.0229        | 15.0  | 112770 | 0.2795          | 45227616          |
| 0.0055        | 16.0  | 120288 | 0.3139          | 48242336          |
| 0.0268        | 17.0  | 127806 | 0.3421          | 51258152          |
| 0.0525        | 18.0  | 135324 | 0.3645          | 54272896          |
| 0.0094        | 19.0  | 142842 | 0.3786          | 57288368          |
| 0.0056        | 20.0  | 150360 | 0.3844          | 60302568          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4