train_siqa_123_1760637718

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6259
  • Num Input Tokens Seen: 60276872
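
Since this is a PEFT adapter rather than a full model, it has to be loaded on top of the base checkpoint. The sketch below is a minimal loading and inference example, not documented usage: the adapter repo id (rbelanec/train_siqa_123_1760637718) is taken from this card, the dtype/device settings are illustrative, and the SIQA-style prompt is invented, since the prompt template used during fine-tuning is not documented here.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_siqa_123_1760637718"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter weights to the frozen base model.
model = PeftModel.from_pretrained(model, adapter_id)

# Illustrative SIQA-style prompt (the actual training template is undocumented).
messages = [{
    "role": "user",
    "content": (
        "Taylor helped a friend who was struggling to keep up with their bills. "
        "What will the friend want to do next? "
        "(a) pay it forward (b) ignore Taylor (c) avoid their bills"
    ),
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```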

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent transformers.TrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
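
For reference, these values map onto transformers.TrainingArguments roughly as follows. This is a hypothetical reconstruction, not the original training script: output_dir and any settings not listed above are assumptions.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above;
# output_dir is illustrative, not taken from the original run.
args = TrainingArguments(
    output_dir="train_siqa_123_1760637718",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```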

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1771        | 1.0   | 7518   | 0.6354          | 3014896           |
| 0.5044        | 2.0   | 15036  | 0.6309          | 6029360           |
| 0.5528        | 3.0   | 22554  | 0.6298          | 9042368           |
| 1.1872        | 4.0   | 30072  | 0.6283          | 12055456          |
| 0.5423        | 5.0   | 37590  | 0.6301          | 15068512          |
| 0.425         | 6.0   | 45108  | 0.6280          | 18081672          |
| 0.8483        | 7.0   | 52626  | 0.6273          | 21095960          |
| 1.5292        | 8.0   | 60144  | 0.6305          | 24109392          |
| 0.813         | 9.0   | 67662  | 0.6276          | 27122856          |
| 1.7637        | 10.0  | 75180  | 0.6298          | 30137256          |
| 0.399         | 11.0  | 82698  | 0.6302          | 33151024          |
| 0.2931        | 12.0  | 90216  | 0.6291          | 36165000          |
| 0.4369        | 13.0  | 97734  | 0.6285          | 39178496          |
| 0.8953        | 14.0  | 105252 | 0.6303          | 42193184          |
| 0.7673        | 15.0  | 112770 | 0.6303          | 45206576          |
| 0.2509        | 16.0  | 120288 | 0.6296          | 48220600          |
| 0.2142        | 17.0  | 127806 | 0.6308          | 51235344          |
| 0.5036        | 18.0  | 135324 | 0.6259          | 54249648          |
| 0.2148        | 19.0  | 142842 | 0.6288          | 57262600          |
| 1.737         | 20.0  | 150360 | 0.6288          | 60276872          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4