train_siqa_42_1760637602

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the siqa (Social IQa) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6349
  • Num Input Tokens Seen: 60302568
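Since only the adapter weights are published, inference requires loading the gated base model first and attaching the adapter on top. A minimal loading sketch, assuming the adapter is hosted at rbelanec/train_siqa_42_1760637602 and that transformers, peft, and accelerate are installed:

```python
# Minimal loading sketch: attach the published adapter to the gated
# Llama-3 base model. Assumes access to meta-llama/Meta-Llama-3-8B-Instruct
# and that `accelerate` is installed for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "rbelanec/train_siqa_42_1760637602"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype="auto", device_map="auto"
)

# Wrap the frozen base model with the fine-tuned adapter weights.
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Illustrative prompt only; generation then goes through the usual
# Llama-3 chat template.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Why might Tracy bring an umbrella?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```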

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
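For reference, a hedged sketch of an equivalent transformers TrainingArguments setup; output_dir and the per-epoch evaluation cadence are assumptions, while the remaining values mirror the list above:

```python
# Equivalent configuration sketch; output_dir and eval_strategy are
# assumptions, everything else mirrors the hyperparameter list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_siqa_42_1760637602",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",  # assumed from the per-epoch validation losses
)
```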

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.684         | 1.0   | 7518   | 0.6438          | 3016248           |
| 1.192         | 2.0   | 15036  | 0.6380          | 6032368           |
| 0.7962        | 3.0   | 22554  | 0.6366          | 9049000           |
| 0.9903        | 4.0   | 30072  | 0.6367          | 12063104          |
| 0.2041        | 5.0   | 37590  | 0.6357          | 15078392          |
| 0.888         | 6.0   | 45108  | 0.6374          | 18094200          |
| 1.1243        | 7.0   | 52626  | 0.6383          | 21109936          |
| 0.6324        | 8.0   | 60144  | 0.6361          | 24124456          |
| 0.5278        | 9.0   | 67662  | 0.6368          | 27139488          |
| 0.6336        | 10.0  | 75180  | 0.6383          | 30155824          |
| 0.537         | 11.0  | 82698  | 0.6359          | 33169800          |
| 0.8799        | 12.0  | 90216  | 0.6371          | 36184296          |
| 0.5514        | 13.0  | 97734  | 0.6365          | 39199224          |
| 0.692         | 14.0  | 105252 | 0.6349          | 42213984          |
| 0.5045        | 15.0  | 112770 | 0.6369          | 45227616          |
| 0.5041        | 16.0  | 120288 | 0.6361          | 48242336          |
| 0.4227        | 17.0  | 127806 | 0.6382          | 51258152          |
| 0.7302        | 18.0  | 135324 | 0.6359          | 54272896          |
| 0.9792        | 19.0  | 142842 | 0.6359          | 57288368          |
| 0.0778        | 20.0  | 150360 | 0.6359          | 60302568          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
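To reproduce this environment, a requirements-style pin of the versions above might look like this; note that the 2.9.0+cu128 PyTorch build ships from the PyTorch CUDA wheel index, so a plain torch==2.9.0 pin from PyPI is only an approximation:

```text
# Pins mirror the framework versions listed above; torch==2.9.0+cu128
# comes from https://download.pytorch.org/whl/cu128 rather than PyPI.
peft==0.17.1
transformers==4.51.3
torch==2.9.0
datasets==4.0.0
tokenizers==0.21.4
```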