train_siqa_456_1760637832

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the siqa (Social IQa) dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 0.5714
  • Num Input Tokens Seen: 60272064
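Because PEFT appears among the framework versions below, the released weights are an adapter rather than a full model. The snippet below is a minimal, untested sketch of loading it with the peft library; the adapter repo id rbelanec/train_siqa_456_1760637832 is taken from this card, and the example prompt is an illustrative Social IQa-style question, not taken from the dataset.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_siqa_456_1760637832"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

# Attach the PEFT adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Illustrative Social IQa-style prompt in the Llama 3 chat format.
messages = [{"role": "user", "content": "Jordan helped Casey move into a new apartment. Why did Jordan do this?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(base.device)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```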

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
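As a rough guide, the hyperparameters above map onto a transformers.TrainingArguments configuration along these lines. This is a reconstruction from the list, not the original training script; output_dir, eval_strategy, and logging_strategy are assumptions (the per-epoch validation losses in the next section suggest epoch-level evaluation).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_siqa_456_1760637832",  # assumed; not stated on the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",    # assumed from the per-epoch validation losses
    logging_strategy="epoch",
)
```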

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.4459        | 1.0   | 7518   | 0.5798          | 3015336           |
| 0.62          | 2.0   | 15036  | 0.5762          | 6029736           |
| 0.2437        | 3.0   | 22554  | 0.5730          | 9044064           |
| 0.6211        | 4.0   | 30072  | 0.5736          | 12056056          |
| 0.7361        | 5.0   | 37590  | 0.5734          | 15070152          |
| 0.672         | 6.0   | 45108  | 0.5748          | 18083976          |
| 0.5897        | 7.0   | 52626  | 0.5733          | 21097056          |
| 0.6191        | 8.0   | 60144  | 0.5744          | 24109664          |
| 0.5043        | 9.0   | 67662  | 0.5741          | 27122784          |
| 1.4349        | 10.0  | 75180  | 0.5727          | 30139392          |
| 1.1223        | 11.0  | 82698  | 0.5727          | 33151800          |
| 0.4852        | 12.0  | 90216  | 0.5714          | 36165976          |
| 0.0319        | 13.0  | 97734  | 0.5722          | 39180248          |
| 0.3707        | 14.0  | 105252 | 0.5737          | 42193928          |
| 0.7959        | 15.0  | 112770 | 0.5732          | 45207272          |
| 0.5074        | 16.0  | 120288 | 0.5743          | 48219232          |
| 0.7606        | 17.0  | 127806 | 0.5750          | 51231624          |
| 0.9033        | 18.0  | 135324 | 0.5750          | 54245832          |
| 0.403         | 19.0  | 142842 | 0.5750          | 57258952          |
| 0.6253        | 20.0  | 150360 | 0.5750          | 60272064          |

The evaluation loss reported at the top of this card (0.5714) corresponds to the best validation loss in this table, reached at epoch 12.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4