train_siqa_123_1760637719

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1911
  • Num Input Tokens Seen: 60276872

Model description

More information needed

Intended uses & limitations

More information needed
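
Absent further documentation, the adapter can still be loaded on top of the base model with PEFT for quick experimentation. A minimal sketch, using the adapter repo id shown in this card; the prompt and generation settings are illustrative assumptions, not the training template:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_siqa_123_1760637719"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach the fine-tuned adapter

# Illustrative SIQA-style prompt; the exact prompt format used in training is undocumented.
messages = [{"role": "user", "content": (
    "Context: Jordan helped a stranger carry groceries up the stairs. "
    "Question: How would others feel about Jordan? "
    "Options: (a) annoyed (b) grateful (c) indifferent. Answer:")}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=16)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```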

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
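
For reference, these settings map onto a Hugging Face `TrainingArguments` configuration roughly as follows. A sketch only: `output_dir` is a placeholder, and the actual training driver and any other arguments are not documented in this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_siqa_123_1760637719",  # placeholder path, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```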

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|--------------:|------:|-------:|----------------:|------------------:|
| 0.1373        | 1.0   | 7518   | 0.2637          | 3014896           |
| 0.2412        | 2.0   | 15036  | 0.2274          | 6029360           |
| 0.1523        | 3.0   | 22554  | 0.2107          | 9042368           |
| 0.2919        | 4.0   | 30072  | 0.2019          | 12055456          |
| 0.2635        | 5.0   | 37590  | 0.2002          | 15068512          |
| 0.0598        | 6.0   | 45108  | 0.1932          | 18081672          |
| 0.2969        | 7.0   | 52626  | 0.1939          | 21095960          |
| 0.4337        | 8.0   | 60144  | 0.1911          | 24109392          |
| 0.1858        | 9.0   | 67662  | 0.1913          | 27122856          |
| 0.1867        | 10.0  | 75180  | 0.1913          | 30137256          |
| 0.1281        | 11.0  | 82698  | 0.1937          | 33151024          |
| 0.1212        | 12.0  | 90216  | 0.1929          | 36165000          |
| 0.1881        | 13.0  | 97734  | 0.1946          | 39178496          |
| 0.1631        | 14.0  | 105252 | 0.1961          | 42193184          |
| 0.0963        | 15.0  | 112770 | 0.1957          | 45206576          |
| 0.1305        | 16.0  | 120288 | 0.1962          | 48220600          |
| 0.0682        | 17.0  | 127806 | 0.1968          | 51235344          |
| 0.0764        | 18.0  | 135324 | 0.1967          | 54249648          |
| 0.0502        | 19.0  | 142842 | 0.1966          | 57262600          |
| 0.4309        | 20.0  | 150360 | 0.1965          | 60276872          |
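
The reported evaluation loss matches the epoch-8 checkpoint, after which validation loss plateaus. A throwaway check over the (epoch, validation loss) pairs copied from the table:

```python
# (epoch, validation loss) pairs from the training results table above
history = [(1, 0.2637), (2, 0.2274), (3, 0.2107), (4, 0.2019), (5, 0.2002),
           (6, 0.1932), (7, 0.1939), (8, 0.1911), (9, 0.1913), (10, 0.1913),
           (11, 0.1937), (12, 0.1929), (13, 0.1946), (14, 0.1961), (15, 0.1957),
           (16, 0.1962), (17, 0.1968), (18, 0.1967), (19, 0.1966), (20, 0.1965)]

best_epoch, best_loss = min(history, key=lambda pair: pair[1])
print(best_epoch, best_loss)  # 8 0.1911 -- the evaluation loss reported at the top
```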

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
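
To reproduce this environment, the versions above can be pinned directly. A sketch; the CUDA 12.8 PyTorch build is typically installed from the PyTorch wheel index rather than PyPI:

```
pip install peft==0.17.1 transformers==4.51.3 datasets==4.0.0 tokenizers==0.21.4
pip install torch==2.9.0 --index-url https://download.pytorch.org/whl/cu128
```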