train_siqa_456_1760637828

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9780
  • Num Input Tokens Seen: 53564288
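
This model was trained as a PEFT adapter (see the framework versions below), so it loads on top of the base model rather than as a standalone checkpoint. A minimal, hedged loading sketch using the standard transformers/peft pattern; the repo id and base model are taken from this card, and nothing else here comes from the author's script:

```python
# Minimal sketch: load the base model, then attach this PEFT adapter.
# Assumes the standard transformers + peft loading APIs; the generation
# settings used by the author are not documented in this card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "rbelanec/train_siqa_456_1760637828")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
```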

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
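
The training script itself is not included in this card, so the following is only a hedged reconstruction: the listed hyperparameters expressed as transformers.TrainingArguments, assuming a standard Trainer-based setup. The output_dir is taken from the model name; everything else mirrors the list above.

```python
# Hedged reconstruction of the listed hyperparameters as TrainingArguments.
# Assumption: a standard transformers Trainer was used; only the values
# below are documented in the card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_siqa_456_1760637828",  # assumed from the model name
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    include_num_input_tokens_seen=True,  # matches the "Input Tokens Seen" metric
)
```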

Training results

Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen
0.5466        | 2.0   | 13364  | 0.5509          | 5356576
0.2870        | 4.0   | 26728  | 0.3669          | 10713440
0.4431        | 6.0   | 40092  | 0.2721          | 16068320
0.2202        | 8.0   | 53456  | 0.3200          | 21422464
0.1417        | 10.0  | 66820  | 0.3849          | 26778080
0.1952        | 12.0  | 80184  | 0.4473          | 32135328
0.0002        | 14.0  | 93548  | 0.6453          | 37494592
0.1969        | 16.0  | 106912 | 0.8511          | 42850528
0.0000        | 18.0  | 120276 | 0.9620          | 48207328
0.0000        | 20.0  | 133640 | 0.9780          | 53564288
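
Note that the validation loss bottoms out at 0.2721 around epoch 6 and rises steadily afterwards while the training loss approaches zero, a typical overfitting pattern; the reported final loss of 0.9780 comes from the last epoch, not the best one. If retraining, a Trainer-based setup could retain the best checkpoint instead; a minimal sketch under that assumption:

```python
# Hedged sketch (not part of the card): keep the lowest-validation-loss
# checkpoint rather than the final one, and optionally stop early.
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="train_siqa_456_1760637828",
    eval_strategy="epoch",              # evaluate and save on a regular schedule
    save_strategy="epoch",
    load_best_model_at_end=True,        # restore the best checkpoint at the end
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Then pass e.g. callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
# to the Trainer constructor.
```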

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4