train_svamp_1757340226

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1644
  • Num Input Tokens Seen: 704688

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
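A hypothetical sketch of the learning-rate schedule these settings imply (cosine decay with linear warmup, peak learning rate 5e-05, warmup ratio 0.1 over 1580 total steps per the training results). The exact Transformers implementation may differ in detail; this is for illustration only.

```python
import math

TOTAL_STEPS = 1580                     # 10 epochs x 158 steps/epoch
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # lr_scheduler_warmup_ratio: 0.1 -> 158 steps
PEAK_LR = 5e-05                        # learning_rate

def lr_at(step: int) -> float:
    """Approximate learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under this sketch the rate ramps up linearly for the first ~0.5 epochs, peaks at 5e-05, and decays to zero by the end of epoch 10.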

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 2.5156        | 0.5   | 79   | 2.3507          | 35424             |
| 1.8739        | 1.0   | 158  | 1.8532          | 70560             |
| 1.5382        | 1.5   | 237  | 1.4540          | 105728            |
| 1.1095        | 2.0   | 316  | 1.0771          | 140912            |
| 0.7179        | 2.5   | 395  | 0.7673          | 176272            |
| 0.5649        | 3.0   | 474  | 0.5458          | 211360            |
| 0.3842        | 3.5   | 553  | 0.4058          | 246912            |
| 0.2542        | 4.0   | 632  | 0.3136          | 281968            |
| 0.3399        | 4.5   | 711  | 0.2597          | 317392            |
| 0.3264        | 5.0   | 790  | 0.2248          | 352128            |
| 0.1362        | 5.5   | 869  | 0.2024          | 387744            |
| 0.192         | 6.0   | 948  | 0.1901          | 422800            |
| 0.2016        | 6.5   | 1027 | 0.1822          | 457968            |
| 0.1899        | 7.0   | 1106 | 0.1751          | 493104            |
| 0.1518        | 7.5   | 1185 | 0.1701          | 528304            |
| 0.1354        | 8.0   | 1264 | 0.1666          | 563936            |
| 0.1451        | 8.5   | 1343 | 0.1647          | 598976            |
| 0.1232        | 9.0   | 1422 | 0.1645          | 634144            |
| 0.1587        | 9.5   | 1501 | 0.1644          | 669632            |
| 0.0843        | 10.0  | 1580 | 0.1645          | 704688            |
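As a rough sanity check on the table above, the final token count and step count imply the average throughput per optimizer step; dividing by the train batch size of 4 gives an approximate per-example sequence length. These are derived figures, not logged values.

```python
# Arithmetic derived from the training results above.
TOTAL_TOKENS = 704_688   # final "Input Tokens Seen"
TOTAL_STEPS = 1_580      # final step
BATCH_SIZE = 4           # train_batch_size

tokens_per_step = TOTAL_TOKENS / TOTAL_STEPS        # ~446 input tokens per step
tokens_per_sequence = tokens_per_step / BATCH_SIZE  # ~111 tokens per example
```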

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1