train_svamp_1757340223

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6403
  • Num Input Tokens Seen: 704688

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
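The cosine schedule with 10% warmup listed above can be sketched as follows. This is a minimal illustration, not the trainer's internal implementation; the total step count (1580) is taken from the results table below, and the helper name is invented for this example.

```python
import math

def cosine_lr(step, max_lr=5e-05, total_steps=1580, warmup_ratio=0.1):
    """Learning rate at a given optimizer step under linear warmup + cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)  # 158 steps for this run
    if step < warmup_steps:
        return max_lr * step / warmup_steps  # linear warmup from 0 to max_lr
    # cosine decay from max_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

At step 158 the rate peaks at 5e-05, then decays smoothly toward 0 by step 1580.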

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
| 0.2909        | 0.5   | 79   | 0.1691          | 35424             |
| 0.261         | 1.0   | 158  | 0.1506          | 70560             |
| 0.2303        | 1.5   | 237  | 0.2170          | 105728            |
| 0.0724        | 2.0   | 316  | 0.1112          | 140912            |
| 0.0929        | 2.5   | 395  | 0.0819          | 176272            |
| 0.057         | 3.0   | 474  | 0.0969          | 211360            |
| 0.0273        | 3.5   | 553  | 0.0756          | 246912            |
| 0.0441        | 4.0   | 632  | 0.0836          | 281968            |
| 0.1135        | 4.5   | 711  | 0.0963          | 317392            |
| 0.063         | 5.0   | 790  | 0.0893          | 352128            |
| 0.0147        | 5.5   | 869  | 0.0760          | 387744            |
| 0.0256        | 6.0   | 948  | 0.0894          | 422800            |
| 0.0658        | 6.5   | 1027 | 0.0748          | 457968            |
| 0.0279        | 7.0   | 1106 | 0.0799          | 493104            |
| 0.0174        | 7.5   | 1185 | 0.0857          | 528304            |
| 0.0218        | 8.0   | 1264 | 0.0818          | 563936            |
| 0.0026        | 8.5   | 1343 | 0.0847          | 598976            |
| 0.0101        | 9.0   | 1422 | 0.0969          | 634144            |
| 0.0323        | 9.5   | 1501 | 0.0983          | 669632            |
| 0.0013        | 10.0  | 1580 | 0.0994          | 704688            |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1