train_svamp_1757340272

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6840
  • Num Input Tokens Seen: 704272
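This repository contains a PEFT adapter rather than full model weights, so inference requires loading the base model first and then applying the adapter. A minimal loading sketch, assuming the adapter repo id `rbelanec/train_svamp_1757340272` and default `from_pretrained` options (device placement and dtype arguments omitted):

```python
def load_adapter(
    base_id="meta-llama/Meta-Llama-3-8B-Instruct",
    adapter_id="rbelanec/train_svamp_1757340272",  # assumed repo id
):
    """Load the base model, attach the PEFT adapter, and return (model, tokenizer)."""
    # Third-party imports are kept inside the function so the sketch can be
    # read and inspected even without transformers/peft installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    return model, tokenizer
```

Downloading Meta-Llama-3-8B-Instruct requires accepting its license on the Hub and authenticating (e.g. via `huggingface-cli login`) before `from_pretrained` will succeed.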

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
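With warmup_ratio 0.1, the cosine schedule ramps the learning rate linearly from 0 to 5e-05 over the first 10% of optimizer steps, then decays it to 0 along a cosine curve. A minimal sketch of that schedule (pure Python; the 1580 total steps are taken from the results table, and the function name is illustrative, not a Transformers API):

```python
import math

def lr_at(step, total_steps=1580, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at a given optimizer step under linear warmup + cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)  # 158 steps here
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate peaks at exactly 5e-05 when warmup ends (step 158) and reaches 0 at the final step.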

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.129         | 0.5   | 79   | 0.2475          | 35296             |
| 0.0508        | 1.0   | 158  | 0.2165          | 70400             |
| 0.0964        | 1.5   | 237  | 0.2150          | 106208            |
| 0.2175        | 2.0   | 316  | 0.1600          | 140736            |
| 0.083         | 2.5   | 395  | 0.1529          | 176064            |
| 0.0421        | 3.0   | 474  | 0.1637          | 211024            |
| 0.0575        | 3.5   | 553  | 0.1372          | 246128            |
| 0.0863        | 4.0   | 632  | 0.1360          | 281616            |
| 0.1177        | 4.5   | 711  | 0.1462          | 316976            |
| 0.0249        | 5.0   | 790  | 0.1455          | 352256            |
| 0.0291        | 5.5   | 869  | 0.1452          | 387360            |
| 0.0293        | 6.0   | 948  | 0.1715          | 422464            |
| 0.0127        | 6.5   | 1027 | 0.1800          | 457760            |
| 0.0053        | 7.0   | 1106 | 0.1682          | 492912            |
| 0.0105        | 7.5   | 1185 | 0.2050          | 528336            |
| 0.0025        | 8.0   | 1264 | 0.2022          | 563600            |
| 0.0035        | 8.5   | 1343 | 0.2209          | 598992            |
| 0.0519        | 9.0   | 1422 | 0.2223          | 633984            |
| 0.0023        | 9.5   | 1501 | 0.2223          | 669152            |
| 0.0042        | 10.0  | 1580 | 0.2244          | 704272            |
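Validation loss bottoms out at epoch 4.0 (0.1360) and drifts upward afterward, a typical overfitting pattern when training for a fixed 10 epochs without early stopping. A short sketch that picks the best checkpoint from the table above (the data is copied verbatim from the results; the variable names are illustrative):

```python
# (epoch, validation_loss) pairs from the training results table.
eval_history = [
    (0.5, 0.2475), (1.0, 0.2165), (1.5, 0.2150), (2.0, 0.1600),
    (2.5, 0.1529), (3.0, 0.1637), (3.5, 0.1372), (4.0, 0.1360),
    (4.5, 0.1462), (5.0, 0.1455), (5.5, 0.1452), (6.0, 0.1715),
    (6.5, 0.1800), (7.0, 0.1682), (7.5, 0.2050), (8.0, 0.2022),
    (8.5, 0.2209), (9.0, 0.2223), (9.5, 0.2223), (10.0, 0.2244),
]

# The checkpoint with the lowest validation loss is the natural pick
# if only one adapter snapshot is to be kept.
best_epoch, best_loss = min(eval_history, key=lambda pair: pair[1])
# → best_epoch = 4.0, best_loss = 0.1360
```

In the Trainer this selection corresponds to setting `load_best_model_at_end=True` with `metric_for_best_model="eval_loss"`.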

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1