train_svamp_1757340273

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the list):

  • Loss: 0.1319
  • Num Input Tokens Seen: 704272
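Since PEFT is listed under the framework versions below, the checkpoint is presumably a parameter-efficient adapter on top of the base model rather than a full set of weights. A minimal inference sketch, assuming the adapter is published under the repo id rbelanec/train_svamp_1757340273 (access to the gated Llama 3 base weights is required); the prompt is a made-up SVAMP-style word problem:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_svamp_1757340273"  # assumed repo id, from the card title

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter on top of the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# SVAMP-style word problem, formatted with the instruct chat template.
messages = [{"role": "user", "content": "Mary has 12 apples and gives away 5. How many apples are left?"}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```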

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
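For reproducibility, a hedged sketch of how these settings map onto Hugging Face TrainingArguments; the output directory and anything not listed above (logging and evaluation cadence, gradient accumulation, etc.) are assumptions left at their defaults:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as TrainingArguments.
args = TrainingArguments(
    output_dir="train_svamp_1757340273",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```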

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.1528        | 0.5   | 79   | 0.2311          | 35296             |
| 0.0753        | 1.0   | 158  | 0.1515          | 70400             |
| 0.0805        | 1.5   | 237  | 0.1408          | 106208            |
| 0.1368        | 2.0   | 316  | 0.1319          | 140736            |
| 0.038         | 2.5   | 395  | 0.1435          | 176064            |
| 0.0199        | 3.0   | 474  | 0.1467          | 211024            |
| 0.0059        | 3.5   | 553  | 0.2152          | 246128            |
| 0.0396        | 4.0   | 632  | 0.1816          | 281616            |
| 0.0337        | 4.5   | 711  | 0.2312          | 316976            |
| 0.0003        | 5.0   | 790  | 0.2054          | 352256            |
| 0.0005        | 5.5   | 869  | 0.2563          | 387360            |
| 0.0001        | 6.0   | 948  | 0.2300          | 422464            |
| 0.0           | 6.5   | 1027 | 0.2501          | 457760            |
| 0.0001        | 7.0   | 1106 | 0.2568          | 492912            |
| 0.0001        | 7.5   | 1185 | 0.2675          | 528336            |
| 0.0           | 8.0   | 1264 | 0.2667          | 563600            |
| 0.0001        | 8.5   | 1343 | 0.2692          | 598992            |
| 0.0           | 9.0   | 1422 | 0.2690          | 633984            |
| 0.0           | 9.5   | 1501 | 0.2714          | 669152            |
| 0.0001        | 10.0  | 1580 | 0.2698          | 704272            |
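Note that the reported evaluation loss of 0.1319 corresponds to the epoch-2.0 checkpoint (step 316). Beyond that point the training loss collapses toward zero while the validation loss drifts upward, a typical overfitting pattern, so the epoch-2 checkpoint is likely the one reported above.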

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1