train_svamp_42_1760637543

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0827
  • Num Input Tokens Seen: 1433520

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
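For illustration, the cosine schedule with 10% linear warmup listed above can be sketched as follows. This is a minimal re-implementation mirroring the behavior of `get_cosine_schedule_with_warmup` in transformers, not the trainer's actual code; the function name `lr_at` and the total step count of 3160 (taken from the last row of the results table) are assumptions for the sketch.

```python
import math

def lr_at(step, total_steps=3160, peak_lr=1e-3, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)  # 316 warmup steps
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate
        return peak_lr * step / warmup_steps
    # Cosine decay from the peak learning rate down to 0
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The peak learning rate of 0.001 is reached at step 316 (end of epoch 2) and decays to 0 by the final step.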

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.0814        | 1.0   | 158  | 0.4345          | 71568             |
| 0.056         | 2.0   | 316  | 0.0867          | 143232            |
| 0.0493        | 3.0   | 474  | 0.0727          | 214912            |
| 0.107         | 4.0   | 632  | 0.0583          | 286448            |
| 0.0361        | 5.0   | 790  | 0.0556          | 358176            |
| 0.0469        | 6.0   | 948  | 0.0544          | 429728            |
| 0.035         | 7.0   | 1106 | 0.0422          | 501504            |
| 0.0461        | 8.0   | 1264 | 0.0435          | 573120            |
| 0.0055        | 9.0   | 1422 | 0.0514          | 644944            |
| 0.0283        | 10.0  | 1580 | 0.0590          | 716448            |
| 0.0121        | 11.0  | 1738 | 0.0728          | 788256            |
| 0.0041        | 12.0  | 1896 | 0.0705          | 859808            |
| 0.0121        | 13.0  | 2054 | 0.0737          | 931472            |
| 0.0007        | 14.0  | 2212 | 0.0768          | 1003376           |
| 0.0012        | 15.0  | 2370 | 0.0877          | 1075088           |
| 0.0006        | 16.0  | 2528 | 0.0871          | 1146608           |
| 0.0004        | 17.0  | 2686 | 0.0893          | 1218368           |
| 0.0002        | 18.0  | 2844 | 0.0901          | 1290144           |
| 0.0011        | 19.0  | 3002 | 0.0906          | 1361984           |
| 0.0002        | 20.0  | 3160 | 0.0907          | 1433520           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
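Since this is a PEFT adapter rather than full model weights, it must be loaded on top of the base model. A minimal loading sketch, assuming the adapter is published under the repo id `rbelanec/train_svamp_42_1760637543` and that you have access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` base model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_svamp_42_1760637543"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
# Attach the fine-tuned adapter weights to the base model
model = PeftModel.from_pretrained(base, adapter_id)
```
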