train_svamp_456_1757596108

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1483
  • Num Input Tokens Seen: 1352544

Model description

More information needed

Intended uses & limitations

More information needed
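
Until this section is filled in, a minimal inference sketch may help. It assumes this repository hosts a PEFT (LoRA-style) adapter published as rbelanec/train_svamp_456_1757596108 on top of the base model named above; the prompt is an illustrative SVAMP-style word problem, not taken from the dataset.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_svamp_456_1757596108"  # assumed repo id, taken from this card's title

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Illustrative SVAMP-style arithmetic word problem.
prompt = "Dan had 7 marbles and bought 5 more. How many marbles does Dan have now?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```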

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration is sketched after the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
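
A hedged sketch of an equivalent transformers/peft setup is below. The LoRA settings (r, lora_alpha, task_type) are placeholders, since this card does not report the adapter configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical adapter settings; not reported in this card.
peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(model, peft_config)

# Mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="train_svamp_456_1757596108",
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```

A Trainer built from these arguments (plus the svamp train/eval splits and a data collator) would reproduce the schedule in the results table below, up to hardware and dataloader differences.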

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.5567 | 1.0 | 315 | 0.6299 | 67504 |
| 0.2481 | 2.0 | 630 | 0.2979 | 135312 |
| 0.0177 | 3.0 | 945 | 0.1330 | 202928 |
| 0.0472 | 4.0 | 1260 | 0.0844 | 270592 |
| 0.022 | 5.0 | 1575 | 0.0908 | 338048 |
| 0.0583 | 6.0 | 1890 | 0.0926 | 405584 |
| 0.0008 | 7.0 | 2205 | 0.1284 | 473216 |
| 0.0184 | 8.0 | 2520 | 0.1032 | 541040 |
| 0.0002 | 9.0 | 2835 | 0.1117 | 608592 |
| 0.0 | 10.0 | 3150 | 0.1170 | 676256 |
| 0.0 | 11.0 | 3465 | 0.1389 | 743840 |
| 0.0 | 12.0 | 3780 | 0.1512 | 811616 |
| 0.0 | 13.0 | 4095 | 0.1398 | 879184 |
| 0.0001 | 14.0 | 4410 | 0.1442 | 947024 |
| 0.0 | 15.0 | 4725 | 0.1473 | 1014704 |
| 0.0 | 16.0 | 5040 | 0.1476 | 1082128 |
| 0.0 | 17.0 | 5355 | 0.1478 | 1149696 |
| 0.0 | 18.0 | 5670 | 0.1479 | 1217440 |
| 0.0 | 19.0 | 5985 | 0.1486 | 1284960 |
| 0.0 | 20.0 | 6300 | 0.1483 | 1352544 |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1