train_svamp_789_1760637886

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6550
  • Num Input Tokens Seen: 1271360
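Since this is a PEFT adapter trained on top of meta-llama/Meta-Llama-3-8B-Instruct, loading it means loading the base model first and then applying the adapter. A minimal sketch (the adapter repo id is taken from this card; the exact generation settings used in training are not documented, so this only covers loading):

```python
def load_model(adapter_id: str = "rbelanec/train_svamp_789_1760637886",
               base_id: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
    """Load the base model and apply the fine-tuned PEFT adapter."""
    # Imports are deferred so defining this sketch pulls in no heavy
    # dependencies until the model is actually fetched.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    return model, tokenizer
```

Calling `load_model()` downloads both the base weights and the adapter, so it requires hub access (and accepted license terms for the Llama 3 base model).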

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
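Collected as keyword arguments, the hyperparameters above would look roughly as follows when passed to `transformers.TrainingArguments` (the argument names are assumed from that API; the actual training script for this run is not published):

```python
# Hyperparameters from the list above, expressed as TrainingArguments
# keyword arguments (a sketch of the configuration, not the exact script).
training_args = dict(
    learning_rate=1e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```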

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.5975        | 2.0   | 280  | 0.6352          | 126944            |
| 0.658         | 4.0   | 560  | 0.5462          | 253696            |
| 0.3964        | 6.0   | 840  | 0.4913          | 380768            |
| 0.3941        | 8.0   | 1120 | 0.5003          | 508032            |
| 0.2802        | 10.0  | 1400 | 0.5018          | 635616            |
| 0.1987        | 12.0  | 1680 | 0.5148          | 762944            |
| 0.1323        | 14.0  | 1960 | 0.5540          | 889952            |
| 0.1726        | 16.0  | 2240 | 0.6144          | 1016736           |
| 0.1144        | 18.0  | 2520 | 0.6350          | 1144352           |
| 0.1615        | 20.0  | 2800 | 0.6550          | 1271360           |
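The log shows validation loss bottoming out at epoch 6 and climbing steadily afterwards while training loss keeps falling, i.e. the final checkpoint (loss 0.6550) is not the best one by validation loss. A quick check over the table's numbers:

```python
# (epoch, validation_loss) pairs copied from the training results table.
history = [
    (2, 0.6352), (4, 0.5462), (6, 0.4913), (8, 0.5003),
    (10, 0.5018), (12, 0.5148), (14, 0.5540), (16, 0.6144),
    (18, 0.6350), (20, 0.6550),
]

# Pick the epoch with the lowest validation loss.
best_epoch, best_loss = min(history, key=lambda r: r[1])
print(f"Best checkpoint: epoch {best_epoch} (val loss {best_loss:.4f})")
# → Best checkpoint: epoch 6 (val loss 0.4913)
```

If early stopping or `load_best_model_at_end` was not used in this run, the epoch-6 checkpoint may be preferable to the reported final one.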

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4