train_svamp_101112_1760637999

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6033
  • Num Input Tokens Seen: 1272000
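
Since the framework versions below list PEFT, this checkpoint is presumably an adapter on top of the base model rather than full weights. A minimal loading and inference sketch, assuming the adapter is published under the repo id rbelanec/train_svamp_101112_1760637999 (the id shown in the title) and that you have access to the gated base model; the word problem is an illustrative SVAMP-style example, not taken from the dataset:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_svamp_101112_1760637999"  # assumed repo id

# Load the base model, then attach the fine-tuned adapter weights.
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# SVAMP-style arithmetic word problem posed as a chat prompt.
messages = [
    {
        "role": "user",
        "content": "Dan has 32 green marbles. Tom gave him 16 more. "
                   "How many green marbles does Dan have now?",
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```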

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
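
A sketch reconstructing this configuration as Transformers TrainingArguments; output_dir is a placeholder, and any setting not listed above (evaluation and saving cadence, precision, etc.) is not recorded in the card and is left at its default:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the listed hyperparameters.
training_args = TrainingArguments(
    output_dir="train_svamp_101112_1760637999",  # placeholder, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```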

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.6119        | 2.0   | 280  | 0.7101          | 127328            |
| 0.4197        | 4.0   | 560  | 0.6222          | 254624            |
| 0.5296        | 6.0   | 840  | 0.6052          | 381760            |
| 0.2225        | 8.0   | 1120 | 0.4053          | 509056            |
| 0.2433        | 10.0  | 1400 | 0.4110          | 636096            |
| 0.1727        | 12.0  | 1680 | 0.4295          | 763200            |
| 0.1309        | 14.0  | 1960 | 0.5017          | 890752            |
| 0.0895        | 16.0  | 2240 | 0.5760          | 1017696           |
| 0.1119        | 18.0  | 2520 | 0.6015          | 1145152           |
| 0.0136        | 20.0  | 2800 | 0.6033          | 1272000           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4