train_math_qa_456_1760637839

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the math_qa dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

  • Loss: 1.0762 (the best validation loss, reached at epoch 6; see the table under "Training results")
  • Num Input Tokens Seen: 77891968
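Because this repository contains an adapter rather than full model weights, it has to be loaded on top of the base model. A minimal, non-authoritative sketch, assuming the adapter is published as rbelanec/train_math_qa_456_1760637839 (the repo id shown on this page), you have access to the gated base model, and `accelerate` is installed for `device_map="auto"`; the prompt is purely illustrative:

```python
# Minimal loading sketch for this PEFT adapter (reconstruction, not an
# official usage example from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_math_qa_456_1760637839"  # from the model page

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative math_qa-style question, formatted with the Llama 3 chat template.
messages = [{"role": "user", "content": "A train covers 120 km in 2 hours. What is its average speed in km/h?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```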

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
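
As a rough reconstruction (not the original training script), the hyperparameters above map onto a transformers TrainingArguments configuration like this; output_dir and any field not in the list are assumptions:

```python
# Sketch of the reported hyperparameters as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_math_qa_456_1760637839",  # assumed output path
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```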

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 1.1839        | 1.0   | 6714   | 1.1326          | 3900904           |
| 1.2836        | 2.0   | 13428  | 1.0997          | 7795688           |
| 1.0965        | 3.0   | 20142  | 1.0861          | 11690736          |
| 1.1173        | 4.0   | 26856  | 1.0804          | 15583992          |
| 1.004         | 5.0   | 33570  | 1.0817          | 19477680          |
| 1.0011        | 6.0   | 40284  | 1.0762          | 23372072          |
| 0.9656        | 7.0   | 46998  | 1.0801          | 27267240          |
| 0.8687        | 8.0   | 53712  | 1.0819          | 31161216          |
| 1.1347        | 9.0   | 60426  | 1.0828          | 35058040          |
| 0.8943        | 10.0  | 67140  | 1.0833          | 38955336          |
| 0.9344        | 11.0  | 73854  | 1.0813          | 42849552          |
| 1.1579        | 12.0  | 80568  | 1.0793          | 46744544          |
| 1.0048        | 13.0  | 87282  | 1.0802          | 50638504          |
| 1.2016        | 14.0  | 93996  | 1.0788          | 54532704          |
| 1.1664        | 15.0  | 100710 | 1.0866          | 58424776          |
| 0.8985        | 16.0  | 107424 | 1.0866          | 62319120          |
| 1.425         | 17.0  | 114138 | 1.0866          | 66209648          |
| 0.9599        | 18.0  | 120852 | 1.0866          | 70104328          |
| 1.5234        | 19.0  | 127566 | 1.0866          | 73997656          |
| 0.9725        | 20.0  | 134280 | 1.0866          | 77891968          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
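
To match this environment, the pinned versions above translate to an install along these lines (the CUDA 12.8 wheel index is an assumption based on the `+cu128` build tag; adjust for your hardware):

```bash
# Pin the versions listed under "Framework versions".
pip install peft==0.17.1 transformers==4.51.3 datasets==4.0.0 tokenizers==0.21.4
# torch 2.9.0+cu128 corresponds to the CUDA 12.8 build:
pip install torch==2.9.0 --index-url https://download.pytorch.org/whl/cu128
```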