train_math_qa_1754652175

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the math_qa dataset. It achieves the following results on the evaluation set:

Loss: 1.6091
Num Input Tokens Seen: 38732208

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
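
The scheduler settings above can be sketched in a few lines. This is a minimal illustration, not the trainer's own code: it assumes linear warmup followed by a single half-cosine decay to zero, takes the total step count (67140) from the last row of the results table, and rounds the warmup length, which may differ from the training library's rounding by one step. The names `lr_at`, `BASE_LR`, `WARMUP_STEPS`, and `TOTAL_STEPS` are hypothetical.

```python
import math

# Values taken from the hyperparameter list and the final row of the results table.
BASE_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 67140

# Assumption: warmup length is the rounded product of ratio and total steps.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the rate at the start, the end of warmup, mid-training, and the last step.
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the warmup covers roughly the first epoch, so the peak learning rate is reached near step 6714.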

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.8057	0.5	3357	0.8056	1938656
0.7028	1.0	6714	0.6814	3870688
0.6583	1.5	10071	0.6765	5801696
0.7187	2.0	13428	0.6617	7741288
0.6316	2.5	16785	0.6570	9678632
0.6193	3.0	20142	0.6565	11611120
0.7070	3.5	23499	0.6524	13550032
0.6451	4.0	26856	0.6477	15489040
0.6846	4.5	30213	0.6573	17420720
0.7214	5.0	33570	0.6489	19360624
0.4672	5.5	36927	0.6593	21295472
0.6494	6.0	40284	0.6620	23230504
0.7743	6.5	43641	0.6645	25166536
0.6635	7.0	46998	0.6671	27107328
0.4657	7.5	50355	0.6840	29044256
0.5107	8.0	53712	0.6711	30985288
0.5285	8.5	57069	0.6745	32925544
0.4650	9.0	34856680
0.4761	9.5	63783	0.6832	36790952
0.6976	10.0	67140	0.6838	38732208
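
The validation loss in the table reaches its minimum well before the final epoch, which matters when choosing a checkpoint. The short script below copies the epoch, step, and validation loss columns from the table and reports the best row alongside the final one. The variable names are hypothetical, and whether intermediate checkpoints were saved or published is not stated in this card.

```python
# (epoch, step, validation_loss) copied from the results table above.
ROWS = [
    (0.5, 3357, 0.8056),
    (1.0, 6714, 0.6814),
    (1.5, 10071, 0.6765),
    (2.0, 13428, 0.6617),
    (2.5, 16785, 0.6570),
    (3.0, 20142, 0.6565),
    (3.5, 23499, 0.6524),
    (4.0, 26856, 0.6477),
    (4.5, 30213, 0.6573),
    (5.0, 33570, 0.6489),
    (5.5, 36927, 0.6593),
    (6.0, 40284, 0.6620),
    (6.5, 43641, 0.6645),
    (7.0, 46998, 0.6671),
    (7.5, 50355, 0.6840),
    (8.0, 53712, 0.6711),
    (8.5, 57069, 0.6745),
    (9.0, 60426, 0.6800),
    (9.5, 63783, 0.6832),
    (10.0, 67140, 0.6838),
]

# Row with the lowest validation loss, and the last logged row for comparison.
best_epoch, best_step, best_loss = min(ROWS, key=lambda row: row[2])
final_epoch, final_step, final_loss = ROWS[-1]

print(f"best:  epoch {best_epoch}, step {best_step}, val loss {best_loss:.4f}")
print(f"final: epoch {final_epoch}, step {final_step}, val loss {final_loss:.4f}")
print(f"gap:   {final_loss - best_loss:.4f}")
```

The training loss keeps falling in later epochs while the validation loss drifts upward, a pattern usually read as overfitting.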

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.8.0+cu128
Datasets 3.6.0
Tokenizers 0.21.1
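
Since the card lists PEFT among its framework versions, the weights are presumably a PEFT adapter to be applied on top of the base model. The following is a minimal loading sketch under that assumption. The helper name `load_adapter` is hypothetical, the repository ID is taken from this page's model tree, and the adapter type is not stated here, so the generic `PeftModel.from_pretrained` entry point is used.

```python
def load_adapter(
    adapter_id: str = "rbelanec/train_math_qa_1754652175",
    base_id: str = "meta-llama/Meta-Llama-3-8B-Instruct",
):
    """Load the base model and attach the fine-tuned adapter weights (sketch)."""
    # Imported lazily so that defining this helper does not require the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

    # Wrap the base model with the adapter stored in the adapter repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter()` downloads both repositories. The base model may require accepting its license on the Hub before the download succeeds.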

Model tree for rbelanec/train_math_qa_1754652175

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model