train_gsm8k_789_1760637937

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

Loss: 0.4627
Num Input Tokens Seen: 34722248

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5183	1.0	1682	0.5143	1739480
0.4289	2.0	3364	0.4987	3478568
0.4414	3.0	5046	0.4805	5217760
0.3926	4.0	6728	0.4737	6949888
0.4516	5.0	8410	0.4713	8687904
0.5028	6.0	10092	0.4667	10421288
0.3925	7.0	11774	0.4662	12155264
0.5059	8.0	13456	0.4627	13889536
0.4859	9.0	15138	0.4652	15631248
0.4439	10.0	16820	0.4675	17370104
0.4567	11.0	18502	0.4687	19100344
0.4527	12.0	20184	0.4733	20834120
0.3269	13.0	21866	0.4761	22566752
0.3482	14.0	23548	0.4796	24305592
0.3411	15.0	25230	0.4854	26037952
0.3431	16.0	26912	0.4918	27770056
0.3209	17.0	28594	0.4970	29506864
0.3019	18.0	30276	0.5006	31245432
0.29	19.0	31958	0.5010	32980080
0.4077	20.0	33640	0.5010	34722248

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Downloads last month: 3

Model tree for rbelanec/train_gsm8k_789_1760637937

Base model

meta-llama/Meta-Llama-3-8B-Instruct

Adapter

(2402)

this model