# train_gsm8k_456_1760637823
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:
- Loss: 0.4582
- Num Input Tokens Seen: 34715672
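Given the PEFT version listed under framework versions below, the checkpoint is presumably a PEFT adapter rather than a full set of model weights. Here is a minimal loading-and-inference sketch, assuming the adapter is published as `rbelanec/train_gsm8k_456_1760637823` and that you have access to the gated Llama 3 base model; the generation settings are illustrative, not from the original run:

```python
# pip install transformers peft torch accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"   # gated: requires an approved HF access token
adapter_id = "rbelanec/train_gsm8k_456_1760637823"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

# A GSM8K-style word problem, formatted with the instruct model's chat template.
messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends in April, and then she sold "
               "half as many clips in May. How many clips did she sell altogether?",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```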
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (mirrored in the sketch after this list):
- learning_rate: 0.03
- train_batch_size: 4
- eval_batch_size: 4
- seed: 456
- optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
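For reference, a sketch of how these settings map onto `transformers.TrainingArguments`. The original training script is not included in this card, so any field not in the list above (such as `output_dir`) is a placeholder:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameter list above; fields not listed there are
# placeholders, not values from the original run.
args = TrainingArguments(
    output_dir="train_gsm8k_456_1760637823",  # placeholder
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```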
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.5763 | 1.0 | 1682 | 0.5152 | 1733152 |
| 0.4466 | 2.0 | 3364 | 0.4911 | 3469824 |
| 0.494 | 3.0 | 5046 | 0.4777 | 5206688 |
| 0.49 | 4.0 | 6728 | 0.4710 | 6940592 |
| 0.4609 | 5.0 | 8410 | 0.4642 | 8670720 |
| 0.4278 | 6.0 | 10092 | 0.4620 | 10405952 |
| 0.4217 | 7.0 | 11774 | 0.4605 | 12141496 |
| 0.4437 | 8.0 | 13456 | 0.4582 | 13881648 |
| 0.3743 | 9.0 | 15138 | 0.4598 | 15618544 |
| 0.4087 | 10.0 | 16820 | 0.4619 | 17357848 |
| 0.4177 | 11.0 | 18502 | 0.4599 | 19092400 |
| 0.4703 | 12.0 | 20184 | 0.4680 | 20828128 |
| 0.3937 | 13.0 | 21866 | 0.4674 | 22562912 |
| 0.386 | 14.0 | 23548 | 0.4706 | 24304376 |
| 0.3788 | 15.0 | 25230 | 0.4742 | 26038152 |
| 0.3471 | 16.0 | 26912 | 0.4795 | 27771064 |
| 0.4133 | 17.0 | 28594 | 0.4848 | 29508400 |
| 0.3481 | 18.0 | 30276 | 0.4869 | 31244288 |
| 0.3554 | 19.0 | 31958 | 0.4879 | 32976960 |
| 0.3729 | 20.0 | 33640 | 0.4878 | 34715672 |
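Validation loss bottoms out at 0.4582 at epoch 8 and drifts upward over the remaining epochs while training loss keeps falling, a typical overfitting pattern. The headline evaluation loss above matches this epoch-8 minimum, presumably reflecting best-checkpoint selection.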
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4