train_gsm8k_123_1760637709

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

Loss: 0.4466
Num Input Tokens Seen: 34679720

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
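
The cosine schedule with a 10% warmup ratio can be sketched as follows. This is a minimal pure-Python illustration, not the training code itself. It assumes a linear warmup from 0, a single half-cosine decay to 0, 33,640 total optimizer steps (the final step in the training results), and 3,364 warmup steps. The function name `lr_at` is hypothetical, and the training framework may round the warmup step count differently.

```python
import math

BASE_LR = 0.03        # learning_rate from the hyperparameter list
TOTAL_STEPS = 33640   # final step reported in the training results
WARMUP_STEPS = 3364   # assumed: 10% of TOTAL_STEPS (lr_scheduler_warmup_ratio: 0.1)


def lr_at(step: int) -> float:
    """Approximate learning rate at a given optimizer step (illustrative)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the base learning rate down to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1682, 3364, 18502, 33640):
        print(f"step {step:>5}: lr = {lr_at(step):.6f}")
```

Under these assumptions the learning rate peaks at 0.03 at the end of epoch 2 (step 3364) and decays to 0 by the final step.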

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5609	1.0	1682	0.4951	1738584
0.444	2.0	3364	0.4797	3477544
0.5094	3.0	5046	0.4648	5217560
0.5101	4.0	6728	0.4575	6947896
0.4052	5.0	8410	0.4518	8680664
0.3402	6.0	10092	0.4502	10414768
0.4342	7.0	11774	0.4504	12148592
0.3496	8.0	13456	0.4466	13883456
0.4198	9.0	15138	0.4483	15613432
0.3061	10.0	16820	0.4502	17345296
0.3443	11.0	18502	0.4494	19082288
0.4635	12.0	20184	0.4531	20812896
0.3567	13.0	21866	0.4589	22544592
0.3681	14.0	23548	0.4605	24280408
0.3171	15.0	25230	0.4641	26007520
0.3373	16.0	26912	0.4712	27741272
0.3833	17.0	28594	0.4749	29470344
0.3457	18.0	30276	0.4776	31206528
0.3426	19.0	31958	0.4782	32940696
0.3324	20.0	33640	0.4784	34679720
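
A short sketch for locating the lowest validation loss in the table above. The values are copied from the table; the variable names are illustrative. The minimum falls at epoch 8 (0.4466), which matches the loss reported at the top of this card. Validation loss then rises in every epoch after 11, which is consistent with overfitting in the later epochs.

```python
# Validation loss per epoch, copied from the training results table.
val_loss_by_epoch = {
    1: 0.4951, 2: 0.4797, 3: 0.4648, 4: 0.4575, 5: 0.4518,
    6: 0.4502, 7: 0.4504, 8: 0.4466, 9: 0.4483, 10: 0.4502,
    11: 0.4494, 12: 0.4531, 13: 0.4589, 14: 0.4605, 15: 0.4641,
    16: 0.4712, 17: 0.4749, 18: 0.4776, 19: 0.4782, 20: 0.4784,
}

# Epoch with the lowest validation loss.
best_epoch = min(val_loss_by_epoch, key=val_loss_by_epoch.get)
best_loss = val_loss_by_epoch[best_epoch]

# Last epoch in the table, for comparison with the best one.
final_epoch = max(val_loss_by_epoch)
final_loss = val_loss_by_epoch[final_epoch]

if __name__ == "__main__":
    print(f"best epoch:  {best_epoch} (validation loss {best_loss:.4f})")
    print(f"final epoch: {final_epoch} (validation loss {final_loss:.4f})")
```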

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
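
Since the card lists PEFT among the framework versions, the weights are presumably published as a PEFT adapter on top of the base model. The following is a hedged sketch of how such an adapter is typically loaded with the `transformers` and `peft` libraries. It assumes the adapter lives at `rbelanec/train_gsm8k_123_1760637709`; the helper name `load_adapter` is hypothetical, and the adapter type is not stated in this card.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_gsm8k_123_1760637709"  # assumed adapter repository


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (illustrative sketch)."""
    # Imported lazily so the constants above can be inspected
    # without the heavy libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter()` downloads the full base model weights, and access to the base model repository may need to be granted first.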

Model tree for rbelanec/train_gsm8k_123_1760637709

Base model

meta-llama/Meta-Llama-3-8B-Instruct
