# train_gsm8k_123_1760637711

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the [gsm8k](https://huggingface.co/datasets/gsm8k) dataset. It achieves the following results on the evaluation set:
- Loss: 0.4390
- Num Input Tokens Seen: 34679720
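The framework versions below include PEFT, so this checkpoint is presumably a PEFT (LoRA-style) adapter on top of the base model rather than a full set of weights. A minimal inference sketch, assuming the adapter is published as `rbelanec/train_gsm8k_123_1760637711` and that the repo ships a tokenizer (otherwise load the tokenizer from the base model id):

```python
# Minimal inference sketch for the PEFT adapter (assumptions noted above).
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "rbelanec/train_gsm8k_123_1760637711"

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A GSM8K-style word problem, formatted with the Llama-3 chat template.
prompt = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```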
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
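The card does not record the trainer, LoRA configuration, or prompt format, so the following is only a sketch of how the listed values map onto `transformers.TrainingArguments`; everything not in the list above (output directory, any omitted defaults) is an assumption.

```python
# Hypothetical reconstruction of the training configuration from the
# hyperparameter list above; field names follow transformers' TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_gsm8k_123_1760637711",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```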
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.5581 | 1.0 | 1682 | 0.4952 | 1738584 |
| 0.4307 | 2.0 | 3364 | 0.4664 | 3477544 |
| 0.4728 | 3.0 | 5046 | 0.4474 | 5217560 |
| 0.4713 | 4.0 | 6728 | 0.4390 | 6947896 |
| 0.3124 | 5.0 | 8410 | 0.4438 | 8680664 |
| 0.2684 | 6.0 | 10092 | 0.4589 | 10414768 |
| 0.3172 | 7.0 | 11774 | 0.4816 | 12148592 |
| 0.1967 | 8.0 | 13456 | 0.5213 | 13883456 |
| 0.2271 | 9.0 | 15138 | 0.5696 | 15613432 |
| 0.1185 | 10.0 | 16820 | 0.6338 | 17345296 |
| 0.1106 | 11.0 | 18502 | 0.7360 | 19082288 |
| 0.1396 | 12.0 | 20184 | 0.8254 | 20812896 |
| 0.0534 | 13.0 | 21866 | 0.9418 | 22544592 |
| 0.0455 | 14.0 | 23548 | 1.0541 | 24280408 |
| 0.0221 | 15.0 | 25230 | 1.1591 | 26007520 |
| 0.0214 | 16.0 | 26912 | 1.3018 | 27741272 |
| 0.0325 | 17.0 | 28594 | 1.3551 | 29470344 |
| 0.0091 | 18.0 | 30276 | 1.4204 | 31206528 |
| 0.0114 | 19.0 | 31958 | 1.4704 | 32940696 |
| 0.013 | 20.0 | 33640 | 1.4843 | 34679720 |
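Training loss keeps falling while validation loss bottoms out at epoch 4 (0.4390, the headline figure above) and climbs steadily afterwards, a classic overfitting pattern: checkpoints past epoch 4 fit the training set at the expense of generalization. A quick check that epoch 4 is indeed the minimum, with eval losses copied from the table:

```python
# Validation losses per epoch, copied from the results table above.
eval_losses = {
    1: 0.4952, 2: 0.4664, 3: 0.4474, 4: 0.4390, 5: 0.4438,
    6: 0.4589, 7: 0.4816, 8: 0.5213, 9: 0.5696, 10: 0.6338,
    11: 0.7360, 12: 0.8254, 13: 0.9418, 14: 1.0541, 15: 1.1591,
    16: 1.3018, 17: 1.3551, 18: 1.4204, 19: 1.4704, 20: 1.4843,
}
best_epoch = min(eval_losses, key=eval_losses.get)
print(best_epoch, eval_losses[best_epoch])  # -> 4 0.439
```

In a `Trainer` setup this selection is what `load_best_model_at_end=True` with `metric_for_best_model="eval_loss"` would do automatically; whether that was enabled for this run is not recorded in the card.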
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
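For reproduction, the listed versions can be pinned directly (a sketch; the `+cu128` build of PyTorch comes from the CUDA 12.8 wheel index rather than plain PyPI):

```text
# requirements.txt matching the versions above
peft==0.17.1
transformers==4.51.3
torch==2.9.0
datasets==4.0.0
tokenizers==0.21.4
```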