# train_gsm8k_789_1760637941
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:
- Loss: 0.5283
- Num Input Tokens Seen: 34722248
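The framework versions below include PEFT, so this checkpoint is presumably a parameter-efficient adapter on top of the base model rather than full fine-tuned weights. A minimal inference sketch under that assumption (the generation settings and prompt are illustrative, not taken from the card; the gated base model requires accepting Meta's license on the Hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"      # gated; requires Hub access
adapter_id = "rbelanec/train_gsm8k_789_1760637941"   # this repo, assumed to hold a PEFT adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

# A GSM8K-style word problem, formatted with the Llama 3 chat template.
messages = [{"role": "user", "content": (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?")}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```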
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
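The card does not state the exact split, but the step counts are suggestive: 1,682 optimizer steps per epoch at batch size 4 covers about 6,728 examples, roughly 90% of GSM8K's 7,473-example train split, which would be consistent with a 90/10 train/validation split. A minimal loading sketch, assuming the standard `main` config on the Hub:

```python
from datasets import load_dataset

# Assumption: the standard "main" config; the card does not say which was used.
dataset = load_dataset("openai/gsm8k", "main")
print(dataset)  # train: 7,473 examples; test: 1,319 examples

example = dataset["train"][0]
print(example["question"])  # grade-school math word problem
print(example["answer"])    # step-by-step solution ending in "#### <number>"
```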
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto a `TrainingArguments` config follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 789
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
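A minimal sketch of how these hyperparameters might translate to a `transformers.TrainingArguments` object. The `output_dir` and `eval_strategy` values are assumptions (the results table reports one evaluation per epoch); the card also lists PEFT, so in practice a LoRA/adapter config would accompany this, omitted here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_gsm8k_789_1760637941",  # assumption: named after the run
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",  # assumption, consistent with per-epoch rows in the results table
)
```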
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.9541 | 1.0 | 1682 | 0.9698 | 1739480 |
| 0.5719 | 2.0 | 3364 | 0.6386 | 3478568 |
| 0.584 | 3.0 | 5046 | 0.5922 | 5217760 |
| 0.4825 | 4.0 | 6728 | 0.5725 | 6949888 |
| 0.536 | 5.0 | 8410 | 0.5607 | 8687904 |
| 0.6147 | 6.0 | 10092 | 0.5523 | 10421288 |
| 0.4946 | 7.0 | 11774 | 0.5463 | 12155264 |
| 0.5933 | 8.0 | 13456 | 0.5417 | 13889536 |
| 0.6213 | 9.0 | 15138 | 0.5383 | 15631248 |
| 0.5636 | 10.0 | 16820 | 0.5357 | 17370104 |
| 0.5903 | 11.0 | 18502 | 0.5336 | 19100344 |
| 0.5506 | 12.0 | 20184 | 0.5319 | 20834120 |
| 0.4607 | 13.0 | 21866 | 0.5308 | 22566752 |
| 0.4761 | 14.0 | 23548 | 0.5298 | 24305592 |
| 0.4947 | 15.0 | 25230 | 0.5293 | 26037952 |
| 0.4854 | 16.0 | 26912 | 0.5288 | 27770056 |
| 0.4793 | 17.0 | 28594 | 0.5285 | 29506864 |
| 0.4761 | 18.0 | 30276 | 0.5284 | 31245432 |
| 0.4238 | 19.0 | 31958 | 0.5283 | 32980080 |
| 0.5847 | 20.0 | 33640 | 0.5283 | 34722248 |
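Validation loss drops steeply through epoch 4 and then flattens: it improves by less than 0.01 after epoch 13 and is unchanged between epochs 19 and 20 (0.5283), suggesting the run had effectively converged well before the final epoch.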
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4