train_gsm8k_42_1760637596

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

Loss: 1.3668
Num Input Tokens Seen: 34797032

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.359	1.0	1682	1.3778	1738904
1.3101	2.0	3364	1.3702	3481872
1.4164	3.0	5046	1.3674	5222160
1.3298	4.0	6728	1.3674	6964040
1.4647	5.0	8410	1.3673	8703920
1.4088	6.0	10092	1.3674	10444208
1.247	7.0	11774	1.3677	12184024
1.4788	8.0	13456	1.3670	13925976
1.4338	9.0	15138	1.3675	15667472
1.2503	10.0	16820	1.3668	17407120
1.398	11.0	18502	1.3673	19145240
1.2493	12.0	20184	1.3671	20882704
1.4611	13.0	21866	1.3674	22623936
1.3645	14.0	23548	1.3670	24361736
1.511	15.0	25230	1.3672	26106632
1.4059	16.0	26912	1.3672	27845256
1.3993	17.0	28594	1.3672	29579704
1.4476	18.0	30276	1.3672	31322600
1.3274	19.0	31958	1.3672	33057576
1.5706	20.0	33640	1.3672	34797032

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Downloads last month: -

Model tree for rbelanec/train_gsm8k_42_1760637596

Base model

meta-llama/Meta-Llama-3-8B-Instruct

Adapter

(2404)

this model