train_gsm8k_456_1760637826

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the gsm8k dataset. It achieves the following results on the evaluation set:

Loss: 1.3670
Num Input Tokens Seen: 34715672
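
For readers who prefer perplexity, the loss above can be converted with a one-line calculation. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats; the card does not state this explicitly.

```python
import math

eval_loss = 1.3670  # evaluation loss reported above

# Perplexity is exp(mean cross-entropy), ASSUMING the loss is measured
# in nats per token (not confirmed by the card).
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")
```

Under that assumption the evaluation perplexity is a little under 4.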

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
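
The scheduler settings above imply a learning-rate curve that can be sketched in plain Python. This is an illustration, not the training code: it assumes the usual shape of a cosine schedule with linear warmup (linear ramp to the peak, then half-cosine decay to zero) and takes the total step count of 33640 from the final row of the training results table. The helper name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 5e-5        # learning_rate
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 33640   # last step listed in the training results table
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # 3364 steps

def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (hypothetical helper, assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

With these numbers the warmup would end at step 3364, which coincides with the end of epoch 2 in the results table.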

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.4368	1.0	1682	1.3780	1733152
1.2597	2.0	3364	1.3700	3469824
1.4624	3.0	5046	1.3677	5206688
1.3455	4.0	6728	1.3674	6940592
1.2476	5.0	8410	1.3671	8670720
1.4678	6.0	10092	1.3673	10405952
1.2484	7.0	11774	1.3674	12141496
1.2448	8.0	13456	1.3672	13881648
1.3418	9.0	15138	1.3672	15618544
1.3421	10.0	16820	1.3670	17357848
1.3243	11.0	18502	1.3672	19092400
1.5747	12.0	20184	1.3674	20828128
1.3693	13.0	21866	1.3671	22562912
1.3084	14.0	23548	1.3673	24304376
1.3728	15.0	25230	1.3676	26038152
1.2376	16.0	26912	1.3670	27771064
1.3527	17.0	28594	1.3671	29508400
1.3263	18.0	30276	1.3671	31244288
1.3912	19.0	31958	1.3671	32976960
1.3835	20.0	33640	1.3671	34715672
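
The validation column changes very little after the first few epochs. A short sketch that summarizes it is given below; the values are copied from the table above, and the variable names are illustrative only.

```python
# Validation loss per epoch (epochs 1..20), copied from the table above.
val_loss = [
    1.3780, 1.3700, 1.3677, 1.3674, 1.3671,
    1.3673, 1.3674, 1.3672, 1.3672, 1.3670,
    1.3672, 1.3674, 1.3671, 1.3673, 1.3676,
    1.3670, 1.3671, 1.3671, 1.3671, 1.3671,
]

best = min(val_loss)
# Epochs (1-indexed) at which the minimum validation loss occurs.
best_epochs = [i + 1 for i, v in enumerate(val_loss) if v == best]

# Spread of the validation loss from epoch 4 onward.
late = val_loss[3:]
late_spread = max(late) - min(late)

print(f"best validation loss {best:.4f} at epochs {best_epochs}")
print(f"spread from epoch 4 onward: {late_spread:.4f}")
```

The minimum of 1.3670 matches the evaluation loss reported at the top of the card, and from epoch 4 onward the validation loss stays within a band of about 0.0006.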

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
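
Since the card lists PEFT among the framework versions, the weights are presumably distributed as an adapter that is applied on top of the base model. A minimal loading sketch follows, assuming the repository ID shown in this card's model tree, access to the base model weights, and enough memory for an 8B model in bfloat16. The function name `load_model` and the dtype choice are assumptions, not something the card specifies.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_gsm8k_456_1760637826"  # repository ID from the model tree

def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Imports are kept inside the function so the heavy dependencies
    # are only needed when the model is actually loaded.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # bfloat16 is an assumed dtype; the card does not say which precision was used.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` downloads both repositories on first use; generation then works through the usual `model.generate(...)` interface.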

Model tree for rbelanec/train_gsm8k_456_1760637826

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model