train_svamp_42_1760637541

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

Loss: 0.4530
Num Input Tokens Seen: 1273056

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
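
The learning-rate curve implied by these settings can be sketched in a few lines. This is a minimal illustration, assuming the run used the usual linear-warmup-then-cosine-decay shape (as in the Transformers cosine schedule) and that the total step count is the 2800 steps reported in the training results; the name `lr_at` is hypothetical and not part of any library.

```python
import math

BASE_LR = 1e-05      # learning_rate from the list above
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 2800   # assumption: final step reported in the training results

# Number of warmup steps derived from the ratio.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR during warmup.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 140, 280, 1540, 2800):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 1e-05 once warmup ends and decays to roughly zero by the final step.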

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5586	2.0	280	0.6296	127488
0.6293	4.0	560	0.5659	254720
0.4602	6.0	840	0.5153	382048
0.4131	8.0	1120	0.4834	509248
0.3323	10.0	1400	0.4074	636704
0.2269	12.0	1680	0.3795	764000
0.2116	14.0	1960	0.3973	891488
0.1569	16.0	2240	0.4267	1018592
0.1114	18.0	2520	0.4509	1145632
0.1902	20.0	2800	0.4530	1273056
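
The rows above can be read programmatically to see where validation loss bottoms out. The sketch below copies the logged values into a hypothetical `ROWS` list and picks the row with the lowest validation loss; it only restates the table and assumes nothing about which checkpoint was actually saved or published.

```python
# Each tuple: (training_loss, epoch, step, validation_loss, input_tokens_seen)
ROWS = [
    (0.5586, 2.0, 280, 0.6296, 127488),
    (0.6293, 4.0, 560, 0.5659, 254720),
    (0.4602, 6.0, 840, 0.5153, 382048),
    (0.4131, 8.0, 1120, 0.4834, 509248),
    (0.3323, 10.0, 1400, 0.4074, 636704),
    (0.2269, 12.0, 1680, 0.3795, 764000),
    (0.2116, 14.0, 1960, 0.3973, 891488),
    (0.1569, 16.0, 2240, 0.4267, 1018592),
    (0.1114, 18.0, 2520, 0.4509, 1145632),
    (0.1902, 20.0, 2800, 0.4530, 1273056),
]

# Row with the lowest validation loss, and the last logged row.
best = min(ROWS, key=lambda row: row[3])
final = ROWS[-1]

print(f"lowest validation loss: {best[3]:.4f} at epoch {best[1]:g} (step {best[2]})")
print(f"final validation loss:  {final[3]:.4f} at epoch {final[1]:g} (step {final[2]})")
```

On these numbers, validation loss is lowest around epoch 12 and rises afterwards while training loss keeps falling, which would be consistent with overfitting in the later epochs. The loss reported at the top of the card matches the final-epoch row.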

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
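
With the libraries listed above, loading this adapter might look like the following. This is a sketch rather than a documented recipe: it assumes the adapter is published as rbelanec/train_svamp_42_1760637541, that you have access to the gated base model, and that default loading options are acceptable; `load_adapter` is a hypothetical helper name.

```python
BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_svamp_42_1760637541"  # assumption: Hub repo id of this adapter


def load_adapter(adapter_id: str = ADAPTER_ID, base_id: str = BASE_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Imported lazily so the module can be inspected without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter()` downloads both the base weights and the adapter, so it needs network access and enough memory for an 8B-parameter model.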

Model tree for rbelanec/train_svamp_42_1760637541

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model