train_svamp_123_1760637657

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the SVAMP dataset. It achieves the following results on the evaluation set:

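Loss: 0.0788 (equal to the lowest validation loss in the results table below, at epoch 4; the validation loss after the final epoch is 0.1299)
Num Input Tokens Seen: 1433472 (cumulative over all 20 epochs)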

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
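The scheduler settings above determine the learning rate at every optimizer step. The sketch below assumes the schedule behaves like Hugging Face Transformers' cosine-with-warmup schedule (a linear ramp from 0 to the peak rate, then a half-cosine decay back to 0) and that the warmup length is resolved as ceil(0.1 x total steps). The total of 3160 steps is taken from the last row of the results table. PEAK_LR, WARMUP_STEPS, TOTAL_STEPS and cosine_lr are illustrative names, not part of any library.

```python
import math

# Values from the hyperparameter list and the results table.
PEAK_LR = 0.03
WARMUP_RATIO = 0.1
TOTAL_STEPS = 3160  # 158 optimizer steps per epoch x 20 epochs

# Assumption: warmup steps = ceil(ratio * total steps), as Transformers
# does when only a warmup ratio is given.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)  # 316


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to PEAK_LR during warmup.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase already completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half a cosine period: the factor falls from 1 at progress 0 to 0 at progress 1.
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 158, 316, 1738, 3160):
        print(f"step {step:>4}: lr = {cosine_lr(step):.5f}")
```

Under these assumptions the rate reaches its 0.03 peak at step 316 (the end of epoch 2) and has decayed to half of that by step 1738 (the end of epoch 11).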

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1046	1.0	158	0.1303	71776
0.0784	2.0	316	0.1064	143552
0.0319	3.0	474	0.0807	215312
0.0821	4.0	632	0.0788	286736
0.0388	5.0	790	0.0813	358416
0.0481	6.0	948	0.0975	430224
0.0234	7.0	1106	0.0892	501728
0.0072	8.0	1264	0.1037	573328
0.0281	9.0	1422	0.1386	645088
0.0011	10.0	1580	0.1195	716976
0.0096	11.0	1738	0.1209	788752
0.0017	12.0	1896	0.1246	860352
0.0013	13.0	2054	0.1284	931872
0.0011	14.0	2212	0.1292	1003600
0.0004	15.0	2370	0.1309	1075312
0.0005	16.0	2528	0.1304	1146848
0.0005	17.0	2686	0.1310	1218416
0.0007	18.0	2844	0.1300	1290080
0.0004	19.0	3002	0.1304	1361712
0.0002	20.0	3160	0.1299	1433472
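The table can also be checked programmatically. This sketch copies the rows into a list and computes the epoch with the lowest validation loss (epoch 4, 0.0788, the value reported at the top of this card), the per-epoch increments of the cumulative Step and Input Tokens Seen columns, and the range of training-set sizes consistent with 158 steps per epoch at a batch size of 4. That last figure assumes a single device, no gradient accumulation and a kept final partial batch, none of which the card states. The helper names are hypothetical.

```python
# Rows copied from the results table:
# (epoch, step, training_loss, validation_loss, input_tokens_seen)
ROWS = [
    (1, 158, 0.1046, 0.1303, 71776),
    (2, 316, 0.0784, 0.1064, 143552),
    (3, 474, 0.0319, 0.0807, 215312),
    (4, 632, 0.0821, 0.0788, 286736),
    (5, 790, 0.0388, 0.0813, 358416),
    (6, 948, 0.0481, 0.0975, 430224),
    (7, 1106, 0.0234, 0.0892, 501728),
    (8, 1264, 0.0072, 0.1037, 573328),
    (9, 1422, 0.0281, 0.1386, 645088),
    (10, 1580, 0.0011, 0.1195, 716976),
    (11, 1738, 0.0096, 0.1209, 788752),
    (12, 1896, 0.0017, 0.1246, 860352),
    (13, 2054, 0.0013, 0.1284, 931872),
    (14, 2212, 0.0011, 0.1292, 1003600),
    (15, 2370, 0.0004, 0.1309, 1075312),
    (16, 2528, 0.0005, 0.1304, 1146848),
    (17, 2686, 0.0005, 0.1310, 1218416),
    (18, 2844, 0.0007, 0.1300, 1290080),
    (19, 3002, 0.0004, 0.1304, 1361712),
    (20, 3160, 0.0002, 0.1299, 1433472),
]

STEP_COL, VAL_LOSS_COL, TOKENS_COL = 1, 3, 4
BATCH_SIZE = 4  # train_batch_size from the hyperparameter list


def best_epoch(rows):
    """Return (epoch, validation_loss) for the row with the lowest validation loss."""
    row = min(rows, key=lambda r: r[VAL_LOSS_COL])
    return row[0], row[VAL_LOSS_COL]


def per_epoch_increments(rows, col):
    """Turn a cumulative column into per-epoch increments (epoch 1 is measured from 0)."""
    values = [r[col] for r in rows]
    return [cur - prev for prev, cur in zip([0] + values[:-1], values)]


def implied_train_size(steps_per_epoch, batch_size):
    """Inclusive range of example counts n with ceil(n / batch_size) == steps_per_epoch.

    Hypothetical helper; assumes one device, no gradient accumulation and
    that the last partial batch is kept.
    """
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size


if __name__ == "__main__":
    print("best epoch:", best_epoch(ROWS))
    steps = per_epoch_increments(ROWS, STEP_COL)
    tokens = per_epoch_increments(ROWS, TOKENS_COL)
    print("steps per epoch:", sorted(set(steps)))
    print("tokens per epoch: min", min(tokens), "max", max(tokens))
    print("implied training examples:", implied_train_size(steps[0], BATCH_SIZE))
```

Read this way, the table shows training loss falling toward zero after epoch 4 while validation loss climbs from 0.0788 to about 0.13, a pattern consistent with overfitting in the later epochs.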

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_svamp_123_1760637657

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model