train_svamp_42_1763998314

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the svamp dataset. Its evaluation-set results are reported per checkpoint in the table of training and validation losses below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
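
To make the scheduler settings above concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, written in plain Python. It is an illustration, not the training code used for this model. The function name `lr_at_step` is hypothetical, the total of 1580 steps is taken from the last row of the results table, and rounding the warmup length to a whole number of steps is an assumption.

```python
import math

PEAK_LR = 5e-05      # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 1580   # assumption: last step reported in the results table


def lr_at_step(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS,
               warmup_ratio=WARMUP_RATIO):
    """Learning rate at an optimizer step (hypothetical helper)."""
    # Assumption: warmup length is the ratio times total steps, rounded.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 79, 158, 869, 1580):
        print(step, lr_at_step(step))
```

Under these assumptions the warmup covers the first 158 steps, which is roughly the first epoch, and the rate then decays toward zero by the final step.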

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.7745	0.5	79	1.7625	36256
1.3854	1.0	158	1.3985	71568
1.1939	1.5	237	1.1660	107504
0.8904	2.0	316	0.9381	143232
0.7032	2.5	395	0.7508	178848
0.5788	3.0	474	0.6024	214912
0.4304	3.5	553	0.4853	250784
0.481	4.0	632	0.4035	286448
0.3769	4.5	711	0.3452	322448
0.2554	5.0	790	0.3013	358176
0.3228	5.5	869	0.2763	394336
0.2438	6.0	948	0.2574	429728
0.1587	6.5	1027	0.2453	465376
0.2415	7.0	1106	0.2387	501504
0.2159	7.5	1185	0.2324	537248
0.3976	8.0	1264	0.2304	573120
0.2527	8.5	1343	0.2293	609248
0.2387	9.0	1422	0.2281	644944
0.2243	9.5	1501	0.2281	680880
0.2578	10.0	1580	0.2283	716448
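
The bookkeeping in the table can be checked with a little arithmetic. The sketch below copies the epoch, step, and validation-loss columns from the table, derives the number of steps per epoch, and finds the checkpoint with the lowest validation loss. The examples-per-epoch figure assumes a single device and no gradient accumulation, neither of which is stated here; the variable names are illustrative only.

```python
# (epoch, step, validation_loss) copied from the table above
rows = [
    (0.5, 79, 1.7625), (1.0, 158, 1.3985), (1.5, 237, 1.1660),
    (2.0, 316, 0.9381), (2.5, 395, 0.7508), (3.0, 474, 0.6024),
    (3.5, 553, 0.4853), (4.0, 632, 0.4035), (4.5, 711, 0.3452),
    (5.0, 790, 0.3013), (5.5, 869, 0.2763), (6.0, 948, 0.2574),
    (6.5, 1027, 0.2453), (7.0, 1106, 0.2387), (7.5, 1185, 0.2324),
    (8.0, 1264, 0.2304), (8.5, 1343, 0.2293), (9.0, 1422, 0.2281),
    (9.5, 1501, 0.2281), (10.0, 1580, 0.2283),
]

TRAIN_BATCH_SIZE = 4  # train_batch_size from the hyperparameter list

# Steps per epoch from the final row: total steps / total epochs.
final_epoch, final_step, final_loss = rows[-1]
steps_per_epoch = final_step / final_epoch

# Assumption: one device, no gradient accumulation, so each step
# consumes exactly TRAIN_BATCH_SIZE examples.
examples_per_epoch = int(steps_per_epoch) * TRAIN_BATCH_SIZE

# Earliest checkpoint with the lowest validation loss (min keeps the first tie).
best_epoch, best_step, best_loss = min(rows, key=lambda r: r[2])

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("examples per epoch (assumed):", examples_per_epoch)
    print("best checkpoint:", best_epoch, best_step, best_loss)
    print("final checkpoint:", final_epoch, final_step, final_loss)
```

The validation loss is essentially flat from about epoch 8 onward, so the last few checkpoints differ only in the fourth decimal place.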
