train_svamp_42_1763998313

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the svamp dataset. The final evaluation results are not listed here; the validation loss logged during training appears in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
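
As a rough illustration of what lr_scheduler_type: cosine with lr_scheduler_warmup_ratio: 0.1 implies, the sketch below computes a learning rate per step using the common "linear warmup, then cosine decay to zero" shape. The function name cosine_lr_with_warmup is hypothetical, and TOTAL_STEPS = 3150 is an assumption inferred from the log (158 steps at epoch 0.5016 suggests about 315 steps per epoch, times 10 epochs). It may not match the trainer's schedule exactly.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup from 0, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: about 315 optimizer steps per epoch over 10 epochs.
TOTAL_STEPS = 3150

for step in (0, 158, 315, 1580, 3002, 3150):
    print(step, f"{cosine_lr_with_warmup(step, TOTAL_STEPS):.3e}")
```

Under these assumptions the rate peaks at 5e-05 once warmup ends (step 315) and is close to zero by the last logged step, which is consistent with the small changes in validation loss late in training.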

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.4159	0.5016	158	1.3880	34720
0.5733	1.0032	316	0.5194	68736
0.2342	1.5048	474	0.2169	103216
0.2228	2.0063	632	0.1882	137648
0.0123	2.5079	790	0.1728	171744
0.0905	3.0095	948	0.1569	206608
0.2056	3.5111	1106	0.1562	240992
0.2048	4.0127	1264	0.1445	275328
0.1339	4.5143	1422	0.1451	309648
0.3122	5.0159	1580	0.1376	344256
0.1349	5.5175	1738	0.1364	379024
0.1058	6.0190	1896	0.1365	413280
0.0976	6.5206	2054	0.1364	447504
0.0612	7.0222	2212	0.1348	482256
0.0088	7.5238	2370	0.1347	516448
0.0615	8.0254	2528	0.1327	550928
0.0695	8.5270	2686	0.1320	585568
0.1507	9.0286	2844	0.1323	619904
0.0546	9.5302	3002	0.1306	654176
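
To read the log more easily, the following minimal sketch parses the rows above and reports the checkpoint with the lowest validation loss, plus the average number of input tokens per step. The names LOG and parse_log are hypothetical helpers for this illustration; the numbers are copied from the table.

```python
# Columns: training loss, epoch, step, validation loss, input tokens seen.
LOG = """\
1.4159 0.5016 158 1.3880 34720
0.5733 1.0032 316 0.5194 68736
0.2342 1.5048 474 0.2169 103216
0.2228 2.0063 632 0.1882 137648
0.0123 2.5079 790 0.1728 171744
0.0905 3.0095 948 0.1569 206608
0.2056 3.5111 1106 0.1562 240992
0.2048 4.0127 1264 0.1445 275328
0.1339 4.5143 1422 0.1451 309648
0.3122 5.0159 1580 0.1376 344256
0.1349 5.5175 1738 0.1364 379024
0.1058 6.0190 1896 0.1365 413280
0.0976 6.5206 2054 0.1364 447504
0.0612 7.0222 2212 0.1348 482256
0.0088 7.5238 2370 0.1347 516448
0.0615 8.0254 2528 0.1327 550928
0.0695 8.5270 2686 0.1320 585568
0.1507 9.0286 2844 0.1323 619904
0.0546 9.5302 3002 0.1306 654176
"""


def parse_log(text):
    """Turn whitespace-separated log rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return rows


rows = parse_log(LOG)
best = min(rows, key=lambda r: r["val_loss"])  # lowest validation loss
last = rows[-1]
tokens_per_step = last["tokens"] / last["step"]

print(f"best validation loss {best['val_loss']} at step {best['step']}")
print(f"average input tokens per step: {tokens_per_step:.1f}")
```

The per-step training loss is noisy because each step uses only two examples, so the validation loss column is the more reliable signal of progress.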
