# train_mrpc_42_1774791060

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the MRPC (Microsoft Research Paraphrase Corpus) dataset. It achieves the following results on the evaluation set:
- Loss: 0.1332
- Num Input Tokens Seen: 1780000
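
Since this is a PEFT adapter rather than a full model checkpoint, it is loaded on top of the base model. A minimal loading sketch, assuming the adapter is published as `rbelanec/train_mrpc_42_1774791060` and used with a causal-LM head (the card does not document the exact inference setup or prompt format):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model, then attach the fine-tuned PEFT adapter.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base, "rbelanec/train_mrpc_42_1774791060")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# Illustrative only: the prompt template used during training is not
# documented on this card.
inputs = tokenizer("Do these two sentences express the same meaning? ...",
                   return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```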
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a configuration sketch follows the list:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
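
As a reproduction aid, the settings above translate into a `transformers.TrainingArguments` configuration as sketched below. This is a sketch under assumptions: the PEFT adapter type, dataset preprocessing, and `output_dir` are not stated on this card, so only the optimizer and schedule settings are taken from the list.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameter list above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="train_mrpc_42_1774791060",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
```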

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2006 | 0.2518 | 104 | 0.2291 | 89600 |
| 0.1908 | 0.5036 | 208 | 0.1950 | 178688 |
| 0.1721 | 0.7554 | 312 | 0.1814 | 267968 |
| 0.1644 | 1.0073 | 416 | 0.1772 | 357488 |
| 0.1866 | 1.2591 | 520 | 0.1557 | 446896 |
| 0.1538 | 1.5109 | 624 | 0.1509 | 536176 |
| 0.1846 | 1.7627 | 728 | 0.1462 | 626992 |
| 0.181 | 2.0145 | 832 | 0.1442 | 716344 |
| 0.0969 | 2.2663 | 936 | 0.1435 | 806712 |
| 0.1447 | 2.5182 | 1040 | 0.1429 | 895736 |
| 0.0919 | 2.7700 | 1144 | 0.1439 | 985592 |
| 0.1187 | 3.0218 | 1248 | 0.1343 | 1074624 |
| 0.1757 | 3.2736 | 1352 | 0.1521 | 1164544 |
| 0.1714 | 3.5254 | 1456 | 0.1346 | 1253248 |
| 0.1512 | 3.7772 | 1560 | 0.1374 | 1344000 |
| 0.0868 | 4.0291 | 1664 | 0.1423 | 1432880 |
| 0.1456 | 4.2809 | 1768 | 0.1339 | 1522544 |
| 0.1223 | 4.5327 | 1872 | 0.1340 | 1611760 |
| 0.1404 | 4.7845 | 1976 | 0.1332 | 1702832 |

### Framework versions

- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
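
Results are most reproducible when the runtime matches these pins. A small sanity check, assuming the pip package names match the library names above:

```python
import datasets, peft, tokenizers, torch, transformers

# Compare the installed versions against the pins listed on this card.
for mod, want in [(peft, "0.17.1"), (transformers, "4.51.3"),
                  (datasets, "4.0.0"), (tokenizers, "0.21.4")]:
    assert mod.__version__ == want, f"{mod.__name__} {mod.__version__} != {want}"
print("torch:", torch.__version__)  # card lists 2.10.0+cu128
```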