train_cb_42_1767875486

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1156
  • Num Input Tokens Seen: 313000

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
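The schedule above (cosine with a 10% warmup ratio) can be sketched in plain Python. This is an illustrative approximation of the `cosine` scheduler in transformers, not the library's exact implementation; the total step count of ~1130 is inferred from the results table (57 steps per 0.5044 epoch ≈ 113 steps/epoch over 10 epochs):

```python
import math

def cosine_lr(step, total_steps=1130, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at `step` under linear warmup + cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr over the first 10% of steps.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate peaks at 5e-05 once warmup ends (step 113) and decays to zero by the final step.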

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 0.4795        | 0.5044 | 57   | 0.2893          | 15552             |
| 0.3258        | 1.0088 | 114  | 0.1449          | 31616             |
| 0.0192        | 1.5133 | 171  | 0.1440          | 46416             |
| 0.4173        | 2.0177 | 228  | 0.1411          | 62960             |
| 0.1101        | 2.5221 | 285  | 0.1376          | 78816             |
| 1.0026        | 3.0265 | 342  | 0.1310          | 94776             |
| 0.7717        | 3.5310 | 399  | 0.1266          | 110632            |
| 0.4419        | 4.0354 | 456  | 0.1324          | 125984            |
| 0.0126        | 4.5398 | 513  | 0.1156          | 140528            |
| 0.0181        | 5.0442 | 570  | 0.1495          | 157360            |
| 0.4873        | 5.5487 | 627  | 0.1534          | 173696            |
| 0.0234        | 6.0531 | 684  | 0.1282          | 188944            |
| 0.0110        | 6.5575 | 741  | 0.1217          | 205552            |
| 0.0517        | 7.0619 | 798  | 0.1336          | 220952            |
| 0.0033        | 7.5664 | 855  | 0.1375          | 236872            |
| 0.3956        | 8.0708 | 912  | 0.1351          | 252408            |
| 0.0014        | 8.5752 | 969  | 0.1361          | 268296            |
| 0.0006        | 9.0796 | 1026 | 0.1273          | 284224            |
| 0.0706        | 9.5841 | 1083 | 0.1289          | 300048            |
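A quick check of the table confirms that the reported evaluation loss (0.1156) is the minimum validation loss over training, reached at step 513 (epoch ~4.54), after which validation loss drifts upward while training loss keeps falling:

```python
# (step, validation_loss) pairs copied from the training results table.
eval_log = [
    (57, 0.2893), (114, 0.1449), (171, 0.1440), (228, 0.1411),
    (285, 0.1376), (342, 0.1310), (399, 0.1266), (456, 0.1324),
    (513, 0.1156), (570, 0.1495), (627, 0.1534), (684, 0.1282),
    (741, 0.1217), (798, 0.1336), (855, 0.1375), (912, 0.1351),
    (969, 0.1361), (1026, 0.1273), (1083, 0.1289),
]

# Find the checkpoint with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
print(best_step, best_loss)  # 513 0.1156
```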

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
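Since this is a PEFT adapter rather than a full model, it must be loaded on top of its base model. A minimal sketch with the versions listed above, assuming the adapter repo id `rbelanec/train_cb_42_1767875486` and access to the gated meta-llama base weights (not run here):

```python
# Sketch: load the PEFT adapter on top of the Llama-3-8B-Instruct base model.
# Assumes you have been granted access to the gated base checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_42_1767875486"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the fine-tuned adapter weights to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```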