train_cb_789_1768397602

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Validation loss on the evaluation set is reported per checkpoint in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
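
To make the interaction of learning_rate, lr_scheduler_type and lr_scheduler_warmup_ratio concrete, here is a minimal sketch of a cosine schedule with linear warmup in plain Python. It is an illustration, not the training code used for this model. The total of 1130 optimizer steps is an assumption inferred from the training log (step 57 falls at epoch 0.5044, which suggests about 113 steps per epoch over 10 epochs), and the function name cosine_lr_with_warmup is hypothetical.

```python
import math


def cosine_lr_with_warmup(step, total_steps=1130, warmup_ratio=0.1, peak_lr=5e-5):
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay.

    total_steps=1130 is an assumption (about 113 steps per epoch x 10 epochs);
    warmup_ratio and peak_lr are taken from the hyperparameter list.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: peak_lr at progress 0, zero at progress 1.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 57, 113, 570, 1130):
        print(step, f"{cosine_lr_with_warmup(step):.3e}")
```

Under these assumptions the learning rate peaks at 5e-05 once warmup ends (around the end of the first epoch) and decays toward zero by the final step.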

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2063	0.5044	57	0.2563	15680
0.0489	1.0088	114	0.1914	30920
0.2735	1.5133	171	0.1920	46408
0.006	2.0177	228	0.1840	62160
0.6382	2.5221	285	0.2559	77600
0.0881	3.0265	342	0.2233	93536
0.0735	3.5310	399	0.2426	109440
0.1571	4.0354	456	0.2381	124576
0.057	4.5398	513	0.2273	140208
0.0043	5.0442	570	0.2536	155688
0.0012	5.5487	627	0.2614	170968
0.1838	6.0531	684	0.2616	186656
0.2536	6.5575	741	0.2580	202560
0.1287	7.0619	798	0.2665	217872
0.0002	7.5664	855	0.2789	233376
0.0006	8.0708	912	0.2810	249272
0.2171	8.5752	969	0.2723	265000
0.0021	9.0796	1026	0.2731	279984
0.0002	9.5841	1083	0.2649	295536
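
The log above can be scanned for the checkpoint with the lowest validation loss. The sketch below copies the step, epoch, training loss and validation loss columns from the table; the names LOG and best_checkpoint are hypothetical helpers introduced for illustration only.

```python
# (step, epoch, training_loss, validation_loss), copied from the table above.
LOG = [
    (57, 0.5044, 0.2063, 0.2563),
    (114, 1.0088, 0.0489, 0.1914),
    (171, 1.5133, 0.2735, 0.1920),
    (228, 2.0177, 0.006, 0.1840),
    (285, 2.5221, 0.6382, 0.2559),
    (342, 3.0265, 0.0881, 0.2233),
    (399, 3.5310, 0.0735, 0.2426),
    (456, 4.0354, 0.1571, 0.2381),
    (513, 4.5398, 0.057, 0.2273),
    (570, 5.0442, 0.0043, 0.2536),
    (627, 5.5487, 0.0012, 0.2614),
    (684, 6.0531, 0.1838, 0.2616),
    (741, 6.5575, 0.2536, 0.2580),
    (798, 7.0619, 0.1287, 0.2665),
    (855, 7.5664, 0.0002, 0.2789),
    (912, 8.0708, 0.0006, 0.2810),
    (969, 8.5752, 0.2171, 0.2723),
    (1026, 9.0796, 0.0021, 0.2731),
    (1083, 9.5841, 0.0002, 0.2649),
]


def best_checkpoint(log):
    """Return the row with the smallest validation loss."""
    return min(log, key=lambda row: row[3])


if __name__ == "__main__":
    step, epoch, train_loss, val_loss = best_checkpoint(LOG)
    print(f"lowest validation loss {val_loss:.4f} at step {step} (epoch {epoch})")
```

On these numbers the minimum is 0.1840 at step 228, early in the third epoch. Validation loss is higher at every later checkpoint while training loss frequently falls below 0.01, which is consistent with overfitting in the later epochs.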
