train_cb_42_1760637527

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Per the training results table, it reaches a validation loss of 0.1031 on the evaluation set at the final epoch (epoch 20, 725,992 input tokens seen).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
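
To make the scheduler settings concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the learning rate and warmup ratio listed above. It assumes the total step count is 1140 (the last step in the training results table) and that warmup length is the ratio times that total. `lr_at` is a hypothetical helper written for illustration, not the code that produced this run.

```python
import math

BASE_LR = 5e-05       # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 1140    # assumption: final step in the training results table

# Assumption: warmup length is the warmup ratio times the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from the base learning rate down to 0.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 57, 114, 627, 1140):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup covers the first two epochs (114 steps), which is also where the table shows the largest drop in loss.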

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.0884	1.0	57	1.0677	36480
0.7728	2.0	114	0.4761	72112
0.1341	3.0	171	0.1235	108712
0.0575	4.0	228	0.1208	145296
0.0977	5.0	285	0.1131	181408
0.1258	6.0	342	0.1171	217760
0.0496	7.0	399	0.1128	254568
0.3172	8.0	456	0.1145	291000
0.0538	9.0	513	0.1123	327792
0.1347	10.0	570	0.1072	363864
0.1298	11.0	627	0.1069	400104
0.2924	12.0	684	0.1078	436440
0.1389	13.0	741	0.1062	471944
0.0521	14.0	798	0.1061	508424
0.1696	15.0	855	0.1049	545352
0.1024	16.0	912	0.1067	581368
0.0598	17.0	969	0.1048	616776
0.0994	18.0	1026	0.1045	653152
0.0973	19.0	1083	0.1085	689856
0.0289	20.0	1140	0.1031	725992
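
The validation column is easier to read with a quick summary. The sketch below copies the validation losses from the table and reports the epoch with the lowest loss and the epochs where the loss rose relative to the previous one. The variable names are illustrative only.

```python
# Validation loss per epoch, copied from the table (epochs 1 to 20).
val_loss = [
    1.0677, 0.4761, 0.1235, 0.1208, 0.1131,
    0.1171, 0.1128, 0.1145, 0.1123, 0.1072,
    0.1069, 0.1078, 0.1062, 0.1061, 0.1049,
    0.1067, 0.1048, 0.1045, 0.1085, 0.1031,
]

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_loss = val_loss[best_epoch - 1]

# Epochs (1-based) where validation loss increased over the previous epoch.
upticks = [i + 1 for i in range(1, len(val_loss)) if val_loss[i] > val_loss[i - 1]]

print(f"best epoch: {best_epoch}, validation loss: {best_loss}")
print(f"epochs with a higher loss than the previous epoch: {upticks}")
```

Validation loss changes little after epoch 3, while the logged training loss stays noisy from step to step.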

Model tree: this model is an adapter on the base model meta-llama/Meta-Llama-3-8B-Instruct.
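
Since the model tree lists this model as an adapter, one plausible way to use it is to load the base model and attach the adapter weights with the PEFT library. The sketch below assumes the adapter is stored in PEFT format and that `transformers` and `peft` are installed. `load_adapter` is a hypothetical helper, and the adapter repository id is not given in this card, so it is left as a parameter.

```python
def load_adapter(adapter_id: str,
                 base_id: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
    """Load the base model and attach the adapter (hypothetical helper).

    adapter_id is the Hub repository id or local path of this adapter;
    it is an assumption supplied by the caller, not stated in the card.
    """
    # Imported lazily so that defining the helper does not require the packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights (assumes PEFT format).
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter` downloads the base model weights, which requires access to the meta-llama repository on the Hub.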