train_cb_42_1760637524

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
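The learning-rate schedule implied by these settings can be sketched as follows. The sketch assumes the schedule matches the cosine-with-warmup behavior used by Hugging Face transformers (linear warmup from 0, then a half-cosine decay to 0) and that the number of warmup steps is ceil(lr_scheduler_warmup_ratio × total steps). The total of 1,140 steps comes from the training results table (57 steps per epoch × 20 epochs). `cosine_lr` is a hypothetical helper for illustration, not part of any library.

```python
import math

# Values taken from the hyperparameter list and the training results table.
PEAK_LR = 0.001          # learning_rate
WARMUP_RATIO = 0.1       # lr_scheduler_warmup_ratio
STEPS_PER_EPOCH = 57     # "Step" column at epoch 1.0
NUM_EPOCHS = 20          # num_epochs
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 1140, the final logged step

# Assumption: warmup steps = ceil(ratio * total), as transformers computes it
# when only a warmup ratio is given.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def cosine_lr(step: int) -> float:
    """Hypothetical helper: learning rate after `step` optimizer steps.

    Linear warmup from 0 to PEAK_LR over WARMUP_STEPS, then a half-cosine
    decay from PEAK_LR down to 0 at TOTAL_STEPS.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for s in (0, 57, WARMUP_STEPS, 627, 912, TOTAL_STEPS):
        print(f"step {s:4d}: lr = {cosine_lr(s):.6f}")
```

Under these assumptions, warmup lasts 114 steps (exactly the first two epochs), the learning rate peaks at 0.001 at step 114, falls to half the peak by step 627, and reaches 0 at the final step.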

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen (cumulative)
0.3463	1.0	57	0.6231	36480
0.341	2.0	114	0.3803	72112
0.3295	3.0	171	0.3839	108712
0.1486	4.0	228	0.2464	145296
0.273	5.0	285	0.2429	181408
0.1861	6.0	342	0.2956	217760
0.133	7.0	399	0.2893	254568
0.2981	8.0	456	0.2437	291000
0.1107	9.0	513	0.2237	327792
0.1137	10.0	570	0.2338	363864
0.1901	11.0	627	0.2368	400104
0.2099	12.0	684	0.1995	436440
0.0337	13.0	741	0.1697	471944
0.0331	14.0	798	0.2350	508424
0.0566	15.0	855	0.2077	545352
0.0141	16.0	912	0.1869	581368
0.0086	17.0	969	0.2149	616776
0.0092	18.0	1026	0.2464	653152
0.0052	19.0	1083	0.2407	689856
0.0044	20.0	1140	0.2406	725992
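To read the table programmatically, the sketch below holds the logged rows (copied verbatim from the table above), selects the epoch with the lowest validation loss, and derives the average number of input tokens per optimizer step from the final row. `EpochResult` is an illustrative name, not something defined by the training code.

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int  # cumulative input tokens at the end of the epoch


# Rows copied from the training results table.
RESULTS = [EpochResult(*row) for row in [
    (0.3463, 1.0, 57, 0.6231, 36480),
    (0.3410, 2.0, 114, 0.3803, 72112),
    (0.3295, 3.0, 171, 0.3839, 108712),
    (0.1486, 4.0, 228, 0.2464, 145296),
    (0.2730, 5.0, 285, 0.2429, 181408),
    (0.1861, 6.0, 342, 0.2956, 217760),
    (0.1330, 7.0, 399, 0.2893, 254568),
    (0.2981, 8.0, 456, 0.2437, 291000),
    (0.1107, 9.0, 513, 0.2237, 327792),
    (0.1137, 10.0, 570, 0.2338, 363864),
    (0.1901, 11.0, 627, 0.2368, 400104),
    (0.2099, 12.0, 684, 0.1995, 436440),
    (0.0337, 13.0, 741, 0.1697, 471944),
    (0.0331, 14.0, 798, 0.2350, 508424),
    (0.0566, 15.0, 855, 0.2077, 545352),
    (0.0141, 16.0, 912, 0.1869, 581368),
    (0.0086, 17.0, 969, 0.2149, 616776),
    (0.0092, 18.0, 1026, 0.2464, 653152),
    (0.0052, 19.0, 1083, 0.2407, 689856),
    (0.0044, 20.0, 1140, 0.2406, 725992),
]]

# Epoch whose evaluation produced the lowest validation loss.
best = min(RESULTS, key=lambda r: r.val_loss)

# Average input tokens processed per optimizer step over the whole run.
final = RESULTS[-1]
tokens_per_step = final.tokens_seen / final.step

if __name__ == "__main__":
    print(f"best epoch: {best.epoch:.0f} (step {best.step}), val loss {best.val_loss}")
    print(f"avg tokens per step: {tokens_per_step:.1f}")
```

On these numbers the lowest validation loss (0.1697) occurs at epoch 13 (step 741). In later epochs the logged training loss approaches zero while validation loss does not improve, which may indicate overfitting; the card does not state which checkpoint the published adapter corresponds to.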

Base model: meta-llama/Meta-Llama-3-8B-Instruct

This model is an adapter trained on top of that base model.