train_cb_42_1760637525

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Evaluation-set results for each epoch are listed in the results table below; the lowest validation loss, 0.0292, was reached at epoch 6 (step 342).

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
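A minimal sketch of the learning-rate curve these settings imply. It takes 57 optimizer steps per epoch from the results table (20 epochs, so 1140 steps in total) and assumes the warmup length is obtained by rounding up total_steps × warmup_ratio, as recent transformers versions do. The formula mirrors the linear-warmup, half-cosine decay of transformers' get_cosine_schedule_with_warmup; the helper name lr_at is hypothetical.

```python
import math

# Values from this card; STEPS_PER_EPOCH is read off the results table.
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
NUM_EPOCHS = 20
STEPS_PER_EPOCH = 57

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 1140
# Assumption: warmup steps are rounded up from the ratio (transformers-style).
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 114


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate at an optimizer step (warmup + cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to PEAK_LR during warmup.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from PEAK_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 57, 114, 627, 1140):
        print(f"step {step:4d}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup spans exactly the first two epochs (114 steps), and the rate has decayed to half its peak by the end of epoch 11 (step 627), reaching 0 at the final step.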

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.135	1.0	57	0.1434	36480
0.2326	2.0	114	0.1093	72112
0.0139	3.0	171	0.1085	108712
0.0032	4.0	228	0.0949	145296
0.0003	5.0	285	0.0308	181408
0.0189	6.0	342	0.0292	217760
0.0001	7.0	399	0.0629	254568
0.0002	8.0	456	0.0427	291000
0.0	9.0	513	0.0475	327792
0.0	10.0	570	0.0469	363864
0.0	11.0	627	0.0463	400104
0.0	12.0	684	0.0535	436440
0.0	13.0	741	0.0497	471944
0.0	14.0	798	0.0585	508424
0.0	15.0	855	0.0569	545352
0.0	16.0	912	0.0574	581368
0.0	17.0	969	0.0575	616776
0.0	18.0	1026	0.0530	653152
0.0	19.0	1083	0.0508	689856
0.0	20.0	1140	0.0513	725992
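The table can also be summarized programmatically. This sketch copies the validation losses and cumulative token counts from it, picks the epoch with the lowest validation loss, and turns the cumulative "Input Tokens Seen" counter into per-epoch counts. The function names are illustrative only, and the table itself does not state which checkpoint the published weights correspond to.

```python
# (epoch, validation_loss, input_tokens_seen), copied from the table above.
RESULTS = [
    (1, 0.1434, 36480),
    (2, 0.1093, 72112),
    (3, 0.1085, 108712),
    (4, 0.0949, 145296),
    (5, 0.0308, 181408),
    (6, 0.0292, 217760),
    (7, 0.0629, 254568),
    (8, 0.0427, 291000),
    (9, 0.0475, 327792),
    (10, 0.0469, 363864),
    (11, 0.0463, 400104),
    (12, 0.0535, 436440),
    (13, 0.0497, 471944),
    (14, 0.0585, 508424),
    (15, 0.0569, 545352),
    (16, 0.0574, 581368),
    (17, 0.0575, 616776),
    (18, 0.0530, 653152),
    (19, 0.0508, 689856),
    (20, 0.0513, 725992),
]


def best_epoch(results):
    """Return (epoch, validation_loss) for the row with the lowest validation loss."""
    epoch, loss, _ = min(results, key=lambda row: row[1])
    return epoch, loss


def tokens_per_epoch(results):
    """Convert the cumulative token counter into tokens processed in each epoch."""
    deltas = []
    previous = 0
    for epoch, _, seen in results:
        deltas.append((epoch, seen - previous))
        previous = seen
    return deltas


if __name__ == "__main__":
    print("best epoch:", best_epoch(RESULTS))
    for epoch, tokens in tokens_per_epoch(RESULTS):
        print(f"epoch {epoch:2d}: {tokens} input tokens")
```

On these numbers the lowest validation loss falls at epoch 6, while the training loss is already reported as 0.0 from epoch 9 onward and every later validation loss stays above 0.0292.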

Model tree: this model is an adapter on the base model meta-llama/Meta-Llama-3-8B-Instruct.