# train_cb_101112_1760637984
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the cb dataset. It achieves the following results on the evaluation set:

- Loss: 0.9753 (best validation loss, reached at epoch 11; see the training results below)
- Num input tokens seen: 723584
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
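The card does not document the data source, but "cb" plausibly refers to CommitmentBank from the SuperGLUE benchmark. Below is a minimal loading sketch under that assumption; the dataset id and field layout are assumptions, not confirmed by this card:

```python
from datasets import load_dataset

# Assumption: "cb" is CommitmentBank from SuperGLUE; the card itself
# does not confirm the dataset source or its Hub id.
cb = load_dataset("super_glue", "cb")

# CommitmentBank is a three-way NLI task: each example pairs a premise
# with a hypothesis labeled entailment / contradiction / neutral.
print(cb["train"][0])
```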
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 101112
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
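
For reference, these settings map onto Hugging Face `TrainingArguments` roughly as follows. This is a hedged reconstruction, not the run's actual script; `output_dir` and the per-epoch evaluation setting are assumptions:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above; output_dir and
# eval_strategy are assumptions (validation loss is reported once per epoch).
training_args = TrainingArguments(
    output_dir="train_cb_101112_1760637984",
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",
)
```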
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 1.2005 | 1.0 | 57 | 1.1137 | 36112 |
| 0.9465 | 2.0 | 114 | 1.1054 | 71552 |
| 1.1672 | 3.0 | 171 | 1.0676 | 108088 |
| 1.0103 | 4.0 | 228 | 1.0539 | 144720 |
| 0.9606 | 5.0 | 285 | 1.0277 | 181120 |
| 0.8504 | 6.0 | 342 | 1.0145 | 217128 |
| 0.956 | 7.0 | 399 | 1.0001 | 253536 |
| 0.9733 | 8.0 | 456 | 1.0024 | 290112 |
| 0.9202 | 9.0 | 513 | 0.9897 | 325872 |
| 0.9889 | 10.0 | 570 | 0.9847 | 361920 |
| 0.9169 | 11.0 | 627 | 0.9753 | 398432 |
| 0.8499 | 12.0 | 684 | 0.9816 | 435536 |
| 0.9133 | 13.0 | 741 | 0.9839 | 471520 |
| 0.9967 | 14.0 | 798 | 0.9842 | 507256 |
| 1.1597 | 15.0 | 855 | 0.9895 | 543064 |
| 1.075 | 16.0 | 912 | 0.9756 | 579704 |
| 1.0447 | 17.0 | 969 | 0.9865 | 615960 |
| 0.7836 | 18.0 | 1026 | 0.9890 | 652368 |
| 0.7811 | 19.0 | 1083 | 0.9975 | 687976 |
| 0.9389 | 20.0 | 1140 | 0.9975 | 723584 |
### Framework versions

- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
## Model tree for rbelanec/train_cb_101112_1760637984

- Base model: [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
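
Given the PEFT version listed above, this checkpoint is presumably a PEFT adapter applied on top of the base model rather than a full set of weights. A minimal loading sketch under that assumption (device placement and dtype are illustrative):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_101112_1760637984"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```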