# train_cb_101112_1760637981
This model is a PEFT fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the cb dataset. It achieves the following results on the evaluation set:
- Loss: 0.1782 (matching the best validation loss in the training results below, reached at epoch 9)
- Num input tokens seen: 723584
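The inference setup is not documented in this card. The snippet below is a minimal, untested sketch that loads the base model and applies this repository's PEFT adapter; the adapter id is taken from the repository name, and the prompt is a placeholder since the prompt format used during training is not recorded here.

```python
# Minimal inference sketch (assumptions: adapter id from the repo name,
# placeholder prompt; the actual prompt format is not documented).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_101112_1760637981"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Placeholder CB-style prompt (premise/hypothesis entailment query).
prompt = (
    "Premise: It was raining all day.\n"
    "Hypothesis: The ground is wet.\n"
    "Does the premise entail the hypothesis? "
    "Answer entailment, contradiction, or neutral:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```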
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
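The data source is not documented. If `cb` refers to the SuperGLUE CommitmentBank task (a common use of that name), loading it with 🤗 Datasets might look like the sketch below; treat the dataset id and config as assumptions.

```python
# Assumption: "cb" is the SuperGLUE CommitmentBank config; this card does
# not confirm it. Labels are entailment / contradiction / neutral.
from datasets import load_dataset

ds = load_dataset("super_glue", "cb")  # splits: train / validation / test
print(ds["train"][0])  # fields: premise, hypothesis, idx, label
```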
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):
- learning_rate: 0.03
- train_batch_size: 4
- eval_batch_size: 4
- seed: 101112
- optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
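As a rough guide, the settings above map onto `transformers.TrainingArguments` as in the sketch below. The actual training script, PEFT configuration, and data preprocessing are not part of this card, so treat this as illustrative rather than a reproduction recipe.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments;
# the output_dir is hypothetical and the real training script is not included.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cb_101112_1760637981",  # hypothetical
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```

A learning rate of 0.03 is far higher than is typical for full-model fine-tuning; it is consistent with a PEFT method (such as prompt tuning) in which only a small set of parameters is updated.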
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.4459 | 1.0 | 57 | 0.2854 | 36112 |
| 0.4375 | 2.0 | 114 | 0.2225 | 71552 |
| 0.3917 | 3.0 | 171 | 0.3163 | 108088 |
| 0.4173 | 4.0 | 228 | 0.1994 | 144720 |
| 0.1829 | 5.0 | 285 | 0.1971 | 181120 |
| 0.158 | 6.0 | 342 | 0.2243 | 217128 |
| 0.2394 | 7.0 | 399 | 0.1963 | 253536 |
| 0.2078 | 8.0 | 456 | 0.1828 | 290112 |
| 0.1586 | 9.0 | 513 | 0.1782 | 325872 |
| 0.1667 | 10.0 | 570 | 0.1894 | 361920 |
| 0.2652 | 11.0 | 627 | 0.1790 | 398432 |
| 0.2114 | 12.0 | 684 | 0.1851 | 435536 |
| 0.1488 | 13.0 | 741 | 0.1975 | 471520 |
| 0.2387 | 14.0 | 798 | 0.2111 | 507256 |
| 0.157 | 15.0 | 855 | 0.2199 | 543064 |
| 0.1676 | 16.0 | 912 | 0.2082 | 579704 |
| 0.1222 | 17.0 | 969 | 0.2069 | 615960 |
| 0.0839 | 18.0 | 1026 | 0.2193 | 652368 |
| 0.112 | 19.0 | 1083 | 0.2239 | 687976 |
| 0.0968 | 20.0 | 1140 | 0.2251 | 723584 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4