train_cb_456_1768397594

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
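
To make the learning-rate settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, using learning_rate 5e-05 and lr_scheduler_warmup_ratio 0.1. The function name `cosine_lr` and the total of 1130 optimizer steps are assumptions, not values stated in this card: the step count is inferred from the results table, where step 57 corresponds to epoch 0.5044 (about 113 steps per epoch over 10 epochs). The formula is the common linear-warmup, half-cosine-decay form and may differ in small details from what the training framework actually ran.

```python
import math


def cosine_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    # Assumption: warmup length is the rounded fraction of total steps.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    TOTAL_STEPS = 1130  # assumed: ~113 steps per epoch * 10 epochs
    for step in (0, 57, 113, 570, 1130):
        print(step, cosine_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at 5e-05 around step 113, near the end of the first epoch, and decays towards zero by the final step.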

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.6942	0.5044	57	0.3884	15728
0.2738	1.0088	114	0.2075	31592
0.0359	1.5133	171	0.2229	48376
0.2917	2.0177	228	0.2299	63536
0.3753	2.5221	285	0.2244	79744
0.005	3.0265	342	0.2296	94600
0.0457	3.5310	399	0.2253	110776
0.8882	4.0354	456	0.2269	126704
0.0005	4.5398	513	0.2379	142112
0.0351	5.0442	570	0.2386	158152
0.1327	5.5487	627	0.2305	173816
0.0003	6.0531	684	0.2398	189600
0.2059	6.5575	741	0.2376	205776
0.0109	7.0619	798	0.2417	221128
0.0005	7.5664	855	0.2405	237016
0.239	8.0708	912	0.2448	253296
0.2007	8.5752	969	0.2445	268768
0.0066	9.0796	1026	0.2419	284048
0.1443	9.5841	1083	0.2407	299920
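
As a small illustration of how the table above can be read, the sketch below picks the evaluation step with the lowest validation loss. The (step, validation loss) pairs are copied from the table; the variable names are hypothetical, and the card does not say which checkpoint was actually published.

```python
# (step, validation loss) pairs copied from the results table.
eval_log = [
    (57, 0.3884), (114, 0.2075), (171, 0.2229), (228, 0.2299),
    (285, 0.2244), (342, 0.2296), (399, 0.2253), (456, 0.2269),
    (513, 0.2379), (570, 0.2386), (627, 0.2305), (684, 0.2398),
    (741, 0.2376), (798, 0.2417), (855, 0.2405), (912, 0.2448),
    (969, 0.2445), (1026, 0.2419), (1083, 0.2407),
]

# Select the row whose validation loss is smallest.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
print(best_step, best_loss)  # 114 0.2075
```

In this log the minimum occurs at step 114 (about epoch 1), and validation loss stays between roughly 0.22 and 0.25 for the remaining epochs.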
