train_cb_1757081468

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
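To make the scheduler settings concrete (learning_rate 5e-05, cosine schedule, warmup ratio 0.1), the sketch below computes the learning rate at a given optimizer step. It assumes a linear warmup from 0 to the peak rate followed by a half-cosine decay to zero, which is the shape of the cosine-with-warmup schedule in Hugging Face transformers (get_cosine_schedule_with_warmup), but it is written from scratch here. TOTAL_STEPS = 1130 is an assumption inferred from the training log below (step 57 corresponds to epoch 0.5044, i.e. about 113 steps per epoch, times 10 epochs), and the warmup length is rounded up.

```python
import math

# Values taken from the hyperparameter list.
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1

# Assumption: total optimizer steps inferred from the training log
# (about 113 steps per epoch x 10 epochs); not stated on the card.
TOTAL_STEPS = 1130


def cosine_lr_with_warmup(step: int, total_steps: int = TOTAL_STEPS,
                          peak_lr: float = PEAK_LR,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine: multiplier goes from 1 at progress 0 to 0 at progress 1.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 57, 113, 570, 1130):
    print(s, f"{cosine_lr_with_warmup(s):.3e}")
```

Under these assumptions the warmup covers roughly the first 113 steps, so the learning rate peaks at 5e-05 near the end of the first epoch and then decays towards zero by the final step.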
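The optimizer entry names PyTorch's AdamW (adamw_torch) with betas (0.9, 0.999) and epsilon 1e-08. As an illustration of what a single AdamW update does with these settings, here is a from-scratch scalar sketch; it is not PyTorch's implementation. The card does not report a weight decay, so the sketch assumes 0.0, and the AdamWState container is a hypothetical helper introduced only for this example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AdamWState:
    """Hypothetical container for per-parameter moment estimates and the step count."""
    m: List[float]
    v: List[float]
    t: int = 0


def adamw_step(params: List[float], grads: List[float], state: AdamWState,
               lr: float = 5e-05, betas=(0.9, 0.999), eps: float = 1e-08,
               weight_decay: float = 0.0) -> List[float]:
    """Apply one AdamW update to a flat list of scalar parameters, in place."""
    beta1, beta2 = betas
    state.t += 1
    # Bias corrections for the zero-initialised moment estimates.
    bc1 = 1.0 - beta1 ** state.t
    bc2 = 1.0 - beta2 ** state.t
    for i, g in enumerate(grads):
        # Decoupled weight decay: shrink the weight directly (assumed 0.0 here).
        params[i] -= lr * weight_decay * params[i]
        # Exponential moving averages of the gradient and squared gradient.
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * g
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * g * g
        m_hat = state.m[i] / bc1
        v_hat = state.v[i] / bc2
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


params = [1.0, -2.0]
state = AdamWState(m=[0.0, 0.0], v=[0.0, 0.0])
adamw_step(params, [0.5, -4.0], state)
print(params)
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) is about ±1, so each parameter moves by roughly the learning rate against the sign of its gradient, regardless of the gradient's magnitude.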

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3962	0.5044	57	0.5438	17136
0.5389	1.0088	114	1.2906	32376
0.4136	1.5133	171	0.9869	48728
0.322	2.0177	228	0.2219	64040
0.216	2.5221	285	0.3634	79784
0.0529	3.0265	342	0.3681	96200
0.095	3.5310	399	0.3086	112440
0.1869	4.0354	456	0.2328	128712
0.0598	4.5398	513	0.2687	143944
0.2561	5.0442	570	0.2723	160016
0.0703	5.5487	627	0.3429	176688
0.016	6.0531	684	0.2530	192272
0.0193	6.5575	741	0.4542	208944
0.0018	7.0619	798	0.2367	224288
0.0013	7.5664	855	0.2155	239840
0.0003	8.0708	912	0.2102	255984
0.0015	8.5752	969	0.1897	272064
0.0008	9.0796	1026	0.1867	287928
0.0005	9.5841	1083	0.1829	303800
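The log above records an evaluation every 57 steps. The sketch below parses those rows (copied verbatim from the table), picks the evaluation with the lowest validation loss, and derives the number of optimizer steps per epoch implied by the Step and Epoch columns. The LogRow and parse_log names are illustrative, not part of any training library, and the card does not say which checkpoint the published weights correspond to.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


# Rows copied from the training-results table:
# training loss, epoch, step, validation loss, input tokens seen.
RAW_LOG = """\
0.3962 0.5044 57 0.5438 17136
0.5389 1.0088 114 1.2906 32376
0.4136 1.5133 171 0.9869 48728
0.322 2.0177 228 0.2219 64040
0.216 2.5221 285 0.3634 79784
0.0529 3.0265 342 0.3681 96200
0.095 3.5310 399 0.3086 112440
0.1869 4.0354 456 0.2328 128712
0.0598 4.5398 513 0.2687 143944
0.2561 5.0442 570 0.2723 160016
0.0703 5.5487 627 0.3429 176688
0.016 6.0531 684 0.2530 192272
0.0193 6.5575 741 0.4542 208944
0.0018 7.0619 798 0.2367 224288
0.0013 7.5664 855 0.2155 239840
0.0003 8.0708 912 0.2102 255984
0.0015 8.5752 969 0.1897 272064
0.0008 9.0796 1026 0.1867 287928
0.0005 9.5841 1083 0.1829 303800
"""


def parse_log(raw: str) -> List[LogRow]:
    """Turn whitespace-separated log lines into typed rows."""
    rows = []
    for line in raw.strip().splitlines():
        tl, ep, st, vl, tok = line.split()
        rows.append(LogRow(float(tl), float(ep), int(st), float(vl), int(tok)))
    return rows


rows = parse_log(RAW_LOG)

# Evaluation with the lowest validation loss.
best = min(rows, key=lambda r: r.val_loss)

# Optimizer steps per epoch implied by the first logged row.
steps_per_epoch = round(rows[0].step / rows[0].epoch)

print(f"best validation loss {best.val_loss} at step {best.step} (epoch {best.epoch})")
print(f"about {steps_per_epoch} optimizer steps per epoch")
```

For this log the lowest validation loss is 0.1829 at step 1083, which is also the last logged evaluation, and the implied rate is about 113 optimizer steps per epoch.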


Model tree: this model is an adapter of the base model meta-llama/Meta-Llama-3-8B-Instruct.