train_cb_42_1757595249

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1635
  • Num Input Tokens Seen: 621640

Model description

More information needed

Intended uses & limitations

More information needed
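
Since this checkpoint is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, one likely usage pattern is to load the base model and attach the adapter with `peft`. The sketch below assumes the adapter repo id `rbelanec/train_cb_42_1757595249` (taken from the model tree) and standard `transformers`/`peft` APIs; it is illustrative, not an official usage recipe.

```python
BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "rbelanec/train_cb_42_1757595249"


def load_model(adapter_id: str = ADAPTER, base_id: str = BASE_MODEL):
    """Load the base model and attach the fine-tuned PEFT adapter."""
    # Imports are local so the sketch can be read without transformers/peft installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_model()
```

Note that access to the gated Meta-Llama-3 base weights on the Hugging Face Hub is required before loading.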

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
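
The learning-rate schedule implied by these settings (cosine decay after a 10% warmup, base LR 5e-05) can be sketched in plain Python. This mirrors the behavior of the Trainer's cosine scheduler under the stated hyperparameters; the total step count of 2260 is taken from the training results table below.

```python
import math


def lr_at_step(step: int,
               base_lr: float = 5e-05,
               warmup_ratio: float = 0.1,
               total_steps: int = 2260) -> float:
    """Cosine schedule with linear warmup, as configured for this run."""
    warmup_steps = int(total_steps * warmup_ratio)  # 226 steps of warmup
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, the LR is 0 at step 0, peaks at 5e-05 when warmup ends (step 226), and decays back to 0 by step 2260.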

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.233         | 1.0   | 113  | 0.4873          | 31088             |
| 0.642         | 2.0   | 226  | 0.4480          | 61872             |
| 0.0744        | 3.0   | 339  | 0.3388          | 93016             |
| 0.3811        | 4.0   | 452  | 0.3027          | 124056            |
| 0.2857        | 5.0   | 565  | 0.2438          | 155240            |
| 0.319         | 6.0   | 678  | 0.2082          | 185984            |
| 0.0015        | 7.0   | 791  | 0.3063          | 217192            |
| 0.0786        | 8.0   | 904  | 0.1092          | 248456            |
| 0.1621        | 9.0   | 1017 | 0.2128          | 279744            |
| 0.0004        | 10.0  | 1130 | 0.2852          | 310888            |
| 0.0001        | 11.0  | 1243 | 0.1720          | 341832            |
| 0.0001        | 12.0  | 1356 | 0.1684          | 372952            |
| 0.0001        | 13.0  | 1469 | 0.1634          | 403768            |
| 0.0           | 14.0  | 1582 | 0.1620          | 434704            |
| 0.0           | 15.0  | 1695 | 0.1667          | 466016            |
| 0.0           | 16.0  | 1808 | 0.1622          | 497200            |
| 0.0001        | 17.0  | 1921 | 0.1631          | 528320            |
| 0.0           | 18.0  | 2034 | 0.1550          | 559408            |
| 0.0           | 19.0  | 2147 | 0.1626          | 590544            |
| 0.0           | 20.0  | 2260 | 0.1635          | 621640            |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1