train_cb_456_1757596102

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
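
To make the scheduler settings concrete, here is a minimal sketch of how a cosine schedule with a 0.1 warmup ratio could shape the learning rate over this run. It assumes the schedule spans the 2260 steps shown in the training results, that warmup is linear, and that the decay is a single half cosine ending at zero. The function name `cosine_lr_with_warmup` and the rounding of the warmup step count are illustrative choices, not taken from the training code.

```python
import math

def cosine_lr_with_warmup(step, base_lr=5e-05, total_steps=2260, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    # Assumption: warmup length is the ratio times the total step count (226 here).
    warmup_steps = int(round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Print the rate at a few points: start, mid-warmup, peak, midway, end.
for step in (0, 113, 226, 1243, 2260):
    print(step, cosine_lr_with_warmup(step))
```

Under these assumptions the learning rate peaks at 5e-05 at step 226 (the end of epoch 2) and decays to zero by step 2260.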

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2505	1.0	113	0.4036	31064
0.2659	2.0	226	0.5713	62304
0.2465	3.0	339	0.1494	93232
0.0531	4.0	452	0.1094	124680
0.3213	5.0	565	0.1122	155672
0.1717	6.0	678	0.0656	186688
0.0001	7.0	791	0.3050	217736
0.0003	8.0	904	0.0683	248784
0.0	9.0	1017	0.0873	279688
0.0	10.0	1130	0.0826	310504
0.0001	11.0	1243	0.0818	341152
0.0	12.0	1356	0.0865	371768
0.0	13.0	1469	0.0811	402896
0.0	14.0	1582	0.0840	433768
0.0	15.0	1695	0.0824	464816
0.0	16.0	1808	0.0821	496216
0.0	17.0	1921	0.0823	527360
0.0	18.0	2034	0.0810	558088
0.0	19.0	2147	0.0824	589072
0.0	20.0	2260	0.0808	620240
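
The table can be summarized with a short script. The sketch below copies the validation losses from the rows above and reports the epoch with the lowest value alongside the final-epoch value. The variable names are illustrative, and nothing here indicates which checkpoint was actually published.

```python
# Validation loss per epoch (epochs 1 to 20), copied from the table above.
val_loss = [
    0.4036, 0.5713, 0.1494, 0.1094, 0.1122,
    0.0656, 0.3050, 0.0683, 0.0873, 0.0826,
    0.0818, 0.0865, 0.0811, 0.0840, 0.0824,
    0.0821, 0.0823, 0.0810, 0.0824, 0.0808,
]

# Epochs are 1-indexed in the table, so shift the list index by one.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_loss = val_loss[best_epoch - 1]
final_loss = val_loss[-1]

# Steps per epoch, from the final step count and the number of epochs.
steps_per_epoch = 2260 // 20

print(f"lowest validation loss: {best_loss} at epoch {best_epoch}")
print(f"final validation loss:  {final_loss} at epoch {len(val_loss)}")
print(f"steps per epoch:        {steps_per_epoch}")
```

The training loss reaches 0.0 from epoch 9 onward while the validation loss stays between 0.08 and 0.09, so the later epochs bring no improvement over the epoch 6 minimum.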
