train_cb_1757340264

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Its per-epoch results on the evaluation set are listed in the table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
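
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, using only the Python standard library. It is an illustration under stated assumptions, not the training code behind this card: the total step count (2260) is read from the last row of the results table, the warmup length is assumed to be the warmup ratio times that total, and the decay is assumed to be a single half-cosine down to zero. The names `lr_at`, `PEAK_LR`, `WARMUP_STEPS`, and `TOTAL_STEPS` are hypothetical.

```python
import math

# Values taken from the hyperparameter list and the results table.
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 2260  # last step in the results table (20 epochs x 113 steps)

# Assumption: warmup length is the warmup ratio applied to the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # 226


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    # Assumption: one half-cosine from the peak down to zero.
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 113, 226, 1130, 2260):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup would end at step 226, which is the end of epoch 2 in the results table, and the learning rate would decay to zero by the final step.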

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4167	1.0	113	0.2356	30240
0.6586	2.0	226	0.3422	61600
0.4559	3.0	339	0.1848	92552
0.3953	4.0	452	0.2929	123976
0.0964	5.0	565	0.2887	155224
0.1943	6.0	678	0.2239	186368
0.0136	7.0	791	0.1439	217280
0.0123	8.0	904	0.1020	248064
0.0001	9.0	1017	0.2388	278576
0.0011	10.0	1130	0.1703	309584
0.0001	11.0	1243	0.2147	340752
0.0	12.0	1356	0.2641	372240
0.0	13.0	1469	0.2673	402976
0.0001	14.0	1582	0.2684	433800
0.0	15.0	1695	0.2671	465096
0.0	16.0	1808	0.2721	496184
0.0	17.0	1921	0.2717	527400
0.0	18.0	2034	0.2720	558656
0.0	19.0	2147	0.2725	589928
0.0	20.0	2260	0.2709	621040
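
The table can be summarized with a few lines of code. The sketch below copies the epoch, step, training loss, and validation loss columns from the rows above and picks out the epoch with the lowest validation loss; the variable names are hypothetical, and nothing here is part of the original training run.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above.
ROWS = [
    (1, 113, 0.4167, 0.2356),
    (2, 226, 0.6586, 0.3422),
    (3, 339, 0.4559, 0.1848),
    (4, 452, 0.3953, 0.2929),
    (5, 565, 0.0964, 0.2887),
    (6, 678, 0.1943, 0.2239),
    (7, 791, 0.0136, 0.1439),
    (8, 904, 0.0123, 0.1020),
    (9, 1017, 0.0001, 0.2388),
    (10, 1130, 0.0011, 0.1703),
    (11, 1243, 0.0001, 0.2147),
    (12, 1356, 0.0, 0.2641),
    (13, 1469, 0.0, 0.2673),
    (14, 1582, 0.0001, 0.2684),
    (15, 1695, 0.0, 0.2671),
    (16, 1808, 0.0, 0.2721),
    (17, 1921, 0.0, 0.2717),
    (18, 2034, 0.0, 0.2720),
    (19, 2147, 0.0, 0.2725),
    (20, 2260, 0.0, 0.2709),
]

# Epoch with the lowest validation loss.
best = min(ROWS, key=lambda row: row[3])
# Last logged epoch.
final = ROWS[-1]
# First epoch whose training loss is reported as 0.0 (at the table's precision).
first_zero = next(row for row in ROWS if row[2] == 0.0)

print(f"best epoch:  {best[0]} (step {best[1]}), validation loss {best[3]:.4f}")
print(f"final epoch: {final[0]} (step {final[1]}), validation loss {final[3]:.4f}")
print(f"training loss first reported as 0.0 at epoch {first_zero[0]}")
```

On these numbers the lowest validation loss is at epoch 8, while the training loss is reported as 0.0 from epoch 12 onward and the validation loss settles around 0.27. That pattern is consistent with overfitting in the later epochs, so an earlier checkpoint may generalize better if one was saved; the card does not say which checkpoint was published.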

Base model: meta-llama/Meta-Llama-3-8B-Instruct (this model is an adapter)