train_cb_42_1757596053

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Per-epoch validation loss on the evaluation set is listed in the results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
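
The cosine schedule with warmup named above can be sketched in plain Python. This is a minimal sketch, not the training code: it assumes linear warmup followed by a single half-cosine decay to zero (the common form of this schedule), takes the total of 2260 optimizer steps from the last row of the results table, and rounds the warmup length to the nearest step. The function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 5e-05        # learning_rate from the list above
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 2260     # assumption: final step count in the results table


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    # Assumption: warmup length is rounded to the nearest whole step.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 113, 226, 1243, 2260):
        print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions warmup lasts 226 steps, which the results table shows as the end of epoch 2, and the learning rate peaks at 5e-05 there before decaying toward zero at step 2260.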

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2227	1.0	113	0.6365	31088
0.467	2.0	226	0.3558	61872
0.4767	3.0	339	0.6339	93016
0.5349	4.0	452	0.2697	124056
0.3279	5.0	565	0.1693	155240
0.3209	6.0	678	0.1891	185984
0.0175	7.0	791	0.1852	217192
0.0084	8.0	904	0.1616	248456
0.085	9.0	1017	0.1374	279744
0.0003	10.0	1130	0.1618	310888
0.0001	11.0	1243	0.1426	341832
0.0001	12.0	1356	0.1332	372952
0.0001	13.0	1469	0.1352	403768
0.0	14.0	1582	0.1344	434704
0.0	15.0	1695	0.1344	466016
0.0	16.0	1808	0.1367	497200
0.0001	17.0	1921	0.1336	528320
0.0	18.0	2034	0.1321	559408
0.0	19.0	2147	0.1339	590544
0.0	20.0	2260	0.1383	621640
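
One way to read the table is to pick the epoch with the lowest validation loss. The snippet below is a small sketch of that: it copies the Epoch and Validation Loss columns from the table above into a list and takes the minimum. The names `VAL_LOSS_BY_EPOCH` and `best_epoch` are hypothetical, and nothing here says which checkpoint was actually published.

```python
# (epoch, validation loss) pairs copied from the table above.
VAL_LOSS_BY_EPOCH = [
    (1, 0.6365), (2, 0.3558), (3, 0.6339), (4, 0.2697), (5, 0.1693),
    (6, 0.1891), (7, 0.1852), (8, 0.1616), (9, 0.1374), (10, 0.1618),
    (11, 0.1426), (12, 0.1332), (13, 0.1352), (14, 0.1344), (15, 0.1344),
    (16, 0.1367), (17, 0.1336), (18, 0.1321), (19, 0.1339), (20, 0.1383),
]


def best_epoch(rows):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS_BY_EPOCH)
    print(f"lowest validation loss {loss} at epoch {epoch}")
```

On these numbers the minimum is 0.1321 at epoch 18. From epoch 12 onward the validation loss stays between 0.1321 and 0.1383 while the training loss is at or near zero, so the later epochs appear to add little on the evaluation set.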
