train_cb_789_1757596126

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Per-epoch training and validation losses on the evaluation set are listed in the results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 789
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
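
To make the scheduler settings concrete, here is a minimal sketch of how the learning rate would evolve under linear warmup followed by cosine decay. It assumes the total step count is the final value in the Step column of the results table (2260) and that the warmup length is the warmup ratio times that total. The function name lr_at is hypothetical, and this is an illustration of the schedule shape rather than the trainer's own code.

```python
import math

PEAK_LR = 5e-05      # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 2260   # assumption: last value in the Step column of the results table


def lr_at(step: int,
          peak_lr: float = PEAK_LR,
          total_steps: int = TOTAL_STEPS,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak learning rate down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at the end of a few epochs (113 steps per epoch).
    for epoch in (1, 2, 5, 10, 20):
        print(epoch, lr_at(epoch * 113))
```

Under these assumptions the warmup covers the first 226 steps, which matches the first two epochs in the results table.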

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2828	1.0	113	1.1051	30520
0.1389	2.0	226	0.4152	61312
0.5868	3.0	339	0.2018	92192
0.3556	4.0	452	0.2638	122752
0.1219	5.0	565	0.3085	153112
0.0153	6.0	678	0.6845	183568
0.3057	7.0	791	0.6521	214352
0.0119	8.0	904	0.5289	245208
0.0001	9.0	1017	0.5546	275632
0.0004	10.0	1130	0.6658	306152
0.0	11.0	1243	0.6691	336688
0.0	12.0	1356	0.6779	367392
0.0001	13.0	1469	0.6813	398224
0.0001	14.0	1582	0.6873	428448
0.0	15.0	1695	0.6911	459320
0.0	16.0	1808	0.6964	489768
0.0	17.0	1921	0.6963	520440
0.0	18.0	2034	0.6961	551464
0.0001	19.0	2147	0.6963	582408
0.0001	20.0	2260	0.6940	612968
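
The table can be scanned for the epoch with the lowest validation loss. The sketch below copies the Epoch, Training Loss, and Validation Loss columns by hand. The names ROWS and best_row are hypothetical. It only reads the numbers reported above and makes no claim about which checkpoint was actually published.

```python
# (epoch, training_loss, validation_loss), copied from the results table
ROWS = [
    (1, 0.2828, 1.1051), (2, 0.1389, 0.4152), (3, 0.5868, 0.2018),
    (4, 0.3556, 0.2638), (5, 0.1219, 0.3085), (6, 0.0153, 0.6845),
    (7, 0.3057, 0.6521), (8, 0.0119, 0.5289), (9, 0.0001, 0.5546),
    (10, 0.0004, 0.6658), (11, 0.0, 0.6691), (12, 0.0, 0.6779),
    (13, 0.0001, 0.6813), (14, 0.0001, 0.6873), (15, 0.0, 0.6911),
    (16, 0.0, 0.6964), (17, 0.0, 0.6963), (18, 0.0, 0.6961),
    (19, 0.0001, 0.6963), (20, 0.0001, 0.6940),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, train_loss, val_loss = best_row(ROWS)
    print(f"best epoch: {epoch}, validation loss: {val_loss:.4f}")
    # -> best epoch: 3, validation loss: 0.2018
```

Validation loss is lowest at epoch 3 and settles near 0.69 from epoch 10 onward, while training loss falls to roughly zero. That pattern would be consistent with overfitting in the later epochs, so an early checkpoint may be worth comparing against the final one.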
