model_hh_shp4_400

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. Its results on the evaluation set at each logged checkpoint are given in the table below.

Model description

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.0234	4.0	100	1.9125	1.9279	1.7933	0.6000	0.1345	-212.4782	-237.2473	-0.4999	-0.5089
0.0617	8.0	200	3.6324	1.3369	2.0986	0.5100	-0.7617	-212.1390	-237.9039	-0.8276	-0.7974
0.013	12.0	300	4.9916	-0.1751	0.9862	0.5100	-1.1614	-213.3750	-239.5840	-0.6475	-0.6290
0.0	16.0	400	4.9706	-8.6036	-8.4956	0.5200	-0.1080	-223.9103	-248.9490	-0.7141	-0.6786
0.0	20.0	500	4.9635	-8.6143	-8.5440	0.5	-0.0704	-223.9641	-248.9609	-0.7130	-0.6773
0.0	24.0	600	4.9862	-8.5996	-8.4640	0.5100	-0.1356	-223.8752	-248.9445	-0.7137	-0.6781
0.0	28.0	700	4.9701	-8.6180	-8.5034	0.5	-0.1147	-223.9190	-248.9650	-0.7143	-0.6782
0.0	32.0	800	4.9508	-8.5621	-8.5009	0.5100	-0.0612	-223.9163	-248.9029	-0.7140	-0.6782
0.0	36.0	900	4.9729	-8.6333	-8.5082	0.5	-0.1252	-223.9243	-248.9819	-0.7143	-0.6783
0.0	40.0	1000	4.9693	-8.6216	-8.5158	0.5100	-0.1058	-223.9328	-248.9690	-0.7143	-0.6784
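
The reward columns above can be cross-checked against each other. The sketch below assumes that Rewards/margins is the mean difference between Rewards/chosen and Rewards/rejected, as is conventional for preference-optimization trainers that log these metric names; the card itself does not state which trainer produced them. The rows are copied from the table, and the helper names `recomputed_margin` and `best_step_by_validation_loss` are hypothetical, introduced only for illustration.

```python
# Rows copied from the table above:
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (100, 1.9125, 1.9279, 1.7933, 0.1345),
    (200, 3.6324, 1.3369, 2.0986, -0.7617),
    (300, 4.9916, -0.1751, 0.9862, -1.1614),
    (400, 4.9706, -8.6036, -8.4956, -0.1080),
    (500, 4.9635, -8.6143, -8.5440, -0.0704),
    (600, 4.9862, -8.5996, -8.4640, -0.1356),
    (700, 4.9701, -8.6180, -8.5034, -0.1147),
    (800, 4.9508, -8.5621, -8.5009, -0.0612),
    (900, 4.9729, -8.6333, -8.5082, -0.1252),
    (1000, 4.9693, -8.6216, -8.5158, -0.1058),
]


def recomputed_margin(chosen: float, rejected: float) -> float:
    """Assumed definition: margin = chosen reward minus rejected reward."""
    return chosen - rejected


def best_step_by_validation_loss(rows) -> int:
    """Return the step whose logged validation loss is lowest."""
    return min(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    for step, val_loss, chosen, rejected, margin in ROWS:
        diff = recomputed_margin(chosen, rejected)
        print(f"step {step:4d}: reported margin {margin:+.4f}, "
              f"chosen - rejected {diff:+.4f}, validation loss {val_loss:.4f}")
    print("lowest validation loss at step", best_step_by_validation_loss(ROWS))
```

Under that assumption the reported margins agree with the recomputed differences to within rounding of the four-decimal values, and the lowest logged validation loss is the one at step 100.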
