# model_hh_usp3_400
This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset; a hedged loading sketch follows the metrics below. It achieves the following results on the evaluation set:
- Loss: 3.1160
- Rewards/chosen: -8.2855
- Rewards/rejected: -15.5942
- Rewards/accuracies: 0.6700
- Rewards/margins: 7.3087
- Logps/rejected: -130.3543
- Logps/chosen: -121.6985
- Logits/rejected: -0.6216
- Logits/chosen: -0.5451
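The card does not ship a usage snippet. The following is a minimal sketch of how a PEFT adapter published under this name might be loaded on top of the base model; the repository id `guoyu-zhang/model_hh_usp3_400` is taken from the model page, and the prompt format is only a guess based on the Anthropic HH style the model name suggests. Adjust dtype, device placement, and prompting to your setup.

```python
# Hedged sketch: load the gated Llama-2-7b-chat base weights and apply this PEFT adapter.
# Assumes the adapter is hosted at "guoyu-zhang/model_hh_usp3_400".
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-chat-hf"
adapter_id = "guoyu-zhang/model_hh_usp3_400"  # repository id from the model page

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Prompt format is an assumption (HH-style dialogue implied by the model name).
prompt = "Human: How do I brew a good cup of coffee?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```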
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
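The training script itself is not included in the card. As a rough illustration only, the hyperparameters above map onto a Hugging Face `TrainingArguments` configuration like the sketch below; the output directory is assumed, and the trainer that consumed these arguments (the rewards/* metrics suggest DPO-style preference training, e.g. TRL's `DPOTrainer`) is likewise an assumption. The listed Adam betas and epsilon match the Transformers AdamW defaults.

```python
# Hedged sketch: the listed hyperparameters expressed as a Transformers TrainingArguments
# object. The trainer that used them is not stated in the card and is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="model_hh_usp3_400",     # assumed output path
    learning_rate=5e-4,                 # 0.0005
    per_device_train_batch_size=4,      # train_batch_size
    per_device_eval_batch_size=1,       # eval_batch_size
    gradient_accumulation_steps=4,      # total train batch size: 4 * 4 = 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                     # training_steps
    seed=42,
    optim="adamw_torch",                # Adam with betas=(0.9, 0.999), eps=1e-08 (defaults)
)
```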
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 4.0 | 100 | 1.2916 | -0.4582 | -4.3086 | 0.6700 | 3.8504 | -117.8148 | -113.0015 | -0.2184 | -0.2363 |
| 0.0779 | 8.0 | 200 | 2.2220 | -3.5887 | -8.9487 | 0.6700 | 5.3600 | -122.9704 | -116.4798 | -0.6463 | -0.6426 |
| 0.0002 | 12.0 | 300 | 2.6768 | -2.9215 | -9.1033 | 0.6700 | 6.1818 | -123.1422 | -115.7384 | -0.5538 | -0.4825 |
| 0.0 | 16.0 | 400 | 3.0879 | -8.2794 | -15.6271 | 0.6700 | 7.3476 | -130.3908 | -121.6917 | -0.6205 | -0.5443 |
| 0.0 | 20.0 | 500 | 3.0933 | -8.2829 | -15.6299 | 0.6700 | 7.3470 | -130.3939 | -121.6956 | -0.6209 | -0.5444 |
| 0.0 | 24.0 | 600 | 3.0984 | -8.2550 | -15.6140 | 0.6800 | 7.3590 | -130.3763 | -121.6645 | -0.6208 | -0.5443 |
| 0.0 | 28.0 | 700 | 3.0852 | -8.2794 | -15.5895 | 0.6800 | 7.3102 | -130.3491 | -121.6916 | -0.6204 | -0.5440 |
| 0.0 | 32.0 | 800 | 3.0838 | -8.2687 | -15.6392 | 0.6700 | 7.3705 | -130.4043 | -121.6798 | -0.6212 | -0.5448 |
| 0.0 | 36.0 | 900 | 3.0836 | -8.2681 | -15.6105 | 0.6700 | 7.3424 | -130.3724 | -121.6791 | -0.6211 | -0.5444 |
| 0.0 | 40.0 | 1000 | 3.1160 | -8.2855 | -15.5942 | 0.6700 | 7.3087 | -130.3543 | -121.6985 | -0.6216 | -0.5451 |
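For orientation, the margins column is simply the gap between the chosen and rejected rewards, which the table bears out at every checkpoint; at the final step (1000), for example:

```latex
\text{rewards/margins} = \text{rewards/chosen} - \text{rewards/rejected}
                       = -8.2855 - (-15.5942) = 7.3087
```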
### Framework versions
- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2