model_hh_usp2_400

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. Its evaluation metrics at each logged checkpoint appear in the table below.

Model description

More information needed

The training hyperparameters are not listed. Training and evaluation metrics were logged every 100 steps (every 4 epochs):

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0066 | 4.0 | 100 | 2.4563 | -5.8278 | -7.6948 | 0.5900 | 1.8670 | -124.3548 | -120.4786 | -0.0743 | -0.0887 |
| 0.042 | 8.0 | 200 | 2.8011 | -2.8779 | -4.7588 | 0.5300 | 1.8808 | -121.0926 | -117.2011 | 0.3427 | 0.3247 |
| 0.0009 | 12.0 | 300 | 3.2063 | -16.1959 | -19.1144 | 0.5500 | 2.9186 | -137.0433 | -131.9988 | 0.1998 | 0.1756 |
| 0.0001 | 16.0 | 400 | 3.1047 | -10.1343 | -12.7872 | 0.5800 | 2.6529 | -130.0131 | -125.2637 | 0.1757 | 0.1437 |
| 0.0 | 20.0 | 500 | 3.1359 | -10.1980 | -12.8447 | 0.5800 | 2.6467 | -130.0769 | -125.3345 | 0.1736 | 0.1412 |
| 0.0 | 24.0 | 600 | 3.1186 | -10.1842 | -12.8467 | 0.5800 | 2.6625 | -130.0792 | -125.3191 | 0.1732 | 0.1409 |
| 0.0 | 28.0 | 700 | 3.1174 | -10.2101 | -12.8729 | 0.5900 | 2.6628 | -130.1082 | -125.3479 | 0.1733 | 0.1406 |
| 0.0 | 32.0 | 800 | 3.1257 | -10.1973 | -12.8683 | 0.5900 | 2.6711 | -130.1032 | -125.3336 | 0.1735 | 0.1409 |
| 0.0 | 36.0 | 900 | 3.1112 | -10.1620 | -12.8766 | 0.5800 | 2.7147 | -130.1124 | -125.2944 | 0.1735 | 0.1413 |
| 0.0 | 40.0 | 1000 | 3.1146 | -10.1910 | -12.8552 | 0.5700 | 2.6642 | -130.0886 | -125.3267 | 0.1734 | 0.1410 |
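
The column names match those logged by preference-optimization trainers such as TRL's DPOTrainer, where Rewards/margins is the mean of the chosen reward minus the rejected reward. This card does not state which trainer was used, so that relationship is an assumption here. The sketch below checks it against the rows above and picks out the checkpoint with the lowest validation loss. The names `ROWS`, `margin_mismatches`, and `lowest_validation_loss` are illustrative and not part of any library.

```python
# Each tuple: (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins),
# copied from the table above.
ROWS = [
    (100, 2.4563, -5.8278, -7.6948, 1.8670),
    (200, 2.8011, -2.8779, -4.7588, 1.8808),
    (300, 3.2063, -16.1959, -19.1144, 2.9186),
    (400, 3.1047, -10.1343, -12.7872, 2.6529),
    (500, 3.1359, -10.1980, -12.8447, 2.6467),
    (600, 3.1186, -10.1842, -12.8467, 2.6625),
    (700, 3.1174, -10.2101, -12.8729, 2.6628),
    (800, 3.1257, -10.1973, -12.8683, 2.6711),
    (900, 3.1112, -10.1620, -12.8766, 2.7147),
    (1000, 3.1146, -10.1910, -12.8552, 2.6642),
]


def margin_mismatches(rows, tol=2e-4):
    """Return the steps where margin != chosen - rejected beyond rounding error.

    The table values are rounded to four decimals, so a small tolerance is needed.
    """
    bad_steps = []
    for step, _val_loss, chosen, rejected, margin in rows:
        if abs((chosen - rejected) - margin) > tol:
            bad_steps.append(step)
    return bad_steps


def lowest_validation_loss(rows):
    """Return (step, validation loss) for the row with the smallest validation loss."""
    best = min(rows, key=lambda row: row[1])
    return best[0], best[1]


if __name__ == "__main__":
    print("margin mismatches:", margin_mismatches(ROWS))
    print("lowest validation loss at:", lowest_validation_loss(ROWS))
```

On these rows the reported margins agree with chosen minus rejected to within rounding, and the lowest validation loss (2.4563) is at step 100, while the training loss reaches 0.0 from step 500 onward.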
