model_usp4_dpo1

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.098	2.67	100	0.9024	-5.2759	-6.1963	0.6200	0.9203	-172.1640	-163.4645	-1.3943	-1.4049
0.014	5.33	200	1.0817	-5.6325	-6.8881	0.5900	1.2556	-179.0825	-167.0302	-1.3979	-1.4111
0.0002	8.0	300	1.2922	-11.0538	-12.8527	0.5700	1.7989	-238.7282	-221.2436	-1.1960	-1.2430
0.0001	10.67	400	1.2957	-11.1287	-12.9674	0.5700	1.8388	-239.8755	-221.9918	-1.1895	-1.2369
0.0001	13.33	500	1.3067	-11.1696	-13.0195	0.5700	1.8499	-240.3959	-222.4008	-1.1866	-1.2350
0.0001	16.0	600	1.3094	-11.2106	-13.0741	0.5700	1.8635	-240.9421	-222.8107	-1.1833	-1.2314
0.0001	18.67	700	1.3114	-11.2339	-13.0993	0.5700	1.8654	-241.1942	-223.0445	-1.1811	-1.2298
0.0001	21.33	800	1.3091	-11.2358	-13.1096	0.5700	1.8738	-241.2972	-223.0631	-1.1808	-1.2294
0.0001	24.0	900	1.3126	-11.2442	-13.1117	0.5700	1.8676	-241.3186	-223.1469	-1.1810	-1.2294
0.0001	26.67	1000	1.3096	-11.2358	-13.1040	0.5700	1.8682	-241.2410	-223.0633	-1.1809	-1.2295

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Base model

Adapter

this model