model_usp4_dpo9

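This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. The evaluation-set results logged during training are listed in the table below; at the last logged step (step 1000, epoch 26.67) the validation loss is 3.0245, with a reward accuracy of 0.6800 and a reward margin of 5.9286.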

Model description

More information needed

Training hyperparameters

More information needed

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.0474	2.67	100	1.4973	-7.5502	-10.3869	0.6800	2.8366	-121.3912	-121.4387	-0.3436	-0.3419
0.0519	5.33	200	2.4820	-8.4480	-12.1331	0.6800	3.6850	-123.3315	-122.4363	-0.4441	-0.4246
0.042	8.0	300	2.9082	-4.2282	-9.0990	0.6700	4.8708	-119.9602	-117.7476	-0.3275	-0.2971
0.0	10.67	400	3.0274	-8.7246	-14.6605	0.6900	5.9359	-126.1397	-122.7436	-0.5266	-0.4935
0.0	13.33	500	3.0135	-8.7321	-14.6991	0.6900	5.9670	-126.1826	-122.7520	-0.5276	-0.4944
0.0	16.0	600	3.0025	-8.7128	-14.6671	0.7000	5.9542	-126.1470	-122.7305	-0.5289	-0.4956
0.0	18.67	700	3.0086	-8.7343	-14.6314	0.6900	5.8971	-126.1074	-122.7544	-0.5275	-0.4944
0.0	21.33	800	3.0003	-8.7154	-14.6723	0.6800	5.9569	-126.1529	-122.7334	-0.5277	-0.4945
0.0	24.0	900	3.0039	-8.7302	-14.6820	0.6900	5.9518	-126.1636	-122.7498	-0.5274	-0.4944
0.0	26.67	1000	3.0245	-8.7593	-14.6879	0.6800	5.9286	-126.1702	-122.7822	-0.5270	-0.4937
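
The columns above can be cross-checked with a few lines of Python. This is a minimal sketch, assuming the metrics follow the usual preference-optimization logging convention in which Rewards/margins is the mean of (chosen reward minus rejected reward); the card itself does not state this. The helper names (`max_margin_gap`, `lowest_val_loss_step`, `highest_accuracy_step`) are hypothetical and exist only for illustration. The numbers are copied from the table.

```python
# Rows copied from the training-results table:
# (step, validation loss, rewards/chosen, rewards/rejected,
#  rewards/accuracies, rewards/margins)
ROWS = [
    (100, 1.4973, -7.5502, -10.3869, 0.6800, 2.8366),
    (200, 2.4820, -8.4480, -12.1331, 0.6800, 3.6850),
    (300, 2.9082, -4.2282, -9.0990, 0.6700, 4.8708),
    (400, 3.0274, -8.7246, -14.6605, 0.6900, 5.9359),
    (500, 3.0135, -8.7321, -14.6991, 0.6900, 5.9670),
    (600, 3.0025, -8.7128, -14.6671, 0.7000, 5.9542),
    (700, 3.0086, -8.7343, -14.6314, 0.6900, 5.8971),
    (800, 3.0003, -8.7154, -14.6723, 0.6800, 5.9569),
    (900, 3.0039, -8.7302, -14.6820, 0.6900, 5.9518),
    (1000, 3.0245, -8.7593, -14.6879, 0.6800, 5.9286),
]


def max_margin_gap(rows):
    """Largest absolute difference between the logged margin and
    (chosen - rejected). Assumes margin = chosen - rejected."""
    return max(abs((chosen - rejected) - margin)
               for _, _, chosen, rejected, _, margin in rows)


def lowest_val_loss_step(rows):
    """Step whose validation loss is smallest."""
    return min(rows, key=lambda row: row[1])[0]


def highest_accuracy_step(rows):
    """Step whose reward accuracy is largest (first one on ties)."""
    return max(rows, key=lambda row: row[4])[0]


if __name__ == "__main__":
    print("max |(chosen - rejected) - margin|:", max_margin_gap(ROWS))
    print("step with lowest validation loss:", lowest_val_loss_step(ROWS))
    print("step with highest reward accuracy:", highest_accuracy_step(ROWS))
```

Under that assumption the logged margins agree with chosen minus rejected to within rounding. The check also makes the trend in the table easy to see: validation loss is lowest at the first logged step and rises afterwards, while training loss reaches 0.0 from step 400 onward and reward accuracy stays between 0.67 and 0.70. That pattern may indicate overfitting to the training pairs, so an earlier checkpoint could be worth comparing against the final one.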
