# model_usp2_dpo9

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 3.0422
- Rewards/chosen: -16.6521
- Rewards/rejected: -22.4980
- Rewards/accuracies: 0.6900
- Rewards/margins: 5.8460
- Logps/rejected: -134.2057
- Logps/chosen: -127.2480
- Logits/rejected: -0.3729
- Logits/chosen: -0.3027
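As context for the reward numbers above: in the usual DPO (and TRL) convention, the chosen/rejected rewards are the policy-vs-reference log-probability ratios scaled by β, the margin is their difference, and accuracy is the fraction of pairs with a positive margin. A minimal sketch of that bookkeeping, with β = 0.1 as an illustrative value only (the card does not state the β used):

```python
import math

def dpo_stats(beta, logp_pi_chosen, logp_ref_chosen,
              logp_pi_rejected, logp_ref_rejected):
    """Return (reward_chosen, reward_rejected, margin, loss) for one pair."""
    reward_chosen = beta * (logp_pi_chosen - logp_ref_chosen)
    reward_rejected = beta * (logp_pi_rejected - logp_ref_rejected)
    margin = reward_chosen - reward_rejected
    # DPO loss: -log sigmoid(margin); "rewards/accuracies" counts the
    # fraction of evaluation pairs where margin > 0.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss
```

With illustrative log-probabilities, a chosen completion that the policy upweights relative to the reference yields a positive reward, and a widening gap between the two rewards drives the loss toward zero.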
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
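The list above maps onto a Hugging Face `TrainingArguments` object roughly as follows. This is a hedged reconstruction, not the author's script: the actual training code is not part of this card, and `output_dir` is a placeholder. A TRL `DPOTrainer` would typically consume these arguments together with the PEFT adapter.

```python
from transformers import TrainingArguments

# Hedged sketch of the hyperparameter list above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="model_usp2_dpo9",
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,  # 4 per device x 4 steps = total batch 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```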
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1118 | 2.67 | 100 | 1.5049 | 0.5063 | -2.6419 | 0.6100 | 3.1482 | -112.1433 | -108.1832 | -0.0840 | -0.0663 |
| 0.0485 | 5.33 | 200 | 3.9547 | -16.7119 | -24.8420 | 0.6600 | 8.1301 | -136.8102 | -127.3146 | -0.4936 | -0.4252 |
| 0.0933 | 8.0 | 300 | 2.7780 | 6.4777 | 2.8373 | 0.6200 | 3.6404 | -106.0553 | -101.5483 | 0.0824 | 0.1237 |
| 0.0001 | 10.67 | 400 | 3.9997 | -24.6430 | -30.7416 | 0.6600 | 6.0986 | -143.3652 | -136.1268 | -0.5838 | -0.5341 |
| 0.0 | 13.33 | 500 | 3.0680 | -16.6486 | -22.4522 | 0.6900 | 5.8036 | -134.1547 | -127.2442 | -0.3725 | -0.3021 |
| 0.0 | 16.0 | 600 | 3.0371 | -16.6460 | -22.4827 | 0.6900 | 5.8367 | -134.1887 | -127.2413 | -0.3725 | -0.3022 |
| 0.0 | 18.67 | 700 | 3.0540 | -16.6424 | -22.4815 | 0.6900 | 5.8391 | -134.1873 | -127.2373 | -0.3728 | -0.3028 |
| 0.0 | 21.33 | 800 | 3.0298 | -16.6187 | -22.4938 | 0.6900 | 5.8750 | -134.2010 | -127.2110 | -0.3731 | -0.3028 |
| 0.0 | 24.0 | 900 | 3.0554 | -16.6241 | -22.4802 | 0.6900 | 5.8561 | -134.1858 | -127.2169 | -0.3725 | -0.3027 |
| 0.0 | 26.67 | 1000 | 3.0422 | -16.6521 | -22.4980 | 0.6900 | 5.8460 | -134.2057 | -127.2480 | -0.3729 | -0.3027 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
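To reproduce the environment, one option is to pin the versions listed above; the second line assumes PyTorch's standard wheel index for CUDA 12.1 builds.

```shell
# Hedged sketch: pin the framework versions listed in this card.
pip install peft==0.10.0 transformers==4.39.3 datasets==2.18.0 tokenizers==0.15.2
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121
```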