# model_usp2_dpo1
This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0445
- Rewards/chosen: -10.7928
- Rewards/rejected: -14.2959
- Rewards/accuracies: 0.7400
- Rewards/margins: 3.5031
- Logps/rejected: -250.8612
- Logps/chosen: -215.1325
- Logits/rejected: -0.8799
- Logits/chosen: -0.9353
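For context, the reward metrics above follow the standard DPO formulation (assumed here from the metric names, which match TRL's `DPOTrainer` logging; the card itself does not describe the objective). The reward assigned to a completion is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model:

$$ r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} $$

Under this reading, Rewards/margins is the mean of \\( r(x, y_{\text{chosen}}) - r(x, y_{\text{rejected}}) \\) over evaluation pairs, and Rewards/accuracies is the fraction of pairs for which the chosen completion receives the higher reward.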
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
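As a rough illustration of how these hyperparameters could map onto a training run, below is a minimal sketch using TRL's `DPOTrainer`. The card does not state the training code; TRL is assumed from the metric names, and the dataset, LoRA configuration, and `beta` are placeholders. Loading the gated Llama-2 base model also requires accepted access on the Hub.

```python
# Hypothetical reconstruction -- the card does not state the training code.
# TRL's DPOTrainer is assumed from the Rewards/* metric names; the dataset,
# LoRA settings, and beta below are illustrative placeholders.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder preference data: DPOTrainer expects prompt/chosen/rejected columns.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

# Values below are taken from the card; everything not listed there is assumed.
training_args = TrainingArguments(
    output_dir="model_usp2_dpo1",
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,  # total train batch size: 4 x 4 = 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

# LoRA ranks/target modules are not stated in the card; defaults used here.
peft_config = LoraConfig(task_type="CAUSAL_LM")

trainer = DPOTrainer(
    model,
    ref_model=None,           # with a PEFT model, TRL uses the frozen base as reference
    args=training_args,
    beta=0.1,                 # assumed; the card does not report beta
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

The Adam settings in the list above (betas (0.9, 0.999), epsilon 1e-08) match the `transformers` default optimizer, so they need no explicit configuration in this sketch.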
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0632 | 2.67 | 100 | 0.6858 | -5.5164 | -7.1903 | 0.6900 | 1.6739 | -179.8054 | -162.3683 | -0.8170 | -0.7196 |
| 0.002 | 5.33 | 200 | 0.7615 | -6.7178 | -9.5839 | 0.7600 | 2.8661 | -203.7411 | -174.3826 | -1.0768 | -1.0701 |
| 0.0001 | 8.0 | 300 | 1.0247 | -10.4976 | -13.8923 | 0.7400 | 3.3948 | -246.8256 | -212.1801 | -0.8995 | -0.9506 |
| 0.0001 | 10.67 | 400 | 1.0323 | -10.6255 | -14.0760 | 0.7500 | 3.4505 | -248.6621 | -213.4589 | -0.8910 | -0.9437 |
| 0.0001 | 13.33 | 500 | 1.0328 | -10.7107 | -14.1992 | 0.7400 | 3.4885 | -249.8943 | -214.3115 | -0.8858 | -0.9397 |
| 0.0001 | 16.0 | 600 | 1.0378 | -10.7577 | -14.2607 | 0.7400 | 3.5030 | -250.5091 | -214.7812 | -0.8823 | -0.9372 |
| 0.0 | 18.67 | 700 | 1.0407 | -10.7811 | -14.2886 | 0.7500 | 3.5075 | -250.7885 | -215.0155 | -0.8811 | -0.9363 |
| 0.0001 | 21.33 | 800 | 1.0415 | -10.7857 | -14.2997 | 0.7400 | 3.5139 | -250.8989 | -215.0617 | -0.8802 | -0.9359 |
| 0.0001 | 24.0 | 900 | 1.0423 | -10.7886 | -14.2954 | 0.7400 | 3.5068 | -250.8562 | -215.0906 | -0.8802 | -0.9356 |
| 0.0001 | 26.67 | 1000 | 1.0445 | -10.7928 | -14.2959 | 0.7400 | 3.5031 | -250.8612 | -215.1325 | -0.8799 | -0.9353 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
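Since this repository is a PEFT (LoRA) adapter on top of Llama-2-7b-chat-hf, one plausible way to load it for inference is via peft's `AutoPeftModelForCausalLM`, which resolves and loads the base model automatically. This is a minimal sketch, not a documented usage for this repo: access to the gated base model is required, and the dtype, device placement, and generation settings are illustrative.

```python
# Minimal inference sketch: loads the LoRA adapter together with its
# Llama-2-7b-chat-hf base model (gated; requires accepted license on the Hub).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "guoyu-zhang/model_usp2_dpo1"
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = "[INST] Write a haiku about mountains. [/INST]"  # Llama-2 chat format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```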