Llama-2-7b-hf-DPO-Filtered-0.2-version-2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf. The training dataset is not specified. Evaluation-set results at each logged step appear in the table below.

Model description

More information needed

The hyperparameters used during training are not listed.

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7942	0.3007	289	0.6999	-0.9241	-0.9313	0.4500	0.0072	-50.7666	-50.7346	-0.3565	-0.3616
0.2572	0.6015	578	0.7096	-5.2469	-5.5281	0.5	0.2812	-96.7347	-93.9629	-0.7591	-0.7630
1.0719	0.9022	867	0.9637	-8.1192	-8.8822	0.6500	0.7630	-130.2761	-122.6863	-1.0380	-1.0448
7.5957	1.2029	1156	0.8582	-11.6385	-12.5781	0.75	0.9396	-167.2348	-157.8790	-1.2656	-1.2691
1.1754	1.5036	1445	1.0008	-14.3623	-14.8965	0.6500	0.5342	-190.4189	-185.1169	-1.2682	-1.2719
4.1259	1.8044	1734	0.8550	-12.9397	-14.0616	0.6500	1.1219	-182.0699	-170.8906	-1.3336	-1.3374
0.0022	2.1051	2023	0.9542	-13.3905	-14.4785	0.6500	1.0880	-186.2391	-175.3986	-1.3731	-1.3758
0.0207	2.4058	2312	0.9893	-14.5572	-15.8387	0.6500	1.2815	-199.8413	-187.0660	-1.3984	-1.4023
0.0015	2.7066	2601	1.0005	-14.6192	-15.8788	0.6500	1.2596	-200.2418	-187.6859	-1.3992	-1.4011
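The relationship between the reward columns can be checked directly from the table. The sketch below assumes the usual DPO logging convention, in which Rewards/margins is Rewards/chosen minus Rewards/rejected; the card itself does not state this, and the names ROWS and margin_mismatches are illustrative only. It also picks out the logged steps with the lowest validation loss and the largest margin.

```python
# Evaluation columns copied from the table above:
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_accuracy, rewards_margin)
ROWS = [
    (289,  0.6999,  -0.9241,  -0.9313, 0.45, 0.0072),
    (578,  0.7096,  -5.2469,  -5.5281, 0.50, 0.2812),
    (867,  0.9637,  -8.1192,  -8.8822, 0.65, 0.7630),
    (1156, 0.8582, -11.6385, -12.5781, 0.75, 0.9396),
    (1445, 1.0008, -14.3623, -14.8965, 0.65, 0.5342),
    (1734, 0.8550, -12.9397, -14.0616, 0.65, 1.1219),
    (2023, 0.9542, -13.3905, -14.4785, 0.65, 1.0880),
    (2312, 0.9893, -14.5572, -15.8387, 0.65, 1.2815),
    (2601, 1.0005, -14.6192, -15.8788, 0.65, 1.2596),
]


def margin_mismatches(rows, tol=1e-3):
    """Return the steps where chosen - rejected disagrees with the reported margin."""
    return [
        step
        for step, _, chosen, rejected, _, margin in rows
        if abs((chosen - rejected) - margin) > tol
    ]


# Assumed convention: margin = chosen reward - rejected reward.
print(margin_mismatches(ROWS))  # → []

lowest_val_loss = min(ROWS, key=lambda r: r[1])  # row with the smallest validation loss
largest_margin = max(ROWS, key=lambda r: r[5])   # row with the largest reward margin
print(lowest_val_loss[0], largest_margin[0])     # → 289 2312
```

Every row satisfies the assumed relationship to four decimal places. The lowest validation loss occurs at the first logged step (289), while the largest reward margin occurs much later (step 2312), so the two criteria would select different checkpoints.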

Base model: meta-llama/Llama-2-7b-hf (this model is an adapter for it)
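Because the card marks this model as an adapter of the base model, one plausible way to use it is to load the base weights and attach the adapter on top. The sketch below assumes the adapter is stored in PEFT format, which the card does not confirm. The function name load_adapter_model is hypothetical, and the adapter repository id is left as a parameter because the card does not give the full repository path. Access to the gated meta-llama/Llama-2-7b-hf weights is also required.

```python
def load_adapter_model(adapter_id, base_id="meta-llama/Llama-2-7b-hf"):
    """Load the base model and attach adapter weights to it.

    adapter_id: Hub repository id or local path of the adapter (not given in the card).
    base_id:    base model named in the card.
    """
    # Local imports keep this sketch importable even where the libraries are absent.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Assumption: the adapter was saved in PEFT format (for example, LoRA).
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return tokenizer, model


# Usage (not run here; replace the placeholder with the real adapter location):
# tokenizer, model = load_adapter_model("path/or/repo-id/of/this/adapter")
```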