model_usp4_dpo5

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. Its evaluation-set results, logged every 100 steps, are listed in the training results table.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.0857	2.67	100	1.4158	-6.1266	-7.7828	0.5700	1.6562	-126.6688	-123.4309	-0.2615	-0.2304
0.0373	5.33	200	2.0473	-10.8748	-14.0013	0.6400	3.1265	-139.1058	-132.9272	-1.1231	-1.1327
0.0061	8.0	300	2.3674	-11.1453	-14.1832	0.5900	3.0378	-139.4695	-133.4684	-0.8038	-0.7431
0.0004	10.67	400	2.0235	-4.6284	-7.5396	0.6500	2.9112	-126.1823	-120.4344	-0.8446	-0.7851
0.0	13.33	500	2.0425	-5.3605	-8.3967	0.6400	3.0362	-127.8966	-121.8987	-0.8512	-0.7922
0.0	16.0	600	2.0426	-5.3772	-8.4171	0.6400	3.0399	-127.9373	-121.9320	-0.8517	-0.7927
0.0	18.67	700	2.0478	-5.3866	-8.4190	0.6400	3.0323	-127.9411	-121.9509	-0.8520	-0.7932
0.0	21.33	800	2.0499	-5.3884	-8.4250	0.6400	3.0366	-127.9531	-121.9544	-0.8517	-0.7929
0.0	24.0	900	2.0375	-5.3727	-8.4358	0.6400	3.0631	-127.9748	-121.9230	-0.8519	-0.7930
0.0	26.67	1000	2.0392	-5.3723	-8.4336	0.6400	3.0613	-127.9703	-121.9222	-0.8515	-0.7928
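The reward columns appear to follow the Direct Preference Optimization (DPO) convention used by TRL's DPOTrainer, where each response's implicit reward is beta × (policy log-probability − reference-model log-probability) and Rewards/margins is the mean of reward_chosen − reward_rejected. That reading is an assumption based on the column names and the "dpo" in the model name; beta and the reference log-probabilities are not recorded in this card. The sketch below (helper names `implicit_reward`, `dpo_pair_loss` and `check_margins` are hypothetical) checks that every logged margin equals Rewards/chosen − Rewards/rejected up to 4-decimal rounding, and shows the per-pair sigmoid DPO loss.

```python
import math

# (step, Rewards/chosen, Rewards/rejected, Rewards/margins) copied from the
# training results table; the logged values are rounded to 4 decimals.
LOGGED = [
    (100, -6.1266, -7.7828, 1.6562),
    (200, -10.8748, -14.0013, 3.1265),
    (300, -11.1453, -14.1832, 3.0378),
    (400, -4.6284, -7.5396, 2.9112),
    (500, -5.3605, -8.3967, 3.0362),
    (600, -5.3772, -8.4171, 3.0399),
    (700, -5.3866, -8.4190, 3.0323),
    (800, -5.3884, -8.4250, 3.0366),
    (900, -5.3727, -8.4358, 3.0631),
    (1000, -5.3723, -8.4336, 3.0613),
]


def implicit_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    """Assumed DPO implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def dpo_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(margin))."""
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def check_margins(rows, tol=2e-4):
    """Return steps whose logged margin differs from chosen - rejected by > tol."""
    return [step for step, c, r, m in rows if abs((c - r) - m) > tol]


if __name__ == "__main__":
    print("inconsistent steps:", check_margins(LOGGED))
    # A zero margin (hypothetical pair) gives a loss of ln 2.
    print(round(dpo_pair_loss(0.0, 0.0), 4))
```

Because the logged loss is averaged over individual pairs, applying `dpo_pair_loss` to the averaged rewards in the table would not reproduce the Validation Loss column; the check above only relies on the margin being a difference of means, which is linear.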


Model tree: this model is an adapter of the base model meta-llama/Llama-2-7b-chat-hf.