# model_hh_shp3_400
This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 3.1493
- Rewards/chosen: -4.1445
- Rewards/rejected: -6.0670
- Rewards/accuracies: 0.5400
- Rewards/margins: 1.9226
- Logps/rejected: -259.9296
- Logps/chosen: -239.0303
- Logits/rejected: -0.7892
- Logits/chosen: -0.7615
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
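Two of the values above are derived rather than independent: the total train batch size is the per-device batch size times the gradient accumulation steps, and the cosine scheduler ramps the learning rate up linearly over the warmup steps before decaying it. A minimal sketch of that schedule, re-implemented in plain Python from the hyperparameters listed here (the `lr_at` helper is hypothetical, not part of the training code):

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.0005
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps:
    linear warmup to LEARNING_RATE, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: per-device batch * gradient accumulation steps.
TOTAL_TRAIN_BATCH_SIZE = 4 * 4  # = 16, matching total_train_batch_size above
```

The peak learning rate is reached exactly at step 100 and decays to zero at step 1000, which lines up with the eval checkpoints in the results table below.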
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0216        | 4.0   | 100  | 2.2837          | -3.0069        | -3.4434          | 0.5400             | 0.4365          | -257.0145      | -237.7664    | -0.7901         | -0.8063       |
| 0.1476        | 8.0   | 200  | 3.5175          | -3.9480        | -5.1618          | 0.5300             | 1.2138          | -258.9238      | -238.8121    | -0.8303         | -0.8463       |
| 0.0108        | 12.0  | 300  | 3.2066          | -1.3603        | -2.7278          | 0.5600             | 1.3674          | -256.2194      | -235.9369    | -0.7824         | -0.7443       |
| 0.0           | 16.0  | 400  | 3.1558          | -4.1573        | -6.0643          | 0.5300             | 1.9070          | -259.9266      | -239.0446    | -0.7891         | -0.7612       |
| 0.0           | 20.0  | 500  | 3.1564          | -4.1409        | -6.0450          | 0.5400             | 1.9041          | -259.9052      | -239.0264    | -0.7894         | -0.7613       |
| 0.0           | 24.0  | 600  | 3.1533          | -4.1925        | -6.0561          | 0.5300             | 1.8636          | -259.9174      | -239.0837    | -0.7890         | -0.7614       |
| 0.0           | 28.0  | 700  | 3.1650          | -4.1547        | -6.0212          | 0.5300             | 1.8665          | -259.8788      | -239.0417    | -0.7892         | -0.7614       |
| 0.0           | 32.0  | 800  | 3.1593          | -4.1704        | -6.0572          | 0.5400             | 1.8868          | -259.9187      | -239.0591    | -0.7891         | -0.7619       |
| 0.0           | 36.0  | 900  | 3.1711          | -4.1626        | -6.0504          | 0.5400             | 1.8879          | -259.9112      | -239.0504    | -0.7892         | -0.7614       |
| 0.0           | 40.0  | 1000 | 3.1493          | -4.1445        | -6.0670          | 0.5400             | 1.9226          | -259.9296      | -239.0303    | -0.7892         | -0.7615       |
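The margins column is internally consistent with the other reward columns: under the usual preference-tuning convention (an assumption here, since the card does not name the trainer), the margin is simply the chosen reward minus the rejected reward. A quick check on the final row:

```python
# Values from the step-1000 evaluation row above.
rewards_chosen = -4.1445
rewards_rejected = -6.0670

# Assumed convention: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # agrees with the reported 1.9226 up to rounding
```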
### Framework versions
- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2