generatedMoreUniqueResponseIncludeGT_Qwen2.5-1.5BInstruct_dpo_ebs32_lr5e-07_beta0.4_42

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseIncludeGT dataset. It achieves the following results on the evaluation set (the reward metrics are explained after the list):

  • Loss: 0.5407
  • Rewards/chosen: -0.6257
  • Rewards/rejected: -1.4591
  • Rewards/accuracies: 0.7240
  • Rewards/margins: 0.8334
  • Logps/rejected: -48.5346
  • Logps/chosen: -44.7152
  • Logits/rejected: -2.2013
  • Logits/chosen: -2.2695
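
These reward columns are the implicit DPO rewards, not task-level accuracy. Assuming the standard DPO formulation (the β = 0.4 in the run name is the only hint the card gives), each reward is the β-scaled log-probability ratio between the policy and the frozen reference model:

```latex
% Implicit DPO reward of response y to prompt x (standard DPO definition,
% assumed here rather than taken from the training code; beta = 0.4 per the run name).
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% The training objective maximizes the margin between the chosen (y_w)
% and rejected (y_l) rewards:
\mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(x, y_w, y_l)}\left[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \right]
```

Under this reading, Rewards/margins is the mean chosen-minus-rejected reward gap on the evaluation set, and Rewards/accuracies (0.7240) is the fraction of evaluation pairs in which the chosen response receives the higher implicit reward.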

Model description

More information needed

Intended uses & limitations

More information needed
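
Pending details from the author, the checkpoint can be loaded like any Qwen2.5 instruct model. A minimal inference sketch (standard transformers usage, not taken from this card):

```python
# Minimal inference sketch; the prompt below is illustrative, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YuchenLi01/generatedMoreUniqueResponseIncludeGT_Qwen2.5-1.5BInstruct_dpo_ebs32_lr5e-07_beta0.4_42"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model is instruction-tuned, so apply the chat template.
messages = [{"role": "user", "content": "What is 12 * 13? Show your steps."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the fine-tuning dataset targets MATH, math word problems are the natural use case; behavior outside that domain is untested here.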

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
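
Note that total_train_batch_size = 32 is simply the per-device batch size of 4 multiplied across the 8 GPUs (the ebs32 in the run name). The card does not say which training framework was used; a hypothetical reconstruction with TRL's DPOTrainer, mapping the hyperparameters above onto its config, might look like:

```python
# Hypothetical reconstruction of the training setup with TRL's DPOTrainer.
# The framework is an assumption; only the hyperparameter values come from the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

train_dataset = load_dataset(
    "YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseIncludeGT",
    split="train",
)

config = DPOConfig(
    output_dir="qwen2.5-1.5b-instruct-dpo",
    beta=0.4,                        # from the run name (beta0.4)
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 8 GPUs = effective batch size 32
    per_device_eval_batch_size=4,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # TRL clones the policy as the frozen reference model
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # renamed `processing_class` in newer TRL releases
)
trainer.train()
```

Launching with accelerate or torchrun across 8 processes would reproduce the multi-GPU setup listed above.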

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7099 | 0.0153 | 20 | 0.7025 | -0.0118 | -0.0142 | 0.5455 | 0.0025 | -44.9225 | -43.1803 | -2.2758 | -2.3245 |
| 0.697 | 0.0306 | 40 | 0.6961 | 0.0008 | -0.0241 | 0.5357 | 0.0249 | -44.9472 | -43.1489 | -2.2756 | -2.3248 |
| 0.6707 | 0.0459 | 60 | 0.6969 | -0.0335 | -0.0417 | 0.5097 | 0.0082 | -44.9912 | -43.2347 | -2.2695 | -2.3186 |
| 0.6954 | 0.0612 | 80 | 0.6900 | -0.0724 | -0.0995 | 0.5487 | 0.0271 | -45.1356 | -43.3319 | -2.2608 | -2.3105 |
| 0.6779 | 0.0765 | 100 | 0.6826 | -0.1611 | -0.2154 | 0.5844 | 0.0543 | -45.4255 | -43.5537 | -2.2402 | -2.2908 |
| 0.6799 | 0.0917 | 120 | 0.6733 | -0.2920 | -0.3618 | 0.5682 | 0.0698 | -45.7915 | -43.8810 | -2.2211 | -2.2730 |
| 0.6989 | 0.1070 | 140 | 0.6610 | -0.4120 | -0.5280 | 0.5812 | 0.1160 | -46.2070 | -44.1810 | -2.1928 | -2.2456 |
| 0.5823 | 0.1223 | 160 | 0.6573 | -0.5531 | -0.6790 | 0.5909 | 0.1259 | -46.5844 | -44.5338 | -2.1679 | -2.2219 |
| 0.7023 | 0.1376 | 180 | 0.6460 | -0.6382 | -0.8057 | 0.5974 | 0.1675 | -46.9012 | -44.7465 | -2.1511 | -2.2063 |
| 0.5912 | 0.1529 | 200 | 0.6407 | -0.5424 | -0.7268 | 0.6623 | 0.1843 | -46.7038 | -44.5070 | -2.1753 | -2.2314 |
| 0.5464 | 0.1682 | 220 | 0.6308 | -0.5724 | -0.7861 | 0.6266 | 0.2137 | -46.8521 | -44.5819 | -2.1769 | -2.2343 |
| 0.6323 | 0.1835 | 240 | 0.6266 | -0.6324 | -0.8809 | 0.6558 | 0.2485 | -47.0892 | -44.7320 | -2.1604 | -2.2183 |
| 0.6174 | 0.1988 | 260 | 0.6189 | -0.6857 | -0.9617 | 0.6526 | 0.2759 | -47.2911 | -44.8653 | -2.1511 | -2.2100 |
| 0.6117 | 0.2141 | 280 | 0.6153 | -0.6964 | -0.9937 | 0.6753 | 0.2973 | -47.3711 | -44.8919 | -2.1455 | -2.2047 |
| 0.6543 | 0.2294 | 300 | 0.6128 | -0.6730 | -0.9993 | 0.6526 | 0.3263 | -47.3853 | -44.8335 | -2.1510 | -2.2109 |
| 0.6772 | 0.2446 | 320 | 0.6057 | -0.5641 | -0.8972 | 0.6818 | 0.3331 | -47.1299 | -44.5611 | -2.1670 | -2.2271 |
| 0.5993 | 0.2599 | 340 | 0.6023 | -0.4706 | -0.8110 | 0.6851 | 0.3404 | -46.9144 | -44.3273 | -2.2016 | -2.2626 |
| 0.6106 | 0.2752 | 360 | 0.5969 | -0.4956 | -0.8659 | 0.6916 | 0.3702 | -47.0516 | -44.3900 | -2.2036 | -2.2652 |
| 0.5839 | 0.2905 | 380 | 0.5969 | -0.5398 | -0.9584 | 0.7045 | 0.4186 | -47.2828 | -44.5004 | -2.1933 | -2.2562 |
| 0.5425 | 0.3058 | 400 | 0.5901 | -0.5128 | -0.9610 | 0.6786 | 0.4482 | -47.2894 | -44.4329 | -2.2008 | -2.2646 |
| 0.5481 | 0.3211 | 420 | 0.5892 | -0.4691 | -0.9336 | 0.6916 | 0.4644 | -47.2209 | -44.3238 | -2.2126 | -2.2761 |
| 0.5124 | 0.3364 | 440 | 0.5832 | -0.3746 | -0.8508 | 0.7013 | 0.4761 | -47.0138 | -44.0875 | -2.2315 | -2.2943 |
| 0.5781 | 0.3517 | 460 | 0.5810 | -0.4879 | -0.9815 | 0.6851 | 0.4936 | -47.3407 | -44.3707 | -2.2125 | -2.2767 |
| 0.6022 | 0.3670 | 480 | 0.5785 | -0.5538 | -1.0946 | 0.7143 | 0.5409 | -47.6235 | -44.5353 | -2.1914 | -2.2554 |
| 0.5997 | 0.3823 | 500 | 0.5763 | -0.4839 | -1.0486 | 0.7078 | 0.5646 | -47.5083 | -44.3607 | -2.2174 | -2.2817 |
| 0.4629 | 0.3976 | 520 | 0.5736 | -0.4618 | -1.0494 | 0.7175 | 0.5877 | -47.5105 | -44.3054 | -2.2144 | -2.2785 |
| 0.4794 | 0.4128 | 540 | 0.5680 | -0.4757 | -1.0677 | 0.7143 | 0.5920 | -47.5562 | -44.3401 | -2.2116 | -2.2756 |
| 0.6439 | 0.4281 | 560 | 0.5699 | -0.5639 | -1.1848 | 0.7013 | 0.6209 | -47.8490 | -44.5607 | -2.1883 | -2.2522 |
| 0.5701 | 0.4434 | 580 | 0.5688 | -0.6027 | -1.2314 | 0.7110 | 0.6287 | -47.9654 | -44.6576 | -2.1938 | -2.2588 |
| 0.6866 | 0.4587 | 600 | 0.5682 | -0.6280 | -1.2869 | 0.6981 | 0.6589 | -48.1043 | -44.7210 | -2.1904 | -2.2564 |
| 0.4416 | 0.4740 | 620 | 0.5640 | -0.6475 | -1.3140 | 0.7078 | 0.6665 | -48.1720 | -44.7696 | -2.1878 | -2.2534 |
| 0.4867 | 0.4893 | 640 | 0.5630 | -0.6543 | -1.3348 | 0.7078 | 0.6805 | -48.2239 | -44.7867 | -2.1848 | -2.2512 |
| 0.5613 | 0.5046 | 660 | 0.5622 | -0.6238 | -1.3276 | 0.7110 | 0.7038 | -48.2059 | -44.7104 | -2.1866 | -2.2526 |
| 0.4683 | 0.5199 | 680 | 0.5567 | -0.6656 | -1.3837 | 0.7208 | 0.7181 | -48.3461 | -44.8148 | -2.1821 | -2.2490 |
| 0.6244 | 0.5352 | 700 | 0.5585 | -0.6607 | -1.3665 | 0.7305 | 0.7058 | -48.3031 | -44.8027 | -2.1912 | -2.2580 |
| 0.6216 | 0.5505 | 720 | 0.5567 | -0.7016 | -1.4505 | 0.7435 | 0.7489 | -48.5132 | -44.9050 | -2.1811 | -2.2484 |
| 0.3742 | 0.5657 | 740 | 0.5573 | -0.7193 | -1.4764 | 0.7370 | 0.7571 | -48.5779 | -44.9491 | -2.1811 | -2.2494 |
| 0.719 | 0.5810 | 760 | 0.5533 | -0.6734 | -1.4476 | 0.7110 | 0.7742 | -48.5059 | -44.8344 | -2.1896 | -2.2571 |
| 0.4734 | 0.5963 | 780 | 0.5533 | -0.6357 | -1.3940 | 0.7175 | 0.7584 | -48.3720 | -44.7401 | -2.2026 | -2.2704 |
| 0.4929 | 0.6116 | 800 | 0.5510 | -0.6373 | -1.4141 | 0.7403 | 0.7767 | -48.4221 | -44.7442 | -2.2022 | -2.2700 |
| 0.632 | 0.6269 | 820 | 0.5477 | -0.6490 | -1.4418 | 0.7305 | 0.7928 | -48.4915 | -44.7734 | -2.1967 | -2.2643 |
| 0.5001 | 0.6422 | 840 | 0.5481 | -0.6725 | -1.4707 | 0.7273 | 0.7982 | -48.5637 | -44.8321 | -2.1940 | -2.2619 |
| 0.7523 | 0.6575 | 860 | 0.5481 | -0.6816 | -1.4845 | 0.7338 | 0.8028 | -48.5981 | -44.8550 | -2.1947 | -2.2630 |
| 0.6176 | 0.6728 | 880 | 0.5480 | -0.6684 | -1.4840 | 0.7305 | 0.8156 | -48.5969 | -44.8218 | -2.1894 | -2.2573 |
| 0.4124 | 0.6881 | 900 | 0.5463 | -0.6604 | -1.4643 | 0.7370 | 0.8039 | -48.5477 | -44.8018 | -2.1931 | -2.2610 |
| 0.498 | 0.7034 | 920 | 0.5469 | -0.6825 | -1.4748 | 0.7273 | 0.7923 | -48.5738 | -44.8571 | -2.1916 | -2.2599 |
| 0.3912 | 0.7187 | 940 | 0.5442 | -0.6745 | -1.4845 | 0.7240 | 0.8101 | -48.5983 | -44.8370 | -2.1938 | -2.2620 |
| 0.6001 | 0.7339 | 960 | 0.5431 | -0.6746 | -1.4897 | 0.7143 | 0.8151 | -48.6112 | -44.8373 | -2.1964 | -2.2650 |
| 0.6153 | 0.7492 | 980 | 0.5418 | -0.6651 | -1.4959 | 0.7403 | 0.8308 | -48.6267 | -44.8138 | -2.1921 | -2.2600 |
| 0.4508 | 0.7645 | 1000 | 0.5402 | -0.6495 | -1.4828 | 0.7273 | 0.8333 | -48.5940 | -44.7748 | -2.1953 | -2.2633 |
| 0.4763 | 0.7798 | 1020 | 0.5419 | -0.6611 | -1.5016 | 0.7208 | 0.8405 | -48.6409 | -44.8036 | -2.1971 | -2.2660 |
| 0.4468 | 0.7951 | 1040 | 0.5420 | -0.6580 | -1.4863 | 0.7370 | 0.8283 | -48.6026 | -44.7958 | -2.1962 | -2.2643 |
| 0.4179 | 0.8104 | 1060 | 0.5417 | -0.6353 | -1.4790 | 0.7240 | 0.8437 | -48.5844 | -44.7392 | -2.1986 | -2.2671 |
| 0.5283 | 0.8257 | 1080 | 0.5425 | -0.6470 | -1.4815 | 0.7143 | 0.8345 | -48.5906 | -44.7684 | -2.2029 | -2.2715 |
| 0.5548 | 0.8410 | 1100 | 0.5427 | -0.6322 | -1.4783 | 0.7273 | 0.8462 | -48.5827 | -44.7313 | -2.2019 | -2.2706 |
| 0.6147 | 0.8563 | 1120 | 0.5430 | -0.6466 | -1.4682 | 0.7078 | 0.8216 | -48.5575 | -44.7674 | -2.2024 | -2.2712 |
| 0.5355 | 0.8716 | 1140 | 0.5399 | -0.6323 | -1.4787 | 0.7305 | 0.8464 | -48.5836 | -44.7316 | -2.1987 | -2.2669 |
| 0.6367 | 0.8869 | 1160 | 0.5414 | -0.6339 | -1.4595 | 0.7305 | 0.8255 | -48.5356 | -44.7357 | -2.2065 | -2.2754 |
| 0.3592 | 0.9021 | 1180 | 0.5395 | -0.6308 | -1.4706 | 0.7143 | 0.8398 | -48.5634 | -44.7280 | -2.2026 | -2.2707 |
| 0.4004 | 0.9174 | 1200 | 0.5425 | -0.6253 | -1.4580 | 0.7273 | 0.8326 | -48.5319 | -44.7143 | -2.2069 | -2.2754 |
| 0.4802 | 0.9327 | 1220 | 0.5399 | -0.6224 | -1.4638 | 0.7078 | 0.8414 | -48.5464 | -44.7070 | -2.2015 | -2.2697 |
| 0.6739 | 0.9480 | 1240 | 0.5401 | -0.6364 | -1.4667 | 0.7045 | 0.8302 | -48.5536 | -44.7420 | -2.2034 | -2.2714 |
| 0.5283 | 0.9633 | 1260 | 0.5402 | -0.6258 | -1.4621 | 0.7240 | 0.8364 | -48.5422 | -44.7153 | -2.2037 | -2.2720 |
| 0.3915 | 0.9786 | 1280 | 0.5410 | -0.6262 | -1.4651 | 0.7208 | 0.8388 | -48.5496 | -44.7165 | -2.2042 | -2.2726 |
| 0.4974 | 0.9939 | 1300 | 0.5397 | -0.6358 | -1.4604 | 0.7305 | 0.8246 | -48.5380 | -44.7405 | -2.2010 | -2.2694 |

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.20.3