# generatedMoreUniqueResponseIncludeGT_Qwen2.5-1.5BInstruct_dpo_ebs32_lr5e-07_beta0.4_42
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseIncludeGT dataset. It achieves the following results on the evaluation set:
- Loss: 0.5407
- Rewards/chosen: -0.6257
- Rewards/rejected: -1.4591
- Rewards/accuracies: 0.7240
- Rewards/margins: 0.8334
- Logps/rejected: -48.5346
- Logps/chosen: -44.7152
- Logits/rejected: -2.2013
- Logits/chosen: -2.2695
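For reference, the reward columns above are the implicit DPO rewards reported by trl's `DPOTrainer`, not an external reward model. With \\(\beta = 0.4\\) (from the run name and the hyperparameters below), the reward of a response \\(y\\) for a prompt \\(x\\) is the \\(\beta\\)-scaled log-ratio between the policy and the frozen reference model, and training minimizes the logistic loss on the chosen/rejected margin:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr).
$$

Rewards/margins is the mean of \\(r_\theta(x, y_w) - r_\theta(x, y_l)\\) over the evaluation set, and Rewards/accuracies is the fraction of pairs for which that margin is positive.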
## Model description
More information needed
## Intended uses & limitations
More information needed
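No usage guidance is provided by the authors. Below is a minimal inference sketch with 🤗 Transformers; the repository id is assumed from the model name above (and the dataset's `YuchenLi01` namespace), so replace it with wherever the weights are actually hosted.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, inferred from the model name above; adjust if the weights live elsewhere.
model_id = "YuchenLi01/generatedMoreUniqueResponseIncludeGT_Qwen2.5-1.5BInstruct_dpo_ebs32_lr5e-07_beta0.4_42"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Qwen2.5-Instruct models expect chat-formatted prompts.
messages = [{"role": "user", "content": "What is the sum of the first 10 positive integers?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```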
## Training and evaluation data
More information needed
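The preference dataset named in the description can presumably be loaded with 🤗 Datasets; the split names and column layout (prompt/chosen/rejected is the usual DPO schema) are assumptions, since the card does not document them.

```python
from datasets import load_dataset

# Dataset id taken from the description above; splits and columns are not documented in this card.
ds = load_dataset("YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseIncludeGT")
print(ds)  # inspect the available splits and columns before training
```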
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
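The training script itself is not included in this card. The sketch below shows how a DPO run with these hyperparameters might be set up with trl's `DPOConfig`/`DPOTrainer`; it is an illustration under assumptions (trl version, dataset schema, mixed-precision setting), not the authors' actual code. The per-device batch size of 4 across 8 GPUs (e.g. via `accelerate launch`) yields the effective batch size of 32 listed above, and `beta=0.4` comes from the run name.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Dataset id from the card; assumed to already be in prompt/chosen/rejected format.
dataset = load_dataset("YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseIncludeGT")

# Hyperparameters mirror the list above; everything else is left at trl defaults.
args = DPOConfig(
    output_dir="qwen2.5-1.5b-dpo",
    beta=0.4,
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption: the card does not state the precision used
)

trainer = DPOTrainer(
    model=model,                      # the reference model is created internally when not passed
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("test"),
    processing_class=tokenizer,       # older trl releases call this argument `tokenizer`
)
trainer.train()
```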
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7099 | 0.0153 | 20 | 0.7025 | -0.0118 | -0.0142 | 0.5455 | 0.0025 | -44.9225 | -43.1803 | -2.2758 | -2.3245 |
| 0.697 | 0.0306 | 40 | 0.6961 | 0.0008 | -0.0241 | 0.5357 | 0.0249 | -44.9472 | -43.1489 | -2.2756 | -2.3248 |
| 0.6707 | 0.0459 | 60 | 0.6969 | -0.0335 | -0.0417 | 0.5097 | 0.0082 | -44.9912 | -43.2347 | -2.2695 | -2.3186 |
| 0.6954 | 0.0612 | 80 | 0.6900 | -0.0724 | -0.0995 | 0.5487 | 0.0271 | -45.1356 | -43.3319 | -2.2608 | -2.3105 |
| 0.6779 | 0.0765 | 100 | 0.6826 | -0.1611 | -0.2154 | 0.5844 | 0.0543 | -45.4255 | -43.5537 | -2.2402 | -2.2908 |
| 0.6799 | 0.0917 | 120 | 0.6733 | -0.2920 | -0.3618 | 0.5682 | 0.0698 | -45.7915 | -43.8810 | -2.2211 | -2.2730 |
| 0.6989 | 0.1070 | 140 | 0.6610 | -0.4120 | -0.5280 | 0.5812 | 0.1160 | -46.2070 | -44.1810 | -2.1928 | -2.2456 |
| 0.5823 | 0.1223 | 160 | 0.6573 | -0.5531 | -0.6790 | 0.5909 | 0.1259 | -46.5844 | -44.5338 | -2.1679 | -2.2219 |
| 0.7023 | 0.1376 | 180 | 0.6460 | -0.6382 | -0.8057 | 0.5974 | 0.1675 | -46.9012 | -44.7465 | -2.1511 | -2.2063 |
| 0.5912 | 0.1529 | 200 | 0.6407 | -0.5424 | -0.7268 | 0.6623 | 0.1843 | -46.7038 | -44.5070 | -2.1753 | -2.2314 |
| 0.5464 | 0.1682 | 220 | 0.6308 | -0.5724 | -0.7861 | 0.6266 | 0.2137 | -46.8521 | -44.5819 | -2.1769 | -2.2343 |
| 0.6323 | 0.1835 | 240 | 0.6266 | -0.6324 | -0.8809 | 0.6558 | 0.2485 | -47.0892 | -44.7320 | -2.1604 | -2.2183 |
| 0.6174 | 0.1988 | 260 | 0.6189 | -0.6857 | -0.9617 | 0.6526 | 0.2759 | -47.2911 | -44.8653 | -2.1511 | -2.2100 |
| 0.6117 | 0.2141 | 280 | 0.6153 | -0.6964 | -0.9937 | 0.6753 | 0.2973 | -47.3711 | -44.8919 | -2.1455 | -2.2047 |
| 0.6543 | 0.2294 | 300 | 0.6128 | -0.6730 | -0.9993 | 0.6526 | 0.3263 | -47.3853 | -44.8335 | -2.1510 | -2.2109 |
| 0.6772 | 0.2446 | 320 | 0.6057 | -0.5641 | -0.8972 | 0.6818 | 0.3331 | -47.1299 | -44.5611 | -2.1670 | -2.2271 |
| 0.5993 | 0.2599 | 340 | 0.6023 | -0.4706 | -0.8110 | 0.6851 | 0.3404 | -46.9144 | -44.3273 | -2.2016 | -2.2626 |
| 0.6106 | 0.2752 | 360 | 0.5969 | -0.4956 | -0.8659 | 0.6916 | 0.3702 | -47.0516 | -44.3900 | -2.2036 | -2.2652 |
| 0.5839 | 0.2905 | 380 | 0.5969 | -0.5398 | -0.9584 | 0.7045 | 0.4186 | -47.2828 | -44.5004 | -2.1933 | -2.2562 |
| 0.5425 | 0.3058 | 400 | 0.5901 | -0.5128 | -0.9610 | 0.6786 | 0.4482 | -47.2894 | -44.4329 | -2.2008 | -2.2646 |
| 0.5481 | 0.3211 | 420 | 0.5892 | -0.4691 | -0.9336 | 0.6916 | 0.4644 | -47.2209 | -44.3238 | -2.2126 | -2.2761 |
| 0.5124 | 0.3364 | 440 | 0.5832 | -0.3746 | -0.8508 | 0.7013 | 0.4761 | -47.0138 | -44.0875 | -2.2315 | -2.2943 |
| 0.5781 | 0.3517 | 460 | 0.5810 | -0.4879 | -0.9815 | 0.6851 | 0.4936 | -47.3407 | -44.3707 | -2.2125 | -2.2767 |
| 0.6022 | 0.3670 | 480 | 0.5785 | -0.5538 | -1.0946 | 0.7143 | 0.5409 | -47.6235 | -44.5353 | -2.1914 | -2.2554 |
| 0.5997 | 0.3823 | 500 | 0.5763 | -0.4839 | -1.0486 | 0.7078 | 0.5646 | -47.5083 | -44.3607 | -2.2174 | -2.2817 |
| 0.4629 | 0.3976 | 520 | 0.5736 | -0.4618 | -1.0494 | 0.7175 | 0.5877 | -47.5105 | -44.3054 | -2.2144 | -2.2785 |
| 0.4794 | 0.4128 | 540 | 0.5680 | -0.4757 | -1.0677 | 0.7143 | 0.5920 | -47.5562 | -44.3401 | -2.2116 | -2.2756 |
| 0.6439 | 0.4281 | 560 | 0.5699 | -0.5639 | -1.1848 | 0.7013 | 0.6209 | -47.8490 | -44.5607 | -2.1883 | -2.2522 |
| 0.5701 | 0.4434 | 580 | 0.5688 | -0.6027 | -1.2314 | 0.7110 | 0.6287 | -47.9654 | -44.6576 | -2.1938 | -2.2588 |
| 0.6866 | 0.4587 | 600 | 0.5682 | -0.6280 | -1.2869 | 0.6981 | 0.6589 | -48.1043 | -44.7210 | -2.1904 | -2.2564 |
| 0.4416 | 0.4740 | 620 | 0.5640 | -0.6475 | -1.3140 | 0.7078 | 0.6665 | -48.1720 | -44.7696 | -2.1878 | -2.2534 |
| 0.4867 | 0.4893 | 640 | 0.5630 | -0.6543 | -1.3348 | 0.7078 | 0.6805 | -48.2239 | -44.7867 | -2.1848 | -2.2512 |
| 0.5613 | 0.5046 | 660 | 0.5622 | -0.6238 | -1.3276 | 0.7110 | 0.7038 | -48.2059 | -44.7104 | -2.1866 | -2.2526 |
| 0.4683 | 0.5199 | 680 | 0.5567 | -0.6656 | -1.3837 | 0.7208 | 0.7181 | -48.3461 | -44.8148 | -2.1821 | -2.2490 |
| 0.6244 | 0.5352 | 700 | 0.5585 | -0.6607 | -1.3665 | 0.7305 | 0.7058 | -48.3031 | -44.8027 | -2.1912 | -2.2580 |
| 0.6216 | 0.5505 | 720 | 0.5567 | -0.7016 | -1.4505 | 0.7435 | 0.7489 | -48.5132 | -44.9050 | -2.1811 | -2.2484 |
| 0.3742 | 0.5657 | 740 | 0.5573 | -0.7193 | -1.4764 | 0.7370 | 0.7571 | -48.5779 | -44.9491 | -2.1811 | -2.2494 |
| 0.719 | 0.5810 | 760 | 0.5533 | -0.6734 | -1.4476 | 0.7110 | 0.7742 | -48.5059 | -44.8344 | -2.1896 | -2.2571 |
| 0.4734 | 0.5963 | 780 | 0.5533 | -0.6357 | -1.3940 | 0.7175 | 0.7584 | -48.3720 | -44.7401 | -2.2026 | -2.2704 |
| 0.4929 | 0.6116 | 800 | 0.5510 | -0.6373 | -1.4141 | 0.7403 | 0.7767 | -48.4221 | -44.7442 | -2.2022 | -2.2700 |
| 0.632 | 0.6269 | 820 | 0.5477 | -0.6490 | -1.4418 | 0.7305 | 0.7928 | -48.4915 | -44.7734 | -2.1967 | -2.2643 |
| 0.5001 | 0.6422 | 840 | 0.5481 | -0.6725 | -1.4707 | 0.7273 | 0.7982 | -48.5637 | -44.8321 | -2.1940 | -2.2619 |
| 0.7523 | 0.6575 | 860 | 0.5481 | -0.6816 | -1.4845 | 0.7338 | 0.8028 | -48.5981 | -44.8550 | -2.1947 | -2.2630 |
| 0.6176 | 0.6728 | 880 | 0.5480 | -0.6684 | -1.4840 | 0.7305 | 0.8156 | -48.5969 | -44.8218 | -2.1894 | -2.2573 |
| 0.4124 | 0.6881 | 900 | 0.5463 | -0.6604 | -1.4643 | 0.7370 | 0.8039 | -48.5477 | -44.8018 | -2.1931 | -2.2610 |
| 0.498 | 0.7034 | 920 | 0.5469 | -0.6825 | -1.4748 | 0.7273 | 0.7923 | -48.5738 | -44.8571 | -2.1916 | -2.2599 |
| 0.3912 | 0.7187 | 940 | 0.5442 | -0.6745 | -1.4845 | 0.7240 | 0.8101 | -48.5983 | -44.8370 | -2.1938 | -2.2620 |
| 0.6001 | 0.7339 | 960 | 0.5431 | -0.6746 | -1.4897 | 0.7143 | 0.8151 | -48.6112 | -44.8373 | -2.1964 | -2.2650 |
| 0.6153 | 0.7492 | 980 | 0.5418 | -0.6651 | -1.4959 | 0.7403 | 0.8308 | -48.6267 | -44.8138 | -2.1921 | -2.2600 |
| 0.4508 | 0.7645 | 1000 | 0.5402 | -0.6495 | -1.4828 | 0.7273 | 0.8333 | -48.5940 | -44.7748 | -2.1953 | -2.2633 |
| 0.4763 | 0.7798 | 1020 | 0.5419 | -0.6611 | -1.5016 | 0.7208 | 0.8405 | -48.6409 | -44.8036 | -2.1971 | -2.2660 |
| 0.4468 | 0.7951 | 1040 | 0.5420 | -0.6580 | -1.4863 | 0.7370 | 0.8283 | -48.6026 | -44.7958 | -2.1962 | -2.2643 |
| 0.4179 | 0.8104 | 1060 | 0.5417 | -0.6353 | -1.4790 | 0.7240 | 0.8437 | -48.5844 | -44.7392 | -2.1986 | -2.2671 |
| 0.5283 | 0.8257 | 1080 | 0.5425 | -0.6470 | -1.4815 | 0.7143 | 0.8345 | -48.5906 | -44.7684 | -2.2029 | -2.2715 |
| 0.5548 | 0.8410 | 1100 | 0.5427 | -0.6322 | -1.4783 | 0.7273 | 0.8462 | -48.5827 | -44.7313 | -2.2019 | -2.2706 |
| 0.6147 | 0.8563 | 1120 | 0.5430 | -0.6466 | -1.4682 | 0.7078 | 0.8216 | -48.5575 | -44.7674 | -2.2024 | -2.2712 |
| 0.5355 | 0.8716 | 1140 | 0.5399 | -0.6323 | -1.4787 | 0.7305 | 0.8464 | -48.5836 | -44.7316 | -2.1987 | -2.2669 |
| 0.6367 | 0.8869 | 1160 | 0.5414 | -0.6339 | -1.4595 | 0.7305 | 0.8255 | -48.5356 | -44.7357 | -2.2065 | -2.2754 |
| 0.3592 | 0.9021 | 1180 | 0.5395 | -0.6308 | -1.4706 | 0.7143 | 0.8398 | -48.5634 | -44.7280 | -2.2026 | -2.2707 |
| 0.4004 | 0.9174 | 1200 | 0.5425 | -0.6253 | -1.4580 | 0.7273 | 0.8326 | -48.5319 | -44.7143 | -2.2069 | -2.2754 |
| 0.4802 | 0.9327 | 1220 | 0.5399 | -0.6224 | -1.4638 | 0.7078 | 0.8414 | -48.5464 | -44.7070 | -2.2015 | -2.2697 |
| 0.6739 | 0.9480 | 1240 | 0.5401 | -0.6364 | -1.4667 | 0.7045 | 0.8302 | -48.5536 | -44.7420 | -2.2034 | -2.2714 |
| 0.5283 | 0.9633 | 1260 | 0.5402 | -0.6258 | -1.4621 | 0.7240 | 0.8364 | -48.5422 | -44.7153 | -2.2037 | -2.2720 |
| 0.3915 | 0.9786 | 1280 | 0.5410 | -0.6262 | -1.4651 | 0.7208 | 0.8388 | -48.5496 | -44.7165 | -2.2042 | -2.2726 |
| 0.4974 | 0.9939 | 1300 | 0.5397 | -0.6358 | -1.4604 | 0.7305 | 0.8246 | -48.5380 | -44.7405 | -2.2010 | -2.2694 |
### Framework versions
- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.5.0
- Tokenizers 0.20.3