generatedMoreUniqueResponseIncludeGT_Qwen2.5-1.5BInstruct_dpo_ebs32_lr5e-07_beta0.4_42

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseIncludeGT dataset. It achieves the following results on the evaluation set (the reward metrics are explained after the list):

  • Loss: 0.5407
  • Rewards/chosen: -0.6257
  • Rewards/rejected: -1.4591
  • Rewards/accuracies: 0.7240
  • Rewards/margins: 0.8334
  • Logps/rejected: -48.5346
  • Logps/chosen: -44.7152
  • Logits/rejected: -2.2013
  • Logits/chosen: -2.2695
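
These reward columns are the implicit DPO rewards, not task-level accuracy. Assuming the standard DPO formulation (the β = 0.4 in the run name is the only hint the card gives), each reward is the β-scaled log-probability ratio between the policy and the frozen reference model:

```latex
% Implicit DPO reward of response y to prompt x (standard DPO definition,
% assumed here rather than taken from the training code; beta = 0.4 per the run name).
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% The training objective maximizes the margin between the chosen (y_w)
% and rejected (y_l) rewards:
\mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(x, y_w, y_l)}\left[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \right]
```

Under this reading, Rewards/margins is the mean chosen-minus-rejected reward gap on the evaluation set, and Rewards/accuracies (0.7240) is the fraction of evaluation pairs in which the chosen response receives the higher implicit reward.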

Model description

More information needed

Intended uses & limitations

More information needed
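
Pending details from the author, the checkpoint can be loaded like any Qwen2.5 instruct model. A minimal inference sketch (standard transformers usage, not taken from this card):

```python
# Minimal inference sketch; the prompt below is illustrative, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YuchenLi01/generatedMoreUniqueResponseIncludeGT_Qwen2.5-1.5BInstruct_dpo_ebs32_lr5e-07_beta0.4_42"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model is instruction-tuned, so apply the chat template.
messages = [{"role": "user", "content": "What is 12 * 13? Show your steps."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the fine-tuning dataset targets MATH, math word problems are the natural use case; behavior outside that domain is untested here.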

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
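
Note that total_train_batch_size = 32 is simply the per-device batch size of 4 multiplied across the 8 GPUs (the ebs32 in the run name). The card does not say which training framework was used; a hypothetical reconstruction with TRL's DPOTrainer, mapping the hyperparameters above onto its config, might look like:

```python
# Hypothetical reconstruction of the training setup with TRL's DPOTrainer.
# The framework is an assumption; only the hyperparameter values come from the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

train_dataset = load_dataset(
    "YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseIncludeGT",
    split="train",
)

config = DPOConfig(
    output_dir="qwen2.5-1.5b-instruct-dpo",
    beta=0.4,                        # from the run name (beta0.4)
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 8 GPUs = effective batch size 32
    per_device_eval_batch_size=4,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # TRL clones the policy as the frozen reference model
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # renamed `processing_class` in newer TRL releases
)
trainer.train()
```

Launching with accelerate or torchrun across 8 processes would reproduce the multi-GPU setup listed above.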

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7099 | 0.0153 | 20 | 0.7025 | -0.0118 | -0.0142 | 0.5455 | 0.0025 | -44.9225 | -43.1803 | -2.2758 | -2.3245 |
| 0.697 | 0.0306 | 40 | 0.6961 | 0.0008 | -0.0241 | 0.5357 | 0.0249 | -44.9472 | -43.1489 | -2.2756 | -2.3248 |
| 0.6707 | 0.0459 | 60 | 0.6969 | -0.0335 | -0.0417 | 0.5097 | 0.0082 | -44.9912 | -43.2347 | -2.2695 | -2.3186 |
| 0.6954 | 0.0612 | 80 | 0.6900 | -0.0724 | -0.0995 | 0.5487 | 0.0271 | -45.1356 | -43.3319 | -2.2608 | -2.3105 |
| 0.6779 | 0.0765 | 100 | 0.6826 | -0.1611 | -0.2154 | 0.5844 | 0.0543 | -45.4255 | -43.5537 | -2.2402 | -2.2908 |
| 0.6799 | 0.0917 | 120 | 0.6733 | -0.2920 | -0.3618 | 0.5682 | 0.0698 | -45.7915 | -43.8810 | -2.2211 | -2.2730 |
| 0.6989 | 0.1070 | 140 | 0.6610 | -0.4120 | -0.5280 | 0.5812 | 0.1160 | -46.2070 | -44.1810 | -2.1928 | -2.2456 |
| 0.5823 | 0.1223 | 160 | 0.6573 | -0.5531 | -0.6790 | 0.5909 | 0.1259 | -46.5844 | -44.5338 | -2.1679 | -2.2219 |
| 0.7023 | 0.1376 | 180 | 0.6460 | -0.6382 | -0.8057 | 0.5974 | 0.1675 | -46.9012 | -44.7465 | -2.1511 | -2.2063 |
| 0.5912 | 0.1529 | 200 | 0.6407 | -0.5424 | -0.7268 | 0.6623 | 0.1843 | -46.7038 | -44.5070 | -2.1753 | -2.2314 |
| 0.5464 | 0.1682 | 220 | 0.6308 | -0.5724 | -0.7861 | 0.6266 | 0.2137 | -46.8521 | -44.5819 | -2.1769 | -2.2343 |
| 0.6323 | 0.1835 | 240 | 0.6266 | -0.6324 | -0.8809 | 0.6558 | 0.2485 | -47.0892 | -44.7320 | -2.1604 | -2.2183 |
| 0.6174 | 0.1988 | 260 | 0.6189 | -0.6857 | -0.9617 | 0.6526 | 0.2759 | -47.2911 | -44.8653 | -2.1511 | -2.2100 |
| 0.6117 | 0.2141 | 280 | 0.6153 | -0.6964 | -0.9937 | 0.6753 | 0.2973 | -47.3711 | -44.8919 | -2.1455 | -2.2047 |
| 0.6543 | 0.2294 | 300 | 0.6128 | -0.6730 | -0.9993 | 0.6526 | 0.3263 | -47.3853 | -44.8335 | -2.1510 | -2.2109 |
| 0.6772 | 0.2446 | 320 | 0.6057 | -0.5641 | -0.8972 | 0.6818 | 0.3331 | -47.1299 | -44.5611 | -2.1670 | -2.2271 |
| 0.5993 | 0.2599 | 340 | 0.6023 | -0.4706 | -0.8110 | 0.6851 | 0.3404 | -46.9144 | -44.3273 | -2.2016 | -2.2626 |
| 0.6106 | 0.2752 | 360 | 0.5969 | -0.4956 | -0.8659 | 0.6916 | 0.3702 | -47.0516 | -44.3900 | -2.2036 | -2.2652 |
| 0.5839 | 0.2905 | 380 | 0.5969 | -0.5398 | -0.9584 | 0.7045 | 0.4186 | -47.2828 | -44.5004 | -2.1933 | -2.2562 |
| 0.5425 | 0.3058 | 400 | 0.5901 | -0.5128 | -0.9610 | 0.6786 | 0.4482 | -47.2894 | -44.4329 | -2.2008 | -2.2646 |
| 0.5481 | 0.3211 | 420 | 0.5892 | -0.4691 | -0.9336 | 0.6916 | 0.4644 | -47.2209 | -44.3238 | -2.2126 | -2.2761 |
| 0.5124 | 0.3364 | 440 | 0.5832 | -0.3746 | -0.8508 | 0.7013 | 0.4761 | -47.0138 | -44.0875 | -2.2315 | -2.2943 |
| 0.5781 | 0.3517 | 460 | 0.5810 | -0.4879 | -0.9815 | 0.6851 | 0.4936 | -47.3407 | -44.3707 | -2.2125 | -2.2767 |
| 0.6022 | 0.3670 | 480 | 0.5785 | -0.5538 | -1.0946 | 0.7143 | 0.5409 | -47.6235 | -44.5353 | -2.1914 | -2.2554 |
| 0.5997 | 0.3823 | 500 | 0.5763 | -0.4839 | -1.0486 | 0.7078 | 0.5646 | -47.5083 | -44.3607 | -2.2174 | -2.2817 |
| 0.4629 | 0.3976 | 520 | 0.5736 | -0.4618 | -1.0494 | 0.7175 | 0.5877 | -47.5105 | -44.3054 | -2.2144 | -2.2785 |
| 0.4794 | 0.4128 | 540 | 0.5680 | -0.4757 | -1.0677 | 0.7143 | 0.5920 | -47.5562 | -44.3401 | -2.2116 | -2.2756 |
| 0.6439 | 0.4281 | 560 | 0.5699 | -0.5639 | -1.1848 | 0.7013 | 0.6209 | -47.8490 | -44.5607 | -2.1883 | -2.2522 |
| 0.5701 | 0.4434 | 580 | 0.5688 | -0.6027 | -1.2314 | 0.7110 | 0.6287 | -47.9654 | -44.6576 | -2.1938 | -2.2588 |
| 0.6866 | 0.4587 | 600 | 0.5682 | -0.6280 | -1.2869 | 0.6981 | 0.6589 | -48.1043 | -44.7210 | -2.1904 | -2.2564 |
| 0.4416 | 0.4740 | 620 | 0.5640 | -0.6475 | -1.3140 | 0.7078 | 0.6665 | -48.1720 | -44.7696 | -2.1878 | -2.2534 |
| 0.4867 | 0.4893 | 640 | 0.5630 | -0.6543 | -1.3348 | 0.7078 | 0.6805 | -48.2239 | -44.7867 | -2.1848 | -2.2512 |
| 0.5613 | 0.5046 | 660 | 0.5622 | -0.6238 | -1.3276 | 0.7110 | 0.7038 | -48.2059 | -44.7104 | -2.1866 | -2.2526 |
| 0.4683 | 0.5199 | 680 | 0.5567 | -0.6656 | -1.3837 | 0.7208 | 0.7181 | -48.3461 | -44.8148 | -2.1821 | -2.2490 |
| 0.6244 | 0.5352 | 700 | 0.5585 | -0.6607 | -1.3665 | 0.7305 | 0.7058 | -48.3031 | -44.8027 | -2.1912 | -2.2580 |
| 0.6216 | 0.5505 | 720 | 0.5567 | -0.7016 | -1.4505 | 0.7435 | 0.7489 | -48.5132 | -44.9050 | -2.1811 | -2.2484 |
| 0.3742 | 0.5657 | 740 | 0.5573 | -0.7193 | -1.4764 | 0.7370 | 0.7571 | -48.5779 | -44.9491 | -2.1811 | -2.2494 |
| 0.719 | 0.5810 | 760 | 0.5533 | -0.6734 | -1.4476 | 0.7110 | 0.7742 | -48.5059 | -44.8344 | -2.1896 | -2.2571 |
| 0.4734 | 0.5963 | 780 | 0.5533 | -0.6357 | -1.3940 | 0.7175 | 0.7584 | -48.3720 | -44.7401 | -2.2026 | -2.2704 |
| 0.4929 | 0.6116 | 800 | 0.5510 | -0.6373 | -1.4141 | 0.7403 | 0.7767 | -48.4221 | -44.7442 | -2.2022 | -2.2700 |
| 0.632 | 0.6269 | 820 | 0.5477 | -0.6490 | -1.4418 | 0.7305 | 0.7928 | -48.4915 | -44.7734 | -2.1967 | -2.2643 |
| 0.5001 | 0.6422 | 840 | 0.5481 | -0.6725 | -1.4707 | 0.7273 | 0.7982 | -48.5637 | -44.8321 | -2.1940 | -2.2619 |
| 0.7523 | 0.6575 | 860 | 0.5481 | -0.6816 | -1.4845 | 0.7338 | 0.8028 | -48.5981 | -44.8550 | -2.1947 | -2.2630 |
| 0.6176 | 0.6728 | 880 | 0.5480 | -0.6684 | -1.4840 | 0.7305 | 0.8156 | -48.5969 | -44.8218 | -2.1894 | -2.2573 |
| 0.4124 | 0.6881 | 900 | 0.5463 | -0.6604 | -1.4643 | 0.7370 | 0.8039 | -48.5477 | -44.8018 | -2.1931 | -2.2610 |
| 0.498 | 0.7034 | 920 | 0.5469 | -0.6825 | -1.4748 | 0.7273 | 0.7923 | -48.5738 | -44.8571 | -2.1916 | -2.2599 |
| 0.3912 | 0.7187 | 940 | 0.5442 | -0.6745 | -1.4845 | 0.7240 | 0.8101 | -48.5983 | -44.8370 | -2.1938 | -2.2620 |
| 0.6001 | 0.7339 | 960 | 0.5431 | -0.6746 | -1.4897 | 0.7143 | 0.8151 | -48.6112 | -44.8373 | -2.1964 | -2.2650 |
| 0.6153 | 0.7492 | 980 | 0.5418 | -0.6651 | -1.4959 | 0.7403 | 0.8308 | -48.6267 | -44.8138 | -2.1921 | -2.2600 |
| 0.4508 | 0.7645 | 1000 | 0.5402 | -0.6495 | -1.4828 | 0.7273 | 0.8333 | -48.5940 | -44.7748 | -2.1953 | -2.2633 |
| 0.4763 | 0.7798 | 1020 | 0.5419 | -0.6611 | -1.5016 | 0.7208 | 0.8405 | -48.6409 | -44.8036 | -2.1971 | -2.2660 |
| 0.4468 | 0.7951 | 1040 | 0.5420 | -0.6580 | -1.4863 | 0.7370 | 0.8283 | -48.6026 | -44.7958 | -2.1962 | -2.2643 |
| 0.4179 | 0.8104 | 1060 | 0.5417 | -0.6353 | -1.4790 | 0.7240 | 0.8437 | -48.5844 | -44.7392 | -2.1986 | -2.2671 |
| 0.5283 | 0.8257 | 1080 | 0.5425 | -0.6470 | -1.4815 | 0.7143 | 0.8345 | -48.5906 | -44.7684 | -2.2029 | -2.2715 |
| 0.5548 | 0.8410 | 1100 | 0.5427 | -0.6322 | -1.4783 | 0.7273 | 0.8462 | -48.5827 | -44.7313 | -2.2019 | -2.2706 |
| 0.6147 | 0.8563 | 1120 | 0.5430 | -0.6466 | -1.4682 | 0.7078 | 0.8216 | -48.5575 | -44.7674 | -2.2024 | -2.2712 |
| 0.5355 | 0.8716 | 1140 | 0.5399 | -0.6323 | -1.4787 | 0.7305 | 0.8464 | -48.5836 | -44.7316 | -2.1987 | -2.2669 |
| 0.6367 | 0.8869 | 1160 | 0.5414 | -0.6339 | -1.4595 | 0.7305 | 0.8255 | -48.5356 | -44.7357 | -2.2065 | -2.2754 |
| 0.3592 | 0.9021 | 1180 | 0.5395 | -0.6308 | -1.4706 | 0.7143 | 0.8398 | -48.5634 | -44.7280 | -2.2026 | -2.2707 |
| 0.4004 | 0.9174 | 1200 | 0.5425 | -0.6253 | -1.4580 | 0.7273 | 0.8326 | -48.5319 | -44.7143 | -2.2069 | -2.2754 |
| 0.4802 | 0.9327 | 1220 | 0.5399 | -0.6224 | -1.4638 | 0.7078 | 0.8414 | -48.5464 | -44.7070 | -2.2015 | -2.2697 |
| 0.6739 | 0.9480 | 1240 | 0.5401 | -0.6364 | -1.4667 | 0.7045 | 0.8302 | -48.5536 | -44.7420 | -2.2034 | -2.2714 |
| 0.5283 | 0.9633 | 1260 | 0.5402 | -0.6258 | -1.4621 | 0.7240 | 0.8364 | -48.5422 | -44.7153 | -2.2037 | -2.2720 |
| 0.3915 | 0.9786 | 1280 | 0.5410 | -0.6262 | -1.4651 | 0.7208 | 0.8388 | -48.5496 | -44.7165 | -2.2042 | -2.2726 |
| 0.4974 | 0.9939 | 1300 | 0.5397 | -0.6358 | -1.4604 | 0.7305 | 0.8246 | -48.5380 | -44.7405 | -2.2010 | -2.2694 |

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.20.3