GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization Paper • 2606.16771 • Published 4 days ago • 10
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_gpg_sig_r_js2_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_gpg_sig_r_js1_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.05_sig_3_r_js2 Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.05_sig_3_r_js1 Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.05_beta_clip_sig_3_r_js2 Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.05_beta_clip_sig_3_r_js1 Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_beta_clip_sig_3_r_js4_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_beta_clip_sig_3_r_js3_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_beta_clip_sig_3_r_js2_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_beta_clip_sig_3_r_js1_kl_false Updated Oct 10, 2025