tzwilliam0/maxmin-dpo-init-kl-coef-0.5-rebuttal-dongnan • Reinforcement Learning • Updated Mar 27, 2025 • 1
tzwilliam0/maxmin-dpo-init-kl-coef-0.1-rebuttal-dongnan • Reinforcement Learning • Updated Mar 27, 2025 • 1
tzwilliam0/maxmin-dpo-init-kl-coef-0.5-fix-reward-norm-dongnan • Reinforcement Learning • Updated Jan 10, 2025
tzwilliam0/maxmin-dpo-init-kl-coef-0.1-fix-reward-norm-dongnan • Reinforcement Learning • Updated Jan 10, 2025