Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models Paper • 2606.11025 • Published 19 days ago • 41
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning Paper • 2606.10968 • Published 19 days ago • 42
Reinforcing Few-step Generators via Reward-Tilted Distribution Matching Paper • 2605.26108 • Published May 25 • 7
Rethinking the Divergence Regularization in LLM RL Paper • 2606.09821 • Published 20 days ago • 33 • 4
RTDMD Collection Reinforcing Few-step Generators via Reward-Tilted Distribution Matching • 5 items • Updated 26 days ago • 3