Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling Paper • 2606.12370 • Published 2 days ago • 16
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Paper • 2605.15980 • Published 28 days ago • 36
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 118