Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3, 2025 • 32
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning Paper • 2504.00891 • Published Apr 1, 2025 • 14