MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling Paper • 2606.13473 • Published 16 days ago • 92
MMSkills: Towards Multimodal Skills for General Visual Agents Paper • 2605.13527 • Published May 14 • 121
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization Paper • 2604.09574 • Published Feb 24 • 30
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 329