Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration • Paper • 2602.03647 • Published 1 day ago • 5
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward • Paper • 2510.03222 • Published Oct 3, 2025 • 75
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning • Paper • 2510.14958 • Published Oct 16, 2025 • 23