TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL Paper • 2606.01599 • Published 3 days ago • 17
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published Dec 17, 2025 • 22
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published Feb 11 • 245
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data Paper • 2502.14044 • Published Feb 19, 2025 • 8