Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs Paper • 2512.17206 • Published 7 days ago • 15
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization Paper • 2510.13554 • Published Oct 15 • 57
Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony Paper • 2510.11345 • Published Oct 13 • 15