DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes Paper • 2605.28421 • Published 2 days ago • 39
ACC: Compiling Agent Trajectories for Long-Context Training Paper • 2605.21850 • Published 8 days ago • 59
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping Paper • 2604.11297 • Published Apr 13 • 144
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242