Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
Paper • 2603.22847 • Published 1 day ago • 15