Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought Paper • 2603.22847 • Published 1 day ago • 15
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions Paper • 2602.13013 • Published Feb 13 • 54
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models Paper • 2602.12036 • Published Feb 12 • 93
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published Feb 2 • 117