Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
Paper • 2603.22847 • Published • 25
Video Understanding, Audio-Visual, Multimodal LLMs, Video Captioning, Instruction Tuning, Dataset Curation, Qwen-based, Open-source, Fully-Open-MLLMs