ViQ: Text-Aligned Visual Quantized Representations at Any Resolution Paper • 2606.27313 • Published 10 days ago • 38
Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions Paper • 2606.09076 • Published 27 days ago • 64
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding Paper • 2605.18018 • Published May 18 • 33
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 182
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought Paper • 2603.22847 • Published Mar 24 • 26
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought Paper • 2603.22847 • Published Mar 24 • 26