EgoCS-400K: An Egocentric Gameplay Dataset for World Models • Paper • 2606.18180 • Published 16 days ago • 15
InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning • Paper • 2606.12195 • Published 22 days ago • 23
DyCo-RL: Dynamic Cross-Modal Coordination for Visual Reasoning • Paper • 2606.08035 • Published 26 days ago • 16
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception • Paper • 2602.11858 • Published Feb 12 • 63