EgoCS-400K: An Egocentric Gameplay Dataset for World Models • Paper • 2606.18180 • Published 18 days ago • 15
InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning • Paper • 2606.12195 • Published 24 days ago • 23
DyCo-RL: Dynamic Cross-Modal Coordination for Visual Reasoning • Paper • 2606.08035 • Published 28 days ago • 16
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception • Paper • 2602.11858 • Published Feb 12 • 63