DSR_Suite Collection Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models • 3 items • Updated 2 days ago • 6
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 2 days ago • 40
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published Nov 14 • 164
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning Paper • 2510.11027 • Published Oct 13 • 21
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints Paper • 2510.08565 • Published Oct 9 • 19
MiroThinker-v0.1 Collection High performance in deep research and tool use. • 7 items • Updated Sep 8 • 36
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published Dec 12, 2024 • 38