Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published Mar 3, 2026 • 102
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published Feb 6, 2026 • 37
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published Jan 5, 2026 • 63
MM-ACT: Learn from Multimodal Parallel Generation to Act Paper • 2512.00975 • Published Nov 30, 2025 • 6
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 179
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis Paper • 2506.04217 • Published Jun 4, 2025 • 1
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published Mar 10, 2025 • 61