CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 341
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published Mar 26 • 156
Repurposing Geometric Foundation Models for Multi-view Diffusion Paper • 2603.22275 • Published Mar 23 • 47
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models Paper • 2603.22212 • Published Mar 23 • 126
Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection Paper • 2603.21944 • Published Mar 23 • 26
Versatile Editing of Video Content, Actions, and Dynamics without Training Paper • 2603.17989 • Published Mar 18 • 17
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 109
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published Mar 17 • 109
Video-CoE: Reinforcing Video Event Prediction via Chain of Events Paper • 2603.14935 • Published Mar 16 • 91
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published Mar 17 • 139
OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics Paper • 2512.08625 • Published Dec 9, 2025 • 1
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models Paper • 2603.17117 • Published Mar 17 • 87
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published Dec 19, 2025 • 99
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning Paper • 2603.14482 • Published Mar 15 • 34
EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing Paper • 2603.19224 • Published Mar 19 • 18
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models Paper • 2603.18002 • Published Mar 18 • 13