Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published May 27 • 431
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published May 26 • 144
VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis Paper • 2605.22570 • Published May 21 • 24
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 344
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published Mar 26 • 157
Repurposing Geometric Foundation Models for Multi-view Diffusion Paper • 2603.22275 • Published Mar 23 • 48
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models Paper • 2603.22212 • Published Mar 23 • 127
Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection Paper • 2603.21944 • Published Mar 23 • 27
Versatile Editing of Video Content, Actions, and Dynamics without Training Paper • 2603.17989 • Published Mar 18 • 18
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 110
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published Mar 17 • 109
Video-CoE: Reinforcing Video Event Prediction via Chain of Events Paper • 2603.14935 • Published Mar 16 • 91
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published Mar 17 • 141
OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics Paper • 2512.08625 • Published Dec 9, 2025 • 1
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models Paper • 2603.17117 • Published Mar 17 • 89
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published Dec 19, 2025 • 99