TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction Paper • 2605.26115 • Published May 25 • 52
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published May 28 • 146
Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction Paper • 2605.26230 • Published May 25 • 41
SpatialBench: Is Your Spatial Foundation Model an All-Round Player? Paper • 2605.27367 • Published May 26 • 72
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published May 26 • 145
CubePart: An Open-Vocabulary Part-Controllable 3D Generator Paper • 2605.28763 • Published May 27 • 14
Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving Paper • 2605.23163 • Published May 25 • 17
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published May 20 • 111
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation Paper • 2604.18168 • Published Apr 20 • 96
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published Apr 15 • 127