Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)? Paper • 2605.30557 • Published May 28 • 12
PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation Paper • 2605.14269 • Published May 14 • 9
V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising Paper • 2603.16792 • Published Mar 17 • 3
Error-Driven Scene Editing for 3D Grounding in Large Language Models Paper • 2511.14086 • Published Nov 18, 2025 • 7