Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views Paper • 2606.29513 • Published 5 days ago • 43
Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking Paper • 2606.15673 • Published Apr 8 • 13
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 22 days ago • 109
Rethinking State Tracking in Recurrent Models Through Error Control Dynamics Paper • 2605.07755 • Published May 8 • 24
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues Paper • 2506.00958 • Published Jun 1, 2025 • 20
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Paper • 2505.18842 • Published May 24, 2025 • 36
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms Paper • 2503.14427 • Published Mar 18, 2025 • 19
Teaching Metric Distance to Autoregressive Multimodal Foundational Models Paper • 2503.02379 • Published Mar 4, 2025 • 4