3ViewSense: Spatial and Mental Perspective Reasoning from Orthographic Views in Vision-Language Models Paper • 2603.07751 • Published 16 days ago
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10, 2025