UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces Paper • 2312.15715 • Published Dec 25, 2023 • 21
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29 • 68
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024 • 38
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation Paper • 2512.17012 • Published 7 days ago • 40