LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 1 day ago • 62
InstructSAM: Segment Any Instance with Any Instructions Paper • 2605.26102 • Published 2 days ago • 10
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World? Paper • 2506.05287 • Published Jun 5, 2025 • 14
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 46