MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data • Paper • 2603.09206 • Published 3 days ago • 41
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios • Paper • 2602.23166 • Published 14 days ago • 40
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning? • Paper • 2603.07888 • Published 4 days ago • 9
OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization • Paper • 2106.03721 • Published Jun 7, 2021 • 1
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving • Paper • 2203.07724 • Published Mar 15, 2022 • 1
Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models • Paper • 2411.19757 • Published Nov 29, 2024 • 1
InSight-o3 • Collection • Empowering Multimodal Foundation Models with Generalized Visual Search • 4 items • Updated Jan 15 • 1
MapTrace: Scalable Data Generation for Route Tracing on Maps • Paper • 2512.19609 • Published Dec 22, 2025 • 3