N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published 7 days ago • 19
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published 7 days ago • 10
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps Paper • 2505.18675 • Published May 24 • 26