etri-vilab/MultiHopSpatial-Qwen3-VL-4B-Instruct Image-Text-to-Text • 4B • Updated 10 days ago
MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model Paper • 2603.18892 • Published Mar 19