arXiv:2601.06165

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

Published on Jan 7 · Submitted by DasolChoi on Jan 13

Abstract

Real-world vision-language benchmarks reveal that under-specified user queries pose significant challenges for current models, with explicit query rewriting leading to substantial performance improvements.

AI-generated summary

Current vision-language benchmarks predominantly feature well-structured questions with clear, explicit prompts. However, real user queries are often informal and under-specified: users naturally leave much unsaid, relying on images to convey context. We introduce HAERAE-Vision, a benchmark of 653 real-world visual questions from Korean online communities (a 0.76% survival rate from 86K candidates), each paired with an explicit rewrite, yielding 1,306 query variants in total. Evaluating 39 VLMs, we find that even state-of-the-art models (GPT-5, Gemini 2.5 Pro) achieve under 50% accuracy on the original queries. Crucially, query explicitation alone yields 8- to 22-point improvements, with smaller models benefiting most. We further show that even with web search, under-specified queries underperform explicit queries without search, revealing that current retrieval cannot compensate for what users leave unsaid. Our findings demonstrate that a substantial portion of VLM difficulty stems from natural query under-specification rather than from model capability, highlighting a critical gap between benchmark evaluation and real-world deployment.
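
To make the evaluation protocol concrete, here is a minimal sketch of the paired-query comparison the abstract describes: each benchmark item carries an image, the original under-specified query, and its explicit rewrite, and a model is scored on both variants. The Item fields and the ask_vlm and is_correct helpers are hypothetical placeholders, not the paper's released code; exact-match grading stands in for whatever grading scheme the authors actually use.

from dataclasses import dataclass

@dataclass
class Item:
    image_path: str       # image carrying the unstated context
    original_query: str   # informal, under-specified user question
    explicit_query: str   # manually rewritten, fully specified variant
    answer: str           # gold answer

def ask_vlm(image_path: str, query: str) -> str:
    """Placeholder: call a VLM of your choice with the image and query."""
    raise NotImplementedError

def is_correct(prediction: str, gold: str) -> bool:
    # Exact match as a stand-in; the paper's grading may differ.
    return prediction.strip().lower() == gold.strip().lower()

def evaluate(items: list[Item]) -> tuple[float, float]:
    """Return (accuracy on original queries, accuracy on explicit rewrites)."""
    orig_hits = sum(is_correct(ask_vlm(i.image_path, i.original_query), i.answer) for i in items)
    expl_hits = sum(is_correct(ask_vlm(i.image_path, i.explicit_query), i.answer) for i in items)
    n = len(items)
    return orig_hits / n, expl_hits / n

The gap between the two accuracies is the explicitation effect the paper reports: 8 to 22 points, largest for smaller models.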

Community

Paper author · Paper submitter

Users often ask VLMs under-specified, informal visual questions, which current clean-prompt benchmarks fail to capture. We introduce HAERAE-Vision (653 real Korean community queries + explicit rewrites) and show that making queries explicit boosts accuracy by 8–22 points, while web search cannot fully offset what users leave unsaid.

Models citing this paper 0

Datasets citing this paper 1

Spaces citing this paper 0

Collections including this paper 1