arxiv:2512.18563

OpenView: Empowering MLLMs with Out-of-view VQA

Published on Dec 21

Abstract

OpenView enhances multimodal large language models' ability to reason about out-of-view elements in images through a pipeline, dataset, and benchmark for multi-choice visual question answering.

AI-generated summary

Recent multimodal large language models (MLLMs) show great potential in natural image understanding. Yet they perform well mainly when reasoning about in-view content within the image frame. This paper presents the first study on out-of-view (OOV) understanding, i.e., the ability to reason about objects, activities, and scenes beyond the visible frame of a perspective view. Our technical contributions are threefold. First, we design OpenView, a four-stage pipeline that generates multi-choice VQA at scale by leveraging panoramic imagery to enable context-rich and spatially grounded VQA synthesis with free-view framing. Second, we curate OpenView-Dataset, a high-quality synthetic dataset built from diverse real-world panoramas, to empower MLLMs through supervised fine-tuning. Third, we build OpenView-Bench, a benchmark that jointly measures choice and rationale accuracy for interpretable and diagnosable evaluation. Experimental results show that, although a large gap from human performance remains in OOV VQA answer selection, multiple MLLMs consistently improve once empowered by OpenView, rising from 48.6% to 64.1% on average. Code, benchmark, and data will be available at https://github.com/q1xiangchen/OpenView.
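The abstract does not spell out how a perspective view is framed inside a panorama or how content is labelled as in-view or out-of-view. As a rough illustration only, the sketch below assumes an equirectangular panorama (directions given as longitude and latitude in degrees) and an ideal pinhole camera. The function names, parameters, and angle conventions are hypothetical and are not taken from the OpenView code.

```python
import math


def direction_from_lonlat(lon_deg, lat_deg):
    """Unit vector for a panorama direction (assumed: x right, y up, z forward)."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def is_in_view(obj_lon, obj_lat, cam_yaw, cam_pitch, hfov_deg, vfov_deg):
    """Return True if the direction falls inside the camera frustum.

    Hypothetical helper: the camera looks along (cam_yaw, cam_pitch) and has
    the given horizontal and vertical fields of view, all in degrees.
    """
    x, y, z = direction_from_lonlat(obj_lon, obj_lat)
    yaw, pitch = math.radians(cam_yaw), math.radians(cam_pitch)

    # Undo the camera yaw (rotation about the vertical y axis).
    x1 = x * math.cos(yaw) - z * math.sin(yaw)
    z1 = x * math.sin(yaw) + z * math.cos(yaw)

    # Undo the camera pitch (rotation about the horizontal x axis).
    y2 = y * math.cos(pitch) - z1 * math.sin(pitch)
    z2 = y * math.sin(pitch) + z1 * math.cos(pitch)

    # Directions behind the image plane can never be in view.
    if z2 <= 0:
        return False

    # Pinhole projection: compare normalised image coordinates with the
    # half-extent of the frame implied by each field of view.
    within_h = abs(x1 / z2) <= math.tan(math.radians(hfov_deg) / 2)
    within_v = abs(y2 / z2) <= math.tan(math.radians(vfov_deg) / 2)
    return within_h and within_v
```

Under these assumptions, an object at longitude 60° is out of view for a forward-facing camera with a 90° horizontal field of view, while the same camera turned to a yaw of 90° would see an object at longitude 100°. A question about the first object would then count as an OOV question for that framing.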
