Papers
arxiv:2602.07055

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

Published on Feb 4
· Submitted by
Jieyu Zhang
on Feb 10
Authors:
,
,
,
,
,
,
,
,
,
,
,
,
,

Abstract

Current multimodal foundation models show limitations in maintaining coherent spatial beliefs during active exploration, exhibiting gaps between active and passive performance, inefficient exploration strategies, and difficulties in updating outdated spatial knowledge.

AI-generated summary

Spatial embodied intelligence requires agents to act to acquire information under partial observability. While multimodal foundation models excel at passive perception, their capacity for active, self-directed exploration remains understudied. We propose Theory of Space, defined as an agent's ability to actively acquire information through self-directed, active exploration and to construct, revise, and exploit a spatial belief from sequential, partial observations. We evaluate this through a benchmark where the goal is curiosity-driven exploration to build an accurate cognitive map. A key innovation is spatial belief probing, which prompts models to reveal their internal spatial representations at each step. Our evaluation of state-of-the-art models reveals several critical bottlenecks. First, we identify an Active-Passive Gap, where performance drops significantly when agents must autonomously gather information. Second, we find high inefficiency, as models explore unsystematically compared to program-based proxies. Through belief probing, we diagnose that while perception is an initial bottleneck, global beliefs suffer from instability that causes spatial knowledge to degrade over time. Finally, using a false belief paradigm, we uncover Belief Inertia, where agents fail to update obsolete priors with new evidence. This issue is present in text-based agents but is particularly severe in vision-based models. Our findings suggest that current foundation models struggle to maintain coherent, revisable spatial beliefs during active exploration.

Community

Paper submitter

Theory of Space studies whether foundation models can construct a globally consistent spatial belief from partial observations via active exploration, revise the belief in dynamic environments when new evidence contradicts prior assumptions, and exploit the belief for downstream spatial tasks. We also probe the model to externalize its spatial belief during exploration to “open the box” and directly observe how beliefs evolve over time.

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.07055 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.07055 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.07055 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.