---
title: SpatialBench
emoji: 🧩
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.23.3"
app_file: app.py
pinned: true
short_description: Do LLMs Build Spatial World Models? Evidence from Maze Tasks
---

# SpatialBench

Evaluation platform for **"Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks"** (ICLR 2026 Workshop).

Three tasks probe whether LLMs construct internal spatial representations:

| Task | Type | Description |
|------|------|-------------|
| **Maze Navigation** | Planning | Find shortest path from start to goal |
| **Sequential Point Reuse** | Reasoning | Q3 = Q0: do models reuse earlier computation? |
| **Compositional Distance** | Reasoning | Compose corner→center distances for Q2 |

Models evaluated: Gemini 2.5 Flash, GPT-5 Mini, Claude Haiku 4.5, DeepSeek Chat.
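
For orientation, the shortest-path target of the Maze Navigation task can be illustrated with a breadth-first search over a grid. This is a minimal sketch, not the benchmark's own code: the function name `shortest_path_length`, the string-based grid encoding (`#` for walls, `.` for open cells), and the four-direction movement rule are all assumptions made for illustration, since this README does not specify the maze format or the scoring procedure.

```python
from collections import deque


def shortest_path_length(grid, start, goal):
    """Return the number of moves on a shortest path, or None if unreachable.

    Hypothetical helper: `grid` is a list of equal-length strings where '#'
    marks a wall (an assumed encoding), and `start` / `goal` are (row, col)
    tuples. Movement is assumed to be up, down, left, or right.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])  # (cell, distance from start)
    seen = {start}

    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))

    return None  # goal cannot be reached from start


# Example maze (assumed encoding): walls force a detour along the bottom row.
maze = [
    "..#.",
    ".##.",
    "....",
]
print(shortest_path_length(maze, (0, 0), (0, 3)))  # 7
```

Breadth-first search visits cells in order of increasing distance, so the first time the goal is dequeued its distance is minimal. A reference length computed this way could be compared against the length of a model's proposed path, assuming the path is first checked for validity.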