---
title: SpatialBench
emoji: 🧩
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.23.3"
app_file: app.py
pinned: true
short_description: Do LLMs Build Spatial World Models? Evidence from Maze Tasks
---

# SpatialBench

Evaluation platform for **"Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks"** (ICLR 2026 Workshop).

Three tasks probe whether LLMs construct internal spatial representations:

| Task | Type | Description |
|------|------|-------------|
| **Maze Navigation** | Planning | Find shortest path from start to goal |
| **Sequential Point Reuse** | Reasoning | Q3 = Q0: do models reuse earlier computation? |
| **Compositional Distance** | Reasoning | Compose corner→center distances for Q2 |

Models evaluated: Gemini 2.5 Flash, GPT-5 Mini, Claude Haiku 4.5, DeepSeek Chat.
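
For the Maze Navigation task, a ground-truth shortest path on a grid maze can be computed with breadth-first search. The snippet below is a minimal sketch under stated assumptions, not SpatialBench's own code: the maze encoding (a list of strings with `#` for walls, `S` for start, `G` for goal), the 4-connected movement rule, and the function name `shortest_path` are all hypothetical choices made for illustration.

```python
from collections import deque


def shortest_path(grid):
    """Return a shortest S-to-G path as a list of (row, col) cells, or None.

    Assumed encoding (hypothetical, not the benchmark's format):
    '#' is a wall, 'S' the start, 'G' the goal, anything else is open.
    """
    rows, cols = len(grid), len(grid[0])
    start = goal = None
    for r, line in enumerate(grid):
        for c, ch in enumerate(line):
            if ch == "S":
                start = (r, c)
            elif ch == "G":
                goal = (r, c)
    if start is None or goal is None:
        raise ValueError("grid must contain both 'S' and 'G'")

    # BFS visits cells in order of distance from the start, so the first
    # time the goal is dequeued the recorded parent chain is a shortest path.
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and grid[nr][nc] != "#"
                and (nr, nc) not in parent
            ):
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable


maze = [
    "S.#",
    ".##",
    "..G",
]
path = shortest_path(maze)
print(path)
```

A model's proposed route could then be compared against `len(path) - 1`, the optimal number of moves. How SpatialBench actually scores answers is not specified here.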