---
title: SpatialBench
emoji: 🧩
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.23.3"
app_file: app.py
pinned: true
short_description: Do LLMs Build Spatial World Models? Evidence from Maze Tasks
---

# SpatialBench

Evaluation platform for "Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks" (ICLR 2026 Workshop).

Three tasks probe whether LLMs construct internal spatial representations:

| Task | Type | Description |
|------|------|-------------|
| Maze Navigation | Planning | Find the shortest path from start to goal |
| Sequential Point Reuse | Reasoning | Q3 repeats Q0: do models reuse earlier computation? |
| Compositional Distance | Reasoning | Compose corner→center distances to answer Q2 |
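
For checking Maze Navigation answers, a ground-truth shortest path can be computed with breadth-first search (BFS). The sketch below is hypothetical: the maze encoding (`#` walls, `S` start, `G` goal, 4-connected moves), the `find` and `shortest_path` helpers, and the toy `MAZE` are assumptions for illustration, not SpatialBench's actual format or `app.py` code. The `functools.lru_cache` memoization loosely mirrors the idea behind Sequential Point Reuse: an identical later query is answered from a stored result instead of being recomputed.

```python
from collections import deque
from functools import lru_cache

# Hypothetical maze encoding (an assumption, not SpatialBench's format):
# '#' = wall, '.' = open cell, 'S' = start, 'G' = goal.
# A tuple of strings keeps the maze hashable, as lru_cache requires.
MAZE = (
    "S.#.",
    ".##.",
    "...G",
)


def find(maze, ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(maze):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not found in maze")


@lru_cache(maxsize=None)
def shortest_path(maze, start, goal):
    """BFS over 4-connected open cells.

    Returns the shortest path as a tuple of (row, col) cells from start to
    goal inclusive, or None if goal is unreachable. Results are cached, so
    repeating an earlier query does not re-run the search.
    """
    rows, cols = len(maze), len(maze[0])
    parent = {start: None}  # visited set plus back-pointers for path recovery
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back-pointers from goal to start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return tuple(reversed(path))
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            nxt = (nr, nc)
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and maze[nr][nc] != "#" and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


start, goal = find(MAZE, "S"), find(MAZE, "G")
path = shortest_path(MAZE, start, goal)
print(len(path) - 1)  # 5 moves
shortest_path(MAZE, start, goal)  # identical query, served from the cache
print(shortest_path.cache_info().hits)  # 1
```

Because BFS expands cells in order of distance from the start, the first time it dequeues the goal it has found a shortest path in an unweighted grid.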

Models evaluated: Gemini 2.5 Flash, GPT-5 Mini, Claude Haiku 4.5, DeepSeek Chat.