Spaces: gbl1357/AgenticZork


gbl1357 committed on Feb 22

Commit ebd9480 (verified) · 1 Parent(s): 82a203a

Update README.md


README.md +26 -69


@@ -11,80 +11,37 @@ pinned: false
 # AgenticZork — MCP ReAct Agent for Jericho Text Adventures
-This project implements a **ReAct-style agent** that plays **Jericho/Z-machine** text-adventure games through a small **FastMCP** server. The goal is to **maximize game score** and **explore as many distinct locations as possible** under a fixed step budget.
-## Components
-- **`agent.py`** — Student agent: ReAct loop + heuristics + lightweight state tracking; LLM inference via Hugging Face (`Qwen/Qwen2.5-72B-Instruct`).
-- **`mcp_server.py`** — MCP server exposing game interaction tools and maintaining server-side exploration logs.
-## Tool interface (MCP)
-The server exposes four tools used by the agent:
-- `play_action(action: str)`: execute an in-game command and return the resulting observation (+ score/move metadata).
-- `inventory()`: return current inventory.
-- `get_map()`: return an explored-graph view (locations + discovered exits).
-- `memory()`: return a compact state summary (location, score, moves, recent actions, current observation).
-## Agent policy (high level)
-At each step, the agent:
-1. Builds a prompt from: **current observation**, **score**, **recent actions**, and **anti-loop warnings**.
-2. Queries the LLM with a strict output schema:
-   - `THOUGHT: ...`
-   - `TOOL: ...`
-   - `ARGS: {...}`
-3. Executes the selected MCP tool and updates internal trackers:
-   - `locations_explored` (unique room titles)
-   - `failed_actions` (actions that repeatedly fail)
-   - `recent_actions` (loop detection)
-   - `steps_since_progress` (no new rooms and no score increase)
-Heuristics favor: **take/open/examine** on salient nouns, then **systematic exploration** (compass directions), while discouraging immediate backtracking.
-## Critical bug fixed: location tracking corrupted by tool outputs
-### Failure mode
-The agent originally extracted the “current room” from the **first plausible header line** of the latest text. When the agent called `get_map()` / `memory()` / `inventory()`, those tool responses begin with headers like:
-- `Explored Locations and Exits:`
-- `Current State:`
-- `Inventory:`
-These headers were incorrectly interpreted as room titles, causing the agent to set its location to e.g. `Explored Locations and Exits:` and subsequently break:
-- visited-location accounting,
-- progress detection,
-- exploration decisions (leading to repeated `get_map()` calls).
-### Fix (agent-side, primary)
-Location updates are now **gated**:
-- The agent updates location **only after `play_action`** (never after `get_map` / `memory` / `inventory`).
-- Location updates are preferred on **movement / look-like actions** (`n/s/e/w/u/d`, `go <dir>`, `look`), and conservative otherwise.
-Implemented via:
-- `_should_update_location_from_action()`
-- `_maybe_update_location()`
-### Hardening (server-side)
-All non-game tool outputs are prefixed to make them unambiguous:
-- `[TOOL:get_map]`, `[TOOL:memory]`, `[TOOL:inventory]`
-This prevents any future parser from confusing tool output with a room title.
-### Hardening (parser-side)
-`_extract_location()` additionally ignores:
-- tool prefixes/headers,
-- score/move metadata,
-- common failure messages (e.g., “I don’t understand”, “You can’t go that way”).
-## Reproducibility
-- LLM inference uses `temperature=0.0` and a deterministic `seed` schedule (`seed + step`).
-- Server logs can be saved per session (actions, location transitions, score deltas).
-## Running
-### Environment
-Create a `.env` file:
-```bash
-HF_TOKEN=hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXX

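Editor's illustration (not part of the commit): the removed README text describes gating location updates so that only `play_action` results can change the tracked room. The sketch below shows one way such a gate could look. The helper names echo `_should_update_location_from_action()`, `_maybe_update_location()`, and `_extract_location()` from the README, but the bodies, the `MOVE_OR_LOOK` pattern, and the room-title heuristic are assumptions, and the gate here is stricter than the README's "conservative otherwise" wording: it updates only on movement or look commands.

```python
import re
from typing import Optional

# Hypothetical pattern for movement / look-like commands (n/s/e/w/u/d, "go <dir>", "look").
MOVE_OR_LOOK = re.compile(
    r"^(?:go\s+)?(?:n|s|e|w|u|d|north|south|east|west|up|down)$|^(?:l|look)$",
    re.IGNORECASE,
)


def should_update_location(tool: str, action: str) -> bool:
    """Gate: only play_action with a movement/look command may change location."""
    if tool != "play_action":
        return False  # get_map / memory / inventory output is never parsed
    return bool(MOVE_OR_LOOK.match(action.strip()))


def extract_location(observation: str) -> Optional[str]:
    """Return the first line that looks like a room title, else None (assumed heuristic)."""
    for line in observation.splitlines():
        line = line.strip()
        if not line:
            continue
        # Skip tool prefixes, header-style lines ("Inventory:"), and full sentences.
        if line.startswith("[TOOL:") or line.endswith((":", ".", "!", "?")):
            continue
        if len(line.split()) <= 5 and line[0].isupper():
            return line
    return None


def maybe_update_location(
    current: Optional[str], tool: str, action: str, observation: str
) -> Optional[str]:
    """Keep the current location unless the gate passes and a title is found."""
    if not should_update_location(tool, action):
        return current
    return extract_location(observation) or current
```

With this gate, a `get_map` response that begins with `Explored Locations and Exits:` leaves the tracked location untouched, while `play_action("north")` followed by a short title line updates it.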
 # AgenticZork — MCP ReAct Agent for Jericho Text Adventures
+This project implements a **ReAct-style agent** for Jericho/Z-machine text adventure games using a lightweight **FastMCP** server. The objective is to **maximize score** while also **visiting many unique locations** under a step budget.
+## Core ideas (design choices)
+### 1) ReAct with strict tool protocol
+The agent queries an LLM (`Qwen/Qwen2.5-72B-Instruct` via Hugging Face Inference) and forces a rigid output schema:
+`THOUGHT`, `TOOL`, `ARGS`. This reduces parsing errors and stabilizes action selection. Inference is run with **temperature=0** and a step-dependent seed for repeatability.
+### 2) Hybrid policy: heuristics + LLM
+The LLM is guided by universal scoring heuristics: **take/open/examine first**, then systematic exploration (cardinal + vertical directions). The agent injects lightweight state into the prompt: current score, recent actions, and warnings when progress stalls, to bias the LLM toward productive interactions rather than wandering.
+### 3) Progress/loop control
+To avoid common failure modes (ping-ponging or infinite cycling), the agent maintains:
+- `recent_actions` (short window) for repetition detection
+- `failed_actions` to avoid retrying commands that consistently fail
+- `steps_since_progress` defined as “no new rooms AND no score increase”
+When the agent detects stagnation, it (i) queries the map more often, (ii) forces unexplored directions, or (iii) switches to interaction verbs (examine/open/take).
+### 4) Server-side map and memory tools
+The MCP server exposes four tools:
+- `play_action(action)`: run a game command
+- `inventory()`: list carried items
+- `memory()`: compact state summary (score/moves/recent actions)
+- `get_map()`: explored graph (location → exits → destination)
+The server updates the exploration graph by tracking movement commands and extracting room titles from observations.
+### 5) Critical fix: prevent “location corruption” from tool outputs
+A major bug observed during evaluation was that the agent sometimes treated `get_map()`/`memory()` output headers (e.g., “Explored Locations and Exits:”) as *room names*, corrupting location tracking and causing loops. I fixed this with two complementary measures:
+1) **Agent-side gating**: location is updated **only** after `play_action` (never after `get_map`/`memory`/`inventory`), and primarily on movement/look commands.
+2) **Server-side prefixing**: non-game tool outputs are prefixed with `[TOOL:...]` to make them unambiguous.
+These choices make the agent more stable across many Jericho games while remaining simple and reproducible.
+Reference for Space metadata: https://huggingface.co/docs/hub/spaces-config-reference
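Editor's illustration (not part of the commit): the new README says the agent forces a rigid `THOUGHT`, `TOOL`, `ARGS` output schema to reduce parsing errors. A minimal sketch of a parser for that schema follows, assuming one field per line and `ARGS` given as a JSON object. The function name `parse_reply` and the rejection rules are hypothetical; only the four tool names come from the README.

```python
import json
import re
from typing import Optional, Tuple

# One "NAME: value" field per line; [ \t]* keeps the match on a single line.
FIELD = re.compile(r"^(THOUGHT|TOOL|ARGS):[ \t]*(.*)$", re.MULTILINE)

# Tool names listed in the README.
VALID_TOOLS = {"play_action", "inventory", "memory", "get_map"}


def parse_reply(text: str) -> Optional[Tuple[str, str, dict]]:
    """Parse an LLM reply into (thought, tool, args); return None if malformed."""
    fields = {}
    for name, value in FIELD.findall(text):
        fields.setdefault(name, value.strip())  # first occurrence wins
    if not {"THOUGHT", "TOOL", "ARGS"} <= fields.keys():
        return None  # a required field is missing
    tool = fields["TOOL"]
    if tool not in VALID_TOOLS:
        return None  # unknown tool name
    try:
        args = json.loads(fields["ARGS"])
    except json.JSONDecodeError:
        return None  # ARGS is not valid JSON
    if not isinstance(args, dict):
        return None  # ARGS must be a JSON object
    return fields["THOUGHT"], tool, args
```

Returning `None` on any deviation lets the caller decide how to recover, for example by re-prompting or falling back to a default action such as `look`; which recovery the actual agent uses is not stated in the README.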