Update README.md
README.md
CHANGED
|
@@ -11,80 +11,37 @@ pinned: false
|
|
| 11 |
|
| 12 |
# AgenticZork — MCP ReAct Agent for Jericho Text Adventures
|
| 13 |
|
| 14 |
-
This project implements a **ReAct-style agent**
|
| 15 |
|
| 16 |
-
##
|
| 17 |
|
| 18 |
-
|
| 19 |
-
|
| 20 |
|
| 21 |
-
##
|
| 22 |
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
- `
|
| 26 |
-
- `
|
| 27 |
-
- `
|
| 28 |
|
| 29 |
-
##
|
| 30 |
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
- `TOOL: ...`
|
| 36 |
-
- `ARGS: {...}`
|
| 37 |
-
3. Executes the selected MCP tool and updates internal trackers:
|
| 38 |
-
- `locations_explored` (unique room titles)
|
| 39 |
-
- `failed_actions` (actions that repeatedly fail)
|
| 40 |
-
- `recent_actions` (loop detection)
|
| 41 |
-
- `steps_since_progress` (no new rooms and no score increase)
|
| 42 |
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
## Critical bug fixed: location tracking corrupted by tool outputs
|
| 46 |
-
|
| 47 |
-
### Failure mode
|
| 48 |
-
The agent originally extracted the “current room” from the **first plausible header line** of the latest text. When the agent called `get_map()` / `memory()` / `inventory()`, those tool responses begin with headers like:
|
| 49 |
-
- `Explored Locations and Exits:`
|
| 50 |
-
- `Current State:`
|
| 51 |
-
- `Inventory:`
|
| 52 |
-
|
| 53 |
-
These headers were incorrectly interpreted as room titles, causing the agent to set its location to e.g. `Explored Locations and Exits:` and subsequently break:
|
| 54 |
-
- visited-location accounting,
|
| 55 |
-
- progress detection,
|
| 56 |
-
- exploration decisions (leading to repeated `get_map()` calls).
|
| 57 |
-
|
| 58 |
-
### Fix (agent-side, primary)
|
| 59 |
-
Location updates are now **gated**:
|
| 60 |
-
- The agent updates location **only after `play_action`** (never after `get_map` / `memory` / `inventory`).
|
| 61 |
-
- Location updates happen mainly after **movement or look-style actions** (`n/s/e/w/u/d`, `go <dir>`, `look`); after any other action the agent updates its location conservatively.
|
| 62 |
-
|
| 63 |
-
Implemented via:
|
| 64 |
-
- `_should_update_location_from_action()`
|
| 65 |
-
- `_maybe_update_location()`
|
| 66 |
-
|
| 67 |
-
### Hardening (server-side)
|
| 68 |
-
The output of every non-game tool is prefixed with a tag so that it is unambiguous:
|
| 69 |
-
- `[TOOL:get_map]`, `[TOOL:memory]`, `[TOOL:inventory]`
|
| 70 |
-
|
| 71 |
-
This prevents any future parser from confusing tool output with a room title.
|
| 72 |
-
|
| 73 |
-
### Hardening (parser-side)
|
| 74 |
-
`_extract_location()` additionally ignores:
|
| 75 |
-
- tool prefixes/headers,
|
| 76 |
-
- score/move metadata,
|
| 77 |
-
- common failure messages (e.g., “I don’t understand”, “You can’t go that way”).
|
| 78 |
-
|
| 79 |
-
## Reproducibility
|
| 80 |
-
|
| 81 |
-
- LLM inference uses `temperature=0.0` and a deterministic `seed` schedule (`seed + step`).
|
| 82 |
-
- Server logs can be saved per session (actions, location transitions, score deltas).
|
| 83 |
-
|
| 84 |
-
## Running
|
| 85 |
-
|
| 86 |
-
### Environment
|
| 87 |
-
Create a `.env` file:
|
| 88 |
-
```bash
|
| 89 |
-
HF_TOKEN=hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXX
|
| 90 |
|
|
|
|
|
|
| 11 |
|
| 12 |
# AgenticZork — MCP ReAct Agent for Jericho Text Adventures
|
| 13 |
|
| 14 |
+
This project implements a **ReAct-style agent** for Jericho/Z-machine text adventure games using a lightweight **FastMCP** server. The objective is to **maximize score** while also **visiting many unique locations** under a step budget.
|
| 15 |
|
| 16 |
+
## Core ideas (design choices)
|
| 17 |
|
| 18 |
+
### 1) ReAct with strict tool protocol
|
| 19 |
+
The agent queries an LLM (`Qwen/Qwen2.5-72B-Instruct` via Hugging Face Inference) and forces a rigid output schema:
|
| 20 |
+
`THOUGHT`, `TOOL`, `ARGS`. This reduces parsing errors and stabilizes action selection. Inference is run with **temperature=0** and a step-dependent seed for repeatability.
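
As an illustration of the schema described above, here is a minimal parsing sketch. It is not the project's actual parser: the function name `parse_reply`, the one-field-per-line layout, and the choice to treat a missing `ARGS` line as an empty object are all assumptions made for this example.

```python
import json
import re

# Hypothetical parser for the THOUGHT / TOOL / ARGS schema.
# Assumption: each field sits on its own line and ARGS is single-line JSON.
_FIELD = re.compile(r"^(THOUGHT|TOOL|ARGS):\s*(.*)$")


def parse_reply(text: str) -> dict:
    """Extract thought, tool name, and JSON arguments from an LLM reply."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        # Keep only the first occurrence of each field.
        if match and match.group(1) not in fields:
            fields[match.group(1)] = match.group(2).strip()

    tool = fields.get("TOOL", "")
    if not tool:
        raise ValueError("reply is missing a TOOL line")

    raw_args = fields.get("ARGS", "") or "{}"
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        raise ValueError(f"ARGS is not valid JSON: {raw_args!r}") from exc
    if not isinstance(args, dict):
        raise ValueError("ARGS must be a JSON object")

    return {"thought": fields.get("THOUGHT", ""), "tool": tool, "args": args}
```

Rejecting malformed replies with an explicit error, rather than guessing, lets the caller re-prompt the model instead of executing a half-parsed action.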
|
| 21 |
|
| 22 |
+
### 2) Hybrid policy: heuristics + LLM
|
| 23 |
+
The LLM is guided by game-agnostic scoring heuristics: **take/open/examine first**, then systematic exploration (cardinal + vertical directions). To bias the LLM toward productive interactions rather than wandering, the agent injects lightweight state into the prompt: the current score, recent actions, and warnings when progress stalls.
|
| 24 |
|
| 25 |
+
### 3) Progress/loop control
|
| 26 |
+
To avoid common failure modes (ping-ponging or infinite cycling), the agent maintains:
|
| 27 |
+
- `recent_actions` (short window) for repetition detection
|
| 28 |
+
- `failed_actions` to avoid retrying commands that consistently fail
|
| 29 |
+
- `steps_since_progress` defined as “no new rooms AND no score increase”
|
| 30 |
+
When the agent detects stagnation, it (i) queries the map more often, (ii) forces unexplored directions, or (iii) switches to interaction verbs (examine/open/take).
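
The trackers listed above could be kept in one small object, roughly as sketched below. The class name `ProgressTracker`, the window size, and both thresholds are hypothetical values chosen for illustration; only the field names `recent_actions`, `failed_actions`, `locations_explored`, and `steps_since_progress` come from the README.

```python
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass
class ProgressTracker:
    window: int = 6           # hypothetical size of the repetition window
    fail_threshold: int = 2   # hypothetical: failures before an action is avoided
    stall_threshold: int = 5  # hypothetical: steps without progress before a warning
    recent_actions: deque = field(default_factory=deque)
    failed_actions: Counter = field(default_factory=Counter)
    locations_explored: set = field(default_factory=set)
    best_score: int = 0
    steps_since_progress: int = 0

    def record(self, action, location, score, failed):
        """Update all trackers after one executed game action."""
        self.recent_actions.append(action)
        while len(self.recent_actions) > self.window:
            self.recent_actions.popleft()
        if failed:
            self.failed_actions[action] += 1

        new_room = location is not None and location not in self.locations_explored
        if new_room:
            self.locations_explored.add(location)
        score_up = score > self.best_score
        if score_up:
            self.best_score = score

        # Progress means a new room OR a score increase; otherwise count a stalled step.
        if new_room or score_up:
            self.steps_since_progress = 0
        else:
            self.steps_since_progress += 1

    def is_blocked(self, action):
        return self.failed_actions[action] >= self.fail_threshold

    def is_looping(self):
        # A full window with at most two distinct actions, e.g. n, s, n, s.
        return len(self.recent_actions) == self.window and len(set(self.recent_actions)) <= 2

    def is_stalled(self):
        return self.steps_since_progress >= self.stall_threshold
```

The agent would then consult `is_looping()` and `is_stalled()` before building each prompt and add a warning or force a different action when either returns true.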
|
| 31 |
|
| 32 |
+
### 4) Server-side map and memory tools
|
| 33 |
+
The MCP server exposes four tools:
|
| 34 |
+
- `play_action(action)`: run a game command
|
| 35 |
+
- `inventory()`: list carried items
|
| 36 |
+
- `memory()`: compact state summary (score/moves/recent actions)
|
| 37 |
+
- `get_map()`: explored graph (location → exits → destination)
|
| 38 |
+
The server updates the exploration graph by tracking movement commands and extracting room titles from observations.
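
One way to picture the exploration graph is a nested dictionary, location → direction → destination, as in the sketch below. The class `ExplorationMap`, the helper `normalize_direction`, and the exact text layout produced by `render()` are assumptions for this example; the real server may store and print its map differently.

```python
# Hypothetical in-memory exploration graph: location -> {direction: destination}.
DIRECTIONS = {"n": "north", "s": "south", "e": "east", "w": "west", "u": "up", "d": "down"}


def normalize_direction(command):
    """Return the full direction name for a movement command, or None."""
    word = command.strip().lower()
    if word.startswith("go "):
        word = word[3:].strip()
    if word in DIRECTIONS:
        return DIRECTIONS[word]
    if word in DIRECTIONS.values():
        return word
    return None


class ExplorationMap:
    def __init__(self):
        self.exits = {}

    def record_move(self, origin, command, destination):
        """Add an edge if the command was a movement that changed the room."""
        direction = normalize_direction(command)
        if direction is None or not origin or not destination or origin == destination:
            return False
        self.exits.setdefault(origin, {})[direction] = destination
        self.exits.setdefault(destination, {})  # make sure the new room is listed
        return True

    def render(self):
        """Format the graph as text, with a tool prefix on the first line."""
        lines = ["[TOOL:get_map]", "Explored Locations and Exits:"]
        for location in sorted(self.exits):
            lines.append(f"{location}:")
            for direction, destination in sorted(self.exits[location].items()):
                lines.append(f"  {direction} -> {destination}")
        return "\n".join(lines)
```

Recording an edge only when the room title actually changes keeps failed moves such as blocked exits out of the graph.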
|
| 39 |
|
| 40 |
+
### 5) Critical fix: prevent “location corruption” from tool outputs
|
| 41 |
+
A major bug observed during evaluation was that the agent sometimes treated `get_map()`/`memory()` output headers (e.g., “Explored Locations and Exits:”) as *room names*, corrupting location tracking and causing loops. I fixed this with two complementary measures:
|
| 42 |
+
1) **Agent-side gating**: location is updated **only** after `play_action` (never after `get_map`/`memory`/`inventory`), and primarily on movement/look commands.
|
| 43 |
+
2) **Server-side prefixing**: non-game tool outputs are prefixed with `[TOOL:...]` to make them unambiguous.
|
| 44 |
|
| 45 |
+
These choices are intended to keep the agent stable across Jericho games while remaining simple and reproducible.
| 46 |
|
| 47 |
+
Reference for Space metadata: https://huggingface.co/docs/hub/spaces-config-reference
|