gbl1357 committed on
Commit
ebd9480
·
verified ·
1 Parent(s): 82a203a

Update README.md

Files changed (1)
  1. README.md +26 -69
README.md CHANGED
@@ -11,80 +11,37 @@ pinned: false
11
 
12
  # AgenticZork — MCP ReAct Agent for Jericho Text Adventures
13
 
14
- This project implements a **ReAct-style agent** that plays **Jericho/Z-machine** text-adventure games through a small **FastMCP** server. The goal is to **maximize game score** and **explore as many distinct locations as possible** under a fixed step budget.
15
 
16
- ## Components
17
 
18
- - **`agent.py`** — Student agent: ReAct loop + heuristics + lightweight state tracking; LLM inference via Hugging Face (`Qwen/Qwen2.5-72B-Instruct`).
19
- - **`mcp_server.py`** MCP server exposing game interaction tools and maintaining server-side exploration logs.
 
20
 
21
- ## Tool interface (MCP)
 
22
 
23
- The server exposes four tools used by the agent:
24
- - `play_action(action: str)`: execute an in-game command and return the resulting observation (+ score/move metadata).
25
- - `inventory()`: return current inventory.
26
- - `get_map()`: return an explored-graph view (locations + discovered exits).
27
- - `memory()`: return a compact state summary (location, score, moves, recent actions, current observation).
 
28
 
29
- ## Agent policy (high level)
30
 
31
- At each step, the agent:
32
- 1. Builds a prompt from: **current observation**, **score**, **recent actions**, and **anti-loop warnings**.
33
- 2. Queries the LLM with a strict output schema:
34
- - `THOUGHT: ...`
35
- - `TOOL: ...`
36
- - `ARGS: {...}`
37
- 3. Executes the selected MCP tool and updates internal trackers:
38
- - `locations_explored` (unique room titles)
39
- - `failed_actions` (actions that repeatedly fail)
40
- - `recent_actions` (loop detection)
41
- - `steps_since_progress` (no new rooms and no score increase)
42
 
43
- Heuristics favor: **take/open/examine** on salient nouns, then **systematic exploration** (compass directions), while discouraging immediate backtracking.
44
-
45
- ## Critical bug fixed: location tracking corrupted by tool outputs
46
-
47
- ### Failure mode
48
- The agent originally extracted the “current room” from the **first plausible header line** of the latest text. When the agent called `get_map()` / `memory()` / `inventory()`, those tool responses begin with headers like:
49
- - `Explored Locations and Exits:`
50
- - `Current State:`
51
- - `Inventory:`
52
-
53
- These headers were incorrectly interpreted as room titles, causing the agent to set its location to e.g. `Explored Locations and Exits:` and subsequently break:
54
- - visited-location accounting,
55
- - progress detection,
56
- - exploration decisions (leading to repeated `get_map()` calls).
57
-
58
- ### Fix (agent-side, primary)
59
- Location updates are now **gated**:
60
- - The agent updates location **only after `play_action`** (never after `get_map` / `memory` / `inventory`).
61
- - Location updates are preferred on **movement / look-like actions** (`n/s/e/w/u/d`, `go <dir>`, `look`), and conservative otherwise.
62
-
63
- Implemented via:
64
- - `_should_update_location_from_action()`
65
- - `_maybe_update_location()`
66
-
67
- ### Hardening (server-side)
68
- All non-game tools are prefixed to make them unambiguous:
69
- - `[TOOL:get_map]`, `[TOOL:memory]`, `[TOOL:inventory]`
70
-
71
- This prevents any future parser from confusing tool output with a room title.
72
-
73
- ### Hardening (parser-side)
74
- `_extract_location()` additionally ignores:
75
- - tool prefixes/headers,
76
- - score/move metadata,
77
- - common failure messages (e.g., “I don’t understand”, “You can’t go that way”).
78
-
79
- ## Reproducibility
80
-
81
- - LLM inference uses `temperature=0.0` and a deterministic `seed` schedule (`seed + step`).
82
- - Server logs can be saved per session (actions, location transitions, score deltas).
83
-
84
- ## Running
85
-
86
- ### Environment
87
- Create a `.env` file:
88
- ```bash
89
- HF_TOKEN=hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXX
90
 
 
 
11
 
12
  # AgenticZork — MCP ReAct Agent for Jericho Text Adventures
13
 
14
+ This project implements a **ReAct-style agent** for Jericho/Z-machine text adventure games using a lightweight **FastMCP** server. The objective is to **maximize score** while also **visiting many unique locations** under a step budget.
15
 
16
+ ## Core ideas (design choices)
17
 
18
+ ### 1) ReAct with strict tool protocol
19
+ The agent queries an LLM (`Qwen/Qwen2.5-72B-Instruct` via Hugging Face Inference) and forces a rigid output schema:
20
+ `THOUGHT`, `TOOL`, `ARGS`. This reduces parsing errors and stabilizes action selection. Inference is run with **temperature=0** and a step-dependent seed for repeatability.
21
 
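A minimal sketch of how a reply in the `THOUGHT` / `TOOL` / `ARGS` schema could be parsed. The function name `parse_reply`, the `VALID_TOOLS` set, and the fallback to a `look` command on malformed output are assumptions made for illustration; the README does not show the actual parser in `agent.py`.

```python
import json
import re

# Hypothetical parser for the THOUGHT/TOOL/ARGS schema (not the project's code).
_FIELD = re.compile(r"^(THOUGHT|TOOL|ARGS):\s*(.*)$")

# The four tool names listed in the README.
VALID_TOOLS = {"play_action", "inventory", "memory", "get_map"}


def parse_reply(text):
    """Return (thought, tool, args) parsed from an LLM reply.

    Assumption: a malformed reply falls back to a harmless `look` action.
    """
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        # Keep only the first occurrence of each field.
        if match and match.group(1) not in fields:
            fields[match.group(1)] = match.group(2).strip()

    thought = fields.get("THOUGHT", "")
    tool = fields.get("TOOL", "")
    try:
        args = json.loads(fields.get("ARGS") or "{}")
    except json.JSONDecodeError:
        args = None

    if tool not in VALID_TOOLS or not isinstance(args, dict):
        return thought, "play_action", {"action": "look"}
    return thought, tool, args
```

Rejecting unknown tool names and non-object `ARGS` up front keeps every downstream tool call well-formed, which is one way a rigid schema can reduce parsing errors.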
22
+ ### 2) Hybrid policy: heuristics + LLM
23
+ The LLM is guided by game-agnostic scoring heuristics: **take/open/examine first**, then systematic exploration (cardinal + vertical directions). The agent also injects lightweight state into the prompt (current score, recent actions, and warnings when progress stalls) to bias the LLM toward productive interactions rather than wandering.
24
 
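The state injection described above might look roughly like the following. This is only a sketch: `build_prompt`, the `stall_after` threshold, and the exact prompt wording are hypothetical and not taken from the project.

```python
def build_prompt(observation, score, recent_actions, steps_since_progress, stall_after=5):
    """Assemble a prompt from the observation plus lightweight agent state.

    `stall_after` is an assumed threshold; the README does not state one.
    """
    lines = [
        f"Score: {score}",
        "Recent actions: " + (", ".join(recent_actions) or "none"),
    ]
    # Add a warning only once progress has stalled for a while.
    if steps_since_progress >= stall_after:
        lines.append(
            f"WARNING: no new room or score gain for {steps_since_progress} steps; "
            "try examine/open/take or an unexplored direction."
        )
    lines += [
        "Observation:",
        observation.strip(),
        "Reply with exactly three lines: THOUGHT, TOOL, ARGS.",
    ]
    return "\n".join(lines)
```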
25
+ ### 3) Progress/loop control
26
+ To avoid common failure modes (ping-ponging or infinite cycling), the agent maintains:
27
+ - `recent_actions` (short window) for repetition detection
28
+ - `failed_actions` to avoid retrying commands that consistently fail
29
+ - `steps_since_progress`, counting consecutive steps with no new room and no score increase
30
+ When the agent detects stagnation, it (i) queries the map more often, (ii) forces unexplored directions, or (iii) switches to interaction verbs (examine/open/take).
31
 
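One possible shape for these trackers is sketched below. The class name `ProgressTracker`, the window size, the failure limit, and the ping-pong test are illustrative assumptions; only the three tracker names and the definition of progress come from the README.

```python
from collections import Counter, deque


class ProgressTracker:
    """Hypothetical container for the loop-control state described above."""

    def __init__(self, window=6, fail_limit=2):
        self.recent_actions = deque(maxlen=window)  # short window of actions
        self.failed_actions = Counter()             # action -> failure count
        self.fail_limit = fail_limit                # assumed retry limit
        self.steps_since_progress = 0
        self.locations_explored = set()
        self.last_score = 0

    def record(self, action, location, score, failed=False):
        self.recent_actions.append(action)
        if failed:
            self.failed_actions[action] += 1
        new_room = location is not None and location not in self.locations_explored
        if new_room:
            self.locations_explored.add(location)
        # Progress means a new room OR a score increase; otherwise count a stalled step.
        if new_room or score > self.last_score:
            self.steps_since_progress = 0
        else:
            self.steps_since_progress += 1
        self.last_score = score

    def is_blocked(self, action):
        """True once an action has failed `fail_limit` times."""
        return self.failed_actions[action] >= self.fail_limit

    def is_ping_ponging(self):
        """True when the last four actions alternate A, B, A, B."""
        last = list(self.recent_actions)[-4:]
        return (
            len(last) == 4
            and last[0] == last[2]
            and last[1] == last[3]
            and last[0] != last[1]
        )
```

A caller could check `steps_since_progress`, `is_blocked`, and `is_ping_ponging` before each LLM query to decide whether to add a warning or force a different action.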
32
+ ### 4) Server-side map and memory tools
33
+ The MCP server exposes four tools:
34
+ - `play_action(action)`: run a game command
35
+ - `inventory()`: list carried items
36
+ - `memory()`: compact state summary (score/moves/recent actions)
37
+ - `get_map()`: explored graph (location → exits → destination)
38
+ The server updates the exploration graph by tracking movement commands and extracting room titles from observations.
39
 
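As a rough illustration of the exploration graph behind `get_map()`, the sketch below stores location → direction → destination edges. `ExplorationMap`, `record_move`, and the rendered text layout are assumptions; the `[TOOL:get_map]` prefix and the `Explored Locations and Exits:` header are the ones mentioned in this README, but the real server code may differ.

```python
class ExplorationMap:
    """Hypothetical explored-graph store: location -> {direction: destination}."""

    SHORT = {"n": "north", "s": "south", "e": "east", "w": "west", "u": "up", "d": "down"}
    FULL = set(SHORT.values())

    def __init__(self):
        self.exits = {}

    def _direction(self, action):
        """Normalize a command like 'go n' or 'North' to a direction, else None."""
        word = action.strip().lower()
        if word.startswith("go "):
            word = word[3:].strip()
        word = self.SHORT.get(word, word)
        return word if word in self.FULL else None

    def record_move(self, origin, action, destination):
        """Store an edge if `action` was a movement that changed the room."""
        direction = self._direction(action)
        if direction is None or not origin or not destination or origin == destination:
            return False
        self.exits.setdefault(origin, {})[direction] = destination
        self.exits.setdefault(destination, {})  # make the new room visible too
        return True

    def render(self):
        """Return a text view, prefixed so it cannot be mistaken for a room title."""
        lines = ["[TOOL:get_map]", "Explored Locations and Exits:"]
        for location in sorted(self.exits):
            lines.append(f"- {location}")
            for direction, dest in sorted(self.exits[location].items()):
                lines.append(f"    {direction} -> {dest}")
        return "\n".join(lines)
```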
40
+ ### 5) Critical fix: prevent “location corruption” from tool outputs
41
+ A major bug observed during evaluation was that the agent sometimes treated `get_map()`/`memory()` output headers (e.g., “Explored Locations and Exits:”) as *room names*, corrupting location tracking and causing loops. I fixed this with two complementary measures:
42
+ 1) **Agent-side gating**: location is updated **only** after `play_action` (never after `get_map`/`memory`/`inventory`), and primarily on movement/look commands.
43
+ 2) **Server-side prefixing**: non-game tool outputs are prefixed with `[TOOL:...]` to make them unambiguous.
44
 
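The two measures can be sketched together as follows. The function names, the movement regex, the 40-character title limit, and the title heuristic are hypothetical simplifications, not the project's implementation; in particular this sketch updates location only on movement or look commands, whereas the README says "primarily".

```python
import re

GAME_TOOL = "play_action"

# Movement or look commands: n/s/e/w/u/d, full names, optional "go", and look.
_MOVE_OR_LOOK = re.compile(
    r"^(?:go\s+)?(?:n|s|e|w|u|d|north|south|east|west|up|down)$|^(?:l|look)$"
)
_FAILURES = ("i don't understand", "you can't go that way")


def should_update_location(tool, action):
    """Gate: only game actions that move or look may change the location."""
    return tool == GAME_TOOL and bool(_MOVE_OR_LOOK.match(action.strip().lower()))


def extract_location(observation):
    """Return the first non-empty line if it looks like a room title, else None."""
    for raw in observation.splitlines():
        line = raw.strip()
        if not line:
            continue
        lowered = line.lower().replace("\u2019", "'")
        looks_like_title = (
            len(line) <= 40                      # assumed maximum title length
            and not line.startswith("[TOOL:")    # server-side tool prefix
            and line[-1] not in ".:!?"           # headers and sentences
            and not any(msg in lowered for msg in _FAILURES)
        )
        return line if looks_like_title else None
    return None


def maybe_update_location(current, tool, action, observation):
    """Keep `current` unless the gate passes and a plausible title is found."""
    if not should_update_location(tool, action):
        return current
    return extract_location(observation) or current
```

With the gate in place, a `get_map()` response never reaches the title extractor, and the `[TOOL:...]` prefix check acts as a second line of defense if it ever does.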
45
+ These choices are meant to keep the agent stable across many Jericho games while remaining simple and reproducible.
46
 
47
+ Reference for Space metadata: https://huggingface.co/docs/hub/spaces-config-reference