---
title: Agentic Zork
emoji: 🎮
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: true
license: mit
hf_oauth: true
short_description: "Third assignment: Playing Zork has never been so boring!"
---

# Agentic Zork — Submission

This Space contains my MCP server and agent for the **Agentic Zork / Text Adventure LLM Agent** project.
Evaluation is performed by the course scripts (deterministic runs with a fixed model and seeded trials).

## Report: ideas and design choices

This submission focuses on building a robust **MCP abstraction layer** on top of Jericho and designing a **deterministic exploration policy** that improves both score and the number of explored locations within a fixed step budget, without manual gameplay and without using walkthroughs.

### MCP server (tools and state tracking)

I implemented a **self-contained FastMCP server** that wraps Jericho’s `FrotzEnv` directly (no dependency on external project folders). The main interaction tool is:

- **`play_action(action)`**: executes a text command and returns the game observation.

To make the agent reliable and reduce hallucinated/invalid commands, the server also exposes structured telemetry:

- **`status()`**: returns a compact **JSON snapshot** of the current state that the agent can parse deterministically. It includes a **stable location identifier** (`loc_id`), room name, score/moves, inventory, the last observation, per-location action outcomes (count / moved / score delta / failure flag), known directed edges (`edges_here`), and the list of **untried directions** at the current location.
- **`valid_actions()`**: returns Jericho’s suggested action set (with a safe fallback if the analyzer is unavailable or returns an empty list).
- **`inventory()` / `memory()` / `get_map()`**: lightweight helpers for inspection and debugging; only `play_action` consumes an in-game move.
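The `status()` snapshot can be sketched as a pure function over the server-side trackers. The field names mirror the description above, but the tracker shapes (`action_stats`, `edges`, `tried_dirs`) and the fixed direction list are assumptions for illustration:

```python
# Sketch: assemble the status() JSON snapshot from server-side trackers.
# Tracker shapes are assumptions; field names mirror those listed above.
import json

def build_status(loc_id, room_name, score, moves, inventory,
                 last_obs, action_stats, edges, tried_dirs):
    all_dirs = ["north", "south", "east", "west", "up", "down"]
    snapshot = {
        "loc_id": loc_id,                 # stable location identifier
        "room": room_name,
        "score": score,
        "moves": moves,
        "inventory": inventory,
        "last_observation": last_obs,
        "actions_here": action_stats,     # per-action: count/moved/score_delta/failed
        "edges_here": edges,              # known directed edges from this location
        "untried_directions": [d for d in all_dirs if d not in tried_dirs],
    }
    return json.dumps(snapshot)

status_json = build_status(
    loc_id=12, room_name="West of House", score=0, moves=3,
    inventory=["leaflet"], last_obs="You are standing in an open field.",
    action_stats={"open mailbox": {"count": 1, "moved": False,
                                   "score_delta": 0, "failed": False}},
    edges=[{"from": 12, "dir": "north", "to": 13}],
    tried_dirs={"north"},
)
```

Returning JSON rather than free text is what lets the agent parse the state deterministically instead of re-interpreting prose observations.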

A key improvement is **look-ahead simulation**:
- **`peek_action(action)`**: simulates an action without committing it by snapshotting/restoring Jericho state (`get_state` / `set_state`) and restoring server-side trackers. This enables planning and loop avoidance while respecting the “no walkthrough” rule (Jericho includes walkthroughs, but they are never used).
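The snapshot/restore pattern can be sketched as follows. `FakeEnv` mimics the `get_state()` / `set_state()` contract of Jericho's `FrotzEnv` so the example is self-contained; the `trackers` dict is a hypothetical stand-in for the server-side bookkeeping that must also be rolled back:

```python
# Sketch of peek_action: simulate a command, then roll everything back.
import copy

class FakeEnv:
    """Stand-in for jericho.FrotzEnv's get_state()/set_state() contract."""
    def __init__(self):
        self.world = {"moves": 0, "score": 0}

    def get_state(self):
        return copy.deepcopy(self.world)

    def set_state(self, state):
        self.world = copy.deepcopy(state)

    def step(self, action):
        self.world["moves"] += 1
        if action == "take egg":
            self.world["score"] += 5
        return f"You {action}.", self.world["score"], False, {}

env = FakeEnv()
trackers = {"visited": {1}, "failures": 0}  # hypothetical server-side state

def peek_action(action):
    """Simulate `action` without committing it; return (obs, score_after)."""
    env_snapshot = env.get_state()
    trk_snapshot = copy.deepcopy(trackers)
    obs, score, done, info = env.step(action)
    env.set_state(env_snapshot)        # roll back the game state
    trackers.clear()
    trackers.update(trk_snapshot)      # roll back server-side trackers
    return obs, score

obs, score = peek_action("take egg")
```

The key detail is that *both* the emulator state and the server-side trackers are restored; restoring only one would desynchronize the map and action statistics from the game.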

Finally, to preserve MCP stdio correctness, the server avoids writing to stdout (which carries the JSON-RPC framing) and sends any debug output to stderr only.
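A minimal illustration of this stdout discipline (the logger name is arbitrary):

```python
# stdout carries MCP JSON-RPC frames, so all diagnostics go to stderr.
import logging
import sys

handler = logging.StreamHandler(sys.stderr)   # never sys.stdout
logging.basicConfig(level=logging.INFO, handlers=[handler])
log = logging.getLogger("mcp_server")

log.info("debug output goes to stderr, keeping stdout clean for JSON-RPC")
```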

### Agent (policy and heuristics)

The agent follows an explicit ReAct-style loop: **THOUGHT → TOOL (MCP call) → OBSERVATION**, repeated for the step budget.
It is deterministic and **exploration-first**. At each step it reads `status()` and applies a fixed priority policy:

1. **Explore untried exits** first (with a preference for Jericho-validated exits and observation-boosted directions).
2. **Try a bounded number of safe, game-validated interactions** (`suggested_interactions`) per location (e.g., open/read/examine/take), skipping destructive actions.
3. **BFS backtracking**: when the current location has no frontier, the agent uses the learned directed graph to backtrack to the nearest location that still has untried exits. A simple oscillation guard avoids A→B→A→B bouncing.
4. **Stuck recovery**: if there is no progress (no score gain and no location change) for many steps, the agent runs a short recovery set (look/inventory/examine noun) before escalating.
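The four-tier policy above can be sketched as a single pure decision function over a `status()` snapshot. Field names follow the earlier description; the per-location interaction cap, the stuck threshold, and the `frontier_path` argument (standing in for the BFS backtracker's output) are illustrative assumptions:

```python
# Sketch of the fixed priority policy (thresholds are illustrative).

def choose_action(status, safe_interactions, frontier_path, steps_without_progress):
    # 1. Explore untried exits first.
    if status["untried_directions"]:
        return status["untried_directions"][0]
    # 2. Bounded number of safe, game-validated interactions per location.
    tried = set(status["actions_here"])
    for act in safe_interactions[:3]:          # cap interactions per location
        if act not in tried:
            return act
    # 3. BFS backtracking toward the nearest location with untried exits.
    if frontier_path:
        return frontier_path[0]                # first move along the path
    # 4. Stuck recovery after prolonged lack of progress.
    if steps_without_progress > 10:
        return "look"
    return "wait"

status = {"untried_directions": [], "actions_here": {"examine mailbox": {}}}
action = choose_action(status, ["examine mailbox", "open mailbox"], ["south"], 0)
```

Because every tier is a deterministic lookup over the snapshot, the same seed and game always produce the same trajectory.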

When `peek_action` is available, the agent evaluates a small candidate set with look-ahead and selects the action that maximizes a utility heavily rewarding **score gains** and **new locations**, while penalizing repeated no-ops and known failure patterns. This improves robustness across games because the agent relies only on verified actions and exploration structure, not on a game-specific walkthrough. The LLM is the **fixed course model** (`Qwen/Qwen2.5-72B-Instruct`) and is optional: it is used only as a last-resort fallback to keep behavior stable under the evaluation constraints.
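The look-ahead selection can be sketched as a utility over peeked outcomes. The exact weights below are illustrative assumptions, chosen only to show the ordering (score gains dominate, new locations next, no-ops and failure patterns penalized):

```python
# Sketch of the look-ahead utility; weights are illustrative. The real
# agent caps the candidate set to keep peek_action calls cheap.

def utility(score_delta, new_location, was_noop, matched_failure_pattern):
    u = 100.0 * score_delta                    # heavily reward score gains
    u += 25.0 if new_location else 0.0         # reward discovering new rooms
    u -= 10.0 if was_noop else 0.0             # penalize repeated no-ops
    u -= 15.0 if matched_failure_pattern else 0.0
    return u

# Hypothetical peeked outcomes for three candidates:
candidates = {
    "north":     utility(0, True,  False, False),
    "open case": utility(1, False, False, False),
    "hit troll": utility(0, False, True,  True),
}
best = max(candidates, key=candidates.get)
```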

**Trade-off:** `peek_action` is the main compute bottleneck, so I cap look-ahead to a small candidate set to balance look-ahead quality against speed. In practice this already gives strong gains over the baseline by improving exploration efficiency and reducing repeated failures.

## Files

| File | Description |
|------|-------------|
| `agent.py` | Student agent (`StudentAgent`) using `status()` + BFS + optional look-ahead |
| `mcp_server.py` | MCP server exposing `play_action`, `status`, `peek_action`, `valid_actions`, etc. |
| `app.py` | Minimal Gradio page for HF Spaces |
| `requirements.txt` | Dependencies |