---
title: Agentic Zork
emoji: 🎮
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: true
license: mit
hf_oauth: true
short_description: "Third assignment: Playing Zork has never been so boring!"
---

# Agentic Zork — Submission

This Space contains my MCP server and agent for the **Agentic Zork / Text Adventure LLM Agent** project. Evaluation is performed by the course scripts (deterministic runs with a fixed model and seeded trials).

## Report: ideas and design choices

This submission focuses on building a robust **MCP abstraction layer** on top of Jericho and designing a **deterministic exploration policy** that improves both score and the number of explored locations within a fixed step budget, without manual gameplay and without using walkthroughs.

### MCP server (tools and state tracking)

I implemented a **self-contained FastMCP server** that wraps Jericho's `FrotzEnv` directly (no dependency on external project folders). The main interaction tool is:

- **`play_action(action)`**: executes a text command and returns the game observation.

To make the agent reliable and to reduce hallucinated or invalid commands, the server also exposes structured telemetry:

- **`status()`**: returns a compact **JSON snapshot** of the current state that the agent can parse deterministically. It includes a **stable location identifier** (`loc_id`), the room name, score/moves, inventory, the last observation, per-location action outcomes (count / moved / score delta / failure flag), known directed edges (`edges_here`), and the list of **untried directions** at the current location.
- **`valid_actions()`**: returns Jericho's suggested action set (with a safe fallback if the analyzer is unavailable or returns an empty list).
- **`inventory()` / `memory()` / `get_map()`**: lightweight helpers for inspection and debugging; only `play_action` consumes an in-game move.
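The per-location bookkeeping behind `status()` can be sketched roughly as follows. This is an illustrative approximation, not the actual `mcp_server.py` code; the names (`LocationStats`, `record_outcome`, `status_snapshot`) and the fixed direction list are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical direction vocabulary; the real server may track more.
DIRECTIONS = ["north", "south", "east", "west", "up", "down"]

@dataclass
class LocationStats:
    name: str
    # action -> (count, moved, score_delta, failed)
    outcomes: dict = field(default_factory=dict)
    edges_here: dict = field(default_factory=dict)   # direction -> dest loc_id
    tried_dirs: set = field(default_factory=set)

    def record_outcome(self, action, moved, score_delta, failed):
        count, *_ = self.outcomes.get(action, (0, False, 0, False))
        self.outcomes[action] = (count + 1, moved, score_delta, failed)
        if action in DIRECTIONS:
            self.tried_dirs.add(action)

    def untried_directions(self):
        return [d for d in DIRECTIONS if d not in self.tried_dirs]

def status_snapshot(loc_id, stats, score, moves, inventory, last_obs):
    """Build the compact JSON-serializable snapshot the agent parses."""
    return {
        "loc_id": loc_id,
        "room": stats.name,
        "score": score,
        "moves": moves,
        "inventory": inventory,
        "last_observation": last_obs,
        "outcomes": stats.outcomes,
        "edges_here": stats.edges_here,
        "untried_directions": stats.untried_directions(),
    }
```

The key design point is that everything in the snapshot is plain JSON, so the agent can branch on it deterministically instead of re-parsing free-form game text.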
A key improvement is **look-ahead simulation**:

- **`peek_action(action)`**: simulates an action without committing it, by snapshotting and restoring Jericho state (`get_state` / `set_state`) and restoring the server-side trackers. This enables planning and loop avoidance while respecting the "no walkthrough" rule (Jericho ships with walkthroughs, but they are never used).

Finally, to preserve MCP stdio correctness, the server never writes to stdout (which carries the JSON-RPC framing) and sends any debug output to stderr only.

### Agent (policy and heuristics)

The agent follows an explicit ReAct-style loop, **THOUGHT → TOOL (MCP call) → OBSERVATION**, repeated for the step budget. It is deterministic and **exploration-first**. At each step it reads `status()` and applies a fixed priority policy:

1. **Explore untried exits** first (preferring Jericho-validated exits and observation-boosted directions).
2. **Try a bounded number of safe, game-validated interactions** (`suggested_interactions`) per location (e.g., open/read/examine/take), skipping destructive actions.
3. **BFS backtracking**: when the current location has no frontier, the agent uses the learned directed graph to backtrack to the nearest location that still has untried exits. A simple oscillation guard avoids A→B→A→B bouncing.
4. **Stuck recovery**: if there is no progress (no score change and no location change) for many steps, the agent runs a short recovery set (look / inventory / examine noun) before escalating.

When `peek_action` is available, the agent evaluates a small candidate set with look-ahead and selects the action maximizing a utility that heavily rewards **score gains** and **new locations**, while penalizing repeated no-ops and failure patterns. This improves robustness across games because it relies only on verified actions and exploration structure, never on a game-specific walkthrough.
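The snapshot/restore pattern behind `peek_action` can be sketched as below. `FakeEnv` is a stand-in for Jericho's `FrotzEnv`; only the `get_state` / `set_state` / `step` surface is assumed, mirroring the real API shape, and the tracker handling is illustrative.

```python
import copy

class FakeEnv:
    """Toy environment standing in for jericho.FrotzEnv."""
    def __init__(self):
        self.pos = 0
        self.score = 0

    def get_state(self):
        return (self.pos, self.score)

    def set_state(self, state):
        self.pos, self.score = state

    def step(self, action):
        if action == "east":
            self.pos += 1
            self.score += 5          # pretend moving east scores points
        obs = f"pos={self.pos}"
        return obs, self.score, False, {}

def peek_action(env, trackers, action):
    """Simulate `action`, then roll back both the env and our trackers."""
    env_state = env.get_state()
    saved_trackers = copy.deepcopy(trackers)
    obs, score, done, _ = env.step(action)
    env.set_state(env_state)            # undo the game move
    trackers.clear()
    trackers.update(saved_trackers)     # undo server-side bookkeeping
    return obs, score, done
```

Because both the engine state and the server-side trackers are restored, a peek is externally invisible: the committed move counter, map, and outcome statistics are unchanged afterwards.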
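The BFS backtracking step described above amounts to a shortest-path search over the learned directed graph. A minimal sketch, with hypothetical names (`bfs_to_frontier`, `graph` as `loc_id -> {direction: dest}`):

```python
from collections import deque

def bfs_to_frontier(graph, start, frontier):
    """Return the shortest list of directions from `start` to the nearest
    location in `frontier` (locations with untried exits), or None."""
    if start in frontier:
        return []
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        loc, path = queue.popleft()
        for direction, dest in graph.get(loc, {}).items():
            if dest in seen:
                continue
            new_path = path + [direction]
            if dest in frontier:
                return new_path
            seen.add(dest)
            queue.append((dest, new_path))
    return None                          # no frontier reachable
```

For example, with `graph = {1: {"north": 2}, 2: {"east": 3, "south": 1}, 3: {}}` and frontier `{3}`, starting from location 1 the search yields `["north", "east"]`.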
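The look-ahead utility can be illustrated as a simple weighted score over peeked outcomes. The weights here are assumptions for illustration, not the actual values in `agent.py`:

```python
def action_utility(score_delta, new_location, repeat_count, failed):
    """Rank a peeked candidate; weights are illustrative assumptions."""
    utility = 10.0 * score_delta         # heavily reward score gains
    if new_location:
        utility += 5.0                   # reward discovering a new loc_id
    utility -= 1.0 * repeat_count        # penalize repeated no-ops
    if failed:
        utility -= 3.0                   # penalize known failure patterns
    return utility

def pick_best(candidates):
    """candidates: list of (action, score_delta, new_loc, repeats, failed)."""
    return max(candidates, key=lambda c: action_utility(*c[1:]))[0]
```

With this shape, a candidate that gains score dominates one that merely reaches a new location, which in turn dominates a repeatedly tried no-op.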
The LLM uses the **fixed course model** (`Qwen/Qwen2.5-72B-Instruct`) and is optional: it serves only as a last-resort fallback, to keep behavior stable under the evaluation constraints.

**Trade-off:** `peek_action` is the main compute bottleneck; I cap the look-ahead to a small candidate set to balance decision quality and speed. In practice this already gives strong gains over the baseline by improving exploration efficiency and reducing repeated failures.

## Files

| File | Description |
|------|-------------|
| `agent.py` | Student agent (`StudentAgent`) using `status()` + BFS + optional look-ahead |
| `mcp_server.py` | MCP server exposing `play_action`, `status`, `peek_action`, `valid_actions`, etc. |
| `app.py` | Minimal Gradio page for HF Spaces |
| `requirements.txt` | Dependencies |