---
title: Agentic Zork
emoji: 🎮
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: true
license: mit
hf_oauth: true
short_description: 'Third assignment: Playing Zork has never been so boring!'
---

# Agentic Zork — Submission

This Space contains my MCP server and agent for the Agentic Zork / Text Adventure LLM Agent project.
Evaluation is performed by the course scripts (deterministic runs with a fixed model and seeded trials).

## Report: ideas and design choices

This submission focuses on building a robust MCP abstraction layer on top of Jericho and designing a deterministic exploration policy that improves (score, explored locations) within a fixed step budget, without manual gameplay and without using walkthroughs.

### MCP server (tools and state tracking)

I implemented a self-contained FastMCP server that wraps Jericho's `FrotzEnv` directly (no dependency on external project folders). The main interaction tool is:

- `play_action(action)`: executes a text command and returns the game observation.

To make the agent reliable and reduce hallucinated/invalid commands, the server also exposes structured telemetry:

- `status()`: returns a compact JSON snapshot of the current state that the agent can parse deterministically. It includes a stable location identifier (`loc_id`), room name, score/moves, inventory, the last observation, per-location action outcomes (count / moved / score delta / failure flag), known directed edges (`edges_here`), and the list of untried directions at the current location.
- `valid_actions()`: returns Jericho's suggested action set (with a safe fallback if the analyzer is unavailable or returns an empty list).
- `inventory()` / `memory()` / `get_map()`: lightweight helpers for inspection and debugging; only `play_action` consumes an in-game move.
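
The fields listed above can be sketched as a plain JSON snapshot. This is a hypothetical illustration of the schema, not the real server's code: `build_status_snapshot` and the `tracker` layout are assumptions chosen to match the field names in the description.

```python
import json

# Hypothetical sketch of the status() snapshot; field names follow the
# description above, but the exact schema is an assumption.
def build_status_snapshot(tracker: dict) -> str:
    snapshot = {
        "loc_id": tracker["loc_id"],          # stable location identifier
        "room": tracker["room"],
        "score": tracker["score"],
        "moves": tracker["moves"],
        "inventory": tracker["inventory"],
        "last_observation": tracker["last_obs"],
        # per-location action outcomes: count / moved / score delta / failed
        "action_outcomes": tracker["outcomes"],
        "edges_here": tracker["edges"],       # known directed exits
        "untried_directions": tracker["untried"],
    }
    return json.dumps(snapshot)

tracker = {
    "loc_id": 180, "room": "West of House", "score": 0, "moves": 3,
    "inventory": ["leaflet"],
    "last_obs": "You are standing in an open field.",
    "outcomes": {"open mailbox": {"count": 1, "moved": False,
                                  "score_delta": 0, "failed": False}},
    "edges": {"north": 181, "south": 182},
    "untried": ["east"],
}
print(build_status_snapshot(tracker))
```

Because the snapshot is plain JSON, the agent can parse it deterministically instead of scraping free-form game text.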

A key improvement is look-ahead simulation:

- `peek_action(action)`: simulates an action without committing it by snapshotting/restoring Jericho state (`get_state` / `set_state`) and restoring server-side trackers. This enables planning and loop avoidance while respecting the "no walkthrough" rule (Jericho ships with walkthroughs, but they are never used).
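
The snapshot/restore pattern can be sketched with a stub standing in for `FrotzEnv`. The `StubEnv` class and its reward logic are invented for illustration; only the `get_state`/`set_state` round-trip mirrors the actual mechanism described above.

```python
import copy

# Stub with a Jericho-like get_state()/set_state() interface; the real
# server would wrap FrotzEnv instead.
class StubEnv:
    def __init__(self):
        self.state = {"loc": "kitchen", "score": 0, "moves": 0}

    def get_state(self):
        return copy.deepcopy(self.state)

    def set_state(self, state):
        self.state = copy.deepcopy(state)

    def step(self, action):
        self.state["moves"] += 1
        if action == "take lamp":          # toy reward for illustration
            self.state["score"] += 5
        return f"You {action}.", self.state["score"]

def peek_action(env, action):
    """Simulate `action` and roll back, so no in-game move is committed."""
    saved = env.get_state()
    obs, score = env.step(action)
    env.set_state(saved)  # restore: the peek leaves no trace
    return obs, score

env = StubEnv()
obs, score = peek_action(env, "take lamp")
print(obs, score)           # the peek observed the score gain
print(env.state["moves"])   # but the committed state is unchanged
```

In the real server, the same rollback must also restore the server-side trackers (visit counts, edges), not just the engine state.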

Finally, to preserve MCP stdio correctness, the server never writes to stdout (which carries the JSON-RPC framing) and sends all debug output to stderr.
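
The stdout discipline amounts to routing every diagnostic through a stderr-only helper. The `debug` function below is a hypothetical sketch of that convention, not code from the server.

```python
import sys

# MCP's stdio transport owns stdout for JSON-RPC frames, so any stray
# print() there corrupts the protocol stream. All diagnostics go to stderr.
def debug(*args):
    print(*args, file=sys.stderr)

debug("loaded game file, entering MCP stdio loop")  # never touches stdout
```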

### Agent (policy and heuristics)

The agent follows an explicit ReAct-style loop: THOUGHT → TOOL (MCP call) → OBSERVATION, repeated for the step budget. It is deterministic and exploration-first. At each step it reads `status()` and applies a fixed priority policy:

  1. Explore untried exits first (with a preference for Jericho-validated exits and observation-boosted directions).
  2. Try a bounded number of safe, game-validated interactions (`suggested_interactions`) per location (e.g., open/read/examine/take), skipping destructive actions.
  3. BFS backtracking: when the current location has no frontier, the agent uses the learned directed graph to backtrack to the nearest location that still has untried exits. A simple oscillation guard avoids A→B→A→B bouncing.
  4. Stuck recovery: if there is no progress (no score and no location change) for many steps, the agent runs a short recovery set (look/inventory/examine noun) before escalating.
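
The priority policy and the BFS backtracking can be sketched as follows. This is a simplified illustration under stated assumptions: the `status` field names mirror the snapshot described earlier, the graph shape `{loc: {direction: dest}}` and the interaction cap are invented, and the real agent's logic is richer.

```python
from collections import deque

MAX_INTERACTIONS_PER_LOC = 3  # illustrative bound, not the real agent's value

def bfs_to_frontier(graph, start, frontier):
    """First move along a shortest path from `start` to any location in
    `frontier`, over the learned directed graph {loc: {dir: dest}}."""
    queue = deque([(start, None)])
    seen = {start}
    while queue:
        loc, first_move = queue.popleft()
        if loc in frontier and first_move is not None:
            return first_move
        for direction, dest in graph.get(loc, {}).items():
            if dest not in seen:
                seen.add(dest)
                queue.append((dest, first_move or direction))
    return None  # no reachable frontier

def choose_action(status, graph, frontier, tried_interactions):
    # 1. Explore untried exits first.
    if status["untried_directions"]:
        return status["untried_directions"][0]
    # 2. A bounded number of safe interactions at this location.
    pending = [a for a in status.get("suggested_interactions", [])
               if a not in tried_interactions]
    if pending and len(tried_interactions) < MAX_INTERACTIONS_PER_LOC:
        return pending[0]
    # 3. BFS backtracking toward the nearest location with untried exits.
    move = bfs_to_frontier(graph, status["loc_id"], frontier)
    # 4. Fall back toward stuck recovery when nothing else applies.
    return move or "look"
```

For example, with `graph = {1: {"north": 2}, 2: {"east": 3, "south": 1}, 3: {}}` and frontier `{3}`, an agent at location 1 with no untried exits would be routed "north" as the first step of the backtrack.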

When `peek_action` is available, the agent evaluates a small candidate set with look-ahead and selects the action that maximizes a utility which heavily rewards score gains and new locations while penalizing repeated no-ops and failure patterns. This improves robustness across games because it relies only on verified actions and exploration structure, not on a game-specific walkthrough. The LLM (the fixed course model, `Qwen/Qwen2.5-72B-Instruct`) is optional and used only as a last-resort fallback, keeping behavior stable under evaluation constraints.
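
A minimal sketch of such a utility is shown below. The weights and the `peek_result` fields are illustrative assumptions, chosen only to demonstrate the stated shape: large reward for score gains, smaller reward for novelty, penalties for no-ops and past failures.

```python
# Illustrative look-ahead utility; the weights are assumptions, not the
# real agent's tuned values.
def utility(peek_result, visited_locs, failure_counts):
    score = 10.0 * peek_result["score_delta"]    # heavily reward score gains
    if peek_result["loc_id"] not in visited_locs:
        score += 5.0                             # reward reaching a new location
    if peek_result["no_op"]:
        score -= 2.0                             # penalize repeated no-ops
    score -= 1.0 * failure_counts.get(peek_result["action"], 0)
    return score

def pick_best(candidates, visited_locs, failure_counts):
    """Select the peeked candidate with the highest utility."""
    return max(candidates, key=lambda r: utility(r, visited_locs, failure_counts))
```

With this shape, a candidate that peeks a +5 score outweighs one that merely reaches an unvisited room, matching the "heavily rewards score gains" priority.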

Trade-off: `peek_action` is the main compute bottleneck, so look-ahead is capped to a small candidate set to balance decision quality against runtime. In practice this already gives strong gains over the baseline by improving exploration efficiency and reducing repeated failures.

## Files

| File | Description |
| --- | --- |
| `agent.py` | Student agent (`StudentAgent`) using `status()` + BFS + optional look-ahead |
| `mcp_server.py` | MCP server exposing `play_action`, `status`, `peek_action`, `valid_actions`, etc. |
| `app.py` | Minimal Gradio page for HF Spaces |
| `requirements.txt` | Dependencies |