---
title: Agentic Zork
emoji: 🎮
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: true
license: mit
hf_oauth: true
short_description: "Third assignment: Playing Zork has never been so boring!"
---

# Agentic Zork — Submission

This Space contains my MCP server and agent for the **Agentic Zork / Text Adventure LLM Agent** project.
Evaluation is performed by the course scripts (deterministic runs with a fixed model and seeded trials).

## Report: ideas and design choices

This submission focuses on building a robust **MCP abstraction layer** on top of Jericho and designing a **deterministic exploration policy** that improves both score and the number of explored locations within a fixed step budget, without manual gameplay and without using walkthroughs.

### MCP server (tools and state tracking)

I implemented a **self-contained FastMCP server** that wraps Jericho’s `FrotzEnv` directly (no dependency on external project folders). The main interaction tool is:

- **`play_action(action)`**: executes a text command and returns the game observation.

To make the agent reliable and reduce hallucinated/invalid commands, the server also exposes structured telemetry:

- **`status()`**: returns a compact **JSON snapshot** of the current state that the agent can parse deterministically. It includes a **stable location identifier** (`loc_id`), room name, score/moves, inventory, the last observation, per-location action outcomes (count / moved / score delta / failure flag), known directed edges (`edges_here`), and the list of **untried directions** at the current location.
- **`valid_actions()`**: returns Jericho’s suggested action set (with a safe fallback if the analyzer is unavailable or returns an empty list).
- **`inventory()` / `memory()` / `get_map()`**: lightweight helpers for inspection and debugging; only `play_action` consumes an in-game move.
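The `status()` snapshot can be sketched as a pure function over the server-side trackers. The field names mirror the description above, but the tracker shapes (`action_stats`, `edges`, `tried_dirs`) and the fixed direction list are assumptions for illustration:

```python
# Sketch: assemble the status() JSON snapshot from server-side trackers.
# Tracker shapes are assumptions; field names mirror those listed above.
import json

def build_status(loc_id, room_name, score, moves, inventory,
                 last_obs, action_stats, edges, tried_dirs):
    all_dirs = ["north", "south", "east", "west", "up", "down"]
    snapshot = {
        "loc_id": loc_id,                 # stable location identifier
        "room": room_name,
        "score": score,
        "moves": moves,
        "inventory": inventory,
        "last_observation": last_obs,
        "actions_here": action_stats,     # per-action: count/moved/score_delta/failed
        "edges_here": edges,              # known directed edges from this location
        "untried_directions": [d for d in all_dirs if d not in tried_dirs],
    }
    return json.dumps(snapshot)

status_json = build_status(
    loc_id=12, room_name="West of House", score=0, moves=3,
    inventory=["leaflet"], last_obs="You are standing in an open field.",
    action_stats={"open mailbox": {"count": 1, "moved": False,
                                   "score_delta": 0, "failed": False}},
    edges=[{"from": 12, "dir": "north", "to": 13}],
    tried_dirs={"north"},
)
```

Returning JSON rather than free text is what lets the agent parse the state deterministically instead of re-interpreting prose observations.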

A key improvement is **look-ahead simulation**:
- **`peek_action(action)`**: simulates an action without committing it by snapshotting/restoring Jericho state (`get_state` / `set_state`) and restoring server-side trackers. This enables planning and loop avoidance while respecting the “no walkthrough” rule (Jericho includes walkthroughs, but they are never used).
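The snapshot/restore pattern can be sketched as follows. `FakeEnv` mimics the `get_state()` / `set_state()` contract of Jericho's `FrotzEnv` so the example is self-contained; the `trackers` dict is a hypothetical stand-in for the server-side bookkeeping that must also be rolled back:

```python
# Sketch of peek_action: simulate a command, then roll everything back.
import copy

class FakeEnv:
    """Stand-in for jericho.FrotzEnv's get_state()/set_state() contract."""
    def __init__(self):
        self.world = {"moves": 0, "score": 0}

    def get_state(self):
        return copy.deepcopy(self.world)

    def set_state(self, state):
        self.world = copy.deepcopy(state)

    def step(self, action):
        self.world["moves"] += 1
        if action == "take egg":
            self.world["score"] += 5
        return f"You {action}.", self.world["score"], False, {}

env = FakeEnv()
trackers = {"visited": {1}, "failures": 0}  # hypothetical server-side state

def peek_action(action):
    """Simulate `action` without committing it; return (obs, score_after)."""
    env_snapshot = env.get_state()
    trk_snapshot = copy.deepcopy(trackers)
    obs, score, done, info = env.step(action)
    env.set_state(env_snapshot)        # roll back the game state
    trackers.clear()
    trackers.update(trk_snapshot)      # roll back server-side trackers
    return obs, score

obs, score = peek_action("take egg")
```

The key detail is that *both* the emulator state and the server-side trackers are restored; restoring only one would desynchronize the map and action statistics from the game.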

Finally, to preserve MCP stdio correctness, the server avoids writing to stdout (which carries the JSON-RPC framing) and sends any debug output to stderr only.
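A minimal illustration of this stdout discipline (the logger name is arbitrary):

```python
# stdout carries MCP JSON-RPC frames, so all diagnostics go to stderr.
import logging
import sys

handler = logging.StreamHandler(sys.stderr)   # never sys.stdout
logging.basicConfig(level=logging.INFO, handlers=[handler])
log = logging.getLogger("mcp_server")

log.info("debug output goes to stderr, keeping stdout clean for JSON-RPC")
```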

### Agent (policy and heuristics)

The agent follows an explicit ReAct-style loop: **THOUGHT → TOOL (MCP call) → OBSERVATION**, repeated for the step budget.
It is deterministic and **exploration-first**. At each step it reads `status()` and applies a fixed priority policy:

1. **Explore untried exits** first (with a preference for Jericho-validated exits and observation-boosted directions).
2. **Try a bounded number of safe, game-validated interactions** (`suggested_interactions`) per location (e.g., open/read/examine/take), skipping destructive actions.
3. **BFS backtracking**: when the current location has no frontier, the agent uses the learned directed graph to backtrack to the nearest location that still has untried exits. A simple oscillation guard avoids A→B→A→B bouncing.
4. **Stuck recovery**: if there is no progress (no score gain and no location change) for many steps, the agent runs a short recovery set (look/inventory/examine noun) before escalating.
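The four-tier policy above can be sketched as a single pure decision function over a `status()` snapshot. Field names follow the earlier description; the per-location interaction cap, the stuck threshold, and the `frontier_path` argument (standing in for the BFS backtracker's output) are illustrative assumptions:

```python
# Sketch of the fixed priority policy (thresholds are illustrative).

def choose_action(status, safe_interactions, frontier_path, steps_without_progress):
    # 1. Explore untried exits first.
    if status["untried_directions"]:
        return status["untried_directions"][0]
    # 2. Bounded number of safe, game-validated interactions per location.
    tried = set(status["actions_here"])
    for act in safe_interactions[:3]:          # cap interactions per location
        if act not in tried:
            return act
    # 3. BFS backtracking toward the nearest location with untried exits.
    if frontier_path:
        return frontier_path[0]                # first move along the path
    # 4. Stuck recovery after prolonged lack of progress.
    if steps_without_progress > 10:
        return "look"
    return "wait"

status = {"untried_directions": [], "actions_here": {"examine mailbox": {}}}
action = choose_action(status, ["examine mailbox", "open mailbox"], ["south"], 0)
```

Because every tier is a deterministic lookup over the snapshot, the same seed and game always produce the same trajectory.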

When `peek_action` is available, the agent evaluates a small candidate set with look-ahead and selects the action that maximizes a utility heavily rewarding **score gains** and **new locations**, while penalizing repeated no-ops and known failure patterns. This improves robustness across games because the agent relies only on verified actions and exploration structure, not on a game-specific walkthrough. The LLM is the **fixed course model** (`Qwen/Qwen2.5-72B-Instruct`) and is optional: it is used only as a last-resort fallback to keep behavior stable under the evaluation constraints.
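The look-ahead selection can be sketched as a utility over peeked outcomes. The exact weights below are illustrative assumptions, chosen only to show the ordering (score gains dominate, new locations next, no-ops and failure patterns penalized):

```python
# Sketch of the look-ahead utility; weights are illustrative. The real
# agent caps the candidate set to keep peek_action calls cheap.

def utility(score_delta, new_location, was_noop, matched_failure_pattern):
    u = 100.0 * score_delta                    # heavily reward score gains
    u += 25.0 if new_location else 0.0         # reward discovering new rooms
    u -= 10.0 if was_noop else 0.0             # penalize repeated no-ops
    u -= 15.0 if matched_failure_pattern else 0.0
    return u

# Hypothetical peeked outcomes for three candidates:
candidates = {
    "north":     utility(0, True,  False, False),
    "open case": utility(1, False, False, False),
    "hit troll": utility(0, False, True,  True),
}
best = max(candidates, key=candidates.get)
```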

**Trade-off:** `peek_action` is the main compute bottleneck, so I cap look-ahead to a small candidate set to balance look-ahead quality against speed. In practice this already gives strong gains over the baseline by improving exploration efficiency and reducing repeated failures.

## Files

| File | Description |
|------|-------------|
| `agent.py` | Student agent (`StudentAgent`) using `status()` + BFS + optional look-ahead |
| `mcp_server.py` | MCP server exposing `play_action`, `status`, `peek_action`, `valid_actions`, etc. |
| `app.py` | Minimal Gradio page for HF Spaces |
| `requirements.txt` | Dependencies |