---
title: Text Adventure Agent Submission
emoji: 🗺️
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---

# Text Adventure Agent Submission

## Overview

This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

## Approach

This project implements an interactive fiction (Jericho-style) agent with an MCP server that exposes both core gameplay interaction and extra state to support more reliable decision-making. The main idea is to reduce “parser thrashing” (repeating ineffective commands) while pushing the agent toward score-relevant progress and broad exploration.

### 1) Stable state via MCP tools

A key failure mode in IF agents is confusing two locations because their descriptions vary between visits. I added an MCP `location()` tool that returns a stable room identifier (best effort: the first line of the observation, which is usually the room title). The agent uses this identifier as the primary key for per-room memory (tried/failed action sets). The server also provides `inventory()`, `objects_here()`, and `get_map()` so the agent can condition its prompts on what it carries, what is likely present, and which locations it has already visited.
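The idea can be sketched in a few lines; the function name `stable_location` and the exact parsing are illustrative, not the code in mcp_server.py:

```python
from collections import defaultdict

def stable_location(observation: str) -> str:
    """Best-effort stable room id: the first non-empty line of the
    observation, which in most Jericho games is the room title."""
    for line in observation.splitlines():
        line = line.strip()
        if line:
            return line.lower()
    return "unknown"

# Per-room memory keyed by the stable identifier.
tried = defaultdict(set)   # room id -> actions attempted there
failed = defaultdict(set)  # room id -> actions that visibly failed

room = stable_location("West of House\nYou are standing in an open field...")
tried[room].add("open mailbox")
```

Keying memory on the room title rather than the full description keeps the tried/failed sets stable even when the game varies its prose between visits.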

### 2) Prompt enrichment and tool safety

The agent prompt includes the current observation, stable location, inventory, `objects_here`, explored map, `valid_actions`, recent actions/locations, and the local tried/failed lists. This gives the LLM enough context to avoid repeating mistakes and to pick actions that make progress. Tool results are treated defensively: the agent never assumes a particular field exists and always converts tool outputs to text safely.
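The defensive conversion might look like the following minimal sketch (the helper name `tool_to_text` is illustrative; MCP tool results can arrive as strings, lists of content parts, dicts, or `None`):

```python
import json

def tool_to_text(result) -> str:
    """Defensively convert a tool result to plain text, never assuming
    a particular type or field is present."""
    if result is None:
        return ""
    if isinstance(result, str):
        return result
    if isinstance(result, (list, tuple)):
        return "\n".join(tool_to_text(part) for part in result)
    if isinstance(result, dict):
        # Prefer a conventional "text" field if present, else dump the dict.
        text = result.get("text")
        if isinstance(text, str):
            return text
        return json.dumps(result, default=str)
    return str(result)
```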

### 3) Score-first action selection with puzzle-verb bias

When the LLM output is missing, invalid, or a repeat, a deterministic fallback selects actions with a clear scoring/puzzle profile: treasure-related `take`/`get` first, then verbs like `open`, `unlock`, `read`, `light`, `push`, `pull`, `give`, and `put`, then any untried valid action. Movement is deliberately de-prioritized until the agent has attempted local interactions, which improves puzzle progress and reduces aimless wandering.
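A minimal sketch of this priority order (the verb lists and the function name `fallback_action` are illustrative):

```python
TREASURE_VERBS = ("take", "get")
PUZZLE_VERBS = ("open", "unlock", "read", "light", "push", "pull", "give", "put")
MOVE_ACTIONS = {"north", "south", "east", "west", "northeast", "northwest",
                "southeast", "southwest", "up", "down", "enter", "exit"}

def fallback_action(valid_actions, tried):
    """Deterministic fallback: treasure grabs first, then puzzle verbs,
    then any untried non-movement action, movement only as a last resort."""
    untried = [a for a in valid_actions if a not in tried]
    for verbs in (TREASURE_VERBS, PUZZLE_VERBS):
        for action in untried:
            words = action.lower().split()
            if words and words[0] in verbs:
                return action
    non_moves = [a for a in untried if a.lower() not in MOVE_ACTIONS]
    if non_moves:
        return non_moves[0]
    return untried[0] if untried else None
```

Because the fallback only ever returns untried actions, it also doubles as a loop-breaker when the LLM keeps proposing the same command.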

### 4) Darkness / grue avoidance

Many IF games punish random movement in darkness. A dedicated darkness policy detects "too dark", "pitch black", and "grue" cues and forces either a lighting action (lamp/torch) or a safe retreat (`out`, `up`, `north`, etc.) rather than exploring blindly.
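The policy amounts to a short guard that runs before normal action selection; the cue and action lists below are an assumed minimal set:

```python
DARK_CUES = ("too dark", "pitch black", "grue")
LIGHT_ACTIONS = ("turn on lamp", "light lamp", "light torch")
RETREATS = ("out", "up", "north", "south", "east", "west")

def darkness_action(observation, valid_actions):
    """If the observation signals darkness, prefer a lighting action,
    else retreat along a plausible exit instead of exploring blindly.
    Returns None when it is not dark, letting normal selection proceed."""
    text = observation.lower()
    if not any(cue in text for cue in DARK_CUES):
        return None
    for action in LIGHT_ACTIONS:
        if action in valid_actions:
            return action
    for direction in RETREATS:
        if direction in valid_actions:
            return direction
    return RETREATS[0]  # last resort: attempt a blind retreat
```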

### 5) Stall/failure detection

Exploration agents often loop between two rooms (A↔B). The agent detects oscillation patterns in its recent locations and reduces movement choices while it is looping. It also classifies failures using common parser/soft-fail patterns ("you can't…", "not here", "nothing happens", etc.) and detects stalls when consecutive observations are identical (or near-identical). Failed actions are blacklisted per location, and repeatedly failing actions can be blacklisted globally so the same ineffective command is not retried in other rooms.
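Both checks reduce to simple pattern tests over recent history; this is an assumed minimal version, not the exact heuristics in agent.py:

```python
def is_oscillating(recent_locations, window=6):
    """Detect an A-B-A-B loop: the last `window` locations contain exactly
    two rooms and alternate between them on every step."""
    tail = recent_locations[-window:]
    if len(tail) < 4:
        return False
    return (len(set(tail)) == 2
            and all(tail[i] != tail[i + 1] for i in range(len(tail) - 1)))

FAIL_PATTERNS = ("you can't", "not here", "nothing happens", "i don't understand")

def looks_failed(observation: str) -> bool:
    """Classify an observation as a parser/soft failure by substring match."""
    text = observation.lower()
    return any(pattern in text for pattern in FAIL_PATTERNS)
```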

### 6) Improved debugging via richer history

To support iteration and grading, the agent records step-by-step history including action, location, before/after observations, score/move deltas, and failed/stalled flags; in verbose mode it can print the full result of each step. This makes it easy to diagnose why exploration or scoring stalls.
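A compact way to keep such records is a dataclass per step; the field names here are an assumed shape, not necessarily the ones used in agent.py:

```python
from dataclasses import dataclass, asdict

@dataclass
class StepRecord:
    action: str
    location: str
    obs_before: str
    obs_after: str
    score_delta: int
    moves_delta: int
    failed: bool
    stalled: bool

def print_step(step: StepRecord, verbose: bool = False) -> None:
    """In verbose mode, dump every recorded field for the step."""
    if verbose:
        for key, value in asdict(step).items():
            print(f"{key}: {value}")
```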

Overall, these ideas aim to improve reliability (fewer invalid actions), safety (avoid death in darkness), and performance (more meaningful exploration and puzzle progress) while keeping the system robust to missing Jericho features.

## Tools implemented in the MCP server

- `play_action(action: str) -> str`: send an action to the game and return the observation plus `[Score | Moves]`
- `memory() -> str`: compact summary of recent history, score/moves, and visit count
- `location() -> str`: stable location identifier (best-effort room title)
- `inventory() -> str`: Jericho inventory if available (safe fallback otherwise)
- `valid_actions() -> str`: Jericho valid actions (safe fallback otherwise)
- `objects() -> str`: global object names (Jericho)
- `objects_here() -> str`: best-effort room objects (falls back to global objects)
- `get_map() -> str`: list of explored locations (visit tracking)
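The "safe fallback" behind tools like `inventory()` and `valid_actions()` can be sketched as plain wrappers (the MCP decorators are omitted; `env` is assumed to follow Jericho's `FrotzEnv` interface, and the wrapper names are illustrative):

```python
def safe_inventory(env) -> str:
    """Jericho inventory if available; degrade gracefully otherwise."""
    try:
        items = env.get_inventory()  # assumed Jericho FrotzEnv method
        names = [getattr(obj, "name", str(obj)) for obj in items]
        return ", ".join(names) if names else "(empty)"
    except Exception:
        return "(inventory unavailable)"

def safe_valid_actions(env) -> str:
    """Jericho valid actions if available; degrade gracefully otherwise."""
    try:
        return "\n".join(env.get_valid_actions())  # assumed Jericho method
    except Exception:
        return "(valid actions unavailable)"
```

Returning a sentinel string instead of raising keeps the agent loop alive on games or builds where a Jericho feature is missing.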

## Files

| File | Description |
|------|-------------|
| `agent.py` | ReAct agent with the `StudentAgent` class |
| `mcp_server.py` | MCP server with game-interaction tools |
| `app.py` | Gradio interface for the HF Space |
| `requirements.txt` | Additional dependencies |

## How to Submit

  1. Fork the template Space: https://huggingface.co/spaces/LLM-course/text-adventure-template
  2. Clone your fork locally
  3. Implement your agent in agent.py and mcp_server.py
  4. Test locally (see below)
  5. Push your changes to your Space
  6. Submit your Space URL on the course platform

## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```