---
title: Text Adventure Agent Submission
emoji: 🗺
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---

# Text Adventure Agent Submission

## Overview

This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

## Approach

To keep general performance high without overfitting to a game, I use a general policy stack rather than game-specific scripts. The main idea is to separate local (short-term) interaction from exploration/planning, but to keep strong failure memory so the agent does not waste turns repeating known bad behavior. At each step, the agent builds candidates from:

  1. valid actions (when available)
  2. movement actions
  3. object-centric actions from observation/inventory
  4. a compositional set (e.g., look under X, follow X, ask A about B, show I to A, put I in T). Compositional templates are only emphasized when the action space is sparse, to avoid destabilizing games that already expose rich parser affordances.
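The four candidate sources above can be sketched roughly as follows. This is an illustrative sketch, not the actual `agent.py` API: the function name `build_candidates`, the `sparse_threshold` parameter, and the specific templates are assumptions.

```python
MOVES = ["north", "south", "east", "west", "up", "down", "enter", "exit"]

def build_candidates(valid_actions, observation_objects, inventory, sparse_threshold=6):
    """Merge the four candidate sources; compositional templates are
    emphasized only when the parser-provided action space is sparse."""
    candidates = list(valid_actions)                 # 1. valid actions
    candidates += MOVES                              # 2. movement actions
    for obj in observation_objects + inventory:      # 3. object-centric actions
        candidates += [f"examine {obj}", f"take {obj}", f"open {obj}"]
    if len(valid_actions) < sparse_threshold:        # 4. compositional set
        for obj in observation_objects:
            candidates.append(f"look under {obj}")
            candidates.append(f"follow {obj}")
    # de-duplicate while preserving order
    seen, unique = set(), []
    for c in candidates:
        if c not in seen:
            seen.add(c)
            unique.append(c)
    return unique
```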

Action choice is score-based, in an attempt to mimic RL, and uses contextual features: prior score gain, local/global success rates, a first-try bonus, loop and oscillation penalties, tabu (location, action) memory, and anti-repeat rules. Repeating known non-movement failures is strictly banned. Movement failures are stored contextually by observation signature (a hash of the observation text), so a failed move is avoided in the same local state but can be retried later if the state changes.
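A minimal sketch of this scoring scheme, assuming a simple linear combination of the features named above; the class name, weights, and feature bookkeeping are illustrative, not the real agent's internals:

```python
import hashlib
from collections import defaultdict

def obs_signature(observation: str) -> str:
    """Hash the observation text so move failures are stored per local state."""
    return hashlib.md5(observation.encode()).hexdigest()[:12]

class ActionScorer:
    """Illustrative linear scorer over the contextual features described above."""

    def __init__(self):
        self.score_gain = defaultdict(float)        # prior score gain per action
        self.success = defaultdict(lambda: [0, 0])  # [successes, tries]
        self.failed_moves = set()                   # (obs_sig, action) pairs
        self.tabu = set()                           # (location, action) pairs
        self.last_actions = []                      # for loop penalties

    def score(self, action, location, obs_sig):
        if (obs_sig, action) in self.failed_moves:
            return float("-inf")                    # known failure in this state
        s, t = self.success[action]
        rate = s / t if t else 0.0
        val = 2.0 * self.score_gain[action] + rate
        if t == 0:
            val += 0.5                              # first-try bonus
        if (location, action) in self.tabu:
            val -= 1.0                              # tabu memory penalty
        if self.last_actions[-2:] == [action] * 2:
            val -= 1.5                              # loop penalty
        return val
```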

Exploration is frontier-driven: the agent tracks discovered exits per location and prioritizes untried exits first. If none are available locally, it computes the shortest known path to the nearest location with untried exits and follows that frontier direction. This gives broad map coverage without hardcoded navigation plans.
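The frontier step can be sketched as a BFS over the known map; the function name and the `graph`/`untried_exits` representations are assumptions for illustration:

```python
from collections import deque

def path_to_frontier(graph, untried_exits, start):
    """Return the move sequence to the nearest location with untried exits.

    graph: {location: {direction: neighbor}}
    untried_exits: {location: set of untried directions}
    """
    if untried_exits.get(start):
        return []                     # frontier is here; try a local exit
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        loc, path = queue.popleft()
        for direction, nxt in graph.get(loc, {}).items():
            if nxt in visited:
                continue
            if untried_exits.get(nxt):
                return path + [direction]
            visited.add(nxt)
            queue.append((nxt, path + [direction]))
    return None                       # no known frontier remains
```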

The server-side metadata is designed for robustness: each `play_action` call returns machine-readable [META] fields (location, score, change flags, progress labels). Location derivation combines Jericho state labels with observation structure, plus a hash-anchor fallback when movement produces a meaningful text change but ambiguous heading data.
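On the agent side, the [META] fields can be extracted with a small parser. The exact field names and `key=value` layout below are assumptions about the format, for illustration only:

```python
import re

def parse_meta(response: str) -> dict:
    """Pull [META] key=value fields out of a play_action response."""
    meta = {}
    for m in re.finditer(r"\[META\]\s*(\w+)=([^\n\[]+)", response):
        meta[m.group(1)] = m.group(2).strip()
    return meta
```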

I ran my tests on an A100 on my own server; LLM selection is optional and is used only as a constrained picker over top-ranked candidates, not as an unconstrained command generator. This keeps behavior predictable across games and avoids prompt-only overfitting. I wasn't able to push the score past 1 on lostpig, only the number of visited locations.
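The constrained-picker idea can be sketched as follows: the LLM only chooses an index into a shortlist, with a deterministic fallback on any invalid reply. The `llm_choose` callable and `top_k` parameter are hypothetical:

```python
def constrained_pick(llm_choose, ranked_candidates, top_k=5):
    """Let the LLM pick among the top-k ranked candidates by index,
    falling back to the top-ranked action on any invalid reply."""
    shortlist = ranked_candidates[:top_k]
    prompt = "Pick the best action by number:\n" + "\n".join(
        f"{i}: {a}" for i, a in enumerate(shortlist)
    )
    reply = llm_choose(prompt)
    try:
        idx = int(reply.strip())
        if 0 <= idx < len(shortlist):
            return shortlist[idx]
    except ValueError:
        pass
    return shortlist[0]   # deterministic fallback keeps behavior predictable
```

Constraining the LLM to an index rather than free-form text is what prevents it from emitting commands the scorer never vetted.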

## Files

| File | Description |
| --- | --- |
| `agent.py` | ReAct agent with the `StudentAgent` class |
| `mcp_server.py` | MCP server with game interaction tools |
| `app.py` | Gradio interface for the HF Space |
| `requirements.txt` | Additional dependencies |

## How to Submit

  1. Fork the template Space: https://huggingface.co/spaces/LLM-course/text-adventure-template
  2. Clone your fork locally
  3. Implement your agent in agent.py and mcp_server.py
  4. Test locally (see below)
  5. Push your changes to your Space
  6. Submit your Space URL on the course platform

## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```