---
title: Text Adventure Agent Submission
emoji: 🗺
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---
# Text Adventure Agent Submission
## Overview
This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.
## Approach
To keep general performance high without overfitting to any one game, I use a general policy stack rather than game-specific scripts. The main idea is to separate local (short-term) interaction from exploration/planning, while keeping a strong failure memory so the agent does not waste turns repeating known bad behavior. At each step, the agent builds candidate actions from:

- valid actions (when available)
- movement actions
- object-centric actions derived from the observation and inventory
- a compositional set (e.g., `look under X`, `follow X`, `ask A about B`, `show I to A`, `put I in T`)

Compositional templates are only emphasized when the action space is sparse, to avoid destabilizing games that already expose rich parser affordances.
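A minimal sketch of this candidate builder, merging the four sources above. The names (`build_candidates`, `COMPOSITIONAL_TEMPLATES`, the crude noun extraction) are illustrative assumptions, not the actual `agent.py` API:

```python
import re

MOVES = ["north", "south", "east", "west", "up", "down", "enter", "exit"]
# Hypothetical template set; the real agent uses a richer compositional grammar
COMPOSITIONAL_TEMPLATES = ["look under {x}", "follow {x}", "ask {x} about {y}"]

def build_candidates(valid_actions, observation, inventory, sparse_threshold=5):
    """Merge candidate actions from valid actions, moves, objects, and templates."""
    candidates = list(valid_actions)          # parser-provided valid actions
    candidates += MOVES                       # movement actions
    # Object-centric actions: crude noun extraction from observation/inventory
    nouns = set(re.findall(r"\b[a-z]{4,}\b", (observation + " " + inventory).lower()))
    for noun in nouns:
        candidates += [f"examine {noun}", f"take {noun}"]
    # Emphasize compositional templates only when the action space is sparse
    if len(valid_actions) < sparse_threshold:
        for noun in nouns:
            candidates += [t.format(x=noun, y=noun) for t in COMPOSITIONAL_TEMPLATES]
    # Deduplicate while preserving order
    seen, out = set(), []
    for a in candidates:
        if a not in seen:
            seen.add(a)
            out.append(a)
    return out
```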
Action choice is score-based, in an attempt to mimic RL value estimation, and uses contextual features: prior score gain, local/global success rate, a first-try bonus, loop penalties, oscillation penalties, tabu (location, action) memory, and anti-repeat rules.
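The scoring can be sketched as a weighted sum over those features. All weights and container names below are illustrative placeholders, not the tuned values from my agent:

```python
from collections import defaultdict

# Per-action statistics (names and weights are illustrative assumptions)
score_gain = defaultdict(float)   # observed score change after the action
success = defaultdict(int)        # times the action produced a state change
tries = defaultdict(int)          # times the action was attempted
tabu = set()                      # (location, action) pairs known to fail there
recent = []                       # recently issued actions, for loop penalties

def rank_action(location, action):
    s = 2.0 * score_gain[action]                       # prior score gain
    if tries[action]:
        s += success[action] / tries[action]           # success rate
    else:
        s += 0.5                                       # first-try bonus
    s -= 1.0 * recent.count(action)                    # loop penalty
    if len(recent) >= 2 and recent[-2] == action:
        s -= 0.5                                       # oscillation penalty
    if (location, action) in tabu:
        s -= 10.0                                      # tabu (location, action)
    return s

def choose(location, candidates):
    # Anti-repeat rule: never immediately repeat the last action
    pool = [a for a in candidates if not (recent and recent[-1] == a)] or candidates
    return max(pool, key=lambda a: rank_action(location, a))
```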
Repeating known non-movement failures is strictly banned. Movement failures are stored contextually by observation signature (hashing), so a failed move is avoided in the same local state but can be retried later if the state changes.
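A sketch of this two-tier failure memory, assuming a simple hash of the observation text as the local-state signature (the real signature may include more state):

```python
import hashlib

failed_nonmoves = set()           # non-movement failures: banned everywhere
failed_moves = set()              # (obs_signature, direction): banned only in context

def obs_signature(observation):
    """Hash the observation to a short local-state signature."""
    return hashlib.sha1(observation.strip().lower().encode()).hexdigest()[:12]

def record_failure(action, observation, is_move):
    if is_move:
        # Contextual: the same move may succeed once the local state changes
        failed_moves.add((obs_signature(observation), action))
    else:
        # Non-movement failures are banned outright
        failed_nonmoves.add(action)

def is_allowed(action, observation, is_move):
    if is_move:
        return (obs_signature(observation), action) not in failed_moves
    return action not in failed_nonmoves
```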
Exploration is frontier-driven: the agent tracks discovered exits per location and prioritizes untried exits first. If none are available locally, it computes the shortest known path to the nearest location with untried exits and follows that frontier direction. This gives broad map coverage without hardcoded navigation plans.
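The frontier logic amounts to a BFS over the known map graph that returns the first step toward the nearest location with untried exits. The data-structure names (`map_graph`, `untried`) are my own illustrative assumptions:

```python
from collections import deque

def frontier_direction(current, map_graph, untried):
    """Return the next direction to explore.

    map_graph[location][direction] -> neighbor location (known map)
    untried[location] -> set of exits not yet taken from that location
    """
    if untried.get(current):
        return sorted(untried[current])[0]        # untried exit here: take it
    # BFS toward the nearest frontier location, remembering the first step taken
    queue = deque([(current, None)])              # (location, first direction)
    visited = {current}
    while queue:
        loc, first = queue.popleft()
        if untried.get(loc) and first is not None:
            return first                          # head toward that frontier
        for direction, nxt in map_graph.get(loc, {}).items():
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, first or direction))
    return None                                   # known map fully explored
```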
The server-side metadata is designed for robustness: each `play_action` call returns machine-readable `[META]` fields (location, score, change flags, progress labels). Location derivation combines Jericho state labels with observation structure, plus a hash-anchor fallback when movement produces a meaningful text change but ambiguous heading data.
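On the agent side, the `[META]` line can be parsed with a small helper. The exact `key=value` syntax shown here is an assumption for illustration; the real server format may differ:

```python
import re

def parse_meta(response_text):
    """Extract key=value pairs from a trailing [META] line, e.g.
    '[META] location=Kitchen score=5 moved=true' (format assumed)."""
    meta = {}
    m = re.search(r"\[META\]\s*(.*)", response_text)
    if m:
        for pair in m.group(1).split():
            if "=" in pair:
                key, _, value = pair.partition("=")
                meta[key] = value
    return meta
```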
Because I ran my tests on an A100 on my own server, LLM selection is optional and is used only as a constrained picker over the top-ranked candidates, not as an unconstrained command generator. This keeps behavior predictable across games and avoids prompt-only overfitting. I was not able to overcome the score ceiling of 1 on lostpig; I only improved the number of visited locations.
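The constrained-picker idea: the LLM is asked only for the index of one of the top-ranked candidates, and any malformed reply falls back to the heuristic's top choice. The reply protocol here is my own illustrative assumption:

```python
def constrained_pick(top_candidates, llm_reply):
    """Interpret the LLM reply strictly as an index into top_candidates;
    fall back to the heuristic best on any parse failure.
    (The index-only prompt/reply protocol is assumed for illustration.)"""
    try:
        idx = int(llm_reply.strip())
        if 0 <= idx < len(top_candidates):
            return top_candidates[idx]
    except ValueError:
        pass
    return top_candidates[0]   # heuristic top choice as the safe fallback
```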
## Files
| File | Description |
|---|---|
| `agent.py` | ReAct agent with `StudentAgent` class |
| `mcp_server.py` | MCP server with game interaction tools |
| `app.py` | Gradio interface for the HF Space |
| `requirements.txt` | Additional dependencies |
## How to Submit
- Fork the template Space: https://huggingface.co/spaces/LLM-course/text-adventure-template
- Clone your fork locally
- Implement your agent in `agent.py` and `mcp_server.py`
- Test locally (see below)
- Push your changes to your Space
- Submit your Space URL on the course platform
## Local Testing
```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```