---
title: Text Adventure Agent Submission
emoji: "\U0001F5FA"
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.12.0"
app_file: app.py
pinned: false
license: mit
---

# Text Adventure Agent Submission

## Overview

This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

## Approach

<!-- Describe your approach here -->

- What strategy does your agent use?
- What tools did you implement in your MCP server?
- Any interesting techniques or optimizations?

## Files

| File | Description |
|------|-------------|
| `agent.py` | ReAct agent with `StudentAgent` class |
| `mcp_server.py` | MCP server with game interaction tools |
| `app.py` | Gradio interface for HF Space |
| `requirements.txt` | Additional dependencies |

## How to Submit

1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
2. Clone your fork locally
3. Implement your agent in `agent.py` and `mcp_server.py`
4. Test locally (see below)
5. Push your changes to your Space
6. Submit your Space URL on the course platform

## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```

---

# 🧠 MCP ReAct Agent for Text Adventure Games

This project implements a complete **MCP-based ReAct agent** that plays classic text adventure games (e.g., `zork1`) using a tool-driven architecture.

It consists of:

* An **MCP server** exposing the game environment as structured tools
* A **ReAct-style LLM agent** that reasons and acts via those tools
* Loop detection, score tracking, and structured parsing
* Experimental improvements and debugging attempts

---

# 📦 Project Structure

## 1️⃣ MCP Server (`mcp_server.py`)

Built using `FastMCP`, this server wraps a `TextAdventureEnv` and exposes game functionality as callable tools.

### Core Features

#### 🎮 Game State Management

The `GameState` class manages:

* Current environment state
* Score and move tracking
* Action history (last 50 steps)
* Explored locations (map tracking)
* Inventory parsing
* Location extraction from observations
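
The state listed above could be held in a small container like the one below. This is a minimal sketch, not the actual `GameState` from `mcp_server.py`: the field names and the `record` method are assumptions chosen for illustration. The only detail taken from the description is the 50-step history cap, modelled here with a bounded `deque`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical container for the per-game state tracked by the server."""

    score: int = 0
    moves: int = 0
    location: str = "Unknown"
    # A deque with maxlen=50 drops the oldest entry once 50 steps are stored.
    history: deque = field(default_factory=lambda: deque(maxlen=50))
    # explored[location][direction] = destination (map tracking)
    explored: dict = field(default_factory=dict)

    def record(self, action: str, observation: str, score: int) -> None:
        """Store one step and update the counters."""
        self.moves += 1
        self.score = score
        self.history.append((action, observation))


state = GameState()
for i in range(60):
    state.record(f"action {i}", "You see nothing special.", score=0)

print(len(state.history))   # 50
print(state.history[0][0])  # action 10
```

Using a bounded `deque` keeps memory use constant over long games without any manual trimming.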

---

## 🛠️ Exposed MCP Tools

The server provides the following tools:

### `play_action`

Executes a game command (e.g., `north`, `take lamp`, `open mailbox`).

Returns:

* Game observation
* Score updates
* Move count
* Game over notice
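
As an illustration of how the four pieces of the return value might be combined into one string, here is a minimal sketch. The helper name `format_action_result` and the exact wording of the messages are assumptions; only the `[Score: X | Moves: Y]` layout is taken from the score formats described in this README. In the real server, a function like this would sit behind a FastMCP tool.

```python
def format_action_result(observation: str, score: int, prev_score: int,
                         moves: int, done: bool) -> str:
    """Combine observation, score update, move count and game-over notice."""
    parts = [observation.strip()]
    gained = score - prev_score
    if gained > 0:
        # Only mention the score when it actually went up.
        parts.append(f"+{gained} points!")
    parts.append(f"[Score: {score} | Moves: {moves}]")
    if done:
        parts.append("GAME OVER")
    return "\n".join(parts)


print(format_action_result("Taken.", score=10, prev_score=0, moves=5, done=False))
```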

---

### `memory`

Returns a structured summary of:

* Current location
* Score
* Moves
* Recent actions
* Current observation

This helps the agent reason about the current state.

---

### `get_map`

Displays explored locations and directional transitions discovered so far.
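
One simple way to track and render such a map is a nested dictionary keyed by location and direction. The sketch below assumes that representation; `record_transition` and `render_map` are hypothetical names, not functions from `mcp_server.py`.

```python
def record_transition(explored: dict, origin: str, direction: str,
                      destination: str) -> None:
    """Remember that going `direction` from `origin` leads to `destination`."""
    explored.setdefault(origin, {})[direction] = destination
    # Make sure the destination shows up even before any exit is known.
    explored.setdefault(destination, {})


def render_map(explored: dict) -> str:
    """Render the explored locations and their known exits as plain text."""
    if not explored:
        return "No locations explored yet."
    lines = []
    for location in sorted(explored):
        lines.append(location)
        for direction, destination in sorted(explored[location].items()):
            lines.append(f"  {direction} -> {destination}")
    return "\n".join(lines)


world = {}
record_transition(world, "West of House", "north", "North of House")
print(render_map(world))
```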

---

### `inventory`

Returns cleaned inventory information, parsing object strings from Jericho.
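
The cleaning step might look like the sketch below. It assumes that the string form of a Jericho object looks like `Obj180: leaflet Parent4 Sibling0 ...`; that layout is an assumption about Jericho's output and should be checked against the installed version. The helper names are hypothetical.

```python
import re

# Assumed layout of a Jericho object string: "Obj<id>: <name> Parent<id> ..."
_OBJ_PATTERN = re.compile(r"^Obj\d+:\s*(?P<name>.+?)\s+Parent\d+")


def clean_item(raw: str) -> str:
    """Extract the readable item name, or fall back to the raw string."""
    match = _OBJ_PATTERN.match(raw.strip())
    return match.group("name") if match else raw.strip()


def format_inventory(raw_items) -> str:
    """Turn a list of object-like values into a short inventory line."""
    names = [clean_item(str(item)) for item in raw_items]
    if not names:
        return "You are empty-handed."
    return "Inventory: " + ", ".join(names)


sample = [
    "Obj180: leaflet Parent4 Sibling0 Child0 Attributes [17] Properties [18]",
    "Obj36: brass lantern Parent4 Sibling180 Child0 Attributes [] Properties []",
]
print(format_inventory(sample))
```

Falling back to the raw string means an unexpected format degrades to noisy output rather than an error.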

---

### `get_valid_actions`

A fallback tool that returns a **fixed list of possible actions** plus context-aware object interactions based on keywords in the observation.

Note:

* `env.get_valid_actions()` was tested and debugged.
* It **did not work reliably** in this setup.
* Therefore, I implemented a **manually defined valid action set**.
* However, using fixed valid actions **did not improve the score**.
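
The fallback described above can be sketched as a fixed base list plus keyword-triggered additions. The specific actions and keywords below are illustrative assumptions, not the lists used in `mcp_server.py`.

```python
# Hypothetical fixed action set: movement plus a few basic verbs.
BASE_ACTIONS = ["look", "inventory", "north", "south", "east", "west", "up", "down"]

# Hypothetical keyword -> extra actions table for context-aware suggestions.
OBJECT_ACTIONS = {
    "lamp": ["take lamp", "turn on lamp"],
    "key": ["take key"],
    "mailbox": ["open mailbox"],
}


def fallback_valid_actions(observation: str) -> list:
    """Return the fixed actions plus any triggered by keywords in the text."""
    text = observation.lower()
    actions = list(BASE_ACTIONS)
    for keyword, extra in OBJECT_ACTIONS.items():
        # Plain substring match: cheap, but "key" also matches "monkey".
        if keyword in text:
            actions.extend(extra)
    return actions


print(fallback_valid_actions("There is a small mailbox here."))
```

Because the list is not derived from the game engine, it can suggest actions the game rejects and miss actions it accepts, which is consistent with the lack of score improvement noted above.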

---

### `get_walkthrough`

Returns the official Jericho walkthrough (not used in `agent.py`).

---

### `get_world_objects`

Returns all known world objects from Jericho.

---

# 🤖 ReAct Agent (`agent.py`)

The agent is a complete ReAct implementation using:

* Thought → Tool → Observation loop
* Structured output parsing
* Loop detection
* Score extraction
* Action validation

It uses:

```
Qwen/Qwen2.5-72B-Instruct
```

via the Hugging Face Inference API.

---

# Agent Architecture

## ReAct Loop

At each step:

1. Build prompt with:

   * Current score
   * Recent actions
   * Current observation
2. Call LLM
3. Parse structured response:

   ```
   THOUGHT:
   TOOL:
   ARGS:
   ```
4. Validate tool call
5. Execute tool via MCP
6. Update:

   * Score
   * History
   * Visited locations
7. Detect loops
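
Step 3 of the loop can be sketched as follows. This assumes each field sits on its own line and that `ARGS` holds a JSON object; the `parse_response` helper and the `action` argument name are hypothetical, not taken from `agent.py`.

```python
import json
import re


def parse_response(text: str) -> dict:
    """Extract THOUGHT, TOOL and ARGS from a structured LLM reply."""
    fields = {}
    for name in ("THOUGHT", "TOOL", "ARGS"):
        # [ \t]* (not \s*) so an empty field does not swallow the next line.
        match = re.search(rf"^{name}:[ \t]*(.*)$", text, flags=re.MULTILINE)
        fields[name.lower()] = match.group(1).strip() if match else ""
    try:
        args = json.loads(fields["args"]) if fields["args"] else {}
    except json.JSONDecodeError:
        args = {}  # Malformed JSON: fall back to no arguments.
    if not isinstance(args, dict):
        args = {}
    return {"thought": fields["thought"], "tool": fields["tool"], "args": args}


reply = (
    "THOUGHT: The mailbox may hold something.\n"
    "TOOL: play_action\n"
    'ARGS: {"action": "open mailbox"}'
)
print(parse_response(reply))
```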

---

## Loop Detection

If the agent repeats the same action 3 times:

* It automatically forces a `"look"` action.
* A warning is injected into the prompt.
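
A minimal sketch of this check is shown below. It assumes the third identical action in a row is the one replaced by `look`; the function name, the warning text, and that interpretation of "3 times" are assumptions.

```python
def check_for_loop(recent_actions, proposed: str, threshold: int = 3):
    """Return (action_to_run, warning) for the proposed action.

    If `proposed` would be the `threshold`-th identical action in a row,
    it is replaced by "look" and a warning for the next prompt is returned.
    Assumes threshold >= 2.
    """
    window = list(recent_actions)[-(threshold - 1):] + [proposed]
    if len(window) == threshold and len(set(window)) == 1:
        warning = (
            f"WARNING: you repeated '{proposed}' {threshold} times. "
            "Try something different."
        )
        return "look", warning
    return proposed, ""


print(check_for_loop(["north", "north"], "north")[0])  # look
print(check_for_loop(["north", "east"], "north")[0])   # north
```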

---

## Tool Validation & Auto-Fixes

The agent corrects:

* Invalid tool names
* Unsupported verbs (e.g., `inspect → examine`)
* Markdown artifacts in responses
* JSON formatting errors
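
The first three fixes can be sketched with two small normalizers. The tool set below is a subset of the tools listed in this README, the alias table contains only the one mapping mentioned above, and falling back to `play_action` for unknown tool names is an assumption rather than the documented behavior of `agent.py`.

```python
import re

KNOWN_TOOLS = {"play_action", "memory", "get_map", "inventory", "get_valid_actions"}

# Verbs the game does not understand, mapped to supported equivalents.
VERB_ALIASES = {"inspect": "examine"}


def clean_tool_name(raw: str) -> str:
    """Strip Markdown artifacts and fall back to play_action if unknown."""
    name = re.sub(r"[`*\s]", "", raw).lower()
    return name if name in KNOWN_TOOLS else "play_action"


def normalize_action(action: str) -> str:
    """Strip Markdown/quote artifacts and rewrite unsupported verbs."""
    action = action.strip().strip("`*\"'").lower()
    verb, _, rest = action.partition(" ")
    verb = VERB_ALIASES.get(verb, verb)
    return f"{verb} {rest}".strip()


print(clean_tool_name("`play_action`"))        # play_action
print(normalize_action("**Inspect mailbox**"))  # examine mailbox
```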

---

## Score Tracking

Score is extracted from:

* `Score: X`
* `[Score: X | Moves: Y]`
* Case-insensitive regex matching

The agent keeps the maximum observed score.
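
Both formats above can be covered by a single case-insensitive pattern, since each contains `Score:` followed by a number. The following is a sketch with hypothetical helper names, not the exact regex from `agent.py`.

```python
import re

# Matches "Score: 10", "score: 10" and "[Score: 10 | Moves: 5]".
_SCORE_RE = re.compile(r"score:\s*(-?\d+)", re.IGNORECASE)


def extract_score(text: str):
    """Return the first score found in the text, or None if there is none."""
    match = _SCORE_RE.search(text)
    return int(match.group(1)) if match else None


def update_best(best: int, text: str) -> int:
    """Keep the maximum score observed so far."""
    found = extract_score(text)
    return best if found is None else max(best, found)


print(extract_score("[Score: 10 | Moves: 5]"))  # 10
print(update_best(10, "Score: 5"))              # 10
```

Keeping the maximum rather than the latest value means a later penalty or a misparsed observation cannot lower the reported score.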

---

# 🔬 Experiments & Debugging Attempts

## 1️⃣ Fixed Valid Actions

I replaced `env.get_valid_actions()` with a manually defined action set.

* Added movement commands
* Basic verbs
* Context-aware object interactions (lamp, key, mailbox, etc.)

**Result:**

* Did not improve the score (on the contrary, it got worse)
* Agent still plateaued

---

## 2️⃣ Debugging `env.get_valid_actions()`

I attempted to use and debug:

```python
env.get_valid_actions()
```

However:

* It consistently failed or returned unusable results
* Therefore, it was not used in the final setup

---

## 3️⃣ Prompt Enrichment with Memory + History

I experimented with:

* Injecting full memory output into the prompt
* Including longer history traces
* Combining map information + memory + past actions

**Issue:**

* Prompt grew very large quickly
* Context usage became inefficient
* No noticeable improvement in performance
* Slower inference due to longer inputs

Therefore, I reverted to a **lightweight context strategy**:

* Last 3 actions
* Current observation
* Current score
* Loop warning if necessary
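
The lightweight context can be assembled in a few lines. The sketch below is an assumption about the layout: the function name, the labels, and the ordering of the sections are illustrative and not copied from `agent.py`.

```python
def build_prompt(observation: str, score: int, history: list,
                 loop_warning: str = "") -> str:
    """Build a compact prompt from score, last 3 actions and observation."""
    lines = [f"Current score: {score}"]
    recent = history[-3:]  # Only the last 3 actions are kept in context.
    if recent:
        lines.append("Recent actions: " + ", ".join(recent))
    if loop_warning:
        lines.append(loop_warning)
    lines.append("Observation:")
    lines.append(observation.strip())
    return "\n".join(lines)


prompt = build_prompt(
    observation="Forest Path\nThis is a path winding through a dimly lit forest.",
    score=5,
    history=["open mailbox", "take leaflet", "north", "east"],
)
print(prompt)
```

The prompt size stays roughly constant regardless of how long the game has been running, which avoids the growth problem described above.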

---

# 📊 Current Performance Characteristics

* The agent explores systematically
* Picks up obvious items and handles simple interactions (lamp, mailbox, etc.)
* Avoids simple loops
* Tracks visited locations
* Maintains structured reasoning

However:

* It has no planning memory across long horizons
* It has no true valid-action constraint from the environment