DevZoneX committed
Commit 57900f7 · 1 Parent(s): 7ec4d32

Final Commit

Files changed (2):
  1. README.md +263 -0
  2. mcp_server.py +92 -0
README.md CHANGED
@@ -57,3 +57,266 @@ python run_agent.py --agent . --game lostpig -v -n 20
  # Run evaluation
  python -m evaluation.evaluate -s . -g lostpig -t 3
  ```

---

# 🧠 MCP ReAct Agent for Text Adventure Games

This project implements a complete **MCP-based ReAct agent** that plays classic text adventure games (e.g., `zork1`) using a tool-driven architecture.

It consists of:

* An **MCP server** exposing the game environment as structured tools
* A **ReAct-style LLM agent** that reasons and acts via those tools
* Loop detection, score tracking, and structured parsing
* Experimental improvements and debugging attempts

---

# 📦 Project Structure

## 1️⃣ MCP Server (`mcp_server.py`)

Built using `FastMCP`, this server wraps a `TextAdventureEnv` and exposes game functionality as callable tools.

### Core Features

#### 🎮 Game State Management

The `GameState` class manages:

* Current environment state
* Score and move tracking
* Action history (last 50 steps)
* Explored locations (map tracking)
* Inventory parsing
* Location extraction from observations
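
The tracked fields could be sketched roughly as follows. This is a minimal illustration, not the exact class in `mcp_server.py`; the field names and the `record` helper are assumptions.

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class GameState:
    """Illustrative sketch of the per-game state the server tracks."""
    observation: str = ""   # latest game text (current environment state)
    score: int = 0          # best score observed so far
    moves: int = 0          # move counter
    # Bounded action history: deque(maxlen=50) keeps only the last 50 steps.
    history: deque = field(default_factory=lambda: deque(maxlen=50))
    visited: set = field(default_factory=set)  # explored locations for map tracking

    def record(self, action: str, observation: str) -> None:
        # Append the action, bump the move count, and remember the new observation.
        self.history.append(action)
        self.moves += 1
        self.observation = observation
```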
---

## 🛠️ Exposed MCP Tools

The server provides the following tools:

### `play_action`

Executes a game command (e.g., `north`, `take lamp`, `open mailbox`).

Returns:

* Game observation
* Score updates
* Move count
* Game-over notice

---

### `memory`

Returns a structured summary of:

* Current location
* Score
* Moves
* Recent actions
* Current observation

This helps the agent reason about the current state.

---

### `get_map`

Displays explored locations and directional transitions discovered so far.

---

### `inventory`

Returns cleaned inventory information, parsing object strings from Jericho.

---

### `get_valid_actions`

A fallback tool that returns a **fixed list of possible actions** plus context-aware object interactions based on keywords in the observation.

Note:

* `env.get_valid_actions()` was tested and debugged.
* It **did not work reliably** in this setup.
* Therefore, I implemented a **manually defined valid action set**.
* However, using the fixed valid actions **did not improve the score**.

---

### `get_walkthrough`

Returns the official Jericho walkthrough (not used in `agent.py`).

---

### `get_world_objects`

Returns all known world objects from Jericho.

---

# 🤖 ReAct Agent (`agent.py`)

The agent is a complete ReAct implementation using:

* Thought → Tool → Observation loop
* Structured output parsing
* Loop detection
* Score extraction
* Action validation

It uses:

```
Qwen/Qwen2.5-72B-Instruct
```

via the Hugging Face Inference API.

---

# Agent Architecture

## ReAct Loop

At each step:

1. Build the prompt with:
   * Current score
   * Recent actions
   * Current observation
2. Call the LLM
3. Parse the structured response:

   ```
   THOUGHT:
   TOOL:
   ARGS:
   ```
4. Validate the tool call
5. Execute the tool via MCP
6. Update:
   * Score
   * History
   * Visited locations
7. Detect loops

---

## Loop Detection

If the agent repeats the same action 3 times:

* It automatically forces a `"look"` action.
* A warning is injected into the prompt.
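
The check can be sketched with a small helper. `LoopDetector` is a hypothetical name, not the exact code in `agent.py`.

```python
from collections import deque

class LoopDetector:
    """Force a 'look' when the same action repeats three times in a row (sketch)."""
    def __init__(self, limit: int = 3):
        self.recent = deque(maxlen=limit)

    def next_action(self, proposed: str) -> tuple:
        # Returns (action to execute, whether a loop warning should be injected).
        self.recent.append(proposed)
        if len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1:
            self.recent.clear()  # reset so we don't immediately re-trigger
            return ("look", True)
        return (proposed, False)
```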

---

## Tool Validation & Auto-Fixes

The agent corrects:

* Invalid tool names
* Unsupported verbs (e.g., `inspect → examine`)
* Markdown artifacts in responses
* JSON formatting errors
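
The verb auto-fix can be sketched as a substitution table. The `inspect → examine` mapping comes from the list above; the other entries and the helper name are illustrative assumptions.

```python
# Hypothetical mapping; only "inspect" -> "examine" is documented above.
VERB_FIXES = {"inspect": "examine", "grab": "take"}

def fix_action(action: str) -> str:
    """Replace unsupported verbs and strip markdown artifacts (sketch)."""
    action = action.strip().strip("`").strip()  # drop stray backticks from LLM output
    for bad, good in VERB_FIXES.items():
        if action == bad or action.startswith(bad + " "):
            return good + action[len(bad):]
    return action
```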

---

## Score Tracking

The score is extracted with case-insensitive regex matching from patterns such as:

* `Score: X`
* `[Score: X | Moves: Y]`

The agent keeps the maximum observed score.
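
Both patterns above can be covered by a single case-insensitive expression. This is a sketch; the exact regex in `agent.py` may differ.

```python
import re

# Matches "Score: 15" as well as "[Score: 15 | Moves: 30]", case-insensitively.
SCORE_RE = re.compile(r"score:\s*(-?\d+)", re.IGNORECASE)

def extract_score(text: str, best_so_far: int = 0) -> int:
    """Return the maximum of the best score so far and any score found in text."""
    m = SCORE_RE.search(text)
    if m:
        return max(best_so_far, int(m.group(1)))
    return best_so_far
```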

---

# 🔬 Experiments & Debugging Attempts

## 1️⃣ Fixed Valid Actions

I replaced `env.get_valid_actions()` with a manually defined action set:

* Movement commands
* Basic verbs
* Context-aware object interactions (lamp, key, mailbox, etc.)

**Result:**

* Did not improve the score (on the contrary, it became worse)
* The agent still plateaued

---

## 2️⃣ Debugging `env.get_valid_actions()`

I attempted to use and debug:

```python
env.get_valid_actions()
```

However:

* It consistently failed or returned unusable results
* Therefore, it was not used in the final setup

---

## 3️⃣ Prompt Enrichment with Memory + History

I experimented with:

* Injecting the full memory output into the prompt
* Including longer history traces
* Combining map information, memory, and past actions

**Issue:**

* The prompt grew very large quickly
* Context-length usage became inefficient
* No noticeable improvement in performance
* Slower inference due to longer inputs

Therefore, I reverted to a **lightweight context strategy**:

* Last 3 actions
* Current observation
* Current score
* Loop warning if necessary
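
The lightweight context could be assembled along these lines. The function name, field wording, and warning text are illustrative assumptions, not the exact prompt in `agent.py`.

```python
def build_prompt(observation: str, score: int, recent_actions: list,
                 loop_warning: bool = False) -> str:
    """Assemble the minimal per-step context sent to the LLM (sketch)."""
    parts = [
        f"Score: {score}",
        # Only the last 3 actions are included, keeping the prompt small.
        "Recent actions: " + (", ".join(recent_actions[-3:]) or "none"),
        f"Observation: {observation}",
    ]
    if loop_warning:
        parts.append("WARNING: you are repeating the same action. Try something new.")
    parts.append("Respond with:\nTHOUGHT:\nTOOL:\nARGS:")
    return "\n".join(parts)
```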

---

# 📊 Current Performance Characteristics

* The agent explores systematically
* Picks up obvious items (lamp, mailbox interactions, etc.)
* Avoids simple loops
* Tracks visited locations
* Maintains structured reasoning

However:

* No planning memory across long horizons
* No true valid-action constraint from the environment
mcp_server.py CHANGED
@@ -187,6 +187,98 @@ def inventory() -> str:
  """
  return get_game().get_inventory()

@mcp.tool()
def get_valid_actions() -> str:
    """
    Return a list of valid actions the agent can take.
    Avoids calling env.get_valid_actions(): I have tested it, but it did not
    work at all, so a fixed set of valid actions is used instead.
    """
    game = get_game()

    if not game.env:
        return "Game environment not initialized."

    # Standard movement & basic verbs
    actions = [
        "north", "south", "east", "west",
        "up", "down", "enter", "exit",
        "look", "inventory", "take all",
        "open mailbox", "read", "turn on lamp"
    ]

    # Optionally, add objects in the current observation
    obs = game.state.observation.lower()
    objects = []
    for word in ["lamp", "key", "mailbox", "sword", "coin"]:
        if word in obs:
            objects.append(f"take {word}")
            objects.append(f"examine {word}")
            objects.append(f"open {word}")

    actions.extend(objects)

    return ", ".join(sorted(set(actions)))


@mcp.tool()
def get_walkthrough() -> str:
    """
    Get the official Jericho walkthrough for the current game.
    THIS TOOL IS NOT USED IN AGENT.PY

    Returns:
        A step-by-step optimal solution path.
    """
    game = get_game()

    if not game.env or not game.env.env:
        return "Game environment not initialized."

    try:
        walkthrough = game.env.env.get_walkthrough()
    except Exception as e:
        return f"Could not retrieve walkthrough: {e}"

    if not walkthrough:
        return "No walkthrough available for this game."

    output = ["Official Walkthrough:\n"]

    for i, action in enumerate(walkthrough, 1):
        output.append(f"{i}. {action}")

    return "\n".join(output)


@mcp.tool()
def get_world_objects() -> str:
    """
    Get all known objects in the game world (from Jericho).

    Returns:
        A list of objects and their locations.
    """
    game = get_game()

    if not game.env or not game.env.env:
        return "Game environment not initialized."

    try:
        objects = game.env.env.get_world_objects()
    except Exception as e:
        return f"Could not retrieve world objects: {e}"

    if not objects:
        return "No world objects found."

    output = ["World Objects:\n"]

    for obj in objects:
        if isinstance(obj, dict):
            name = obj.get("name", "Unknown")
            loc = obj.get("location", "Unknown")
            output.append(f"- {name} (Location: {loc})")
        else:
            output.append(f"- {str(obj)}")

    return "\n".join(output)

  # =============================================================================
  # Main