Final submission
- .ipynb_checkpoints/README-checkpoint.md +212 -0
- .ipynb_checkpoints/agent-checkpoint.py +614 -0
- .ipynb_checkpoints/app-checkpoint.py +36 -0
- .ipynb_checkpoints/mcp_server-checkpoint.py +520 -0
- README.md +157 -4
- agent.py +507 -198
- mcp_server.py +331 -20
.ipynb_checkpoints/README-checkpoint.md
ADDED
@@ -0,0 +1,212 @@
---
title: Text Adventure Agent Submission
emoji: "\U0001F5FA"
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.12.0"
app_file: app.py
pinned: false
license: mit
---

# Text Adventure Agent Submission

## Overview

This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

## Approach

# My Report (MCP-Based Text Adventure Agent)

## Structured State Design, Guarded ReAct Reasoning, and Stability Improvements

## Overview

This project implements a fully functional MCP (Model Context Protocol) server and an LLM-driven ReAct agent for text adventure games. While a baseline was provided, this submission significantly extends and stabilizes that template by redesigning state exposure, improving tool structure, and introducing multiple guardrails against common LLM failure modes.

The primary focus of this work was not brute-force performance tuning, but architectural improvement, robustness, and reasoning stability.

---

## 1. MCP Server Improvements

The original template exposed minimal game interaction. I redesigned the MCP server to provide structured, reliable, and LLM-friendly state representations.

### 1.1 Robust Location Extraction

Instead of relying solely on the first line of the observation, the server now:

- Filters out status-like lines (score, moves, headers, bracketed text)
- Detects likely room titles heuristically
- Falls back gracefully when uncertain

This improves compatibility across different text adventure engines.
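In sketch form, the heuristic looks roughly like this (the helper name `extract_location` and the exact filtering rules are illustrative, not the server's literal implementation):

```python
import re

# Lines that look like status bars rather than room titles.
_STATUS_RE = re.compile(r"(score|moves)\s*[:\d]", re.IGNORECASE)

def extract_location(observation: str, fallback: str = "Unknown") -> str:
    """Best-effort room-title extraction from a raw observation."""
    for line in observation.splitlines():
        line = line.strip()
        if not line:
            continue
        if _STATUS_RE.search(line) or line.startswith("["):
            continue  # skip score/moves status lines and bracketed engine text
        # Room titles are typically short and start with a capital letter.
        if len(line.split()) <= 6 and line[0].isupper():
            return line
        break  # first substantive line is prose, not a title; give up
    return fallback
```

The fallback keeps the agent functional even when no line passes the title test.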

---

### 1.2 Structured Memory Output

The `memory()` tool was redesigned to provide:

- Current game
- Location
- Score and moves
- Extracted visible objects (best-effort heuristics)
- Mentioned exits
- Recent action history
- Full current observation

This structured format reduces hallucination and anchors the LLM in grounded state information. It transforms raw narrative text into usable reasoning signals.
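The shape of that output can be sketched as follows (field names are illustrative; the actual server assembles its STATE/RECENT/OBSERVATION sections from its own tracked state):

```python
def format_memory(state: dict) -> str:
    """Render tracked game state as a structured, LLM-friendly summary."""
    lines = [
        "=== STATE ===",
        f"Game: {state.get('game', 'unknown')}",
        f"Location: {state.get('location', 'Unknown')}",
        f"Score: {state.get('score', 0)}  Moves: {state.get('moves', 0)}",
        f"Visible objects: {', '.join(state.get('objects', [])) or 'none detected'}",
        f"Exits mentioned: {', '.join(state.get('exits', [])) or 'none detected'}",
        "=== RECENT ===",
        *[f"> {a}" for a in state.get('recent_actions', [])[-5:]],
        "=== OBSERVATION ===",
        state.get('observation', ''),
    ]
    return "\n".join(lines)
```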

---

### 1.3 Intelligent Map Construction

Movement tracking is no longer naive. A move is recorded only if:

- The location actually changes, and
- The observation does not contain known movement failure phrases.

This prevents corrupt map edges and keeps spatial reasoning reliable.

The resulting `get_map()` tool exposes clean directional transitions without noise from failed attempts.
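The two conditions above can be captured in a small guard function (a sketch; the phrase list and names are illustrative):

```python
FAILURE_PHRASES = ("you can't go", "there is a wall", "no way", "not possible")

def record_move(game_map: dict, prev_loc: str, direction: str,
                new_loc: str, observation: str) -> bool:
    """Add a map edge only for a confirmed, successful move."""
    if new_loc == prev_loc:
        return False  # location unchanged: we didn't actually move
    if any(p in observation.lower() for p in FAILURE_PHRASES):
        return False  # engine reported a failed movement
    game_map.setdefault(prev_loc, {})[direction] = new_loc
    return True
```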

---

### 1.4 Robust Inventory Handling

Inventory retrieval now:

- Uses structured state inventory when available
- Falls back to issuing the `inventory` command
- Cleans and normalizes item strings

This ensures cross-game compatibility.
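The normalization step might look like this (a minimal sketch with illustrative rules, not the server's exact code):

```python
import re

def clean_items(raw_lines: list[str]) -> list[str]:
    """Normalize inventory item strings scraped from game output."""
    items = []
    for line in raw_lines:
        item = line.strip().lstrip("-*• ").rstrip(".")
        item = re.sub(r"^(a|an|the)\s+", "", item, flags=re.IGNORECASE)
        if item and item.lower() not in ("you are carrying:", "nothing"):
            items.append(item.lower())
    return items
```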

---

## 2. Agent-Side Stability and Reasoning Enhancements

The ReAct loop was significantly extended to address common LLM failure modes.

---

### 2.1 Context Refresh Strategy

The agent periodically refreshes:

- `memory()` (state grounding)
- `inventory()` (after item acquisition)
- `get_map()` (navigation support)

This improves decision consistency without consuming extra game moves.

---

### 2.2 Action Validation and Normalization

Before execution:

- Tool names are validated
- Invalid verbs are mapped to supported equivalents
- Formatting noise is removed
- Actions are normalized to consistent lower-case grammar

This dramatically reduces invalid command generation.
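As a sketch (the verb map and function name are illustrative; the real agent's mapping may differ):

```python
import re

# Hypothetical mapping of common LLM verb choices to engine-supported forms.
VERB_MAP = {"go north": "north", "grab": "take", "get": "take",
            "inspect": "examine", "check": "examine"}

def normalize_action(raw: str) -> str:
    """Strip formatting noise and map unsupported verbs to supported ones."""
    action = raw.strip().strip('"`').lower()
    action = re.sub(r"\s+", " ", action)  # collapse whitespace
    for bad, good in VERB_MAP.items():
        if action == bad or action.startswith(bad + " "):
            action = good + action[len(bad):]
            break
    return action or "look"  # never emit an empty command
```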

---

### 2.3 Multi-Layer Anti-Loop Mechanisms

Several defensive layers were introduced:

#### (A) Action Repetition Guard

If the same action appears three times consecutively, the agent forces a reset (`look`).

#### (B) Location-Aware Movement Failure Blocking

Movement attempts are tracked per `(location, direction)` pair. If a direction fails multiple times from the same location, it is blocked.

#### (C) Thought + Action + Location Blocking

A normalized thought signature is computed. If the same thought leads to the same action in the same location more than once, the agent is forced to change strategy (memory/map call).

This addresses the subtle ReAct issue where reasoning itself becomes cyclic.
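Layer (A) can be sketched as a small stateful guard (illustrative class; the agent itself tracks this inline in its run loop):

```python
from collections import deque

class RepetitionGuard:
    """Force a reset action after N identical consecutive actions."""

    def __init__(self, window: int = 3, reset_action: str = "look"):
        self.recent = deque(maxlen=window)
        self.reset_action = reset_action

    def check(self, action: str) -> str:
        self.recent.append(action)
        # Window full and every entry identical: break the loop.
        if len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1:
            self.recent.clear()
            return self.reset_action
        return action
```

Layers (B) and (C) work the same way but key their counters on `(location, direction)` and `(location, action, thought_signature)` tuples respectively.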

---

### 2.4 Controlled Movement Policy

The agent avoids random wandering by:

- Encouraging local interaction before movement
- Prioritizing dominant objects in the observation
- Blocking repeated failed transitions

This reduces wasted exploration steps.

---

## 3. Design Philosophy

The key improvements are architectural rather than game-specific:

- Clear separation between environment (MCP server) and reasoning (LLM agent)
- Structured state exposure instead of raw narrative text
- Defensive programming against repetition and invalid behavior
- Heuristic generalization instead of hardcoded walkthrough logic

The system is modular, interpretable, and extensible.

---

## 4. Conclusion

Compared to the baseline template, this implementation introduces:

- Structured memory representation
- Robust location extraction
- Intelligent map tracking
- Inventory normalization
- Multi-layer loop prevention
- Location-aware movement validation
- Thought-action repetition blocking
- Controlled exploration policy

The result is a significantly more stable, grounded, and architecturally improved MCP-based text adventure agent.

## Files

| File | Description |
|------|-------------|
| `agent.py` | ReAct agent with `StudentAgent` class |
| `mcp_server.py` | MCP server with game interaction tools |
| `app.py` | Gradio interface for HF Space |
| `requirements.txt` | Additional dependencies |

## How to Submit

1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
2. Clone your fork locally
3. Implement your agent in `agent.py` and `mcp_server.py`
4. Test locally (see below)
5. Push your changes to your Space
6. Submit your Space URL on the course platform

## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```
.ipynb_checkpoints/agent-checkpoint.py
ADDED
@@ -0,0 +1,614 @@
"""
MCP ReAct Agent (adapted for your MCP server)

Key upgrades:
- Actually calls memory/get_map/inventory periodically (doesn't cost "moves")
- Injects those outputs into the LLM prompt (LLM-friendly context)
- Updates score from BOTH play_action output and memory output
- Keeps loop detection + action normalization
"""

import json
import os
import re
from dataclasses import dataclass, field
from typing import Optional

from dotenv import load_dotenv
from huggingface_hub import InferenceClient

load_dotenv()

# =============================================================================
# LLM Configuration - DO NOT MODIFY
# =============================================================================

LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"

_hf_token = os.getenv("HF_TOKEN")
if not _hf_token:
    raise ValueError("HF_TOKEN not found. Set it in your .env file.")

LLM_CLIENT = InferenceClient(token=_hf_token)


def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
    """Call the LLM with the given prompt."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]

    response = LLM_CLIENT.chat.completions.create(
        model=LLM_MODEL,
        messages=messages,
        temperature=0.0,
        max_tokens=max_tokens,
        seed=seed,
    )

    return response.choices[0].message.content


@dataclass
class RunResult:
    """Result of running the agent. Do not modify this class."""
    final_score: int
    max_score: int
    moves: int
    locations_visited: set[str]
    game_completed: bool
    error: Optional[str] = None
    history: list[tuple[str, str, str]] = field(default_factory=list)


# =============================================================================
# System Prompt
# =============================================================================
SYSTEM_PROMPT = """You are an intelligent text adventure game agent.

Your goal is to solve the main problem of the game efficiently and maximize score within 100 moves.

This game is small and objective-focused. Avoid unnecessary wandering.

AVAILABLE TOOLS (use via MCP):
1. play_action - Execute valid game commands.
2. memory - Get structured summary of current state and recent actions.
3. get_map - See explored locations.
4. inventory - Check carried items.

VALID ACTION STYLE:
Movement:
- north, south, east, west, up, down
- n, s, e, w, u, d

Core actions:
- look
- examine <thing>
- take <item>, drop <item>
- open <thing>, close <thing>
- talk to <character>
- give <item> to <character>
- use specific verbs mentioned in observation

AVOID:
- generic verbs like "use"
- random movement without purpose
- repeating failed actions

--------------------------------------------------
CORE STRATEGY (IMPORTANT)
--------------------------------------------------

1) DOMINANT OBJECT RULE (VERY IMPORTANT):
If a specific object or character is repeatedly mentioned in the observation,
treat it as the main objective.

Do NOT leave the area until you:
- examine it
- try multiple meaningful interactions
- or confirm no new interaction is possible

Stay focused before exploring elsewhere.

2) PROBLEM-SOLVING PRIORITY:
If the game clearly revolves around one main goal,
prioritize actions that directly affect that goal instead of exploring new rooms.

3) CONTROLLED MOVEMENT:
Only move if:
- you have exhausted interactions in the current room
- or memory/map suggests a new unexplored path is necessary

4) LIMITED RETRIES:
If an action fails once, try a different verb.
Do NOT repeat the same failed action more than once.

5) OBJECT TRANSFORMATION FOCUS:
If an object seems central, try actions that might change its state:
- examine
- open
- give something
- use appropriate verbs mentioned in text
- interact from different angles

--------------------------------------------------
TOOL USAGE RULES
--------------------------------------------------

- Use memory() when uncertain or before repeating behavior.
- Use get_map() only if navigation becomes necessary.
- Use inventory() after obtaining items.

--------------------------------------------------
OUTPUT FORMAT (STRICT)
--------------------------------------------------

THOUGHT: <brief reasoning>
TOOL: <tool_name>
ARGS: <JSON arguments>

Keep THOUGHT short (1-2 sentences).
Do not repeat the same action multiple times.
Prefer solving over wandering.
"""

# =============================================================================
# Student Agent Implementation
# =============================================================================
class StudentAgent:
    """
    MCP ReAct Agent adapted to your MCP server outputs:
    - memory() returns STATE / RECENT / OBSERVATION
    - get_map() returns MAP ...
    - inventory() returns INVENTORY ...
    """

    def __init__(self):
        self.history: list[dict] = []
        self.recent_actions: list[str] = []
        self.score: int = 0

        # Cached tool outputs
        self.last_memory: str = ""
        self.last_map: str = ""
        self.last_inventory: str = ""
        self.last_observation: str = ""

        # Exploration / anti-loop state
        self.visit_counts: dict[str, int] = {}
        self.loc_move_failures: dict[tuple[str, str], int] = {}
        self.pending_move: Optional[tuple[str, str]] = None

        # NEW: prevent repeating same thought+action at same location
        self.loc_action_thought_counts: dict[tuple[str, str, str], int] = {}

    # ------------------------------------------------------------
    # Thought normalization helper
    # ------------------------------------------------------------
    def _thought_sig(self, thought: str) -> str:
        t = (thought or "").lower()
        t = re.sub(r"[^a-z0-9\s]", " ", t)
        t = re.sub(r"\s+", " ", t).strip()
        return " ".join(t.split()[:12])

    async def run(
        self,
        client,
        game: str,
        max_steps: int,
        seed: int,
        verbose: bool = False,
    ) -> RunResult:

        locations_visited = set()
        history = []
        moves = 0

        MOVE_CMDS = {"north", "south", "east", "west", "up", "down",
                     "enter", "exit", "n", "s", "e", "w", "u", "d"}

        # Available tools
        tools = await client.list_tools()
        tool_names = [t.name for t in tools]

        # Initial observation
        result = await client.call_tool("play_action", {"action": "look"})
        observation = self._extract_result(result)
        self.last_observation = observation

        location = observation.split("\n")[0] if observation else "Unknown"
        locations_visited.add(location)
        self.visit_counts[location] = self.visit_counts.get(location, 0) + 1

        # Prime context (no moves)
        if "memory" in tool_names:
            self.last_memory = self._extract_result(await client.call_tool("memory", {}))
            self._update_score(self.last_memory)

        if "inventory" in tool_names:
            self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))

        if verbose:
            print(f"\n{observation}")

        for step in range(1, max_steps + 1):
            await self._refresh_context_tools(client, tool_names, step, verbose)

            prompt = self._build_prompt()
            response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
            thought, tool_name, tool_args = self._parse_response(response, tool_names)

            if verbose:
                print(f"\n--- Step {step} ---")
                print(f"[THOUGHT] {thought}")
                print(f"[TOOL] {tool_name}({tool_args})")

            tool_name, tool_args = self._validate_tool_call(tool_name, tool_args, tool_names)

            # ------------------------------------------------------------
            # Block SAME (location + action + thought)
            # ------------------------------------------------------------
            if tool_name == "play_action":
                current_loc = (
                    self.last_observation.split("\n")[0].strip()
                    if self.last_observation else "Unknown"
                )
                action_norm = tool_args.get("action", "look").strip().lower()
                t_sig = self._thought_sig(thought)

                triple = (current_loc, action_norm, t_sig)
                self.loc_action_thought_counts[triple] = (
                    self.loc_action_thought_counts.get(triple, 0) + 1
                )

                if self.loc_action_thought_counts[triple] >= 2:
                    if verbose:
                        print(f"[ANTI-REPEAT] Blocking repeated thought+action at '{current_loc}'")
                    if "get_map" in tool_names:
                        tool_name, tool_args = "get_map", {}
                    elif "memory" in tool_names:
                        tool_name, tool_args = "memory", {}
                    else:
                        tool_name, tool_args = "play_action", {"action": "look"}

            # ------------------------------------------------------------
            # Loop detection (same action spam)
            # ------------------------------------------------------------
            if tool_name == "play_action":
                action = tool_args.get("action", "look")
                self.recent_actions.append(action)
                if len(self.recent_actions) > 5:
                    self.recent_actions = self.recent_actions[-5:]

                if len(self.recent_actions) >= 3 and len(set(self.recent_actions[-3:])) == 1:
                    if verbose:
                        print("[WARNING] Loop detected - forcing 'look'")
                    tool_args = {"action": "look"}

            # ------------------------------------------------------------
            # Anti-backtracking: block only FAILED moves
            # ------------------------------------------------------------
            self.pending_move = None

            if tool_name == "play_action":
                action_norm = tool_args.get("action", "look").strip().lower()

                if action_norm in MOVE_CMDS:
                    current_loc = (
                        self.last_observation.split("\n")[0].strip()
                        if self.last_observation else "Unknown"
                    )
                    key = (current_loc, action_norm)

                    if self.loc_move_failures.get(key, 0) >= 2:
                        if verbose:
                            print(f"[GUARD] Blocking failed move '{action_norm}' from '{current_loc}'")
                        if "get_map" in tool_names:
                            tool_name, tool_args = "get_map", {}
                        elif "memory" in tool_names:
                            tool_name, tool_args = "memory", {}
                        else:
                            tool_name, tool_args = "play_action", {"action": "look"}
                    else:
                        self.pending_move = (current_loc, action_norm)

            # ------------------------------------------------------------
            # Count moves
            # ------------------------------------------------------------
            if tool_name == "play_action":
                moves += 1

            # ------------------------------------------------------------
            # Execute tool
            # ------------------------------------------------------------
            try:
                result = await client.call_tool(tool_name, tool_args)
                out_text = self._extract_result(result)

                if tool_name == "play_action":
                    observation = out_text
                    self.last_observation = observation
                elif tool_name == "memory":
                    self.last_memory = out_text
                elif tool_name == "get_map":
                    self.last_map = out_text
                elif tool_name == "inventory":
                    self.last_inventory = out_text

                if verbose:
                    print(f"[RESULT] {out_text[:200]}...")

            except Exception as e:
                out_text = f"Error: {e}"
                observation = out_text
                self.last_observation = observation
                if verbose:
                    print(f"[ERROR] {e}")

            # ------------------------------------------------------------
            # Post-move update
            # ------------------------------------------------------------
            if tool_name == "play_action":
                new_location = observation.split("\n")[0] if observation else "Unknown"

                if self.pending_move is not None:
                    prev_loc, prev_action = self.pending_move
                    key = (prev_loc, prev_action)

                    if new_location == prev_loc:
                        self.loc_move_failures[key] = self.loc_move_failures.get(key, 0) + 1
                    else:
                        self.loc_move_failures[key] = 0

                    self.pending_move = None

                location = new_location
                locations_visited.add(location)
                self.visit_counts[location] = self.visit_counts.get(location, 0) + 1

                self._update_score(observation)

                if re.search(r"\bTaken\b|\byou are now carrying\b", observation, re.IGNORECASE):
                    if "inventory" in tool_names:
                        self.last_inventory = self._extract_result(
                            await client.call_tool("inventory", {})
                        )

            # ------------------------------------------------------------
            # History
            # ------------------------------------------------------------
            self.history.append({
                "step": step,
                "thought": thought,
                "tool": tool_name,
                "args": tool_args,
                "result": out_text[:200]
            })
            if len(self.history) > 10:
                self.history = self.history[-10:]

            history.append((thought, f"{tool_name}({tool_args})", out_text[:100]))

            if self._is_game_over(observation):
                if verbose:
                    print("\n*** GAME OVER ***")
                break

        return RunResult(
            final_score=self.score,
            max_score=350,
            moves=moves,
            locations_visited=locations_visited,
            game_completed=self._is_game_over(self.last_observation),
            history=history,
        )

    async def _refresh_context_tools(self, client, tool_names: list[str], step: int, verbose: bool) -> None:
        """
        Pull structured context from MCP server without spending moves.
        Tuned to your server outputs:
        - memory() is the best single summary
        - get_map() helps navigation
        - inventory() helps object planning
        """
        # Memory: often (every 4 steps) so LLM doesn't forget state
        if "memory" in tool_names and (step == 1 or step % 4 == 0):
            try:
                self.last_memory = self._extract_result(await client.call_tool("memory", {}))
                self._update_score(self.last_memory)
            except Exception:
                pass

        # Map: occasionally (every 6 steps), and also if we moved a lot recently
        if "get_map" in tool_names and (step % 6 == 0):
            try:
                self.last_map = self._extract_result(await client.call_tool("get_map", {}))
            except Exception:
                pass

        # Inventory: occasionally (every 10 steps)
        if "inventory" in tool_names and (step == 1 or step % 10 == 0):
            try:
                self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))
            except Exception:
                pass

    def _build_prompt(self) -> str:
        """
        Build prompt that is aligned with your MCP server:
        - memory() has STATE/RECENT/OBSERVATION
        - get_map() starts with MAP
        - inventory() starts with INVENTORY
        """
        parts = []
        parts.append(f"Current best-known score: {self.score}")

        # Give the model your server-side memory snapshot (truncate to keep prompt lean)
        if self.last_memory:
            mem = self._truncate(self.last_memory, 1200)
            parts.append("\n=== MEMORY (from MCP server) ===\n" + mem)

        if self.last_inventory:
            inv = self._truncate(self.last_inventory, 400)
            parts.append("\n=== INVENTORY (from MCP server) ===\n" + inv)

        if self.last_map:
            mp = self._truncate(self.last_map, 700)
            parts.append("\n=== MAP (from MCP server) ===\n" + mp)

        # Recent local history (anti-loop)
        if self.history:
            parts.append("\n=== RECENT LOCAL ACTIONS (agent) ===")
            for entry in self.history[-3:]:
                action = entry.get("args", {}).get("action", entry["tool"])
                result_short = entry["result"][:100] + "..." if len(entry["result"]) > 100 else entry["result"]
                parts.append(f"  > {action} -> {result_short}")

        if self.recent_actions and len(set(self.recent_actions[-3:])) == 1:
            parts.append(f"\n[WARNING: repeated '{self.recent_actions[-1]}'. Choose a different action.]")

        # Always include the most recent raw observation
        parts.append("\n=== LATEST OBSERVATION (play_action) ===\n" + self._truncate(self.last_observation, 900))
        parts.append("\nWhat do you do next?")

        return "\n".join(parts)

    def _truncate(self, text: str, limit: int) -> str:
        text = text or ""
        if len(text) <= limit:
            return text
        return text[:limit] + "\n...[truncated]"

    def _parse_response(self, response: str, valid_tools: list[str]) -> tuple[str, str, dict]:
        thought = "No reasoning provided"
        tool_name = "play_action"
        tool_args = {"action": "look"}

        lines = response.strip().split("\n")
        for line in lines:
            line_clean = line.strip()
|
| 491 |
+
line_upper = line_clean.upper()
|
| 492 |
+
|
| 493 |
+
if line_upper.startswith("THOUGHT:"):
|
| 494 |
+
thought = line_clean.split(":", 1)[1].strip()
|
| 495 |
+
|
| 496 |
+
elif line_upper.startswith("TOOL:"):
|
| 497 |
+
raw_tool = line_clean.split(":", 1)[1].strip().lower()
|
| 498 |
+
raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
|
| 499 |
+
raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
|
| 500 |
+
tool_name = raw_tool
|
| 501 |
+
|
| 502 |
+
elif line_upper.startswith("ARGS:"):
|
| 503 |
+
args_part = line_clean.split(":", 1)[1].strip()
|
| 504 |
+
if not args_part:
|
| 505 |
+
tool_args = {}
|
| 506 |
+
continue
|
| 507 |
+
try:
|
| 508 |
+
args_part = args_part.replace("'", '"')
|
| 509 |
+
tool_args = json.loads(args_part)
|
| 510 |
+
except json.JSONDecodeError:
|
| 511 |
+
match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
|
| 512 |
+
if match:
|
| 513 |
+
tool_args = {"action": match.group(1)}
|
| 514 |
+
else:
|
| 515 |
+
tool_args = {"action": "look"}
|
| 516 |
+
|
| 517 |
+
return thought, tool_name, tool_args
|
| 518 |
+
|
| 519 |
+
def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
|
| 520 |
+
|
| 521 |
+
|
| 522 |
+
if tool_name not in valid_tools:
|
| 523 |
+
if tool_name in ["action", "do", "command"]:
|
| 524 |
+
tool_name = "play_action"
|
| 525 |
+
elif tool_name in ["map", "location"]:
|
| 526 |
+
tool_name = "get_map"
|
| 527 |
+
elif tool_name in ["mem", "state", "status"]:
|
| 528 |
+
tool_name = "memory"
|
| 529 |
+
elif tool_name in ["inv", "items"]:
|
| 530 |
+
tool_name = "inventory"
|
| 531 |
+
else:
|
| 532 |
+
tool_name = "play_action"
|
| 533 |
+
|
| 534 |
+
if tool_name == "play_action":
|
| 535 |
+
action = tool_args.get("action", "look")
|
| 536 |
+
|
| 537 |
+
invalid_verb_map = {
|
| 538 |
+
"check": "examine",
|
| 539 |
+
"inspect": "examine",
|
| 540 |
+
"search": "look",
|
| 541 |
+
"grab": "take",
|
| 542 |
+
"pick": "take",
|
| 543 |
+
"use": "examine",
|
| 544 |
+
"investigate": "examine",
|
| 545 |
+
}
|
| 546 |
+
|
| 547 |
+
words = action.lower().split()
|
| 548 |
+
if words and words[0] in invalid_verb_map:
|
| 549 |
+
words[0] = invalid_verb_map[words[0]]
|
| 550 |
+
action = " ".join(words)
|
| 551 |
+
|
| 552 |
+
action = action.lower().strip()
|
| 553 |
+
action = action.replace("**", "").replace("*", "").replace("`", "")
|
| 554 |
+
action = " ".join(action.split())
|
| 555 |
+
|
| 556 |
+
tool_args["action"] = action
|
| 557 |
+
|
| 558 |
+
return tool_name, tool_args
|
| 559 |
+
|
| 560 |
+
def _extract_result(self, result) -> str:
|
| 561 |
+
if hasattr(result, 'content') and result.content:
|
| 562 |
+
return result.content[0].text
|
| 563 |
+
if isinstance(result, list) and result:
|
| 564 |
+
return result[0].text if hasattr(result[0], 'text') else str(result[0])
|
| 565 |
+
return str(result)
|
| 566 |
+
|
| 567 |
+
def _update_score(self, text: str) -> None:
|
| 568 |
+
patterns = [
|
| 569 |
+
r'\[Score:\s*(\d+)',
|
| 570 |
+
r'Score:\s*(\d+)\b',
|
| 571 |
+
]
|
| 572 |
+
for pattern in patterns:
|
| 573 |
+
match = re.search(pattern, text, re.IGNORECASE)
|
| 574 |
+
if match:
|
| 575 |
+
self.score = max(self.score, int(match.group(1)))
|
| 576 |
+
|
| 577 |
+
def _is_game_over(self, text: str) -> bool:
|
| 578 |
+
game_over_phrases = [
|
| 579 |
+
"game over",
|
| 580 |
+
"you have died",
|
| 581 |
+
"you are dead",
|
| 582 |
+
"*** you have died ***",
|
| 583 |
+
]
|
| 584 |
+
text_lower = (text or "").lower()
|
| 585 |
+
return any(phrase in text_lower for phrase in game_over_phrases)
|
| 586 |
+
|
| 587 |
+
|
| 588 |
+
# =============================================================================
|
| 589 |
+
# Local Testing
|
| 590 |
+
# =============================================================================
|
| 591 |
+
|
| 592 |
+
async def test_agent():
|
| 593 |
+
from fastmcp import Client
|
| 594 |
+
|
| 595 |
+
agent = StudentAgent()
|
| 596 |
+
|
| 597 |
+
async with Client("mcp_server.py") as client:
|
| 598 |
+
result = await agent.run(
|
| 599 |
+
client=client,
|
| 600 |
+
game="zork1",
|
| 601 |
+
max_steps=20,
|
| 602 |
+
seed=42,
|
| 603 |
+
verbose=True,
|
| 604 |
+
)
|
| 605 |
+
|
| 606 |
+
print(f"\n{'=' * 50}")
|
| 607 |
+
print(f"Final Score: {result.final_score}")
|
| 608 |
+
print(f"Moves: {result.moves}")
|
| 609 |
+
print(f"Locations: {len(result.locations_visited)}")
|
| 610 |
+
|
| 611 |
+
|
| 612 |
+
if __name__ == "__main__":
|
| 613 |
+
import asyncio
|
| 614 |
+
asyncio.run(test_agent())
|
.ipynb_checkpoints/app-checkpoint.py
ADDED
@@ -0,0 +1,36 @@
"""
Hugging Face Space - Text Adventure Agent Submission

This is a code-only Space for submitting your agent implementation.
The evaluation is run separately.

Files in this submission:
- agent.py: Your ReAct agent implementation
- mcp_server.py: Your MCP server implementation
- requirements.txt: Additional dependencies

To test locally:
    fastmcp dev mcp_server.py
    python agent.py
"""

import gradio as gr

# Create the Gradio interface
with gr.Blocks(title="Text Adventure Agent Submission") as demo:
    gr.Markdown("# Text Adventure Agent Submission")
    gr.Markdown(
        "This Space contains a template submission for the Text Adventure Agent assignment."
    )

    gr.Markdown(
        "---\n"
        "**Note:** This is a code submission Space. "
        "Evaluation is performed using the evaluation script.\n\n"
        "[Back to main assignment page](https://huggingface.co/spaces/LLM-course/Agentic-zork)"
    )


if __name__ == "__main__":
    demo.launch()
|
.ipynb_checkpoints/mcp_server-checkpoint.py
ADDED
@@ -0,0 +1,520 @@
"""
Student MCP Server for Text Adventure Games

This is your MCP server submission. Implement the tools that your agent
will use to play text adventure games.

Required tool:
    play_action(action: str) -> str
        Execute a game command and return the result.

Recommended tools:
    memory() -> str
        Return current game state, score, and recent history.

    inventory() -> str
        Return the player's current inventory.

    get_map() -> str
        Return a map of explored locations.

Test your server with:
    fastmcp dev submission_template/mcp_server.py

Then open the MCP Inspector in your browser to test the tools interactively.
"""

import sys
import os

# Add parent directory to path to import games module
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from fastmcp import FastMCP
from games.zork_env import TextAdventureEnv


# =============================================================================
# Create the MCP Server
# =============================================================================

mcp = FastMCP("Student Text Adventure Server")


# =============================================================================
# Game State Management
# =============================================================================

import re
from typing import Optional


class GameManager:
    """
    Manages the text adventure game state.

    Extended tracking:
    - Action history (for memory tool)
    - Explored locations (for mapping)
    - Current score and moves
    - Current location (best-effort, robust across games)
    """

    # Lines that are often NOT room titles across many IF games
    _HEADER_LIKE_PATTERNS = [
        r"^\s*score\s*[:=]\s*\d+",
        r"^\s*moves?\s*[:=]\s*\d+",
        r"^\s*turns?\s*[:=]\s*\d+",
        r"^\s*time\s*[:=]\s*",
        r"^\s*health\s*[:=]\s*\d+",
        r"^\s*location\s*[:=]\s*",
        r"^\s*\[.*\]\s*$",  # bracket-only status lines
        r"^\s*\(.*\)\s*$",  # parenthetical-only lines
        r"^\s*you\s+(are|see|can)\b",  # narrative sentence starters
    ]

    # Movement commands we consider for mapping (Zork-style + abbreviations)
    _MOVE_CMDS = {
        "north", "south", "east", "west", "up", "down", "enter", "exit",
        "n", "s", "e", "w", "u", "d"
    }

    # Common failure phrases when trying to move (best-effort, not perfect)
    _MOVE_FAIL_PHRASES = [
        "you can't go", "you cannot go", "can't go that way", "cannot go that way",
        "you can't go that way", "you cannot go that way",
        "you can't", "you cannot",
        "there is no way", "you can't see any way", "you see no way",
        "blocked", "closed", "won't open", "is locked", "locked",
        "too dark", "pitch black"
    ]

    def _is_movement_action(self, action: str) -> bool:
        """Return True if this action is a movement command we track."""
        a = (action or "").strip().lower()
        return a in self._MOVE_CMDS

    def _move_likely_succeeded(self, old_loc: str, new_loc: str, observation: str) -> bool:
        """
        Decide whether a move likely succeeded.
        Strong signal: location label changed.
        Negative signal: failure phrases in observation.
        """
        if new_loc and old_loc and new_loc != old_loc:
            return True

        text = (observation or "").lower()
        if any(phrase in text for phrase in self._MOVE_FAIL_PHRASES):
            return False

        # If location didn't change and no clear failure phrase, treat as "not sure" -> don't add edge
        return False

    def _update_map(self, action: str, old_loc: str, new_loc: str) -> None:
        """Record a directed edge old_loc --action--> new_loc in explored_locations."""
        if not old_loc or not new_loc:
            return
        self.explored_locations.setdefault(old_loc, set()).add(f"{action} -> {new_loc}")

    def __init__(self):
        self.env: Optional[TextAdventureEnv] = None
        self.state = None
        self.game_name: str = ""

        # Tracking for agent-support tools
        self.history: list[tuple[str, str]] = []
        self.explored_locations: dict[str, set[str]] = {}
        self.current_location: str = "Unknown"

    def initialize(self, game: str = "zork1"):
        """Initialize or reset the game."""
        self.game_name = game
        self.env = TextAdventureEnv(game)
        self.state = self.env.reset()

        # Reset tracking
        self.history = []
        self.explored_locations = {}
        self.current_location = self._extract_location(self.state.observation, fallback="Unknown")

        return self.state.observation

    def _extract_location(self, observation: str, fallback: Optional[str] = None) -> str:
        """
        Best-effort location extraction from the observation text.

        Strategy:
        1) Split into lines, skip empties
        2) Skip lines that look like status bars / headers / pure brackets
        3) Prefer a short, title-like line (room name)
        4) If nothing confident, return fallback (usually previous location)
        """
        if not observation:
            return fallback or "Unknown"

        lines = [ln.strip() for ln in observation.splitlines() if ln.strip()]
        if not lines:
            return fallback or "Unknown"

        header_res = [re.compile(pat, re.IGNORECASE) for pat in self._HEADER_LIKE_PATTERNS]

        def looks_like_header(line: str) -> bool:
            return any(rx.search(line) for rx in header_res)

        def looks_like_title(line: str) -> bool:
            # Many room titles are short and do not end with punctuation.
            if len(line) > 60:
                return False
            if line.endswith((".", "!", "?", ";", ":")):
                return False
            # Too many digits usually means a status line.
            if sum(ch.isdigit() for ch in line) >= 3:
                return False
            return True

        # First pass: first "title-like" line that isn't header-like
        for line in lines[:8]:  # only inspect top chunk; titles are usually early
            if looks_like_header(line):
                continue
            if looks_like_title(line):
                return line

        # Second pass: first non-header line
        for line in lines[:8]:
            if not looks_like_header(line):
                return line

        return fallback or "Unknown"

    def step(self, action: str) -> str:
        """Execute an action and return the result."""
        if self.env is None:
            self.initialize()

        # Save old location before action
        old_location = self.current_location

        # Apply action to the real game
        self.state = self.env.step(action)
        obs = self.state.observation

        # Track history (keep last 50)
        self.history.append((action, obs))
        if len(self.history) > 50:
            self.history = self.history[-50:]

        # Extract new location (fallback to old)
        new_location = self._extract_location(obs, fallback=old_location)

        # Update map only if it was a movement attempt AND it likely succeeded
        action_norm = (action or "").strip().lower()
        if self._is_movement_action(action_norm) and self._move_likely_succeeded(old_location, new_location, obs):
            self._update_map(action_norm, old_location, new_location)

        # Finally update current location
        self.current_location = new_location

        return obs

    def get_score(self) -> int:
        """Get current score."""
        return self.state.score if self.state else 0

    def get_moves(self) -> int:
        """Get number of moves taken."""
        return self.state.moves if self.state else 0

    def _extract_facts(self, observation: str) -> dict:
        """
        Best-effort extraction of useful 'facts' from the current observation text.
        This is intentionally heuristic so it can work across many games.
        """
        obs = observation or ""
        text = obs.strip()
        lower = text.lower()

        # --- Exits mentioned (simple direction scan) ---
        directions = ["north", "south", "east", "west", "up", "down", "in", "out"]
        exits_found = []
        for d in directions:
            # Detect directions as whole words to reduce false matches
            if re.search(rf"\b{re.escape(d)}\b", lower):
                exits_found.append(d)
        exits_found = sorted(set(exits_found))

        # --- Visible things (very light heuristics) ---
        # Look for common IF patterns like "You see ... here." / "There is ... here."
        visible_candidates: list[str] = []

        patterns = [
            r"you see (.+?) here\.",
            r"you can see (.+?) here\.",
            r"there is (.+?) here\.",
            r"there are (.+?) here\.",
            r"you notice (.+?)\.",
        ]
        for pat in patterns:
            for m in re.finditer(pat, lower):
                chunk = m.group(1).strip()
                if chunk:
                    visible_candidates.append(chunk)

        # Clean visible candidates a bit (split simple lists, avoid huge strings)
        visible = []
        for chunk in visible_candidates:
            # Split on commas and "and" to get smaller pieces
            parts = re.split(r",|\band\b", chunk)
            for p in parts:
                item = p.strip(" .;:!?\t")
                if 1 <= len(item) <= 40:
                    visible.append(item)

        # Deduplicate and limit (so memory stays compact)
        visible = sorted(set(visible))[:10]

        return {
            "exits_mentioned": exits_found,
            "visible": visible,
        }

    def get_memory(self) -> str:
        """
        LLM-friendly summary of current game state.
        Format: Facts first, then recent actions, then the raw observation.
        """
        game = self.game_name or "Unknown"
        location = self.current_location or "Unknown"
        score = self.get_score()
        moves = self.get_moves()

        # Recent actions (keep short and anti-loop)
        recent = self.history[-5:] if self.history else []
        if recent:
            recent_lines = []
            for a, r in recent:
                snippet = (r or "").replace("\n", " ").strip()
                if len(snippet) > 80:
                    snippet = snippet[:80] + "..."
                recent_lines.append(f"- {a} -> {snippet}")
            recent_str = "\n".join(recent_lines)
        else:
            recent_str = "(none yet)"

        # Facts extracted from current observation
        obs = self.state.observation if self.state else ""
        facts = self._extract_facts(obs)

        exits_txt = ", ".join(facts["exits_mentioned"]) if facts["exits_mentioned"] else "(none detected)"
        visible_txt = ", ".join(facts["visible"]) if facts["visible"] else "(none detected)"

        return (
            "STATE\n"
            f"Game: {game}\n"
            f"Location: {location}\n"
            f"Score: {score}  Moves: {moves}\n"
            f"Visible (best effort): {visible_txt}\n"
            f"Exits mentioned (best effort): {exits_txt}\n"
            "\n"
            "RECENT\n"
            f"{recent_str}\n"
            "\n"
            "OBSERVATION\n"
            f"{obs}"
        )

    def get_map(self) -> str:
        """
        Return a readable map of explored locations.
        Uses explored_locations built during movement actions.

        Output is stable + compact for LLM use.
        """
        if not self.explored_locations:
            return "MAP\n(no locations recorded yet -- try moving with north/south/east/west/etc.)"

        lines = ["MAP", "Explored locations and exits:"]
        for loc in sorted(self.explored_locations.keys()):
            exits = sorted(self.explored_locations[loc])
            lines.append(f"\n* {loc}")
            for e in exits:
                lines.append(f"  - {e}")

        lines.append(f"\n[Current] {self.current_location}")
        return "\n".join(lines)

    def get_inventory(self) -> str:
        """
        Return inventory in a robust way across different games/envs.

        Strategy:
        1) If state.inventory exists and is non-empty -> format it
        2) Otherwise, fall back to issuing the command "inventory"
           through the environment and return that observation
        """
        # 1) Try structured inventory if provided by env
        items = []
        if self.state is not None and hasattr(self.state, "inventory"):
            inv = getattr(self.state, "inventory")
            if inv:
                # Normalize to strings
                try:
                    items = [str(x).strip() for x in inv if str(x).strip()]
                except Exception:
                    items = []

        if items:
            # Keep it simple and safe: just join a cleaned list
            # (Avoid overly aggressive parsing that breaks across games)
            items = sorted(set(items))
            return "INVENTORY\n" + ", ".join(items)

        # 2) Fallback: ask the game directly. NOTE: this advances the underlying
        # env by one turn ("inventory" is a real command), but we deliberately do
        # not record it in agent history or the map; it is a server-side query.
        if self.env is None:
            self.initialize()

        try:
            tmp_state = self.env.step("inventory")
            inv_text = tmp_state.observation if tmp_state else "Inventory: (no response)"
        except Exception:
            inv_text = "Inventory: (unable to retrieve)"

        return "INVENTORY\n" + inv_text.strip()


# Global game manager
_game = GameManager()


def get_game() -> GameManager:
    """Get or initialize the game manager."""
    global _game
    if _game.env is None:
        # Get game from environment variable (set by evaluator)
        game = os.environ.get("GAME", "zork1")
        _game.initialize(game)
    return _game


# =============================================================================
# MCP Tools
# =============================================================================

@mcp.tool()
def play_action(action: str) -> str:
    """
    Execute a game command and return the result.

    This is the main tool for interacting with the game.

    Args:
        action: The command to execute (e.g., "north", "take lamp", "open mailbox")

    Returns:
        The game's response to the action

    Valid commands include:
    - Movement: north, south, east, west, up, down, enter, exit
    - Objects: take <item>, drop <item>, open <thing>, examine <thing>
    - Other: look, inventory, read <thing>, turn on lamp
    """
    game = get_game()
    result = game.step(action)

    # Append score/moves for clearer feedback (LLM-friendly, low noise)
    result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
    return result


@mcp.tool()
def memory() -> str:
    """
    Return an LLM-friendly summary of the current game state.
    """
    game = get_game()
    return game.get_memory()


@mcp.tool()
def get_map() -> str:
    """
    Return a map of explored locations and recorded exits.
    """
    game = get_game()
    return game.get_map()


@mcp.tool()
def inventory() -> str:
    """
    Return the player's inventory in a robust way.
    """
    game = get_game()
    return game.get_inventory()


# Possible additional tool (hint: Jericho provides get_valid_actions()):
# @mcp.tool()
# def get_valid_actions() -> str:
#     """
#     Get a list of likely valid actions from the current location.
#
#     Returns:
#         List of actions that might work here
#     """
#     game = get_game()
#     if game.env and game.env.env:
#         valid = game.env.env.get_valid_actions()
#         return "Valid actions: " + ", ".join(valid[:20])
#     return "Could not determine valid actions"


# =============================================================================
# Run the server
# =============================================================================

if __name__ == "__main__":
    # This runs the server with stdio transport (for MCP clients)
    mcp.run()
README.md
CHANGED
@@ -18,11 +18,164 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the

  ## Approach

- <!-- Describe your approach here -->
-
-
-
  ## Files

  ## Approach
# My Report (MCP-Based Text Adventure Agent)
## Structured State Design, Guarded ReAct Reasoning, and Stability Improvements

## Overview

This project implements a fully functional MCP (Model Context Protocol) server and an LLM-driven ReAct agent for text adventure games. While a baseline was provided, this submission significantly extends and stabilizes that template by redesigning state exposure, improving tool structure, and introducing multiple guardrails against common LLM failure modes.

The primary focus of this work was not brute-force performance tuning, but architectural improvement, robustness, and reasoning stability.

---

## 1. MCP Server Improvements

The original template exposed minimal game interaction. I redesigned the MCP server to provide structured, reliable, and LLM-friendly state representations.

### 1.1 Robust Location Extraction

Instead of relying solely on the first line of the observation, the server now:

- Filters out status-like lines (score, moves, headers, bracketed text)
- Detects likely room titles heuristically
- Falls back gracefully when uncertain

This improves compatibility across different text adventure engines.
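The filtering heuristic can be sketched roughly as follows. This is a minimal illustration, not the server's exact code: the pattern list and the title test are assumptions.

```python
import re

# Illustrative status-line patterns (assumed, not the server's exact list)
STATUS_PATTERNS = [
    re.compile(r"score[:\s]", re.I),
    re.compile(r"moves?[:\s]", re.I),
    re.compile(r"^\["),          # bracketed status text
    re.compile(r"^[-=]{3,}$"),   # horizontal-rule headers
]

def extract_location(observation: str) -> str:
    """Return the most likely room title, falling back to 'Unknown'."""
    for raw in observation.splitlines():
        line = raw.strip()
        if not line:
            continue
        if any(p.search(line) for p in STATUS_PATTERNS):
            continue  # skip status-like lines
        # Room titles tend to be short, capitalized, and not full sentences
        if len(line) <= 40 and line[0].isupper() and not line.endswith("."):
            return line
        break  # first substantive line is prose, not a title: give up
    return "Unknown"
```

With this shape, `extract_location("Score: 0  Moves: 3\nKitchen\n...")` skips the status line and returns `Kitchen`, while pure narrative text falls through to `Unknown`.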
---

### 1.2 Structured Memory Output

The `memory()` tool was redesigned to provide:

- Current game
- Location
- Score and moves
- Extracted visible objects (best-effort heuristics)
- Mentioned exits
- Recent action history
- Full current observation

This structured format reduces hallucination and anchors the LLM in grounded state information. It transforms raw narrative text into usable reasoning signals.
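A sketch of what this structured output might look like; the field names and layout here are illustrative assumptions, not the server's exact format.

```python
def format_memory(state: dict) -> str:
    """Render a structured, LLM-friendly summary of the game state."""
    lines = [
        f"GAME: {state.get('game', 'unknown')}",
        f"LOCATION: {state.get('location', 'Unknown')}",
        f"SCORE: {state.get('score', 0)}  MOVES: {state.get('moves', 0)}",
        f"OBJECTS: {', '.join(state.get('objects', [])) or 'none seen'}",
        f"EXITS: {', '.join(state.get('exits', [])) or 'none mentioned'}",
        # Only the last few actions, to keep the prompt small
        "RECENT: " + " -> ".join(state.get('recent_actions', [])[-5:]),
        "OBSERVATION:",
        state.get('observation', ''),
    ]
    return "\n".join(lines)
```

Keeping every field on its own labeled line makes the summary easy for the LLM to quote back accurately.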
---

### 1.3 Intelligent Map Construction

Movement tracking is no longer naive. A move is recorded only if:

- The location actually changes, and
- The observation does not contain known movement failure phrases.

This prevents corrupt map edges and keeps spatial reasoning reliable.

The resulting `get_map()` tool exposes clean directional transitions without noise from failed attempts.
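The guarded edge-recording rule can be sketched as below; the failure phrases and the edge data layout are illustrative assumptions.

```python
# Hypothetical movement-failure phrases (engines vary)
FAILURE_PHRASES = ("you can't go that way", "the door is locked", "too narrow")

def record_move(map_edges: dict, prev_loc: str, direction: str,
                new_loc: str, observation: str) -> None:
    """Record prev_loc --direction--> new_loc only for real transitions."""
    if new_loc == prev_loc:
        return  # location did not change: not a successful move
    if any(p in observation.lower() for p in FAILURE_PHRASES):
        return  # engine reported a movement failure
    map_edges.setdefault(prev_loc, {})[direction] = new_loc
```

Because both guards must pass, a bumped wall or a locked door never produces a map edge.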
---

### 1.4 Robust Inventory Handling

Inventory retrieval now:

- Uses structured state inventory when available
- Falls back to issuing the `inventory` command
- Cleans and normalizes item strings

This ensures cross-game compatibility.
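The fallback-and-cleanup path might look like this; the `state_inventory` attribute and the cleanup regex are assumptions for illustration, not the server's exact interface.

```python
import re

def get_inventory(game) -> list[str]:
    """Prefer structured inventory; fall back to the in-game command."""
    items = getattr(game, "state_inventory", None)
    if not items:
        # Fall back to issuing the 'inventory' command and parsing the text
        raw = game.step("inventory")
        items = [ln for ln in raw.splitlines() if ln.strip()]
    cleaned = []
    for item in items:
        # Strip list bullets and the common "You are carrying:" banner
        item = re.sub(r"^\s*(?:-|\*|You are carrying:?)\s*", "", item).strip()
        item = item.rstrip(".").lower()
        if item:
            cleaned.append(item)
    return cleaned
```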
---

## 2. Agent-Side Stability and Reasoning Enhancements

The ReAct loop was significantly extended to address common LLM failure modes.

---

### 2.1 Context Refresh Strategy

The agent periodically refreshes:

- `memory()` (state grounding)
- `inventory()` (after item acquisition)
- `get_map()` (navigation support)

This improves decision consistency without consuming extra game moves.

---

### 2.2 Action Validation and Normalization

Before execution:

- Tool names are validated
- Invalid verbs are mapped to supported equivalents
- Formatting noise is removed
- Actions are normalized to consistent lower-case grammar

This dramatically reduces invalid command generation.
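A minimal sketch of such a normalization layer; the verb map below is an illustrative assumption, not the agent's exact table.

```python
# Hypothetical verb substitutions; "go" is dropped so "go north" -> "north"
VERB_MAP = {"grab": "take", "get": "take", "check": "examine",
            "inspect": "examine", "go": ""}

def normalize_action(raw: str) -> str:
    """Strip formatting noise and map verbs onto supported equivalents."""
    action = raw.strip().strip('"`').lower()
    action = action.removeprefix("action:").strip()
    words = action.split()
    if words and words[0] in VERB_MAP:
        words[0] = VERB_MAP[words[0]]
    return " ".join(w for w in words if w)
```

Running every LLM-proposed action through one function like this keeps the grammar the parser sees consistent regardless of how the model phrases its output.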
---

### 2.3 Multi-Layer Anti-Loop Mechanisms

Several defensive layers were introduced:

#### (A) Action Repetition Guard
If the same action appears three times consecutively, the agent forces a reset (`look`).

#### (B) Location-Aware Movement Failure Blocking
Movement attempts are tracked per `(location, direction)` pair.
If a direction fails multiple times from the same location, it is blocked.

#### (C) Thought + Action + Location Blocking
A normalized thought signature is computed.
If the same thought leads to the same action in the same location more than once, the agent is forced to change strategy (memory/map call).

This addresses the subtle ReAct issue where reasoning itself becomes cyclic.
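The thought-signature guard mirrors the `_thought_sig` idea used in the agent; the counter layout here is a simplified sketch.

```python
import re

def thought_sig(thought: str) -> str:
    """Normalize a free-form thought down to its first dozen words."""
    t = re.sub(r"[^a-z0-9\s]", " ", (thought or "").lower())
    return " ".join(t.split()[:12])

seen: dict[tuple[str, str, str], int] = {}

def should_block(location: str, action: str, thought: str) -> bool:
    """Block the second occurrence of the same (location, action, thought)."""
    key = (location, action.lower(), thought_sig(thought))
    seen[key] = seen.get(key, 0) + 1
    return seen[key] >= 2
```

Because punctuation and casing are stripped, superficially different phrasings of the same cyclic reasoning collapse onto one signature and trip the guard.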
---

### 2.4 Controlled Movement Policy

The agent avoids random wandering by:

- Encouraging local interaction before movement
- Prioritizing dominant objects in the observation
- Blocking repeated failed transitions

This reduces wasted exploration steps.

---

## 3. Design Philosophy

The key improvements are architectural rather than game-specific:

- Clear separation between environment (MCP server) and reasoning (LLM agent)
- Structured state exposure instead of raw narrative text
- Defensive programming against repetition and invalid behavior
- Heuristic generalization instead of hardcoded walkthrough logic

The system is modular, interpretable, and extensible.

---

## 4. Conclusion

Compared to the baseline template, this implementation introduces:

- Structured memory representation
- Robust location extraction
- Intelligent map tracking
- Inventory normalization
- Multi-layer loop prevention
- Location-aware movement validation
- Thought-action repetition blocking
- Controlled exploration policy

The result is a significantly more stable, grounded, and architecturally improved MCP-based text adventure agent.
## Files

agent.py
CHANGED
@@ -1,26 +1,11 @@
  """
-
- 2. Use the ReAct pattern (Thought -> Action -> Observation)
- 3. Call MCP tools to interact with the game
- 4. Maximize the game score within the step limit
-
- Required method:
-     async def run(self, client, game, max_steps, seed, verbose) -> RunResult
-
- The 'client' is a FastMCP Client already connected to your MCP server.
- Use it to call tools like: await client.call_tool("play_action", {"action": "look"})
-
- Tips:
- - Start by looking around and understanding your environment
- - Keep track of visited locations to avoid loops
- - Pick up useful items (lamp, sword, etc.)
- - The seed parameter should be used to set your LLM's seed for reproducibility
  """

  import json
@@ -32,79 +17,32 @@ from typing import Optional
  from dotenv import load_dotenv
  from huggingface_hub import InferenceClient

- # Load environment variables
  load_dotenv()

- # Set USE_LOCAL_MODEL=1 in your .env to use a locally downloaded model
- USE_LOCAL_MODEL = os.getenv("USE_LOCAL_MODEL", "0").strip() in ("1", "true", "yes")
- LOCAL_MODEL_ID = os.getenv("LOCAL_MODEL_ID", "Qwen/Qwen2.5-3B-Instruct")
-
  # =============================================================================
  # LLM Configuration - DO NOT MODIFY
  # =============================================================================

- # Model to use (fixed for fair evaluation)
  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"

- if USE_LOCAL_MODEL:
-     import torch
-     from transformers import pipeline as _hf_pipeline
-     _local_pipeline = _hf_pipeline(
-         "text-generation",
-         model=LOCAL_MODEL_ID,
-         torch_dtype=torch.bfloat16,
-         device_map="auto",
-     )
-     LLM_CLIENT = None
- else:
-     _hf_token = os.getenv("HF_TOKEN")
-     if not _hf_token:
-         raise ValueError("HF_TOKEN not found. Set it in your .env file.")
-     LLM_CLIENT = InferenceClient(token=_hf_token)


  def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
-     """
-     Call the LLM with the given prompt. Use this function in your agent.
-
-     Args:
-         prompt: The user prompt (current game state, history, etc.)
-         system_prompt: The system prompt (instructions for the agent)
-         seed: Random seed for reproducibility
-         max_tokens: Maximum tokens in response (default: 300)
-
-     Returns:
-         The LLM's response text
-
-     Example:
-         response = call_llm(
-             prompt="You are in a forest. What do you do?",
-             system_prompt=SYSTEM_PROMPT,
-             seed=42,
-         )
-     """
      messages = [
          {"role": "system", "content": system_prompt},
          {"role": "user", "content": prompt},
      ]

-     if USE_LOCAL_MODEL and _local_pipeline is not None:
-         outputs = _local_pipeline(
-             messages,
-             max_new_tokens=max_tokens,
-             temperature=0.0001,  # Near-deterministic (0.0 unsupported by some backends)
-             do_sample=True,
-         )
-         return outputs[0]["generated_text"][-1]["content"]
-
      response = LLM_CLIENT.chat.completions.create(
          model=LLM_MODEL,
          messages=messages,
-         temperature=0.0,
          max_tokens=max_tokens,
          seed=seed,
      )
@@ -125,179 +63,550 @@ class RunResult:

  # =============================================================================
- # System Prompt
  # =============================================================================

-
-
  AVAILABLE TOOLS (use via MCP):
-
-
-
-
-
-
-
-
-
  TOOL: <tool_name>
- ARGS: <JSON arguments>

-
-
- ARGS: {"action": "look"}
  """

-
  # =============================================================================
- # Student Agent
  # =============================================================================
-
  class StudentAgent:
      """
-
-
-
-
-     2. Parse LLM responses to extract tool calls
-     3. Track state and avoid loops
-
-     Use the provided call_llm() function to interact with the LLM.
      """
-
      def __init__(self):
-
-
-
-
-
-
      async def run(
          self,
-         client,
          game: str,
          max_steps: int,
          seed: int,
          verbose: bool = False,
      ) -> RunResult:
-         """
-         Run the agent for a game session.
-
-         Args:
-             client: FastMCP Client connected to your MCP server
-             game: Name of the game being played (e.g., "zork1")
-             max_steps: Maximum number of steps to take
-             seed: Random seed for reproducibility (use for LLM calls)
-             verbose: Whether to print detailed output
-
-         Returns:
-             RunResult with final score and statistics
-         """
-         # TODO: Implement your ReAct loop here
-         #
-         # Basic structure:
-         # 1. Get initial observation (call play_action with "look")
-         # 2. Loop for max_steps:
-         #    a. Build prompt with current observation and history
-         #    b. Call LLM to get thought and action
-         #    c. Parse the response to extract tool and args
-         #    d. Call the tool via client.call_tool(tool_name, args)
-         #    e. Update history and state
-         #    f. Check for game over
-         # 3. Return RunResult with final statistics
-
-         # Example of calling a tool:
-         # result = await client.call_tool("play_action", {"action": "look"})
-         # observation = result[0].text if result else "No response"
-
-         # Example of calling the LLM:
-         # response = call_llm(
-         #     prompt="Current observation: " + observation,
-         #     system_prompt=SYSTEM_PROMPT,
-         #     seed=seed,
-         # )
-
-         # Placeholder implementation - replace with your code
          locations_visited = set()
          history = []
-         final_score = 0
          moves = 0
-
-
-
          return RunResult(
-             final_score=
-             max_score=350,
              moves=moves,
              locations_visited=locations_visited,
-             game_completed=
              history=history,
          )
-
-
-         """
-         Build the prompt for the LLM.
-
-         TODO: Implement this to create effective prompts
-         """
-         # TODO: Combine system prompt, history, and current observation
-         pass
-
-     def _parse_response(self, response: str) -> tuple[str, str, dict]:
          """
-
-
-
-
-
-         Tuple of (thought, tool_name, args_dict)
          """
-         #
-
-
-
          """
-
-
          """
-


  # =============================================================================
- #
  # =============================================================================

  async def test_agent():
-     """Test the agent locally."""
      from fastmcp import Client
-
-     # Path to your MCP server
-     server_path = "mcp_server.py"
-
      agent = StudentAgent()
-
-     async with Client(
      result = await agent.run(
          client=client,
          game="zork1",
-         max_steps=
          seed=42,
          verbose=True,
      )
-
-     print(f"\
      print(f"Moves: {result.moves}")
-     print(f"Locations: {result.locations_visited}")


  if __name__ == "__main__":
  """
+ : MCP ReAct Agent (adapted for your MCP server)

+ Key upgrades:
+ - Actually calls memory/get_map/inventory periodically (doesn't cost "moves")
+ - Injects those outputs into the LLM prompt (LLM-friendly context)
+ - Updates score from BOTH play_action output and memory output
+ - Keeps loop detection + action normalization
  """

  import json

  from dotenv import load_dotenv
  from huggingface_hub import InferenceClient

  load_dotenv()

  # =============================================================================
  # LLM Configuration - DO NOT MODIFY
  # =============================================================================

  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"

+ _hf_token = os.getenv("HF_TOKEN")
+ if not _hf_token:
+     raise ValueError("HF_TOKEN not found. Set it in your .env file.")

+ LLM_CLIENT = InferenceClient(token=_hf_token)


  def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
+     """Call the LLM with the given prompt."""
      messages = [
          {"role": "system", "content": system_prompt},
          {"role": "user", "content": prompt},
      ]

      response = LLM_CLIENT.chat.completions.create(
          model=LLM_MODEL,
          messages=messages,
+         temperature=0.0,
          max_tokens=max_tokens,
          seed=seed,
      )

  # =============================================================================
+ # System Prompt
  # =============================================================================
+ SYSTEM_PROMPT = """You are an intelligent text adventure game agent.

+ Your goal is to solve the main problem of the game efficiently and maximize score within 100 moves.

+ This game is small and objective-focused. Avoid unnecessary wandering.

  AVAILABLE TOOLS (use via MCP):
+ 1. play_action - Execute valid game commands.
+ 2. memory - Get structured summary of current state and recent actions.
+ 3. get_map - See explored locations.
+ 4. inventory - Check carried items.
+
+ VALID ACTION STYLE:
+ Movement:
+ - north, south, east, west, up, down
+ - n, s, e, w, u, d
+
+ Core actions:
+ - look
+ - examine <thing>
+ - take <item>, drop <item>
+ - open <thing>, close <thing>
+ - talk to <character>
+ - give <item> to <character>
+ - use specific verbs mentioned in observation
+
+ AVOID:
+ - generic verbs like "use"
+ - random movement without purpose
+ - repeating failed actions
+
+ --------------------------------------------------
+ CORE STRATEGY (IMPORTANT)
+ --------------------------------------------------
+
+ 1) DOMINANT OBJECT RULE (VERY IMPORTANT):
+ If a specific object or character is repeatedly mentioned in the observation,
+ treat it as the main objective.
+
+ Do NOT leave the area until you:
+ - examine it
+ - try multiple meaningful interactions
+ - or confirm no new interaction is possible
+
+ Stay focused before exploring elsewhere.
+
+ 2) PROBLEM-SOLVING PRIORITY:
+ If the game clearly revolves around one main goal,
+ prioritize actions that directly affect that goal instead of exploring new rooms.
+
+ 3) CONTROLLED MOVEMENT:
+ Only move if:
+ - you have exhausted interactions in the current room
+ - or memory/map suggests a new unexplored path is necessary
+
+ 4) LIMITED RETRIES:
+ If an action fails once, try a different verb.
+ Do NOT repeat the same failed action more than once.
+
+ 5) OBJECT TRANSFORMATION FOCUS:
+ If an object seems central, try actions that might change its state:
+ - examine
+ - open
+ - give something
+ - use appropriate verbs mentioned in text
+ - interact from different angles
+
+ --------------------------------------------------
+ TOOL USAGE RULES
+ --------------------------------------------------
+
+ - Use memory() when uncertain or before repeating behavior.
+ - Use get_map() only if navigation becomes necessary.
+ - Use inventory() after obtaining items.
+
+ --------------------------------------------------
+ OUTPUT FORMAT (STRICT)
+ --------------------------------------------------
+
+ THOUGHT: <brief reasoning>
  TOOL: <tool_name>
+ ARGS: <JSON arguments>

+ Keep THOUGHT short (1-2 sentences).
+ Do not repeat the same action multiple times.
+ Prefer solving over wandering.
  """

  # =============================================================================
+ # Student Agent Implementation
  # =============================================================================
  class StudentAgent:
      """
+     MCP ReAct Agent adapted to your MCP server outputs:
+     - memory() returns STATE / RECENT / OBSERVATION
+     - get_map() returns MAP ...
+     - inventory() returns INVENTORY ...
      """
+
      def __init__(self):
+         self.history: list[dict] = []
+         self.recent_actions: list[str] = []
+         self.score: int = 0
+
+         # Cached tool outputs
+         self.last_memory: str = ""
+         self.last_map: str = ""
+         self.last_inventory: str = ""
+         self.last_observation: str = ""
+
+         # Exploration / anti-loop state
+         self.visit_counts: dict[str, int] = {}
+         self.loc_move_failures: dict[tuple[str, str], int] = {}
+         self.pending_move: Optional[tuple[str, str]] = None
+
+         # NEW: prevent repeating same thought+action at same location
+         self.loc_action_thought_counts: dict[tuple[str, str, str], int] = {}
+
+     # ------------------------------------------------------------
+     # Thought normalization helper
+     # ------------------------------------------------------------
+     def _thought_sig(self, thought: str) -> str:
+         t = (thought or "").lower()
+         t = re.sub(r"[^a-z0-9\s]", " ", t)
+         t = re.sub(r"\s+", " ", t).strip()
+         return " ".join(t.split()[:12])
+
      async def run(
          self,
+         client,
          game: str,
          max_steps: int,
          seed: int,
          verbose: bool = False,
      ) -> RunResult:
+
          locations_visited = set()
          history = []
          moves = 0
+
+         MOVE_CMDS = {"north", "south", "east", "west", "up", "down",
+                      "enter", "exit", "n", "s", "e", "w", "u", "d"}
+
+         # Available tools
+         tools = await client.list_tools()
+         tool_names = [t.name for t in tools]
+
+         # Initial observation
+         result = await client.call_tool("play_action", {"action": "look"})
+         observation = self._extract_result(result)
+         self.last_observation = observation
+
+         location = observation.split("\n")[0] if observation else "Unknown"
+         locations_visited.add(location)
+         self.visit_counts[location] = self.visit_counts.get(location, 0) + 1
+
+         # Prime context (no moves)
+         if "memory" in tool_names:
+             self.last_memory = self._extract_result(await client.call_tool("memory", {}))
+             self._update_score(self.last_memory)
+
+         if "inventory" in tool_names:
+             self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))
+
+         if verbose:
+             print(f"\n{observation}")
+
+         for step in range(1, max_steps + 1):
+             await self._refresh_context_tools(client, tool_names, step, verbose)
+
+             prompt = self._build_prompt()
+             response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
+             thought, tool_name, tool_args = self._parse_response(response, tool_names)
+
+             if verbose:
+                 print(f"\n--- Step {step} ---")
+                 print(f"[THOUGHT] {thought}")
+                 print(f"[TOOL] {tool_name}({tool_args})")
+
+             tool_name, tool_args = self._validate_tool_call(tool_name, tool_args, tool_names)
+
+             # ------------------------------------------------------------
+             # Block SAME (location + action + thought)
+             # ------------------------------------------------------------
              if tool_name == "play_action":
+                 current_loc = (
+                     self.last_observation.split("\n")[0].strip()
+                     if self.last_observation else "Unknown"
+                 )
+                 action_norm = tool_args.get("action", "look").strip().lower()
+                 t_sig = self._thought_sig(thought)
+
+                 triple = (current_loc, action_norm, t_sig)
+                 self.loc_action_thought_counts[triple] = (
+                     self.loc_action_thought_counts.get(triple, 0) + 1
+                 )
+
+                 if self.loc_action_thought_counts[triple] >= 2:
+                     if verbose:
+                         print(f"[ANTI-REPEAT] Blocking repeated thought+action at '{current_loc}'")
+                     if "get_map" in tool_names:
+                         tool_name, tool_args = "get_map", {}
+                     elif "memory" in tool_names:
+                         tool_name, tool_args = "memory", {}
+                     else:
+                         tool_name, tool_args = "play_action", {"action": "look"}
+
+             # ------------------------------------------------------------
+             # Loop detection (same action spam)
+             # ------------------------------------------------------------
              if tool_name == "play_action":
+                 action = tool_args.get("action", "look")
+                 self.recent_actions.append(action)
+                 if len(self.recent_actions) > 5:
+                     self.recent_actions = self.recent_actions[-5:]
+
+                 if len(self.recent_actions) >= 3 and len(set(self.recent_actions[-3:])) == 1:
+                     if verbose:
+                         print("[WARNING] Loop detected - forcing 'look'")
+                     tool_args = {"action": "look"}
+
+             # ------------------------------------------------------------
+             # Anti-backtracking: block only FAILED moves
+             # ------------------------------------------------------------
+             self.pending_move = None
+
+             if tool_name == "play_action":
+                 action_norm = tool_args.get("action", "look").strip().lower()
+
+                 if action_norm in MOVE_CMDS:
+                     current_loc = (
+                         self.last_observation.split("\n")[0].strip()
+                         if self.last_observation else "Unknown"
+                     )
+                     key = (current_loc, action_norm)
+
+                     if self.loc_move_failures.get(key, 0) >= 2:
+                         if verbose:
+                             print(f"[GUARD] Blocking failed move '{action_norm}' from '{current_loc}'")
+                         if "get_map" in tool_names:
+                             tool_name, tool_args = "get_map", {}
+                         elif "memory" in tool_names:
+                             tool_name, tool_args = "memory", {}
+                         else:
+                             tool_name, tool_args = "play_action", {"action": "look"}
+                     else:
+                         self.pending_move = (current_loc, action_norm)
+
+             # ------------------------------------------------------------
+             # Count moves
+             # ------------------------------------------------------------
+             if tool_name == "play_action":
+                 moves += 1
+
+             # ------------------------------------------------------------
+             # Execute tool
+             # ------------------------------------------------------------
+             try:
+                 result = await client.call_tool(tool_name, tool_args)
+                 out_text = self._extract_result(result)
+
+                 if tool_name == "play_action":
+                     observation = out_text
+                     self.last_observation = observation
+                 elif tool_name == "memory":
+                     self.last_memory = out_text
+                 elif tool_name == "get_map":
+                     self.last_map = out_text
+                 elif tool_name == "inventory":
+                     self.last_inventory = out_text
+
+                 if verbose:
+                     print(f"[RESULT] {out_text[:200]}...")
+
+             except Exception as e:
+                 out_text = f"Error: {e}"
+                 observation = out_text
+                 self.last_observation = observation
+                 if verbose:
+                     print(f"[ERROR] {e}")
+
+             # ------------------------------------------------------------
+             # Post-move update
+             # ------------------------------------------------------------
+             if tool_name == "play_action":
+                 new_location = observation.split("\n")[0] if observation else "Unknown"
+
+                 if self.pending_move is not None:
+                     prev_loc, prev_action = self.pending_move
+                     key = (prev_loc, prev_action)
+
+                     if new_location == prev_loc:
+                         self.loc_move_failures[key] = self.loc_move_failures.get(key, 0) + 1
+                     else:
+                         self.loc_move_failures[key] = 0
+
+                     self.pending_move = None
+
+                 location = new_location
+                 locations_visited.add(location)
+                 self.visit_counts[location] = self.visit_counts.get(location, 0) + 1
self._update_score(observation)
|
| 370 |
+
|
| 371 |
+
if re.search(r"\bTaken\b|\byou are now carrying\b", observation, re.IGNORECASE):
|
| 372 |
+
if "inventory" in tool_names:
|
| 373 |
+
self.last_inventory = self._extract_result(
|
| 374 |
+
await client.call_tool("inventory", {})
|
| 375 |
+
)
|
| 376 |
+
|
| 377 |
+
# ------------------------------------------------------------
|
| 378 |
+
# History
|
| 379 |
+
# ------------------------------------------------------------
|
| 380 |
+
self.history.append({
|
| 381 |
+
"step": step,
|
| 382 |
+
"thought": thought,
|
| 383 |
+
"tool": tool_name,
|
| 384 |
+
"args": tool_args,
|
| 385 |
+
"result": out_text[:200]
|
| 386 |
+
})
|
| 387 |
+
if len(self.history) > 10:
|
| 388 |
+
self.history = self.history[-10:]
|
| 389 |
+
|
| 390 |
+
history.append((thought, f"{tool_name}({tool_args})", out_text[:100]))
|
| 391 |
+
|
| 392 |
+
if self._is_game_over(observation):
|
| 393 |
+
if verbose:
|
| 394 |
+
print("\n*** GAME OVER ***")
|
| 395 |
+
break
|
| 396 |
+
|
| 397 |
return RunResult(
|
| 398 |
+
final_score=self.score,
|
| 399 |
+
max_score=350,
|
| 400 |
moves=moves,
|
| 401 |
locations_visited=locations_visited,
|
| 402 |
+
game_completed=self._is_game_over(self.last_observation),
|
| 403 |
history=history,
|
| 404 |
)
|
| 405 |
+
|
| 406 |
+
|
| 407 |
+
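The anti-backtracking guard above reduces to a failure counter keyed by `(location, direction)`. A standalone sketch (function names here are illustrative, not from `agent.py`):

```python
# Sketch of the failed-move guard in the run loop above (illustrative names).
loc_move_failures: dict[tuple[str, str], int] = {}

def record_move(loc_before: str, direction: str, loc_after: str) -> None:
    """Bump the failure counter when a move left us in the same room."""
    key = (loc_before, direction)
    if loc_after == loc_before:
        loc_move_failures[key] = loc_move_failures.get(key, 0) + 1
    else:
        loc_move_failures[key] = 0

def is_blocked(loc: str, direction: str) -> bool:
    """A direction is blocked once it has failed twice from the same room."""
    return loc_move_failures.get((loc, direction), 0) >= 2

record_move("West of House", "north", "West of House")
record_move("West of House", "north", "West of House")
print(is_blocked("West of House", "north"))   # True
print(is_blocked("West of House", "south"))   # False
```

Resetting the counter on a successful move matters: a direction that is only sometimes blocked (e.g. after a door opens) becomes usable again.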
```python
    async def _refresh_context_tools(self, client, tool_names: list[str], step: int, verbose: bool) -> None:
        """
        Pull structured context from the MCP server without spending moves.
        Tuned to this server's outputs:
        - memory() is the best single summary
        - get_map() helps navigation
        - inventory() helps object planning
        """
        # Memory: often (every 4 steps) so the LLM doesn't forget state
        if "memory" in tool_names and (step == 1 or step % 4 == 0):
            try:
                self.last_memory = self._extract_result(await client.call_tool("memory", {}))
                self._update_score(self.last_memory)
            except Exception:
                pass

        # Map: occasionally (every 6 steps)
        if "get_map" in tool_names and (step % 6 == 0):
            try:
                self.last_map = self._extract_result(await client.call_tool("get_map", {}))
            except Exception:
                pass

        # Inventory: occasionally (every 10 steps)
        if "inventory" in tool_names and (step == 1 or step % 10 == 0):
            try:
                self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))
            except Exception:
                pass
```
```python
    def _build_prompt(self) -> str:
        """
        Build a prompt aligned with the MCP server's outputs:
        - memory() has STATE/RECENT/OBSERVATION
        - get_map() starts with MAP
        - inventory() starts with INVENTORY
        """
        parts = []
        parts.append(f"Current best-known score: {self.score}")

        # Give the model the server-side memory snapshot (truncated to keep the prompt lean)
        if self.last_memory:
            mem = self._truncate(self.last_memory, 1200)
            parts.append("\n=== MEMORY (from MCP server) ===\n" + mem)

        if self.last_inventory:
            inv = self._truncate(self.last_inventory, 400)
            parts.append("\n=== INVENTORY (from MCP server) ===\n" + inv)

        if self.last_map:
            mp = self._truncate(self.last_map, 700)
            parts.append("\n=== MAP (from MCP server) ===\n" + mp)

        # Recent local history (anti-loop)
        if self.history:
            parts.append("\n=== RECENT LOCAL ACTIONS (agent) ===")
            for entry in self.history[-3:]:
                action = entry.get("args", {}).get("action", entry["tool"])
                result_short = entry["result"][:100] + "..." if len(entry["result"]) > 100 else entry["result"]
                parts.append(f"  > {action} -> {result_short}")

        if self.recent_actions and len(set(self.recent_actions[-3:])) == 1:
            parts.append(f"\n[WARNING: repeated '{self.recent_actions[-1]}'. Choose a different action.]")

        # Always include the most recent raw observation
        parts.append("\n=== LATEST OBSERVATION (play_action) ===\n" + self._truncate(self.last_observation, 900))
        parts.append("\nWhat do you do next?")

        return "\n".join(parts)

    def _truncate(self, text: str, limit: int) -> str:
        text = text or ""
        if len(text) <= limit:
            return text
        return text[:limit] + "\n...[truncated]"
```
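`_truncate` is what keeps each prompt section bounded; its contract is easy to check in isolation (a module-level copy for illustration):

```python
def truncate(text: str, limit: int) -> str:
    # Module-level copy of the agent's _truncate: short text passes through,
    # long text is cut at the limit and marked.
    text = text or ""
    if len(text) <= limit:
        return text
    return text[:limit] + "\n...[truncated]"

print(truncate("short", 10))   # short
print(truncate("a" * 20, 5))   # five a's, then the "...[truncated]" marker
```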
```python
    def _parse_response(self, response: str, valid_tools: list[str]) -> tuple[str, str, dict]:
        thought = "No reasoning provided"
        tool_name = "play_action"
        tool_args = {"action": "look"}

        lines = response.strip().split("\n")
        for line in lines:
            line_clean = line.strip()
            line_upper = line_clean.upper()

            if line_upper.startswith("THOUGHT:"):
                thought = line_clean.split(":", 1)[1].strip()

            elif line_upper.startswith("TOOL:"):
                raw_tool = line_clean.split(":", 1)[1].strip().lower()
                raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
                raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
                tool_name = raw_tool

            elif line_upper.startswith("ARGS:"):
                args_part = line_clean.split(":", 1)[1].strip()
                if not args_part:
                    tool_args = {}
                    continue
                try:
                    args_part = args_part.replace("'", '"')
                    tool_args = json.loads(args_part)
                except json.JSONDecodeError:
                    match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
                    if match:
                        tool_args = {"action": match.group(1)}
                    else:
                        tool_args = {"action": "look"}

        return thought, tool_name, tool_args
```
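The THOUGHT/TOOL/ARGS line protocol parsed above can be exercised standalone. This trimmed copy keeps only the defaults and the JSON/regex fallback, and shows the input shape the agent expects from the LLM:

```python
import json
import re

def parse_response(response: str) -> tuple[str, str, dict]:
    """Parse THOUGHT/TOOL/ARGS lines, falling back to safe defaults."""
    thought = "No reasoning provided"
    tool_name = "play_action"
    tool_args = {"action": "look"}
    for line in response.strip().split("\n"):
        line = line.strip()
        upper = line.upper()
        if upper.startswith("THOUGHT:"):
            thought = line.split(":", 1)[1].strip()
        elif upper.startswith("TOOL:"):
            tool_name = line.split(":", 1)[1].strip().lower()
        elif upper.startswith("ARGS:"):
            raw = line.split(":", 1)[1].strip().replace("'", '"')
            try:
                tool_args = json.loads(raw)
            except json.JSONDecodeError:
                m = re.search(r'"action"\s*:\s*"([^"]+)"', raw)
                tool_args = {"action": m.group(1)} if m else {"action": "look"}
    return thought, tool_name, tool_args

reply = 'THOUGHT: the mailbox may hold something\nTOOL: play_action\nARGS: {"action": "open mailbox"}'
print(parse_response(reply))
```

The single-quote replacement is what lets LLM output like `ARGS: {'action': 'north'}` survive `json.loads`; unparseable responses degrade to a harmless `look`.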
```python
    def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
        if tool_name not in valid_tools:
            if tool_name in ["action", "do", "command"]:
                tool_name = "play_action"
            elif tool_name in ["map", "location"]:
                tool_name = "get_map"
            elif tool_name in ["mem", "state", "status"]:
                tool_name = "memory"
            elif tool_name in ["inv", "items"]:
                tool_name = "inventory"
            else:
                tool_name = "play_action"

        if tool_name == "play_action":
            action = tool_args.get("action", "look")

            invalid_verb_map = {
                "check": "examine",
                "inspect": "examine",
                "search": "look",
                "grab": "take",
                "pick": "take",
                "use": "examine",
                "investigate": "examine",
            }

            words = action.lower().split()
            if words and words[0] in invalid_verb_map:
                words[0] = invalid_verb_map[words[0]]
                action = " ".join(words)

            action = action.lower().strip()
            action = action.replace("**", "").replace("*", "").replace("`", "")
            action = " ".join(action.split())

            tool_args["action"] = action

        return tool_name, tool_args
```
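The verb substitution and markdown stripping in `_validate_tool_call` can be sketched as a pure function (the name `normalize_action` is illustrative):

```python
# Verbs many IF parsers reject, mapped to safe equivalents (as in the agent).
INVALID_VERB_MAP = {
    "check": "examine", "inspect": "examine", "search": "look",
    "grab": "take", "pick": "take", "use": "examine", "investigate": "examine",
}

def normalize_action(action: str) -> str:
    """Swap unsupported verbs, strip LLM markdown, collapse whitespace."""
    words = action.lower().split()
    if words and words[0] in INVALID_VERB_MAP:
        words[0] = INVALID_VERB_MAP[words[0]]
    action = " ".join(words)
    for mark in ("**", "*", "`"):
        action = action.replace(mark, "")
    return " ".join(action.split())

print(normalize_action("Grab the **lamp**"))   # take the lamp
print(normalize_action("check mailbox"))       # examine mailbox
```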
```python
    def _extract_result(self, result) -> str:
        if hasattr(result, 'content') and result.content:
            return result.content[0].text
        if isinstance(result, list) and result:
            return result[0].text if hasattr(result[0], 'text') else str(result[0])
        return str(result)

    def _update_score(self, text: str) -> None:
        patterns = [
            r'\[Score:\s*(\d+)',
            r'Score:\s*(\d+)\b',
        ]
        for pattern in patterns:
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                self.score = max(self.score, int(match.group(1)))

    def _is_game_over(self, text: str) -> bool:
        game_over_phrases = [
            "game over",
            "you have died",
            "you are dead",
            "*** you have died ***",
        ]
        text_lower = (text or "").lower()
        return any(phrase in text_lower for phrase in game_over_phrases)
```
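`_update_score` keeps the maximum score seen across both the bracketed footer and plain `Score:` markers; a minimal standalone version:

```python
import re

def best_score(current: int, text: str) -> int:
    """Keep the maximum score seen across bracketed and plain markers."""
    for pattern in (r"\[Score:\s*(\d+)", r"Score:\s*(\d+)\b"):
        m = re.search(pattern, text, re.IGNORECASE)
        if m:
            current = max(current, int(m.group(1)))
    return current

print(best_score(0, "Taken.\n[Score: 10 | Moves: 7]"))   # 10
print(best_score(25, "score: 10"))                       # 25
```

Taking the maximum rather than the latest value means a malformed or stale observation can never lower the reported score.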
```python
# =============================================================================
# Local Testing
# =============================================================================

async def test_agent():
    from fastmcp import Client

    agent = StudentAgent()

    async with Client("mcp_server.py") as client:
        result = await agent.run(
            client=client,
            game="zork1",
            max_steps=20,
            seed=42,
            verbose=True,
        )

    print(f"\n{'=' * 50}")
    print(f"Final Score: {result.final_score}")
    print(f"Moves: {result.moves}")
    print(f"Locations: {len(result.locations_visited)}")


if __name__ == "__main__":
```
mcp_server.py
CHANGED

@@ -45,53 +45,338 @@ mcp = FastMCP("Student Text Adventure Server")

Old version (the course-template stub `GameManager`):

```python
# =============================================================================
# Game State Management
# =============================================================================

class GameManager:
    """
    Manages the text adventure game state.
    - Action history (for memory tool)
    - Explored locations (for mapping)
    - Current score and moves
    """

    def __init__(self):
        self.env: TextAdventureEnv = None
        self.state = None
        self.game_name: str = ""

    def initialize(self, game: str = "zork1"):
        """Initialize or reset the game."""
        self.game_name = game
        self.env = TextAdventureEnv(game)
        self.state = self.env.reset()
        return self.state.observation

    def step(self, action: str) -> str:
        """Execute an action and return the result."""
        if self.env is None:
            self.initialize()
        self.state = self.env.step(action)

    def get_score(self) -> int:
        """Get current score."""
        return self.state.score if self.state else 0

    def get_moves(self) -> int:
        """Get number of moves taken."""
        return self.state.moves if self.state else 0


# Global game manager
```

@@ -136,11 +421,37 @@ def play_action(action: str) -> str:

Old version of `play_action`, with the score footer left commented out:

```python
    # TODO: You might want to include score changes in the response

    result = game.step(action)

    # Optional: Append score info
    # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"


# TODO: Implement additional tools to help your agent
```
New version:

```python
# =============================================================================
# Game State Management
# =============================================================================

import re
from typing import Optional

class GameManager:
    """
    Manages the text adventure game state.

    Extended tracking:
    - Action history (for memory tool)
    - Explored locations (for mapping)
    - Current score and moves
    - Current location (best-effort, robust across games)
    """

    # Lines that are often NOT room titles across many IF games
    _HEADER_LIKE_PATTERNS = [
        r"^\s*score\s*[:=]\s*\d+",
        r"^\s*moves?\s*[:=]\s*\d+",
        r"^\s*turns?\s*[:=]\s*\d+",
        r"^\s*time\s*[:=]\s*",
        r"^\s*health\s*[:=]\s*\d+",
        r"^\s*location\s*[:=]\s*",
        r"^\s*\[.*\]\s*$",             # bracket-only status lines
        r"^\s*\(.*\)\s*$",             # parenthetical-only lines
        r"^\s*you\s+(are|see|can)\b",  # narrative sentence starters
    ]

    # Movement commands we consider for mapping (Zork-style + abbreviations)
    _MOVE_CMDS = {
        "north", "south", "east", "west", "up", "down", "enter", "exit",
        "n", "s", "e", "w", "u", "d"
    }

    # Common failure phrases when trying to move (best-effort, not perfect)
    _MOVE_FAIL_PHRASES = [
        "you can't go", "you cannot go", "can't go that way", "cannot go that way",
        "you can't go that way", "you cannot go that way",
        "you can't", "you cannot",
        "there is no way", "you can't see any way", "you see no way",
        "blocked", "closed", "won't open", "is locked", "locked",
        "too dark", "pitch black"
    ]

    def _is_movement_action(self, action: str) -> bool:
        """Return True if this action is a movement command we track."""
        a = (action or "").strip().lower()
        return a in self._MOVE_CMDS

    def _move_likely_succeeded(self, old_loc: str, new_loc: str, observation: str) -> bool:
        """
        Decide whether a move likely succeeded.
        Strong signal: location label changed.
        Negative signal: failure phrases in observation.
        """
        if new_loc and old_loc and new_loc != old_loc:
            return True

        text = (observation or "").lower()
        if any(phrase in text for phrase in self._MOVE_FAIL_PHRASES):
            return False

        # Location unchanged and no clear failure phrase: treat as "not sure" -> don't add edge
        return False

    def _update_map(self, action: str, old_loc: str, new_loc: str) -> None:
        """Record a directed edge old_loc --action--> new_loc in explored_locations."""
        if not old_loc or not new_loc:
            return
        self.explored_locations.setdefault(old_loc, set()).add(f"{action} -> {new_loc}")

    def __init__(self):
        self.env: TextAdventureEnv = None
        self.state = None
        self.game_name: str = ""

        # Tracking for agent-support tools
        self.history: list[tuple[str, str]] = []
        self.explored_locations: dict[str, set[str]] = {}
        self.current_location: str = "Unknown"

    def initialize(self, game: str = "zork1"):
        """Initialize or reset the game."""
        self.game_name = game
        self.env = TextAdventureEnv(game)
        self.state = self.env.reset()

        # Reset tracking
        self.history = []
        self.explored_locations = {}
        self.current_location = self._extract_location(self.state.observation, fallback="Unknown")

        return self.state.observation

    def _extract_location(self, observation: str, fallback: Optional[str] = None) -> str:
        """
        Best-effort location extraction from the observation text.

        Strategy:
        1) Split into lines, skip empties
        2) Skip lines that look like status bars / headers / pure brackets
        3) Prefer a short, title-like line (room name)
        4) If nothing confident, return fallback (usually previous location)
        """
        if not observation:
            return fallback or "Unknown"

        lines = [ln.strip() for ln in observation.splitlines() if ln.strip()]
        if not lines:
            return fallback or "Unknown"

        header_res = [re.compile(pat, re.IGNORECASE) for pat in self._HEADER_LIKE_PATTERNS]

        def looks_like_header(line: str) -> bool:
            return any(rx.search(line) for rx in header_res)

        def looks_like_title(line: str) -> bool:
            # Many room titles are short and don't end with punctuation.
            if len(line) > 60:
                return False
            if line.endswith((".", "!", "?", ";", ":")):
                return False
            # Too many digits usually means a status line.
            if sum(ch.isdigit() for ch in line) >= 3:
                return False
            return True

        # First pass: first "title-like" line that isn't header-like
        for line in lines[:8]:  # only inspect the top chunk; titles are usually early
            if looks_like_header(line):
                continue
            if looks_like_title(line):
                return line

        # Second pass: first non-header line
        for line in lines[:8]:
            if not looks_like_header(line):
                return line

        return fallback or "Unknown"
```
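The room-title heuristic in `_extract_location` can be demonstrated with a trimmed pattern list (the digit rule and the fallback chain are omitted here for brevity):

```python
import re

# Trimmed pattern list; the full implementation also rejects digit-heavy
# lines and falls back to the previous known location.
HEADER_LIKE = [
    r"^\s*score\s*[:=]\s*\d+",
    r"^\s*\[.*\]\s*$",
    r"^\s*you\s+(are|see|can)\b",
]

def pick_title(observation: str) -> str:
    """Return the first short, punctuation-free, non-status line."""
    for line in [ln.strip() for ln in observation.splitlines() if ln.strip()][:8]:
        if any(re.search(p, line, re.IGNORECASE) for p in HEADER_LIKE):
            continue
        if len(line) <= 60 and not line.endswith((".", "!", "?", ";", ":")):
            return line
    return "Unknown"

obs = "[Score: 0 | Moves: 0]\nWest of House\nYou are standing in an open field."
print(pick_title(obs))   # West of House
```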
```python
    def step(self, action: str) -> str:
        """Execute an action and return the result."""
        if self.env is None:
            self.initialize()

        # Save old location before action
        old_location = self.current_location

        # Apply action to the real game
        self.state = self.env.step(action)
        obs = self.state.observation

        # Track history (keep last 50)
        self.history.append((action, obs))
        if len(self.history) > 50:
            self.history = self.history[-50:]

        # Extract new location (fallback to old)
        new_location = self._extract_location(obs, fallback=old_location)

        # Update map only if it was a movement attempt AND it likely succeeded
        action_norm = (action or "").strip().lower()
        if self._is_movement_action(action_norm) and self._move_likely_succeeded(old_location, new_location, obs):
            self._update_map(action_norm, old_location, new_location)

        # Finally update current location
        self.current_location = new_location

        return obs

    def get_score(self) -> int:
        """Get current score."""
        return self.state.score if self.state else 0

    def get_moves(self) -> int:
        """Get number of moves taken."""
        return self.state.moves if self.state else 0
```
```python
    def _extract_facts(self, observation: str) -> dict:
        """
        Best-effort extraction of useful 'facts' from the current observation text.
        This is intentionally heuristic so it can work across many games.
        """
        obs = observation or ""
        text = obs.strip()
        lower = text.lower()

        # --- Exits mentioned (simple direction scan) ---
        directions = ["north", "south", "east", "west", "up", "down", "in", "out"]
        exits_found = []
        for d in directions:
            # We detect directions as whole words to reduce false matches
            if re.search(rf"\b{re.escape(d)}\b", lower):
                exits_found.append(d)
        exits_found = sorted(set(exits_found))

        # --- Visible things (very light heuristics) ---
        # We look for common IF patterns like "You see ... here." / "There is ... here."
        visible_candidates: list[str] = []

        patterns = [
            r"you see (.+?) here\.",
            r"you can see (.+?) here\.",
            r"there is (.+?) here\.",
            r"there are (.+?) here\.",
            r"you notice (.+?)\.",
        ]
        for pat in patterns:
            for m in re.finditer(pat, lower):
                chunk = m.group(1).strip()
                if chunk:
                    visible_candidates.append(chunk)

        # Clean visible candidates a bit (split simple lists, avoid huge strings)
        visible = []
        for chunk in visible_candidates:
            # Split on commas and "and" to get smaller pieces
            parts = re.split(r",|\band\b", chunk)
            for p in parts:
                item = p.strip(" .;:!?\t")
                if 1 <= len(item) <= 40:
                    visible.append(item)

        # Deduplicate and limit (so memory stays compact)
        visible = sorted(set(visible))[:10]

        return {
            "exits_mentioned": exits_found,
            "visible": visible,
        }
```
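A reduced version of the exits/visible heuristics, showing how the whole-word direction scan and the `there is ... here.` pattern interact:

```python
import re

def extract_facts(observation: str) -> dict:
    """Reduced copy of the heuristics: whole-word exits, 'there is ... here.' items."""
    lower = observation.lower()
    exits = sorted({d for d in ("north", "south", "east", "west", "up", "down")
                    if re.search(rf"\b{d}\b", lower)})
    visible = []
    for m in re.finditer(r"there is (.+?) here\.", lower):
        for part in re.split(r",|\band\b", m.group(1)):
            item = part.strip(" .;:!?")
            if 1 <= len(item) <= 40:
                visible.append(item)
    return {"exits_mentioned": exits, "visible": sorted(set(visible))}

print(extract_facts("There is a small mailbox here. A path leads north and east."))
# {'exits_mentioned': ['east', 'north'], 'visible': ['a small mailbox']}
```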
```python
    def get_memory(self) -> str:
        """
        LLM-friendly summary of current game state.
        Format: facts first, then recent actions, then the raw observation.
        """
        game = self.game_name or "Unknown"
        location = self.current_location or "Unknown"
        score = self.get_score()
        moves = self.get_moves()

        # Recent actions (keep short and anti-loop)
        recent = self.history[-5:] if self.history else []
        if recent:
            recent_lines = []
            for a, r in recent:
                snippet = (r or "").replace("\n", " ").strip()
                if len(snippet) > 80:
                    snippet = snippet[:80] + "..."
                recent_lines.append(f"- {a} -> {snippet}")
            recent_str = "\n".join(recent_lines)
        else:
            recent_str = "(none yet)"

        # Facts extracted from current observation
        obs = self.state.observation if self.state else ""
        facts = self._extract_facts(obs)

        exits_txt = ", ".join(facts["exits_mentioned"]) if facts["exits_mentioned"] else "(none detected)"
        visible_txt = ", ".join(facts["visible"]) if facts["visible"] else "(none detected)"

        return (
            "STATE\n"
            f"Game: {game}\n"
            f"Location: {location}\n"
            f"Score: {score} Moves: {moves}\n"
            f"Visible (best effort): {visible_txt}\n"
            f"Exits mentioned (best effort): {exits_txt}\n"
            "\n"
            "RECENT\n"
            f"{recent_str}\n"
            "\n"
            "OBSERVATION\n"
            f"{obs}"
        )
```
```python
    def get_map(self) -> str:
        """
        Return a readable map of explored locations.
        Uses explored_locations built during movement actions.

        Output is stable + compact for LLM use.
        """
        if not self.explored_locations:
            return "MAP\n(no locations recorded yet — try moving with north/south/east/west/etc.)"

        lines = ["MAP", "Explored locations and exits:"]
        for loc in sorted(self.explored_locations.keys()):
            exits = sorted(self.explored_locations[loc])
            lines.append(f"\n* {loc}")
            for e in exits:
                lines.append(f"  - {e}")

        lines.append(f"\n[Current] {self.current_location}")
        return "\n".join(lines)
```
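`get_map` output is deterministic (sorted locations and sorted exits), which keeps the agent's prompts stable from step to step. A standalone sketch of the same formatting:

```python
def format_map(explored: dict[str, set[str]], current: str) -> str:
    """Sorted locations and exits plus a current-location marker, as in get_map()."""
    if not explored:
        return "MAP\n(no locations recorded yet)"
    lines = ["MAP", "Explored locations and exits:"]
    for loc in sorted(explored):
        lines.append(f"\n* {loc}")
        for exit_edge in sorted(explored[loc]):
            lines.append(f"  - {exit_edge}")
    lines.append(f"\n[Current] {current}")
    return "\n".join(lines)

print(format_map({"West of House": {"north -> North of House"}}, "North of House"))
```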
```python
    def get_inventory(self) -> str:
        """
        Return inventory in a robust way across different games/envs.

        Strategy:
        1) If state.inventory exists and is non-empty -> format it
        2) Otherwise, fall back to issuing the command "inventory"
           through the environment and return that observation
        """
        # 1) Try structured inventory if provided by env
        items = []
        if self.state is not None and hasattr(self.state, "inventory"):
            inv = getattr(self.state, "inventory")
            if inv:
                # Normalize to strings
                try:
                    items = [str(x).strip() for x in inv if str(x).strip()]
                except Exception:
                    items = []

        if items:
            # Keep it simple and safe: just join a cleaned list
            # (avoid overly aggressive parsing that breaks across games)
            items = sorted(set(items))
            return "INVENTORY\n" + ", ".join(items)

        # 2) Fallback: ask the game directly (does NOT change inventory, just prints it)
        # NOTE: we do not record this as agent history/map; this is a server-side query.
        if self.env is None:
            self.initialize()

        try:
            tmp_state = self.env.step("inventory")
            inv_text = tmp_state.observation if tmp_state else "Inventory: (no response)"
        except Exception:
            inv_text = "Inventory: (unable to retrieve)"

        return "INVENTORY\n" + inv_text.strip()


# Global game manager
```
New version of `play_action` and the added tools:

```python
    # TODO: You might want to include score changes in the response

    result = game.step(action)

    # Append score/moves for clearer feedback (LLM-friendly, low noise)
    result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
    return result

    # Optional: Append score info
    # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"


@mcp.tool()
def memory() -> str:
    """Return an LLM-friendly summary of the current game state."""
    game = get_game()
    return game.get_memory()


@mcp.tool()
def get_map() -> str:
    """Return a map of explored locations and recorded exits."""
    game = get_game()
    return game.get_map()


@mcp.tool()
def inventory() -> str:
    """Return the player's inventory in a robust way."""
    game = get_game()
    return game.get_inventory()


# TODO: Implement additional tools to help your agent
```