---
title: Text Adventure Agent Submission
emoji: "\U0001F5FA"
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.12.0"
app_file: app.py
pinned: false
license: mit
---

# Text Adventure Agent Submission

## Overview

This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

## Approach

<!-- Describe your approach here -->

- What strategy does your agent use?
- What tools did you implement in your MCP server?
- Any interesting techniques or optimizations?

## Files

| File | Description |
|------|-------------|
| `agent.py` | ReAct agent with `StudentAgent` class |
| `mcp_server.py` | MCP server with game interaction tools |
| `app.py` | Gradio interface for HF Space |
| `requirements.txt` | Additional dependencies |

## How to Submit

1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
2. Clone your fork locally
3. Implement your agent in `agent.py` and `mcp_server.py`
4. Test locally (see below)
5. Push your changes to your Space
6. Submit your Space URL on the course platform

## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```





---





# 🧠 MCP ReAct Agent for Text Adventure Games

This project implements a complete **MCP-based ReAct agent** that plays classic text adventure games (e.g., `zork1`) using a tool-driven architecture.

It consists of:

* An **MCP server** exposing the game environment as structured tools
* A **ReAct-style LLM agent** that reasons and acts via those tools
* Loop detection, score tracking, and structured parsing
* Experimental improvements and debugging attempts

---

# 📦 Project Structure

## 1️⃣ MCP Server (`mcp_server.py`)

Built using `FastMCP`, this server wraps a `TextAdventureEnv` and exposes game functionality as callable tools.

### Core Features

#### 🎮 Game State Management

The `GameState` class manages:

* Current environment state
* Score and move tracking
* Action history (last 50 steps)
* Explored locations (map tracking)
* Inventory parsing
* Location extraction from observations
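
The bookkeeping listed above could be sketched roughly as follows. This is an illustration, not the code from `mcp_server.py`: the class name `GameStateSketch`, the `record` method, and the `extract_location` helper are hypothetical, and the rule "the first non-empty line of an observation is the room name" is an assumption that will misfire on short replies such as `Taken.`.

```python
from collections import deque
from dataclasses import dataclass, field


def extract_location(observation: str) -> str:
    """Assumption: the first non-empty line of an observation names the room."""
    for line in observation.splitlines():
        if line.strip():
            return line.strip()
    return ""


@dataclass
class GameStateSketch:
    """Hypothetical stand-in for the GameState bookkeeping."""

    score: int = 0
    moves: int = 0
    # A deque with maxlen=50 drops the oldest step once 50 are stored.
    history: deque = field(default_factory=lambda: deque(maxlen=50))
    explored: set = field(default_factory=set)

    def record(self, action: str, observation: str, score: int) -> None:
        """Store one step and update score, move count, and explored rooms."""
        self.moves += 1
        self.score = score
        self.history.append((action, observation))
        location = extract_location(observation)
        if location:
            self.explored.add(location)
```

Using a bounded `deque` keeps the "last 50 steps" limit without any manual trimming.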

---

## 🛠️ Exposed MCP Tools

The server provides the following tools:

### `play_action`

Executes a game command (e.g., `north`, `take lamp`, `open mailbox`).

Returns:

* Game observation
* Score updates
* Move count
* Game over notice

---

### `memory`

Returns a structured summary of:

* Current location
* Score
* Moves
* Recent actions
* Current observation

This helps the agent reason about the current state.

---

### `get_map`

Displays explored locations and directional transitions discovered so far.
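
One way to track and render such a map is a nested dictionary keyed by room and direction. The sketch below is an assumption about the data layout, not the server's actual implementation; `record_transition`, `render_map`, and the `DIRECTIONS` set are hypothetical names.

```python
from collections import defaultdict

# Assumption: only these commands count as movement.
DIRECTIONS = {"north", "south", "east", "west", "up", "down"}


def record_transition(game_map, origin, action, destination):
    """Remember that taking `action` in `origin` led to `destination`."""
    moved = origin and destination and origin != destination
    if action in DIRECTIONS and moved:
        game_map[origin][action] = destination


def render_map(game_map):
    """Return one line per known transition, sorted for stable output."""
    lines = []
    for room in sorted(game_map):
        for direction, target in sorted(game_map[room].items()):
            lines.append(f"{room} --{direction}--> {target}")
    return "\n".join(lines) if lines else "No transitions discovered yet."


game_map = defaultdict(dict)
record_transition(game_map, "West of House", "north", "North of House")
print(render_map(game_map))
```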

---

### `inventory`

Returns cleaned inventory information, parsing object strings from Jericho.

---

### `get_valid_actions`

A fallback tool that returns a **fixed list of possible actions** plus context-aware object interactions based on keywords in the observation.

Note:

* `env.get_valid_actions()` was tested and debugged, but it **did not work reliably** in this setup.
* I therefore implemented a **manually defined valid action set**.
* Using the fixed action set **did not improve the score**.
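
A fallback of this kind can be sketched as a fixed base list plus a keyword table. The specific commands and the `KEYWORD_ACTIONS` table below are illustrative assumptions; only the lamp, key, and mailbox keywords are mentioned in this README.

```python
# Fixed commands offered in every state (illustrative selection).
BASE_ACTIONS = ["look", "inventory", "north", "south", "east", "west", "up", "down"]

# Hypothetical keyword -> candidate commands table.
KEYWORD_ACTIONS = {
    "lamp": ["take lamp", "turn on lamp"],
    "key": ["take key"],
    "mailbox": ["open mailbox", "examine mailbox"],
}


def fallback_valid_actions(observation: str) -> list[str]:
    """Return the base actions plus commands for objects named in the text."""
    text = observation.lower()
    actions = list(BASE_ACTIONS)
    for keyword, commands in KEYWORD_ACTIONS.items():
        if keyword in text:
            actions.extend(commands)
    return actions
```

Plain substring matching is cheap but coarse: it also offers `take key` when the observation merely mentions a "keyhole".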

---

### `get_walkthrough`

Returns the official Jericho walkthrough (not used in `agent.py`).

---

### `get_world_objects`

Returns all known world objects from Jericho.

---

# 🤖 ReAct Agent (`agent.py`)

The agent is a complete ReAct implementation using:

* Thought → Tool → Observation loop
* Structured output parsing
* Loop detection
* Score extraction
* Action validation

It uses:

```
Qwen/Qwen2.5-72B-Instruct
```

via the Hugging Face Inference API.

---

# Agent Architecture

## ReAct Loop

At each step:

1. Build prompt with:

   * Current score
   * Recent actions
   * Current observation
2. Call LLM
3. Parse structured response:

   ```
   THOUGHT:
   TOOL:
   ARGS:
   ```
4. Validate tool call
5. Execute tool via MCP
6. Update:

   * Score
   * History
   * Visited locations
7. Detect loops
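
Step 3 of the loop, parsing the `THOUGHT:` / `TOOL:` / `ARGS:` reply, might look like the sketch below. It assumes each field fits on a single line and that `ARGS` is a JSON object; the `parse_response` name and the `"action"` argument key in the usage example are hypothetical.

```python
import json
import re

# Matches "THOUGHT: ...", "TOOL: ...", or "ARGS: ..." at the start of a line.
FIELD_RE = re.compile(r"^\s*(THOUGHT|TOOL|ARGS)\s*:\s*(.*)$", re.IGNORECASE)


def parse_response(text: str) -> dict:
    """Extract thought, tool, and args; fall back to empty args on bad JSON."""
    fields = {"thought": "", "tool": "", "args": {}}
    raw_args = ""
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue
        key, value = match.group(1).lower(), match.group(2).strip()
        if key == "args":
            raw_args = value
        else:
            fields[key] = value
    try:
        parsed = json.loads(raw_args) if raw_args else {}
    except json.JSONDecodeError:
        parsed = {}
    fields["args"] = parsed if isinstance(parsed, dict) else {}
    return fields


reply = 'THOUGHT: Check the mailbox.\nTOOL: play_action\nARGS: {"action": "open mailbox"}'
print(parse_response(reply))
```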

---

## Loop Detection

If the agent repeats the same action 3 times:

* It automatically forces a `"look"` action.
* A warning is injected into the prompt.
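
A minimal sketch of this rule follows. It assumes "repeats" means three consecutive identical actions, counting the one the model has just proposed; the function names and the warning text are hypothetical.

```python
def is_looping(actions: list[str], repeat_limit: int = 3) -> bool:
    """True if the last `repeat_limit` actions are all identical."""
    if len(actions) < repeat_limit:
        return False
    tail = actions[-repeat_limit:]
    return all(action == tail[0] for action in tail)


def choose_action(proposed: str, recent_actions: list[str], repeat_limit: int = 3):
    """Return (action, warning); force "look" when a loop is detected."""
    if is_looping(recent_actions + [proposed], repeat_limit):
        warning = (
            f"WARNING: '{proposed}' was chosen {repeat_limit} times in a row. "
            "Try something else."
        )
        return "look", warning
    return proposed, ""
```

The returned warning string can then be appended to the next prompt.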

---

## Tool Validation & Auto-Fixes

The agent corrects:

* Invalid tool names
* Unsupported verbs (e.g., `inspect → examine`)
* Markdown artifacts in responses
* JSON formatting errors
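
The first three fixes can be sketched as small normalisation helpers. The tool names come from the server section above; the alias table, the fallback to `play_action` for unknown tools, and the function names are assumptions made for illustration.

```python
KNOWN_TOOLS = {
    "play_action", "memory", "get_map", "inventory",
    "get_valid_actions", "get_walkthrough", "get_world_objects",
}
# Hypothetical aliases a model might emit instead of the real tool names.
TOOL_ALIASES = {"action": "play_action", "map": "get_map"}
# Only inspect -> examine is named in this README; extend as needed.
VERB_FIXES = {"inspect": "examine"}


def clean_tool_name(raw: str) -> str:
    """Strip markdown debris, resolve aliases, default to play_action."""
    name = raw.strip("`*\"' \t\n").lower()
    name = TOOL_ALIASES.get(name, name)
    return name if name in KNOWN_TOOLS else "play_action"


def fix_verb(command: str) -> str:
    """Replace an unsupported leading verb with a supported one."""
    words = command.strip().lower().split()
    if words and words[0] in VERB_FIXES:
        words[0] = VERB_FIXES[words[0]]
    return " ".join(words)
```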

---

## Score Tracking

Score is extracted from:

* `Score: X`
* `[Score: X | Moves: Y]`
* Case-insensitive regex matching

The agent keeps the maximum observed score.
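
Both formats can be covered by one case-insensitive pattern. The sketch below is an illustration under that assumption; `SCORE_RE` and `update_best_score` are hypothetical names.

```python
import re

# Matches "Score: 10", "score:10", and the score inside "[Score: 10 | Moves: 5]".
SCORE_RE = re.compile(r"score:\s*(-?\d+)", re.IGNORECASE)


def update_best_score(best: int, text: str) -> int:
    """Return the larger of `best` and any score found in `text`."""
    found = [int(value) for value in SCORE_RE.findall(text)]
    return max([best] + found)
```

Keeping the maximum means a later observation with a lower or missing score never lowers the recorded result.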

---

# 🔬 Experiments & Debugging Attempts

## 1️⃣ Fixed Valid Actions

I replaced `env.get_valid_actions()` with a manually defined action set.

* Added movement commands
* Basic verbs
* Context-aware object interactions (lamp, key, mailbox, etc.)

**Result:**

* Did not improve the score; on the contrary, it got worse
* Agent still plateaued

---

## 2️⃣ Debugging `env.get_valid_actions()`

I attempted to use and debug:

```python
env.get_valid_actions()
```

However:

* It consistently failed or returned unusable results
* Therefore, it was not used in the final setup

---

## 3️⃣ Prompt Enrichment with Memory + History

I experimented with:

* Injecting full memory output into the prompt
* Including longer history traces
* Combining map information + memory + past actions

**Issue:**

* The prompt grew large quickly
* Long prompts used the context window inefficiently
* Performance did not noticeably improve
* Inference slowed down because of the longer inputs

Therefore, I reverted to a **lightweight context strategy**:

* Last 3 actions
* Current observation
* Current score
* Loop warning if necessary
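
A prompt builder for this lightweight context could look like the following sketch. The exact wording and ordering of the prompt are assumptions, and `build_prompt` is a hypothetical name.

```python
def build_prompt(observation: str, score: int, recent_actions: list[str],
                 loop_warning: str = "") -> str:
    """Assemble the per-step context: score, last 3 actions, observation."""
    lines = [f"Current score: {score}"]
    last_three = recent_actions[-3:]
    if last_three:
        lines.append("Recent actions: " + ", ".join(last_three))
    lines.append("Observation:")
    lines.append(observation.strip())
    if loop_warning:
        # Only present when loop detection has fired.
        lines.append(loop_warning)
    return "\n".join(lines)


print(build_prompt("West of House", 5, ["north", "east", "south", "west"]))
```

Because only the last three actions are included, the prompt size stays roughly constant no matter how long the episode runs.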

---

# 📊 Current Performance Characteristics

* Explores systematically
* Picks up obvious items (e.g., the lamp) and handles simple interactions (e.g., the mailbox)
* Avoids simple loops
* Tracks visited locations
* Maintains structured reasoning

However:

* No long-horizon planning memory
* No true valid-action constraint from the environment