DevZoneX committed
Commit 57900f7 · 1 Parent(s): 7ec4d32

Final Commit

Files changed (2):
  1. README.md +263 -0
  2. mcp_server.py +92 -0
README.md CHANGED
@@ -57,3 +57,266 @@ python run_agent.py --agent . --game lostpig -v -n 20
  # Run evaluation
  python -m evaluation.evaluate -s . -g lostpig -t 3
  ```

---

# 🧠 MCP ReAct Agent for Text Adventure Games

This project implements a complete **MCP-based ReAct agent** that plays classic text adventure games (e.g., `zork1`) using a tool-driven architecture.

It consists of:

* An **MCP server** exposing the game environment as structured tools
* A **ReAct-style LLM agent** that reasons and acts via those tools
* Loop detection, score tracking, and structured parsing
* Experimental improvements and debugging attempts

---

# 📦 Project Structure

## 1️⃣ MCP Server (`mcp_server.py`)

Built using `FastMCP`, this server wraps a `TextAdventureEnv` and exposes game functionality as callable tools.

### Core Features

#### 🎮 Game State Management

The `GameState` class manages:

* Current environment state
* Score and move tracking
* Action history (last 50 steps)
* Explored locations (map tracking)
* Inventory parsing
* Location extraction from observations
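
The tracked fields could be sketched roughly as follows. This is a minimal illustration, not the exact class in `mcp_server.py`; the field names and the `record` helper are assumptions.

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class GameState:
    """Illustrative sketch of the per-game state the server tracks."""
    observation: str = ""   # latest game text (current environment state)
    score: int = 0          # best score observed so far
    moves: int = 0          # move counter
    # Bounded action history: deque(maxlen=50) keeps only the last 50 steps.
    history: deque = field(default_factory=lambda: deque(maxlen=50))
    visited: set = field(default_factory=set)  # explored locations for map tracking

    def record(self, action: str, observation: str) -> None:
        # Append the action, bump the move count, and remember the new observation.
        self.history.append(action)
        self.moves += 1
        self.observation = observation
```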
---

## 🛠️ Exposed MCP Tools

The server provides the following tools:

### `play_action`

Executes a game command (e.g., `north`, `take lamp`, `open mailbox`).

Returns:

* Game observation
* Score updates
* Move count
* Game-over notice

---

### `memory`

Returns a structured summary of:

* Current location
* Score
* Moves
* Recent actions
* Current observation

This helps the agent reason about the current state.

---

### `get_map`

Displays explored locations and directional transitions discovered so far.

---

### `inventory`

Returns cleaned inventory information, parsing object strings from Jericho.

---

### `get_valid_actions`

A fallback tool that returns a **fixed list of possible actions** plus context-aware object interactions based on keywords in the observation.

Note:

* `env.get_valid_actions()` was tested and debugged.
* It **did not work reliably** in this setup.
* Therefore, I implemented a **manually defined valid action set**.
* However, using the fixed valid actions **did not improve the score**.

---

### `get_walkthrough`

Returns the official Jericho walkthrough (not used in `agent.py`).

---

### `get_world_objects`

Returns all known world objects from Jericho.

---

# 🤖 ReAct Agent (`agent.py`)

The agent is a complete ReAct implementation using:

* Thought → Tool → Observation loop
* Structured output parsing
* Loop detection
* Score extraction
* Action validation

It uses:

```
Qwen/Qwen2.5-72B-Instruct
```

via the Hugging Face Inference API.

---

# Agent Architecture

## ReAct Loop

At each step:

1. Build the prompt with:
   * Current score
   * Recent actions
   * Current observation
2. Call the LLM
3. Parse the structured response:

   ```
   THOUGHT:
   TOOL:
   ARGS:
   ```
4. Validate the tool call
5. Execute the tool via MCP
6. Update:
   * Score
   * History
   * Visited locations
7. Detect loops

---

## Loop Detection

If the agent repeats the same action 3 times:

* It automatically forces a `"look"` action.
* A warning is injected into the prompt.
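
The check can be sketched with a small helper. `LoopDetector` is a hypothetical name, not the exact code in `agent.py`.

```python
from collections import deque

class LoopDetector:
    """Force a 'look' when the same action repeats three times in a row (sketch)."""
    def __init__(self, limit: int = 3):
        self.recent = deque(maxlen=limit)

    def next_action(self, proposed: str) -> tuple:
        # Returns (action to execute, whether a loop warning should be injected).
        self.recent.append(proposed)
        if len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1:
            self.recent.clear()  # reset so we don't immediately re-trigger
            return ("look", True)
        return (proposed, False)
```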

---

## Tool Validation & Auto-Fixes

The agent corrects:

* Invalid tool names
* Unsupported verbs (e.g., `inspect → examine`)
* Markdown artifacts in responses
* JSON formatting errors
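
The verb auto-fix can be sketched as a substitution table. The `inspect → examine` mapping comes from the list above; the other entries and the helper name are illustrative assumptions.

```python
# Hypothetical mapping; only "inspect" -> "examine" is documented above.
VERB_FIXES = {"inspect": "examine", "grab": "take"}

def fix_action(action: str) -> str:
    """Replace unsupported verbs and strip markdown artifacts (sketch)."""
    action = action.strip().strip("`").strip()  # drop stray backticks from LLM output
    for bad, good in VERB_FIXES.items():
        if action == bad or action.startswith(bad + " "):
            return good + action[len(bad):]
    return action
```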

---

## Score Tracking

The score is extracted with case-insensitive regex matching from patterns such as:

* `Score: X`
* `[Score: X | Moves: Y]`

The agent keeps the maximum observed score.
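
Both patterns above can be covered by a single case-insensitive expression. This is a sketch; the exact regex in `agent.py` may differ.

```python
import re

# Matches "Score: 15" as well as "[Score: 15 | Moves: 30]", case-insensitively.
SCORE_RE = re.compile(r"score:\s*(-?\d+)", re.IGNORECASE)

def extract_score(text: str, best_so_far: int = 0) -> int:
    """Return the maximum of the best score so far and any score found in text."""
    m = SCORE_RE.search(text)
    if m:
        return max(best_so_far, int(m.group(1)))
    return best_so_far
```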

---

# 🔬 Experiments & Debugging Attempts

## 1️⃣ Fixed Valid Actions

I replaced `env.get_valid_actions()` with a manually defined action set:

* Movement commands
* Basic verbs
* Context-aware object interactions (lamp, key, mailbox, etc.)

**Result:**

* Did not improve the score (on the contrary, it became worse)
* The agent still plateaued

---

## 2️⃣ Debugging `env.get_valid_actions()`

I attempted to use and debug:

```python
env.get_valid_actions()
```

However:

* It consistently failed or returned unusable results
* Therefore, it was not used in the final setup

---

## 3️⃣ Prompt Enrichment with Memory + History

I experimented with:

* Injecting the full memory output into the prompt
* Including longer history traces
* Combining map information, memory, and past actions

**Issue:**

* The prompt grew very large quickly
* Context-length usage became inefficient
* No noticeable improvement in performance
* Slower inference due to longer inputs

Therefore, I reverted to a **lightweight context strategy**:

* Last 3 actions
* Current observation
* Current score
* Loop warning if necessary
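
The lightweight context could be assembled along these lines. The function name, field wording, and warning text are illustrative assumptions, not the exact prompt in `agent.py`.

```python
def build_prompt(observation: str, score: int, recent_actions: list,
                 loop_warning: bool = False) -> str:
    """Assemble the minimal per-step context sent to the LLM (sketch)."""
    parts = [
        f"Score: {score}",
        # Only the last 3 actions are included, keeping the prompt small.
        "Recent actions: " + (", ".join(recent_actions[-3:]) or "none"),
        f"Observation: {observation}",
    ]
    if loop_warning:
        parts.append("WARNING: you are repeating the same action. Try something new.")
    parts.append("Respond with:\nTHOUGHT:\nTOOL:\nARGS:")
    return "\n".join(parts)
```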

---

# 📊 Current Performance Characteristics

* The agent explores systematically
* Picks up obvious items (lamp, mailbox interactions, etc.)
* Avoids simple loops
* Tracks visited locations
* Maintains structured reasoning

However:

* No planning memory across long horizons
* No true valid-action constraint from the environment
mcp_server.py CHANGED
@@ -187,6 +187,98 @@ def inventory() -> str:
  """
  return get_game().get_inventory()

@mcp.tool()
def get_valid_actions() -> str:
    """
    Return a list of valid actions the agent can take.
    Avoids calling env.get_valid_actions(): I have tested it, but it did not
    work at all, so a fixed set of valid actions is used instead.
    """
    game = get_game()

    if not game.env:
        return "Game environment not initialized."

    # Standard movement & basic verbs
    actions = [
        "north", "south", "east", "west",
        "up", "down", "enter", "exit",
        "look", "inventory", "take all",
        "open mailbox", "read", "turn on lamp"
    ]

    # Optionally, add objects in the current observation
    obs = game.state.observation.lower()
    objects = []
    for word in ["lamp", "key", "mailbox", "sword", "coin"]:
        if word in obs:
            objects.append(f"take {word}")
            objects.append(f"examine {word}")
            objects.append(f"open {word}")

    actions.extend(objects)

    return ", ".join(sorted(set(actions)))


@mcp.tool()
def get_walkthrough() -> str:
    """
    Get the official Jericho walkthrough for the current game.
    THIS TOOL IS NOT USED IN AGENT.PY

    Returns:
        A step-by-step optimal solution path.
    """
    game = get_game()

    if not game.env or not game.env.env:
        return "Game environment not initialized."

    try:
        walkthrough = game.env.env.get_walkthrough()
    except Exception as e:
        return f"Could not retrieve walkthrough: {e}"

    if not walkthrough:
        return "No walkthrough available for this game."

    output = ["Official Walkthrough:\n"]

    for i, action in enumerate(walkthrough, 1):
        output.append(f"{i}. {action}")

    return "\n".join(output)


@mcp.tool()
def get_world_objects() -> str:
    """
    Get all known objects in the game world (from Jericho).

    Returns:
        A list of objects and their locations.
    """
    game = get_game()

    if not game.env or not game.env.env:
        return "Game environment not initialized."

    try:
        objects = game.env.env.get_world_objects()
    except Exception as e:
        return f"Could not retrieve world objects: {e}"

    if not objects:
        return "No world objects found."

    output = ["World Objects:\n"]

    for obj in objects:
        if isinstance(obj, dict):
            name = obj.get("name", "Unknown")
            loc = obj.get("location", "Unknown")
            output.append(f"- {name} (Location: {loc})")
        else:
            output.append(f"- {str(obj)}")

    return "\n".join(output)

  # =============================================================================
  # Main