Ryn11H committed on
Commit 3b082d0 · 1 Parent(s): 615a63b

Final submission
.ipynb_checkpoints/README-checkpoint.md ADDED
---
title: Text Adventure Agent Submission
emoji: "\U0001F5FA"
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.12.0"
app_file: app.py
pinned: false
license: mit
---

# Text Adventure Agent Submission

## Overview

This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

## Approach

## My Report: an MCP-Based Text Adventure Agent

**Structured State Design, Guarded ReAct Reasoning, and Stability Improvements**

This project implements a fully functional MCP (Model Context Protocol) server and an LLM-driven ReAct agent for text adventure games. While a baseline was provided, this submission significantly extends and stabilizes that template by redesigning state exposure, improving tool structure, and introducing multiple guardrails against common LLM failure modes.

The primary focus of this work was not brute-force performance tuning, but architectural improvement, robustness, and reasoning stability.

---

## 1. MCP Server Improvements

The original template exposed minimal game interaction. I redesigned the MCP server to provide structured, reliable, and LLM-friendly state representations.

### 1.1 Robust Location Extraction

Instead of relying solely on the first line of the observation, the server now:

- Filters out status-like lines (score, moves, headers, bracketed text)
- Detects likely room titles heuristically
- Falls back gracefully when uncertain

This improves compatibility across different text adventure engines.
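The heuristic can be sketched roughly as follows (an illustrative simplification, not the submission's exact code; the function name and patterns are assumptions):

```python
import re

# Lines like "[Score: 10  Moves: 3]" or "Score: 10" are status bars, not room titles
STATUS_RE = re.compile(r"(score|moves)\s*[:=]?\s*\d+", re.IGNORECASE)

def extract_location(observation: str) -> str:
    """Best-effort room-title extraction from a raw observation (illustrative)."""
    for line in observation.splitlines():
        line = line.strip()
        if not line:
            continue
        # Skip status bars, bracketed headers, and prompt characters
        if STATUS_RE.search(line) or line.startswith(("[", ">")):
            continue
        # Room titles tend to be short and start with a capital letter
        if len(line.split()) <= 6 and line[0].isupper():
            return line
    return "Unknown"  # graceful fallback when nothing looks like a title
```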

---

### 1.2 Structured Memory Output

The `memory()` tool was redesigned to provide:

- Current game
- Location
- Score and moves
- Extracted visible objects (best-effort heuristics)
- Mentioned exits
- Recent action history
- Full current observation

This structured format reduces hallucination and anchors the LLM in grounded state information. It transforms raw narrative text into usable reasoning signals.
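A minimal sketch of how such a snapshot could be rendered (field names and section headers are assumptions; the real server's format may differ):

```python
def format_memory(state: dict) -> str:
    """Render a structured state snapshot for the LLM (illustrative sketch)."""
    lines = [
        "=== STATE ===",
        f"Game: {state.get('game', 'unknown')}",
        f"Location: {state.get('location', 'Unknown')}",
        f"Score: {state.get('score', 0)}  Moves: {state.get('moves', 0)}",
        f"Objects: {', '.join(state.get('objects', [])) or 'none seen'}",
        f"Exits: {', '.join(state.get('exits', [])) or 'none mentioned'}",
        "=== RECENT ===",
        *[f"> {a}" for a in state.get('recent_actions', [])[-5:]],
        "=== OBSERVATION ===",
        state.get('observation', ''),
    ]
    return "\n".join(lines)
```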

---

### 1.3 Intelligent Map Construction

Movement tracking is no longer naive. A move is recorded only if:

- The location actually changes, and
- The observation does not contain known movement-failure phrases.

This prevents corrupt map edges and keeps spatial reasoning reliable.

The resulting `get_map()` tool exposes clean directional transitions without noise from failed attempts.
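The two conditions above can be sketched like this (failure phrases and function name are illustrative assumptions):

```python
# Typical parser responses to a failed move (illustrative list)
FAILURE_PHRASES = ("you can't go that way", "there is a wall", "you can't go there")

def record_move(game_map: dict, old_loc: str, direction: str,
                new_loc: str, observation: str) -> bool:
    """Record a map edge only when the move demonstrably succeeded."""
    obs = observation.lower()
    if new_loc == old_loc or any(p in obs for p in FAILURE_PHRASES):
        return False  # failed move: keep the map free of corrupt edges
    game_map.setdefault(old_loc, {})[direction] = new_loc
    return True
```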

---

### 1.4 Robust Inventory Handling

Inventory retrieval now:

- Uses structured state inventory when available
- Falls back to issuing the `inventory` command
- Cleans and normalizes item strings

This ensures cross-game compatibility.
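The string cleanup step might look like the following (a heuristic sketch; the exact filters are assumptions):

```python
def clean_inventory(raw: str) -> list[str]:
    """Normalize an 'inventory' response into plain item strings (heuristic)."""
    items = []
    for line in raw.splitlines():
        line = line.strip().lstrip("-* ").rstrip(".")
        if not line or line.lower().startswith(("you are carrying", "inventory")):
            continue  # skip headers, keep only item lines
        # Drop leading articles so items compare consistently across games
        words = line.lower().split()
        if words and words[0] in ("a", "an", "the", "some"):
            words = words[1:]
        items.append(" ".join(words))
    return items
```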

---

## 2. Agent-Side Stability and Reasoning Enhancements

The ReAct loop was significantly extended to address common LLM failure modes.

---

### 2.1 Context Refresh Strategy

The agent periodically refreshes:

- `memory()` (state grounding)
- `inventory()` (after item acquisition)
- `get_map()` (navigation support)

This improves decision consistency without consuming extra game moves.

---

### 2.2 Action Validation and Normalization

Before execution:

- Tool names are validated
- Invalid verbs are mapped to supported equivalents
- Formatting noise is removed
- Actions are normalized to a consistent lower-case grammar

This dramatically reduces invalid command generation.
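The normalization steps above can be sketched in a few lines (the verb map mirrors the one in `agent.py`; the function name is illustrative):

```python
# Unsupported verbs remapped to parser-friendly equivalents
VERB_MAP = {"check": "examine", "inspect": "examine", "grab": "take", "pick": "take"}

def normalize_action(raw: str) -> str:
    """Strip markdown noise, lower-case, collapse whitespace, remap verbs."""
    action = raw.replace("**", "").replace("*", "").replace("`", "")
    words = action.lower().split()
    if words and words[0] in VERB_MAP:
        words[0] = VERB_MAP[words[0]]
    return " ".join(words)
```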

---

### 2.3 Multi-Layer Anti-Loop Mechanisms

Several defensive layers were introduced:

#### (A) Action Repetition Guard
If the same action appears three times consecutively, the agent forces a reset (`look`).

#### (B) Location-Aware Movement Failure Blocking
Movement attempts are tracked per `(location, direction)` pair. If a direction fails multiple times from the same location, it is blocked.

#### (C) Thought + Action + Location Blocking
A normalized thought signature is computed. If the same thought leads to the same action in the same location more than once, the agent is forced to change strategy (a memory/map call).

This addresses the subtle ReAct issue where the reasoning itself becomes cyclic.
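The three layers can be condensed into a toy guard (illustrative class and thresholds; the real agent spreads this logic across its ReAct loop):

```python
from collections import Counter

class LoopGuard:
    """Toy version of the three anti-loop layers (A/B/C) described above."""

    def __init__(self):
        self.recent: list[str] = []
        self.move_failures: Counter = Counter()   # (location, direction) -> fail count
        self.thought_sigs: Counter = Counter()    # (location, action, thought sig) -> count

    def check(self, location: str, action: str, thought: str):
        """Return a replacement action when a layer fires, else None."""
        self.recent = (self.recent + [action])[-3:]
        if len(self.recent) == 3 and len(set(self.recent)) == 1:
            return "look"                          # (A) same action three times in a row
        if self.move_failures[(location, action)] >= 2:
            return "look"                          # (B) blocked repeatedly-failed move
        sig = (location, action, " ".join(thought.lower().split()[:12]))
        self.thought_sigs[sig] += 1
        if self.thought_sigs[sig] >= 2:
            return "memory"                        # (C) cyclic thought+action+location
        return None
```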

---

### 2.4 Controlled Movement Policy

The agent avoids random wandering by:

- Encouraging local interaction before movement
- Prioritizing dominant objects in the observation
- Blocking repeated failed transitions

This reduces wasted exploration steps.

---

## 3. Design Philosophy

The key improvements are architectural rather than game-specific:

- Clear separation between the environment (MCP server) and reasoning (LLM agent)
- Structured state exposure instead of raw narrative text
- Defensive programming against repetition and invalid behavior
- Heuristic generalization instead of hardcoded walkthrough logic

The system is modular, interpretable, and extensible.

---

## 4. Conclusion

Compared to the baseline template, this implementation introduces:

- Structured memory representation
- Robust location extraction
- Intelligent map tracking
- Inventory normalization
- Multi-layer loop prevention
- Location-aware movement validation
- Thought-action repetition blocking
- A controlled exploration policy

The result is a significantly more stable, grounded, and architecturally improved MCP-based text adventure agent.

## Files

| File | Description |
|------|-------------|
| `agent.py` | ReAct agent with `StudentAgent` class |
| `mcp_server.py` | MCP server with game interaction tools |
| `app.py` | Gradio interface for the HF Space |
| `requirements.txt` | Additional dependencies |

## How to Submit

1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
2. Clone your fork locally
3. Implement your agent in `agent.py` and `mcp_server.py`
4. Test locally (see below)
5. Push your changes to your Space
6. Submit your Space URL on the course platform

## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```
.ipynb_checkpoints/agent-checkpoint.py ADDED
"""
MCP ReAct Agent (adapted for your MCP server)

Key upgrades:
- Actually calls memory/get_map/inventory periodically (doesn't cost "moves")
- Injects those outputs into the LLM prompt (LLM-friendly context)
- Updates score from BOTH play_action output and memory output
- Keeps loop detection + action normalization
"""

import json
import os
import re
from dataclasses import dataclass, field
from typing import Optional

from dotenv import load_dotenv
from huggingface_hub import InferenceClient

load_dotenv()

# =============================================================================
# LLM Configuration - DO NOT MODIFY
# =============================================================================

LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"

_hf_token = os.getenv("HF_TOKEN")
if not _hf_token:
    raise ValueError("HF_TOKEN not found. Set it in your .env file.")

LLM_CLIENT = InferenceClient(token=_hf_token)


def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
    """Call the LLM with the given prompt."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]

    response = LLM_CLIENT.chat.completions.create(
        model=LLM_MODEL,
        messages=messages,
        temperature=0.0,
        max_tokens=max_tokens,
        seed=seed,
    )

    return response.choices[0].message.content


@dataclass
class RunResult:
    """Result of running the agent. Do not modify this class."""
    final_score: int
    max_score: int
    moves: int
    locations_visited: set[str]
    game_completed: bool
    error: Optional[str] = None
    history: list[tuple[str, str, str]] = field(default_factory=list)


# =============================================================================
# System Prompt
# =============================================================================
SYSTEM_PROMPT = """You are an intelligent text adventure game agent.

Your goal is to solve the main problem of the game efficiently and maximize score within 100 moves.

This game is small and objective-focused. Avoid unnecessary wandering.

AVAILABLE TOOLS (use via MCP):
1. play_action - Execute valid game commands.
2. memory - Get structured summary of current state and recent actions.
3. get_map - See explored locations.
4. inventory - Check carried items.

VALID ACTION STYLE:
Movement:
- north, south, east, west, up, down
- n, s, e, w, u, d

Core actions:
- look
- examine <thing>
- take <item>, drop <item>
- open <thing>, close <thing>
- talk to <character>
- give <item> to <character>
- use specific verbs mentioned in observation

AVOID:
- generic verbs like "use"
- random movement without purpose
- repeating failed actions

--------------------------------------------------
CORE STRATEGY (IMPORTANT)
--------------------------------------------------

1) DOMINANT OBJECT RULE (VERY IMPORTANT):
If a specific object or character is repeatedly mentioned in the observation,
treat it as the main objective.

Do NOT leave the area until you:
- examine it
- try multiple meaningful interactions
- or confirm no new interaction is possible

Stay focused before exploring elsewhere.

2) PROBLEM-SOLVING PRIORITY:
If the game clearly revolves around one main goal,
prioritize actions that directly affect that goal instead of exploring new rooms.

3) CONTROLLED MOVEMENT:
Only move if:
- you have exhausted interactions in the current room
- or memory/map suggests a new unexplored path is necessary

4) LIMITED RETRIES:
If an action fails once, try a different verb.
Do NOT repeat the same failed action more than once.

5) OBJECT TRANSFORMATION FOCUS:
If an object seems central, try actions that might change its state:
- examine
- open
- give something
- use appropriate verbs mentioned in text
- interact from different angles

--------------------------------------------------
TOOL USAGE RULES
--------------------------------------------------

- Use memory() when uncertain or before repeating behavior.
- Use get_map() only if navigation becomes necessary.
- Use inventory() after obtaining items.

--------------------------------------------------
OUTPUT FORMAT (STRICT)
--------------------------------------------------

THOUGHT: <brief reasoning>
TOOL: <tool_name>
ARGS: <JSON arguments>

Keep THOUGHT short (1-2 sentences).
Do not repeat the same action multiple times.
Prefer solving over wandering.
"""

# =============================================================================
# Student Agent Implementation
# =============================================================================
class StudentAgent:
    """
    MCP ReAct Agent adapted to your MCP server outputs:
    - memory() returns STATE / RECENT / OBSERVATION
    - get_map() returns MAP ...
    - inventory() returns INVENTORY ...
    """

    def __init__(self):
        self.history: list[dict] = []
        self.recent_actions: list[str] = []
        self.score: int = 0

        # Cached tool outputs
        self.last_memory: str = ""
        self.last_map: str = ""
        self.last_inventory: str = ""
        self.last_observation: str = ""

        # Exploration / anti-loop state
        self.visit_counts: dict[str, int] = {}
        self.loc_move_failures: dict[tuple[str, str], int] = {}
        self.pending_move: Optional[tuple[str, str]] = None

        # NEW: prevent repeating same thought+action at same location
        self.loc_action_thought_counts: dict[tuple[str, str, str], int] = {}

    # ------------------------------------------------------------
    # Thought normalization helper
    # ------------------------------------------------------------
    def _thought_sig(self, thought: str) -> str:
        t = (thought or "").lower()
        t = re.sub(r"[^a-z0-9\s]", " ", t)
        t = re.sub(r"\s+", " ", t).strip()
        return " ".join(t.split()[:12])

    async def run(
        self,
        client,
        game: str,
        max_steps: int,
        seed: int,
        verbose: bool = False,
    ) -> RunResult:

        locations_visited = set()
        history = []
        moves = 0

        MOVE_CMDS = {"north", "south", "east", "west", "up", "down",
                     "enter", "exit", "n", "s", "e", "w", "u", "d"}

        # Available tools
        tools = await client.list_tools()
        tool_names = [t.name for t in tools]

        # Initial observation
        result = await client.call_tool("play_action", {"action": "look"})
        observation = self._extract_result(result)
        self.last_observation = observation

        location = observation.split("\n")[0] if observation else "Unknown"
        locations_visited.add(location)
        self.visit_counts[location] = self.visit_counts.get(location, 0) + 1

        # Prime context (no moves)
        if "memory" in tool_names:
            self.last_memory = self._extract_result(await client.call_tool("memory", {}))
            self._update_score(self.last_memory)

        if "inventory" in tool_names:
            self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))

        if verbose:
            print(f"\n{observation}")

        for step in range(1, max_steps + 1):
            await self._refresh_context_tools(client, tool_names, step, verbose)

            prompt = self._build_prompt()
            response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
            thought, tool_name, tool_args = self._parse_response(response, tool_names)

            if verbose:
                print(f"\n--- Step {step} ---")
                print(f"[THOUGHT] {thought}")
                print(f"[TOOL] {tool_name}({tool_args})")

            tool_name, tool_args = self._validate_tool_call(tool_name, tool_args, tool_names)

            # ------------------------------------------------------------
            # Block SAME (location + action + thought)
            # ------------------------------------------------------------
            if tool_name == "play_action":
                current_loc = (
                    self.last_observation.split("\n")[0].strip()
                    if self.last_observation else "Unknown"
                )
                action_norm = tool_args.get("action", "look").strip().lower()
                t_sig = self._thought_sig(thought)

                triple = (current_loc, action_norm, t_sig)
                self.loc_action_thought_counts[triple] = (
                    self.loc_action_thought_counts.get(triple, 0) + 1
                )

                if self.loc_action_thought_counts[triple] >= 2:
                    if verbose:
                        print(f"[ANTI-REPEAT] Blocking repeated thought+action at '{current_loc}'")
                    if "get_map" in tool_names:
                        tool_name, tool_args = "get_map", {}
                    elif "memory" in tool_names:
                        tool_name, tool_args = "memory", {}
                    else:
                        tool_name, tool_args = "play_action", {"action": "look"}

            # ------------------------------------------------------------
            # Loop detection (same action spam)
            # ------------------------------------------------------------
            if tool_name == "play_action":
                action = tool_args.get("action", "look")
                self.recent_actions.append(action)
                if len(self.recent_actions) > 5:
                    self.recent_actions = self.recent_actions[-5:]

                if len(self.recent_actions) >= 3 and len(set(self.recent_actions[-3:])) == 1:
                    if verbose:
                        print("[WARNING] Loop detected - forcing 'look'")
                    tool_args = {"action": "look"}

            # ------------------------------------------------------------
            # Anti-backtracking: block only FAILED moves
            # ------------------------------------------------------------
            self.pending_move = None

            if tool_name == "play_action":
                action_norm = tool_args.get("action", "look").strip().lower()

                if action_norm in MOVE_CMDS:
                    current_loc = (
                        self.last_observation.split("\n")[0].strip()
                        if self.last_observation else "Unknown"
                    )
                    key = (current_loc, action_norm)

                    if self.loc_move_failures.get(key, 0) >= 2:
                        if verbose:
                            print(f"[GUARD] Blocking failed move '{action_norm}' from '{current_loc}'")
                        if "get_map" in tool_names:
                            tool_name, tool_args = "get_map", {}
                        elif "memory" in tool_names:
                            tool_name, tool_args = "memory", {}
                        else:
                            tool_name, tool_args = "play_action", {"action": "look"}
                    else:
                        self.pending_move = (current_loc, action_norm)

            # ------------------------------------------------------------
            # Count moves
            # ------------------------------------------------------------
            if tool_name == "play_action":
                moves += 1

            # ------------------------------------------------------------
            # Execute tool
            # ------------------------------------------------------------
            try:
                result = await client.call_tool(tool_name, tool_args)
                out_text = self._extract_result(result)

                if tool_name == "play_action":
                    observation = out_text
                    self.last_observation = observation
                elif tool_name == "memory":
                    self.last_memory = out_text
                elif tool_name == "get_map":
                    self.last_map = out_text
                elif tool_name == "inventory":
                    self.last_inventory = out_text

                if verbose:
                    print(f"[RESULT] {out_text[:200]}...")

            except Exception as e:
                out_text = f"Error: {e}"
                observation = out_text
                self.last_observation = observation
                if verbose:
                    print(f"[ERROR] {e}")

            # ------------------------------------------------------------
            # Post-move update
            # ------------------------------------------------------------
            if tool_name == "play_action":
                new_location = observation.split("\n")[0] if observation else "Unknown"

                if self.pending_move is not None:
                    prev_loc, prev_action = self.pending_move
                    key = (prev_loc, prev_action)

                    if new_location == prev_loc:
                        self.loc_move_failures[key] = self.loc_move_failures.get(key, 0) + 1
                    else:
                        self.loc_move_failures[key] = 0

                    self.pending_move = None

                location = new_location
                locations_visited.add(location)
                self.visit_counts[location] = self.visit_counts.get(location, 0) + 1

                self._update_score(observation)

                if re.search(r"\bTaken\b|\byou are now carrying\b", observation, re.IGNORECASE):
                    if "inventory" in tool_names:
                        self.last_inventory = self._extract_result(
                            await client.call_tool("inventory", {})
                        )

            # ------------------------------------------------------------
            # History
            # ------------------------------------------------------------
            self.history.append({
                "step": step,
                "thought": thought,
                "tool": tool_name,
                "args": tool_args,
                "result": out_text[:200]
            })
            if len(self.history) > 10:
                self.history = self.history[-10:]

            history.append((thought, f"{tool_name}({tool_args})", out_text[:100]))

            if self._is_game_over(observation):
                if verbose:
                    print("\n*** GAME OVER ***")
                break

        return RunResult(
            final_score=self.score,
            max_score=350,
            moves=moves,
            locations_visited=locations_visited,
            game_completed=self._is_game_over(self.last_observation),
            history=history,
        )

    async def _refresh_context_tools(self, client, tool_names: list[str], step: int, verbose: bool) -> None:
        """
        Pull structured context from MCP server without spending moves.
        Tuned to your server outputs:
        - memory() is the best single summary
        - get_map() helps navigation
        - inventory() helps object planning
        """
        # Memory: often (every 4 steps) so LLM doesn't forget state
        if "memory" in tool_names and (step == 1 or step % 4 == 0):
            try:
                self.last_memory = self._extract_result(await client.call_tool("memory", {}))
                self._update_score(self.last_memory)
            except Exception:
                pass

        # Map: occasionally (every 6 steps), and also if we moved a lot recently
        if "get_map" in tool_names and (step % 6 == 0):
            try:
                self.last_map = self._extract_result(await client.call_tool("get_map", {}))
            except Exception:
                pass

        # Inventory: occasionally (every 10 steps)
        if "inventory" in tool_names and (step == 1 or step % 10 == 0):
            try:
                self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))
            except Exception:
                pass

    def _build_prompt(self) -> str:
        """
        Build prompt that is aligned with your MCP server:
        - memory() has STATE/RECENT/OBSERVATION
        - get_map() starts with MAP
        - inventory() starts with INVENTORY
        """
        parts = []
        parts.append(f"Current best-known score: {self.score}")

        # Give the model your server-side memory snapshot (truncate to keep prompt lean)
        if self.last_memory:
            mem = self._truncate(self.last_memory, 1200)
            parts.append("\n=== MEMORY (from MCP server) ===\n" + mem)

        if self.last_inventory:
            inv = self._truncate(self.last_inventory, 400)
            parts.append("\n=== INVENTORY (from MCP server) ===\n" + inv)

        if self.last_map:
            mp = self._truncate(self.last_map, 700)
            parts.append("\n=== MAP (from MCP server) ===\n" + mp)

        # Recent local history (anti-loop)
        if self.history:
            parts.append("\n=== RECENT LOCAL ACTIONS (agent) ===")
            for entry in self.history[-3:]:
                action = entry.get("args", {}).get("action", entry["tool"])
                result_short = entry["result"][:100] + "..." if len(entry["result"]) > 100 else entry["result"]
                parts.append(f" > {action} -> {result_short}")

        if self.recent_actions and len(set(self.recent_actions[-3:])) == 1:
            parts.append(f"\n[WARNING: repeated '{self.recent_actions[-1]}'. Choose a different action.]")

        # Always include the most recent raw observation
        parts.append("\n=== LATEST OBSERVATION (play_action) ===\n" + self._truncate(self.last_observation, 900))
        parts.append("\nWhat do you do next?")

        return "\n".join(parts)

    def _truncate(self, text: str, limit: int) -> str:
        text = text or ""
        if len(text) <= limit:
            return text
        return text[:limit] + "\n...[truncated]"

    def _parse_response(self, response: str, valid_tools: list[str]) -> tuple[str, str, dict]:
        thought = "No reasoning provided"
        tool_name = "play_action"
        tool_args = {"action": "look"}

        lines = response.strip().split("\n")
        for line in lines:
            line_clean = line.strip()
            line_upper = line_clean.upper()

            if line_upper.startswith("THOUGHT:"):
                thought = line_clean.split(":", 1)[1].strip()

            elif line_upper.startswith("TOOL:"):
                raw_tool = line_clean.split(":", 1)[1].strip().lower()
                raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
                raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
                tool_name = raw_tool

            elif line_upper.startswith("ARGS:"):
                args_part = line_clean.split(":", 1)[1].strip()
                if not args_part:
                    tool_args = {}
                    continue
                try:
                    args_part = args_part.replace("'", '"')
                    tool_args = json.loads(args_part)
                except json.JSONDecodeError:
                    match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
                    if match:
                        tool_args = {"action": match.group(1)}
                    else:
                        tool_args = {"action": "look"}

        return thought, tool_name, tool_args

    def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
        if tool_name not in valid_tools:
            if tool_name in ["action", "do", "command"]:
                tool_name = "play_action"
            elif tool_name in ["map", "location"]:
                tool_name = "get_map"
            elif tool_name in ["mem", "state", "status"]:
                tool_name = "memory"
            elif tool_name in ["inv", "items"]:
                tool_name = "inventory"
            else:
                tool_name = "play_action"

        if tool_name == "play_action":
            action = tool_args.get("action", "look")

            invalid_verb_map = {
                "check": "examine",
                "inspect": "examine",
                "search": "look",
                "grab": "take",
                "pick": "take",
                "use": "examine",
                "investigate": "examine",
            }

            words = action.lower().split()
            if words and words[0] in invalid_verb_map:
                words[0] = invalid_verb_map[words[0]]
                action = " ".join(words)

            action = action.lower().strip()
            action = action.replace("**", "").replace("*", "").replace("`", "")
            action = " ".join(action.split())

            tool_args["action"] = action

        return tool_name, tool_args

    def _extract_result(self, result) -> str:
        if hasattr(result, 'content') and result.content:
            return result.content[0].text
        if isinstance(result, list) and result:
            return result[0].text if hasattr(result[0], 'text') else str(result[0])
        return str(result)

    def _update_score(self, text: str) -> None:
        patterns = [
            r'\[Score:\s*(\d+)',
            r'Score:\s*(\d+)\b',
        ]
        for pattern in patterns:
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                self.score = max(self.score, int(match.group(1)))

    def _is_game_over(self, text: str) -> bool:
        game_over_phrases = [
            "game over",
            "you have died",
            "you are dead",
            "*** you have died ***",
        ]
        text_lower = (text or "").lower()
        return any(phrase in text_lower for phrase in game_over_phrases)


# =============================================================================
# Local Testing
# =============================================================================

async def test_agent():
    from fastmcp import Client

    agent = StudentAgent()

    async with Client("mcp_server.py") as client:
        result = await agent.run(
            client=client,
            game="zork1",
            max_steps=20,
            seed=42,
            verbose=True,
        )

    print(f"\n{'=' * 50}")
    print(f"Final Score: {result.final_score}")
    print(f"Moves: {result.moves}")
    print(f"Locations: {len(result.locations_visited)}")


if __name__ == "__main__":
    import asyncio
    asyncio.run(test_agent())
.ipynb_checkpoints/app-checkpoint.py ADDED
"""
Hugging Face Space - Text Adventure Agent Submission

This is a code-only Space for submitting your agent implementation.
The evaluation is run separately.

Files in this submission:
- agent.py: Your ReAct agent implementation
- mcp_server.py: Your MCP server implementation
- requirements.txt: Additional dependencies

To test locally:
    fastmcp dev mcp_server.py
    python agent.py
"""

import gradio as gr
from pathlib import Path

# Create the Gradio interface
with gr.Blocks(title="Text Adventure Agent Submission") as demo:
    gr.Markdown("# Text Adventure Agent Submission")
    gr.Markdown(
        "This Space contains a template submission for the Text Adventure Agent assignment. "
    )

    gr.Markdown(
        "---\n"
        "**Note:** This is a code submission Space. "
        "Evaluation is performed using the evaluation script.\n\n"
        "[Back to main assignment page](https://huggingface.co/spaces/LLM-course/Agentic-zork)"
    )


if __name__ == "__main__":
    demo.launch()
.ipynb_checkpoints/mcp_server-checkpoint.py ADDED
1
+ """
2
+ Student MCP Server for Text Adventure Games
3
+
4
+ This is your MCP server submission. Implement the tools that your agent
5
+ will use to play text adventure games.
6
+
7
+ Required tool:
8
+ play_action(action: str) -> str
9
+ Execute a game command and return the result.
10
+
11
+ Recommended tools:
12
+ memory() -> str
13
+ Return current game state, score, and recent history.
14
+
15
+ inventory() -> str
16
+ Return the player's current inventory.
17
+
18
+ get_map() -> str
19
+ Return a map of explored locations.
20
+
21
+ Test your server with:
22
+ fastmcp dev submission_template/mcp_server.py
23
+
24
+ Then open the MCP Inspector in your browser to test the tools interactively.
25
+ """
26
+
27
+ import sys
28
+ import os
29
+
30
+ # Add parent directory to path to import games module
31
+ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
32
+
33
+ from fastmcp import FastMCP
34
+ from games.zork_env import TextAdventureEnv
35
+
36
+
37
+ # =============================================================================
38
+ # Create the MCP Server
39
+ # =============================================================================
40
+
41
+ mcp = FastMCP("Student Text Adventure Server")
42
+
43
+
44
+ # =============================================================================
45
+ # Game State Management
46
+ # =============================================================================
47
+
48
+ import re
49
+ from typing import Optional
50
+
51
+ class GameManager:
52
+ """
53
+ Manages the text adventure game state.
54
+
55
+ Extended tracking:
56
+ - Action history (for memory tool)
57
+ - Explored locations (for mapping)
58
+ - Current score and moves
59
+ - Current location (best-effort, robust across games)
60
+ """
61
+
62
+ # Lines that are often NOT room titles across many IF games
63
+ _HEADER_LIKE_PATTERNS = [
64
+ r"^\s*score\s*[:=]\s*\d+",
65
+ r"^\s*moves?\s*[:=]\s*\d+",
66
+ r"^\s*turns?\s*[:=]\s*\d+",
67
+ r"^\s*time\s*[:=]\s*",
68
+ r"^\s*health\s*[:=]\s*\d+",
69
+ r"^\s*location\s*[:=]\s*",
70
+ r"^\s*\[.*\]\s*$", # bracket-only status lines
71
+ r"^\s*\(.*\)\s*$", # parenthetical-only lines
72
+ r"^\s*you\s+(are|see|can)\b", # narrative sentence starters
73
+ ]
74
+ # Movement commands we consider for mapping (Zork-style + abbreviations)
75
+ _MOVE_CMDS = {
76
+ "north", "south", "east", "west", "up", "down", "enter", "exit",
77
+ "n", "s", "e", "w", "u", "d"
78
+ }
79
+
80
+ # Common failure phrases when trying to move (best-effort, not perfect)
81
+ _MOVE_FAIL_PHRASES = [
82
+ "you can't go", "you cannot go", "can't go that way", "cannot go that way",
83
+ "you can't go that way", "you cannot go that way",
84
+ "you can't", "you cannot",
85
+ "there is no way", "you can't see any way", "you see no way",
86
+ "blocked", "closed", "won't open", "is locked", "locked",
87
+ "too dark", "pitch black"
88
+ ]
89
+
90
+ def _is_movement_action(self, action: str) -> bool:
91
+ """Return True if this action is a movement command we track."""
92
+ a = (action or "").strip().lower()
93
+ return a in self._MOVE_CMDS
94
+
95
+ def _move_likely_succeeded(self, old_loc: str, new_loc: str, observation: str) -> bool:
96
+ """
97
+ Decide whether a move likely succeeded.
98
+ Strong signal: location label changed.
99
+ Negative signal: failure phrases in observation.
100
+ """
101
+ if new_loc and old_loc and new_loc != old_loc:
102
+ return True
103
+
104
+ text = (observation or "").lower()
105
+ if any(phrase in text for phrase in self._MOVE_FAIL_PHRASES):
106
+ return False
107
+
108
+ # If location didn't change and no clear failure phrase, treat as "not sure" → don't add edge
109
+ return False
110
+
111
+ def _update_map(self, action: str, old_loc: str, new_loc: str) -> None:
112
+ """Record a directed edge old_loc --action--> new_loc in explored_locations."""
113
+ if not old_loc or not new_loc:
114
+ return
115
+ self.explored_locations.setdefault(old_loc, set()).add(f"{action} -> {new_loc}")
116
+
117
+
118
+ def __init__(self):
119
+ self.env: TextAdventureEnv = None
120
+ self.state = None
121
+ self.game_name: str = ""
122
+
123
+ # Tracking for agent-support tools
124
+ self.history: list[tuple[str, str]] = []
125
+ self.explored_locations: dict[str, set[str]] = {}
126
+ self.current_location: str = "Unknown"
127
+
128
+ def initialize(self, game: str = "zork1"):
129
+ """Initialize or reset the game."""
130
+ self.game_name = game
131
+ self.env = TextAdventureEnv(game)
132
+ self.state = self.env.reset()
133
+
134
+ # Reset tracking
135
+ self.history = []
136
+ self.explored_locations = {}
137
+ self.current_location = self._extract_location(self.state.observation, fallback="Unknown")
138
+
139
+ return self.state.observation
140
+
141
+ def _extract_location(self, observation: str, fallback: Optional[str] = None) -> str:
142
+ """
143
+ Best-effort location extraction from the observation text.
144
+
145
+ Strategy:
146
+ 1) Split into lines, skip empties
147
+ 2) Skip lines that look like status bars / headers / pure brackets
148
+ 3) Prefer a short, title-like line (room name)
149
+ 4) If nothing confident, return fallback (usually previous location)
150
+ """
151
+ if not observation:
152
+ return fallback or "Unknown"
153
+
154
+ lines = [ln.strip() for ln in observation.splitlines() if ln.strip()]
155
+ if not lines:
156
+ return fallback or "Unknown"
157
+
158
+ header_res = [re.compile(pat, re.IGNORECASE) for pat in self._HEADER_LIKE_PATTERNS]
159
+
160
+ def looks_like_header(line: str) -> bool:
161
+ return any(rx.search(line) for rx in header_res)
162
+
163
+ def looks_like_title(line: str) -> bool:
164
+ # Many room titles are short and not ending with punctuation.
165
+ if len(line) > 60:
166
+ return False
167
+ if line.endswith((".", "!", "?", ";", ":")):
168
+ return False
169
+ # Too many digits usually means a status line.
170
+ if sum(ch.isdigit() for ch in line) >= 3:
171
+ return False
172
+ return True
173
+
174
+ # First pass: first "title-like" line that isn't header-like
175
+ for line in lines[:8]: # only inspect top chunk; titles are usually early
176
+ if looks_like_header(line):
177
+ continue
178
+ if looks_like_title(line):
179
+ return line
180
+
181
+ # Second pass: first non-header line
182
+ for line in lines[:8]:
183
+ if not looks_like_header(line):
184
+ return line
185
+
186
+ return fallback or "Unknown"
187
+
188
+ def step(self, action: str) -> str:
189
+ """Execute an action and return the result."""
190
+ if self.env is None:
191
+ self.initialize()
192
+
193
+ # Save old location before action
194
+ old_location = self.current_location
195
+
196
+ # Apply action to the real game
197
+ self.state = self.env.step(action)
198
+ obs = self.state.observation
199
+
200
+ # Track history (keep last 50)
201
+ self.history.append((action, obs))
202
+ if len(self.history) > 50:
203
+ self.history = self.history[-50:]
204
+
205
+ # Extract new location (fallback to old)
206
+ new_location = self._extract_location(obs, fallback=old_location)
207
+
208
+ # Update map only if it was a movement attempt AND it likely succeeded
209
+ action_norm = (action or "").strip().lower()
210
+ if self._is_movement_action(action_norm) and self._move_likely_succeeded(old_location, new_location, obs):
211
+ self._update_map(action_norm, old_location, new_location)
212
+
213
+ # Finally update current location
214
+ self.current_location = new_location
215
+
216
+ return obs
217
+
218
+
219
+ def get_score(self) -> int:
220
+ """Get current score."""
221
+ return self.state.score if self.state else 0
222
+
223
+ def get_moves(self) -> int:
224
+ """Get number of moves taken."""
225
+ return self.state.moves if self.state else 0
226
+ def _extract_facts(self, observation: str) -> dict:
227
+ """
228
+ Best-effort extraction of useful 'facts' from the current observation text.
229
+ This is intentionally heuristic so it can work across many games.
230
+ """
231
+ obs = observation or ""
232
+ text = obs.strip()
233
+ lower = text.lower()
234
+
235
+ # --- Exits mentioned (simple direction scan) ---
236
+ directions = ["north", "south", "east", "west", "up", "down", "in", "out"]
237
+ exits_found = []
238
+ for d in directions:
239
+ # We detect directions as whole words to reduce false matches
240
+ if re.search(rf"\b{re.escape(d)}\b", lower):
241
+ exits_found.append(d)
242
+ exits_found = sorted(set(exits_found))
243
+
244
+ # --- Visible things (very light heuristics) ---
245
+ # We look for common IF patterns like "You see ... here." / "There is ... here."
246
+ visible_candidates: list[str] = []
247
+
248
+ patterns = [
249
+ r"you see (.+?) here\.",
250
+ r"you can see (.+?) here\.",
251
+ r"there is (.+?) here\.",
252
+ r"there are (.+?) here\.",
253
+ r"you notice (.+?)\.",
254
+ ]
255
+ for pat in patterns:
256
+ for m in re.finditer(pat, lower):
257
+ chunk = m.group(1).strip()
258
+ if chunk:
259
+ visible_candidates.append(chunk)
260
+
261
+ # Clean visible candidates a bit (split simple lists, avoid huge strings)
262
+ visible = []
263
+ for chunk in visible_candidates:
264
+ # Split on commas and "and" to get smaller pieces
265
+ parts = re.split(r",|\band\b", chunk)
266
+ for p in parts:
267
+ item = p.strip(" .;:!?\t")
268
+ if 1 <= len(item) <= 40:
269
+ visible.append(item)
270
+
271
+ # Deduplicate and limit (so memory stays compact)
272
+ visible = sorted(set(visible))[:10]
273
+
274
+ return {
275
+ "exits_mentioned": exits_found,
276
+ "visible": visible,
277
+ }
278
+
279
+ def get_memory(self) -> str:
280
+ """
281
+ LLM-friendly summary of current game state.
282
+ Format: Facts first, then recent actions, then the raw observation.
283
+ """
284
+ game = self.game_name or "Unknown"
285
+ location = self.current_location or "Unknown"
286
+ score = self.get_score()
287
+ moves = self.get_moves()
288
+
289
+ # Recent actions (keep short and anti-loop)
290
+ recent = self.history[-5:] if self.history else []
291
+ if recent:
292
+ recent_lines = []
293
+ for a, r in recent:
294
+ snippet = (r or "").replace("\n", " ").strip()
295
+ if len(snippet) > 80:
296
+ snippet = snippet[:80] + "..."
297
+ recent_lines.append(f"- {a} -> {snippet}")
298
+ recent_str = "\n".join(recent_lines)
299
+ else:
300
+ recent_str = "(none yet)"
301
+
302
+ # Facts extracted from current observation
303
+ obs = self.state.observation if self.state else ""
304
+ facts = self._extract_facts(obs)
305
+
306
+ exits_txt = ", ".join(facts["exits_mentioned"]) if facts["exits_mentioned"] else "(none detected)"
307
+ visible_txt = ", ".join(facts["visible"]) if facts["visible"] else "(none detected)"
308
+
309
+ return (
310
+ "STATE\n"
311
+ f"Game: {game}\n"
312
+ f"Location: {location}\n"
313
+ f"Score: {score} Moves: {moves}\n"
314
+ f"Visible (best effort): {visible_txt}\n"
315
+ f"Exits mentioned (best effort): {exits_txt}\n"
316
+ "\n"
317
+ "RECENT\n"
318
+ f"{recent_str}\n"
319
+ "\n"
320
+ "OBSERVATION\n"
321
+ f"{obs}"
322
+ )
323
+ def get_map(self) -> str:
324
+ """
325
+ Return a readable map of explored locations.
326
+ Uses explored_locations built during movement actions.
327
+
328
+ Output is stable + compact for LLM use.
329
+ """
330
+ if not self.explored_locations:
331
+ return "MAP\n(no locations recorded yet — try moving with north/south/east/west/etc.)"
332
+
333
+ lines = ["MAP", "Explored locations and exits:"]
334
+ for loc in sorted(self.explored_locations.keys()):
335
+ exits = sorted(self.explored_locations[loc])
336
+ lines.append(f"\n* {loc}")
337
+ for e in exits:
338
+ lines.append(f" - {e}")
339
+
340
+ lines.append(f"\n[Current] {self.current_location}")
341
+ return "\n".join(lines)
342
+ def get_inventory(self) -> str:
343
+ """
344
+ Return inventory in a robust way across different games/envs.
345
+
346
+ Strategy:
347
+ 1) If state.inventory exists and is non-empty -> format it
348
+ 2) Otherwise, fall back to issuing the command "inventory"
349
+ through the environment and return that observation
350
+ """
351
+ # 1) Try structured inventory if provided by env
352
+ items = []
353
+ if self.state is not None and hasattr(self.state, "inventory"):
354
+ inv = getattr(self.state, "inventory")
355
+ if inv:
356
+ # Normalize to strings
357
+ try:
358
+ items = [str(x).strip() for x in inv if str(x).strip()]
359
+ except Exception:
360
+ items = []
361
+
362
+ if items:
363
+ # Keep it simple and safe: just join a cleaned list
364
+ # (Avoid overly aggressive parsing that breaks across games)
365
+ items = sorted(set(items))
366
+ return "INVENTORY\n" + ", ".join(items)
367
+
368
+ # 2) Fallback: ask the game directly. The "inventory" command does not change
+ # what is carried, but some engines still count it as a turn, so the
+ # structured state is preferred when available. This is a server-side
+ # query and is deliberately not recorded in agent history or the map.
370
+ if self.env is None:
371
+ self.initialize()
372
+
373
+ try:
374
+ tmp_state = self.env.step("inventory")
375
+ inv_text = tmp_state.observation if tmp_state else "Inventory: (no response)"
376
+ except Exception:
377
+ inv_text = "Inventory: (unable to retrieve)"
378
+
379
+ return "INVENTORY\n" + inv_text.strip()
380
+
381
+
382
+ # Global game manager
383
+ _game = GameManager()
384
+
385
+
386
+ def get_game() -> GameManager:
387
+ """Get or initialize the game manager."""
388
+ global _game
389
+ if _game.env is None:
390
+ # Get game from environment variable (set by evaluator)
391
+ game = os.environ.get("GAME", "zork1")
392
+ _game.initialize(game)
393
+ return _game
394
+
395
+
396
+ # =============================================================================
397
+ # MCP Tools - IMPLEMENT THESE
398
+ # =============================================================================
399
+
400
+ @mcp.tool()
401
+ def play_action(action: str) -> str:
402
+ """
403
+ Execute a game command and return the result.
404
+
405
+ This is the main tool for interacting with the game.
406
+
407
+ Args:
408
+ action: The command to execute (e.g., "north", "take lamp", "open mailbox")
409
+
410
+ Returns:
411
+ The game's response to the action
412
+
413
+ Valid commands include:
414
+ - Movement: north, south, east, west, up, down, enter, exit
415
+ - Objects: take <item>, drop <item>, open <thing>, examine <thing>
416
+ - Other: look, inventory, read <thing>, turn on lamp
417
+ """
418
+ game = get_game()
419
+
420
+ # TODO: You might want to add action validation here
421
+ # TODO: You might want to include score changes in the response
422
+
423
+ result = game.step(action)
424
+
425
+ # Append score/moves for clearer feedback (LLM-friendly, low noise)
426
+ result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
427
+ return result
428
+
432
+
433
+ @mcp.tool()
434
+ def memory() -> str:
435
+ """
436
+ Return an LLM-friendly summary of the current game state.
437
+ """
438
+ game = get_game()
439
+ return game.get_memory()
440
+ @mcp.tool()
441
+ def get_map() -> str:
442
+ """
443
+ Return a map of explored locations and recorded exits.
444
+ """
445
+ game = get_game()
446
+ return game.get_map()
447
+
448
+ @mcp.tool()
449
+ def inventory() -> str:
450
+ """
451
+ Return the player's inventory in a robust way.
452
+ """
453
+ game = get_game()
454
+ return game.get_inventory()
455
+
456
+
457
498
+ # @mcp.tool()
499
+ # def get_valid_actions() -> str:
500
+ # """
501
+ # Get a list of likely valid actions from the current location.
502
+ #
503
+ # Returns:
504
+ # List of actions that might work here
505
+ # """
506
+ # # This is a hint: Jericho provides get_valid_actions()
507
+ # game = get_game()
508
+ # if game.env and game.env.env:
509
+ # valid = game.env.env.get_valid_actions()
510
+ # return "Valid actions: " + ", ".join(valid[:20])
511
+ # return "Could not determine valid actions"
512
+
513
+
514
+ # =============================================================================
515
+ # Run the server
516
+ # =============================================================================
517
+
518
+ if __name__ == "__main__":
519
+ # This runs the server with stdio transport (for MCP clients)
520
+ mcp.run()
README.md CHANGED
@@ -18,11 +18,164 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
18
 
19
  ## Approach
20
 
21
- <!-- Describe your approach here -->
22
 
23
- - What strategy does your agent use?
24
- - What tools did you implement in your MCP server?
25
- - Any interesting techniques or optimizations?
26
 
27
  ## Files
28
 
 
18
 
19
  ## Approach
20
 
 
21
 
22
+ # My Report (MCP-Based Text Adventure Agent )
23
+ ## Structured State Design, Guarded ReAct Reasoning, and Stability Improvements
24
+
25
+ ## Overview
26
+
27
+ This project implements a fully functional MCP (Model Context Protocol) server and an LLM-driven ReAct agent for text adventure games. While a baseline was provided, this submission significantly extends and stabilizes that template by redesigning state exposure, improving tool structure, and introducing multiple guardrails against common LLM failure modes.
28
+
29
+ The primary focus of this work was not brute-force performance tuning, but architectural improvement, robustness, and reasoning stability.
30
+
31
+ ---
32
+
33
+ ## 1. MCP Server Improvements
34
+
35
+ The original template exposed minimal game interaction. I redesigned the MCP server to provide structured, reliable, and LLM-friendly state representations.
36
+
37
+ ### 1.1 Robust Location Extraction
38
+
39
+ Instead of relying solely on the first line of the observation, the server now:
40
+
41
+ - Filters out status-like lines (score, moves, headers, bracketed text)
42
+ - Detects likely room titles heuristically
43
+ - Falls back gracefully when uncertain
44
+
45
+ This improves compatibility across different text adventure engines.
46
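A minimal sketch of this filtering strategy, simplified from the server's `_extract_location` (the pattern list here is only a subset of the server's):

```python
import re

# Subset of the status-bar patterns used by the server
HEADER_PATTERNS = [
    r"^\s*score\s*[:=]\s*\d+",
    r"^\s*moves?\s*[:=]\s*\d+",
    r"^\s*\[.*\]\s*$",
    r"^\s*you\s+(are|see|can)\b",
]

def extract_location(observation: str, fallback: str = "Unknown") -> str:
    """Return the first short, title-like line that is not a status header."""
    headers = [re.compile(p, re.IGNORECASE) for p in HEADER_PATTERNS]
    lines = [ln.strip() for ln in observation.splitlines() if ln.strip()]
    for line in lines[:8]:  # room titles usually appear near the top
        if any(rx.search(line) for rx in headers):
            continue
        if len(line) <= 60 and not line.endswith((".", "!", "?", ";", ":")):
            return line  # looks like a room title
    return fallback  # nothing confident: keep the previous location

print(extract_location("Score: 10  Moves: 3\nKitchen\nA table is here."))  # -> Kitchen
```

Returning the fallback instead of guessing is what keeps the location stable when a game prints only narrative text.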
+
+ ---
+
+ ### 1.2 Structured Memory Output
+
+ The `memory()` tool was redesigned to provide:
+
+ - Current game
+ - Location
+ - Score and moves
+ - Extracted visible objects (best-effort heuristics)
+ - Mentioned exits
+ - Recent action history
+ - Full current observation
+
+ This structured format reduces hallucination and anchors the LLM in grounded state information. It transforms raw narrative text into usable reasoning signals.
+
+ ---
+
+ ### 1.3 Intelligent Map Construction
+
+ Movement tracking is no longer naive. A move is recorded only if:
+
+ - The location actually changes, and
+ - The observation does not contain known movement failure phrases.
+
+ This prevents corrupt map edges and keeps spatial reasoning reliable.
+
+ The resulting `get_map()` tool exposes clean directional transitions without noise from failed attempts.
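The rule can be sketched as follows; this is a simplified version of the server's `_move_likely_succeeded` and `_update_map`, with an abbreviated failure-phrase list:

```python
# Abbreviated failure-phrase list; the server checks a longer one
MOVE_FAIL_PHRASES = ("you can't go", "can't go that way", "there is no way", "is locked")

def record_move(graph: dict, action: str, old_loc: str, new_loc: str, observation: str) -> None:
    """Add a directed edge old_loc --action--> new_loc only on a likely success."""
    moved = bool(old_loc and new_loc and old_loc != new_loc)
    failed = any(p in observation.lower() for p in MOVE_FAIL_PHRASES)
    if moved and not failed:
        graph.setdefault(old_loc, set()).add(f"{action} -> {new_loc}")

graph = {}
record_move(graph, "north", "Forest", "Clearing", "Clearing\nYou are in a small clearing.")
record_move(graph, "east", "Clearing", "Clearing", "You can't go that way.")
print(graph)  # only the successful move is recorded
```

Treating "location unchanged, no failure phrase" as uncertain (and recording nothing) is the conservative choice that keeps bad edges out of the map.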
+
+ ---
+
+ ### 1.4 Robust Inventory Handling
+
+ Inventory retrieval now:
+
+ - Uses structured state inventory when available
+ - Falls back to issuing the `inventory` command
+ - Cleans and normalizes item strings
+
+ This ensures cross-game compatibility.
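A sketch of that fallback order, assuming a Jericho-style state object with an optional `inventory` attribute and an `env.step()` that returns a state with an `observation` field:

```python
def format_inventory(state, env) -> str:
    """Prefer the structured inventory; otherwise ask the game itself."""
    raw = getattr(state, "inventory", None) or []
    items = sorted({str(x).strip() for x in raw if str(x).strip()})
    if items:
        return "INVENTORY\n" + ", ".join(items)
    # Fallback: issue the in-game command (may cost a turn in some engines)
    result = env.step("inventory")
    return "INVENTORY\n" + result.observation.strip()
```

Deduplicating and sorting keeps the tool's output stable across calls, which makes it easier for the LLM to compare successive inventory snapshots.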
+
+ ---
+
+ ## 2. Agent-Side Stability and Reasoning Enhancements
+
+ The ReAct loop was significantly extended to address common LLM failure modes.
+
+ ---
+
+ ### 2.1 Context Refresh Strategy
+
+ The agent periodically refreshes:
+
+ - `memory()` (state grounding)
+ - `inventory()` (after item acquisition)
+ - `get_map()` (navigation support)
+
+ This improves decision consistency without consuming extra game moves.
+
+ ---
+
+ ### 2.2 Action Validation and Normalization
+
+ Before execution:
+
+ - Tool names are validated
+ - Invalid verbs are mapped to supported equivalents
+ - Formatting noise is removed
+ - Actions are normalized to consistent lower-case grammar
+
+ This dramatically reduces invalid command generation.
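For illustration, a sketch of such a normalization step (the verb table below is hypothetical, not the agent's actual mapping):

```python
import re

# Hypothetical synonym table for illustration
VERB_MAP = {"grab": "take", "get": "take", "check": "examine", "go": ""}

def normalize_action(raw: str) -> str:
    """Strip formatting noise, lower-case, and map unsupported verbs."""
    action = raw.strip().strip('"`*.').lower()
    action = re.sub(r"\s+", " ", action)  # collapse runs of whitespace
    words = action.split()
    if words and words[0] in VERB_MAP:
        mapped = VERB_MAP[words[0]]
        words = ([mapped] if mapped else []) + words[1:]  # "" drops the verb
    return " ".join(words)

print(normalize_action('  "Grab  LAMP"  '))  # -> take lamp
print(normalize_action("Go north"))          # -> north
```

Mapping "go" to the empty string turns "go north" into the bare direction most parsers expect.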
+
+ ---
+
+ ### 2.3 Multi-Layer Anti-Loop Mechanisms
+
+ Several defensive layers were introduced:
+
+ #### (A) Action Repetition Guard
+ If the same action appears three times consecutively, the agent forces a reset (`look`).
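This guard can be sketched as follows (the class name and threshold are illustrative, not the agent's actual code):

```python
from collections import deque

class RepetitionGuard:
    """Replace the action with 'look' after `limit` identical actions in a row."""
    def __init__(self, limit: int = 3):
        self.limit = limit
        self.recent = deque(maxlen=limit)

    def filter(self, action: str) -> str:
        self.recent.append(action)
        if len(self.recent) == self.limit and len(set(self.recent)) == 1:
            self.recent.clear()  # reset so the forced 'look' breaks the streak
            return "look"
        return action

guard = RepetitionGuard()
print([guard.filter(a) for a in ["north", "north", "north"]])  # third call forced to 'look'
```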
+
+ #### (B) Location-Aware Movement Failure Blocking
+ Movement attempts are tracked per `(location, direction)` pair.
+ If a direction fails multiple times from the same location, it is blocked.
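A sketch of the per-pair tracking (the threshold and names here are illustrative):

```python
from collections import Counter

class MoveBlocker:
    """Block a direction from a location after repeated failed attempts."""
    def __init__(self, max_failures: int = 2):
        self.max_failures = max_failures
        self.failures = Counter()  # keyed by (location, direction)

    def record_failure(self, location: str, direction: str) -> None:
        self.failures[(location, direction)] += 1

    def is_blocked(self, location: str, direction: str) -> bool:
        return self.failures[(location, direction)] >= self.max_failures

blocker = MoveBlocker()
blocker.record_failure("Kitchen", "east")
blocker.record_failure("Kitchen", "east")
print(blocker.is_blocked("Kitchen", "east"), blocker.is_blocked("Kitchen", "west"))
```

Keying on the pair rather than the direction alone means "east" stays usable from other rooms.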
+
+ #### (C) Thought + Action + Location Blocking
+ A normalized thought signature is computed.
+ If the same thought leads to the same action in the same location more than once, the agent is forced to change strategy (memory/map call).
+
+ This addresses the subtle ReAct issue where reasoning itself becomes cyclic.
+
+ ---
+
+ ### 2.4 Controlled Movement Policy
+
+ The agent avoids random wandering by:
+
+ - Encouraging local interaction before movement
+ - Prioritizing dominant objects in the observation
+ - Blocking repeated failed transitions
+
+ This reduces wasted exploration steps.
+
+ ---
+
+ ## 3. Design Philosophy
+
+ The key improvements are architectural rather than game-specific:
+
+ - Clear separation between environment (MCP server) and reasoning (LLM agent)
+ - Structured state exposure instead of raw narrative text
+ - Defensive programming against repetition and invalid behavior
+ - Heuristic generalization instead of hardcoded walkthrough logic
+
+ The system is modular, interpretable, and extensible.
+
+ ---
+
+ ## 4. Conclusion
+
+ Compared to the baseline template, this implementation introduces:
+
+ - Structured memory representation
+ - Robust location extraction
+ - Intelligent map tracking
+ - Inventory normalization
+ - Multi-layer loop prevention
+ - Location-aware movement validation
+ - Thought-action repetition blocking
+ - Controlled exploration policy
+
+ The result is a significantly more stable, grounded, and architecturally improved MCP-based text adventure agent.
 
180
  ## Files
181
 
agent.py CHANGED
@@ -1,26 +1,11 @@
1
  """
2
- Student Agent for Text Adventure Games
3
 
4
- This is your submission file. Implement the StudentAgent class to play
5
- text adventure games using the MCP server you also implement.
6
-
7
- Your agent should:
8
- 1. Connect to the MCP server via the provided client
9
- 2. Use the ReAct pattern (Thought -> Action -> Observation)
10
- 3. Call MCP tools to interact with the game
11
- 4. Maximize the game score within the step limit
12
-
13
- Required method:
14
- async def run(self, client, game, max_steps, seed, verbose) -> RunResult
15
-
16
- The 'client' is a FastMCP Client already connected to your MCP server.
17
- Use it to call tools like: await client.call_tool("play_action", {"action": "look"})
18
-
19
- Tips:
20
- - Start by looking around and understanding your environment
21
- - Keep track of visited locations to avoid loops
22
- - Pick up useful items (lamp, sword, etc.)
23
- - The seed parameter should be used to set your LLM's seed for reproducibility
24
  """
25
 
26
  import json
@@ -32,79 +17,32 @@ from typing import Optional
32
  from dotenv import load_dotenv
33
  from huggingface_hub import InferenceClient
34
 
35
- # Load environment variables
36
  load_dotenv()
37
 
38
- # Set USE_LOCAL_MODEL=1 in your .env to use a locally downloaded model
39
- USE_LOCAL_MODEL = os.getenv("USE_LOCAL_MODEL", "0").strip() in ("1", "true", "yes")
40
- LOCAL_MODEL_ID = os.getenv("LOCAL_MODEL_ID", "Qwen/Qwen2.5-3B-Instruct")
41
-
42
  # =============================================================================
43
  # LLM Configuration - DO NOT MODIFY
44
  # =============================================================================
45
 
46
- # Model to use (fixed for fair evaluation)
47
  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
48
 
49
- # Initialize the LLM client based on mode
50
- _local_pipeline = None
51
-
52
- if USE_LOCAL_MODEL:
53
- import torch
54
- from transformers import pipeline as _hf_pipeline
55
 
56
- _local_pipeline = _hf_pipeline(
57
- "text-generation",
58
- model=LOCAL_MODEL_ID,
59
- torch_dtype=torch.bfloat16,
60
- device_map="auto",
61
- )
62
- LLM_CLIENT = None
63
- else:
64
- _hf_token = os.getenv("HF_TOKEN")
65
- if not _hf_token:
66
- raise ValueError("HF_TOKEN not found. Set it in your .env file.")
67
- LLM_CLIENT = InferenceClient(token=_hf_token)
68
 
69
 
70
  def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
71
- """
72
- Call the LLM with the given prompt. Use this function in your agent.
73
-
74
- Args:
75
- prompt: The user prompt (current game state, history, etc.)
76
- system_prompt: The system prompt (instructions for the agent)
77
- seed: Random seed for reproducibility
78
- max_tokens: Maximum tokens in response (default: 300)
79
-
80
- Returns:
81
- The LLM's response text
82
-
83
- Example:
84
- response = call_llm(
85
- prompt="You are in a forest. What do you do?",
86
- system_prompt=SYSTEM_PROMPT,
87
- seed=42,
88
- )
89
- """
90
  messages = [
91
  {"role": "system", "content": system_prompt},
92
  {"role": "user", "content": prompt},
93
  ]
94
 
95
- if USE_LOCAL_MODEL and _local_pipeline is not None:
96
- outputs = _local_pipeline(
97
- messages,
98
- max_new_tokens=max_tokens,
99
- temperature=0.0001, # Near-deterministic (0.0 unsupported by some backends)
100
- do_sample=True,
101
- )
102
- return outputs[0]["generated_text"][-1]["content"]
103
-
104
  response = LLM_CLIENT.chat.completions.create(
105
  model=LLM_MODEL,
106
  messages=messages,
107
- temperature=0.0, # Deterministic for reproducibility
108
  max_tokens=max_tokens,
109
  seed=seed,
110
  )
@@ -125,179 +63,550 @@ class RunResult:
125
 
126
 
127
  # =============================================================================
128
- # System Prompt - Customize this for your agent
129
  # =============================================================================
 
130
 
131
- SYSTEM_PROMPT = """You are playing a classic text adventure game.
132
 
133
- GOAL: Explore the world, solve puzzles, and maximize your score.
134
 
135
  AVAILABLE TOOLS (use via MCP):
136
- - play_action: Execute a game command (north, take lamp, open mailbox, etc.)
137
- - memory: Get current game state and history (if implemented)
138
- - inventory: Check what you're carrying (if implemented)
139
-
140
- VALID GAME COMMANDS for play_action:
141
- - Movement: north, south, east, west, up, down, enter, exit
142
- - Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
143
- - Other: look, inventory, read <thing>, turn on lamp
144
-
145
- RESPOND IN THIS EXACT FORMAT (no markdown):
146
- THOUGHT: <your reasoning about what to do next>
 
147
  TOOL: <tool_name>
148
- ARGS: <JSON arguments, e.g., {"action": "look"}>
149
 
150
- Example:
151
- THOUGHT: I should look around to see where I am.
152
- TOOL: play_action
153
- ARGS: {"action": "look"}
154
  """
155
 
156
-
157
  # =============================================================================
158
- # Student Agent - IMPLEMENT THIS CLASS
159
  # =============================================================================
160
-
161
  class StudentAgent:
162
  """
163
- Your ReAct agent implementation.
164
-
165
- TODO:
166
- 1. Implement the run() method with the ReAct loop
167
- 2. Parse LLM responses to extract tool calls
168
- 3. Track state and avoid loops
169
-
170
- Use the provided call_llm() function to interact with the LLM.
171
  """
172
-
173
  def __init__(self):
174
- """Initialize your agent here."""
175
- # TODO: Initialize any state tracking you need
176
- # self.history = []
177
- # self.visited_locations = set()
178
- pass
179
180
  async def run(
181
  self,
182
- client, # FastMCP Client connected to your MCP server
183
  game: str,
184
  max_steps: int,
185
  seed: int,
186
  verbose: bool = False,
187
  ) -> RunResult:
188
- """
189
- Run the agent for a game session.
190
-
191
- Args:
192
- client: FastMCP Client connected to your MCP server
193
- game: Name of the game being played (e.g., "zork1")
194
- max_steps: Maximum number of steps to take
195
- seed: Random seed for reproducibility (use for LLM calls)
196
- verbose: Whether to print detailed output
197
-
198
- Returns:
199
- RunResult with final score and statistics
200
- """
201
- # TODO: Implement your ReAct loop here
202
- #
203
- # Basic structure:
204
- # 1. Get initial observation (call play_action with "look")
205
- # 2. Loop for max_steps:
206
- # a. Build prompt with current observation and history
207
- # b. Call LLM to get thought and action
208
- # c. Parse the response to extract tool and args
209
- # d. Call the tool via client.call_tool(tool_name, args)
210
- # e. Update history and state
211
- # f. Check for game over
212
- # 3. Return RunResult with final statistics
213
-
214
- # Example of calling a tool:
215
- # result = await client.call_tool("play_action", {"action": "look"})
216
- # observation = result[0].text if result else "No response"
217
-
218
- # Example of calling the LLM:
219
- # response = call_llm(
220
- # prompt="Current observation: " + observation,
221
- # system_prompt=SYSTEM_PROMPT,
222
- # seed=seed,
223
- # )
224
-
225
- # Placeholder implementation - replace with your code
226
  locations_visited = set()
227
  history = []
228
- final_score = 0
229
  moves = 0
230
-
231
- # TODO: Your implementation here
232
- # ...
233
-
  return RunResult(
- final_score=final_score,
- max_score=350, # Zork1 max score, adjust if needed
  moves=moves,
  locations_visited=locations_visited,
- game_completed=False,
  history=history,
  )
-
- def _build_prompt(self, observation: str, history: list) -> str:
- """
- Build the prompt for the LLM.
-
- TODO: Implement this to create effective prompts
- """
- # TODO: Combine system prompt, history, and current observation
- pass
-
- def _parse_response(self, response: str) -> tuple[str, str, dict]:
  """
- Parse LLM response to extract thought, tool name, and arguments.
-
- TODO: Implement robust parsing
-
- Returns:
- Tuple of (thought, tool_name, args_dict)
  """
- # TODO: Parse the response format:
- # THOUGHT: ...
- # TOOL: ...
- # ARGS: {...}
- pass
-
- def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
  """
- Call the LLM with the given prompt.
-
- This is a convenience wrapper - you can also use call_llm() directly.
  """
- return call_llm(prompt, system_prompt, seed)


  # =============================================================================
- # For local testing
  # =============================================================================

  async def test_agent():
- """Test the agent locally."""
  from fastmcp import Client
-
- # Path to your MCP server
- server_path = "mcp_server.py"
-
  agent = StudentAgent()
-
- async with Client(server_path) as client:
  result = await agent.run(
  client=client,
  game="zork1",
- max_steps=10,
  seed=42,
  verbose=True,
  )
-
- print(f"\nFinal Score: {result.final_score}")
  print(f"Moves: {result.moves}")
- print(f"Locations: {result.locations_visited}")


  if __name__ == "__main__":
 
  """
+ MCP ReAct Agent (adapted for your MCP server)

+ Key upgrades:
+ - Actually calls memory/get_map/inventory periodically (doesn't cost "moves")
+ - Injects those outputs into the LLM prompt (LLM-friendly context)
+ - Updates score from BOTH play_action output and memory output
+ - Keeps loop detection + action normalization
  """
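The upgrades listed in the docstring all wrap the same core cycle: observe, think, act, observe again. A minimal sketch of that cycle, with hypothetical `fake_llm`/`fake_tool` stand-ins for `call_llm()` and `client.call_tool()` (the real `run()` adds anti-loop guards and periodic context refreshes around this core):

```python
# Minimal ReAct cycle sketch. `fake_llm` and `fake_tool` are hypothetical
# stand-ins for call_llm() and client.call_tool().
def react_loop(llm, tool, max_steps):
    observation = tool("look")           # initial observation
    history = []
    for step in range(1, max_steps + 1):
        action = llm(observation)        # THOUGHT/TOOL/ARGS distilled to one action
        observation = tool(action)       # execute and observe
        history.append((step, action, observation))
    return history

# Stubbed world: "look" shows the Kitchen, any other action leads to the Hallway.
def fake_llm(obs):
    return "north" if "Kitchen" in obs else "look"

def fake_tool(action):
    return "Kitchen" if action == "look" else "Hallway"

trace = react_loop(fake_llm, fake_tool, 3)
```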

  import json

  from dotenv import load_dotenv
  from huggingface_hub import InferenceClient

  load_dotenv()

  # =============================================================================
  # LLM Configuration - DO NOT MODIFY
  # =============================================================================

  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"

+ _hf_token = os.getenv("HF_TOKEN")
+ if not _hf_token:
+ raise ValueError("HF_TOKEN not found. Set it in your .env file.")

+ LLM_CLIENT = InferenceClient(token=_hf_token)


  def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
+ """Call the LLM with the given prompt."""
  messages = [
  {"role": "system", "content": system_prompt},
  {"role": "user", "content": prompt},
  ]

  response = LLM_CLIENT.chat.completions.create(
  model=LLM_MODEL,
  messages=messages,
+ temperature=0.0,
  max_tokens=max_tokens,
  seed=seed,
  )
 

  # =============================================================================
+ # System Prompt
  # =============================================================================
+ SYSTEM_PROMPT = """You are an intelligent text adventure game agent.

+ Your goal is to solve the main problem of the game efficiently and maximize score within 100 moves.

+ This game is small and objective-focused. Avoid unnecessary wandering.

  AVAILABLE TOOLS (use via MCP):
+ 1. play_action - Execute valid game commands.
+ 2. memory - Get structured summary of current state and recent actions.
+ 3. get_map - See explored locations.
+ 4. inventory - Check carried items.
+
+ VALID ACTION STYLE:
+ Movement:
+ - north, south, east, west, up, down
+ - n, s, e, w, u, d
+
+ Core actions:
+ - look
+ - examine <thing>
+ - take <item>, drop <item>
+ - open <thing>, close <thing>
+ - talk to <character>
+ - give <item> to <character>
+ - use specific verbs mentioned in observation
+
+ AVOID:
+ - generic verbs like "use"
+ - random movement without purpose
+ - repeating failed actions
+
+ --------------------------------------------------
+ CORE STRATEGY (IMPORTANT)
+ --------------------------------------------------
+
+ 1) DOMINANT OBJECT RULE (VERY IMPORTANT):
+ If a specific object or character is repeatedly mentioned in the observation,
+ treat it as the main objective.
+
+ Do NOT leave the area until you:
+ - examine it
+ - try multiple meaningful interactions
+ - or confirm no new interaction is possible
+
+ Stay focused before exploring elsewhere.
+
+ 2) PROBLEM-SOLVING PRIORITY:
+ If the game clearly revolves around one main goal,
+ prioritize actions that directly affect that goal instead of exploring new rooms.
+
+ 3) CONTROLLED MOVEMENT:
+ Only move if:
+ - you have exhausted interactions in the current room
+ - or memory/map suggests a new unexplored path is necessary
+
+ 4) LIMITED RETRIES:
+ If an action fails once, try a different verb.
+ Do NOT repeat the same failed action more than once.
+
+ 5) OBJECT TRANSFORMATION FOCUS:
+ If an object seems central, try actions that might change its state:
+ - examine
+ - open
+ - give something
+ - use appropriate verbs mentioned in text
+ - interact from different angles
+
+ --------------------------------------------------
+ TOOL USAGE RULES
+ --------------------------------------------------
+
+ - Use memory() when uncertain or before repeating behavior.
+ - Use get_map() only if navigation becomes necessary.
+ - Use inventory() after obtaining items.
+
+ --------------------------------------------------
+ OUTPUT FORMAT (STRICT)
+ --------------------------------------------------
+
+ THOUGHT: <brief reasoning>
  TOOL: <tool_name>
+ ARGS: <JSON arguments>

+ Keep THOUGHT short (1-2 sentences).
+ Do not repeat the same action multiple times.
+ Prefer solving over wandering.
  """
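The strict THOUGHT/TOOL/ARGS contract at the end of the prompt is what makes parsing tractable. A hypothetical standalone parser for that format looks like this (the agent's own `_parse_response()` is more defensive, adding markdown stripping and a regex fallback for malformed JSON):

```python
import json

# Parse a reply that follows the strict output format:
#   THOUGHT: <reasoning>
#   TOOL: <tool_name>
#   ARGS: <JSON arguments>
def parse_react_reply(reply: str):
    thought, tool, args = None, None, {}
    for line in reply.splitlines():
        line = line.strip()
        upper = line.upper()
        if upper.startswith("THOUGHT:"):
            thought = line.split(":", 1)[1].strip()
        elif upper.startswith("TOOL:"):
            tool = line.split(":", 1)[1].strip()
        elif upper.startswith("ARGS:"):
            args = json.loads(line.split(":", 1)[1].strip())
    return thought, tool, args

reply = (
    "THOUGHT: The mailbox looks important.\n"
    "TOOL: play_action\n"
    'ARGS: {"action": "open mailbox"}'
)
parsed = parse_react_reply(reply)
```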

  # =============================================================================
+ # Student Agent Implementation
  # =============================================================================

  class StudentAgent:
  """
+ MCP ReAct Agent adapted to your MCP server outputs:
+ - memory() returns STATE / RECENT / OBSERVATION
+ - get_map() returns MAP ...
+ - inventory() returns INVENTORY ...
  """
+
  def __init__(self):
+ self.history: list[dict] = []
+ self.recent_actions: list[str] = []
+ self.score: int = 0
+
+ # Cached tool outputs
+ self.last_memory: str = ""
+ self.last_map: str = ""
+ self.last_inventory: str = ""
+ self.last_observation: str = ""
+
+ # Exploration / anti-loop state
+ self.visit_counts: dict[str, int] = {}
+ self.loc_move_failures: dict[tuple[str, str], int] = {}
+ self.pending_move: Optional[tuple[str, str]] = None
+
+ # NEW: prevent repeating same thought+action at same location
+ self.loc_action_thought_counts: dict[tuple[str, str, str], int] = {}
+
+ # ------------------------------------------------------------
+ # Thought normalization helper
+ # ------------------------------------------------------------
+ def _thought_sig(self, thought: str) -> str:
+ t = (thought or "").lower()
+ t = re.sub(r"[^a-z0-9\s]", " ", t)
+ t = re.sub(r"\s+", " ", t).strip()
+ return " ".join(t.split()[:12])
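The signature helper above exists so that near-identical thoughts collide on the same key in `loc_action_thought_counts`. A standalone copy of the same logic (the 12-word cap is the agent's own choice of how much of a thought matters for deduplication):

```python
import re

# Normalize a thought into a short signature: lowercase, strip
# punctuation, collapse whitespace, keep the first 12 words.
def thought_sig(thought: str) -> str:
    t = (thought or "").lower()
    t = re.sub(r"[^a-z0-9\s]", " ", t)
    t = re.sub(r"\s+", " ", t).strip()
    return " ".join(t.split()[:12])

a = thought_sig("I should OPEN the mailbox!")
b = thought_sig("i should open the mailbox")
```

Both variants map to the same signature, so a second attempt at the same thought+action pair is caught by the anti-repeat guard.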
+
  async def run(
  self,
+ client,
  game: str,
  max_steps: int,
  seed: int,
  verbose: bool = False,
  ) -> RunResult:
+
  locations_visited = set()
  history = []
  moves = 0
+
+ MOVE_CMDS = {"north","south","east","west","up","down","enter","exit","n","s","e","w","u","d"}
+
+ # Available tools
+ tools = await client.list_tools()
+ tool_names = [t.name for t in tools]
+
+ # Initial observation
+ result = await client.call_tool("play_action", {"action": "look"})
+ observation = self._extract_result(result)
+ self.last_observation = observation
+
+ location = observation.split("\n")[0] if observation else "Unknown"
+ locations_visited.add(location)
+ self.visit_counts[location] = self.visit_counts.get(location, 0) + 1
+
+ # Prime context (no moves)
+ if "memory" in tool_names:
+ self.last_memory = self._extract_result(await client.call_tool("memory", {}))
+ self._update_score(self.last_memory)
+
+ if "inventory" in tool_names:
+ self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))
+
+ if verbose:
+ print(f"\n{observation}")
+
+ for step in range(1, max_steps + 1):
+ await self._refresh_context_tools(client, tool_names, step, verbose)
+
+ prompt = self._build_prompt()
+ response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
+ thought, tool_name, tool_args = self._parse_response(response, tool_names)
+
+ if verbose:
+ print(f"\n--- Step {step} ---")
+ print(f"[THOUGHT] {thought}")
+ print(f"[TOOL] {tool_name}({tool_args})")
+
+ tool_name, tool_args = self._validate_tool_call(tool_name, tool_args, tool_names)
+
+ # ------------------------------------------------------------
+ # Block SAME (location + action + thought)
+ # ------------------------------------------------------------
+ if tool_name == "play_action":
+ current_loc = (
+ self.last_observation.split("\n")[0].strip()
+ if self.last_observation else "Unknown"
+ )
+ action_norm = tool_args.get("action", "look").strip().lower()
+ t_sig = self._thought_sig(thought)
+
+ triple = (current_loc, action_norm, t_sig)
+ self.loc_action_thought_counts[triple] = (
+ self.loc_action_thought_counts.get(triple, 0) + 1
+ )
+
+ if self.loc_action_thought_counts[triple] >= 2:
+ if verbose:
+ print(f"[ANTI-REPEAT] Blocking repeated thought+action at '{current_loc}'")
+ if "get_map" in tool_names:
+ tool_name, tool_args = "get_map", {}
+ elif "memory" in tool_names:
+ tool_name, tool_args = "memory", {}
+ else:
+ tool_name, tool_args = "play_action", {"action": "look"}
+
+ # ------------------------------------------------------------
+ # Loop detection (same action spam)
+ # ------------------------------------------------------------
+ if tool_name == "play_action":
+ action = tool_args.get("action", "look")
+ self.recent_actions.append(action)
+ if len(self.recent_actions) > 5:
+ self.recent_actions = self.recent_actions[-5:]
+
+ if len(self.recent_actions) >= 3 and len(set(self.recent_actions[-3:])) == 1:
+ if verbose:
+ print("[WARNING] Loop detected - forcing 'look'")
+ tool_args = {"action": "look"}
+
+ # ------------------------------------------------------------
+ # Anti-backtracking: block only FAILED moves
+ # ------------------------------------------------------------
+ self.pending_move = None
+
+ if tool_name == "play_action":
+ action_norm = tool_args.get("action", "look").strip().lower()
+
+ if action_norm in MOVE_CMDS:
+ current_loc = (
+ self.last_observation.split("\n")[0].strip()
+ if self.last_observation else "Unknown"
+ )
+ key = (current_loc, action_norm)
+
+ if self.loc_move_failures.get(key, 0) >= 2:
+ if verbose:
+ print(f"[GUARD] Blocking failed move '{action_norm}' from '{current_loc}'")
+ if "get_map" in tool_names:
+ tool_name, tool_args = "get_map", {}
+ elif "memory" in tool_names:
+ tool_name, tool_args = "memory", {}
+ else:
+ tool_name, tool_args = "play_action", {"action": "look"}
+ else:
+ self.pending_move = (current_loc, action_norm)
+
+ # ------------------------------------------------------------
+ # Count moves
+ # ------------------------------------------------------------
+ if tool_name == "play_action":
+ moves += 1
+
+ # ------------------------------------------------------------
+ # Execute tool
+ # ------------------------------------------------------------
+ try:
+ result = await client.call_tool(tool_name, tool_args)
+ out_text = self._extract_result(result)
+
+ if tool_name == "play_action":
+ observation = out_text
+ self.last_observation = observation
+ elif tool_name == "memory":
+ self.last_memory = out_text
+ elif tool_name == "get_map":
+ self.last_map = out_text
+ elif tool_name == "inventory":
+ self.last_inventory = out_text
+
+ if verbose:
+ print(f"[RESULT] {out_text[:200]}...")
+
+ except Exception as e:
+ out_text = f"Error: {e}"
+ observation = out_text
+ self.last_observation = observation
+ if verbose:
+ print(f"[ERROR] {e}")
+
+ # ------------------------------------------------------------
+ # Post-move update
+ # ------------------------------------------------------------
+ if tool_name == "play_action":
+ new_location = observation.split("\n")[0] if observation else "Unknown"
+
+ if self.pending_move is not None:
+ prev_loc, prev_action = self.pending_move
+ key = (prev_loc, prev_action)
+
+ if new_location == prev_loc:
+ self.loc_move_failures[key] = self.loc_move_failures.get(key, 0) + 1
+ else:
+ self.loc_move_failures[key] = 0
+
+ self.pending_move = None
+
+ location = new_location
+ locations_visited.add(location)
+ self.visit_counts[location] = self.visit_counts.get(location, 0) + 1
+
+ self._update_score(observation)
+
+ if re.search(r"\bTaken\b|\byou are now carrying\b", observation, re.IGNORECASE):
+ if "inventory" in tool_names:
+ self.last_inventory = self._extract_result(
+ await client.call_tool("inventory", {})
+ )
+
+ # ------------------------------------------------------------
+ # History
+ # ------------------------------------------------------------
+ self.history.append({
+ "step": step,
+ "thought": thought,
+ "tool": tool_name,
+ "args": tool_args,
+ "result": out_text[:200]
+ })
+ if len(self.history) > 10:
+ self.history = self.history[-10:]
+
+ history.append((thought, f"{tool_name}({tool_args})", out_text[:100]))
+
+ if self._is_game_over(observation):
+ if verbose:
+ print("\n*** GAME OVER ***")
+ break
+
  return RunResult(
+ final_score=self.score,
+ max_score=350,
  moves=moves,
  locations_visited=locations_visited,
+ game_completed=self._is_game_over(self.last_observation),
  history=history,
  )
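The two guards inside `run()` reduce to simple bookkeeping: three identical actions in a row trip loop detection, and a (location, direction) pair is blocked after two failed move attempts. A pure-function sketch (names here are illustrative, not the agent's exact API):

```python
# Loop detection: the last three actions are identical.
def loop_detected(recent_actions):
    tail = recent_actions[-3:]
    return len(tail) == 3 and len(set(tail)) == 1

# Failed-move bookkeeping, mirroring loc_move_failures in run():
# a success resets the counter, a failure increments it.
failures = {}

def record_move(loc, direction, moved):
    key = (loc, direction)
    failures[key] = 0 if moved else failures.get(key, 0) + 1

def move_blocked(loc, direction):
    return failures.get((loc, direction), 0) >= 2

# Two failed attempts to go north from the Cellar block that move.
record_move("Cellar", "north", moved=False)
record_move("Cellar", "north", moved=False)
```

Resetting the counter on success matters: a direction that failed because of a temporary obstacle (a closed door, darkness) becomes usable again once a later attempt works.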
+
+
+ async def _refresh_context_tools(self, client, tool_names: list[str], step: int, verbose: bool) -> None:
  """
+ Pull structured context from MCP server without spending moves.
+ Tuned to your server outputs:
+ - memory() is the best single summary
+ - get_map() helps navigation
+ - inventory() helps object planning
  """
+ # Memory: often (every 4 steps) so LLM doesn't forget state
+ if "memory" in tool_names and (step == 1 or step % 4 == 0):
+ try:
+ self.last_memory = self._extract_result(await client.call_tool("memory", {}))
+ self._update_score(self.last_memory)
+ except Exception:
+ pass
+
+ # Map: occasionally (every 6 steps)
+ if "get_map" in tool_names and (step % 6 == 0):
+ try:
+ self.last_map = self._extract_result(await client.call_tool("get_map", {}))
+ except Exception:
+ pass
+
+ # Inventory: occasionally (every 10 steps)
+ if "inventory" in tool_names and (step == 1 or step % 10 == 0):
+ try:
+ self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))
+ except Exception:
+ pass
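The refresh cadence above can be stated as a pure function: step 1 primes memory and inventory, then the tools fire every 4th, 6th, and 10th step respectively. This makes the schedule easy to check in isolation:

```python
# Which context tools fire on a given step, matching the cadence in
# _refresh_context_tools(): memory on step 1 and every 4th step,
# get_map every 6th step, inventory on step 1 and every 10th step.
def context_tools_for(step: int) -> list[str]:
    tools = []
    if step == 1 or step % 4 == 0:
        tools.append("memory")
    if step % 6 == 0:
        tools.append("get_map")
    if step == 1 or step % 10 == 0:
        tools.append("inventory")
    return tools
```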
+
+ def _build_prompt(self) -> str:
  """
+ Build prompt that is aligned with your MCP server:
+ - memory() has STATE/RECENT/OBSERVATION
+ - get_map() starts with MAP
+ - inventory() starts with INVENTORY
  """
+ parts = []
+ parts.append(f"Current best-known score: {self.score}")
+
+ # Give the model your server-side memory snapshot (truncate to keep prompt lean)
+ if self.last_memory:
+ mem = self._truncate(self.last_memory, 1200)
+ parts.append("\n=== MEMORY (from MCP server) ===\n" + mem)
+
+ if self.last_inventory:
+ inv = self._truncate(self.last_inventory, 400)
+ parts.append("\n=== INVENTORY (from MCP server) ===\n" + inv)
+
+ if self.last_map:
+ mp = self._truncate(self.last_map, 700)
+ parts.append("\n=== MAP (from MCP server) ===\n" + mp)
+
+ # Recent local history (anti-loop)
+ if self.history:
+ parts.append("\n=== RECENT LOCAL ACTIONS (agent) ===")
+ for entry in self.history[-3:]:
+ action = entry.get("args", {}).get("action", entry["tool"])
+ result_short = entry["result"][:100] + "..." if len(entry["result"]) > 100 else entry["result"]
+ parts.append(f" > {action} -> {result_short}")
+
+ if self.recent_actions and len(set(self.recent_actions[-3:])) == 1:
+ parts.append(f"\n[WARNING: repeated '{self.recent_actions[-1]}'. Choose a different action.]")
+
+ # Always include the most recent raw observation
+ parts.append("\n=== LATEST OBSERVATION (play_action) ===\n" + self._truncate(self.last_observation, 900))
+ parts.append("\nWhat do you do next?")
+
+ return "\n".join(parts)
+
+ def _truncate(self, text: str, limit: int) -> str:
+ text = text or ""
+ if len(text) <= limit:
+ return text
+ return text[:limit] + "\n...[truncated]"
+
+ def _parse_response(self, response: str, valid_tools: list[str]) -> tuple[str, str, dict]:
+ thought = "No reasoning provided"
+ tool_name = "play_action"
+ tool_args = {"action": "look"}
+
+ lines = response.strip().split("\n")
+ for line in lines:
+ line_clean = line.strip()
+ line_upper = line_clean.upper()
+
+ if line_upper.startswith("THOUGHT:"):
+ thought = line_clean.split(":", 1)[1].strip()
+
+ elif line_upper.startswith("TOOL:"):
+ raw_tool = line_clean.split(":", 1)[1].strip().lower()
+ raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
+ raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
+ tool_name = raw_tool
+
+ elif line_upper.startswith("ARGS:"):
+ args_part = line_clean.split(":", 1)[1].strip()
+ if not args_part:
+ tool_args = {}
+ continue
+ try:
+ args_part = args_part.replace("'", '"')
+ tool_args = json.loads(args_part)
+ except json.JSONDecodeError:
+ match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
+ if match:
+ tool_args = {"action": match.group(1)}
+ else:
+ tool_args = {"action": "look"}
+
+ return thought, tool_name, tool_args
+
+ def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
+
+ if tool_name not in valid_tools:
+ if tool_name in ["action", "do", "command"]:
+ tool_name = "play_action"
+ elif tool_name in ["map", "location"]:
+ tool_name = "get_map"
+ elif tool_name in ["mem", "state", "status"]:
+ tool_name = "memory"
+ elif tool_name in ["inv", "items"]:
+ tool_name = "inventory"
+ else:
+ tool_name = "play_action"
+
+ if tool_name == "play_action":
+ action = tool_args.get("action", "look")
+
+ invalid_verb_map = {
+ "check": "examine",
+ "inspect": "examine",
+ "search": "look",
+ "grab": "take",
+ "pick": "take",
+ "use": "examine",
+ "investigate": "examine",
+ }
+
+ words = action.lower().split()
+ if words and words[0] in invalid_verb_map:
+ words[0] = invalid_verb_map[words[0]]
+ action = " ".join(words)
+
+ action = action.lower().strip()
+ action = action.replace("**", "").replace("*", "").replace("`", "")
+ action = " ".join(action.split())
+
+ tool_args["action"] = action
+
+ return tool_name, tool_args
+
+ def _extract_result(self, result) -> str:
+ if hasattr(result, 'content') and result.content:
+ return result.content[0].text
+ if isinstance(result, list) and result:
+ return result[0].text if hasattr(result[0], 'text') else str(result[0])
+ return str(result)
+
+ def _update_score(self, text: str) -> None:
+ patterns = [
+ r'\[Score:\s*(\d+)',
+ r'Score:\s*(\d+)\b',
+ ]
+ for pattern in patterns:
+ match = re.search(pattern, text, re.IGNORECASE)
+ if match:
+ self.score = max(self.score, int(match.group(1)))
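Taking the max over every match is what lets the agent update the score from both `play_action` output and `memory` output: a stale snapshot can never lower the tracked score. A standalone version of the same scrape:

```python
import re

# Scrape a score from game text, keeping the best value seen so far.
# Mirrors _update_score(): matches "[Score: N" and "Score: N" forms.
def best_score(text: str, current: int = 0) -> int:
    for pattern in (r"\[Score:\s*(\d+)", r"Score:\s*(\d+)\b"):
        m = re.search(pattern, text or "", re.IGNORECASE)
        if m:
            current = max(current, int(m.group(1)))
    return current
```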
+
+ def _is_game_over(self, text: str) -> bool:
+ game_over_phrases = [
+ "game over",
+ "you have died",
+ "you are dead",
+ "*** you have died ***",
+ ]
+ text_lower = (text or "").lower()
+ return any(phrase in text_lower for phrase in game_over_phrases)


  # =============================================================================
+ # Local Testing
  # =============================================================================

  async def test_agent():
  from fastmcp import Client
+
  agent = StudentAgent()
+
+ async with Client("mcp_server.py") as client:
  result = await agent.run(
  client=client,
  game="zork1",
+ max_steps=20,
  seed=42,
  verbose=True,
  )
+
+ print(f"\n{'=' * 50}")
+ print(f"Final Score: {result.final_score}")
  print(f"Moves: {result.moves}")
+ print(f"Locations: {len(result.locations_visited)}")


  if __name__ == "__main__":
mcp_server.py CHANGED
@@ -45,53 +45,338 @@ mcp = FastMCP("Student Text Adventure Server")
  # Game State Management
  # =============================================================================

  class GameManager:
  """
  Manages the text adventure game state.
-
- TODO: Extend this class to track:
  - Action history (for memory tool)
  - Explored locations (for mapping)
  - Current score and moves
  """
-
  def __init__(self):
  self.env: TextAdventureEnv = None
  self.state = None
  self.game_name: str = ""
- # TODO: Add more state tracking
- # self.history: list[tuple[str, str]] = []
- # self.explored_locations: dict[str, set[str]] = {}
- # self.current_location: str = ""
-
  def initialize(self, game: str = "zork1"):
  """Initialize or reset the game."""
  self.game_name = game
  self.env = TextAdventureEnv(game)
  self.state = self.env.reset()
- # TODO: Reset your state tracking here
  return self.state.observation
-
  def step(self, action: str) -> str:
  """Execute an action and return the result."""
  if self.env is None:
  self.initialize()
-
  self.state = self.env.step(action)
-
- # TODO: Update your state tracking here
- # self.history.append((action, self.state.observation))
- # Update location tracking, etc.
-
- return self.state.observation
-
  def get_score(self) -> int:
  """Get current score."""
  return self.state.score if self.state else 0
-
  def get_moves(self) -> int:
  """Get number of moves taken."""
  return self.state.moves if self.state else 0

  # Global game manager
@@ -136,11 +421,37 @@ def play_action(action: str) -> str:
  # TODO: You might want to include score changes in the response

  result = game.step(action)

  # Optional: Append score info
  # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"

- return result

  # TODO: Implement additional tools to help your agent

  # Game State Management
  # =============================================================================

+ import re
+ from typing import Optional
+
  class GameManager:
  """
  Manages the text adventure game state.
+
+ Extended tracking:
  - Action history (for memory tool)
  - Explored locations (for mapping)
  - Current score and moves
+ - Current location (best-effort, robust across games)
  """
+
+ # Lines that are often NOT room titles across many IF games
+ _HEADER_LIKE_PATTERNS = [
+ r"^\s*score\s*[:=]\s*\d+",
+ r"^\s*moves?\s*[:=]\s*\d+",
+ r"^\s*turns?\s*[:=]\s*\d+",
+ r"^\s*time\s*[:=]\s*",
+ r"^\s*health\s*[:=]\s*\d+",
+ r"^\s*location\s*[:=]\s*",
+ r"^\s*\[.*\]\s*$", # bracket-only status lines
+ r"^\s*\(.*\)\s*$", # parenthetical-only lines
+ r"^\s*you\s+(are|see|can)\b", # narrative sentence starters
+ ]
+ # Movement commands we consider for mapping (Zork-style + abbreviations)
+ _MOVE_CMDS = {
+ "north", "south", "east", "west", "up", "down", "enter", "exit",
+ "n", "s", "e", "w", "u", "d"
+ }
+
+ # Common failure phrases when trying to move (best-effort, not perfect)
+ _MOVE_FAIL_PHRASES = [
+ "you can't go", "you cannot go", "can't go that way", "cannot go that way",
+ "you can't go that way", "you cannot go that way",
+ "you can't", "you cannot",
+ "there is no way", "you can't see any way", "you see no way",
+ "blocked", "closed", "won't open", "is locked", "locked",
+ "too dark", "pitch black"
+ ]
+
+ def _is_movement_action(self, action: str) -> bool:
+ """Return True if this action is a movement command we track."""
+ a = (action or "").strip().lower()
+ return a in self._MOVE_CMDS
+
+ def _move_likely_succeeded(self, old_loc: str, new_loc: str, observation: str) -> bool:
+ """
+ Decide whether a move likely succeeded.
+ Strong signal: location label changed.
+ Negative signal: failure phrases in observation.
+ """
+ if new_loc and old_loc and new_loc != old_loc:
+ return True
+
+ text = (observation or "").lower()
+ if any(phrase in text for phrase in self._MOVE_FAIL_PHRASES):
+ return False
+
+ # If location didn't change and no clear failure phrase, treat as "not sure" -> don't add edge
+ return False
+
+ def _update_map(self, action: str, old_loc: str, new_loc: str) -> None:
+ """Record a directed edge old_loc --action--> new_loc in explored_locations."""
+ if not old_loc or not new_loc:
+ return
+ self.explored_locations.setdefault(old_loc, set()).add(f"{action} -> {new_loc}")
+
+
  def __init__(self):
  self.env: TextAdventureEnv = None
  self.state = None
  self.game_name: str = ""
+
+ # Tracking for agent-support tools
+ self.history: list[tuple[str, str]] = []
+ self.explored_locations: dict[str, set[str]] = {}
+ self.current_location: str = "Unknown"
+
  def initialize(self, game: str = "zork1"):
  """Initialize or reset the game."""
  self.game_name = game
  self.env = TextAdventureEnv(game)
  self.state = self.env.reset()
+
+ # Reset tracking
+ self.history = []
+ self.explored_locations = {}
+ self.current_location = self._extract_location(self.state.observation, fallback="Unknown")
+
  return self.state.observation
+
+ def _extract_location(self, observation: str, fallback: Optional[str] = None) -> str:
+ """
+ Best-effort location extraction from the observation text.
+
+ Strategy:
+ 1) Split into lines, skip empties
+ 2) Skip lines that look like status bars / headers / pure brackets
+ 3) Prefer a short, title-like line (room name)
+ 4) If nothing confident, return fallback (usually previous location)
+ """
+ if not observation:
+ return fallback or "Unknown"
+
+ lines = [ln.strip() for ln in observation.splitlines() if ln.strip()]
+ if not lines:
+ return fallback or "Unknown"
+
+ header_res = [re.compile(pat, re.IGNORECASE) for pat in self._HEADER_LIKE_PATTERNS]
+
+ def looks_like_header(line: str) -> bool:
+ return any(rx.search(line) for rx in header_res)
+
+ def looks_like_title(line: str) -> bool:
+ # Many room titles are short and not ending with punctuation.
+ if len(line) > 60:
+ return False
+ if line.endswith((".", "!", "?", ";", ":")):
+ return False
+ # Too many digits usually means a status line.
+ if sum(ch.isdigit() for ch in line) >= 3:
+ return False
+ return True
+
+ # First pass: first "title-like" line that isn't header-like
+ for line in lines[:8]: # only inspect top chunk; titles are usually early
+ if looks_like_header(line):
+ continue
+ if looks_like_title(line):
+ return line
+
+ # Second pass: first non-header line
+ for line in lines[:8]:
+ if not looks_like_header(line):
+ return line
+
+ return fallback or "Unknown"
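Applied to a typical Zork-style observation, the heuristic behaves as follows. This is a trimmed-down standalone sketch (it keeps the status-bar skip and the short-title preference but drops the digit check and second pass for brevity):

```python
import re

# Status-bar-looking lines that are not room titles (subset of the
# server's _HEADER_LIKE_PATTERNS).
HEADER_PATTERNS = [
    r"^\s*score\s*[:=]\s*\d+",
    r"^\s*moves?\s*[:=]\s*\d+",
    r"^\s*\[.*\]\s*$",
]

def extract_location(observation: str, fallback: str = "Unknown") -> str:
    lines = [ln.strip() for ln in (observation or "").splitlines() if ln.strip()]
    for line in lines[:8]:
        if any(re.search(p, line, re.IGNORECASE) for p in HEADER_PATTERNS):
            continue
        # Short, punctuation-free line -> likely the room title.
        if len(line) <= 60 and not line.endswith((".", "!", "?", ";", ":")):
            return line
    return fallback

loc = extract_location("[Score: 0 | Moves: 1]\nWest of House\nYou are standing in an open field.")
```

The bracketed status line is skipped, "West of House" is accepted as a title, and the narrative sentence (which ends with a period) would never be mistaken for one.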
187
+
188
     def step(self, action: str) -> str:
         """Execute an action and return the result."""
         if self.env is None:
             self.initialize()
+
+        # Save the old location before the action
+        old_location = self.current_location
+
+        # Apply the action to the real game
         self.state = self.env.step(action)
+        obs = self.state.observation
+
+        # Track history (keep the last 50 entries)
+        self.history.append((action, obs))
+        if len(self.history) > 50:
+            self.history = self.history[-50:]
+
+        # Extract the new location (falling back to the old one)
+        new_location = self._extract_location(obs, fallback=old_location)
+
+        # Update the map only if this was a movement attempt AND it likely succeeded
+        action_norm = (action or "").strip().lower()
+        if self._is_movement_action(action_norm) and self._move_likely_succeeded(old_location, new_location, obs):
+            self._update_map(action_norm, old_location, new_location)
+
+        # Finally, update the current location
+        self.current_location = new_location
+
+        return obs
+
     def get_score(self) -> int:
         """Get current score."""
         return self.state.score if self.state else 0
+
     def get_moves(self) -> int:
         """Get number of moves taken."""
         return self.state.moves if self.state else 0
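The `step` hunk above relies on `_is_movement_action` and `_move_likely_succeeded`, which are defined elsewhere in the file. A plausible standalone sketch of that guarded-movement check (the direction set and failure phrases are assumptions, not the repo's exact code):

```python
# Hypothetical direction vocabulary and failure phrases for illustration only
DIRECTIONS = {"north", "south", "east", "west", "up", "down", "in", "out",
              "n", "s", "e", "w", "u", "d"}
FAILURE_PHRASES = ("you can't go that way", "there is a wall", "too dark")

def is_movement_action(action: str) -> bool:
    # Treat bare directions and "go <dir>" commands as movement attempts
    return action in DIRECTIONS or action.startswith("go ")

def move_likely_succeeded(old_loc: str, new_loc: str, obs: str) -> bool:
    lower = obs.lower()
    # An explicit refusal means the move failed even if parsing went wrong
    if any(p in lower for p in FAILURE_PHRASES):
        return False
    # A changed location title is the strongest success signal
    return new_loc != old_loc

print(is_movement_action("north"))                                            # True
print(move_likely_succeeded("Kitchen", "Kitchen", "You can't go that way."))  # False
print(move_likely_succeeded("Kitchen", "Hall", "Hall\nA wide hall."))         # True
```

Gating the map update on both checks keeps failed moves (walls, locked doors) from corrupting the explored-locations graph.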
+
+    def _extract_facts(self, observation: str) -> dict:
+        """
+        Best-effort extraction of useful 'facts' from the current observation text.
+        This is intentionally heuristic so it can work across many games.
+        """
+        obs = observation or ""
+        text = obs.strip()
+        lower = text.lower()
+
+        # --- Exits mentioned (simple direction scan) ---
+        directions = ["north", "south", "east", "west", "up", "down", "in", "out"]
+        exits_found = []
+        for d in directions:
+            # Match directions as whole words to reduce false positives
+            if re.search(rf"\b{re.escape(d)}\b", lower):
+                exits_found.append(d)
+        exits_found = sorted(set(exits_found))
+
+        # --- Visible things (very light heuristics) ---
+        # Look for common IF patterns like "You see ... here." / "There is ... here."
+        visible_candidates: list[str] = []
+
+        patterns = [
+            r"you see (.+?) here\.",
+            r"you can see (.+?) here\.",
+            r"there is (.+?) here\.",
+            r"there are (.+?) here\.",
+            r"you notice (.+?)\.",
+        ]
+        for pat in patterns:
+            for m in re.finditer(pat, lower):
+                chunk = m.group(1).strip()
+                if chunk:
+                    visible_candidates.append(chunk)
+
+        # Clean the candidates a bit (split simple lists, avoid huge strings)
+        visible = []
+        for chunk in visible_candidates:
+            # Split on commas and "and" to get smaller pieces
+            parts = re.split(r",|\band\b", chunk)
+            for p in parts:
+                item = p.strip(" .;:!?\t")
+                if 1 <= len(item) <= 40:
+                    visible.append(item)
+
+        # Deduplicate and limit (so memory stays compact)
+        visible = sorted(set(visible))[:10]
+
+        return {
+            "exits_mentioned": exits_found,
+            "visible": visible,
+        }
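The visible-items heuristic can be tried in isolation; here is a minimal standalone sketch of the same pattern-scan-then-split approach (a reduced pattern list and a made-up observation, for illustration):

```python
import re

# A subset of the "There is ... here." style patterns used above
PATTERNS = [
    r"you see (.+?) here\.",
    r"there is (.+?) here\.",
    r"there are (.+?) here\.",
]

def extract_visible(observation: str) -> list[str]:
    lower = observation.lower()
    found: list[str] = []
    for pat in PATTERNS:
        for m in re.finditer(pat, lower):
            # Split simple lists on commas and "and", then trim punctuation
            for part in re.split(r",|\band\b", m.group(1)):
                item = part.strip(" .;:!?\t")
                if 1 <= len(item) <= 40:
                    found.append(item)
    return sorted(set(found))

print(extract_visible("There is a brass lantern and a rope here."))
# ['a brass lantern', 'a rope']
```

The non-greedy groups keep a single sentence from swallowing the rest of the observation, and the length bounds drop fragments that are clearly not object names.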
+
+    def get_memory(self) -> str:
+        """
+        LLM-friendly summary of the current game state.
+        Format: facts first, then recent actions, then the raw observation.
+        """
+        game = self.game_name or "Unknown"
+        location = self.current_location or "Unknown"
+        score = self.get_score()
+        moves = self.get_moves()
+
+        # Recent actions (kept short, to discourage loops)
+        recent = self.history[-5:] if self.history else []
+        if recent:
+            recent_lines = []
+            for a, r in recent:
+                snippet = (r or "").replace("\n", " ").strip()
+                if len(snippet) > 80:
+                    snippet = snippet[:80] + "..."
+                recent_lines.append(f"- {a} -> {snippet}")
+            recent_str = "\n".join(recent_lines)
+        else:
+            recent_str = "(none yet)"
+
+        # Facts extracted from the current observation
+        obs = self.state.observation if self.state else ""
+        facts = self._extract_facts(obs)
+
+        exits_txt = ", ".join(facts["exits_mentioned"]) if facts["exits_mentioned"] else "(none detected)"
+        visible_txt = ", ".join(facts["visible"]) if facts["visible"] else "(none detected)"
+
+        return (
+            "STATE\n"
+            f"Game: {game}\n"
+            f"Location: {location}\n"
+            f"Score: {score}  Moves: {moves}\n"
+            f"Visible (best effort): {visible_txt}\n"
+            f"Exits mentioned (best effort): {exits_txt}\n"
+            "\n"
+            "RECENT\n"
+            f"{recent_str}\n"
+            "\n"
+            "OBSERVATION\n"
+            f"{obs}"
+        )
+
+    def get_map(self) -> str:
+        """
+        Return a readable map of explored locations.
+        Uses explored_locations built during movement actions.
+
+        The output is stable and compact for LLM use.
+        """
+        if not self.explored_locations:
+            return "MAP\n(no locations recorded yet — try moving with north/south/east/west/etc.)"
+
+        lines = ["MAP", "Explored locations and exits:"]
+        for loc in sorted(self.explored_locations.keys()):
+            exits = sorted(self.explored_locations[loc])
+            lines.append(f"\n* {loc}")
+            for e in exits:
+                lines.append(f"  - {e}")
+
+        lines.append(f"\n[Current] {self.current_location}")
+        return "\n".join(lines)
+
+    def get_inventory(self) -> str:
+        """
+        Return inventory in a way that is robust across different games/envs.
+
+        Strategy:
+        1) If state.inventory exists and is non-empty -> format it
+        2) Otherwise, fall back to issuing the command "inventory"
+           through the environment and return that observation
+        """
+        # 1) Try the structured inventory if the env provides one
+        items = []
+        if self.state is not None and hasattr(self.state, "inventory"):
+            inv = getattr(self.state, "inventory")
+            if inv:
+                # Normalize to strings
+                try:
+                    items = [str(x).strip() for x in inv if str(x).strip()]
+                except Exception:
+                    items = []
+
+        if items:
+            # Keep it simple and safe: just join a cleaned list
+            # (avoid overly aggressive parsing that breaks across games)
+            items = sorted(set(items))
+            return "INVENTORY\n" + ", ".join(items)
+
+        # 2) Fallback: ask the game directly (prints the inventory without changing it)
+        # NOTE: this is a server-side query; it is not recorded in agent history or on the map.
+        if self.env is None:
+            self.initialize()
+
+        try:
+            tmp_state = self.env.step("inventory")
+            inv_text = tmp_state.observation if tmp_state else "Inventory: (no response)"
+        except Exception:
+            inv_text = "Inventory: (unable to retrieve)"
+
+        return "INVENTORY\n" + inv_text.strip()
 
 
 # Global game manager
 
     # TODO: You might want to include score changes in the response
 
     result = game.step(action)
+
+    # Append score/moves for clearer feedback (LLM-friendly, low noise)
+    result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
+    return result
 
     # Optional: Append score info
     # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
 
+
+@mcp.tool()
+def memory() -> str:
+    """
+    Return an LLM-friendly summary of the current game state.
+    """
+    game = get_game()
+    return game.get_memory()
+
+
+@mcp.tool()
+def get_map() -> str:
+    """
+    Return a map of explored locations and recorded exits.
+    """
+    game = get_game()
+    return game.get_map()
+
+
+@mcp.tool()
+def inventory() -> str:
+    """
+    Return the player's inventory in a robust way.
+    """
+    game = get_game()
+    return game.get_inventory()
 
 
 # TODO: Implement additional tools to help your agent
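One natural candidate for an additional guardrail, in the spirit of the anti-loop note in `get_memory`, is a repeated-action check on the agent side. A minimal sketch (the class name, window size, and thresholds are hypothetical, not part of this commit):

```python
from collections import deque

class LoopGuard:
    """Reject an action that was already tried twice recently in the same location."""

    def __init__(self, window: int = 4):
        # Only the last `window` (action, location) pairs are remembered
        self.recent = deque(maxlen=window)

    def allow(self, action: str, location: str) -> bool:
        key = (action.strip().lower(), location)
        if self.recent.count(key) >= 2:
            # Seen twice already in the window: likely a loop, block it
            return False
        self.recent.append(key)
        return True

guard = LoopGuard()
print(guard.allow("north", "Forest"))  # True  (first try)
print(guard.allow("north", "Forest"))  # True  (second try)
print(guard.allow("north", "Forest"))  # False (looping)
```

Because the deque is bounded, an action blocked now becomes available again once newer actions push it out of the window, so the guard throttles loops without permanently banning commands.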