Chloé Court committed on
Commit 9c0dbe0 · 1 Parent(s): 7a36b3c

Submission

Files changed (5):
  1. README.md +61 -7
  2. agent.py +597 -210
  3. mcp_server.py +192 -119
  4. requirements.txt +15 -7
  5. utils.py +42 -0
README.md CHANGED
@@ -10,19 +10,72 @@ pinned: false
  license: mit
  ---

- # Text Adventure Agent Submission

  ## Overview

- This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

- ## Approach

- <!-- Describe your approach here -->

- - What strategy does your agent use?
- - What tools did you implement in your MCP server?
- - Any interesting techniques or optimizations?

  ## Files

@@ -30,6 +83,7 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
  |------|-------------|
  | `agent.py` | ReAct agent with `StudentAgent` class |
  | `mcp_server.py` | MCP server with game interaction tools |
  | `app.py` | Gradio interface for HF Space |
  | `requirements.txt` | Additional dependencies |

  license: mit
  ---

+ # Autonomous Text Adventure Agent

  ## Overview
+ This project implements an autonomous text adventure agent designed to master parser-based interactive fiction (e.g., *Zork*). Unlike simple scripted bots, this agent uses a **ReAct-style reasoning loop** paired with an **MCP (Model Context Protocol) server** to manage structured memory and strategic planning.

+ ### Primary Objectives
+ * **Systematic Exploration:** Map and traverse complex game worlds.
+ * **Logic Puzzle Solving:** Interact with objects to unlock progression.
+ * **Loop Prevention:** Identify and break repetitive cycles or stagnant states.
+ * **State Consistency:** Maintain an accurate, persistent mental model of the world.
+ * **Efficiency:** Maximize the game score while minimizing unnecessary moves.

+ ---
+
+ ## Core Architecture
+ The agent operates on a four-stage decision loop that ensures every action is grounded in observation and strategic intent.
+
+ 1. **Observation Input:** Raw text from the game engine is parsed.
+ 2. **Planner & Memory Update:** The LLM updates the cumulative world state.
+ 3. **Tool Selection:** Reasoning logic picks the best tool/action based on policy constraints.
+ 4. **Environment Interaction:** The command is executed via the MCP interface.
+
+ ---
+
+ ## Structured Memory System
+ The agent treats each location as an independent world substate. Memory is **incremental**: it evolves with the agent's discoveries rather than being wiped.
+
+ ### Location Memory Schema
+ For every discovered room, the agent tracks:
+ * **Objects:** Visible and interactable items.
+ * **Action History:** Commands already attempted and their results.
+ * **Topology:** Explored vs. unexplored directions.
+ * **Context:** Cumulative summaries and strategic hints.
+
+ > **Key Principle:** Preserve previously known facts unless an observation explicitly contradicts them (e.g., "The door is now open").
+
+ ---
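The schema above can be sketched as a plain Python record; this is an illustrative sketch, and the helper names `new_location_record` and `merge_observation` are ours, not the submission's:

```python
# Illustrative sketch of the per-room memory record described above.
# Helper names are hypothetical; they mirror the schema, not the actual code.

def new_location_record(description: str) -> dict:
    """Create an empty memory record for a newly discovered room."""
    return {
        "objects_seen": set(),         # visible, interactable items
        "actions_done": set(),         # commands already attempted here
        "directions_explored": set(),  # exits already taken
        "memory": description,         # cumulative textual summary
    }

def merge_observation(record: dict, observation: str, contradicted: set) -> dict:
    """Apply the key principle: keep known facts, drop only contradicted ones."""
    record["objects_seen"] -= contradicted  # e.g. an item that was just taken
    record["memory"] = observation          # planner rewrites the summary
    return record
```

Incremental updates then reduce to calling `merge_observation` with whatever the latest observation invalidated, leaving every other remembered fact intact.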


+ ## Anti-Loop & Stagnation Policy
+ To prevent getting "stuck," the agent follows strict rules:
+ * **No Oscillation:** Non-gameplay tools (memory, map, inventory queries) cannot be used more than twice in a row.
+ * **Action Blacklisting:** Actions that have already been attempted are logged and avoided until the environment state changes.
+ * **Stagnation Escape:** If progress halts, the agent is forced to switch interaction verbs or backtrack to the least recently visited area.
+
+ ---
+
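The oscillation rule can be enforced with a small guard over the recent tool calls; a minimal sketch (the helper name `must_force_play_action` is ours):

```python
# Sketch of the no-oscillation guard: if the last two tool calls were both
# non-gameplay tools, the next step is forced back to play_action.
NON_GAMEPLAY_TOOLS = {"memory", "get_map", "inventory", "get_valid_actions"}

def must_force_play_action(recent_tools: list, limit: int = 2) -> bool:
    """Return True when the last `limit` calls all avoided play_action."""
    if len(recent_tools) < limit:
        return False
    return all(t in NON_GAMEPLAY_TOOLS for t in recent_tools[-limit:])
```

A guard like this can either hard-override the LLM's choice or, more gently, inject a hint into the next prompt telling it to pick `play_action`.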
+ ## MCP Tool Interface
+ The agent interacts with the game through a standardized toolset:
+
+ * `play_action`: Executes commands (e.g., "north", "take lamp").
+ * `memory`: Retrieves the structured world state.
+ * `inventory`: Lists currently held items.
+ * `get_map`: Visualizes explored connections for navigation.
+ * `get_valid_actions`: Filters plausible commands to reduce hallucinations.
+
+ ---
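Assuming a FastMCP-style client (as the repo's test harness uses), a single tool round-trip might look like this sketch:

```python
# Hedged sketch of one tool round-trip against the toolset above, assuming a
# FastMCP-style client that exposes `await client.call_tool(name, args)` and
# returns a list of content parts, each carrying a `.text` attribute.

async def one_step(client, command: str) -> str:
    """Send a single game command and return the textual observation."""
    result = await client.call_tool("play_action", {"action": command})
    return result[0].text if result else "No response"
```

The same call shape works for the other tools (`memory`, `inventory`, `get_map`, `get_valid_actions`), most of which take an empty argument dict.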
+
+ ## Performance Metrics
+ Progress is measured by an **Efficiency Ratio**:
+ $$\text{Efficiency} = \frac{\text{Score}}{\max(1, \text{Moves})}$$
+
+ The agent also tracks unique object discoveries and the total percentage of the map explored.
+
+ ---
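In code the ratio is a one-liner; the `max(1, …)` guard in the denominator avoids division by zero on a zero-move run (the function name is ours):

```python
def efficiency_ratio(score: int, moves: int) -> float:
    """Score per move; max(1, moves) guards the zero-move case."""
    return score / max(1, moves)
```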

  ## Files

  |------|-------------|
  | `agent.py` | ReAct agent with `StudentAgent` class |
  | `mcp_server.py` | MCP server with game interaction tools |
+ | `utils.py` | Useful shared functions |
  | `app.py` | Gradio interface for HF Space |
  | `requirements.txt` | Additional dependencies |

agent.py CHANGED
@@ -24,256 +24,643 @@ Tips:
  """

  import json
- import os
  import re
  from dataclasses import dataclass, field
  from typing import Optional

- from dotenv import load_dotenv
- from huggingface_hub import InferenceClient
-
- # Load environment variables
- load_dotenv()

  # =============================================================================
- # LLM Configuration - DO NOT MODIFY
  # =============================================================================

- # Model to use (fixed for fair evaluation)
- LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
-
- # Initialize the LLM client (uses HF_TOKEN from environment)
- _hf_token = os.getenv("HF_TOKEN")
- if not _hf_token:
-     raise ValueError("HF_TOKEN not found. Set it in your .env file.")
-
- LLM_CLIENT = InferenceClient(token=_hf_token)
-
-
- def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
-     """
-     Call the LLM with the given prompt. Use this function in your agent.
-
-     Args:
-         prompt: The user prompt (current game state, history, etc.)
-         system_prompt: The system prompt (instructions for the agent)
-         seed: Random seed for reproducibility
-         max_tokens: Maximum tokens in response (default: 300)
-
-     Returns:
-         The LLM's response text
-
-     Example:
-         response = call_llm(
-             prompt="You are in a forest. What do you do?",
-             system_prompt=SYSTEM_PROMPT,
-             seed=42,
-         )
-     """
-     messages = [
-         {"role": "system", "content": system_prompt},
-         {"role": "user", "content": prompt},
-     ]
-
-     response = LLM_CLIENT.chat.completions.create(
-         model=LLM_MODEL,
-         messages=messages,
-         temperature=0.0,  # Deterministic for reproducibility
-         max_tokens=max_tokens,
-         seed=seed,
-     )
-
-     return response.choices[0].message.content
-

  @dataclass
  class RunResult:
-     """Result of running the agent. Do not modify this class."""
      final_score: int
      max_score: int
      moves: int
      locations_visited: set[str]
      game_completed: bool
      error: Optional[str] = None
-     history: list[tuple[str, str, str]] = field(default_factory=list)


  # =============================================================================
- # System Prompt - Customize this for your agent
  # =============================================================================

- SYSTEM_PROMPT = """You are playing a classic text adventure game.
-
- GOAL: Explore the world, solve puzzles, and maximize your score.
-
- AVAILABLE TOOLS (use via MCP):
- - play_action: Execute a game command (north, take lamp, open mailbox, etc.)
- - memory: Get current game state and history (if implemented)
- - inventory: Check what you're carrying (if implemented)
-
- VALID GAME COMMANDS for play_action:
- - Movement: north, south, east, west, up, down, enter, exit
- - Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
- - Other: look, inventory, read <thing>, turn on lamp
-
- RESPOND IN THIS EXACT FORMAT (no markdown):
- THOUGHT: <your reasoning about what to do next>
  TOOL: <tool_name>
- ARGS: <JSON arguments, e.g., {"action": "look"}>
-
- Example:
- THOUGHT: I should look around to see where I am.
- TOOL: play_action
- ARGS: {"action": "look"}
  """

-
  # =============================================================================
- # Student Agent - IMPLEMENT THIS CLASS
  # =============================================================================

  class StudentAgent:
-     """
-     Your ReAct agent implementation.
-
-     TODO:
-     1. Implement the run() method with the ReAct loop
-     2. Parse LLM responses to extract tool calls
-     3. Track state and avoid loops
-
-     Use the provided call_llm() function to interact with the LLM.
-     """
-
      def __init__(self):
-         """Initialize your agent here."""
-         # TODO: Initialize any state tracking you need
-         # self.history = []
-         # self.visited_locations = set()
-         pass
-
-     async def run(
-         self,
-         client,  # FastMCP Client connected to your MCP server
-         game: str,
-         max_steps: int,
-         seed: int,
-         verbose: bool = False,
-     ) -> RunResult:
-         """
-         Run the agent for a game session.
-
-         Args:
-             client: FastMCP Client connected to your MCP server
-             game: Name of the game being played (e.g., "zork1")
-             max_steps: Maximum number of steps to take
-             seed: Random seed for reproducibility (use for LLM calls)
-             verbose: Whether to print detailed output
-
-         Returns:
-             RunResult with final score and statistics
-         """
-         # TODO: Implement your ReAct loop here
-         #
-         # Basic structure:
-         # 1. Get initial observation (call play_action with "look")
-         # 2. Loop for max_steps:
-         #    a. Build prompt with current observation and history
-         #    b. Call LLM to get thought and action
-         #    c. Parse the response to extract tool and args
-         #    d. Call the tool via client.call_tool(tool_name, args)
-         #    e. Update history and state
-         #    f. Check for game over
-         # 3. Return RunResult with final statistics
-
-         # Example of calling a tool:
-         # result = await client.call_tool("play_action", {"action": "look"})
-         # observation = result[0].text if result else "No response"
-
-         # Example of calling the LLM:
-         # response = call_llm(
-         #     prompt="Current observation: " + observation,
-         #     system_prompt=SYSTEM_PROMPT,
-         #     seed=seed,
-         # )
-
-         # Placeholder implementation - replace with your code
-         locations_visited = set()
          history = []
-         final_score = 0
-         moves = 0
-
-         # TODO: Your implementation here
-         # ...
-
          return RunResult(
-             final_score=final_score,
-             max_score=350,  # Zork1 max score, adjust if needed
              moves=moves,
-             locations_visited=locations_visited,
-             game_completed=False,
              history=history,
          )
-
-     def _build_prompt(self, observation: str, history: list) -> str:
-         """
-         Build the prompt for the LLM.
-
-         TODO: Implement this to create effective prompts
-         """
-         # TODO: Combine system prompt, history, and current observation
-         pass
-
-     def _parse_response(self, response: str) -> tuple[str, str, dict]:
-         """
-         Parse LLM response to extract thought, tool name, and arguments.
-
-         TODO: Implement robust parsing
-
-         Returns:
-             Tuple of (thought, tool_name, args_dict)
-         """
-         # TODO: Parse the response format:
-         # THOUGHT: ...
-         # TOOL: ...
-         # ARGS: {...}
-         pass
-
-     def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
          """
-         Call the LLM with the given prompt.
-
-         This is a convenience wrapper - you can also use call_llm() directly.
          """
-         return call_llm(prompt, system_prompt, seed)


- # =============================================================================
- # For local testing
- # =============================================================================

- async def test_agent():
-     """Test the agent locally."""
-     from fastmcp import Client
-
-     # Path to your MCP server
-     server_path = "mcp_server.py"
-
-     agent = StudentAgent()

-     async with Client(server_path) as client:
-         result = await agent.run(
-             client=client,
-             game="zork1",
-             max_steps=10,
-             seed=42,
-             verbose=True,
-         )

-     print(f"\nFinal Score: {result.final_score}")
-     print(f"Moves: {result.moves}")
-     print(f"Locations: {result.locations_visited}")
-

- if __name__ == "__main__":
-     import asyncio
-     asyncio.run(test_agent())

24
  """
25
 
26
  import json
 
27
  import re
28
  from dataclasses import dataclass, field
29
  from typing import Optional
30
 
31
+ from utils import call_llm, extract_location, is_new_location
 
 
 
 
32
 
33
  # =============================================================================
34
+ # LLM Configuration
35
  # =============================================================================
36
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
 
38
  @dataclass
39
  class RunResult:
 
40
  final_score: int
41
  max_score: int
42
  moves: int
43
  locations_visited: set[str]
44
  game_completed: bool
45
+ unique_objects: int = 0
46
+ puzzles_solved: int = 0
47
+ efficiency: float = 0.0
48
  error: Optional[str] = None
49
+ history: list[dict] = field(default_factory=list)
50
 
51
 
52
  # =============================================================================
53
+ # System Prompt
54
  # =============================================================================
55
 
56
+ SYSTEM_PROMPT = """
57
+ You are an expert text adventure game player. Your objective is to explore efficiently, collect treasures, solve puzzles, and maximize your score.
58
+ **Random movement is forbidden.** Always plan actions using context and memory.
59
+
60
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
61
+ AVAILABLE TOOLS (exactly ONE per step)
62
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
63
+ 1. memory - Check current state, items, objects, locations, and past actions.
64
+ 2. play_action - Execute a game command.
65
+ 3. get_map - Return to a previously visited location or get a map of explored areas.
66
+ 4. inventory - Check current inventory.
67
+ 5. get_valid_actions - Get likely valid actions from the current location.
68
+
69
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
70
+ TOOL PRIORITY RULE
71
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
72
+ Choose tool in this order:
73
+ 1. If local puzzle interaction is possible → play_action
74
+ 2. If interactable object is visible → play_action
75
+ 3. If inventory contains potentially useful item → inventory
76
+ 4. If location understanding is uncertain → memory
77
+ 5. If planning navigation to solve puzzle → get_map
78
+ 6. Exploration of world → play_action movement
79
+
80
+ **CRITICAL:**
81
+ - Do NOT use any tool other than play_action more than 2 times in a row.
82
+ - **DO NOT repeat an action that has already been attempted in the current location, unless the state clearly changed and it is necessary.**
83
+
84
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━���━━━
85
+ VALID GAME COMMANDS for play_action
86
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
87
+ Movement:
88
+ north, south, east, west, up, down, enter, exit
89
+ Objects:
90
+ take <item>, drop <item>, open <thing>, close <thing>, examine <thing>,
91
+ push <thing>, pull <thing>, move <thing>, lift <thing>, turn <thing>, press <thing>
92
+ Light:
93
+ turn on lamp, turn off lamp
94
+ Combat:
95
+ attack <enemy> with <weapon>
96
+ Other:
97
+ inventory, look, read <thing>, wait
98
+ Forbidden:
99
+ check, inspect, search, grab, use, help
100
+
101
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
102
+ STRATEGIC RULES
103
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
104
+ • **Avoid repeating actions:**
105
+ - **NEVER** repeat an action that has already been attempted in the current location.
106
+ - If an action failed or produced no progress, **do not try it again** in the same context.
107
+ - Track failed actions per location to avoid loops.
108
+
109
+ • Before leaving a location:
110
+ - Collect all useful items.
111
+ - Interact with all interesting objects (push/pull/move/lift/open) if "examine" yields nothing.
112
+ - Solve local puzzles before moving away.
113
+ - Check if there are valid actions related to visible objects or inventory items that haven't been tried yet.
114
+
115
+ • **Systematic exploration > random movement.**
116
+ • Avoid overusing "examine": if it yields nothing, try physical interactions (push/pull/move/lift/open/turn/press).
117
+ • If the previous observation indicates a failed action, **avoid that action and similar ones** in the future.
118
+
119
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
120
+ ANTI-REPETITION RULE (CRITICAL)
121
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
122
+ **STRICT POLICY:**
123
+ 1. **Track all attempted actions per location** in memory.
124
+ 2. **Never repeat an action** that has already been tried in the current location.
125
+ 3. If an action fails (e.g., "The door is locked"), **do not attempt it again** unless new context suggests it might now work (e.g., you found a key).
126
+ 4. If no progress is made after 3 actions, **change strategy** (e.g., try a different object or direction).
127
+
128
+ **Example:**
129
+ - If "open door" fails, **do not try it again** unless you acquire a key or new information.
130
+ - If "examine table" yields "nothing special," **try physical interactions** (push/pull/move) instead of repeating "examine."
131
+
132
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
133
+ INTERACTION STRATEGY
134
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
135
+ When you see an object:
136
+ 1. If it is a container → try **open** (only once).
137
+ 2. If large/fixed → try **move**, **push**, **pull**, or **lift** (only once each).
138
+ 3. If "examine" gives no useful info → try **one** physical interaction (e.g., turn/press).
139
+ 4. If enterable → try **enter** (only once).
140
+ 5. **Never repeat the same interaction** on the same object in the same location.
141
+
142
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
143
+ EXPLORATION RULE
144
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
145
+ - If no immediate objectives:
146
+ - Explore **unexplored directions systematically**.
147
+ - Prefer directions **not previously taken** from this location.
148
+ - **Do not wander randomly**: Always have a reason for movement (e.g., "The path east was not explored yet").
149
+ - Use **get_map** only to return to a location with unsolved puzzles or uncollected items.
150
+
151
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
152
+ RESPONSE FORMAT (STRICT — NO MARKDOWN)
153
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
154
+ THOUGHT: <brief reasoning referencing memory, map, or inventory if applicable>
155
  TOOL: <tool_name>
156
+ ARGS: <JSON arguments>
 
 
 
 
 
157
  """
158
 
 
159
  # =============================================================================
160
+ # StudentAgent
161
  # =============================================================================
162
 
163
  class StudentAgent:
 
 
 
 
 
 
 
 
 
 
 
164
  def __init__(self):
165
+ self.history = []
166
+ self.current_location = None
167
+ self.score = 0
168
+ self.recent_actions = []
169
+ self.last_tool = None
170
+ # structured memory
171
+ self.locations = {}
172
+
173
+ # =======================================
174
+ # Run
175
+ # =======================================
176
+ async def run(self, client, game: str, max_steps: int, seed: int, verbose: bool = False):
177
+
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
178
  history = []
179
+
180
+ tools = await client.list_tools()
181
+ tool_names = [t.name for t in tools]
182
+
183
+ # ---------------------------------
184
+ # Initial observation
185
+ # ---------------------------------
186
+ tool_name, tool_args = "play_action", {"action": "look"}
187
+ self.last_tool = tool_name
188
+
189
+ result = await client.call_tool(tool_name, tool_args)
190
+ observation = self._extract_result(result)
191
+
192
+ # Detect starting location
193
+ self.current_location = extract_location(observation)
194
+
195
+ # Initialize location memory
196
+ self.locations[self.current_location] = {
197
+ "objects_seen": set(),
198
+ "actions_done": set(),
199
+ "directions_explored": set(),
200
+ "promising_hints": set(),
201
+ "memory": observation,
202
+ "observations_seen": set(),
203
+ "valid_actions": set()
204
+ }
205
+
206
+ self.locations[self.current_location]["observations_seen"].add(observation)
207
+
208
+ # Fetch valid actions
209
+ valid_actions = await client.call_tool("get_valid_actions", {})
210
+ parsed = self._extract_result(valid_actions)
211
+
212
+ self.locations[self.current_location]["valid_actions"] = set(
213
+ a.strip() for a in parsed.split(",") if a.strip()
214
+ )
215
+
216
+ if verbose:
217
+ print(observation)
218
+
219
+ # =====================================
220
+ # MAIN LOOP
221
+ # =====================================
222
+ for step in range(1, max_steps + 1):
223
+ # -------------------------
224
+ # Location detection
225
+ # -------------------------
226
+ try:
227
+ if is_new_location(observation, set(self.locations.keys()), self.last_tool):
228
+
229
+ new_location = extract_location(observation)
230
+
231
+ self.locations[self.current_location]["directions_explored"].add(
232
+ ("look", new_location)
233
+ )
234
+
235
+ self.current_location = new_location
236
+
237
+ if new_location not in self.locations.keys():
238
+ self.locations[new_location] = {
239
+ "objects_seen": set(),
240
+ "actions_done": set(),
241
+ "directions_explored": set(),
242
+ "promising_hints": set(),
243
+ "memory": observation,
244
+ "observations_seen": set(),
245
+ "valid_actions": set(),
246
+ }
247
+
248
+ # Fetch valid actions on entering location
249
+ try:
250
+ valid_actions = await client.call_tool(
251
+ "get_valid_actions",
252
+ {}
253
+ )
254
+
255
+ parsed = self._extract_result(valid_actions)
256
+
257
+ self.locations[self.current_location]["valid_actions"] = set(
258
+ a.strip() for a in parsed.split(",") if a.strip()
259
+ )
260
+
261
+ except Exception:
262
+ pass
263
+
264
+ except Exception:
265
+ pass
266
+
267
+ # Prevent tool oscillation
268
+ if len(self.history) >= 2:
269
+ actions = ["memory", "get_map", "inventory"]
270
+ # avoid using one of the non-play_action tools more than 2 times in a row
271
+ if any(self.last_tool == a for a in actions):
272
+ # Force exploration action instead of map query
273
+ self.forced_prompt_hint = "\nYou should choose play_action to explore instead of using the same tool again."
274
+ else:
275
+ self.forced_prompt_hint = ""
276
+
277
+ # -------------------------
278
+ # LLM decision step (pre-call for memory, objects, actions)
279
+ # -------------------------
280
+ if self.last_tool == "play_action":
281
+ planner_data = await self._call_planner_llm(observation)
282
+ print(f"\n[PLANNER LLM RESPONSE]\n{planner_data}\n")
283
+ print(f"[VALID ACTIONS]\n{self.locations[self.current_location]['valid_actions']}\n")
284
+
285
+ # Update memory with LLM-generated data
286
+ self.locations[self.current_location]["memory"] = planner_data["memory"]
287
+
288
+ actions = set(planner_data["promising_hints"])
289
+ actions -= self.locations[self.current_location]["actions_done"]
290
+ self.locations[self.current_location]["promising_hints"] = list(actions)
291
+
292
+ objects_seen_before = self.locations[self.current_location]["objects_seen"]
293
+ self.locations[self.current_location]["objects_seen"].update(planner_data["objects_seen"])
294
+
295
+ if objects_seen_before != self.locations[self.current_location]["objects_seen"]:
296
+ # Update valid actions
297
+ valid_actions = await client.call_tool("get_valid_actions", {})
298
+ parsed = self._extract_result(valid_actions)
299
+
300
+ self.locations[self.current_location]["valid_actions"] = set(
301
+ a.strip() for a in parsed.split(",") if a.strip()
302
+ )
303
+
304
+ # -------------------------
305
+ # Build prompt for tool selection (without calling LLM again)
306
+ # -------------------------
307
+ prompt = self._build_prompt(observation)
308
+
309
+ # Call LLM ONLY for tool selection (not for memory/objects/actions)
310
+ response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
311
+ thought, tool_name, tool_args = self._parse_response(response)
312
+
313
+ tool_name, tool_args = self._validate_tool_call(
314
+ tool_name,
315
+ tool_args,
316
+ tool_names
317
+ )
318
+ self.last_tool = tool_name
319
+
320
+ if tool_name == "play_action":
321
+ self.locations[self.current_location]["actions_done"].add(tool_args.get("action", "look"))
322
+
323
+ if verbose:
324
+ print(f"\nStep {step}")
325
+ print(f"Location: {self.current_location}")
326
+ print(f"Thought: {thought}")
327
+ print(f"Tool: {tool_name}")
328
+ print(f"Args: {tool_args}")
329
+
330
+ # -------------------------
331
+ # Tool execution
332
+ # -------------------------
333
+ try:
334
+ result = await client.call_tool(tool_name, tool_args)
335
+ observation = self._extract_result(result)
336
+ self.locations[self.current_location]["observations_seen"].add(observation)
337
+
338
+ except Exception as e:
339
+ observation = str(e)
340
+
341
+ # -------------------------
342
+ # Score tracking
343
+ # -------------------------
344
+ self._update_score(observation)
345
+
346
+ self.history.append({
347
+ "step": step,
348
+ "thought": thought,
349
+ "tool": tool_name,
350
+ "args": tool_args,
351
+ "result": observation
352
+ })
353
+
354
+ history.append((thought, f"{tool_name}({tool_args})", observation))
355
+
356
+ if len(self.history) > 10:
357
+ self.history = self.history[-10:]
358
+
359
+ if verbose:
360
+ print(f"[RESULT] {observation}")
361
+ print(f"[SCORE] {self.score}")
362
+
363
+ if self._is_game_over(observation):
364
+ break
365
+
366
+ # =====================================
367
+ # Final result
368
+ # =====================================
369
+ moves = len(self.history)
370
+ efficiency = self.score / max(1, moves)
371
+
372
  return RunResult(
373
+ final_score=self.score,
374
+ max_score=350,
375
  moves=moves,
376
+ locations_visited=self.locations,
377
+ game_completed=self._is_game_over(observation),
378
+ efficiency=efficiency,
379
  history=history,
380
  )
381
+
382
+ async def _call_planner_llm(self, observation: str) -> dict:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
383
  """
384
+ Call the LLM to:
385
+ 1. Update the location memory.
386
+ 2. Extract interactable objects from the observation.
387
+ 3. Generate promising actions grounded in the observation.
388
  """
389
+ current_data = self.locations.get(self.current_location, {})
390
+
391
+ prompt = """
392
+ You are an expert text adventure agent. Your **only** goal is to maximize progress by:
393
+ - Solving puzzles (e.g., "use <object> on <thing>").
394
+ - Collecting useful items (e.g., "take <object>").
395
+ - Exploring new areas (e.g., "enter").
396
+ - Avoiding redundant or vague actions.
397
+
398
+ ---
399
+
400
+ ### CURRENT CONTEXT
401
+ **Location:**
402
+ {location}
403
+
404
+ **Current Observation:**
405
+ {observation}
406
+
407
+ **Current Memory of this Location:**
408
+ {memory}
409
+
410
+ ---
411
+
412
+ ### STRICT INSTRUCTIONS
413
+ Your task is to:
414
+ 1. **Update the memory**.
415
+ 2. **Extract interactable objects** (only explicitly mentioned in the observation).
416
+ 3. **Generate ≤5 promising actions** (strictly grounded in the observation + valid actions).
417
+
418
+ ---
419
+
420
+ #### 1. LOCATION MEMORY UPDATE
421
+
422
+ You are maintaining a cumulative memory of this location.
423
+
424
+ Goal:
425
+ Update the existing location description by merging it with the new observation,
426
+ while ensuring that the final description reflects the CURRENT STATE of the location.
427
+
428
+ Rules:
429
+
430
+ 1. Preserve all previously known environmental facts unless explicitly contradicted.
431
+ 2. Add any new information from the new observation.
432
+ 3. Remove facts that are clearly invalidated by the new observation.
433
+ 4. If an object is taken, it is no longer present in the location.
434
+ 5. If an object is dropped, it becomes present in the location.
435
+ 6. If an object changes state (opened, closed, locked, unlocked, broken, etc.), replace the old state with the new one.
436
+ 7. Only the CURRENT state of each object should appear in the final description.
437
+ 8. Do not keep outdated state history (e.g., do not keep both "closed" and "opened").
438
+ 9. Do NOT rewrite stylistically.
439
+ 10. Do not duplicate information.
440
+        11. Keep it concise while preserving all relevant environmental details.
+
+        The final description must represent the current true state of the location,
+        not a history of past states.
+
+        #### 2. OBJECTS SEEN
+        List **only** objects that are:
+        - Explicitly mentioned in the observation.
+        - Clearly interactable (e.g., "a shiny key on the table" → "key"; "a path" → not an object).
+        - Potentially required for puzzle-solving.
+        Keep only the name of the object, without adjectives or extra description.
+
+        #### 3. PROMISING HINTS
+        - Suggest **strategic hints** (not direct actions) that are strictly supported by the current observation and the valid actions for this location.
+        - Do not suggest actions already done in this location: {actions_done}.
+        - Do not suggest actions that do not seem possible (e.g., "take key" if the key is not mentioned in the observation, or "open locked door").
+        - Hints must be directly supported by the current observation.
+        - Each hint should be a concise suggestion of what to try next, grounded in the current context (e.g., "The door is open, maybe you can enter it" → "try entering the door").
+        - Use the following action verbs if applicable: take, open, close, push, pull, move, lift, turn, press, enter, ... with the relevant object.
+
+        - Focus on:
+          * Potential puzzle solutions
+          * Object interactions
+          * Hidden opportunities
+        - Forbidden:
+          * Vague hints ("There might be something interesting")
+          * Repeats of already done actions
+          * Random movement without reason
+
+        - Movement rules:
+          - Do NOT suggest movement if there are still meaningful interactions available in the current location.
+          - If all useful local interactions have been exhausted, suggest exploring an unexplored direction.
+          - Prefer unexplored directions over previously visited ones.
+
+        ### OUTPUT FORMAT (STRICT JSON, no markdown or explanations):
+        {{
+            "memory": "<updated_memory>",
+            "promising_hints": ["<hint1>", "<hint2>"],
+            "objects_seen": ["<object1>", "<object2>"]
+        }}
+        """.format(
+            observation=observation,
+            location=self.current_location,
+            memory=current_data.get("memory", ""),
+            actions_done=list(current_data.get("actions_done", set())),
+        )
+
+        response = call_llm(prompt=prompt, seed=42)
+
+        try:
+            data = json.loads(response)
+            json_data = {
+                "memory": data.get("memory", ""),
+                "promising_hints": data.get("promising_hints", []),
+                "objects_seen": data.get("objects_seen", [])
+            }
+
+            # Remove promising hints that have already been tried
+            done_actions = self.locations[self.current_location].get("actions_done", set())
+            json_data["promising_hints"] = list(
+                set(json_data["promising_hints"]) - set(done_actions)
+            )
+            return json_data
+
+        except json.JSONDecodeError:
+            # Fall back to the existing memory rather than wiping it
+            return {
+                "memory": current_data.get("memory", ""),
+                "promising_hints": [],
+                "objects_seen": []
+            }
+
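The `json.JSONDecodeError` fallback above discards the reply entirely, yet models frequently return otherwise valid JSON wrapped in a markdown code fence despite the "no markdown" instruction. A small pre-cleaning helper (a sketch, not part of the submission; the function name is made up) can rescue those replies before falling back:

```python
import json
import re

def parse_llm_json(response: str):
    """Parse an LLM reply as JSON, tolerating a surrounding markdown code fence."""
    text = response.strip()
    # Strip a ```json ... ``` (or plain ```) wrapper if present
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

A fenced reply then parses to the same dict as the bare JSON, and anything unparseable still returns `None` so the caller's fallback logic is unchanged.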
+    def _build_prompt(self, observation: str) -> str:
+        """Build the prompt for the LLM, using pre-filled memory/objects/actions."""
+        current_location_data = self.locations.get(self.current_location, {})
+
+        prompt = f"""
+        OBSERVATION:
+        {observation}
+
+        LOCATION:
+        {self.current_location}
+
+        LOCATION MEMORY:
+        {current_location_data.get("memory", "None")}
+
+        OBJECTS_SEEN:
+        {list(current_location_data.get("objects_seen", set()))}
+
+        PROMISING_HINTS:
+        {", ".join(current_location_data.get("promising_hints", []))}
+
+        VALID_ACTIONS:
+        {list(current_location_data.get("valid_actions", set()))}
+
+        ACTIONS ALREADY DONE IN THIS LOCATION:
+        {list(current_location_data.get("actions_done", set()))}
+        AVOID REPEATING THESE ACTIONS.
+
+        HINT:
+        {getattr(self, "forced_prompt_hint", "")}
+        """
+        return prompt
+    def _parse_response(self, response: str) -> tuple[str, str, dict]:
+        thought = "No reasoning provided"
+        tool_name = "play_action"
+        tool_args = {"action": "look"}
+
+        lines = response.strip().split("\n")
+        for line in lines:
+            line_clean = line.strip()
+            line_upper = line_clean.upper()
+            if line_upper.startswith("THOUGHT:"):
+                thought = line_clean.split(":", 1)[1].strip()
+            elif line_upper.startswith("TOOL:"):
+                raw_tool = line_clean.split(":", 1)[1].strip().lower()
+                raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
+                raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
+                tool_name = raw_tool
+            elif line_upper.startswith("ARGS:"):
+                args_part = line_clean.split(":", 1)[1].strip()
+                try:
+                    args_part = args_part.replace("'", '"')
+                    tool_args = json.loads(args_part)
+                except json.JSONDecodeError:
+                    match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
+                    if match:
+                        tool_args = {"action": match.group(1)}
+                    else:
+                        tool_args = {"action": "look"}
+        return thought, tool_name, tool_args
+
+    def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
+        """Robust tool call validator."""
+
+        # Ensure tool_args is a dictionary (the LLM can hallucinate other types)
+        if not isinstance(tool_args, dict):
+            tool_args = {}
+
+        # Normalize tool name
+        tool_name = str(tool_name).lower().strip()
+
+        tool_alias_map = {
+            "action": "play_action",
+            "do": "play_action",
+            "command": "play_action",
+            "map": "get_map",
+            "location": "get_map",
+            "mem": "memory",
+            "state": "memory",
+            "status": "memory",
+            "inv": "inventory",
+            "items": "inventory",
+        }
+
+        if tool_name in tool_alias_map:
+            tool_name = tool_alias_map[tool_name]
+
+        if tool_name not in valid_tools:
+            tool_name = "play_action"
+
+        # Fix play_action argument schema
+        if tool_name == "play_action":
+            action = tool_args.get("action")
+
+            if not isinstance(action, str) or not action:
+                action = "look"
+
+            action = action.lower()
+
+            # Normalize verb aliases
+            invalid_verb_map = {
+                "check": "examine",
+                "inspect": "examine",
+                "search": "look",
+                "grab": "take",
+                "pick": "take",
+                "use": "examine",
+                "investigate": "examine",
+            }
+
+            words = action.split()
+            if words and words[0] in invalid_verb_map:
+                words[0] = invalid_verb_map[words[0]]
+                action = " ".join(words)
+
+            # Remove markdown artifacts
+            action = action.replace("**", "").replace("*", "").replace("`", "")
+
+            # Normalize whitespace
+            action = " ".join(action.strip().split())
+
+            tool_args = {"action": action}
+
+        else:
+            # Non-action tools should have empty args
+            tool_args = {}
+
+        return tool_name, tool_args
+
+    def _extract_result(self, result) -> str:
+        if hasattr(result, 'content') and result.content:
+            return result.content[0].text
+        if isinstance(result, list) and result:
+            return result[0].text if hasattr(result[0], 'text') else str(result[0])
+        return str(result)
+
+    def _update_score(self, text: str) -> None:
+        patterns = [r'Score:\s*(\d+)', r'score[:\s]+(\d+)', r'\[Score:\s*(\d+)']
+        for pattern in patterns:
+            match = re.search(pattern, text, re.IGNORECASE)
+            if match:
+                self.score = max(self.score, int(match.group(1)))
+
+    def _is_game_over(self, text: str) -> bool:
+        phrases = ["game over", "you have died", "you are dead", "*** you have died ***"]
+        text_lower = text.lower()
+        return any(p in text_lower for p in phrases)
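The score-tracking patterns in `_update_score` above can be exercised in isolation; a minimal standalone sketch with hypothetical observation strings (the helper name is made up, the regexes are the same three used by the agent):

```python
import re

# The same three patterns _update_score tries against each observation
SCORE_PATTERNS = [r"Score:\s*(\d+)", r"score[:\s]+(\d+)", r"\[Score:\s*(\d+)"]

def best_score(text: str, current: int = 0) -> int:
    """Return the highest score mentioned in text, never lower than current."""
    best = current
    for pattern in SCORE_PATTERNS:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            best = max(best, int(match.group(1)))
    return best

print(best_score("You enter the attic.\n[Score: 45, Moves: 102]"))  # → 45
```

Taking the max against the current score means a stale or partial match can never regress the tracked score, which matters because Zork-style status lines vary in format.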
 
 
 
 
mcp_server.py CHANGED
@@ -24,77 +24,121 @@ Test your server with:
 Then open the MCP Inspector in your browser to test the tools interactively.
 """

 import sys
 import os

-# Add parent directory to path to import games module
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

 from fastmcp import FastMCP
 from games.zork_env import TextAdventureEnv

-
-# =============================================================================
-# Create the MCP Server
-# =============================================================================

 mcp = FastMCP("Student Text Adventure Server")


-# =============================================================================
-# Game State Management
-# =============================================================================

 class GameManager:
-    """
-    Manages the text adventure game state.
-
-    TODO: Extend this class to track:
-    - Action history (for memory tool)
-    - Explored locations (for mapping)
-    - Current score and moves
-    """
-
     def __init__(self):
-        self.env: TextAdventureEnv = None
         self.state = None
-        self.game_name: str = ""
-        # TODO: Add more state tracking
-        # self.history: list[tuple[str, str]] = []
-        # self.explored_locations: dict[str, set[str]] = {}
-        # self.current_location: str = ""
-
-    def initialize(self, game: str = "zork1"):
-        """Initialize or reset the game."""
-        self.game_name = game
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
-        # TODO: Reset your state tracking here
-        return self.state.observation
-
-    def step(self, action: str) -> str:
-        """Execute an action and return the result."""
-        if self.env is None:
-            self.initialize()
-
         self.state = self.env.step(action)
-
-        # TODO: Update your state tracking here
-        # self.history.append((action, self.state.observation))
-        # Update location tracking, etc.
-
-        return self.state.observation
-
-    def get_score(self) -> int:
-        """Get current score."""
         return self.state.score if self.state else 0
-
-    def get_moves(self) -> int:
-        """Get number of moves taken."""
         return self.state.moves if self.state else 0


-# Global game manager
 _game = GameManager()


@@ -107,10 +151,9 @@ def get_game() -> GameManager:
     _game.initialize(game)
     return _game

-
-# =============================================================================
-# MCP Tools - IMPLEMENT THESE
-# =============================================================================

 @mcp.tool()
 def play_action(action: str) -> str:
@@ -133,77 +176,107 @@ def play_action(action: str) -> str:
     game = get_game()

     # TODO: You might want to add action validation here
-    # TODO: You might want to include score changes in the response

     result = game.step(action)
-
-    # Optional: Append score info
-    # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
-
     return result


-# TODO: Implement additional tools to help your agent
-
-# @mcp.tool()
-# def memory() -> str:
-#     """
-#     Get the current game state summary.
-#
-#     Returns:
-#         A summary including current location, score, moves, and recent history
-#     """
-#     game = get_game()
-#     # TODO: Return useful state information
-#     pass
-
-
-# @mcp.tool()
-# def inventory() -> str:
-#     """
-#     Check what the player is carrying.
-#
-#     Returns:
-#         List of items in the player's inventory
-#     """
-#     game = get_game()
-#     result = game.step("inventory")
-#     return result
-
-
-# @mcp.tool()
-# def get_map() -> str:
-#     """
-#     Get a map of explored locations.
-#
-#     Returns:
-#         A text representation of explored locations and connections
-#     """
-#     game = get_game()
-#     # TODO: Return map of explored locations
-#     pass
-
-
-# @mcp.tool()
-# def get_valid_actions() -> str:
-#     """
-#     Get a list of likely valid actions from the current location.
-#
-#     Returns:
-#         List of actions that might work here
-#     """
-#     # This is a hint: Jericho provides get_valid_actions()
-#     game = get_game()
-#     if game.env and game.env.env:
-#         valid = game.env.env.get_valid_actions()
-#         return "Valid actions: " + ", ".join(valid[:20])
-#     return "Could not determine valid actions"


-# =============================================================================
-# Run the server
-# =============================================================================

 if __name__ == "__main__":
-    # This runs the server with stdio transport (for MCP clients)
-    mcp.run()
 Then open the MCP Inspector in your browser to test the tools interactively.
 """
+
 import sys
 import os
+import re
+from utils import is_new_location, extract_location

 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

 from fastmcp import FastMCP
 from games.zork_env import TextAdventureEnv

+# =========================================================
+# Server Initialization
+# =========================================================

 mcp = FastMCP("Student Text Adventure Server")

+# =========================================================
+# Game State Manager
+# =========================================================

 class GameManager:
+
     def __init__(self):
+        self.env: TextAdventureEnv | None = None
         self.state = None
+
+        self.history = []
+        self.locations = {}
+        self.current_location = ""
+
+        self.inventory = set()
+
+    # -----------------------------------------------------
+
+    def initialize(self, game="zork1"):
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
+
+        self.history.clear()
+        self.locations.clear()
+
+        # Initial observation
+        self.state = self.env.step("look")
+        obs = self.state.observation
+
+        self.current_location = extract_location(obs)
+
+        self.locations[self.current_location] = {
+            "objects": set(),
+            "actions": set(),
+            "directions": set(),
+            "observations": set(),
+            "summary": ""
+        }
+
+        self.inventory = set()
+
+        return obs
+
+    # -----------------------------------------------------
+
+    def step(self, action: str):
+        if not self.env:
+            return "Game not initialized."
+
         self.state = self.env.step(action)
+        obs = self.state.observation
+        action_lower = action.lower()
+
+        # Location detection
+        if is_new_location(obs, set(self.locations.keys()), "play_action") and action != "inventory":
+            previous_location = self.current_location
+            self.current_location = extract_location(obs)
+
+            self.locations[previous_location]["directions"].add(
+                (action_lower, self.current_location)
+            )
+
+            self.locations[self.current_location] = {
+                "objects": set(),
+                "actions": set(),
+                "directions": set(),
+                "observations": set(),
+                "summary": ""
+            }
+
+        # Track action history (server level only)
+        self.history.append((action, obs))
+        if len(self.history) > 20:
+            self.history = self.history[-20:]
+
+        return obs
+
+    # -----------------------------------------------------
+
+    def get_score(self):
         return self.state.score if self.state else 0
+
+    def get_moves(self):
         return self.state.moves if self.state else 0

+# =========================================================
+# Global Game Instance
+# =========================================================
+
 _game = GameManager()

     _game.initialize(game)
     return _game

+# =========================================================
+# Tools (Execution Only)
+# =========================================================

 @mcp.tool()
 def play_action(action: str) -> str:

     game = get_game()

     # TODO: You might want to add action validation here
+    # Execute the action
     result = game.step(action)
+
+    # Append score info so the agent sees progress with every observation
+    return f"{result}\n\n[Score: {game.get_score()}, Moves: {game.get_moves()}]"
+
+# ---------------------------------------------------------
+
+@mcp.tool()
+def memory(query: str = "") -> str:
+    """
+    State viewer only. No LLM inference.
+    """
+    game = get_game()
+
+    if not game.state:
+        return "Game not initialized."
+
+    loc = game.current_location
+
+    return f"""
+STATE
+Location: {loc}
+Score: {game.get_score()}
+Moves: {game.get_moves()}
+
+RECENT HISTORY
+{game.history[-10:]}
+""".strip()
+
+# ---------------------------------------------------------
+
+@mcp.tool()
+def get_map() -> str:
+    """
+    Exploration graph dump.
+    """
+    game = get_game()
+
+    if not game.locations:
+        return "No map discovered."
+
+    text = "EXPLORED MAP\n"
+    for loc, data in game.locations.items():
+        text += f"\n[{loc}]\n"
+        for direction, dest in data.get("directions", set()):
+            text += f"  {direction} -> {dest}\n"
+
+    return text.strip()
+
+# ---------------------------------------------------------
+
+@mcp.tool()
+def inventory() -> str:
+    """
+    Inventory viewer using the game command.
+    """
+    game = get_game()
+
+    if not game.env:
+        return "Game not initialized."
+
+    try:
+        state = game.env.step("inventory")
+        return state.observation
+    except Exception:
+        return "Unable to retrieve inventory."
+
+# ---------------------------------------------------------
+
+@mcp.tool()
+def get_valid_actions() -> str:
+    """
+    Environment hint helper.
+    """
+    game = get_game()
+
+    if game.env and game.env.env:
+        valid = game.env.env.get_valid_actions()
+        return ", ".join(valid) if valid else "No valid actions."
+
+    return "Environment not available."

+# =========================================================
+# Run Server
+# =========================================================

 if __name__ == "__main__":
+    mcp.run()
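The `get_map` tool above dumps a graph stored as `{location: {"directions": {(action, destination), ...}}}`. The same rendering can be sketched standalone (helper name and sample locations are made up; sorting is added here for deterministic output, whereas the server iterates the raw set):

```python
def render_map(locations: dict) -> str:
    """Render an exploration graph as indented text, one location per block."""
    lines = ["EXPLORED MAP"]
    for loc, data in locations.items():
        lines.append(f"\n[{loc}]")
        # Sorted for a deterministic dump
        for direction, dest in sorted(data.get("directions", set())):
            lines.append(f"  {direction} -> {dest}")
    return "\n".join(lines)

world = {
    "west of house": {"directions": {("north", "north of house")}},
    "north of house": {"directions": set()},
}
print(render_map(world))
```

Storing edges as `(action, destination)` tuples keeps the map a plain adjacency list, so the agent can later plan routes by walking the tuples without parsing any text.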
requirements.txt CHANGED
@@ -1,9 +1,17 @@
-# HF Spaces already has gradio and huggingface_hub pre-installed
-# Do not add them here or you may get version conflicts
-
-# Agent dependencies (these are provided by the evaluation infrastructure)
-# Do not add jericho, fastmcp here - they are installed during evaluation
-
-# Add any additional packages your agent needs below:
-# numpy
-# requests
+# Core dependencies
+jericho
+python-dotenv
+spacy
+
+torch
+spaces
+transformers
+accelerate
+
+# MCP Server
+fastmcp
+
+# Function calling (optional, for the alternative approach)
+langchain-core
+
+huggingface_hub
utils.py ADDED
@@ -0,0 +1,42 @@
+from huggingface_hub import InferenceClient
+import os
+from dotenv import load_dotenv
+
+load_dotenv()
+
+LLM_MODEL = "Qwen/Qwen2.5-7B-Instruct"
+
+_hf_token = os.getenv("HF_TOKEN")
+if not _hf_token:
+    raise ValueError("HF_TOKEN not found. Set it in your .env file.")
+
+LLM_CLIENT = InferenceClient(token=_hf_token)
+
+def call_llm(prompt: str, system_prompt: str = "", seed: int = 0, max_tokens: int = 300) -> str:
+    messages = []
+
+    if system_prompt.strip():
+        messages.append({"role": "system", "content": system_prompt})
+
+    messages.append({"role": "user", "content": prompt})
+
+    response = LLM_CLIENT.chat.completions.create(
+        model=LLM_MODEL,
+        messages=messages,
+        temperature=0.0,
+        max_tokens=max_tokens,
+        seed=seed,
+    )
+
+    return response.choices[0].message.content
+
+def is_new_location(observation: str, known_locations: set, last_tool: str) -> bool:
+    # Only play_action can move the player to a new room
+    if last_tool != "play_action":
+        return False
+    location = extract_location(observation)
+    # First lines ending in punctuation are messages, not room names
+    if location.strip().endswith(('.', '!', '?', ')')) or location in known_locations:
+        return False
+    return True
+
+def extract_location(observation: str) -> str:
+    # Room name heuristic: the first line of a "look" observation
+    return observation.lower().split("\n")[0].strip()
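The two location helpers in utils.py rely on a simple heuristic: the first line of a Jericho `look` observation is the room name, while a first line ending in punctuation ("Taken.") is a message. A self-contained sketch of that heuristic (the helpers copied verbatim, with a made-up observation):

```python
def extract_location(observation: str) -> str:
    # Room name heuristic: first line of a "look" observation, lowercased
    return observation.lower().split("\n")[0].strip()

def is_new_location(observation: str, known_locations: set, last_tool: str) -> bool:
    # Only play_action can move the player; first lines ending in
    # punctuation are messages ("Taken.") rather than room names.
    if last_tool != "play_action":
        return False
    location = extract_location(observation)
    if location.endswith((".", "!", "?", ")")) or location in known_locations:
        return False
    return True

obs = "West of House\nYou are standing in an open field west of a white house."
print(extract_location(obs))                       # → west of house
print(is_new_location(obs, set(), "play_action"))  # → True
```

The `known_locations` check means re-entering a mapped room is not treated as a discovery, which is what lets the server add map edges only on genuine transitions.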