Valentin Badea committed
Commit b113a1e · 1 Parent(s): 7a36b3c

Implemented memory-driven agent with two-phase LLM approach (Prioritization/Summarization)

Files changed (5)
  1. .gitignore +15 -0
  2. README.md +27 -6
  3. agent.py +477 -186
  4. mcp_server.py +219 -90
  5. z-machine-games-master +1 -0
.gitignore CHANGED
@@ -20,3 +20,18 @@ venv/
20
  # OS
21
  .DS_Store
22
  Thumbs.db
23
+
24
+ # game binaries / collections (do not commit)
25
+ z-machine-games-master/
26
+ **/*.z1
27
+ **/*.z2
28
+ **/*.z3
29
+ **/*.z4
30
+ **/*.z5
31
+ **/*.z6
32
+ **/*.z7
33
+ **/*.z8
34
+ **/*.zblorb
35
+ **/*.blb
36
+ **/*.zip
37
+
README.md CHANGED
@@ -14,21 +14,42 @@ license: mit
14
 
15
  ## Overview
16
 
17
- This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.
18
 
19
  ## Approach
20
 
21
- <!-- Describe your approach here -->
22
 
23
- - What strategy does your agent use?
24
- - What tools did you implement in your MCP server?
25
- - Any interesting techniques or optimizations?
26
 
27
  ## Files
28
 
29
  | File | Description |
30
  |------|-------------|
31
- | `agent.py` | ReAct agent with `StudentAgent` class |
32
  | `mcp_server.py` | MCP server with game interaction tools |
33
  | `app.py` | Gradio interface for HF Space |
34
  | `requirements.txt` | Additional dependencies |
 
14
 
15
  ## Overview
16
 
17
+ This agent uses a memory-driven architecture with a two-phase LLM approach to systematically explore text adventure games. At each step, the agent leverages Jericho's API to access valid actions and current location data, maintaining a structured memory dictionary that records location-specific information including tried actions, available actions, promising action subsets, and summarized outcomes.
18
+
19
+ The core innovation is the dual LLM call strategy: first for strategic action selection with reasoning over promising action subsets (up to 10), and second for outcome summarization that ensures the agent actively "listens" to and learns from each action result. This approach balances comprehensive exploration with concise memory management, preventing context overflow while maintaining rich historical knowledge.
20
 
21
  ## Approach
22
 
23
+ ### Memory Architecture
24
+
25
+ The agent maintains a location-indexed memory dictionary with the following structure:
26
+ - **valid_actions**: Location-specific actions from Jericho's API (verified to work)
27
+ - **tried_actions**: Set of actions already attempted at this location
28
+ - **promising_actions**: LLM-selected subset (max 10) of strategic actions to consider
29
+ - **results**: For each tried action, stores {observation, summary, success, key_info}
30
+
31
+ This structure allows the agent to return to previously visited locations with full context, enabling informed decision-making even with new inventory or changed game state.
32
+
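As a sketch, the record described above looks roughly like this (the helper name `new_location_record` is illustrative; `agent.py` builds the dict inline):

```python
# Illustrative shape of one location's memory entry; field names follow
# the bullets above, the real dict is created inline in agent.py.
def new_location_record(valid_actions):
    return {
        "valid_actions": list(valid_actions),  # from Jericho's API
        "tried_actions": set(),                # actions attempted here
        "promising_actions": [],               # LLM-picked subset, max 10
        "visited": 1,                          # visit counter
        "results": {},                         # action -> outcome record
    }

location_memory = {"West of House": new_location_record(["open mailbox", "north"])}
location_memory["West of House"]["tried_actions"].add("open mailbox")
location_memory["West of House"]["results"]["open mailbox"] = {
    "observation": "Opening the small mailbox reveals a leaflet.",
    "summary": "Mailbox opened; leaflet inside.",
    "success": "yes",
    "key_info": "Leaflet can be taken and read.",
}
```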
33
+ ### Two-Phase LLM Strategy
34
+
35
+ **Phase 1 - Action Selection**: The LLM receives the current observation, game state (score, moves, inventory), and formatted location memory showing valid actions, previous promising actions, and tried actions with concise summaries rather than full observation text. The LLM then identifies up to 10 promising actions from the available options and selects the single best action to execute, with explicit reasoning.
36
+
37
+ **Phase 2 - Outcome Summarization**: After action execution, a second LLM call analyzes the full observation and generates a concise 1-2 sentence summary, success classification (yes/no/partial), and key information to remember. This summary is stored in memory, forcing the agent to actively process outcomes rather than passively accumulating raw text.
38
+
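The two phases can be sketched as one step function with the LLM stubbed out (names and parsing are simplified relative to `agent.py`):

```python
# Sketch of one agent step: Phase 1 picks an action, Phase 2 summarizes
# its outcome. fake_llm stands in for the real call_llm() in agent.py.
def fake_llm(prompt, system):
    if system == "select":
        return ('THOUGHT: The mailbox is unexplored.\n'
                'PROMISING_ACTIONS: ["open mailbox", "north"]\n'
                'CHOSEN_ACTION: open mailbox')
    return ('OUTCOME_SUMMARY: Mailbox opened; a leaflet is inside.\n'
            'SUCCESS: yes\n'
            'KEY_INFO: Leaflet available to take.')

def field(text, name):
    # Pull "NAME: value" out of a formatted LLM response.
    for line in text.splitlines():
        if line.startswith(name + ":"):
            return line.split(":", 1)[1].strip()
    return ""

def two_phase_step(llm, observation, execute):
    choice = llm(observation, system="select")                 # Phase 1
    action = field(choice, "CHOSEN_ACTION")
    new_obs = execute(action)                                  # act in the game
    report = llm(f"{action}\n{new_obs}", system="summarize")   # Phase 2
    return action, field(report, "OUTCOME_SUMMARY"), field(report, "SUCCESS")

action, summary, success = two_phase_step(
    fake_llm,
    "West of House. There is a small mailbox here.",
    lambda a: "Opening the small mailbox reveals a leaflet.",
)
```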
39
+ ### Key Strategic Features
40
+
41
+ - **Object-focused exploration**: System prompt emphasizes that examining and interacting with props/objects is often critical for progress, with explicit guidance to try multiple interaction types (examine, take, open, read, push, pull, turn)
42
+ - **Movement tracking**: Detects object movements in observations and provides hints to follow them
43
+ - **Stagnation detection**: Monitors score progress and warns when exploration becomes circular
44
+ - **Context preservation**: Full observations archived for debugging while summaries keep prompts manageable
45
+ - **Dynamic re-evaluation**: Always recalculates promising actions on location revisits, accounting for changed context (new items, completed objectives)
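For example, the stagnation check reduces to a small helper (name illustrative; the 10-move threshold matches the hint logic in `agent.py`):

```python
# Warn when the score has not moved for a while, mirroring the
# stagnation hint described above (threshold taken from agent.py).
def stagnation_hint(moves, last_score_change_move, threshold=10):
    idle = moves - last_score_change_move
    if idle > threshold:
        return f"[!] No score progress in {idle} moves!"
    return None
```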
46
 
47
 
48
  ## Files
49
 
50
  | File | Description |
51
  |------|-------------|
52
+ | `agent.py` | Memory-driven agent with two-phase LLM approach |
53
  | `mcp_server.py` | MCP server with game interaction tools |
54
  | `app.py` | Gradio interface for HF Space |
55
  | `requirements.txt` | Additional dependencies |
agent.py CHANGED
@@ -1,48 +1,32 @@
1
  """
2
- Student Agent for Text Adventure Games
3
-
4
- This is your submission file. Implement the StudentAgent class to play
5
- text adventure games using the MCP server you also implement.
6
-
7
- Your agent should:
8
- 1. Connect to the MCP server via the provided client
9
- 2. Use the ReAct pattern (Thought -> Action -> Observation)
10
- 3. Call MCP tools to interact with the game
11
- 4. Maximize the game score within the step limit
12
-
13
- Required method:
14
- async def run(self, client, game, max_steps, seed, verbose) -> RunResult
15
-
16
- The 'client' is a FastMCP Client already connected to your MCP server.
17
- Use it to call tools like: await client.call_tool("play_action", {"action": "look"})
18
-
19
- Tips:
20
- - Start by looking around and understanding your environment
21
- - Keep track of visited locations to avoid loops
22
- - Pick up useful items (lamp, sword, etc.)
23
- - The seed parameter should be used to set your LLM's seed for reproducibility
24
  """
25
 
26
  import json
27
  import os
28
  import re
29
  from dataclasses import dataclass, field
30
- from typing import Optional
31
 
32
  from dotenv import load_dotenv
33
  from huggingface_hub import InferenceClient
34
 
35
- # Load environment variables
36
  load_dotenv()
37
 
38
  # =============================================================================
39
  # LLM Configuration - DO NOT MODIFY
40
  # =============================================================================
41
 
42
- # Model to use (fixed for fair evaluation)
43
  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
44
 
45
- # Initialize the LLM client (uses HF_TOKEN from environment)
46
  _hf_token = os.getenv("HF_TOKEN")
47
  if not _hf_token:
48
  raise ValueError("HF_TOKEN not found. Set it in your .env file.")
@@ -50,45 +34,25 @@ if not _hf_token:
50
  LLM_CLIENT = InferenceClient(token=_hf_token)
51
 
52
 
53
- def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
54
- """
55
- Call the LLM with the given prompt. Use this function in your agent.
56
-
57
- Args:
58
- prompt: The user prompt (current game state, history, etc.)
59
- system_prompt: The system prompt (instructions for the agent)
60
- seed: Random seed for reproducibility
61
- max_tokens: Maximum tokens in response (default: 300)
62
-
63
- Returns:
64
- The LLM's response text
65
-
66
- Example:
67
- response = call_llm(
68
- prompt="You are in a forest. What do you do?",
69
- system_prompt=SYSTEM_PROMPT,
70
- seed=42,
71
- )
72
- """
73
  messages = [
74
  {"role": "system", "content": system_prompt},
75
  {"role": "user", "content": prompt},
76
  ]
77
-
78
  response = LLM_CLIENT.chat.completions.create(
79
  model=LLM_MODEL,
80
  messages=messages,
81
- temperature=0.0, # Deterministic for reproducibility
82
  max_tokens=max_tokens,
83
  seed=seed,
84
  )
85
-
86
  return response.choices[0].message.content
87
 
88
 
89
  @dataclass
90
  class RunResult:
91
- """Result of running the agent. Do not modify this class."""
92
  final_score: int
93
  max_score: int
94
  moves: int
@@ -99,152 +63,482 @@ class RunResult:
99
 
100
 
101
  # =============================================================================
102
- # System Prompt - Customize this for your agent
103
  # =============================================================================
104
 
105
- SYSTEM_PROMPT = """You are playing a classic text adventure game.
106
 
107
- GOAL: Explore the world, solve puzzles, and maximize your score.
108
 
109
- AVAILABLE TOOLS (use via MCP):
110
- - play_action: Execute a game command (north, take lamp, open mailbox, etc.)
111
- - memory: Get current game state and history (if implemented)
112
- - inventory: Check what you're carrying (if implemented)
113
 
114
- VALID GAME COMMANDS for play_action:
115
- - Movement: north, south, east, west, up, down, enter, exit
116
- - Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
117
- - Other: look, inventory, read <thing>, turn on lamp
118
 
119
  RESPOND IN THIS EXACT FORMAT (no markdown):
120
- THOUGHT: <your reasoning about what to do next>
121
- TOOL: <tool_name>
122
- ARGS: <JSON arguments, e.g., {"action": "look"}>
123
 
124
- Example:
125
- THOUGHT: I should look around to see where I am.
126
- TOOL: play_action
127
- ARGS: {"action": "look"}
128
  """
129
 
130
 
131
  # =============================================================================
132
- # Student Agent - IMPLEMENT THIS CLASS
133
  # =============================================================================
134
 
135
  class StudentAgent:
136
- """
137
- Your ReAct agent implementation.
138
-
139
- TODO:
140
- 1. Implement the run() method with the ReAct loop
141
- 2. Parse LLM responses to extract tool calls
142
- 3. Track state and avoid loops
143
-
144
- Use the provided call_llm() function to interact with the LLM.
145
- """
146
-
147
  def __init__(self):
148
- """Initialize your agent here."""
149
- # TODO: Initialize any state tracking you need
150
- # self.history = []
151
- # self.visited_locations = set()
152
- pass
153
-
154
- async def run(
155
- self,
156
- client, # FastMCP Client connected to your MCP server
157
- game: str,
158
- max_steps: int,
159
- seed: int,
160
- verbose: bool = False,
161
- ) -> RunResult:
162
- """
163
- Run the agent for a game session.
164
-
165
- Args:
166
- client: FastMCP Client connected to your MCP server
167
- game: Name of the game being played (e.g., "zork1")
168
- max_steps: Maximum number of steps to take
169
- seed: Random seed for reproducibility (use for LLM calls)
170
- verbose: Whether to print detailed output
171
-
172
- Returns:
173
- RunResult with final score and statistics
174
- """
175
- # TODO: Implement your ReAct loop here
176
- #
177
- # Basic structure:
178
- # 1. Get initial observation (call play_action with "look")
179
- # 2. Loop for max_steps:
180
- # a. Build prompt with current observation and history
181
- # b. Call LLM to get thought and action
182
- # c. Parse the response to extract tool and args
183
- # d. Call the tool via client.call_tool(tool_name, args)
184
- # e. Update history and state
185
- # f. Check for game over
186
- # 3. Return RunResult with final statistics
187
-
188
- # Example of calling a tool:
189
- # result = await client.call_tool("play_action", {"action": "look"})
190
- # observation = result[0].text if result else "No response"
191
-
192
- # Example of calling the LLM:
193
- # response = call_llm(
194
- # prompt="Current observation: " + observation,
195
- # system_prompt=SYSTEM_PROMPT,
196
- # seed=seed,
197
- # )
198
-
199
- # Placeholder implementation - replace with your code
200
- locations_visited = set()
201
- history = []
202
- final_score = 0
203
- moves = 0
204
-
205
- # TODO: Your implementation here
206
- # ...
207
-
208
- return RunResult(
209
- final_score=final_score,
210
- max_score=350, # Zork1 max score, adjust if needed
211
- moves=moves,
212
- locations_visited=locations_visited,
213
- game_completed=False,
214
- history=history,
215
- )
216
-
217
- def _build_prompt(self, observation: str, history: list) -> str:
218
- """
219
- Build the prompt for the LLM.
220
-
221
- TODO: Implement this to create effective prompts
222
- """
223
- # TODO: Combine system prompt, history, and current observation
224
- pass
225
-
226
- def _parse_response(self, response: str) -> tuple[str, str, dict]:
227
  """
228
- Parse LLM response to extract thought, tool name, and arguments.
229
-
230
- TODO: Implement robust parsing
231
-
232
- Returns:
233
- Tuple of (thought, tool_name, args_dict)
234
  """
235
- # TODO: Parse the response format:
236
- # THOUGHT: ...
237
- # TOOL: ...
238
- # ARGS: {...}
239
- pass
240
-
241
- def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
242
  """
243
- Call the LLM with the given prompt.
244
-
245
- This is a convenience wrapper - you can also use call_llm() directly.
246
  """
247
- return call_llm(prompt, system_prompt, seed)
248
 
249
 
250
  # =============================================================================
@@ -254,24 +548,21 @@ class StudentAgent:
254
  async def test_agent():
255
  """Test the agent locally."""
256
  from fastmcp import Client
257
-
258
- # Path to your MCP server
259
- server_path = "mcp_server.py"
260
-
261
  agent = StudentAgent()
262
-
263
- async with Client(server_path) as client:
264
  result = await agent.run(
265
  client=client,
266
- game="zork1",
267
- max_steps=10,
268
  seed=42,
269
  verbose=True,
270
  )
271
-
272
  print(f"\nFinal Score: {result.final_score}")
273
  print(f"Moves: {result.moves}")
274
- print(f"Locations: {result.locations_visited}")
275
 
276
 
277
  if __name__ == "__main__":
 
1
  """
2
+ Memory-driven agent with two-phase LLM approach:
3
+ 1. Action Selection: Choose promising actions (max 10) and pick best one
4
+ 2. Outcome Summarization: Summarize action result for memory storage
5
+
6
+ Strategy:
7
+ - Location memory tracks: valid_actions, tried_actions, promising_actions, results
8
+ - Results store: {observation, summary, success, key_info} for each action
9
+ - Agent sees concise summaries in context, not overwhelming full observations
10
+ - Forces agent to "listen" by summarizing outcomes
11
  """
12
 
13
  import json
14
  import os
15
  import re
16
  from dataclasses import dataclass, field
17
+ from typing import Optional, Dict, Set, Any
18
 
19
  from dotenv import load_dotenv
20
  from huggingface_hub import InferenceClient
21
 
 
22
  load_dotenv()
23
 
24
  # =============================================================================
25
  # LLM Configuration - DO NOT MODIFY
26
  # =============================================================================
27
 
 
28
  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
29
 
 
30
  _hf_token = os.getenv("HF_TOKEN")
31
  if not _hf_token:
32
  raise ValueError("HF_TOKEN not found. Set it in your .env file.")
 
34
  LLM_CLIENT = InferenceClient(token=_hf_token)
35
 
36
 
37
+ def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 512) -> str:
38
  messages = [
39
  {"role": "system", "content": system_prompt},
40
  {"role": "user", "content": prompt},
41
  ]
42
+
43
  response = LLM_CLIENT.chat.completions.create(
44
  model=LLM_MODEL,
45
  messages=messages,
46
+ temperature=0.0,
47
  max_tokens=max_tokens,
48
  seed=seed,
49
  )
50
+
51
  return response.choices[0].message.content
52
 
53
 
54
  @dataclass
55
  class RunResult:
 
56
  final_score: int
57
  max_score: int
58
  moves: int
 
63
 
64
 
65
  # =============================================================================
66
+ # System Prompts - Two Phase Approach
67
  # =============================================================================
68
 
69
+ ACTION_SELECTION_SYSTEM_PROMPT = """You are playing a text adventure game to maximize score.
70
+
71
+ GOAL: Explore systematically, solve puzzles, collect items, and maximize score.
72
+
73
+ YOU WILL RECEIVE:
74
+ - Current observation
75
+ - Location memory with:
76
+ * VALID ACTIONS (from game engine - verified to work at this location)
77
+ * TRIED ACTIONS with summaries of outcomes (concise, LLM-generated)
78
+ * Previous promising actions (if you've been here before)
79
+
80
+ YOUR TASK - ACTION SELECTION:
81
+ 1. Analyze the valid actions available
82
+ 2. Consider which actions you've already tried and their outcomes
83
+ 3. Identify up to 10 PROMISING ACTIONS from available options
84
+ 4. Choose the BEST action to try next
85
+
86
+ STRATEGY GUIDELINES:
87
+ - Prioritize untried actions from the valid list
88
+ - **Objects are key**: When uncertain about next steps, examining or interacting with props/objects in the scene is often critical for progress
89
+ - Pick up valuable items: lamp, lantern, torch, sword, keys, treasures, tools
90
+ - Make sure you have a light source before entering dark areas
91
+ - Examine objects, open containers, read signs, search rooms thoroughly
92
+ - Try multiple interactions with same object (examine, take, open, read, push, pull, turn)
93
+ - If stagnating (no progress), try different location or different action type
94
+ - Learn from previous outcomes: avoid repeating failures unless context changed
95
+ - If "not see that there": object moved - explore elsewhere
96
+ - Follow object movements mentioned in observations
97
 
98
+ RESPOND IN THIS EXACT FORMAT (no markdown):
99
+ THOUGHT: <your strategic reasoning>
100
+ PROMISING_ACTIONS: <JSON array of up to 10 promising actions to consider>
101
+ CHOSEN_ACTION: <the single best action to execute>
102
+ REASONING: <why this specific action is best>
103
 
104
+ Example:
105
+ THOUGHT: I'm in a dark area and need light. Valid actions show "turn on lamp". I haven't tried this yet.
106
+ PROMISING_ACTIONS: ["turn on lamp", "examine lamp", "go back", "look", "inventory"]
107
+ CHOSEN_ACTION: turn on lamp
108
+ REASONING: Lamp is critical for exploring dark areas. Should activate it before moving forward.
109
+ """
110
 
111
+ OUTCOME_SUMMARY_SYSTEM_PROMPT = """You are analyzing the outcome of an action in a text adventure game.
112
+
113
+ YOUR TASK - OUTCOME SUMMARIZATION:
114
+ Given an action and its observation result, create a concise summary that captures:
115
+ 1. What happened (1-2 sentences max)
116
+ 2. Whether it succeeded, partially succeeded, or failed
117
+ 3. Key information to remember for future decisions
118
+
119
+ Be CONCISE but capture critical details like:
120
+ - Items acquired/lost
121
+ - New areas discovered
122
+ - Obstacles encountered
123
+ - Score changes
124
+ - Object movements
125
+ - State changes (doors opened, lights turned on, etc.)
126
 
127
  RESPOND IN THIS EXACT FORMAT (no markdown):
128
+ OUTCOME_SUMMARY: <1-2 sentence summary>
129
+ SUCCESS: <yes/no/partial>
130
+ KEY_INFO: <key detail to remember>
131
 
132
+ Example action: "take lamp"
133
+ Example observation: "Taken. The brass lamp is now in your inventory. [Score: 5 | Moves: 3]"
134
+
135
+ Example response:
136
+ OUTCOME_SUMMARY: Successfully picked up the brass lamp and added it to inventory.
137
+ SUCCESS: yes
138
+ KEY_INFO: Lamp acquired - can use for dark areas
139
  """
140
 
141
 
142
  # =============================================================================
143
+ # Student Agent with Two-Phase LLM Approach
144
  # =============================================================================
145
 
146
  class StudentAgent:
147
  def __init__(self):
148
+ self.score = 0
149
+ self.moves = 0
150
+
151
+ # Enhanced location memory
152
+ self.location_memory: Dict[str, Dict[str, Any]] = {}
153
+ # Structure: {
154
+ # "Location Name": {
155
+ # "valid_actions": [...],
156
+ # "tried_actions": set(),
157
+ # "promising_actions": [...],
158
+ # "visited": count,
159
+ # "results": {
160
+ # "action": {
161
+ # "observation": "full text",
162
+ # "summary": "concise summary",
163
+ # "success": "yes/no/partial",
164
+ # "key_info": "important detail"
165
+ # }
166
+ # }
167
+ # }
168
+ # }
169
+
170
+ # Global tracking
171
+ self.locations_visited: set[str] = set()
172
+ self.inventory_items: set[str] = set()
173
+
174
+ # Stagnation detection
175
+ self.last_score_change_move: int = 0
176
+
177
+ # MCP client handle
178
+ self.env_handle = None
179
+
180
+ def _extract_result(self, result) -> str:
181
+ """Extract text from MCP tool result."""
182
+ if hasattr(result, "content") and result.content:
183
+ return result.content[0].text
184
+ if isinstance(result, list) and result:
185
+ return result[0].text if hasattr(result[0], "text") else str(result[0])
186
+ return str(result)
187
+
188
+ def _parse_action_selection(self, response: str) -> tuple[str, list[str], str]:
189
  """
190
+ Parse action selection response.
191
+ Returns: (thought, promising_actions, chosen_action)
192
  """
193
+ thought = "Proceed with exploration"
194
+ promising_actions = []
195
+ chosen_action = "look"
196
+ reasoning = ""
197
+
198
+ for line in (response or "").splitlines():
199
+ line_stripped = line.strip()
200
+ line_upper = line_stripped.upper()
201
+
202
+ if line_upper.startswith("THOUGHT:"):
203
+ thought = line_stripped.split(":", 1)[1].strip()
204
+ elif line_upper.startswith("PROMISING_ACTIONS:"):
205
+ actions_str = line_stripped.split(":", 1)[1].strip()
206
+ try:
207
+ # Try to parse as JSON array
208
+ promising_actions = json.loads(actions_str)
209
+ if not isinstance(promising_actions, list):
210
+ promising_actions = []
211
+ except Exception:
212
+ # Fallback: comma-separated
213
+ promising_actions = [a.strip().strip('"\'') for a in actions_str.split(",")]
214
+ promising_actions = [a for a in promising_actions if a][:10]
215
+
216
+ elif line_upper.startswith("CHOSEN_ACTION:"):
217
+ chosen_action = line_stripped.split(":", 1)[1].strip()
218
+ chosen_action = chosen_action.strip('"\'').strip()
219
+ elif line_upper.startswith("REASONING:"):
220
+ reasoning = line_stripped.split(":", 1)[1].strip()
221
+
222
+ # Ensure we have at least something
223
+ if not chosen_action or chosen_action == "":
224
+ chosen_action = "look"
225
+
226
+ return thought, promising_actions, chosen_action
227
+
228
+ def _parse_outcome_summary(self, response: str) -> tuple[str, str, str]:
229
  """
230
+ Parse outcome summary response.
231
+ Returns: (summary, success, key_info)
 
232
  """
233
+ summary = "Action executed"
234
+ success = "unknown"
235
+ key_info = ""
236
+
237
+ for line in (response or "").splitlines():
238
+ line_stripped = line.strip()
239
+ line_upper = line_stripped.upper()
240
+
241
+ if line_upper.startswith("OUTCOME_SUMMARY:"):
242
+ summary = line_stripped.split(":", 1)[1].strip()
243
+ elif line_upper.startswith("SUCCESS:"):
244
+ success = line_stripped.split(":", 1)[1].strip().lower()
245
+ elif line_upper.startswith("KEY_INFO:"):
246
+ key_info = line_stripped.split(":", 1)[1].strip()
247
+
248
+ return summary, success, key_info
249
+
250
+ def _update_score_moves(self, obs: str) -> None:
251
+ """Extract score and moves from observation."""
252
+ m = re.search(r"\[Score:\s*(\d+)\s*\|\s*Moves:\s*(\d+)\]", obs)
253
+ if m:
254
+ new_score = int(m.group(1))
255
+ if new_score > self.score:
256
+ self.last_score_change_move = int(m.group(2))
257
+ self.score = max(self.score, new_score)
258
+ self.moves = max(self.moves, int(m.group(2)))
259
+
260
+ def _is_game_over(self, obs: str) -> bool:
261
+ t = (obs or "").lower()
262
+ return any(p in t for p in ["game over", "you have died", "you are dead", "*** you have died ***"])
263
+
264
+ async def _get_current_location_name(self, client) -> str:
265
+ """Get current location name from Jericho's get_player_location()."""
266
+ try:
267
+ res = await client.call_tool("get_player_location", {})
268
+ loc_info = self._extract_result(res)
269
+ return loc_info.strip()
270
+ except Exception:
271
+ return "Unknown"
272
+
273
+ async def _get_valid_actions_for_location(self, client) -> list[str]:
274
+ """Get valid actions from Jericho API."""
275
+ try:
276
+ res = await client.call_tool("get_valid_actions", {})
277
+ actions_str = self._extract_result(res)
278
+
279
+ if actions_str.startswith("{"):
280
+ data = json.loads(actions_str)
281
+ return data.get("actions", [])
282
+
283
+ return [a.strip() for a in actions_str.split(",") if a.strip()]
284
+ except Exception:
285
+ return []
286
+
287
+ def _format_location_memory_for_action_selection(self, loc_name: str) -> str:
288
+ """Format location memory for action selection prompt."""
289
+ if loc_name not in self.location_memory:
290
+ return "=== LOCATION MEMORY: First visit - no memory yet ===\n"
291
+
292
+ mem = self.location_memory[loc_name]
293
+ visit_count = mem["visited"]
294
+ valid_actions = mem["valid_actions"]
295
+ tried_actions = mem["tried_actions"]
296
+ results = mem["results"]
297
+ promising_actions = mem.get("promising_actions", [])
298
+
299
+ parts = [f"=== LOCATION MEMORY: {loc_name} (visited {visit_count} times) ==="]
300
+
301
+ # Valid actions from game engine
302
+ parts.append(f"\nVALID ACTIONS ({len(valid_actions)} available):")
303
+ parts.append(f"{', '.join(sorted(valid_actions))}")
304
+ parts.append("NOTE: These are location-specific actions verified by game engine.")
305
+ parts.append("Universal commands (look, inventory, wait) also work but aren't listed here.")
306
+
307
+ # Previous promising actions (if any)
308
+ if promising_actions:
309
+ parts.append(f"\nPREVIOUS PROMISING ACTIONS:")
310
+ parts.append(f"{', '.join(promising_actions)}")
311
+
312
+ # Tried actions with SUMMARIES (not full observations)
313
+ if results:
314
+ parts.append(f"\nTRIED ACTIONS ({len(tried_actions)} total):")
315
+ # Show most recent 8 with summaries
316
+ for action, action_result in list(results.items())[-8:]:
317
+ summary = action_result.get("summary", "No summary")
318
+ success = action_result.get("success", "unknown")
319
+ key_info = action_result.get("key_info", "")
320
+
321
+ status_icon = "+" if success == "yes" else "-" if success == "no" else "~"
322
+ parts.append(f" [{status_icon}] {action}")
323
+ parts.append(f" → {summary}")
324
+ if key_info:
325
+ parts.append(f" [*] {key_info}")
326
+ else:
327
+ parts.append("\nTRIED ACTIONS: (none yet at this location)")
328
+
329
+ return "\n".join(parts)
330
+
331
+ def _build_action_selection_prompt(self, observation: str, current_location: str) -> str:
332
+ """Build prompt for Phase 1: Action Selection."""
333
+ parts = []
334
+
335
+ # Game state
336
+ parts.append(f"=== GAME STATE ===")
337
+ parts.append(f"Score: {self.score} | Moves: {self.moves}")
338
+ parts.append(f"Locations visited: {len(self.locations_visited)}")
339
+
340
+ if self.inventory_items:
341
+ parts.append(f"Inventory: {', '.join(sorted(self.inventory_items))}")
342
+
343
+ # Location memory (key context)
344
+ parts.append("\n" + self._format_location_memory_for_action_selection(current_location))
345
+
346
+ # Current observation
347
+ parts.append(f"\n=== CURRENT OBSERVATION ===")
348
+ parts.append(observation)
349
+
350
+ # Add strategic hints based on game state
351
+ hints = []
352
+
353
+ # Stagnation warning
354
+ moves_since_progress = self.moves - self.last_score_change_move
355
+ if moves_since_progress > 10:
356
+ hints.append(f"[!] No score progress in {moves_since_progress} moves!")
357
+ hints.append(" Consider: exploring new locations or trying different action types")
358
+
359
+ # Check for "not see that there" patterns
360
+ if current_location in self.location_memory:
361
+ mem = self.location_memory[current_location]
362
+ recent_failures = sum(1 for result in list(mem["results"].values())[-3:]
363
+ if "not see that there" in result.get("observation", "").lower())
364
+ if recent_failures >= 2:
365
+ hints.append("[!] Multiple 'not see that there' errors - object likely moved elsewhere!")
366
+
367
+ # Check for object movement in observation
368
+ obs_lower = observation.lower()
369
+ if any(phrase in obs_lower for phrase in ["ran to", "run to", "went to", "moved to"]):
370
+ hints.append("[+] Object movement detected in observation - consider following it!")
371
+
372
+ if hints:
373
+ parts.append("\n=== STRATEGIC HINTS ===")
374
+ parts.extend(hints)
375
+
376
+ parts.append("\n=== YOUR TASK ===")
377
+ parts.append("Select up to 10 promising actions and choose the best one to execute.")
378
+
379
+ return "\n".join(parts)
380
+
381
+ def _build_outcome_summary_prompt(self, action: str, observation: str) -> str:
382
+ """Build prompt for Phase 2: Outcome Summarization."""
383
+ return f"""Action executed: "{action}"
384
+
385
+ Observation received:
386
+ {observation}
387
+
388
+ Analyze this outcome and provide a concise summary."""
389
+
390
+ async def run(self, client, game: str, max_steps: int, seed: int, verbose: bool = False) -> RunResult:
391
+ result_history: list[tuple[str, str, str]] = []
392
+ moves_used = 0
393
+
394
+ # Discover available tools
395
+ tools = await client.list_tools()
396
+ tool_names = {t.name for t in tools}
397
+
398
+ # Initial look
399
+ try:
400
+ res = await client.call_tool("play_action", {"action": "look"})
401
+ obs = self._extract_result(res)
402
+ moves_used += 1
403
+ self._update_score_moves(obs)
404
+ except Exception as e:
405
+ return RunResult(self.score, 350, moves_used, self.locations_visited, False, error=str(e), history=result_history)
406
+
407
+ if verbose:
408
+ print(f"\n{obs}")
409
+
410
+ # Initial inventory check
411
+ if "inventory" in tool_names:
412
+ try:
413
+ inv_res = await client.call_tool("inventory", {})
414
+ inv_text = self._extract_result(inv_res).lower()
415
+ moves_used += 1
416
+
417
+ for item in ["torch", "lamp", "lantern", "sword", "key"]:
418
+ if item in inv_text:
419
+ self.inventory_items.add(item)
420
+
421
+ if verbose and self.inventory_items:
422
+ print(f"[Starting inventory: {', '.join(self.inventory_items)}]")
423
+ except Exception:
424
+ pass
425
+
426
+ # Main game loop
427
+ for step in range(1, max_steps - moves_used + 1):
428
+ # Get current location
429
+ current_location = await self._get_current_location_name(client)
430
+ self.locations_visited.add(current_location)
431
+
432
+ # Initialize or update location memory
433
+ if current_location not in self.location_memory:
434
+ valid_actions = await self._get_valid_actions_for_location(client)
435
+
436
+ self.location_memory[current_location] = {
437
+ "tried_actions": set(),
438
+ "valid_actions": valid_actions,
439
+ "promising_actions": [],
440
+ "visited": 1,
441
+ "results": {}
442
+ }
443
+
444
+ if verbose:
445
+ print(f"\n[New location: {current_location}]")
446
+ print(f"[Valid actions: {len(valid_actions)}]")
447
+ else:
448
+ self.location_memory[current_location]["visited"] += 1
449
+
450
+ # ========================================================
451
+ # PHASE 1: ACTION SELECTION (LLM Call #1)
452
+ # ========================================================
453
+
454
+ prompt1 = self._build_action_selection_prompt(obs, current_location)
455
+ llm_response1 = call_llm(prompt1, ACTION_SELECTION_SYSTEM_PROMPT, seed + step, max_tokens=512)
456
+
457
+ thought, promising_actions, chosen_action = self._parse_action_selection(llm_response1)
458
+
459
+ # Store promising actions in memory
460
+ self.location_memory[current_location]["promising_actions"] = promising_actions
461
+
462
+ if verbose:
463
+ print(f"\n--- Step {step} ---")
464
+ print(f"THOUGHT: {thought}")
465
+ print(f"PROMISING: {promising_actions}")
466
+ print(f"CHOSEN: {chosen_action}")
467
+
468
+ # ========================================================
469
+ # EXECUTE ACTION
470
+ # ========================================================
471
+
472
+ try:
473
+ res = await client.call_tool("play_action", {"action": chosen_action})
474
+ obs = self._extract_result(res)
475
+ moves_used += 1
476
+
477
+ # Track score changes
478
+ old_score = self.score
479
+ self._update_score_moves(obs)
480
+ if self.score > old_score:
481
+ self.last_score_change_move = moves_used
482
+
483
+ if verbose:
484
+ print(f"Observation: {obs}...")
485
+
486
+ # ========================================================
487
+ # PHASE 2: OUTCOME SUMMARIZATION (LLM Call #2)
488
+ # ========================================================
489
+
490
+ prompt2 = self._build_outcome_summary_prompt(chosen_action, obs)
491
+ llm_response2 = call_llm(prompt2, OUTCOME_SUMMARY_SYSTEM_PROMPT, seed + step + 10000, max_tokens=256)
492
+
493
+ summary, success, key_info = self._parse_outcome_summary(llm_response2)
494
+
495
+ if verbose:
496
+ print(f"SUMMARY: {summary}")
497
+ print(f"SUCCESS: {success}")
498
+ if key_info:
499
+ print(f"KEY_INFO: {key_info}")
500
+
501
+ # ========================================================
502
+ # UPDATE MEMORY with summarized outcome
503
+ # ========================================================
504
+
505
+ mem = self.location_memory[current_location]
506
+ mem["tried_actions"].add(chosen_action)
507
+ mem["results"][chosen_action] = {
508
+ "observation": obs, # Full text preserved
509
+ "summary": summary, # LLM-generated summary
510
+ "success": success, # yes/no/partial
511
+ "key_info": key_info # Important detail
512
+ }
513
+
514
+ # Update inventory tracking
515
+ if "take" in chosen_action.lower() and success == "yes":
516
+ words = chosen_action.split()
517
+ if len(words) >= 2:
518
+ item = words[-1]
519
+ self.inventory_items.add(item)
520
+
521
+ # Record in history
522
+ result_history.append((thought, f"play_action({chosen_action})", obs))
523
+
524
+ if self._is_game_over(obs):
525
+ break
526
+
527
+ except Exception as e:
528
+ result_history.append((thought, f"play_action({chosen_action})", f"Error: {e}"))
529
+ return RunResult(self.score, 350, moves_used, self.locations_visited, False, error=str(e), history=result_history)
530
+
531
+ if moves_used >= max_steps:
532
+ break
533
+
534
+ return RunResult(
535
+ final_score=self.score,
536
+ max_score=350,
537
+ moves=moves_used,
538
+ locations_visited=self.locations_visited,
539
+ game_completed=self._is_game_over(obs),
540
+ history=result_history,
541
+ )
542
 
543
 
544
  # =============================================================================
 
548
  async def test_agent():
549
  """Test the agent locally."""
550
  from fastmcp import Client
551
+
 
 
 
552
  agent = StudentAgent()
553
+
554
+ async with Client("mcp_server.py") as client:
555
  result = await agent.run(
556
  client=client,
557
+ game="lostpig",
558
+ max_steps=50,
559
  seed=42,
560
  verbose=True,
561
  )
562
+
563
  print(f"\nFinal Score: {result.final_score}")
564
  print(f"Moves: {result.moves}")
565
+ print(f"Locations: {len(result.locations_visited)}")
566
 
567
 
568
  if __name__ == "__main__":
mcp_server.py CHANGED
@@ -49,41 +49,67 @@ class GameManager:
     """
     Manages the text adventure game state.
 
-    TODO: Extend this class to track:
-    - Action history (for memory tool)
-    - Explored locations (for mapping)
+    Tracks:
+    - Action history (for memory tool, though agent manages its own memory)
+    - Current location (using Jericho API)
     - Current score and moves
+
+    Note: The agent handles its own location_memory system and doesn't rely
+    on server-side tracking beyond the core MCP tools.
     """
 
     def __init__(self):
         self.env: TextAdventureEnv = None
         self.state = None
         self.game_name: str = ""
-        # TODO: Add more state tracking
-        # self.history: list[tuple[str, str]] = []
-        # self.explored_locations: dict[str, set[str]] = {}
-        # self.current_location: str = ""
+        # State tracking (for optional tools - agent manages its own memory)
+        self.history: list[tuple[str, str]] = []  # (action, observation)
+        self.current_location: str = "Unknown"
 
     def initialize(self, game: str = "zork1"):
         """Initialize or reset the game."""
         self.game_name = game
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
-        # TODO: Reset your state tracking here
+
+        # Reset tracking data
+        self.history = []
+        self.current_location = self._get_player_location_internal()
         return self.state.observation
 
+    def _get_player_location_internal(self) -> str:
+        """Get current player location using the Jericho API."""
+        if self.env and hasattr(self.env, 'env') and self.env.env:
+            try:
+                # Access Jericho's get_player_location() which returns a ZObject
+                loc_obj = self.env.env.get_player_location()
+                # ZObject has a .name attribute
+                if hasattr(loc_obj, 'name'):
+                    return loc_obj.name
+            except Exception:
+                pass
+        return "Unknown"
+
     def step(self, action: str) -> str:
         """Execute an action and return the result."""
         if self.env is None:
             self.initialize()
 
         self.state = self.env.step(action)
+        obs = self.state.observation
 
-        # TODO: Update your state tracking here
-        # self.history.append((action, self.state.observation))
-        # Update location tracking, etc.
-
-        return self.state.observation
+        # Record history (cap at the 50 most recent entries)
+        self.history.append((action, obs))
+        if len(self.history) > 50:
+            self.history = self.history[-50:]
+
+        # Update current location using Jericho API
+        self.current_location = self._get_player_location_internal()
+
+        return obs
 
     def get_score(self) -> int:
         """Get current score."""
@@ -97,17 +123,6 @@ class GameManager:
 # Global game manager
 _game = GameManager()
 
-
-def get_game() -> GameManager:
-    """Get or initialize the game manager."""
-    global _game
-    if _game.env is None:
-        # Get game from environment variable (set by evaluator)
-        game = os.environ.get("GAME", "zork1")
-        _game.initialize(game)
-    return _game
-
-
 # =============================================================================
 # MCP Tools - IMPLEMENT THESE
 # =============================================================================
@@ -132,72 +147,186 @@ def play_action(action: str) -> str:
     """
     game = get_game()
 
-    # TODO: You might want to add action validation here
-    # TODO: You might want to include score changes in the response
-
-    result = game.step(action)
-
-    # Optional: Append score info
-    # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
-
-    return result
-
-
-# TODO: Implement additional tools to help your agent
-
-# @mcp.tool()
-# def memory() -> str:
-#     """
-#     Get the current game state summary.
-#
-#     Returns:
-#         A summary including current location, score, moves, and recent history
-#     """
-#     game = get_game()
-#     # TODO: Return useful state information
-#     pass
-
-
-# @mcp.tool()
-# def inventory() -> str:
-#     """
-#     Check what the player is carrying.
-#
-#     Returns:
-#         List of items in the player's inventory
-#     """
-#     game = get_game()
-#     result = game.step("inventory")
-#     return result
-
-
-# @mcp.tool()
-# def get_map() -> str:
-#     """
-#     Get a map of explored locations.
-#
-#     Returns:
-#         A text representation of explored locations and connections
-#     """
-#     game = get_game()
-#     # TODO: Return map of explored locations
-#     pass
-
-
-# @mcp.tool()
-# def get_valid_actions() -> str:
-#     """
-#     Get a list of likely valid actions from the current location.
-#
-#     Returns:
-#         List of actions that might work here
-#     """
-#     # This is a hint: Jericho provides get_valid_actions()
-#     game = get_game()
-#     if game.env and game.env.env:
-#         valid = game.env.env.get_valid_actions()
-#         return "Valid actions: " + ", ".join(valid[:20])
-#     return "Could not determine valid actions"
+    obs = game.step(action)
+
+    # Append score/moves info for the agent
+    score_info = f"\n\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
+
+    # If the environment exposes reward/done, include them (optional but helpful)
+    try:
+        if getattr(game.state, "reward", 0) and game.state.reward > 0:
+            score_info = f"\n\n+{game.state.reward} points! (Total: {game.get_score()})"
+    except Exception:
+        pass
+
+    done_info = ""
+    try:
+        if getattr(game.state, "done", False):
+            done_info = "\n\nGAME OVER"
+    except Exception:
+        pass
+
+    return obs + score_info + done_info
+
+
+@mcp.tool()
+def memory() -> str:
+    """
+    Get a summary of the current game state.
+
+    Returns:
+        A summary including current location, score, moves, and recent history
+    """
+    game = get_game()
+
+    recent = game.history[-5:] if game.history else []
+    if recent:
+        recent_str = "\n".join([f"  > {a} -> {obs[:80]}..." for a, obs in recent])
+    else:
+        recent_str = "  (none yet)"
+
+    return (
+        "Current State:\n"
+        f"- Location: {game.current_location}\n"
+        f"- Score: {game.get_score()} points\n"
+        f"- Moves: {game.get_moves()}\n"
+        f"- Game: {game.game_name}\n\n"
+        "Recent Actions:\n"
+        f"{recent_str}\n\n"
+        "Current Observation:\n"
+        f"{game.state.observation if game.state else ''}"
+    )
+
+
+@mcp.tool()
+def get_map() -> str:
+    """
+    Get a map of explored locations.
+
+    Note: This tool is not used by the current agent implementation.
+    The agent manages its own location memory internally.
+
+    Returns:
+        The current location plus a note that mapping is agent-side
+    """
+    game = get_game()
+    return (
+        f"Current location: {game.current_location}\n\n"
+        "Note: The agent manages location tracking internally via its location_memory system."
+    )
+
+
+@mcp.tool()
+def inventory() -> str:
+    """Check what items you are currently carrying."""
+    game = get_game()
+
+    items = []
+    try:
+        if getattr(game.state, "inventory", None):
+            items = game.state.inventory
+    except Exception:
+        items = []
+
+    if not items:
+        return "Inventory: You are empty-handed."
+
+    # Convert items to readable names
+    item_names = []
+    for item in items:
+        s = str(item)
+        s_lower = s.lower()
+        if "parent" in s_lower:
+            idx = s_lower.index("parent")
+            name = s[:idx].strip()
+            if ":" in name:
+                name = name.split(":", 1)[1].strip()
+            item_names.append(name)
+        elif ":" in s:
+            item_names.append(s.split(":", 1)[1].strip())
+        else:
+            item_names.append(s)
+
+    return "Inventory: " + ", ".join(item_names)
+
+
+def get_game() -> GameManager:
+    """Get or initialize the game manager."""
+    global _game
+
+    game_name = os.environ.get("GAME", "zork1")
+
+    if _game.env is None:
+        _game.initialize(game_name)
+    elif _game.game_name != game_name:
+        # The evaluator switched titles: re-initialize with the new game
+        _game.initialize(game_name)
+
+    return _game
+
+
+@mcp.tool()
+def get_player_location() -> str:
+    """
+    Get the current player location name from Jericho's location tracking.
+
+    Returns:
+        The name of the current location (e.g., "West of House", "Forest")
+    """
+    game = get_game()
+
+    if game.env and hasattr(game.env, 'env') and game.env.env:
+        try:
+            # Access Jericho's get_player_location() which returns a ZObject
+            loc_obj = game.env.env.get_player_location()
+
+            # ZObject has a .name attribute
+            if hasattr(loc_obj, 'name'):
+                return loc_obj.name
+        except Exception:
+            pass
+
+    # Fallback: use heuristic location extraction
+    return game.current_location
+
+
+@mcp.tool()
+def get_valid_actions() -> str:
+    """
+    Get valid actions from Jericho's action space at the current location.
+
+    Returns:
+        JSON string with valid actions: {"available": true, "actions": [...], "count": N}
+    """
+    import json
+
+    game = get_game()
+
+    if game.env and hasattr(game.env, 'env') and game.env.env:
+        try:
+            # CRITICAL: use_parallel=False to prevent deadlock on Lost Pig
+            valid_actions = game.env.env.get_valid_actions(
+                use_object_tree=True,
+                use_ctypes=True,
+                use_parallel=False  # Prevent multiprocessing deadlock
+            )
+
+            return json.dumps({
+                "available": True,
+                "actions": valid_actions,
+                "count": len(valid_actions),
+                "source": "jericho"
+            })
+        except Exception as e:
+            return json.dumps({
+                "available": False,
+                "error": str(e),
+                "actions": [],
+                "count": 0
+            })
+
+    return json.dumps({
+        "available": False,
+        "error": "Game environment not initialized",
+        "actions": [],
+        "count": 0
+    })
z-machine-games-master ADDED
@@ -0,0 +1 @@
 
 
+../z-machine-games-master
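For reference, the per-location memory update at the heart of the agent loop can be exercised in isolation. This is a minimal sketch: the `record()` helper and the sample observation are illustrative and not part of the committed code, but the dictionary shape mirrors the agent's `location_memory` structure in the diff above.

```python
# Hypothetical standalone sketch of the agent's location_memory update.
# The dict shape matches agent.py above; record() is an illustrative helper.
location_memory = {}

def record(location, action, observation, summary, success, key_info=None):
    # Create the per-location entry on first visit
    mem = location_memory.setdefault(location, {
        "tried_actions": set(),
        "valid_actions": [],
        "promising_actions": [],
        "visited": 0,
        "results": {},
    })
    mem["tried_actions"].add(action)
    mem["results"][action] = {
        "observation": observation,  # full text preserved
        "summary": summary,          # LLM-generated summary
        "success": success,          # "yes" / "no" / "partial"
        "key_info": key_info,        # important detail, if any
    }

record(
    "West of House", "open mailbox",
    "Opening the small mailbox reveals a leaflet.",
    "Mailbox contains a leaflet.", "yes",
)
print(location_memory["West of House"]["results"]["open mailbox"]["success"])
```

Because the full observation and the short summary are stored side by side, the prompt builder can feed the LLM only summaries while keeping raw text available for debugging.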