Felix Lebel committed on
Commit
d725aa7
·
1 Parent(s): 615a63b

assignment done

Files changed (3)
  1. README.md +88 -4
  2. agent.py +653 -114
  3. mcp_server.py +260 -79
README.md CHANGED
@@ -14,15 +14,99 @@ license: mit
14
 
15
  ## Overview
16
 
17
- This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.
 
18
 
19
  ## Approach
20
 
21
  <!-- Describe your approach here -->
22
 
23
- - What strategy does your agent use?
24
- - What tools did you implement in your MCP server?
25
- - Any interesting techniques or optimizations?
26
 
27
  ## Files
28
 
 
14
 
15
  ## Overview
16
 
17
+ This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.
18
+ Author: Félix LEBEL
19
 
20
  ## Approach
21
 
22
  <!-- Describe your approach here -->
23
 
24
+ ### What strategy does your agent use?
25
+ General strategy: The agent is encouraged to explore (using movement actions) and to examine and interact with its environment as much as possible.
26
+
27
+ Here is an excerpt from my system prompt:
28
+ EXPLORATION STRATEGY (follow this priority):
29
+ 1. EXPLORE a lot! Try new locations and exits frequently (north, south, east, west, northeast, northwest, southeast, southwest, up, down, enter, exit)
30
+ 2. ALWAYS EXAMINE everything that could be interesting, especially details in objects, rooms... EXAMINE where you could find some loot or useful items, or clues for puzzles. INTERACT with characters and objects to discover new possibilities.
31
+ 3. ALWAYS take items that seem useful (lamp, sword, key, etc.)
32
+ 4. Open containers (mailbox, cases, doors, windows)
33
+ 5. Try ALL exits from a location before moving on
34
+ 6. Use get_map and location_log frequently to plan which unexplored exits to try, and what actions to take. It also helps you remember what you've tried at the current location and their outcomes, so you can avoid repeating failed actions and focus on promising ones.
35
+ 7. Use memory to check if you're repeating yourself
36
+ 8. If you've been in the same location for 3+ turns, MOVE to a new location
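
Items 5 and 6 of the excerpt depend on knowing which exits have not been tried yet at each location. The following is a minimal sketch of that bookkeeping, not the code from agent.py: `ExitTracker`, `mark_tried`, and `untried` are hypothetical names, and the direction list is assumed to be the one given in the prompt.

```python
# Hypothetical helper (not taken from agent.py): tracks untried exits per location.
ALL_DIRECTIONS = (
    "north", "south", "east", "west",
    "northeast", "northwest", "southeast", "southwest",
    "up", "down", "enter", "exit",
)


class ExitTracker:
    def __init__(self) -> None:
        # location name -> directions not yet attempted there
        self._remaining: dict[str, set[str]] = {}

    def mark_tried(self, location: str, action: str) -> None:
        """Record an action; non-movement actions leave the exit set unchanged."""
        remaining = self._remaining.setdefault(location, set(ALL_DIRECTIONS))
        remaining.discard(action)

    def untried(self, location: str) -> list[str]:
        """Return untried directions in a stable order (all of them if unseen)."""
        remaining = self._remaining.get(location, set(ALL_DIRECTIONS))
        return [d for d in ALL_DIRECTIONS if d in remaining]


tracker = ExitTracker()
tracker.mark_tried("West of House", "north")
tracker.mark_tried("West of House", "open mailbox")  # not a direction: ignored
print(tracker.untried("West of House"))
```

A list like the one returned by `untried` could be appended to the prompt so the model sees concrete exits to try instead of a general instruction to explore.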
37
+
38
+ ### What tools did you implement in your MCP server?
39
+
40
+ ```python
41
+ def play_action(action: str) -> str:
42
+ """
43
+ Execute a game command and return the result.
44
+
45
+ This is the main tool for interacting with the game.
46
+
47
+ Args:
48
+ action: The command to execute (e.g., "north", "take lamp", "open mailbox")
49
+
50
+ Returns:
51
+ The game's response to the action
52
+
53
+ Valid commands include:
54
+ - Movement: north, south, east, west, northeast, northwest, southeast, southwest, up, down, enter, exit
55
+ - Objects: take <item>, drop <item>, open <thing>, examine <thing>
56
+ - Other: look, inventory, read <thing>, turn on lamp
57
+ """
58
+ ```
59
+ ```python
60
+ def memory() -> str:
61
+ """
62
+ Get the current game state summary.
63
+
64
+ Returns:
65
+ A summary including current location (number of visits, actions tried, promising actions),
66
+ recent actions and current observation
67
+ """
68
+ ```
69
+ ```python
70
+ def inventory() -> str:
71
+ """
72
+ Check what the player is carrying.
73
+
74
+ Returns:
75
+ List of items in the player's inventory
76
+ """
77
+ return get_game().get_inventory()
78
+ ```
79
+ ```python
80
+ def get_map() -> str:
81
+ """
82
+ Get a map of explored locations, connections and exits.
83
+ Useful for navigation and avoiding getting lost.
84
+
85
+ Returns:
86
+ A text representation of explored locations and connections
87
+ """
88
+ ```
89
+ ```python
90
+ def location_log() -> str:
91
+ """
92
+ Shows what actions were tried and their outcomes at the current location, along with any promising actions to try.
93
+
94
+
95
+ Returns:
96
+ A detailed log of the current location, including visit count, actions taken and their outcomes, and promising leads.
97
+ """
98
+ ```
99
+ ### Any interesting techniques or optimizations?
100
+
101
+ Here is a list of the ideas and techniques I implemented:
102
+ - I used the Jericho API to extract cleaner location names
103
+ - I used these "cleaner" locations to write a function that detects when the player enters a new location
104
+ - I kept a log of every action at every location (distinguishing movement from non-movement actions)
105
+ - When building the agent's prompt, I call a second LLM whose task is to extract promising actions from the current observation, the general history of actions/tool calls taken by the agent, and the log of actions taken at the current location (this prevents the agent from getting stuck and keeps it aware of its latest actions)
106
+ - I implemented "Exploration Pressure" in several ways:
107
+ * if the agent stays too long at the same location, the agent's prompt changes and suggests more movement (or using get_map or look)
108
+ * if the agent keeps coming back to the same location after having already interacted extensively with its objects, I show it the directions/movements it has not yet tried at that location
109
+ - I extensively refined the system prompt and the extraction prompt
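
The "Exploration Pressure" idea can be sketched as a small function that decides which hint, if any, to add to the prompt. This is an illustrative sketch under stated assumptions, not the implementation in agent.py: `exploration_hint` and its parameters are hypothetical names, and the thresholds (three identical actions in a row, five steps at one location) are assumed values chosen to match the loop detection and stuck checks visible in the diff.

```python
# Hypothetical sketch of "exploration pressure" (names and wording are assumptions).
def exploration_hint(
    recent_actions: list[str],
    steps_here: int,
    untried_exits: set[str],
    max_steps_here: int = 5,
) -> str:
    """Return a hint to append to the prompt, or "" when no pressure is needed."""
    # Loop detection: the same action issued three times in a row.
    if len(recent_actions) >= 3 and len(set(recent_actions[-3:])) == 1:
        return f"WARNING: you repeated '{recent_actions[-1]}' three times; try something different."
    # Stuck detection: too many steps spent at the current location.
    if steps_here >= max_steps_here:
        if untried_exits:
            exits = ", ".join(sorted(untried_exits))
            return f"EXPLORATION HINT: untried exits here: {exits}"
        return "EXPLORATION HINT: use look or get_map to find a way out."
    return ""


print(exploration_hint(["look", "north"], 5, {"east", "west"}))
```

Returning a hint string rather than overriding the chosen action leaves the final decision to the LLM, which is one possible design; forcing a movement command directly is the stricter alternative.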
110
 
111
  ## Files
112
 
agent.py CHANGED
@@ -28,39 +28,32 @@ import os
28
  import re
29
  from dataclasses import dataclass, field
30
  from typing import Optional
 
31
 
32
  from dotenv import load_dotenv
33
- from huggingface_hub import InferenceClient
34
 
35
  # Load environment variables
36
  load_dotenv()
37
 
38
- # Set USE_LOCAL_MODEL=1 in your .env to use a locally downloaded model
39
- USE_LOCAL_MODEL = os.getenv("USE_LOCAL_MODEL", "0").strip() in ("1", "true", "yes")
40
- LOCAL_MODEL_ID = os.getenv("LOCAL_MODEL_ID", "Qwen/Qwen2.5-3B-Instruct")
41
-
42
  # =============================================================================
43
  # LLM Configuration - DO NOT MODIFY
44
  # =============================================================================
45
 
46
- # Model to use (fixed for fair evaluation)
47
  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
48
 
49
- # Initialize the LLM client based on mode
50
  _local_pipeline = None
51
 
52
  if USE_LOCAL_MODEL:
53
- import torch
54
- from transformers import pipeline as _hf_pipeline
55
-
56
- _local_pipeline = _hf_pipeline(
57
- "text-generation",
58
- model=LOCAL_MODEL_ID,
59
- torch_dtype=torch.bfloat16,
60
- device_map="auto",
61
- )
62
- LLM_CLIENT = None
63
- else:
64
  _hf_token = os.getenv("HF_TOKEN")
65
  if not _hf_token:
66
  raise ValueError("HF_TOKEN not found. Set it in your .env file.")
@@ -79,13 +72,6 @@ def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300)
79
 
80
  Returns:
81
  The LLM's response text
82
-
83
- Example:
84
- response = call_llm(
85
- prompt="You are in a forest. What do you do?",
86
- system_prompt=SYSTEM_PROMPT,
87
- seed=42,
88
- )
89
  """
90
  messages = [
91
  {"role": "system", "content": system_prompt},
@@ -96,7 +82,7 @@ def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300)
96
  outputs = _local_pipeline(
97
  messages,
98
  max_new_tokens=max_tokens,
99
- temperature=0.0001, # Near-deterministic (0.0 unsupported by some backends)
100
  do_sample=True,
101
  )
102
  return outputs[0]["generated_text"][-1]["content"]
@@ -104,7 +90,7 @@ def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300)
104
  response = LLM_CLIENT.chat.completions.create(
105
  model=LLM_MODEL,
106
  messages=messages,
107
- temperature=0.0, # Deterministic for reproducibility
108
  max_tokens=max_tokens,
109
  seed=seed,
110
  )
@@ -125,61 +111,221 @@ class RunResult:
125
 
126
 
127
  # =============================================================================
128
- # System Prompt - Customize this for your agent
129
  # =============================================================================
130
 
131
- SYSTEM_PROMPT = """You are playing a classic text adventure game.
132
 
133
- GOAL: Explore the world, solve puzzles, and maximize your score.
134
 
135
  AVAILABLE TOOLS (use via MCP):
136
  - play_action: Execute a game command (north, take lamp, open mailbox, etc.)
137
- - memory: Get current game state and history (if implemented)
138
- - inventory: Check what you're carrying (if implemented)
 
 
139
 
140
  VALID GAME COMMANDS for play_action:
141
- - Movement: north, south, east, west, up, down, enter, exit
142
  - Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
143
- - Other: look, inventory, read <thing>, turn on lamp
 
 
 
 
 
 
144
 
145
  RESPOND IN THIS EXACT FORMAT (no markdown):
146
  THOUGHT: <your reasoning about what to do next>
147
  TOOL: <tool_name>
148
  ARGS: <JSON arguments, e.g., {"action": "look"}>
149
 
150
- Example:
151
- THOUGHT: I should look around to see where I am.
152
  TOOL: play_action
153
  ARGS: {"action": "look"}
 
 
 
 
154
  """
155
 
156
 
157
  # =============================================================================
158
- # Student Agent - IMPLEMENT THIS CLASS
159
  # =============================================================================
160
 
 
 
161
  class StudentAgent:
162
  """
163
- Your ReAct agent implementation.
164
-
165
- TODO:
166
- 1. Implement the run() method with the ReAct loop
167
- 2. Parse LLM responses to extract tool calls
168
- 3. Track state and avoid loops
169
-
170
- Use the provided call_llm() function to interact with the LLM.
171
  """
172
 
173
  def __init__(self):
174
  """Initialize your agent here."""
175
- # TODO: Initialize any state tracking you need
176
- # self.history = []
177
- # self.visited_locations = set()
178
- pass
 
 
 
 
 
 
 
 
179
 
180
  async def run(
181
  self,
182
- client, # FastMCP Client connected to your MCP server
183
  game: str,
184
  max_steps: int,
185
  seed: int,
@@ -187,89 +333,483 @@ class StudentAgent:
187
  ) -> RunResult:
188
  """
189
  Run the agent for a game session.
190
-
191
- Args:
192
- client: FastMCP Client connected to your MCP server
193
- game: Name of the game being played (e.g., "zork1")
194
- max_steps: Maximum number of steps to take
195
- seed: Random seed for reproducibility (use for LLM calls)
196
- verbose: Whether to print detailed output
197
-
198
- Returns:
199
- RunResult with final score and statistics
200
  """
201
- # TODO: Implement your ReAct loop here
202
- #
203
- # Basic structure:
204
- # 1. Get initial observation (call play_action with "look")
205
- # 2. Loop for max_steps:
206
- # a. Build prompt with current observation and history
207
- # b. Call LLM to get thought and action
208
- # c. Parse the response to extract tool and args
209
- # d. Call the tool via client.call_tool(tool_name, args)
210
- # e. Update history and state
211
- # f. Check for game over
212
- # 3. Return RunResult with final statistics
213
-
214
- # Example of calling a tool:
215
- # result = await client.call_tool("play_action", {"action": "look"})
216
- # observation = result[0].text if result else "No response"
217
-
218
- # Example of calling the LLM:
219
- # response = call_llm(
220
- # prompt="Current observation: " + observation,
221
- # system_prompt=SYSTEM_PROMPT,
222
- # seed=seed,
223
- # )
224
-
225
- # Placeholder implementation - replace with your code
226
  locations_visited = set()
227
  history = []
228
- final_score = 0
229
  moves = 0
230
 
231
- # TODO: Your implementation here
232
- # ...
233
 
234
  return RunResult(
235
- final_score=final_score,
236
- max_score=350, # Zork1 max score, adjust if needed
237
  moves=moves,
238
  locations_visited=locations_visited,
239
- game_completed=False,
240
  history=history,
241
  )
242
-
243
- def _build_prompt(self, observation: str, history: list) -> str:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
244
  """
245
- Build the prompt for the LLM.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
246
 
247
- TODO: Implement this to create effective prompts
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
248
  """
249
- # TODO: Combine system prompt, history, and current observation
250
- pass
251
-
252
  def _parse_response(self, response: str) -> tuple[str, str, dict]:
253
  """
254
  Parse LLM response to extract thought, tool name, and arguments.
 
 
 
 
255
 
256
- TODO: Implement robust parsing
257
 
258
- Returns:
259
- Tuple of (thought, tool_name, args_dict)
260
- """
261
- # TODO: Parse the response format:
262
- # THOUGHT: ...
263
- # TOOL: ...
264
- # ARGS: {...}
265
- pass
266
 
267
- def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
268
- """
269
- Call the LLM with the given prompt.
 
 
 
 
 
 
270
 
271
- This is a convenience wrapper - you can also use call_llm() directly.
272
- """
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
273
  return call_llm(prompt, system_prompt, seed)
274
 
275
 
@@ -281,7 +821,6 @@ async def test_agent():
281
  """Test the agent locally."""
282
  from fastmcp import Client
283
 
284
- # Path to your MCP server
285
  server_path = "mcp_server.py"
286
 
287
  agent = StudentAgent()
@@ -302,4 +841,4 @@ async def test_agent():
302
 
303
  if __name__ == "__main__":
304
  import asyncio
305
- asyncio.run(test_agent())
 
28
  import re
29
  from dataclasses import dataclass, field
30
  from typing import Optional
31
+ import numpy as np
32
 
33
  from dotenv import load_dotenv
 
34
 
35
  # Load environment variables
36
  load_dotenv()
37
 
 
 
 
 
38
  # =============================================================================
39
  # LLM Configuration - DO NOT MODIFY
40
  # =============================================================================
41
 
 
42
  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
43
 
44
+ USE_LOCAL_MODEL = os.getenv("USE_LOCAL_MODEL", "false").lower() == "true"
45
  _local_pipeline = None
46
 
47
  if USE_LOCAL_MODEL:
48
+ try:
49
+ from transformers import pipeline
50
+ LOCAL_MODEL = os.getenv("LOCAL_MODEL", "Qwen/Qwen2.5-3B-Instruct")
51
+ _local_pipeline = pipeline("text-generation", model=LOCAL_MODEL, device_map="auto")
52
+ except Exception:
53
+ USE_LOCAL_MODEL = False
54
+
55
+ if not USE_LOCAL_MODEL:
56
+ from huggingface_hub import InferenceClient
 
 
57
  _hf_token = os.getenv("HF_TOKEN")
58
  if not _hf_token:
59
  raise ValueError("HF_TOKEN not found. Set it in your .env file.")
 
72
 
73
  Returns:
74
  The LLM's response text
 
 
 
 
 
 
 
75
  """
76
  messages = [
77
  {"role": "system", "content": system_prompt},
 
82
  outputs = _local_pipeline(
83
  messages,
84
  max_new_tokens=max_tokens,
85
+ temperature=0.0001,
86
  do_sample=True,
87
  )
88
  return outputs[0]["generated_text"][-1]["content"]
 
90
  response = LLM_CLIENT.chat.completions.create(
91
  model=LLM_MODEL,
92
  messages=messages,
93
+ temperature=0.0,
94
  max_tokens=max_tokens,
95
  seed=seed,
96
  )
 
111
 
112
 
113
  # =============================================================================
114
+ # System Prompt
115
  # =============================================================================
116
 
 
117
 
118
+ SYSTEM_PROMPT = """You are playing a classic text adventure game. Your goal is to EXPLORE widely, COLLECT treasures and MAXIMIZE your score.
119
 
120
  AVAILABLE TOOLS (use via MCP):
121
  - play_action: Execute a game command (north, take lamp, open mailbox, etc.)
122
+ - location_log: See what actions were tried at the current location, their outcomes and the promising actions to try.
123
+ - memory: Get a current game state summary including current location (number of visits, actions tried, promising actions), recent actions and current observation
124
+ - get_map: Get a map of explored locations, connections and exits. It also helps you remember what you've tried at the current location and their outcomes, so you can avoid repeating failed actions and focus on promising ones.
125
+ - inventory: Have a look at what you're currently carrying.
126
 
127
  VALID GAME COMMANDS for play_action:
128
+ - Movement: north, south, east, west, northeast, northwest, southeast, southwest, up, down, enter, exit
129
  - Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
130
+ - Light: turn on lamp, turn off lamp
131
+ - Combat: attack <enemy> with <weapon>
132
+ - Other: inventory, look, read <thing>, wait
133
+ - Other: look, examine, listen, speak, take, drop, empty, fill, inventory, climb, swim, open, close, set, turn, push, pull, push [direction], throw at, eat, drink, wear, take off, burn, dig, kick, destroy, read, ask for, give, feed, show, ask about, tell about, talk to, kiss, attack, wake, answer, wave, rub, squeeze, jump, jump over, wait, sleep,
134
+ sing, yell, think, pray
135
+
136
+ FORBIDDEN (will NOT work): check, inspect, search, grab, use, help
137
 
138
  RESPOND IN THIS EXACT FORMAT (no markdown):
139
  THOUGHT: <your reasoning about what to do next>
140
  TOOL: <tool_name>
141
  ARGS: <JSON arguments, e.g., {"action": "look"}>
142
 
143
+ EXPLORATION STRATEGY (follow this priority):
144
+ 1. EXPLORE a lot! Try new locations and exits frequently (north, south, east, west, northeast, northwest, southeast, southwest, up, down, enter, exit)
145
+ 2. ALWAYS EXAMINE everything that could be interesting, especially details in objects, rooms... EXAMINE where you could find some loot or useful items, or clues for puzzles. INTERACT with characters and objects to discover new possibilities.
146
+ 3. ALWAYS take items that seem useful (lamp, sword, key, etc.)
147
+ 4. Open containers (mailbox, cases, doors, windows)
148
+ 5. Try ALL exits from a location before moving on
149
+ 6. Use get_map and location_log frequently to plan which unexplored exits to try, and what actions to take. It also helps you remember what you've tried at the current location and their outcomes, so you can avoid repeating failed actions and focus on promising ones.
150
+ 7. Use memory to check if you're repeating yourself
151
+ 8. If you've been in the same location for 3+ turns, MOVE to a new location
152
+
153
+ HERE IS THE STRUCTURE OF THE GAME OUTPUT you receive after each action and tool call:
154
+ <BEGIN GAME OUTPUT>
155
+ - CURRENT LOCATION: <location name>
156
+ - STEPS AT THIS LOCATION: <number of steps taken at this location>
157
+
158
+ - RECENT ACTIONS:
159
+ [<location name>] > action -> outcome
160
+ [<other location name>] > other action -> other outcome
161
+ ...
162
+ [<other location name>] > other action -> other outcome
163
+
164
+ - CURRENT SITUATION:
165
+ <text describing the current location, visible objects, characters, exits, inventory, map, etc.>
166
+ or <map description>
167
+
168
+ - ACTIONS ALREADY TRIED AT THIS LOCATION:
169
+ > action -> outcome
170
+ > other action -> other outcome
171
+
172
+ - ACTIONS SUGGESTED: action1, action2, action3
173
+ <END GAME OUTPUT>
174
+
175
+
176
+ "CURRENT SITUATION" is the most important part of the output, it is the direct consequence of your last action and the most up-to-date description of the world. Focus on it to find new interactions, objects, exits, and details to examine.
177
+ "RECENT ACTIONS" is a summary of what you've done recently and their outcomes. Use it to avoid repeating failed actions and to focus on promising ones.
178
+ DON'T SUGGEST ACTIONS YOU'VE ALREADY TRIED AT THIS LOCATION. If there are too many ACTIONS ALREADY TRIED AT THIS LOCATION, move to another place (use look to see the exits).
179
+
180
+
181
+ IMPORTANT:
182
+ - DO NOT repeat the same action multiple times in a row
183
+ - If an action doesn't work, try something DIFFERENT or EXAMINE more (precisely) to find new possibilities
184
+
185
+ Examples:
186
+
187
+ THOUGHT: I need to remember what I've tried here before. Let me check the location log.
188
+ TOOL: location_log
189
+ ARGS: {}
190
+
191
+ THOUGHT: I see an interesting object. Let me examine it.
192
+ TOOL: play_action
193
+ ARGS: {"action": "examine mailbox"}
194
+
195
+ THOUGHT: I should check the map to find unexplored exits and to remember what I've tried here before.
196
+ TOOL: get_map
197
+ ARGS: {}
198
+
199
+ THOUGHT: Look around to find more details about the room and possible interactions.
200
  TOOL: play_action
201
  ARGS: {"action": "look"}
202
+
203
+ THOUGHT: Let me remember to try opening the trapdoor later when I have a key.
204
+ TOOL: record_promising_action
205
+ ARGS: {"action": "open trapdoor"}
206
  """
207
 
208
+ # =============================================================================
209
+ # Prompt for extracting promising actions from observations
210
+ # =============================================================================
211
+
212
+ EXTRACT_ACTIONS_PROMPT = """You are analyzing text adventure game output. Extract promising actions the player should try.
213
+
214
+ Here is the structure of the GAME OUTPUT you receive:
215
+ <BEGIN GAME OUTPUT>
216
+ - CURRENT LOCATION: <location name>
217
+ - STEPS AT THIS LOCATION: <number of steps taken at this location>
218
+
219
+ - RECENT ACTIONS:
220
+ [<location name>] > action -> outcome
221
+ [<other location name>] > other action -> other outcome
222
+ ...
223
+ [<other location name>] > other action -> other outcome
224
+
225
+ - CURRENT SITUATION:
226
+ <text describing the current location, visible objects, characters, exits, inventory, map, etc.>
227
+ or <map description>
228
+
229
+ - ACTIONS ALREADY TRIED AT THIS LOCATION:
230
+ > action -> outcome
231
+ > other action -> other outcome
232
+
233
+ - ACTIONS SUGGESTED: action1, action2, action3
234
+ <END GAME OUTPUT>
235
+
236
+
237
+ Given the GAME OUTPUT, output a JSON list of action strings. Focus on:
238
+ - Objects mentioned in CURRENT SITUATION that can be TAKEN, examined, or opened
239
+ - Objects or places to examine mentioned in CURRENT SITUATION that could reveal new information or items
240
+ - Directions/exits mentioned in CURRENT SITUATION
241
+ - Interactive elements in CURRENT SITUATION (doors, containers, levers, buttons). Suggest interacting with them to discover new possibilities.
242
+ - Items that might be useful in CURRENT SITUATION
243
+ - Exploration if there is no interesting object to interact with mentioned in CURRENT SITUATION
244
+
245
+ Follow these additional guidelines:
246
+ - "CURRENT SITUATION" is the most important part of the output, it is the direct consequence of your last action and the most up-to-date description of the world. Focus on it to find new interactions, objects, exits, and details to examine.
247
+ - "RECENT ACTIONS" is a summary of what you've done recently and their outcomes. Use it to avoid repeating failed actions and to focus on promising ones.
248
+ - DON'T SUGGEST ACTIONS YOU'VE ALREADY TRIED AT THIS LOCATION. If there are too many ACTIONS ALREADY TRIED AT THIS LOCATION, move to another place (use look to see the exits).
249
+ - ACTIONS SUGGESTED are additionally useful, but make sure to focus on the CURRENT SITUATION and RECENT ACTIONS to find promising actions that are relevant to the current context.
250
+
251
+ IMPORTANT: If there is a warning 'WARNING', 'EXPLORATION HINT' or 'URGENT' in the GAME OUTPUT, prioritize suggesting actions that address those warnings.
252
+
253
+ VALID COMMANDS include:
254
+ - Movement: north, south, east, west, northeast, northwest, southeast, southwest, up, down, enter, exit
255
+ - Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
256
+ - Light: turn on lamp, turn off lamp
257
+ - Combat: attack <enemy> with <weapon>
258
+ - Other: inventory, look, read <thing>, wait
259
+ - Other: look, examine, listen, speak, take, drop, empty, fill, inventory, climb, swim, open, close, set, turn, push, pull, push [direction], throw at, eat, drink, wear, take off, burn, dig, kick, destroy, read, ask for, give, feed, show, ask about, tell about, talk to, kiss, attack, wake, answer, wave, rub, squeeze, jump, jump over, wait, sleep,
260
+ sing, yell, think, pray
261
+ KEEP VALID COMMANDS SIMPLE (e.g., "examine picture" instead of "examine picture on east wall").
262
+ SUGGEST look when you need more information.
263
+
264
+ Output ONLY a JSON list, no explanation. Example: ["examine table", "take key", "open door", "north"]
265
+ If nothing stands out, output: []"""
266
+
267
+
268
+ EXTRACT_ACTIONS_PROMPT_EXIT = """You are analyzing text adventure game output. Extract promising actions or directions the player should try.
269
+
270
+ Here is the structure of the GAME OUTPUT you receive:
271
+ <BEGIN GAME OUTPUT>
272
+ - CURRENT LOCATION: <location name>
273
+ - STEPS AT THIS LOCATION: <number of steps taken at this location>
274
+
275
+ - RECENT ACTIONS:
276
+ [<location name>] > action -> outcome
277
+ [<other location name>] > other action -> other outcome
278
+ ...
279
+ [<other location name>] > other action -> other outcome
280
+
281
+ - CURRENT SITUATION:
282
+ <text describing the current location, visible objects, characters, exits, inventory, map, etc.>
283
+ or <map description>
284
+
285
+ - ACTIONS ALREADY TRIED AT THIS LOCATION:
286
+ > action -> outcome
287
+ > other action -> other outcome
288
+
289
+ <END GAME OUTPUT>
290
+
291
+ GUIDELINES:
292
+ The player needs to move to a different location. TRY TO DISCOVER NEW PLACES AND EXITS TO EXPLORE (look at RECENT ACTIONS to avoid going in the same direction again).
293
+ If no exits or directions are mentioned in the CURRENT SITUATION, suggest: look, get_map.
294
+ Otherwise, suggest exits and directions mentioned in the CURRENT SITUATION among the valid commands: north, south, east, west, northeast, northwest, southeast, southwest.
295
+
296
+ Output ONLY a JSON list, no explanation. Example: ["north", "look", "southwest", "east"]
297
+ If nothing stands out, output: []"""
298
+
299
 
300
  # =============================================================================
301
+ # Student Agent
302
  # =============================================================================
303
 
304
+ MVMT_COMMANDS = {"look", "north", "south", "east", "west", "up", "down", "northeast", "northwest", "southeast", "southwest"}
305
+
306
  class StudentAgent:
307
  """
308
+ ReAct agent with enhanced exploration and location-aware reasoning.
 
 
 
 
 
 
 
309
  """
310
 
311
  def __init__(self):
312
  """Initialize your agent here."""
313
+ self.history_agent: list[dict] = [] # general history of the agent's recent tool calls (step, thought, tool, args, result, location)
314
+ self.history_location: dict[str, list[dict]] = {} # location -> history of actions that are not directions and outcomes at that location
315
+ self.remaining_directions: dict[str, set[str]] = {} # location -> unexplored directions
316
+ self.recent_actions: list[str] = [] # track recent actions for loop detection
317
+ self.score: int = 0
318
+ self.previous_location: str = "" # track previous location to detect movement
319
+ self.current_location: str = "" # track current location
320
+ self.steps_at_current_location: int = 0 # track how many steps we've been at the current location to encourage exploration
321
+ self.visited_locations: dict[str, int] = {} # location -> visit count
322
+ self.promising_actions: list[str] = [] # promising actions extracted from observation at new locations
323
+ self.is_new_location: bool = False # flag to indicate if the last observation was a new location
324
+
325
 
326
  async def run(
327
  self,
328
+ client,
329
  game: str,
330
  max_steps: int,
331
  seed: int,
 
333
  ) -> RunResult:
334
  """
335
  Run the agent for a game session.
 
 
 
 
 
 
 
 
 
 
336
  """
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
337
  locations_visited = set()
338
  history = []
 
339
  moves = 0
340
 
341
+ # Get list of available tools
342
+ tools = await client.list_tools()
343
+ tool_names = [t.name for t in tools]
344
+
345
+ # Get initial observation
346
+ result = await client.call_tool("play_action", {"action": "look"})
347
+ observation, location, is_new_location = self._extract_result(result)
348
+
349
+ # Track location (for counting unique locations visited, not necessarily the same as in-game location name)
350
+ dummy_location = observation.split("\n")[0] if observation else "Unknown"
351
+ locations_visited.add(dummy_location)
352
+
353
+ # Track location (location = in-game location name = the name of the room or area we're currently in, extracted from the observation)
354
+ self.current_location = location
355
+ self.previous_location = location
356
+ self.visited_locations[location] = 1
357
+ self.remaining_directions[location] = set(["north", "south", "east", "west", "northeast", "northwest", "southeast", "southwest"])
358
+
359
+ if verbose:
360
+ print(f"\n{observation}")
361
+
362
+ # Extract promising actions from initial observation
363
+ self.promising_actions = self._extract_promising_actions(observation, seed, EXTRACT_ACTIONS_PROMPT)
364
+ if self.promising_actions and verbose:
365
+ print(f"[PROMISING] {self.promising_actions}")
366
+
367
+ # Main ReAct loop
368
+ for step in range(1, max_steps + 1):
369
+
370
+ # Build prompt with context
371
+ prompt = self._build_prompt(observation, seed + step)
372
+
373
+ # Call LLM for reasoning
374
+ response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
375
+
376
+ # Parse the response
377
+ thought, tool_name, tool_args = self._parse_response(response)
378
+
379
+ if verbose:
380
+ print(f"\n--- Step {step} ---")
381
+ print(f"[THOUGHT] {thought}")
382
+ print(f"[TOOL] {tool_name}({tool_args})")
383
+
384
+ # Validate and fix common issues
385
+ tool_name, tool_args = self._validate_tool_call(tool_name, tool_args, tool_names)
386
+
387
+ # Loop detection for play_action
388
+ if tool_name == "play_action":
389
+ action = tool_args.get("action", "look")
390
+
391
+ self.recent_actions.append(action)
392
+ if len(self.recent_actions) > 7:
393
+ self.recent_actions = self.recent_actions[-7:]
394
+
395
+ # Detect loops - if same action 3 times, force exploration
396
+ if len(self.recent_actions) >= 3 and len(set(self.recent_actions[-3:])) == 1:
397
+ if verbose:
398
+ print("[WARNING] Loop detected - forcing exploration")
399
+ # Try to move somewhere new
400
+ tool_name, tool_args = self._break_loop(tool_names)
401
+ self.recent_actions.append(tool_args.get("action", "look"))
402
+
403
+ # If stuck at same location too long, add exploration pressure
404
+ if self.steps_at_current_location >= 5 and tool_name == "play_action":
405
+ action = tool_args.get("action", "")
406
+ if action not in MVMT_COMMANDS:
407
+ if verbose:
408
+ print(f"[EXPLORATION BIAS] Been here {self.steps_at_current_location} steps, forcing movement")
409
+
410
+ moves += 1
411
+
412
+ # Execute the tool
413
+ try:
414
+ result = await client.call_tool(tool_name, tool_args)
415
+ observation, new_location, is_new_location = self._extract_result(result)
416
+ self.is_new_location = is_new_location
417
+
418
+ if verbose:
419
+ print(f"[RESULT] {observation[:200]}...")
420
+
421
+ except Exception as e:
422
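+ observation, new_location, is_new_location = f"Error: {e}", self.current_location, False # keep location state defined when the tool call fails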
423
+ if verbose:
424
+ print(f"[ERROR] {e}")
425
+
426
+ # Detect location changes
427
+ self.previous_location = self.current_location
428
+ self.current_location = new_location
429
+ if is_new_location:
430
+ self.steps_at_current_location = 0
431
+
432
+ # Extract promising actions from new location
433
+ self.promising_actions = self._extract_promising_actions(observation, seed + step, EXTRACT_ACTIONS_PROMPT)
434
+ if self.promising_actions and verbose:
435
+ print(f"[PROMISING at new location] {self.promising_actions}")
436
+
437
+ else:
438
+ self.steps_at_current_location += 1
439
+ self.promising_actions = [] # Clear promising actions if we haven't moved
440
+
441
+ # Track number of visits to this location
442
+ if self._has_moved():
443
+ self.visited_locations[self.current_location] = self.visited_locations.get(self.current_location, 0) + 1
444
+ self.steps_at_current_location = 0
445
+
446
+ # Track location (for counting unique locations visited, not necessarily the same as in-game location name)
447
+ dummy_location = observation.split("\n")[0] if observation else "Unknown"
448
+ locations_visited.add(dummy_location)
449
+
450
+ # Update history of actions/directions and outcomes at that location
451
+ # Keep this general history not too long
452
+ self.history_agent.append({
453
+ "step": step,
454
+ "thought": thought,
455
+ "tool": tool_name,
456
+ "args": tool_args,
457
+ "result": observation[:200],
458
+ "location": self.current_location,
459
+ })
460
+ if len(self.history_agent) > 15:
461
+ self.history_agent = self.history_agent[-15:]
462
+
463
+ if self.current_location not in self.history_location:
464
+ self.history_location[self.current_location] = []
465
+
466
+ # Update remaining directions for this location if it's new
467
+ if self.current_location not in self.remaining_directions:
468
+ self.remaining_directions[self.current_location] = set(["north", "south", "east", "west", "northeast", "northwest", "southeast", "southwest"])
469
+
470
+ # Update history of non-movement actions at this location (helps the LLM learn what worked and what didn't here).
470
+ if (action := tool_args.get("action", "")) not in MVMT_COMMANDS: # re-read the executed action so it is defined for non-play_action tools too
472
+ self.history_location[self.current_location].append({
473
+ "step": step,
474
+ "thought": thought,
475
+ "tool": tool_name,
476
+ "args": tool_args,
477
+ "result": observation,
478
+ })
479
+ else:
480
+ # If it's a movement action, remove it from remaining directions for this location
481
+ if action in self.remaining_directions[self.current_location]:
482
+ self.remaining_directions[self.current_location].remove(action)
483
+
484
+ # Track score from observation
485
+ self._update_score(observation)
486
+
487
+ # Record in result history (for final output)
488
+ history.append((thought, f"{tool_name}({tool_args})", observation[:100]))
489
+
490
+ # Check for game over
491
+ if self._is_game_over(observation):
492
+ if verbose:
493
+ print("\n*** GAME OVER ***")
494
+ break
495
 
496
  return RunResult(
497
+ final_score=self.score,
498
+ max_score=350,
499
  moves=moves,
500
  locations_visited=locations_visited,
501
+ game_completed=self._is_game_over(observation),
502
  history=history,
503
  )
504
+
505
+ def _has_moved(self) -> bool:
506
+ """Check if the player has moved to a new location."""
507
+ return self.current_location != self.previous_location
508
+
509
+ def _parse_location_from_observation(self, observation: str) -> tuple[str, bool]:
510
+ """Extract location name from observation text.
511
+ Also return whether it is a new location, based on the "[NEW LOCATION: ...]" tag on the first line."""
512
+ is_new_location = False
513
+ if not observation:
514
+ return "Unknown", False
515
+ first_line = observation.split("\n")[0].strip()
516
+ # If the first line begins with "[NEW LOCATION:", is_new_location = True
517
+ if first_line.startswith("[NEW LOCATION:"):
518
+ is_new_location = True
519
+ # Extract location from "[NEW/CURRENT LOCATION: location name]" if present
520
+ match = re.search(r'\[(?:NEW|CURRENT) LOCATION: (.+?)\]', first_line)
521
+
522
+ if match:
523
+ return match.group(1).strip(), is_new_location
524
+ else:
525
+ print(f"[ERROR] Could not parse location from observation. Defaulting to first line as location. Observation: \n{observation[:100]}...")
526
+ # Otherwise, return the first line as location
527
+ return first_line, is_new_location
528
+
529
+ def _parse_observation_wo_score(self, observation: str) -> str:
530
+ """Remove score information from observation to avoid confusion."""
531
+ if not observation:
532
+ return ""
533
+ return observation.split("[Score:")[0].strip()
534
+
535
+ def _extract_promising_actions(self, observation: str, seed: int, prompt: str) -> list[str]:
536
  """
537
+ Use the LLM to extract promising actions from an observation.
538
+ Returns a list of action strings worth trying.
539
+ """
540
+ try:
541
+ response = call_llm(
542
+ prompt=f"{observation}",
543
+ system_prompt=prompt,
544
+ seed=seed,
545
+ max_tokens=150,
546
+ )
547
+ # Try to parse JSON list from response
548
+ # Find the JSON array in the response
549
+ match = re.search(r'\[.*?\]', response, re.DOTALL)
550
+ if match:
551
+ actions = json.loads(match.group(0))
552
+ if isinstance(actions, list):
553
+ return [str(a) for a in actions if isinstance(a, str)]
554
+ except Exception:
555
+ pass
556
+ return []
557
+
558
+ def _break_loop(self, tool_names: list[str]) -> tuple[str, dict]:
559
+ """Break out of a loop by choosing an unexplored action."""
560
+ # Try movement directions we haven't tried recently
561
+ directions = ["north", "south", "east", "west", "up", "down",
562
+ "northeast", "northwest", "southeast", "southwest"]
563
+ recent_set = set(self.recent_actions[-5:]) if self.recent_actions else set()
564
+
565
+ for d in directions:
566
+ if d not in recent_set:
567
+ return "play_action", {"action": d}
568
 
569
+ # If all directions were tried recently, fall back to get_map, then to look
570
+ if "get_map" in tool_names:
571
+ return "get_map", {}
572
+
573
+ return "play_action", {"action": "look"}
574
+
575
+ def _force_movement(self) -> tuple[str, dict]:
576
+ """Force a movement action when stuck too long at a location."""
577
+ directions = ["north", "south", "east", "west", "up", "down",
578
+ "enter", "northeast", "northwest", "southeast", "southwest"]
579
+ recent_set = set(self.recent_actions[-5:]) if self.recent_actions else set()
580
+
581
+ for d in directions:
582
+ if d not in recent_set:
583
+ return "play_action", {"action": d}
584
+
585
+ # Fallback: just try north
586
+ return "play_action", {"action": "north"}
587
+
588
+ def _build_prompt(self, observation: str, seed: int = 0) -> str:
589
  """
590
+ Build the prompt for the LLM with rich context.
591
+ """
592
+ parts = []
593
+
594
+ parts.append(f"- CURRENT LOCATION: {self.current_location}")
595
+ parts.append(f"- STEPS AT THIS LOCATION: {self.steps_at_current_location}")
596
+
597
+ # Recent history
598
+ if self.history_agent:
599
+ parts.append("\n- RECENT ACTIONS:")
600
+ for entry in self.history_agent[-5:]:
601
+ loc = entry.get("location", "?")
602
+ action = entry.get("args", {}).get("action", entry["tool"])
603
+ result = entry.get("result", "")
604
+ result = self._parse_observation_wo_score(result)
605
+ # replace newlines in result with spaces for better readability
606
+ result = result.replace("\n", " ")
607
+ result_short = result[:80] + "..." if len(result) > 80 else result
608
+ parts.append(f" [{loc}] > {action} -> {result_short}")
609
+
610
+ # Warn about repeated actions
611
+ if self.recent_actions and len(self.recent_actions) >= 4 and len(set(self.recent_actions[-3:])) == 1:
612
+ parts.append(f"\n[WARNING: You've been doing '{self.recent_actions[-1]}' repeatedly. TRY SOMETHING COMPLETELY DIFFERENT!]")
613
+
614
+ # Exploration pressure
615
+ if self.steps_at_current_location >= 4:
616
+ parts.append(f"\n[EXPLORATION HINT: You have been at '{self.current_location}' for {self.steps_at_current_location} steps. Consider moving to a NEW location soon! Use 'look' to find exits of the room, or 'get_map' to see the discovered map.]")
617
+ if self.steps_at_current_location >= 5:
618
+ parts.append(f"\n[URGENT: You MUST move to a different location NOW. Pick a direction and go.]")
619
+
620
+ parts.append(f"\n- CURRENT SITUATION:\n{observation}")
621
+
622
+ # Actions already tried at this location (to avoid repetition and encourage trying new things)
623
+ revisited = self.visited_locations.get(self.current_location, 0) > 1
624
+ location_history = self.history_location.get(self.current_location, [])
625
+ if revisited:
626
+ parts.append(f"\n- ACTIONS ALREADY TRIED AT THIS LOCATION ({self.current_location}):")
627
+ for entry in location_history[-20:]:
628
+ action = entry.get("args", {}).get("action", entry["tool"])
629
+ result = entry.get("result", "")
630
+ result = self._parse_observation_wo_score(result)
631
+ result = result.replace("\n", " ")
632
+ result_short = result[:100] + "..." if len(result) > 100 else result
633
+ parts.append(f" > {action} -> {result_short}")
634
+
635
+ # # Show remaining unexplored directions for current location
636
+ # if self.current_location in self.remaining_directions and (self.visited_locations.get(self.current_location, 0) >= 5 or self.steps_at_current_location >= 5):
637
+ # # remaining should be a list
638
+ # remaining = list(self.remaining_directions[self.current_location])
639
+ # if remaining:
640
+ # parts.append(f"\n- REMAINING UNEXPLORED DIRECTIONS AT THIS LOCATION: {', '.join(remaining)}")
641
+
642
+ # Actions suggested by the LLM
643
+ if self.promising_actions:
644
+ parts.append(f"\n- ACTIONS SUGGESTED AT NEW LOCATION: {', '.join(self.promising_actions)}")
645
+ else:
646
+ prompt = EXTRACT_ACTIONS_PROMPT
647
+ if self.steps_at_current_location >= 5:
648
+ prompt = EXTRACT_ACTIONS_PROMPT_EXIT
649
+ promising_actions = self._extract_promising_actions("\n".join(parts), seed=seed, prompt=prompt)
650
+ if len(location_history) >= 7 or self.visited_locations.get(self.current_location, 0) >= 7:
651
+ # If we've been here a lot, prioritize exit directions
652
+ directions = ['look', 'get_map', 'north', 'south', 'east', 'west', 'northeast', 'northwest', 'southeast', 'southwest', 'up', 'down', 'enter', 'exit']
653
+ # Take 4 random elements from directions to build promising_actions
654
+ promising_actions = np.random.choice(directions, size=min(4, len(directions)), replace=False).tolist()
655
+ if promising_actions:
656
+ parts.append(f"\n- ACTIONS SUGGESTED: {', '.join(promising_actions)}")
657
+
658
+ parts.append("\nWhat do you do next?")
659
+ prompt_text = "\n".join(parts) # built once; a backslash inside an f-string expression is a SyntaxError before Python 3.12
660
+ print(f"\n################### [START DEBUG] PROMPT RICH IN CONTEXT PASSED TO THE AGENT ###################\n{prompt_text}\n################### [END DEBUG] PROMPT RICH IN CONTEXT PASSED TO THE AGENT ###################")
661
+
662
+ return prompt_text
663
+
664
  def _parse_response(self, response: str) -> tuple[str, str, dict]:
665
  """
666
  Parse LLM response to extract thought, tool name, and arguments.
667
+ """
668
+ thought = "No reasoning provided"
669
+ tool_name = "play_action"
670
+ tool_args = {"action": "look"}
671
 
672
+ lines = response.strip().split("\n")
673
 
674
+ for line in lines:
675
+ line_clean = line.strip()
676
+ line_upper = line_clean.upper()
677
+
678
+ if line_upper.startswith("THOUGHT:"):
679
+ thought = line_clean.split(":", 1)[1].strip()
680
+
681
+ elif line_upper.startswith("TOOL:"):
682
+ raw_tool = line_clean.split(":", 1)[1].strip().lower()
683
+ raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
684
+ tool_name = raw_tool.strip()
685
+
686
+ elif line_upper.startswith("ARGS:"):
687
+ raw_args = line_clean.split(":", 1)[1].strip()
688
+ raw_args = raw_args.replace("**", "").replace("*", "").replace("`", "")
689
+ try:
690
+ parsed = json.loads(raw_args)
691
+ if isinstance(parsed, dict):
692
+ tool_args = parsed
693
+ except json.JSONDecodeError:
694
+ # Try to extract action from malformed JSON
695
+ match = re.search(r'"action"\s*:\s*"([^"]+)"', raw_args)
696
+ if match:
697
+ tool_args = {"action": match.group(1)}
698
+ else:
699
+ # Try bare string
700
+ clean = raw_args.strip().strip('"').strip("'")
701
+ if clean:
702
+ tool_args = {"action": clean}
703
+
704
+ return thought, tool_name, tool_args
705
+
706
+ def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
707
+ """Validate and fix common tool call issues."""
708
+ # Fix tool name
709
+ if tool_name not in valid_tools:
710
+ if tool_name in ["action", "do", "command", "play"]:
711
+ tool_name = "play_action"
712
+ elif tool_name in ["map", "location", "locations"]:
713
+ tool_name = "get_map"
714
+ elif tool_name in ["mem", "state", "status", "history"]:
715
+ tool_name = "memory"
716
+ elif tool_name in ["inv", "items", "carrying"]:
717
+ tool_name = "inventory"
718
+ elif tool_name in ["valid", "valid_actions", "actions", "possible_actions"]:
719
+ tool_name = "get_valid_actions"
720
+ elif tool_name in ["log", "loc_log", "location_history"]:
721
+ tool_name = "location_log"
722
+ elif tool_name in ["record", "remember", "save_action", "promising"]:
723
+ tool_name = "record_promising_action"
724
+ else:
725
+ tool_name = "play_action"
726
+
727
+ # Fix action verbs
728
+ if tool_name == "play_action":
729
+ action = tool_args.get("action", "look")
730
+
731
+ invalid_verb_map = {
732
+ "check": "examine",
733
+ "inspect": "examine",
734
+ "search": "look",
735
+ "grab": "take",
736
+ "pick": "take",
737
+ "pick up": "take",
738
+ "get": "take",
739
+ "collect": "take",
740
+ "use": "turn on",
741
+ "switch on": "turn on",
742
+ "go north": "north",
743
+ "go south": "south",
744
+ "go east": "east",
745
+ "go west": "west",
746
+ "go up": "up",
747
+ "go down": "down",
748
+ "move north": "north",
749
+ "move south": "south",
750
+ "move east": "east",
751
+ "move west": "west",
752
+ }
753
+
754
+ action_lower = action.lower().strip()
755
+ if action_lower in invalid_verb_map:
756
+ action = invalid_verb_map[action_lower]
757
+ else:
758
+ # Check if action starts with an invalid verb
759
+ for invalid, valid in invalid_verb_map.items():
760
+ if action_lower.startswith(invalid + " "):
761
+ remainder = action_lower[len(invalid):].strip()
762
+ action = f"{valid} {remainder}"
763
+ break
764
+
765
+ tool_args["action"] = action
766
+
767
+ return tool_name, tool_args
768
 
769
+ def _extract_result(self, result) -> tuple[str, str, bool]:
770
+ """Extract observation, location, and boolean indicating if it's a new location from MCP tool result."""
771
+ if hasattr(result, 'content') and result.content:
772
+ obs = result.content[0].text
773
+ elif isinstance(result, list) and result:
774
+ obs = result[0].text if hasattr(result[0], 'text') else str(result[0])
775
+ else:
776
+ obs = str(result)
777
+ location, is_new_location = self._parse_location_from_observation(obs)
778
 
779
+ # obs without the first line
780
+ obs_without_first_line = "\n".join(obs.split("\n")[1:]).strip() if "\n" in obs else obs
781
+
782
+ return obs_without_first_line, location, is_new_location
783
+
784
+
785
+ def _update_score(self, text: str) -> None:
786
+ """Update score from game text."""
787
+ patterns = [
788
+ r'Score:\s*(\d+)',
789
+ r'score[:\s]+(\d+)',
790
+ r'\[Score:\s*(\d+)',
791
+ r'Total:\s*(\d+)',
792
+ ]
793
+
794
+ for pattern in patterns:
795
+ match = re.search(pattern, text, re.IGNORECASE)
796
+ if match:
797
+ self.score = max(self.score, int(match.group(1)))
798
+
799
+ def _is_game_over(self, text: str) -> bool:
800
+ """Check if the game is over."""
801
+ game_over_phrases = [
802
+ "game over",
803
+ "you have died",
804
+ "you are dead",
805
+ "*** you have died ***",
806
+ "*** you have won ***",
807
+ ]
808
+ text_lower = text.lower()
809
+ return any(phrase in text_lower for phrase in game_over_phrases)
810
+
811
+ def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
812
+ """Convenience wrapper for call_llm()."""
813
  return call_llm(prompt, system_prompt, seed)
814
 
815
 
 
821
  """Test the agent locally."""
822
  from fastmcp import Client
823
 
 
824
  server_path = "mcp_server.py"
825
 
826
  agent = StudentAgent()
 
841
 
842
  if __name__ == "__main__":
843
  import asyncio
844
+ asyncio.run(test_agent())
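
The loop-detection rule used in the ReAct loop above (keep the last 7 actions, flag three identical actions in a row, then pick the first direction not used in the last 5) can be sketched in isolation. This is only an illustrative sketch: `LoopGuard`, `record`, and `escape_action` are hypothetical names that do not exist in this commit, and the direction list is assumed to match the one in `_break_loop`.

```python
from collections import deque

# Assumed to mirror the direction order used by _break_loop in agent.py.
DIRECTIONS = ["north", "south", "east", "west", "up", "down",
              "northeast", "northwest", "southeast", "southwest"]


class LoopGuard:
    """Hypothetical helper: sliding window over the most recent actions."""

    def __init__(self, window: int = 7, repeat_limit: int = 3):
        # deque(maxlen=...) drops the oldest action automatically.
        self.recent = deque(maxlen=window)
        self.repeat_limit = repeat_limit

    def record(self, action: str) -> bool:
        """Store the action; return True if the last `repeat_limit` actions are identical."""
        self.recent.append(action)
        tail = list(self.recent)[-self.repeat_limit:]
        return len(tail) == self.repeat_limit and len(set(tail)) == 1

    def escape_action(self) -> str:
        """Return the first direction not used in the last 5 actions, else 'look'."""
        recent_set = set(list(self.recent)[-5:])
        for direction in DIRECTIONS:
            if direction not in recent_set:
                return direction
        return "look"


guard = LoopGuard()
looping = [guard.record("open door") for _ in range(3)]
if looping[-1]:
    print("loop detected, trying:", guard.escape_action())
```

Using a bounded `deque` removes the manual slicing (`self.recent_actions[-7:]`) while keeping the same window semantics; whether that is preferable in the agent itself is a design choice left to the author.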
mcp_server.py CHANGED
@@ -26,6 +26,7 @@ Then open the MCP Inspector in your browser to test the tools interactively.
26
 
27
  import sys
28
  import os
 
29
 
30
  # Add parent directory to path to import games module
31
  sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@@ -33,6 +34,9 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
33
  from fastmcp import FastMCP
34
  from games.zork_env import TextAdventureEnv
35
 
 
 
 
36
 
37
  # =============================================================================
38
  # Create the MCP Server
@@ -45,53 +49,213 @@ mcp = FastMCP("Student Text Adventure Server")
45
  # Game State Management
46
  # =============================================================================
47
 
 
 
 
 
 
 
 
 
 
48
  class GameManager:
49
  """
50
- Manages the text adventure game state.
51
-
52
- TODO: Extend this class to track:
53
- - Action history (for memory tool)
54
- - Explored locations (for mapping)
55
- - Current score and moves
56
  """
57
 
58
  def __init__(self):
59
  self.env: TextAdventureEnv = None
60
  self.state = None
61
  self.game_name: str = ""
62
- # TODO: Add more state tracking
63
- # self.history: list[tuple[str, str]] = []
64
- # self.explored_locations: dict[str, set[str]] = {}
65
- # self.current_location: str = ""
66
-
67
  def initialize(self, game: str = "zork1"):
68
  """Initialize or reset the game."""
69
  self.game_name = game
70
  self.env = TextAdventureEnv(game)
71
  self.state = self.env.reset()
72
- # TODO: Reset your state tracking here
 
 
 
 
 
 
 
 
73
  return self.state.observation
74
 
75
- def step(self, action: str) -> str:
 
 
 
 
 
76
  """Execute an action and return the result."""
77
  if self.env is None:
78
  self.initialize()
79
 
80
- self.state = self.env.step(action)
81
 
82
- # TODO: Update your state tracking here
83
- # self.history.append((action, self.state.observation))
84
- # Update location tracking, etc.
 
 
85
 
86
- return self.state.observation
87
 
88
- def get_score(self) -> int:
89
- """Get current score."""
90
- return self.state.score if self.state else 0
91
 
92
- def get_moves(self) -> int:
93
- """Get number of moves taken."""
94
- return self.state.moves if self.state else 0
95
 
96
 
97
  # Global game manager
@@ -102,14 +266,13 @@ def get_game() -> GameManager:
102
  """Get or initialize the game manager."""
103
  global _game
104
  if _game.env is None:
105
- # Get game from environment variable (set by evaluator)
106
- game = os.environ.get("GAME", "zork1")
107
  _game.initialize(game)
108
  return _game
109
 
110
 
111
  # =============================================================================
112
- # MCP Tools - IMPLEMENT THESE
113
  # =============================================================================
114
 
115
  @mcp.tool()
@@ -126,78 +289,97 @@ def play_action(action: str) -> str:
126
  The game's response to the action
127
 
128
  Valid commands include:
129
- - Movement: north, south, east, west, up, down, enter, exit
130
  - Objects: take <item>, drop <item>, open <thing>, examine <thing>
131
  - Other: look, inventory, read <thing>, turn on lamp
132
  """
133
  game = get_game()
134
 
135
- # TODO: You might want to add action validation here
136
- # TODO: You might want to include score changes in the response
137
 
138
- result = game.step(action)
 
139
 
140
- # Optional: Append score info
141
- # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
142
 
143
- return result
144
-
 
 
 
 
 
 
 
 
 
 
145
 
146
- # TODO: Implement additional tools to help your agent
147
 
148
- # @mcp.tool()
149
- # def memory() -> str:
150
- # """
151
- # Get the current game state summary.
152
- #
153
- # Returns:
154
- # A summary including current location, score, moves, and recent history
155
- # """
156
- # game = get_game()
157
- # # TODO: Return useful state information
158
- # pass
159
 
160
 
161
- # @mcp.tool()
162
- # def inventory() -> str:
163
- # """
164
- # Check what the player is carrying.
165
- #
166
- # Returns:
167
- # List of items in the player's inventory
168
- # """
169
- # game = get_game()
170
- # result = game.step("inventory")
171
- # return result
172
 
173
 
174
- # @mcp.tool()
175
- # def get_map() -> str:
176
- # """
177
- # Get a map of explored locations.
178
- #
179
- # Returns:
180
- # A text representation of explored locations and connections
181
- # """
182
- # game = get_game()
183
- # # TODO: Return map of explored locations
184
- # pass
185
 
186
 
187
  # @mcp.tool()
188
  # def get_valid_actions() -> str:
189
  # """
190
- # Get a list of likely valid actions from the current location.
191
- #
 
192
  # Returns:
193
- # List of actions that might work here
194
  # """
195
- # # This is a hint: Jericho provides get_valid_actions()
196
  # game = get_game()
197
- # if game.env and game.env.env:
198
- # valid = game.env.env.get_valid_actions()
199
- # return "Valid actions: " + ", ".join(valid[:20])
200
- # return "Could not determine valid actions"
 
201
 
202
 
203
  # =============================================================================
@@ -205,5 +387,4 @@ def play_action(action: str) -> str:
205
  # =============================================================================
206
 
207
  if __name__ == "__main__":
208
- # This runs the server with stdio transport (for MCP clients)
209
- mcp.run()
 
26
 
27
  import sys
28
  import os
29
+ import re
30
 
31
  # Add parent directory to path to import games module
32
  sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
34
  from fastmcp import FastMCP
35
  from games.zork_env import TextAdventureEnv
36
 
37
+ # Get game from environment variable (default: lostpig)
38
+ INITIAL_GAME = os.environ.get("GAME", "lostpig")
39
+
40
 
41
  # =============================================================================
42
  # Create the MCP Server
 
49
  # Game State Management
50
  # =============================================================================
51
 
52
+ class LocationLog:
53
+ """Tracks actions, outcomes, and promising leads for a single location."""
54
+ def __init__(self, name: str):
55
+ self.name = name # Location name (e.g., "Kitchen")
56
+ self.visit_count: int = 0 # How many times we've been here
57
+ self.actions_taken: list[tuple[str, str]] = [] # (action, short_outcome)
58
+ self.exits_known: list[str] = [] # List of known exits from this location (e.g., "north -> Kitchen")
59
+
60
+
61
  class GameManager:
62
  """
63
+ Manages the text adventure game state with rich location tracking.
 
 
 
 
 
64
  """
65
 
66
  def __init__(self):
67
  self.env: TextAdventureEnv = None
68
  self.state = None
69
  self.game_name: str = ""
70
+ self.history: list[tuple[str, str]] = [] # list of (action, result) for recent actions
71
+ self.explored_locations: dict[str, set[str]] = {} # location name -> set of exits (e.g., "north -> Kitchen")
72
+ self.location_logs: dict[str, LocationLog] = {} # location name -> log of actions and outcomes at that location
73
+ self.previous_player_location: str = "" # Jericho internal location object
74
+ self.current_player_location: str = "" # Jericho internal location object
75
+ self.global_action_count: int = 0
76
+ self.score_history: list[int] = []
77
+
78
+ def _get_jericho_location(self):
79
+ """Get the internal Jericho player location object for comparison."""
80
+ try:
81
+ res = self.env.env.get_player_location()
82
+ match = re.search(r"Obj\d+: (.*) Parent\d+", res.name)
83
+ if match:
84
+ return match.group(1)
85
+ # Fallback: return the full location string
86
+ return res.name
87
+ except Exception:
88
+ return None
89
+
90
+ def has_moved(self) -> bool:
91
+ """
92
+ Determine if we moved to a new location.
93
+ Compares current player location object to the previous one.
94
+ """
95
+ current_loc = self.current_player_location
96
+ previous_loc = self.previous_player_location
97
+
98
+ changed = not (current_loc == previous_loc)
99
+ return changed
100
+
101
  def initialize(self, game: str = "zork1"):
102
  """Initialize or reset the game."""
103
  self.game_name = game
104
  self.env = TextAdventureEnv(game)
105
  self.state = self.env.reset()
106
+ self.history = []
107
+ self.explored_locations = {}
108
+ self.location_logs = {}
109
+ self.previous_player_location = ""
110
+ self.current_player_location = self._get_jericho_location()
111
+ self.global_action_count = 0
112
+ self.score_history = [0]
113
+ self._ensure_location_log(self.current_player_location)
114
+ self.location_logs[self.current_player_location].visit_count += 1
115
  return self.state.observation
116
 
117
+ def _ensure_location_log(self, location: str):
118
+ """Ensure a LocationLog exists for the given location."""
119
+ if location not in self.location_logs:
120
+ self.location_logs[location] = LocationLog(location)
121
+
122
+ def take_action(self, action: str) -> tuple[str, bool]:
123
  """Execute an action and return the result."""
124
  if self.env is None:
125
  self.initialize()
126
 
127
+ self.previous_player_location = self._get_jericho_location() # Store previous location before taking action
128
+ self.state = self.env.step(action) # Execute the action in the game environment
129
+ self.current_player_location = self._get_jericho_location() # Store current location after taking action
130
+
131
+ result = self.state.observation # Get the observation/result of the action
132
+ self.global_action_count += 1
133
+ self.score_history.append(self.state.score)
134
+
135
+ # Track history
136
+ self.history.append((action, result))
137
+ if len(self.history) > 50:
138
+ self.history = self.history[-50:]
139
+
140
+ moved = self.has_moved()
141
+ is_new_place = False
142
+
143
+ # New place! Update explored locations map
144
+ if self.current_player_location not in self.explored_locations:
145
+ self.explored_locations[self.current_player_location] = set()
146
+ is_new_place = True
147
+ # Add exit from previous location to current location
148
+ if moved:
149
+ self.explored_locations.setdefault(self.previous_player_location, set()).add(f"{action} -> {self.current_player_location}") # initialize() does not pre-register the start location
150
 
151
+ # Update location log
152
+ self._ensure_location_log(self.current_player_location)
153
+ current_loc_log = self.location_logs[self.current_player_location]
154
+ if moved:
155
+ current_loc_log.visit_count += 1
156
 
157
+ # Log this action and a short outcome in the previous location's log
158
+ prev_loc_log = self.location_logs.get(self.previous_player_location)
159
+ if prev_loc_log is not None:
160
+ short_outcome = result[:120].replace('\n', ' ')
161
+ prev_loc_log.actions_taken.append((action, short_outcome))
162
+ # Keep log manageable
163
+ if len(prev_loc_log.actions_taken) > 30:
164
+ prev_loc_log.actions_taken = prev_loc_log.actions_taken[-30:]
165
+
166
+ return result, is_new_place
167
+
168
+ def get_memory(self) -> str:
169
+ """Get a summary of current game state."""
170
+ recent = self.history[-5:] if self.history else []
171
+ recent_str = "\n".join([f" > {a} -> {r[:60]}..." for a, r in recent]) if recent else " (none yet)"
172
+
173
+ # Add location-specific info
174
+ loc_log = self.location_logs.get(self.current_player_location)
175
+ loc_info = ""
176
+ if loc_log:
177
+ loc_info = f"\nThis location visited {loc_log.visit_count} time(s)."
178
+ if loc_log.actions_taken:
179
+ loc_info += f"\nActions tried at this location: {len(loc_log.actions_taken)}"
180
+ recent_here = loc_log.actions_taken[-5:]
181
+ loc_info += "\nRecent actions at this location:"
182
+ for act, out in recent_here:
183
+ loc_info += f"\n > {act} -> {out[:50]}..."
184
+ if getattr(loc_log, "promising_actions", None): # LocationLog.__init__ does not define this attribute
185
+ loc_info += f"\nPromising actions at this location: {', '.join(loc_log.promising_actions[:10])}"
186
+
187
+ return f"""[CURRENT LOCATION: {self.current_player_location}]
188
+ Location info:
189
+ {loc_info}
190
+
191
+ Recent Actions:
192
+ {recent_str}
193
+
194
+ Current Observation:
195
+ {self.state.observation}"""
196
 
197
+
198
+ def get_map(self) -> str:
199
+ """Get a map of explored locations."""
200
+ if not self.explored_locations:
201
+ return "Map: No locations explored yet. Try moving around!"
202
+ lines = [f"[CURRENT LOCATION: {self.current_player_location}]"]
203
+ lines.append("EXPLORED LOCATIONS AND EXITS:")
204
+ for loc, exits in sorted(self.explored_locations.items()):
205
+ visit_info = ""
206
+ if loc in self.location_logs:
207
+ visit_info = f" (visited {self.location_logs[loc].visit_count}x, {len(self.location_logs[loc].actions_taken)} actions tried)"
208
+ lines.append(f"\n* {loc}{visit_info}")
209
+ if exits:
210
+ for exit_info in sorted(exits):
211
+ lines.append(f" -> {exit_info}")
212
+ else:
213
+ lines.append(" -> No exits mapped yet")
214
+
215
+ # Add detailed log for current location
216
+ location = self.current_player_location
217
+ loc_log = self.location_logs.get(location)
218
+ if loc_log:
219
+ lines.append(f"\n- INFORMATION FOR CURRENT LOCATION: {location}")
220
+ lines.append(f" * Visited: {loc_log.visit_count} time(s)")
221
+ lines.append(f" * Actions tried here: {len(loc_log.actions_taken)}")
222
+
223
+ if loc_log.actions_taken:
224
+ lines.append(f" * Action history at this location {location}:")
225
+ for act, out in loc_log.actions_taken[-10:]:
226
+ lines.append(f" > {act} -> {out[:80]}")
227
+
228
+ return "\n".join(lines)
229
+
230
+ def get_inventory(self) -> str:
231
+ """Get current inventory."""
232
+ if self.env is None:
233
+ return "Game not initialized"
234
+ inv_state = self.env.step("inventory")
235
+ lines = [f"[CURRENT LOCATION: {self.current_player_location}]"]
236
+ lines.append(inv_state.observation)
237
+ return "\n".join(lines)
238
 
239
+ def get_location_log(self) -> str:
240
+ """Get detailed log for the current location."""
241
+ location = self.current_player_location
242
+ loc_log = self.location_logs.get(location)
243
+ if not loc_log:
244
+ return f"No log for location: {location}"
245
+
246
+ lines = [f"[CURRENT LOCATION: {location}]"]
247
+ lines.append(f"Visited: {loc_log.visit_count} time(s)")
248
+ lines.append(f"Actions tried: {len(loc_log.actions_taken)}")
249
+
250
+ if loc_log.actions_taken:
251
+ lines.append("\nAction history:")
252
+ for act, out in loc_log.actions_taken[-10:]:
253
+ lines.append(f" > {act} -> {out[:80]}")
254
+
255
+ if loc_log.exits_known:
256
+ lines.append(f"Known exits: {', '.join(loc_log.exits_known)}")
257
+
258
+ return "\n".join(lines)
259
 
260
 
261
  # Global game manager
 
266
  """Get or initialize the game manager."""
267
  global _game
268
  if _game.env is None:
269
+ game = os.environ.get("GAME", "lostpig")
 
270
  _game.initialize(game)
271
  return _game
272
 
273
 
274
  # =============================================================================
275
+ # MCP Tools
276
  # =============================================================================
277
 
278
  @mcp.tool()
 
289
  The game's response to the action
290
 
291
  Valid commands include:
292
+ - Movement: north, south, east, west, northeast, northwest, southeast, southwest, up, down, enter, exit
293
  - Objects: take <item>, drop <item>, open <thing>, examine <thing>
294
  - Other: look, inventory, read <thing>, turn on lamp
295
  """
296
  game = get_game()
297
 
298
+ result, is_new_place = game.take_action(action)
 
299
 
300
+ # Add score info
301
+ score_info = f"\n\n[Score: {game.state.score} | Moves: {game.state.moves}]"
302
 
303
+ if game.state.reward > 0:
304
+ score_info = f"\n\n+{game.state.reward} points! (Total: {game.state.score})"
305
 
306
+ # Indicate if we moved to a new location
307
+ location_info = ""
308
+ if is_new_place:
309
+ location_info = f"[NEW LOCATION: {game.current_player_location}]\n"
310
+ else:
311
+ location_info = f"[CURRENT LOCATION: {game.current_player_location}]\n"
312
+
313
+ done_info = ""
314
+ if game.state.done:
315
+ done_info = "\n\nGAME OVER"
316
+
317
+ return location_info + result + score_info + done_info
318
 
 
319
 
320
+ @mcp.tool()
321
+ def memory() -> str:
322
+ """
323
+ Get the current game state summary.
324
+
325
+ Returns:
326
+ A summary including current location (number of visits, actions tried, promising actions),
327
+ recent actions and current observation
328
+ """
329
+ return get_game().get_memory()
 
330
 
331
 
332
+ @mcp.tool()
333
+ def inventory() -> str:
334
+ """
335
+ Check what the player is carrying.
336
+
337
+ Returns:
338
+ List of items in the player's inventory
339
+ """
340
+ return get_game().get_inventory()
 
 
341
 
342
 
343
+ @mcp.tool()
344
+ def get_map() -> str:
345
+ """
346
+ Get a map of explored locations, connections and exits.
347
+ Useful for navigation and avoiding getting lost.
348
+
349
+ Returns:
350
+ A text representation of explored locations and connections
351
+ """
352
+ return get_game().get_map()
 
353
 
354
 
355
  # @mcp.tool()
356
  # def get_valid_actions() -> str:
357
  # """
358
+ # Get a list of valid actions from the current game state using the game engine.
359
+ # Useful when entering a new location to understand what's possible.
360
+
361
  # Returns:
362
+ # A list of valid actions that the game engine considers possible
363
  # """
 
364
  # game = get_game()
365
+ # valid = game.get_valid_actions_list()
366
+ # if valid:
367
+ # return "Valid actions: " + ", ".join(valid)
368
+ # return "Could not determine valid actions. Try: look, inventory, examine, north, south, east, west, up, down, take, drop, open, close, read"
369
+
370
+
371
+ @mcp.tool()
372
+ def location_log() -> str:
373
+ """
374
+ Show the actions tried at the current location and their outcomes, along with the visit count and known exits.
375
+
376
+
377
+ Returns:
378
+ A log of the current location: visit count, the 10 most recent actions with their outcomes (truncated to 80 characters), and known exits.
379
+ """
380
+ game = get_game()
381
+ return game.get_location_log()
382
+
383
 
384
 
385
  # =============================================================================
 
387
  # =============================================================================
388
 
389
  if __name__ == "__main__":
390
+ mcp.run()
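
For reference, the per-location log that `get_location_log` formats can be sketched on its own, outside the server. This is a minimal sketch under assumptions: the `LocationLog` dataclass and the `format_location_log` helper are hypothetical names, since the diff only shows the fields `visit_count`, `actions_taken`, and `exits_known` being read, not how the record is defined.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LocationLog:
    """Hypothetical per-location record; field names mirror those read in the diff."""
    visit_count: int = 0
    # Each entry is an (action, outcome) pair, oldest first.
    actions_taken: List[Tuple[str, str]] = field(default_factory=list)
    exits_known: List[str] = field(default_factory=list)


def format_location_log(
    location: str,
    log: Optional[LocationLog],
    max_actions: int = 10,
    max_chars: int = 80,
) -> str:
    """Render a location log the way get_location_log does in the diff."""
    if log is None:
        return f"No log for location: {location}"

    lines = [
        f"[CURRENT LOCATION: {location}]",
        f"Visited: {log.visit_count} time(s)",
        f"Actions tried: {len(log.actions_taken)}",
    ]
    if log.actions_taken:
        lines.append("\nAction history:")
        # Keep only the most recent actions and truncate long outcomes
        # so the tool response stays short.
        for act, out in log.actions_taken[-max_actions:]:
            lines.append(f"  > {act} -> {out[:max_chars]}")
    if log.exits_known:
        lines.append(f"Known exits: {', '.join(log.exits_known)}")
    return "\n".join(lines)
```

Capping the history at the last 10 actions and 80 characters per outcome keeps the tool response compact, at the cost of hiding older attempts at a location the agent revisits often.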