VZ22 committed
Commit 748ada7 · 1 Parent(s): adc4f14

cleaned and added README

Files changed (2)
  1. README.md +12 -5
  2. agent.py +5 -9
README.md CHANGED
@@ -16,13 +16,20 @@ license: mit
 
 This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.
 
-## Approach
-
-<!-- Describe your approach here -->
-
-- What strategy does your agent use?
-- What tools did you implement in your MCP server?
-- Any interesting techniques or optimizations?
+## Approach (My Report)
+The idea behind my approach is that the game has an internal state: within a given state, only a subset of actions has any effect, and the same action always produces the same output, so there is no point repeating an action within the same state. However, once the state changes, a previously tried action may become relevant again.
+
+For me, the state is first defined by the location. Within a location, the state can still change (in the Lost Pig game, for example, when we hear something and then `listen` again, `northeast` becomes a valid direction). As a naive heuristic, I treat an output as a new state if it contains a "!" and we have never seen that output before.
+
+At each step, I tell the agent which actions it has already tried within the current state and instruct it not to repeat them. If it repeats anyway, I loop (at most 5 times) until it outputs something new. If more than 5 actions have already been tried within the same state, I instead force a random action (either an idle action such as `wait` or `listen`, or a move to a different place).
+
+Additionally, I maintain an internal map: each time the agent performs an action that moves it, I update the map accordingly. At each step, I tell the agent about its neighbors (the location's name, unknown, or inaccessible if we already tried and failed to go there). When a new state is detected, the inaccessible places are reset to unknown.
+
+
+Finally, I treat the output of `look` as the default state description and store it. If another action produces something similar (Levenshtein ratio above 0.8), I consider it already seen, so even a "!" in the output does not matter (as in the fountain room of the Lost Pig game).
+
+In the MCP server, I only changed a few things. First, I used Jericho's `get_dictionary` function to get the list of words the game accepts; if the server receives an action that is not valid, it falls back to `look`. I added a `last_observation` variable, which is what I use to represent the state (it keeps only the game environment's output, not the trailing score). Finally, for the location, I used Jericho's `get_player_location` function as a foolproof way of determining where we are (before finding that function, I used a parser function in the agent file, which is still there).
+
 
 ## Files
 
agent.py CHANGED
@@ -215,7 +215,6 @@ class StudentAgent:
         self.history: list[dict] = []
         self.score: int = 0
         self.history_state_tried_action = {}
-        self.useless_actions = {}
         self.location_state = {}  # to each location, we have a set of every observation made here
 
         self.idle_actions = ["listen", "wait", "diagnose", "yell", "pray", "launch", "take all"]  # Actions that don't change location
@@ -293,7 +292,6 @@ class StudentAgent:
         else:
             observation += " Be thorough, examine everything around you and try to find all treasures and points of interest! Also remember your objective"
         locations_visited.add(current_location)
-        self.useless_actions[current_location] = []
 
         prompt = self._build_prompt(observation)
         prompt += self._look_for_neighboring_locations(prompt)
@@ -357,13 +355,6 @@ class StudentAgent:
             # Look if we got the same observation as for a "look"
             current_obs = await client.call_tool("last_observation", {})  # observation also has the score
             current_obs = self._extract_result(current_obs)
-            if tool_args.get("action", "").lower() == "look":
-                look_observation = current_obs.lower()
-                if "nothing" in look_observation or "can't see" in look_observation or "dark" in look_observation:
-                    self.useless_actions[current_location].append("look")
-                elif levenshtein(look_observation, current_obs, ratio=True) > 0.8:
-                    self.useless_actions[current_location].append(f"{tool_name}({tool_args})")
-                    not_new_state = True
             tried_action_in_same_state.append((tool_name, tool_args))
 
             if verbose:
@@ -373,6 +364,11 @@ class StudentAgent:
             if verbose:
                 print(f"[ERROR] {e}")
 
+            if tool_args.get("action", "").lower() == "look":
+                look_observation = current_obs.lower()
+            elif levenshtein(look_observation, current_obs, ratio=True) > 0.8:
+                not_new_state = True
+
             # Track location
             location = await client.call_tool("current_location", {})
             location = self._extract_result(location)
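The server-side vocabulary fallback described in the README ("if it receives an action that is not valid, then I default it to `look`") can be sketched roughly as follows. This is a hypothetical sketch: `sanitize_action` and the plain word set are illustrations, whereas the actual server builds its vocabulary from Jericho's `get_dictionary` (whose entries are truncated parser words, a detail this sketch ignores).

```python
def sanitize_action(action: str, vocabulary: set[str]) -> str:
    """Fall back to `look` when the proposed action uses a word the game's
    parser does not know. `vocabulary` stands in for the word list obtained
    from Jericho's get_dictionary (word truncation ignored for simplicity)."""
    words = action.lower().split()
    if not words or any(word not in vocabulary for word in words):
        return "look"
    return action


# Usage sketch with an invented vocabulary
vocab = {"go", "north", "take", "lamp", "look", "listen", "wait"}
print(sanitize_action("take lamp", vocab))      # take lamp
print(sanitize_action("cast fireball", vocab))  # look
```

Defaulting to `look` rather than rejecting the turn keeps the game loop moving and, conveniently, refreshes the stored default-state description at the same time.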