cleaned and added README
README.md
CHANGED

@@ -16,13 +16,20 @@ license: mit

This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

-## Approach
-
-- What strategy does your agent use?
-- What tools did you implement in your MCP server?
-- Any interesting techniques or optimizations?
+## Approach (My Report)
+
+The idea behind my approach is that the game has an internal state: within a given state, only a limited set of actions has any effect, and the same action always produces the same output, so there is no point in repeating an action within the same state. However, once the state changes, a previously tried action may become relevant again.
+
+For me, the state is defined first by the location. Within a location, the state can still change (in the lostpig game, for example, when we hear something and then `listen` again, `northeast` becomes a valid direction). As a naive way of detecting this, I say that if the output contains a "!" and we have never seen this output before, then it is a new state.
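This novelty check can be sketched in a few lines (the names `seen_outputs` and `is_new_state` are my own, not from the repository):

```python
# Hypothetical sketch of the "new state" heuristic described above: an
# output marks a new state if it contains "!" and was never seen before.
seen_outputs: set[str] = set()

def is_new_state(observation: str) -> bool:
    """Return True when this output signals a state change."""
    novel = "!" in observation and observation not in seen_outputs
    seen_outputs.add(observation)
    return novel

print(is_new_state("You hear a noise to the northeast!"))  # first time: True
print(is_new_state("You hear a noise to the northeast!"))  # repeated: False
```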
+
+At each step, I tell the agent which actions it has already tried within the current state and instruct it not to repeat them. If it does repeat, I loop (at most 5 times) until it outputs something new. Once many actions have been tried within the same state (more than 5), I force an action at random: either an idle action such as `wait` or `listen`, or a move to a different place.
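The retry-then-force logic can be sketched as a toy version under my own names; `propose` stands in for the LLM call:

```python
import random

MAX_RETRIES = 5          # re-prompt limit when the agent repeats itself
MAX_TRIED_PER_STATE = 5  # beyond this, force a random action

idle_actions = ["wait", "listen"]  # assumption: mirrors the agent's idle list

def choose_action(propose, tried: list[str], exits: list[str]) -> str:
    """Ask propose() for an action, rejecting repeats; once the state looks
    exhausted, fall back to a random idle action or a move."""
    if len(tried) > MAX_TRIED_PER_STATE:
        return random.choice(idle_actions + exits)
    for _ in range(MAX_RETRIES):
        action = propose()
        if action not in tried:
            return action
    return random.choice(idle_actions + exits)

print(choose_action(lambda: "open door", [], ["north"]))  # open door
```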
+
+Additionally, I keep an internal map: each time the agent performs an action that moves it, I update the map accordingly. At each step I tell the agent about its neighbors (the location's name, `unknown`, or `inaccessible` if we have already tried and failed). When a new state is reached, the inaccessible places become `unknown` again.
+
+Finally, I treat the output of `look` as the default state: I store it, and if another action produces a similar output (Levenshtein ratio above 0.8), I consider it something already seen, so even an output containing a "!" will not matter (as in the fountain room of the lostpig game).
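The similarity gate might look like the following; the repository uses a `levenshtein(..., ratio=True)` helper, and I substitute the standard library's `difflib.SequenceMatcher` here, which computes a comparable (but not identical) similarity ratio:

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # from the report above

def looks_like_default_state(look_output: str, observation: str) -> bool:
    """Treat an observation as already seen when it is close enough to the
    stored `look` output, even if it contains a "!"."""
    return SequenceMatcher(None, look_output, observation).ratio() > SIMILARITY_THRESHOLD

look = "The fountain splashes quietly in the stone room."
print(looks_like_default_state(look, look))                        # True
print(looks_like_default_state(look, "A troll blocks the exit!"))  # False
```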
+
+In the MCP server, I only changed a few things. First, I used Jericho's `get_dictionary` function to get the list of words the game accepts; if the server receives an action that is not valid, it defaults to `look`. I added a `last_observation` variable, which is what I use to represent the state (it keeps only the output of the game environment, not the trailing score). Finally, for the location, I used Jericho's `get_player_location` function as a foolproof way of determining where we are (before finding that function, I used a parser function in the agent file, which is still there).
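The validation step can be sketched as below; Jericho's real `get_dictionary` returns vocabulary objects from the game file, so the plain word set here is a simplification, and the per-word check is my own reading of the report's "default to `look`" rule:

```python
def sanitize_action(action: str, vocabulary: set[str]) -> str:
    """Fall back to `look` when any word of the action is outside the
    game's accepted dictionary."""
    words = action.lower().split()
    if words and all(word in vocabulary for word in words):
        return action
    return "look"

vocab = {"take", "lamp", "go", "north"}           # toy dictionary
print(sanitize_action("take lamp", vocab))        # take lamp
print(sanitize_action("frobnicate lamp", vocab))  # look
```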

## Files

agent.py
CHANGED

@@ -215,7 +215,6 @@ class StudentAgent:
        self.history: list[dict] = []
        self.score: int = 0
        self.history_state_tried_action = {}
-       self.useless_actions = {}
        self.location_state = {}  # to each location, we have a set of every observation made here

        self.idle_actions = ["listen", "wait", "diagnose", "yell", "pray", "launch", "take all"]  # Actions that don't change location
@@ -293,7 +292,6 @@ class StudentAgent:
        else:
            observation += " Be thourough, examine everything around you and try to find all treasures and points of interest! Also remember your objective"
        locations_visited.add(current_location)
-       self.useless_actions[current_location] = []

        prompt = self._build_prompt(observation)
        prompt += self._look_for_neighboring_locations(prompt)
@@ -357,13 +355,6 @@ class StudentAgent:
            # Look if we got the same observation as for a "look"
            current_obs = await client.call_tool("last_observation", {})  # observation also has the score
            current_obs = self._extract_result(current_obs)
-           if tool_args.get("action", "").lower() == "look":
-               look_observation = current_obs.lower()
-               if "nothing" in look_observation or "can't see" in look_observation or "dark" in look_observation:
-                   self.useless_actions[current_location].append("look")
-               elif levenshtein(look_observation, current_obs, ratio=True) > 0.8:
-                   self.useless_actions[current_location].append(f"{tool_name}({tool_args})")
-                   not_new_state = True
            tried_action_in_same_state.append((tool_name, tool_args))

            if verbose:
@@ -373,6 +364,11 @@ class StudentAgent:
            if verbose:
                print(f"[ERROR] {e}")

+           if tool_args.get("action", "").lower() == "look":
+               look_observation = current_obs.lower()
+           elif levenshtein(look_observation, current_obs, ratio=True) > 0.8:
+               not_new_state = True
+
            # Track location
            location = await client.call_tool("current_location", {})
            location = self._extract_result(location)
|