---
title: Text Adventure Agent Submission
emoji: 🗺
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---
# Text Adventure Agent Submission

## Walkthrough
Before starting this project, I didn't know how to play text adventures. After finishing it, I can confidently say I still don't know how to play them. This is the story of my journey:
I had to build an agent that used the ReAct MCP methodology to play. Since I had no idea what that entailed, I decided to do some research. Following several suggestions from my professors, I started with Zork1 and Lost Pig, which were supposedly easy (maybe not so much).
Both stories have their differences: Lost Pig has few locations; however, you must interact with your environment, collect objects, and talk to NPCs (the gnome). Meanwhile, in Zork1, you have many locations, some of which can initially lead to dead ends (like going into the forest at the beginning or going to the canyon). If you make the wrong decisions, you can die. Right here is where I found the first baseline improvement: while in many games "game over" or "you died" means the game is done, Zork1 gives you a second chance (so I had to modify the code to allow for this).
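A minimal sketch of that first baseline fix, assuming a Jericho-style `env` exposing `game_over()` and `victory()` (the function name and the list of death phrases are my own assumptions, not the actual code):

```python
# Hedged sketch: relax the "game over" check for Zork 1, where dying is
# not always terminal (the game can resurrect the player).
DEATH_PHRASES = ("you have died", "you died")

def is_run_finished(env, observation: str) -> bool:
    """Stop only on a true game-over, not on a mere death message."""
    text = observation.lower()
    if any(phrase in text for phrase in DEATH_PHRASES):
        # Trust the engine's flag instead of the text, so a revivable
        # death does not end the episode.
        return env.game_over()
    return env.game_over() or env.victory()
```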
With this in mind, I had to think about what tools I had at my disposal. Since I hadn't realized the model used in the baseline (Qwen/Qwen2.5-72B-Instruct) was hardcoded in agent.py instead of using the one in my .env file (HF_MODEL=meta-llama/Llama-3.2-3B-Instruct), I thought I would have to get most of the points using a not very intelligent 3B model. Therefore, most of the following modifications take that constraint into account.
Our agent has the following tools available to use in the MCP server:
- `play_action`: The most basic action. It simply executes the requested command and returns the result, the new score, and the number of moves made.
- `current_location`: With this action, the agent can see its current location (although we'll see later that it's not strictly necessary to use it).
- `memory`: Returns a summary of the current game state. The LLM, even after I tried to force it, refused to use this.
- `get_map`: Returns a map with connections and directions, as well as the current position. Example in Zork1:

  ```
  [RESULT] === Explored Locations Map ===
  West House Exits: north -> North House
  North House Exits: north -> Forest Path, south -> West House
  Forest Path Exits: north -> Clearing, west -> Forest, south -> North House, east -> Forest
  📍 Clearing Exits: west -> Behind House, north -> Forest, south -> Forest
  Behind House Exits: east -> Clearing, enter window -> Kitchen
  Kitchen Exits: west -> Living, return via enter window -> Behind House
  Living Exits: down -> Cellar, east -> Kitchen
  Cellar Exits: down -> Forest, up -> Living
  ```

- `inventory`: Returns the objects the agent is carrying and a brief description (depending on the game). Example in Zork1:

  ```
  [RESULT] You are carrying:
  A sword
  A brown sack
  The brown sack contains:
  A lunch
  A clove of garlic
  A leaflet
  ```

- `get_valid_actions`: Returns the possible actions available at the current moment (a simple call to Jericho).
- `add_knowledge`: One of the most useful additions. This tool allows the model to add any knowledge it deems important to a concise knowledge base that is always provided in the prompt. Because the model didn't tend to add many things, and sometimes the info it added could be needed at the most inopportune moment, I decided to append it via the prompt instead of relying on the `memory` tool.
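The write-up doesn't show how `get_map` is built internally; a plausible sketch, assuming the agent records a `(origin, direction, destination)` triple after every successful move (the class and method names here are hypothetical):

```python
# Hedged sketch of a get_map-style tool: track explored exits per location
# and render them in the format shown in the example above.
from collections import defaultdict

class ExplorationMap:
    def __init__(self):
        self.exits = defaultdict(dict)  # location -> {direction: destination}
        self.current = None

    def record_move(self, origin, direction, destination):
        self.exits[origin][direction] = destination
        self.exits[destination]  # ensure the destination appears even if unexplored
        self.current = destination

    def render(self):
        lines = ["=== Explored Locations Map ==="]
        for loc, exits in self.exits.items():
            marker = "📍 " if loc == self.current else ""
            pairs = ", ".join(f"{d} -> {dest}" for d, dest in exits.items())
            lines.append(f"{marker}{loc} Exits: {pairs or '(unexplored)'}")
        return "\n".join(lines)
```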
## Prompt Construction
The first thing sent to the model is the current location. Because it's a short string, it doesn't consume many tokens, so it's fine to send it in every API call.
Once this is done, the next step is figuring out how to pass the context of previous actions to the model. My first approach was to show it the last N actions, and the older they were, the less text I included (keeping only the first $k_i$ characters). However, this had several problems: any action taken before that sliding window would be forgotten, and depending on N, the token count per call could be quite large (which I tried to fix by trimming the text). Furthermore, the actions you've taken in a specific location are more important to remember.
Passing all actions was suboptimal regarding token consumption, whereas an LLM-generated summary was far cheaper but lossier. Thus, the final solution was a mix of the three approaches:
- First, I pass the last 5 actions that are not in the current location (which covers movements between places).
- Next come the actions done in the current location. To avoid passing all actions, assuming I've done `d` actions in this location, every time `d % 5 == 0` I prompt an LLM to create a summary using the summary from iteration `d - 5` plus the 5 new actions.
- When delivering this information, I provide the summary. If `d` is not a multiple of 5, I append the last `d % 5` actions directly. (See the prompt example at the end.)
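The rolling-summary scheme above can be sketched as follows; `summarize` stands in for the real LLM call (a hypothetical signature), and the function is assumed to be called once per step:

```python
# Hedged sketch of the per-location context scheme: fold every 5 actions
# into a running summary, and pass only the d % 5 newest actions raw.
def build_location_context(actions, summary, summarize):
    """Return (new_summary, raw_tail) for the current location.

    `actions` is the full action log for this location; `summarize` is an
    LLM call taking (previous_summary, new_actions) and returning a string.
    """
    d = len(actions)
    if d and d % 5 == 0:
        # Fold the 5 newest actions into the running summary.
        summary = summarize(summary, actions[-5:])
        tail = []
    else:
        # Summary stays as it was at the last multiple of 5; the
        # remainder is passed verbatim.
        tail = actions[-(d % 5):] if d % 5 else []
    return summary, tail
```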
Next in the prompt comes the knowledge base generated with `add_knowledge`, which contains details the model considers will be useful later.
After several tests, this setup consistently achieved 2 points in Lost Pig and 35 in Zork1 across 10 iterations, and the agent managed to move freely around the map...
"Not all those who wander are lost." - J.R.R. Tolkien
But our model was undoubtedly lost. It walked around the map without any specific goal, and its actions lacked consistency. It was necessary for the agent to be goal-oriented. Therefore, it was allowed to add `[GOAL]<text>` to its response, enabling it to change its general objective. This is then injected into every prompt so it knows how to direct its actions in the long term.
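A minimal sketch of how such a goal update could be parsed out of the model's reply; the regex and the single-line assumption are mine, not the actual implementation:

```python
# Hedged sketch: extract a [GOAL]<text> update from the model's response,
# keeping the previous goal when no tag is present.
import re

GOAL_RE = re.compile(r"\[GOAL\]\s*(.+)", re.IGNORECASE)

def update_goal(response: str, current_goal: str) -> str:
    match = GOAL_RE.search(response)
    return match.group(1).strip() if match else current_goal
```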
Prompt Example:

```
[CURRENT LOCATION]: TabRoom
[RECENT ACTIONS]:
- Moved from CaWith Stream to FountaRoom
- Moved from FountaRoom to Hole
- north (Result: Tunnel and stairs only place that Grunk can go her...)
- east (Result: Fountain Room All wall in this room glow. It bright, just like day time. Except that instead of sun,...)
- Moved from Hole to FountaRoom
- Moved from FountaRoom to TabRoom
- Moved from TabRoom to FountaRoom
- Moved from FountaRoom to TabRoom
[RESUME OF ALL PREVIOUS ACTIONS DONE AT THIS LOCATION]:
Grunk explored the southwest area of the Fountain Room but found only doorways to the northeast and east. Moving northeast, Grunk re-entered the Fountain Room, a brightly lit underground chamber with a dry fountain in the center and a large curtain on the south wall. The pig was present but ran to the southwest upon Grunk's arrival. Grunk examined and pushed the curtain, but found nothing of interest. Attempts to go southwest were met with a warning to try something different.
[GOAL]: Investigate the curtain in the Fountain Room (If you want to change your general goal add [GOAL] to your answer)
[HINTS]: You have been in the same location a while, if you feel stagnated move around or use the map
What do you do next?
```
## Steering the Agent
The model usually refused to use tools like `add_knowledge` or `get_map`, so sometimes it was necessary to nudge it. I provide `[HINTS]` in the context: if the agent has been in the same location for more than 20 turns, I suggest using the map to get unstuck. If it receives a result it hasn't encountered before in that location, I suggest using `add_knowledge` in case it discovered something interesting.
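The hint logic above is simple enough to sketch; the thresholds come from the description, while the function shape and wording are my own assumptions:

```python
# Hedged sketch of the [HINTS] nudges: one hint when the agent is stuck in
# a location, one when it just saw a result that is new for that location.
def build_hints(turns_in_location: int, result: str, seen_results: set) -> list:
    hints = []
    if turns_in_location > 20:
        hints.append("You have been in the same location a while; "
                     "if you feel stagnated move around or use the map")
    if result not in seen_results:
        hints.append("This result is new for this location; consider "
                     "saving anything important with add_knowledge")
    return hints
```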
## Repetition Prevention
To prevent the model from entering infinite loops, if it executes the exact same action with the same arguments within the last K actions, the system throws a warning. However, the model might actually need to perform that action again (perhaps the game context changed). Because of this, the warning tells the model that if it genuinely needs to execute the command, it should issue it a second time, and that subsequent attempt will go through.
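A minimal sketch of this guard, assuming a rolling window of the last `K` commands (the class name and window size are hypothetical):

```python
# Hedged sketch of the repetition guard: a command repeated within the last
# K actions is intercepted once with a warning; issuing the exact same
# command a second time lets it through.
from collections import deque

class RepetitionGuard:
    def __init__(self, k: int = 10):
        self.recent = deque(maxlen=k)
        self.pending = None  # command the model was just warned about

    def check(self, command: str):
        """Return None to allow the command, or a warning string."""
        if command == self.pending:
            self.pending = None  # second attempt: let it through
            self.recent.append(command)
            return None
        if command in self.recent:
            self.pending = command
            return ("You already ran this recently. If you really need to "
                    "repeat it, issue the exact same command again.")
        self.recent.append(command)
        return None
```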
## Other Improvements and Conclusions
While the features detailed above are the core of the agent, I also made several adjustments to the prompts to inject expert knowledge about text adventure games. This included adding lists of common commands and general strategies I found online for playing these parser-based games. I also had to include negative examples; for instance, the model initially had a strong tendency to type commands like "look for objects," which simply don't work in these engines, so I explicitly prompted it to avoid that phrasing.
Throughout the project, I noticed a few interesting trade-offs in the agent's behavior:
- Context vs. Exploration: Giving the agent less context history made it much more exploratory. However, the obvious drawback was that it would forget the previous clues or items needed to actually solve the puzzles.
- The Goal-Setting Dilemma: Forcing the agent to set a specific goal could sometimes cause it to get stuck in a loop trying to achieve it, or lead it to create completely useless goals. On average, though, it improved overall performance because the agent would eventually realize it was stuck and update its goal to something else.
- Inference-Time Compute for Planning: Having an LLM summarize the previous actions had an unexpected benefit: the summarizer would often naturally include its own hypotheses or opinions about what should be done next. When paired with the explicit `[GOAL]` and `[THOUGHT]` tags, this setup effectively functioned as inference-time compute. By forcing the model to generate reasoning tokens before outputting an action, it gave the agent the "thinking space" necessary to actually plan its next moves.