---
title: Text Adventure Agent Submission
emoji: "\U0001F5FA"
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.12.0"
app_file: app.py
pinned: false
license: mit
---

# Text Adventure Agent Submission

## Overview

This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

## Approach

- What strategy does your agent use?
- What tools did you implement in your MCP server?
- Any interesting techniques or optimizations?

1. First implementation: `agent.py`

MINIMAP: The first thing I noticed was that I could not rely on the agent to call MCP tools like `memory()` and `get_map()` and then make good use of their results. This was especially true for smaller models (roughly 3B-10B parameters), which struggle with information overload. To solve this, I always give the agent a minimap of the connections around the current location. Since the only information needed from the game environment is the observation, all the logic for updating the minimap lives in the `agent.py` script.

To do this, I designed a regex-based method that extracts the location name from any observation resulting from a movement command. This method, `_extract_location()`, checks whether the observation begins with a line of fewer than 35 characters with no punctuation (the format used for all locations in lostpig and zork1). If so, the movement is considered successful and the path between the old and the new location is registered. Otherwise, that direction is marked as "Blocked".

An example of the minimap given to the agent:

    """
    KNOWN CONNECTIONS FROM Fountain Room:
    > w -> Hole
    > n -> Statue Room
    > d -> Blocked
    """

ANTI-LOOP: I also removed the summary of the 5 previous observations, as they were truncated and I felt they were only confusing the model.
This let me provide more structured content to the model without causing information overload. I only provide the last 5 actions, to avoid immediate repetitions. However, I also add a list of several of the actions the agent has previously tried in the current location, to encourage it to try something new.

CHAIN OF THOUGHT: Managing memory continuity is very hard while keeping the amount of information in the prompt reasonably low. I figured the best way to enforce consistency was to feed the agent its previous thought, so that it does not completely forget the task at hand. An example of what can happen without this: the agent enters the cave, sees a bench and a reddish thing, examines the bench first, and then completely forgets about the reddish thing. My solution: ask the agent to plan several steps ahead, using very short sentences. In this example, the thought can be: "I see a stone bench and a small reddish thing across the stream. I will examine the bench first and then the reddish thing." Given this thought along with the previous action "examine bench", the agent is encouraged to follow the plan and examine the reddish thing.

INVENTORY MANAGEMENT: As with `memory()` and `get_map()`, the agent does not seem to open its inventory spontaneously, or does so only at seemingly random times. As a workaround, I display the content of the inventory after each 'examine' action, so that if the agent holds an item that can be combined with the examined object, it has all the information it needs to solve the puzzle. An example of where this is useful: in lostpig, in the Table Room, when the agent examines the metal box, the next prompt contains information about both the slot in the box and the coin in the player's inventory, which enables the agent to insert the coin into the box.

2. Second implementation: `agent_multicall.py`

ACTIONS SUMMARY: I tried a variant of this implementation that makes a small additional API call to the model to extract and summarize the result of each new observation, and to distinguish previous successful actions from failed ones. This leads to much better long-term memory, but can quickly overload the model with details. Example of this feature in lostpig:

    """
    YOU ALREADY TRIED:
    > examine fountain -> Fountain is old and ornate but not functioning, no water present.
    > look into fountain -> Grunk finds and keeps a coin in the fountain, gaining 1 point.
    > examine curtain -> Curtain depicts a gnome with a torch pointing towards an exit.
    > look behind curtain -> Wall behind curtain glows.
    > examine wall -> Walls glow in the dark, but the room is not dark.
    > examine pig -> Pig is pink, chubby, quick, and smart.
    FAILED: push pig, take pig
    """

ITEMS EXTRACTION: In this approach, I also extract the list of all important items when the agent enters a new location, to encourage it to examine everything. It looks like this:

    """
    IMPORTANT ITEMS IN Hole: torch, stairs, crack
    """

I also added a warning that tells the agent when it has spent more than 10 turns in the same location and that it is time to move on. This proves useful in the Statue Room, where the agent would otherwise get stuck indefinitely.

Since I did not see any noticeable performance improvement between the two versions, I kept the cheaper one, which is the first one I presented, using only one LLM call per step.

## Files

| File | Description |
|------|-------------|
| `agent.py` | ReAct agent with `StudentAgent` class |
| `mcp_server.py` | MCP server with game interaction tools |
| `app.py` | Gradio interface for HF Space |
| `requirements.txt` | Additional dependencies |

## How to Submit

1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
2. Clone your fork locally
3. Implement your agent in `agent.py` and `mcp_server.py`
4. Test locally (see below)
5. Push your changes to your Space
6. Submit your Space URL on the course platform

## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```
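
## Appendix: Minimap Sketch

The MINIMAP logic described in the Approach section can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from `agent.py`: `extract_location()` only mirrors the described behavior of `_extract_location()` (first line shorter than 35 characters, no punctuation), while the `Minimap` class, its `record_move()` and `render()` methods, and the sample observations are hypothetical names and strings made up for this example.

```python
import re
from typing import Dict, Optional


def extract_location(observation: str) -> Optional[str]:
    """Return the first line if it looks like a room title, else None.

    Heuristic taken from the description above: fewer than 35 characters
    and no punctuation. The exact regex is an assumption.
    """
    lines = observation.strip().splitlines()
    if not lines:
        return None
    first = lines[0].strip()
    if len(first) < 35 and re.fullmatch(r"[A-Za-z0-9 ]+", first):
        return first
    return None


class Minimap:
    """Hypothetical container for the connections discovered so far."""

    def __init__(self) -> None:
        # location -> {direction -> destination name or "Blocked"}
        self.edges: Dict[str, Dict[str, str]] = {}

    def record_move(self, origin: str, direction: str, observation: str) -> str:
        """Register the outcome of a movement command and return the new location."""
        destination = extract_location(observation)
        self.edges.setdefault(origin, {})[direction] = destination or "Blocked"
        # If no location title was found, assume the player did not move.
        return destination or origin

    def render(self, location: str) -> str:
        """Format the known exits of a location for inclusion in the prompt."""
        lines = [f"KNOWN CONNECTIONS FROM {location}:"]
        for direction, destination in self.edges.get(location, {}).items():
            lines.append(f"> {direction} -> {destination}")
        return "\n".join(lines)


# Usage with made-up observations (not actual game output).
minimap = Minimap()
minimap.record_move("Fountain Room", "w", "Hole\nA made-up room description.")
minimap.record_move("Fountain Room", "n", "Statue Room\nAnother made-up description.")
minimap.record_move("Fountain Room", "d", "You can't go that way.")
print(minimap.render("Fountain Room"))
```

Only the forward direction is stored in this sketch, since passages in text adventures are not always symmetric; whether the actual agent also records the reverse path is not stated above.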