---
title: Text Adventure Agent Submission
emoji: 🗺
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---
# Zork Report - Duc-Hai Pham
## TLDR

- Layered Context Management: a more sophisticated, layered approach to prompting. The prompt combines location-aware history, inventory, visible NPCs, and other state.
- LLM-based State Extraction: MCP tools extract verified data about exits, inventory, and visible NPCs, reducing the likelihood of "hallucinated" objects; a second LLM then structures this information and adds it to the prompt.
- Location-Aware History: instead of keeping the N most recent moves, the agent tracks moves per location, depending on where the agent currently is.
- LLM-based Action Proposal: an LLM proposes the most promising next actions from the observation and the current exploration goals.
## MCP Toolset (`mcp_server.py`)
The server exposes the Zork environment through a set of specialized tools that the agent can call:
| Tool | Functionality |
|---|---|
| `play_action(action)` | Same as the original implementation. |
| `get_location_name()` | Returns a unique identifier for the current room to help the agent map its surroundings. |
| `get_visible_objects()` | Filters the room description to return a clean list of interactable items. |
| `get_score()` | Tracks `final_score` and the total moves taken. |
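As a rough illustration, a tool like `get_visible_objects` can match a raw room description against a noun list. The keyword set below is hypothetical; the real server would derive it from the game's vocabulary.

```python
import re

# Hypothetical noun list for illustration only.
KNOWN_OBJECTS = {"lantern", "sword", "mailbox", "leaflet", "rug", "trophy case"}

def get_visible_objects(room_description: str) -> list[str]:
    """Return the interactable items mentioned in a room description."""
    text = room_description.lower()
    # Keep only known nouns that actually appear as whole words in the text.
    return sorted(
        obj for obj in KNOWN_OBJECTS
        if re.search(rf"\b{re.escape(obj)}\b", text)
    )
```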
## Agent Architecture (`agent.py`)
The `StudentAgent` follows a refined ReAct (Reasoning + Acting) loop:

```
Initialize → Observe → Extract Structured Context →
Generate Promising Actions → Build Layered Prompt →
LLM Reasoning → Validate → Execute Tool →
Update Memory → Repeat
```
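The loop above can be sketched as a minimal Python skeleton. All names here are illustrative stubs, not the actual implementation: `tools` stands in for the MCP client, and `extractor`, `proposer`, and `player` for the three LLM calls.

```python
def run_episode(tools, extractor, proposer, player, max_steps=50):
    """Minimal sketch of the agent's ReAct loop (names are illustrative)."""
    memory = {}  # location -> list of past actions taken there
    observation = tools.play_action("look")
    for _ in range(max_steps):
        location = tools.get_location_name()
        context = extractor(observation, tools)        # structured state
        candidates = proposer(context)                 # promising next actions
        decision = player(context, candidates, memory.get(location, []))
        action = decision["action"]
        observation = tools.play_action(action)        # execute chosen action
        memory.setdefault(location, []).append(action) # update location memory
    return tools.get_score()
```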
There are three LLMs inside the agent: the Player, the Extractor, and the Proposer.
### 1. Extract Structured Context (Extractor)
The Extractor takes all the information returned by the MCP tools and merges it into a single set of structured fields for prompting:
```python
{
    "current_location_name": location,
    "exits": parsed.get("exits", []),
    "visible_objects": visible_objects,
    "inventory": inventory,
    "in_combat": parsed.get("in_combat", False),
    "is_room_description": parsed.get("is_room_description", False),
}
```
### 2. Layered Prompt (Proposer)
Based on the information received from the Extractor, the Proposer formats it into a prompt built from distinct layers.
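One way such a layered prompt could be assembled is sketched below. The field names follow the Extractor's output above; the layer labels and wording are assumptions, not the report's exact prompt.

```python
def build_layered_prompt(context: dict, location_history: list[str], objective: str) -> str:
    """Assemble a layered prompt from the extracted context (wording is illustrative)."""
    layers = [
        f"OBJECTIVE: {objective}",
        f"LOCATION: {context['current_location_name']}",
        f"EXITS: {', '.join(context['exits']) or 'unknown'}",
        f"VISIBLE OBJECTS: {', '.join(context['visible_objects']) or 'none'}",
        f"INVENTORY: {', '.join(context['inventory']) or 'empty'}",
        f"PAST ACTIONS HERE: {', '.join(location_history) or 'none'}",
    ]
    if context.get("in_combat"):
        layers.append("WARNING: you are in combat; prioritise survival.")
    return "\n".join(layers)
```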
### 3. Location-Based Memory
Instead of keeping the N most recent actions as context, my agent uses location-based context. The agent keeps a persistent dictionary of the actions taken at each location; depending on where the agent is in the game, the previous actions taken at that location are added to the prompt.
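A minimal sketch of such a per-location memory, with a hypothetical cap on how many actions per location reach the prompt (the cap and class name are assumptions):

```python
from collections import defaultdict

class LocationMemory:
    """Persistent record of actions taken per location (illustrative sketch)."""

    def __init__(self, max_per_location: int = 10):
        self.history = defaultdict(list)  # location name -> actions taken there
        self.max_per_location = max_per_location

    def record(self, location: str, action: str) -> None:
        self.history[location].append(action)

    def recall(self, location: str) -> list[str]:
        # Only the most recent actions at this location feed the prompt.
        return self.history[location][-self.max_per_location:]
```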
### 4. Execution & Parsing
The Player LLM returns a JSON object to ensure stable parsing.
```json
{
    "thinking": "I have the lantern but it is dark. I should turn it on before entering the cave.",
    "action": "activate lantern",
    "new_objective": "Explore the dark cave"
}
```