---
title: Text Adventure Agent Submission
emoji: 🗺
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---
# Zork Report - Duc-Hai Pham
## TLDR

- Layered Context Management: a more sophisticated, layered approach to prompting. The prompt combines location-aware history, inventory, visible NPCs, and other state.
- LLM-based State Extraction: MCP tools extract verified data about exits, inventory, and visible NPCs, reducing the likelihood of "hallucinated" objects; a second LLM then structures this information and adds it to the prompt.
- Location-Aware History: instead of keeping the N most recent moves, the agent tracks moves per location, depending on where the agent currently is.
- LLM-based Action Proposal: an LLM proposes the most promising next actions from the observation and the current exploration goals.
## MCP Toolset (`mcp_server.py`)
The server exposes the Zork environment through a set of specialized tools that the agent can call:
| Tool | Functionality |
|---|---|
| `play_action(action)` | Same as the original implementation. |
| `get_location_name()` | Returns a unique identifier for the current room to help the agent map its surroundings. |
| `get_visible_objects()` | Filters the room description to return a clean list of interactable items. |
| `get_score()` | Tracks `final_score` and the total moves taken. |
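As a rough illustration, a tool like `get_visible_objects` can match a raw room description against a noun list. The keyword set below is hypothetical; the real server would derive it from the game's vocabulary.

```python
import re

# Hypothetical noun list for illustration only.
KNOWN_OBJECTS = {"lantern", "sword", "mailbox", "leaflet", "rug", "trophy case"}

def get_visible_objects(room_description: str) -> list[str]:
    """Return the interactable items mentioned in a room description."""
    text = room_description.lower()
    # Keep only known nouns that actually appear as whole words in the text.
    return sorted(
        obj for obj in KNOWN_OBJECTS
        if re.search(rf"\b{re.escape(obj)}\b", text)
    )
```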
## Agent Architecture (`agent.py`)
The `StudentAgent` follows a refined ReAct (Reasoning + Acting) loop:

```
Initialize → Observe → Extract Structured Context →
Generate Promising Actions → Build Layered Prompt →
LLM Reasoning → Validate → Execute Tool →
Update Memory → Repeat
```
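The loop above can be sketched as a minimal Python skeleton. All names here are illustrative stubs, not the actual implementation: `tools` stands in for the MCP client, and `extractor`, `proposer`, and `player` for the three LLM calls.

```python
def run_episode(tools, extractor, proposer, player, max_steps=50):
    """Minimal sketch of the agent's ReAct loop (names are illustrative)."""
    memory = {}  # location -> list of past actions taken there
    observation = tools.play_action("look")
    for _ in range(max_steps):
        location = tools.get_location_name()
        context = extractor(observation, tools)        # structured state
        candidates = proposer(context)                 # promising next actions
        decision = player(context, candidates, memory.get(location, []))
        action = decision["action"]
        observation = tools.play_action(action)        # execute chosen action
        memory.setdefault(location, []).append(action) # update location memory
    return tools.get_score()
```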
There are three LLMs inside the agent: the Player, the Extractor, and the Proposer.
### 1. Extract Structured Context (Extractor)
The Extractor takes all the information returned by the MCP tools and merges it into a single set of structured fields for prompting:
```python
{
    "current_location_name": location,
    "exits": parsed.get("exits", []),
    "visible_objects": visible_objects,
    "inventory": inventory,
    "in_combat": parsed.get("in_combat", False),
    "is_room_description": parsed.get("is_room_description", False),
}
```
### 2. Layered Prompt (Proposer)
Based on the information received from the Extractor, the Proposer formats it into a prompt built from distinct layers.
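One way such a layered prompt could be assembled is sketched below. The field names follow the Extractor's output above; the layer labels and wording are assumptions, not the report's exact prompt.

```python
def build_layered_prompt(context: dict, location_history: list[str], objective: str) -> str:
    """Assemble a layered prompt from the extracted context (wording is illustrative)."""
    layers = [
        f"OBJECTIVE: {objective}",
        f"LOCATION: {context['current_location_name']}",
        f"EXITS: {', '.join(context['exits']) or 'unknown'}",
        f"VISIBLE OBJECTS: {', '.join(context['visible_objects']) or 'none'}",
        f"INVENTORY: {', '.join(context['inventory']) or 'empty'}",
        f"PAST ACTIONS HERE: {', '.join(location_history) or 'none'}",
    ]
    if context.get("in_combat"):
        layers.append("WARNING: you are in combat; prioritise survival.")
    return "\n".join(layers)
```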
### 3. Location-Based Memory
Instead of keeping the N most recent actions as context, my agent uses location-based context. The agent keeps a persistent dictionary of the actions taken at each location; depending on where the agent is in the game, the previous actions taken at that location are added to the prompt.
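A minimal sketch of such a per-location memory, with a hypothetical cap on how many actions per location reach the prompt (the cap and class name are assumptions):

```python
from collections import defaultdict

class LocationMemory:
    """Persistent record of actions taken per location (illustrative sketch)."""

    def __init__(self, max_per_location: int = 10):
        self.history = defaultdict(list)  # location name -> actions taken there
        self.max_per_location = max_per_location

    def record(self, location: str, action: str) -> None:
        self.history[location].append(action)

    def recall(self, location: str) -> list[str]:
        # Only the most recent actions at this location feed the prompt.
        return self.history[location][-self.max_per_location:]
```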
### 4. Execution & Parsing
The Player LLM returns a JSON object to ensure stable parsing.
```json
{
    "thinking": "I have the lantern but it is dark. I should turn it on before entering the cave.",
    "action": "activate lantern",
    "new_objective": "Explore the dark cave"
}
```