---
title: Text Adventure Agent Submission
emoji: πΊ
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---
# Text Adventure Agent Submission

## Overview
This agent uses a memory-driven architecture with a two-phase LLM approach to systematically explore text adventure games. At each step, the agent leverages Jericho's API to access valid actions and current location data, maintaining a structured memory dictionary that records location-specific information including tried actions, available actions, promising action subsets, and summarized outcomes.
The core innovation is the dual LLM call strategy: first for strategic action selection with reasoning over promising action subsets (up to 10), and second for outcome summarization that ensures the agent actively "listens" to and learns from each action result. This approach balances comprehensive exploration with concise memory management, preventing context overflow while maintaining rich historical knowledge.
## Approach

### Memory Architecture
The agent maintains a location-indexed memory dictionary with the following structure:
- `valid_actions`: location-specific actions from Jericho's API (verified to work)
- `tried_actions`: set of actions already attempted at this location
- `promising_actions`: LLM-selected subset (max 10) of strategic actions to consider
- `results`: for each tried action, stores `{observation, summary, success, key_info}`
Example memory structure:
```python
location_memory = {
    "West of House": {
        "valid_actions": ["north", "south", "east", "open window", "examine window", ...],
        "tried_actions": {"open mailbox", "take leaflet", "examine window"},
        "promising_actions": ["north", "open mailbox", "take leaflet", "examine window", "east"],
        "visited": 2,
        "results": {
            "open mailbox": {
                "observation": "Opening the mailbox reveals a leaflet. [Score: 5]",
                "summary": "Successfully opened mailbox. Found leaflet inside.",
                "success": "yes",
                "key_info": "leaflet available to take"
            },
            "take leaflet": {
                "observation": "Taken. [Score: 5 | Moves: 3]",
                "summary": "Picked up the leaflet from the mailbox.",
                "success": "yes",
                "key_info": "leaflet now in inventory"
            },
            "examine window": {
                "observation": "The window is closed and appears to be boarded. [Score: 5 | Moves: 4]",
                "summary": "Window is boarded up. Cannot open.",
                "success": "no",
                "key_info": "need different approach or location"
            }
        }
    },
    "Forest": {
        "valid_actions": ["north", "south", "east", "west", "climb tree", "examine tree"],
        "tried_actions": {"examine tree"},
        "promising_actions": ["climb tree", "examine tree", "north", "west"],
        "visited": 1,
        "results": {...}
    }
}
```
This structure allows the agent to return to previously visited locations with full context, enabling informed decision-making even with new inventory or changed game state.
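The bookkeeping this implies can be sketched as follows (the helper names here are illustrative, not necessarily the actual `agent.py` API):

```python
def new_location_entry(valid_actions):
    """Initialize the memory entry created on a first visit (illustrative)."""
    return {
        "valid_actions": list(valid_actions),
        "tried_actions": set(),
        "promising_actions": [],
        "visited": 1,
        "results": {},
    }

def record_outcome(entry, action, observation, summary, success, key_info):
    """Store a summarized outcome so revisits see learning, not raw text."""
    entry["tried_actions"].add(action)
    entry["results"][action] = {
        "observation": observation,  # full text archived for debugging
        "summary": summary,          # concise version used in future prompts
        "success": success,
        "key_info": key_info,
    }

location_memory = {}
location_memory["West of House"] = new_location_entry(["north", "open mailbox"])
record_outcome(location_memory["West of House"], "open mailbox",
               "Opening the mailbox reveals a leaflet. [Score: 5]",
               "Successfully opened mailbox.", "yes", "leaflet available to take")
```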
### Example Logic
Decision flow at each step:
```
STEP N:
├─ Get current location from Jericho API
├─ Check if location exists in memory
│
├─ IF NEW LOCATION:
│   ├─ Query Jericho API for valid_actions at this location
│   ├─ Initialize memory entry:
│   │   ├─ valid_actions = [from API]
│   │   ├─ tried_actions = {}
│   │   ├─ promising_actions = []
│   │   ├─ results = {}
│   │   └─ visited = 1
│   │
│   ├─ LLM CALL 1 (Action Selection):
│   │   Input:  - Current observation
│   │           - Game state (score, moves, inventory)
│   │           - Valid actions (no history yet)
│   │   Output: - THOUGHT (reasoning about situation)
│   │           - PROMISING_ACTIONS (up to 10 from valid list)
│   │           - REASONING (evaluate options before choosing)
│   │           - CHOSEN_ACTION (single best action)
│   │   └─ Store promising_actions in memory
│   │
│   ├─ Execute CHOSEN_ACTION via play_action()
│   │   └─ Receive observation from game
│   │
│   ├─ LLM CALL 2 (Outcome Summarization):
│   │   Input:  - Action executed
│   │           - Full observation received
│   │   Output: - OUTCOME_SUMMARY (1-2 sentences)
│   │           - SUCCESS (yes/no/partial)
│   │           - KEY_INFO (important detail)
│   │
│   └─ UPDATE MEMORY:
│       ├─ Add to tried_actions
│       └─ Store in results[action] = {observation, summary, success, key_info}
│
└─ IF KNOWN LOCATION:
    ├─ Increment visited count
    ├─ Retrieve existing memory entry with:
    │   ├─ valid_actions (from first visit)
    │   ├─ tried_actions (all previous attempts)
    │   ├─ results (with summaries, not full observations)
    │   └─ promising_actions (from last visit)
    │
    ├─ LLM CALL 1 (Action Selection with Context):
    │   Input:  - Current observation
    │           - Game state (score, moves, inventory)
    │           - Valid actions
    │           - Previously tried actions WITH SUMMARIES
    │           - Previous promising actions
    │           - Strategic hints (stagnation warnings, etc.)
    │   Output: - THOUGHT (considers past failures/successes)
    │           - PROMISING_ACTIONS (RE-EVALUATED for current context)
    │           - REASONING (evaluate options incorporating learned information)
    │           - CHOSEN_ACTION (avoids failed actions unless context changed)
    │   └─ Update promising_actions in memory
    │
    ├─ Execute CHOSEN_ACTION via play_action()
    │   └─ Receive observation from game
    │
    ├─ LLM CALL 2 (Outcome Summarization):
    │   Input:  - Action executed
    │           - Full observation received
    │   Output: - OUTCOME_SUMMARY (concise learning)
    │           - SUCCESS (yes/no/partial)
    │           - KEY_INFO (important detail)
    │
    └─ UPDATE MEMORY:
        ├─ Add to tried_actions
        ├─ Store in results[action] = {observation, summary, success, key_info}
        └─ Agent now has richer context for next visit
```
Key insight: The LLM always re-evaluates promising actions on location revisits, accounting for changed game state (new inventory, completed puzzles, different objectives). This enables adaptive exploration rather than rigid scripted behavior.
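The per-step flow above can be condensed into a single loop body. The sketch below is a simplified rendering, not the actual `agent.py` code; `env` and `llm` stand in for hypothetical wrappers around Jericho and the LLM client:

```python
def step(env, llm, memory):
    """One decision step of the agent loop (sketch; `env` and `llm` are
    hypothetical wrappers, their method names are assumptions)."""
    location = env.get_location()
    entry = memory.get(location)
    if entry is None:  # first visit: initialize from Jericho's valid actions
        entry = {"valid_actions": env.get_valid_actions(),
                 "tried_actions": set(), "promising_actions": [],
                 "visited": 1, "results": {}}
        memory[location] = entry
    else:              # revisit: keep history, bump the counter
        entry["visited"] += 1

    # LLM call 1: re-evaluate promising actions and pick one, seeing only
    # concise summaries of past attempts rather than full observations
    promising, action = llm.select_action(
        observation=env.last_observation,
        valid_actions=entry["valid_actions"],
        tried=entry["tried_actions"],
        summaries={a: r["summary"] for a, r in entry["results"].items()})
    entry["promising_actions"] = promising[:10]

    observation = env.play_action(action)

    # LLM call 2: force the agent to "listen" by summarizing the outcome
    summary, success, key_info = llm.summarize(action, observation)
    entry["tried_actions"].add(action)
    entry["results"][action] = {"observation": observation, "summary": summary,
                                "success": success, "key_info": key_info}
    return action, observation
```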
### Two-Phase LLM Strategy

Phase 1 - Action Selection: The LLM receives the current observation, game state (score, moves, inventory), and formatted location memory showing valid actions, previous promising actions, and tried actions with concise summaries (not overwhelming full text). The LLM then identifies up to 10 promising actions from the available options, evaluates them through reasoning, and selects the single best action to execute. This sequential ordering (THOUGHT → PROMISING_ACTIONS → REASONING → CHOSEN_ACTION) aligns with how LLMs naturally generate tokens, ensuring reasoning happens before the final decision.
Phase 2 - Outcome Summarization: After action execution, a second LLM call analyzes the full observation and generates a concise 1-2 sentence summary, success classification (yes/no/partial), and key information to remember. This summary is stored in memory, forcing the agent to actively process outcomes rather than passively accumulating raw text.
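Extracting the labeled fields from the Phase 1 reply can be done with a small parser. This regex-based scheme is an assumption for illustration; the actual prompt format in `agent.py` may differ:

```python
import re

def parse_selection(reply: str):
    """Parse the labeled fields of the action-selection reply.

    The THOUGHT/PROMISING_ACTIONS/REASONING/CHOSEN_ACTION labels come from
    the strategy described above; the comma-separated action list and the
    regex itself are illustrative assumptions.
    """
    fields = {}
    for key in ("THOUGHT", "PROMISING_ACTIONS", "REASONING", "CHOSEN_ACTION"):
        m = re.search(rf"{key}:\s*(.*?)(?=\n[A-Z_]+:|\Z)", reply, re.S)
        fields[key] = m.group(1).strip() if m else ""
    promising = [a.strip() for a in fields["PROMISING_ACTIONS"].split(",") if a.strip()]
    return fields["THOUGHT"], promising[:10], fields["CHOSEN_ACTION"]

reply = """THOUGHT: The mailbox may hold something useful.
PROMISING_ACTIONS: open mailbox, north, examine window
REASONING: Opening the mailbox is cheap and likely informative.
CHOSEN_ACTION: open mailbox"""
thought, promising, chosen = parse_selection(reply)
```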
### Key Strategic Features
- Object-focused exploration: System prompt emphasizes that examining and interacting with props/objects is often critical for progress, with explicit guidance to try multiple interaction types (examine, take, open, read, push, pull, turn)
- Movement tracking: Detects object movements in observations and provides hints to follow them
- Stagnation detection: Monitors score progress and warns when exploration becomes circular
- Context preservation: Full observations archived for debugging while summaries keep prompts manageable
- Dynamic re-evaluation: Always recalculates promising actions on location revisits, accounting for changed context (new items, completed objectives)
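The stagnation check in the list above can be implemented with a sliding window over recent scores. A minimal sketch, assuming a window size and warning text that are illustrative rather than the agent's actual values:

```python
from collections import deque
from typing import Optional

class StagnationDetector:
    """Warn when the score has not improved over a sliding window of steps
    (sketch; the window size and hint wording are assumptions)."""

    def __init__(self, window: int = 20):
        self.scores = deque(maxlen=window)

    def hint(self, score: int) -> Optional[str]:
        self.scores.append(score)
        # Once the window is full, compare newest score against oldest
        if len(self.scores) == self.scores.maxlen and self.scores[-1] <= self.scores[0]:
            return "Score has stalled; try unexplored locations or untried actions."
        return None

det = StagnationDetector(window=3)
hints = [det.hint(s) for s in (0, 0, 0, 1)]
```

The resulting hint string would be appended to the Phase 1 prompt as one of the "strategic hints" mentioned in the decision flow.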
### Comments on Agent Performance

This implementation gives the LLM agent efficient access to past game history, relative to its current location in the game. As a result, over a few `lostpig` runs, the agent discovers the Hole in fewer than 10 steps, explores a significant part of the Cave in around 50 steps, and meets the Gnome in under 100 steps. However, the scoring system becomes fairly obscure (even for a human player!) after the first 2 points: without consulting the gameplay solutions, it is not obvious why finding the coin in the fountain earns the player a point, while discovering the Gnome's room and talking to him adds nothing to the score.
## Files

| File | Description |
|---|---|
| `agent.py` | Memory-driven agent with two-phase LLM approach |
| `mcp_server.py` | MCP server with game interaction tools |
| `app.py` | Gradio interface for HF Space |
| `requirements.txt` | Additional dependencies |
## How to Submit

- Fork the template Space: https://huggingface.co/spaces/LLM-course/text-adventure-template
- Clone your fork locally
- Implement your agent in `agent.py` and `mcp_server.py`
- Test locally (see below)
- Push your changes to your Space
- Submit your Space URL on the course platform
## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```