---
title: Text Based Game Agent
emoji: 🎮
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: false
---

# Text-Based Game Agent

Submitted by: Kshitij AMBILDUKE

The key improvements in the `agent.py` file are as follows:

## 1. Memory Management (The Memory Condenser)

Standard LLM agents inevitably fail at long text adventures because their context windows bloat with redundant history, causing them to "forget" crucial early-game details or suffer degraded reasoning.

Dual-Model Architecture: This system implements a two-tier LLM approach. The primary heavyweight model (Qwen2.5-72B-Instruct) handles moment-to-moment ReAct reasoning, while a secondary, lightweight model (Qwen2.5-7B-Instruct) acts as a dedicated background "Memory Condenser."

Active Compression Engine: Instead of a basic rolling array, the agent maintains a history_buffer. Every 8 steps, the 7B model processes this buffer alongside the existing summary. It is explicitly prompted to act as a "world modeler" rather than a simple summarizer.

Semantic Filtering for Long-Horizon Planning: The condenser's system prompt strictly dictates what to keep (unsolved puzzles, unexplored exits, items with potential future use, locked doors) and what to aggressively prune (resolved puzzles, narrative flavor, consumed items, and exact dialogue). This results in a highly concentrated, persistent state representation that guides multi-step planning without exceeding token limits.
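The condensation loop described above can be sketched as follows. This is a minimal illustration, not the actual `agent.py` code: the class name, method names, and prompt wording are assumptions, and the LLM call is passed in as a plain callable so the sketch stays backend-agnostic.

```python
# Illustrative sketch of periodic memory condensation (not the actual agent.py code).
CONDENSE_EVERY = 8  # the README states condensation runs every 8 steps

# Hypothetical prompt capturing the keep/prune rules described above.
CONDENSER_SYSTEM_PROMPT = (
    "You are a world modeler, not a summarizer. "
    "KEEP: unsolved puzzles, unexplored exits, items with potential future use, locked doors. "
    "PRUNE: resolved puzzles, narrative flavor, consumed items, exact dialogue."
)

class MemoryCondenser:
    def __init__(self, llm):
        self.llm = llm              # callable(system_prompt, user_prompt) -> str,
        self.summary = ""           # e.g. backed by the lightweight 7B model
        self.history_buffer = []

    def record(self, step_text):
        """Buffer one step; condense once the buffer reaches the threshold."""
        self.history_buffer.append(step_text)
        if len(self.history_buffer) >= CONDENSE_EVERY:
            self.condense()

    def condense(self):
        # The 7B model sees both the existing summary and the raw buffer,
        # and emits a new concentrated state representation.
        user_prompt = (
            f"Existing summary:\n{self.summary}\n\n"
            "Recent steps:\n" + "\n".join(self.history_buffer)
        )
        self.summary = self.llm(CONDENSER_SYSTEM_PROMPT, user_prompt)
        self.history_buffer.clear()
```

The condensed `summary` is what gets injected into the 72B model's prompt, so the reasoning model never sees the raw 8-step transcript.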

## 2. Map Manager (Dynamic Graphing)

Passive tracking of string output easily leads to agents getting hopelessly lost or stuck in infinite navigation loops.

Active Spatial Mapping: The custom MapManager actively parses game observations (using a heuristic that isolates the first non-empty line as the room name) to generate and store interconnected RoomNode dataclasses.

Graph Building & Edge Creation: It tracks the current location, records the visited_count, and stores the full room description. More importantly, it automatically links rooms together based on movement directions (tracking a comprehensive list of valid directions including ne, nw, up, down, enter, and exit). For example, it logically deduces that moving "North" links the previous room to the newly discovered one.
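The room-name heuristic and edge creation described above can be sketched as a small graph structure. This is a simplified stand-in for the real `MapManager` (only the `RoomNode` name and fields mentioned in this README are taken from the source; everything else is an assumption), with an abbreviated direction set:

```python
# Illustrative sketch of the MapManager graph; simplified from the description above.
from dataclasses import dataclass, field

# Subset of the valid movement directions tracked by the agent.
DIRECTIONS = {"north", "south", "east", "west", "ne", "nw", "se", "sw",
              "up", "down", "enter", "exit"}

@dataclass
class RoomNode:
    name: str
    description: str = ""
    visited_count: int = 0
    exits: dict = field(default_factory=dict)   # direction -> destination room name

class MapManager:
    def __init__(self):
        self.rooms = {}
        self.current = None

    @staticmethod
    def room_name(observation):
        # Heuristic from the README: the first non-empty line is the room name.
        for line in observation.splitlines():
            if line.strip():
                return line.strip()
        return "Unknown"

    def update(self, observation, last_action=None):
        """Register an observation and link the previous room if we just moved."""
        name = self.room_name(observation)
        node = self.rooms.setdefault(name, RoomNode(name))
        node.description = observation
        node.visited_count += 1
        # Edge creation: moving "north" links the previous room to this one.
        if last_action and last_action.lower() in DIRECTIONS and self.current:
            self.rooms[self.current].exits[last_action.lower()] = name
        self.current = name
```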

Real-Time Context Injection: At the start of every ReAct loop, a localized map is directly injected into the agent's prompt. This block explicitly outlines the CURRENT LOCATION, Known paths from here, and Times visited. This grounded spatial awareness drastically reduces exploration redundancy.
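The injected map block might look like the following. The field labels come from this README; the rendering function and exact formatting are assumptions:

```python
# Hypothetical renderer for the localized map block injected into the ReAct prompt.
def render_map_block(current_room, exits, visited_count):
    """exits: mapping of direction -> destination room name."""
    paths = ", ".join(f"{d} -> {dest}" for d, dest in exits.items()) or "none known"
    return (
        f"CURRENT LOCATION: {current_room}\n"
        f"Known paths from here: {paths}\n"
        f"Times visited: {visited_count}"
    )
```

Prepending this block to every prompt means the model sees its spatial state explicitly instead of having to reconstruct it from the transcript.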

## 3. System Prompt & Reasoning Enhancements

To force the LLM to actively utilize its new Condensed Memory and Map, the prompt engineering and output parsers were completely overhauled to enforce strict operational protocols.

Structured Thinking Pipeline: The agent is restricted from free-form thinking. Its THOUGHT block must follow a strict 3-part reasoning chain:

[Context]: Analyze the immediate room description and threats.

[Deduction]: Cross-reference the environment with the injected Map and Condensed Memory.

[Selection]: Decide on the highest-priority action based on a predefined hierarchy (Explore > Take > Examine > Solve).
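A simple way to enforce this structure on the output side is a pattern check before accepting a THOUGHT block. The validator below is a sketch (the real parser in `agent.py` may differ); only the three bracketed labels come from the README:

```python
# Sketch: validate that a THOUGHT block contains the 3-part chain in order.
import re

THOUGHT_PATTERN = re.compile(
    r"\[Context\]:.*?\[Deduction\]:.*?\[Selection\]:",
    re.DOTALL,  # the three parts usually span multiple lines
)

def is_valid_thought(thought: str) -> bool:
    return bool(THOUGHT_PATTERN.search(thought))
```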

Embedded Strategy Guide: A 10-point text adventure heuristic guide is injected directly into the prompt. It primes the model with genre-specific survival tactics, such as examining everything, picking up all portable items, using light sources in dark areas, and trying non-standard directions.

Intelligent Loop Breaking: If the agent gets stuck in a repetitive action loop (repeating the same action 3 times), the system intercepts the loop. Instead of defaulting to a basic "look" command, it scans the current observation text and dynamically forces a contextual alternative (e.g., if the word "door" is present, it forces "examine door" or "open door"; if compass directions are present, it forces movement).
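The loop-breaking logic can be sketched as below. The trigger keywords and fallback ordering here are illustrative assumptions; only the 3-repetition threshold and the "door"/compass-direction examples come from the description above:

```python
# Sketch of contextual loop breaking (keyword triggers are illustrative).
def break_loop(recent_actions, observation):
    """Return a forced alternative action if the last 3 actions are identical,
    else None to let the agent's normal choice stand."""
    if len(recent_actions) < 3 or len(set(recent_actions[-3:])) != 1:
        return None                      # no loop detected
    obs = observation.lower()
    if "door" in obs:
        return "examine door"            # contextual alternative from the observation
    for direction in ("north", "south", "east", "west"):
        if direction in obs:
            return direction             # force movement toward a mentioned exit
    return "look"                        # last-resort fallback
```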

Robust Output Parsing: Upgraded the regex parser using `re.DOTALL` to reliably capture multi-line reasoning blocks. It also includes comprehensive fallback layers: it strips hallucinated markdown fences (such as ```` ```json ````), automatically repairs single-quote JSON errors, and uses targeted regex extraction as a last resort to salvage malformed payloads.
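The layered fallbacks can be sketched as below. This is a simplified illustration, not the actual parser; the `"action"` field name and the naive single-quote repair are assumptions:

```python
# Sketch of layered fallback parsing for the model's JSON action payload.
import json
import re

def parse_action(raw):
    # 1. Strip hallucinated markdown fences such as ```json ... ```
    raw = re.sub(r"```(?:json)?\s*|\s*```", "", raw).strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # 2. Repair single-quoted pseudo-JSON (naive: breaks on embedded apostrophes)
    try:
        return json.loads(raw.replace("'", '"'))
    except json.JSONDecodeError:
        pass
    # 3. Last resort: targeted regex extraction of a hypothetical "action" field
    match = re.search(r'"action"\s*:\s*"([^"]+)"', raw, re.DOTALL)
    return {"action": match.group(1)} if match else None
```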