Chloé Court committed
Commit · 227f8c5
Parent(s): 42317c3
reduce README.md size
README.md CHANGED

@@ -12,106 +12,39 @@ license: mit
# Autonomous Text Adventure Agent

## Abstract
- This project implements an autonomous agent for parser-based text adventure games (e.g., Zork).
- Unlike naive LLM bots that directly generate actions, this agent introduces a structured planning architecture combining:

- * ReAct-style reasoning loops
- * Incremental world-state memory
- * Dual-layer action proposal system
- * Hallucination-resistant decision filtering
- * Exploration efficiency bias

- The primary objective is to maximize game progress while minimizing redundant interactions and logical inconsistencies.

## Design Philosophy
- The agent is built around three core principles:

### 1. State-Aware Reasoning
- The agent maintains a persistent cognitive model of each location.
- Memory is updated incrementally:

- * If a door is described as "open", the previous "closed" state is removed.
- * If an object is taken, it is removed from room inventory.
- * If an object is dropped, it is added to room inventory.

- ### 2. Dual-Level Action Planning
- The agent uses two complementary action suggestion mechanisms.

- **Valid Action Filtering**
- The MCP server exposes environment-provided action constraints through:
- `get_valid_actions()`
- This serves as a hallucination safety layer.
- However, parser-based adventure games often have incomplete action listings.

- *Example:* In some Zork environments, `open window` may succeed, but `enter house` may also be logically valid even if not explicitly listed.

- **Promising Hint Generation**
- To address action-space incompleteness, the planner LLM generates promising strategic hints.
- Promising hints are:
- * Contextually grounded in observation
- * Semantically plausible
- * Non-hallucinatory with respect to objects and environment
- * Designed to reveal hidden interaction opportunities

- Hints are not direct executable commands but guide downstream action selection.

- ## Memory Architecture
- The agent maintains location-scoped structured memory. Each location stores:
- * Cumulative environmental description
- * Objects discovered
- * Actions attempted
- * Observation history
- * Exploration directions

- **Memory update policy follows a conservative overwrite strategy:**
- * Preserve stable facts
- * Remove only explicitly contradicted information
- * Avoid stylistic rewriting
- * Track current object states only

- This enables long-horizon reasoning across revisits.

- ## Loop Prevention Strategy
- Repetition traps are a major failure mode in LLM agents. This system introduces multi-layer anti-loop mechanisms.

- ### Tool Oscillation Control
- The agent enforces: no non-action tool may be used more than twice consecutively.

- ### Action Blacklisting
- The agent tracks all actions attempted in the current location.
- **Policy:** Never repeat failed or ineffective actions unless the environment state changes.

- ### Stagnation Escape Rule
- If progress is not detected after several attempts:
- * Change the interaction verb.
- * Prioritize alternative object manipulation.
- * Explore the least recently visited directions.

- ## Exploration Policy
- The agent maintains a balanced exploration strategy.
- **Priority order:**
- 1. High-value puzzle-solving interactions
- 2. Object manipulation actions
- 3. Environment transition actions
- 4. Systematic exploration of unexplored space

- Random movement is strictly forbidden. Movement is suggested only when local interactions are exhausted or puzzle progression is unlikely.

- ## Hallucination Control
- The planner LLM is constrained by grounding rules.
- **Forbidden behaviors include:**
- * Introducing objects not mentioned in the observation
- * Suggesting impossible actions
- * Generating vague hints

## Valid Action vs Hint Separation

@@ -121,38 +54,14 @@ All proposed hints must be supported by observation context and compatible with
| **Promising Hints** | Strategic reasoning suggestions |
| **Planner Memory** | Long-term state tracking |

- *
- * Attempted actions and outcomes
- * Generated hints
- * Observation sequences

- When re-entering a location, cumulative memory is used.

- ## Performance Objective
- The agent optimizes the following metric:

- $$\text{Efficiency} = \frac{\text{Score}}{\max(1,\ \text{Number of Moves})}$$

- **Secondary objectives include:**
- * Map coverage maximization
- * Puzzle completion rate
- * Unique object discovery

- ## Evaluation Strategy
- The agent is evaluated based on:
- * Final game score
- * Exploration completeness
- * Move efficiency
- * Loop avoidance rate
- * Puzzle-solving success

## Summary
- This project demonstrates that

---

## Files
# Autonomous Text Adventure Agent

## Abstract
+ This project implements an autonomous agent for parser-based games (e.g., Zork) using a structured planning architecture. Unlike naive LLM bots, it combines ReAct reasoning loops with incremental world-state memory and a dual-layer action proposal system to maximize progress while minimizing redundancy.

## Design Philosophy

### 1. State-Aware Reasoning
+ The agent maintains a persistent cognitive model per location. Memory is updated incrementally:
+ * **Invariant:** Memory reflects the best known approximation of the environment.
+ * **Transitions:** Facts are updated only when observations contradict previous knowledge (e.g., updating a door from "closed" to "open" or tracking inventory movement). This prevents semantic drift; a sketch follows the list.

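The transition rule above can be made concrete with a minimal sketch, assuming a plain dict-based location memory; `apply_observation` and the observation keys (`object_states`, `taken`, `dropped`) are hypothetical names, not the project's actual API.

```python
def apply_observation(memory: dict, observation: dict) -> dict:
    """Update location memory only where new facts contradict old ones."""
    # Object states: overwrite only on explicit contradiction
    # (e.g., a door going from "closed" to "open"); stable facts are untouched.
    states = memory.setdefault("object_states", {})
    for obj, state in observation.get("object_states", {}).items():
        if states.get(obj) != state:
            states[obj] = state

    # Inventory transitions: taken objects leave the room, dropped ones return.
    room = memory.setdefault("room_objects", set())
    for obj in observation.get("taken", []):
        room.discard(obj)
    for obj in observation.get("dropped", []):
        room.add(obj)
    return memory
```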
+ ### 2. Dual-Level Action Planning
+ * **Valid Action Filtering:** Uses `get_valid_actions()` via MCP as a hallucination safety layer.
+ * **Promising Hint Generation:** The LLM generates grounded strategic hints for logically valid but unlisted actions (e.g., "enter house"). These guide selection without bypassing environment constraints; see the sketch after this list.

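A minimal sketch of how the two layers might combine. Only `get_valid_actions()` is named in this README; `propose_hints` and the keyword-matching heuristic below are illustrative assumptions.

```python
from typing import Callable

def plan_next_action(
    observation: str,
    memory: dict,
    get_valid_actions: Callable[[], list[str]],
    propose_hints: Callable[[str, dict], list[str]],
) -> str:
    """Let strategic hints rank the environment's own valid actions."""
    valid = get_valid_actions()                 # hallucination safety layer
    hints = propose_hints(observation, memory)  # grounded strategic hints
    # Prefer a listed action that a hint points at. Hints are never executed
    # directly, so the safety layer is never bypassed.
    for hint in hints:
        for action in valid:
            if any(word in action for word in hint.lower().split()):
                return action
    return valid[0] if valid else "look"        # safe default
```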
+ ## Memory & Exploration

+ ### Memory Architecture
+ Each location-scoped entry stores: cumulative descriptions, discovered objects, attempted actions, and exploration history. A **conservative overwrite strategy** preserves stable facts and tracks object states across revisits.

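One possible shape for such an entry, sketched as a dataclass; the field names mirror the list above but are not taken from the project's code.

```python
from dataclasses import dataclass, field

@dataclass
class LocationMemory:
    description: str = ""                                    # cumulative environmental description
    objects: set[str] = field(default_factory=set)           # objects discovered here
    attempted: dict[str, str] = field(default_factory=dict)  # action -> observed outcome
    observations: list[str] = field(default_factory=list)    # raw observation history
    explored: set[str] = field(default_factory=set)          # directions already tried
```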
+ ### Loop Prevention & Stagnation
+ To avoid LLM "repetition traps," the agent enforces the following rules (sketched after the list):
+ * **Tool Oscillation Control:** Limits consecutive non-action tool use.
+ * **Action Blacklisting:** Never repeats failed actions unless the state changes.
+ * **Stagnation Escape:** If progress halts, the agent switches interaction verbs or moves to the least recently visited area.

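A minimal sketch of the first two rules, assuming the two-consecutive-use limit stated under Tool Oscillation Control in the earlier revision; `LoopGuard` and its methods are hypothetical names.

```python
from collections import deque

class LoopGuard:
    def __init__(self) -> None:
        self.recent_tools: deque[str] = deque(maxlen=2)  # last two tool calls
        self.blacklist: set[str] = set()                 # failed actions at this location

    def allow_tool(self, tool: str) -> bool:
        # Tool oscillation control: block a third consecutive use of one tool.
        if len(self.recent_tools) == 2 and all(t == tool for t in self.recent_tools):
            return False
        self.recent_tools.append(tool)
        return True

    def allow_action(self, action: str) -> bool:
        # Action blacklisting: never repeat a failed action at this location.
        return action not in self.blacklist

    def record_failure(self, action: str) -> None:
        self.blacklist.add(action)

    def on_state_change(self) -> None:
        # The environment changed, so previously failed actions may now work.
        self.blacklist.clear()
```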
+ ### Exploration Policy
+ Priority: puzzle-solving > object manipulation > transitions > systematic exploration. Movement is suggested only when local interactions are exhausted; random movement is forbidden.

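Because the ordering is deterministic, it can be expressed as a fixed ranking; the category labels and scores below are assumptions for illustration.

```python
# Lower score = higher priority, mirroring the stated order.
PRIORITY = {"puzzle": 0, "object": 1, "transition": 2, "explore": 3}

def rank_candidates(candidates: list[tuple[str, str]]) -> list[str]:
    """Sort (action, category) pairs by fixed priority; no randomness."""
    return [action for action, cat in sorted(candidates, key=lambda c: PRIORITY[c[1]])]
```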
+ ## Hallucination Control
+ The planner is strictly grounded: it cannot invent objects, suggest impossible transitions, or generate vague hints. All proposals must be supported by observation and compatible with game physics.
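One naive way to enforce this kind of grounding is a word-level check against the current observation; `is_grounded` and its heuristic are illustrative, not the project's actual filter.

```python
def is_grounded(hint: str, observation: str, known_objects: set[str]) -> bool:
    """Keep a hint only if its content words appear in context or memory."""
    context = observation.lower()
    return all(
        word in context or word in known_objects
        for word in hint.lower().split()
        if word.isalpha() and len(word) > 3  # crude content-word heuristic
    )
```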

## Valid Action vs Hint Separation

| **Promising Hints** | Strategic reasoning suggestions |
| **Planner Memory** | Long-term state tracking |

+ ## Evaluation
+ Success is measured by:
+ * **Efficiency:** $\text{Score} / \max(1, \text{Moves})$ (computed in the sketch below)
+ * **Completeness:** Map coverage and puzzle success rate.
+ * **Robustness:** Unique object discovery and loop avoidance.
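The efficiency metric transcribes directly into code; `efficiency` is just an illustrative helper name.

```python
def efficiency(score: int, moves: int) -> float:
    return score / max(1, moves)  # max(1, moves) guards against division by zero

# e.g., a score of 45 earned in 90 moves gives an efficiency of 0.5
assert efficiency(45, 90) == 0.5
assert efficiency(10, 0) == 10.0
```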

## Summary
+ This project demonstrates that structured memory and rule-based safety filtering significantly improve autonomous performance in partially observable text environments.

---

## Files
|