Chloé Court committed on
Commit 227f8c5 · 1 Parent(s): 42317c3

reduce README.md size

Files changed (1)
  1. README.md +27 -118
README.md CHANGED
@@ -12,106 +12,39 @@ license: mit
12
  # Autonomous Text Adventure Agent
13
 
14
  ## Abstract
15
- This project implements an autonomous agent capable of solving parser-based interactive fiction games such as Zork-style environments.
16
- Unlike naive LLM bots that directly generate actions, this agent introduces a structured planning architecture combining:
17
 
18
- * ReAct-style reasoning loops
19
- * Incremental world-state memory
20
- * Dual-layer action proposal system
21
- * Hallucination-resistant decision filtering
22
- * Exploration efficiency bias
23
 
24
- The primary objective is to maximize game progress while minimizing redundant interactions and logical inconsistencies.
25
 
26
  ## Design Philosophy
27
- The agent is built around three core principles:
28
 
29
  ### 1. State-Aware Reasoning
30
- The agent maintains a persistent cognitive model of each visited location.
31
- Memory is updated incrementally rather than rewritten.
 
32
 
33
- **Key invariant:**
34
- The memory of a location represents the best known approximation of the current true environment state.
35
- State transitions are handled explicitly when observations contradict previous knowledge.
36
 
37
- **Examples:**
38
- * If a door is described as "open", previous "closed" state is removed.
39
- * If an object is taken, it is removed from room inventory.
40
- * If an object is dropped, it is added to room inventory.
41
 
42
- This design reduces semantic drift and improves long-term planning stability.
43
 
44
- ### 2. Dual-Level Action Planning
45
- The agent uses two complementary action suggestion mechanisms.
46
-
47
- **Valid Action Filtering**
48
- The MCP server exposes environment-provided action constraints through:
49
- `get_valid_actions()`
50
- This serves as a hallucination safety layer.
51
- However, parser-based adventure games often have incomplete action listings.
52
-
53
- *Example:* In some Zork environments, `open window` may succeed. But `enter house` may also be logically valid even if not explicitly listed.
54
-
55
- **Promising Hint Generation**
56
- To address action space incompleteness, the planner LLM generates promising strategic hints.
57
- Promising hints are:
58
- * Contextually grounded in observation
59
- * Semantically plausible
60
- * Non-hallucinatory with respect to objects and environment
61
- * Designed to reveal hidden interaction opportunities
62
-
63
- Hints are not direct executable commands but guide downstream action selection.
64
-
65
- ## Memory Architecture
66
- The agent maintains location-scoped structured memory. Each location stores:
67
- * Cumulative environmental description
68
- * Objects discovered
69
- * Actions attempted
70
- * Observation history
71
- * Exploration directions
72
-
73
- **Memory update policy follows a conservative overwrite strategy:**
74
- * Preserve stable facts
75
- * Remove only explicitly contradicted information
76
- * Avoid stylistic rewriting
77
- * Track current object states only
78
-
79
- This enables long-horizon reasoning across revisits.
80
-
81
- ## Loop Prevention Strategy
82
- Repetition traps are a major failure mode in LLM agents. This system introduces multi-layer anti-loop mechanisms.
83
-
84
- ### Tool Oscillation Control
85
- The agent enforces: No non-action tool can be used more than twice consecutively.
86
-
87
- ### Action Blacklisting
88
- The agent tracks all actions attempted in the current location.
89
- **Policy:** Never repeat failed or ineffective actions unless environment state changes.
90
-
91
- ### Stagnation Escape Rule
92
- If progress is not detected after several attempts:
93
- * Change interaction verb.
94
- * Prioritize alternative object manipulation.
95
- * Explore least recently visited directions.
96
-
97
- ## Exploration Policy
98
- The agent maintains a balanced exploration strategy.
99
- **Priority order:**
100
- 1. High-value puzzle-solving interactions
101
- 2. Object manipulation actions
102
- 3. Environment transition actions
103
- 4. Systematic exploration of unexplored space
104
-
105
- Random movement is strictly forbidden. Movement is suggested only when local interactions are exhausted or puzzle progression is unlikely.
106
 
107
- ## Hallucination Control
108
- The planner LLM is constrained by grounding rules.
109
- **Forbidden behaviors include:**
110
- * Introducing objects not mentioned in observation
111
- * Suggesting impossible actions
112
- * Generating vague hints
113
 
114
- All proposed hints must be supported by observation context and compatible with environment physics and parser logic.
115
 
116
  ## Valid Action vs Hint Separation
117
 
@@ -121,38 +54,14 @@ All proposed hints must be supported by observation context and compatible with
121
  | **Promising Hints** | Strategic reasoning suggestions |
122
  | **Planner Memory** | Long-term state tracking |
123
 
124
- This separation improves robustness in partially observable environments.
125
-
126
- ## Location-Level Logging
127
- For each location, the agent records:
128
- * Discovered objects
129
- * Attempted actions and outcomes
130
- * Generated hints
131
- * Observation sequences
132
-
133
- When re-entering a location, cumulative memory is used.
134
-
135
- ## Performance Objective
136
- The agent optimizes the following metric:
137
-
138
- $$Efficiency = \frac{Score}{\max(1, Number\ of\ Moves)}$$
139
-
140
- **Secondary objectives include:**
141
- * Map coverage maximization
142
- * Puzzle completion rate
143
- * Unique object discovery
144
-
145
- ## Evaluation Strategy
146
- The agent is evaluated based on:
147
- * Final game score
148
- * Exploration completeness
149
- * Move efficiency
150
- * Loop avoidance rate
151
- * Puzzle solving success
152
 
153
  ## Summary
154
- This project demonstrates that combining structured memory, constrained LLM planning, and rule-based safety filtering can improve autonomous performance in parser-based interactive fiction environments.
155
-
156
  ---
157
 
158
  ## Files
 
12
  # Autonomous Text Adventure Agent
13
 
14
  ## Abstract
15
+ This project implements an autonomous agent for parser-based games (e.g., Zork) using a structured planning architecture. Unlike naive LLM bots, it combines ReAct reasoning loops with incremental world-state memory and a dual-layer action proposal system to maximize progress while minimizing redundancy.
 
16
 
17
 
 
18
 
19
  ## Design Philosophy
 
20
 
21
  ### 1. State-Aware Reasoning
22
+ The agent maintains a persistent cognitive model per location. Memory is updated incrementally:
23
+ * **Invariant:** Memory reflects the best known approximation of the environment.
24
+ * **Transitions:** Facts are updated only when observations contradict previous knowledge (e.g., updating a door from "closed" to "open" or tracking inventory movement). This prevents semantic drift.
25
 
26
+ ### 2. Dual-Level Action Planning
27
+ * **Valid Action Filtering:** Uses `get_valid_actions()` via MCP as a hallucination safety layer.
28
+ * **Promising Hint Generation:** The LLM generates grounded strategic hints for logically valid but unlisted actions (e.g., "enter house"). These guide selection without bypassing environment constraints.
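
The sketch below is illustrative and not part of the README change. It shows one way the two layers could be combined: actions reported by `get_valid_actions()` are trusted as-is, while planner hints are kept only if they mention an object already seen in the observation. The function names `ground_hints` and `build_candidates`, the substring-based grounding check, and the sample data are all assumptions.

```python
def ground_hints(hints: list[str], known_objects: set[str]) -> list[str]:
    """Keep only hints that mention at least one object seen in the observation."""
    grounded = []
    for hint in hints:
        text = hint.lower()
        if any(obj.lower() in text for obj in known_objects):
            grounded.append(hint)
    return grounded


def build_candidates(
    valid_actions: list[str], hints: list[str], known_objects: set[str]
) -> list[tuple[str, str]]:
    """Merge both layers, tagging each candidate with the layer it came from."""
    # Environment-provided actions come first and are never filtered.
    candidates = [(action, "valid") for action in valid_actions]
    listed = set(valid_actions)
    # Grounded hints extend the list with plausible but unlisted actions.
    for hint in ground_hints(hints, known_objects):
        if hint not in listed:
            candidates.append((hint, "hint"))
    return candidates


candidates = build_candidates(
    valid_actions=["open window", "north"],   # hypothetical get_valid_actions() result
    hints=["enter house", "ride dragon"],     # hypothetical planner output
    known_objects={"house", "window"},
)
```

Here `enter house` survives because "house" was observed, while `ride dragon` is dropped as ungrounded. A real agent would likely need a stronger grounding check than substring matching.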
29
 
30
+ ## Memory & Exploration
31
 
32
+ ### Memory Architecture
33
+ Each location-scoped entry stores: cumulative descriptions, discovered objects, attempted actions, and exploration history. A **conservative overwrite strategy** preserves stable facts and tracks object states across revisits.
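
As an illustration only (not part of the README change), a location-scoped entry with a conservative overwrite policy might look like the following. The class name `LocationMemory`, its field names, and the `update` signature are hypothetical; the source does not show the actual data structure.

```python
from dataclasses import dataclass, field


@dataclass
class LocationMemory:
    """Hypothetical per-location record; field names are illustrative."""

    description: list[str] = field(default_factory=list)
    object_states: dict[str, str] = field(default_factory=dict)
    attempted_actions: list[str] = field(default_factory=list)
    explored_directions: set[str] = field(default_factory=set)

    def update(self, new_facts: list[str], observed_states: dict[str, str]) -> None:
        # Preserve stable facts: append only description lines not already stored.
        for fact in new_facts:
            if fact not in self.description:
                self.description.append(fact)
        # Track current object states only: a contradicting observation
        # replaces the old state instead of being stored alongside it.
        for obj, state in observed_states.items():
            self.object_states[obj] = state


memory = LocationMemory()
memory.update(["A white house stands here."], {"door": "closed"})
# Revisit: same description, but the door is now observed open.
memory.update(["A white house stands here."], {"door": "open"})
```

After the second update, the description is unchanged and the door has a single current state, which matches the "closed" to "open" example in the text.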
34
+
35
+ ### Loop Prevention & Stagnation
36
+ To avoid LLM "repetition traps," the agent enforces:
37
+ * **Tool Oscillation Control:** Limits consecutive non-action tool use.
38
+ * **Action Blacklisting:** Never repeats failed actions unless the state changes.
39
+ * **Stagnation Escape:** If progress halts, the agent switches interaction verbs or moves to the least recently visited area.
40
 
41
 
42
 
43
+ ### Exploration Policy
44
+ Priority: Puzzle-solving > Object manipulation > Transitions > Systematic exploration. Movement is only suggested when local interactions are exhausted; random movement is forbidden.
45
+
46
+ ## Hallucination Control
47
+ The planner is strictly grounded. It cannot invent objects, suggest impossible transitions, or generate vague hints. All proposals must be supported by observation and compatible with game physics.
48
 
49
  ## Valid Action vs Hint Separation
50
 
 
54
  | **Promising Hints** | Strategic reasoning suggestions |
55
  | **Planner Memory** | Long-term state tracking |
56
 
57
+ ## Evaluation
58
+ Success is measured by:
59
+ * **Efficiency:** $Score / \max(1, Moves)$
60
+ * **Completeness:** Map coverage and puzzle success rate.
61
+ * **Robustness:** Unique object discovery and loop avoidance.
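
For reference, the efficiency metric can be computed as in this minimal sketch, which is illustrative and not part of the README change. The function name `efficiency` and the sample numbers are assumptions; the formula itself is the one stated above.

```python
def efficiency(score: int, moves: int) -> float:
    """Score per move; max(1, moves) avoids division by zero before the first move."""
    return score / max(1, moves)


# Hypothetical run: 35 points after 70 moves.
run_efficiency = efficiency(35, 70)
# Before any move is made, the denominator is clamped to 1.
initial_efficiency = efficiency(0, 0)
```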
62
 
63
  ## Summary
64
+ This project demonstrates that structured memory and rule-based safety filtering significantly improve autonomous performance in partially observable text environments.
 
65
  ---
66
 
67
  ## Files