Valentin Badea committed on
Commit · b113a1e
Parent(s): 7a36b3c

Implemented memory-driven agent with two-phase LLM approach (Prioritization/Summarization)

Files changed:
- .gitignore +15 -0
- README.md +27 -6
- agent.py +477 -186
- mcp_server.py +219 -90
- z-machine-games-master +1 -0
.gitignore
CHANGED

@@ -20,3 +20,18 @@ venv/
 # OS
 .DS_Store
 Thumbs.db
+
+# game binaries / collections (do not commit)
+z-machine-games-master/
+**/*.z1
+**/*.z2
+**/*.z3
+**/*.z4
+**/*.z5
+**/*.z6
+**/*.z7
+**/*.z8
+**/*.zblorb
+**/*.blb
+**/*.zip
+
README.md
CHANGED

@@ -14,21 +14,42 @@ license: mit
 
 ## Overview
 
-This
+This agent uses a memory-driven architecture with a two-phase LLM approach to systematically explore text adventure games. At each step, the agent leverages Jericho's API to access valid actions and current location data, maintaining a structured memory dictionary that records location-specific information including tried actions, available actions, promising action subsets, and summarized outcomes.
+
+The core innovation is the dual LLM call strategy: the first call performs strategic action selection with reasoning over promising action subsets (up to 10), and the second summarizes the outcome so the agent actively "listens" to and learns from each action result. This approach balances comprehensive exploration with concise memory management, preventing context overflow while maintaining rich historical knowledge.
 
 ## Approach
 
-
-- What strategy does your agent use?
-- What tools did you implement in your MCP server?
-- Any interesting techniques or optimizations?
+### Memory Architecture
+
+The agent maintains a location-indexed memory dictionary with the following structure:
+- **valid_actions**: Location-specific actions from Jericho's API (verified to work)
+- **tried_actions**: Set of actions already attempted at this location
+- **promising_actions**: LLM-selected subset (max 10) of strategic actions to consider
+- **results**: For each tried action, stores {observation, summary, success, key_info}
+
+This structure allows the agent to return to previously visited locations with full context, enabling informed decision-making even with new inventory or changed game state.
+
+### Two-Phase LLM Strategy
+
+**Phase 1 - Action Selection**: The LLM receives the current observation, game state (score, moves, inventory), and formatted location memory showing valid actions, previous promising actions, and tried actions with concise summaries (not overwhelming full text). The LLM then identifies up to 10 promising actions from the available options and selects the single best action to execute, with explicit reasoning.
+
+**Phase 2 - Outcome Summarization**: After action execution, a second LLM call analyzes the full observation and generates a concise 1-2 sentence summary, a success classification (yes/no/partial), and key information to remember. This summary is stored in memory, forcing the agent to actively process outcomes rather than passively accumulate raw text.
+
+### Key Strategic Features
+
+- **Object-focused exploration**: The system prompt emphasizes that examining and interacting with props/objects is often critical for progress, with explicit guidance to try multiple interaction types (examine, take, open, read, push, pull, turn)
+- **Movement tracking**: Detects object movements in observations and provides hints to follow them
+- **Stagnation detection**: Monitors score progress and warns when exploration becomes circular
+- **Context preservation**: Full observations are archived for debugging while summaries keep prompts manageable
+- **Dynamic re-evaluation**: Always recalculates promising actions on location revisits, accounting for changed context (new items, completed objectives)
 
 ## Files
 
 | File | Description |
 |------|-------------|
-| `agent.py` |
+| `agent.py` | Memory-driven agent with two-phase LLM approach |
 | `mcp_server.py` | MCP server with game interaction tools |
 | `app.py` | Gradio interface for HF Space |
 | `requirements.txt` | Additional dependencies |
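The memory architecture and Phase 2 bookkeeping described above can be sketched in a few lines of Python. This is an illustrative, simplified sketch only: the helper names `new_location_record` and `record_outcome` are invented for the example, and the real agent keeps this state inside its `StudentAgent` class.

```python
# Per-location memory record, mirroring the keys listed in the Approach section.
def new_location_record(valid_actions):
    return {
        "valid_actions": list(valid_actions),
        "tried_actions": set(),
        "promising_actions": [],
        "visited": 0,
        "results": {},  # action -> {observation, summary, success, key_info}
    }

def record_outcome(memory, location, action, observation, summary, success, key_info):
    """Phase 2 bookkeeping: store the summarized outcome for later visits."""
    rec = memory.setdefault(location, new_location_record([]))
    rec["tried_actions"].add(action)
    rec["results"][action] = {
        "observation": observation,   # full text, kept for debugging
        "summary": summary,           # what the agent actually sees on revisits
        "success": success,
        "key_info": key_info,
    }

memory = {}
memory["West of House"] = new_location_record(["open mailbox", "north", "south"])
record_outcome(memory, "West of House", "open mailbox",
               "Opening the small mailbox reveals a leaflet.",
               "Opened the mailbox; it contains a leaflet.", "yes",
               "Leaflet inside mailbox")

# On a revisit, untried valid actions are natural candidates for Phase 1.
untried = [a for a in memory["West of House"]["valid_actions"]
           if a not in memory["West of House"]["tried_actions"]]
print(untried)  # ['north', 'south']
```

Because only the short `summary` strings (not the full observations) are injected into later prompts, the context stays small even after hundreds of moves.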
agent.py
CHANGED

@@ -1,48 +1,32 @@
 """
-
-4. Maximize the game score within the step limit
-
-Required method:
-    async def run(self, client, game, max_steps, seed, verbose) -> RunResult
-
-The 'client' is a FastMCP Client already connected to your MCP server.
-Use it to call tools like: await client.call_tool("play_action", {"action": "look"})
-
-Tips:
-- Start by looking around and understanding your environment
-- Keep track of visited locations to avoid loops
-- Pick up useful items (lamp, sword, etc.)
-- The seed parameter should be used to set your LLM's seed for reproducibility
 """
 
 import json
 import os
 import re
 from dataclasses import dataclass, field
-from typing import Optional
 
 from dotenv import load_dotenv
 from huggingface_hub import InferenceClient
 
-# Load environment variables
 load_dotenv()
 
 # =============================================================================
 # LLM Configuration - DO NOT MODIFY
 # =============================================================================
 
-# Model to use (fixed for fair evaluation)
 LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
 
-# Initialize the LLM client (uses HF_TOKEN from environment)
 _hf_token = os.getenv("HF_TOKEN")
 if not _hf_token:
     raise ValueError("HF_TOKEN not found. Set it in your .env file.")

@@ -50,45 +34,25 @@ if not _hf_token:
 LLM_CLIENT = InferenceClient(token=_hf_token)
 
 
-def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int =
-    """
-    Call the LLM with the given prompt. Use this function in your agent.
-
-    Args:
-        prompt: The user prompt (current game state, history, etc.)
-        system_prompt: The system prompt (instructions for the agent)
-        seed: Random seed for reproducibility
-        max_tokens: Maximum tokens in response (default: 300)
-
-    Returns:
-        The LLM's response text
-
-    Example:
-        response = call_llm(
-            prompt="You are in a forest. What do you do?",
-            system_prompt=SYSTEM_PROMPT,
-            seed=42,
-        )
-    """
     messages = [
         {"role": "system", "content": system_prompt},
         {"role": "user", "content": prompt},
     ]
-
     response = LLM_CLIENT.chat.completions.create(
         model=LLM_MODEL,
         messages=messages,
-        temperature=0.0,
         max_tokens=max_tokens,
         seed=seed,
     )
-
     return response.choices[0].message.content
 
 
 @dataclass
 class RunResult:
-    """Result of running the agent. Do not modify this class."""
     final_score: int
     max_score: int
     moves: int

@@ -99,152 +63,482 @@ class RunResult:
 
 
 # =============================================================================
-# System
 # =============================================================================
 
-
 
-
 
 RESPOND IN THIS EXACT FORMAT (no markdown):
-
 
-Example:
-
 """
 
 
 # =============================================================================
-# Student Agent -
 # =============================================================================
 
 class StudentAgent:
-    """
-    Your ReAct agent implementation.
-
-    TODO:
-    1. Implement the run() method with the ReAct loop
-    2. Parse LLM responses to extract tool calls
-    3. Track state and avoid loops
-
-    Use the provided call_llm() function to interact with the LLM.
-    """
-
     def __init__(self):
-
-        #
-
-        """
-
-        #
-
-        # result = await client.call_tool("play_action", {"action": "look"})
-        # observation = result[0].text if result else "No response"
-
-        # Example of calling the LLM:
-        # response = call_llm(
-        #     prompt="Current observation: " + observation,
-        #     system_prompt=SYSTEM_PROMPT,
-        #     seed=seed,
-        # )
-
-        # Placeholder implementation - replace with your code
-        locations_visited = set()
-        history = []
-        final_score = 0
-        moves = 0
-
-        # TODO: Your implementation here
-        # ...
-
-        return RunResult(
-            final_score=final_score,
-            max_score=350,  # Zork1 max score, adjust if needed
-            moves=moves,
-            locations_visited=locations_visited,
-            game_completed=False,
-            history=history,
-        )
-
-    def _build_prompt(self, observation: str, history: list) -> str:
-        """
-        Build the prompt for the LLM.
-
-        TODO: Implement this to create effective prompts
-        """
-        # TODO: Combine system prompt, history, and current observation
-        pass
-
-    def _parse_response(self, response: str) -> tuple[str, str, dict]:
         """
-        Parse
-
-        TODO: Implement robust parsing
-
-        Returns:
-            Tuple of (thought, tool_name, args_dict)
         """
-
         """
-
-        This is a convenience wrapper - you can also use call_llm() directly.
         """
-

@@ -254,24 +548,21 @@ class StudentAgent:
 async def test_agent():
     """Test the agent locally."""
     from fastmcp import Client
-
-    # Path to your MCP server
-    server_path = "mcp_server.py"
-
     agent = StudentAgent()
-
-    async with Client(
     result = await agent.run(
         client=client,
-        game="
-        max_steps=
         seed=42,
         verbose=True,
     )
-
     print(f"\nFinal Score: {result.final_score}")
     print(f"Moves: {result.moves}")
-    print(f"Locations: {result.locations_visited}")
 
 
 if __name__ == "__main__":
| 1 |
"""
|
| 2 |
+
Memory-driven agent with two-phase LLM approach:
|
| 3 |
+
1. Action Selection: Choose promising actions (max 10) and pick best one
|
| 4 |
+
2. Outcome Summarization: Summarize action result for memory storage
|
| 5 |
+
|
| 6 |
+
Strategy:
|
| 7 |
+
- Location memory tracks: valid_actions, tried_actions, promising_actions, results
|
| 8 |
+
- Results store: {observation, summary, success, key_info} for each action
|
| 9 |
+
- Agent sees concise summaries in context, not overwhelming full observations
|
| 10 |
+
- Forces agent to "listen" by summarizing outcomes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
"""
|
| 12 |
|
| 13 |
import json
|
| 14 |
import os
|
| 15 |
import re
|
| 16 |
from dataclasses import dataclass, field
|
| 17 |
+
from typing import Optional, Dict, Set, Any
|
| 18 |
|
| 19 |
from dotenv import load_dotenv
|
| 20 |
from huggingface_hub import InferenceClient
|
| 21 |
|
|
|
|
| 22 |
load_dotenv()
|
| 23 |
|
| 24 |
# =============================================================================
|
| 25 |
# LLM Configuration - DO NOT MODIFY
|
| 26 |
# =============================================================================
|
| 27 |
|
|
|
|
| 28 |
LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
|
| 29 |
|
|
|
|
| 30 |
_hf_token = os.getenv("HF_TOKEN")
|
| 31 |
if not _hf_token:
|
| 32 |
raise ValueError("HF_TOKEN not found. Set it in your .env file.")
|
|
|
|
| 34 |
LLM_CLIENT = InferenceClient(token=_hf_token)
|
| 35 |
|
| 36 |
|
| 37 |
+
def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 512) -> str:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 38 |
messages = [
|
| 39 |
{"role": "system", "content": system_prompt},
|
| 40 |
{"role": "user", "content": prompt},
|
| 41 |
]
|
| 42 |
+
|
| 43 |
response = LLM_CLIENT.chat.completions.create(
|
| 44 |
model=LLM_MODEL,
|
| 45 |
messages=messages,
|
| 46 |
+
temperature=0.0,
|
| 47 |
max_tokens=max_tokens,
|
| 48 |
seed=seed,
|
| 49 |
)
|
| 50 |
+
|
| 51 |
return response.choices[0].message.content
|
| 52 |
|
| 53 |
|
| 54 |
@dataclass
|
| 55 |
class RunResult:
|
|
|
|
| 56 |
final_score: int
|
| 57 |
max_score: int
|
| 58 |
moves: int
|
|
|
|
| 63 |
|
| 64 |
|
| 65 |
# =============================================================================
|
| 66 |
+
# System Prompts - Two Phase Approach
|
| 67 |
# =============================================================================
|
| 68 |
|
| 69 |
+
ACTION_SELECTION_SYSTEM_PROMPT = """You are playing a text adventure game to maximize score.
|
| 70 |
+
|
| 71 |
+
GOAL: Explore systematically, solve puzzles, collect items, and maximize score.
|
| 72 |
+
|
| 73 |
+
YOU WILL RECEIVE:
|
| 74 |
+
- Current observation
|
| 75 |
+
- Location memory with:
|
| 76 |
+
* VALID ACTIONS (from game engine - verified to work at this location)
|
| 77 |
+
* TRIED ACTIONS with summaries of outcomes (concise, LLM-generated)
|
| 78 |
+
* Previous promising actions (if you've been here before)
|
| 79 |
+
|
| 80 |
+
YOUR TASK - ACTION SELECTION:
|
| 81 |
+
1. Analyze the valid actions available
|
| 82 |
+
2. Consider which actions you've already tried and their outcomes
|
| 83 |
+
3. Identify up to 10 PROMISING ACTIONS from available options
|
| 84 |
+
4. Choose the BEST action to try next
|
| 85 |
+
|
| 86 |
+
STRATEGY GUIDELINES:
|
| 87 |
+
- Prioritize untried actions from the valid list
|
| 88 |
+
- **Objects are key**: When uncertain about next steps, examining or interacting with props/objects in the scene is often critical for progress
|
| 89 |
+
- Pick up valuable items: lamp, lantern, torch, sword, keys, treasures, tools
|
| 90 |
+
- Make sure you have a light sources before entering dark areas
|
| 91 |
+
- Examine objects, open containers, read signs, search rooms thoroughly
|
| 92 |
+
- Try multiple interactions with same object (examine, take, open, read, push, pull, turn)
|
| 93 |
+
- If stagnating (no progress), try different location or different action type
|
| 94 |
+
- Learn from previous outcomes: avoid repeating failures unless context changed
|
| 95 |
+
- If "not see that there": object moved - explore elsewhere
|
| 96 |
+
- Follow object movements mentioned in observations
|
| 97 |
|
| 98 |
+
RESPOND IN THIS EXACT FORMAT (no markdown):
|
| 99 |
+
THOUGHT: <your strategic reasoning>
|
| 100 |
+
PROMISING_ACTIONS: <JSON array of up to 10 promising actions to consider>
|
| 101 |
+
CHOSEN_ACTION: <the single best action to execute>
|
| 102 |
+
REASONING: <why this specific action is best>
|
| 103 |
|
| 104 |
+
Example:
|
| 105 |
+
THOUGHT: I'm in a dark area and need light. Valid actions show "turn on lamp". I haven't tried this yet.
|
| 106 |
+
PROMISING_ACTIONS: ["turn on lamp", "examine lamp", "go back", "look", "inventory"]
|
| 107 |
+
CHOSEN_ACTION: turn on lamp
|
| 108 |
+
REASONING: Lamp is critical for exploring dark areas. Should activate it before moving forward.
|
| 109 |
+
"""
|
| 110 |
|
| 111 |
+
OUTCOME_SUMMARY_SYSTEM_PROMPT = """You are analyzing the outcome of an action in a text adventure game.
|
| 112 |
+
|
| 113 |
+
YOUR TASK - OUTCOME SUMMARIZATION:
|
| 114 |
+
Given an action and its observation result, create a concise summary that captures:
|
| 115 |
+
1. What happened (1-2 sentences max)
|
| 116 |
+
2. Whether it succeeded, partially succeeded, or failed
|
| 117 |
+
3. Key information to remember for future decisions
|
| 118 |
+
|
| 119 |
+
Be CONCISE but capture critical details like:
|
| 120 |
+
- Items acquired/lost
|
| 121 |
+
- New areas discovered
|
| 122 |
+
- Obstacles encountered
|
| 123 |
+
- Score changes
|
| 124 |
+
- Object movements
|
| 125 |
+
- State changes (doors opened, lights turned on, etc.)
|
| 126 |
|
| 127 |
RESPOND IN THIS EXACT FORMAT (no markdown):
|
| 128 |
+
OUTCOME_SUMMARY: <1-2 sentence summary>
|
| 129 |
+
SUCCESS: <yes/no/partial>
|
| 130 |
+
KEY_INFO: <key detail to remember>
|
| 131 |
|
| 132 |
+
Example action: "take lamp"
|
| 133 |
+
Example observation: "Taken. The brass lamp is now in your inventory. [Score: 5 | Moves: 3]"
|
| 134 |
+
|
| 135 |
+
Example response:
|
| 136 |
+
OUTCOME_SUMMARY: Successfully picked up the brass lamp and added it to inventory.
|
| 137 |
+
SUCCESS: yes
|
| 138 |
+
KEY_INFO: Lamp acquired - can use for dark areas
|
| 139 |
"""
|
| 140 |
|
| 141 |
|
| 142 |
# =============================================================================
|
| 143 |
+
# Student Agent with Two-Phase LLM Approach
|
| 144 |
# =============================================================================
|
| 145 |
|
| 146 |
class StudentAgent:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 147 |
def __init__(self):
|
| 148 |
+
self.score = 0
|
| 149 |
+
self.moves = 0
|
| 150 |
+
|
| 151 |
+
# Enhanced location memory
|
| 152 |
+
self.location_memory: Dict[str, Dict[str, Any]] = {}
|
| 153 |
+
# Structure: {
|
| 154 |
+
# "Location Name": {
|
| 155 |
+
# "valid_actions": [...],
|
| 156 |
+
# "tried_actions": set(),
|
| 157 |
+
# "promising_actions": [...],
|
| 158 |
+
# "visited": count,
|
| 159 |
+
# "results": {
|
| 160 |
+
# "action": {
|
| 161 |
+
# "observation": "full text",
|
| 162 |
+
# "summary": "concise summary",
|
| 163 |
+
# "success": "yes/no/partial",
|
| 164 |
+
# "key_info": "important detail"
|
| 165 |
+
# }
|
| 166 |
+
# }
|
| 167 |
+
# }
|
| 168 |
+
# }
|
| 169 |
+
|
| 170 |
+
# Global tracking
|
| 171 |
+
self.locations_visited: set[str] = set()
|
| 172 |
+
self.inventory_items: set[str] = set()
|
| 173 |
+
|
| 174 |
+
# Stagnation detection
|
| 175 |
+
self.last_score_change_move: int = 0
|
| 176 |
+
|
| 177 |
+
# MCP client handle
|
| 178 |
+
self.env_handle = None
|
| 179 |
+
|
| 180 |
+
def _extract_result(self, result) -> str:
|
| 181 |
+
"""Extract text from MCP tool result."""
|
| 182 |
+
if hasattr(result, "content") and result.content:
|
| 183 |
+
return result.content[0].text
|
| 184 |
+
if isinstance(result, list) and result:
|
| 185 |
+
return result[0].text if hasattr(result[0], "text") else str(result[0])
|
| 186 |
+
return str(result)
|
| 187 |
+
|
| 188 |
+
def _parse_action_selection(self, response: str) -> tuple[str, list[str], str]:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 189 |
"""
|
| 190 |
+
Parse action selection response.
|
| 191 |
+
Returns: (thought, promising_actions, chosen_action)
|
|
|
|
|
|
|
|
|
|
|
|
|
| 192 |
"""
|
| 193 |
+
thought = "Proceed with exploration"
|
| 194 |
+
promising_actions = []
|
| 195 |
+
chosen_action = "look"
|
| 196 |
+
reasoning = ""
|
| 197 |
+
|
| 198 |
+
for line in (response or "").splitlines():
|
| 199 |
+
line_stripped = line.strip()
|
| 200 |
+
line_upper = line_stripped.upper()
|
| 201 |
+
|
| 202 |
+
if line_upper.startswith("THOUGHT:"):
|
| 203 |
+
thought = line_stripped.split(":", 1)[1].strip()
|
| 204 |
+
elif line_upper.startswith("PROMISING_ACTIONS:"):
|
| 205 |
+
actions_str = line_stripped.split(":", 1)[1].strip()
|
| 206 |
+
try:
|
| 207 |
+
# Try to parse as JSON array
|
| 208 |
+
promising_actions = json.loads(actions_str)
|
| 209 |
+
if not isinstance(promising_actions, list):
|
| 210 |
+
promising_actions = []
|
| 211 |
+
except:
|
| 212 |
+
# Fallback: comma-separated
|
| 213 |
+
promising_actions = [a.strip().strip('"\'') for a in actions_str.split(",")]
|
| 214 |
+
promising_actions = [a for a in promising_actions if a][:10]
|
| 215 |
+
|
| 216 |
+
elif line_upper.startswith("CHOSEN_ACTION:"):
|
| 217 |
+
chosen_action = line_stripped.split(":", 1)[1].strip()
|
| 218 |
+
chosen_action = chosen_action.strip('"\'').strip()
|
| 219 |
+
elif line_upper.startswith("REASONING:"):
|
| 220 |
+
reasoning = line_stripped.split(":", 1)[1].strip()
|
| 221 |
+
|
| 222 |
+
# Ensure we have at least something
|
| 223 |
+
if not chosen_action or chosen_action == "":
|
| 224 |
+
chosen_action = "look"
|
| 225 |
+
|
| 226 |
+
return thought, promising_actions, chosen_action
|
| 227 |
+
|
| 228 |
+
def _parse_outcome_summary(self, response: str) -> tuple[str, str, str]:
|
| 229 |
"""
|
| 230 |
+
Parse outcome summary response.
|
| 231 |
+
Returns: (summary, success, key_info)
|
|
|
|
| 232 |
"""
|
| 233 |
+
summary = "Action executed"
|
| 234 |
+
success = "unknown"
|
| 235 |
+
key_info = ""
|
| 236 |
+
|
| 237 |
+
for line in (response or "").splitlines():
|
| 238 |
+
line_stripped = line.strip()
|
| 239 |
+
line_upper = line_stripped.upper()
|
| 240 |
+
|
| 241 |
+
if line_upper.startswith("OUTCOME_SUMMARY:"):
|
| 242 |
+
summary = line_stripped.split(":", 1)[1].strip()
|
| 243 |
+
elif line_upper.startswith("SUCCESS:"):
|
| 244 |
+
success = line_stripped.split(":", 1)[1].strip().lower()
|
| 245 |
+
elif line_upper.startswith("KEY_INFO:"):
|
| 246 |
+
key_info = line_stripped.split(":", 1)[1].strip()
|
| 247 |
+
|
| 248 |
+
return summary, success, key_info
|
| 249 |
+
|
| 250 |
+
def _update_score_moves(self, obs: str) -> None:
|
| 251 |
+
"""Extract score and moves from observation."""
|
| 252 |
+
m = re.search(r"\[Score:\s*(\d+)\s*\|\s*Moves:\s*(\d+)\]", obs)
|
| 253 |
+
if m:
|
| 254 |
+
new_score = int(m.group(1))
|
| 255 |
+
if new_score > self.score:
|
| 256 |
+
self.last_score_change_move = int(m.group(2))
|
| 257 |
+
self.score = max(self.score, new_score)
|
| 258 |
+
self.moves = max(self.moves, int(m.group(2)))
|
| 259 |
+
|
| 260 |
+
def _is_game_over(self, obs: str) -> bool:
|
| 261 |
+
t = (obs or "").lower()
|
| 262 |
+
return any(p in t for p in ["game over", "you have died", "you are dead", "*** you have died ***"])
|
| 263 |
+
|
| 264 |
+
async def _get_current_location_name(self, client) -> str:
|
| 265 |
+
"""Get current location name from Jericho's get_player_location()."""
|
| 266 |
+
try:
|
| 267 |
+
res = await client.call_tool("get_player_location", {})
|
| 268 |
+
loc_info = self._extract_result(res)
|
| 269 |
+
return loc_info.strip()
|
| 270 |
+
except Exception as e:
|
| 271 |
+
return "Unknown"
|
| 272 |
+
|
| 273 |
+
async def _get_valid_actions_for_location(self, client) -> list[str]:
|
| 274 |
+
"""Get valid actions from Jericho API."""
|
| 275 |
+
try:
|
| 276 |
+
res = await client.call_tool("get_valid_actions", {})
|
| 277 |
+
actions_str = self._extract_result(res)
|
| 278 |
+
|
| 279 |
+
if actions_str.startswith("{"):
|
| 280 |
+
data = json.loads(actions_str)
|
| 281 |
+
return data.get("actions", [])
|
| 282 |
+
|
| 283 |
+
return [a.strip() for a in actions_str.split(",") if a.strip()]
|
| 284 |
+
except Exception:
|
| 285 |
+
return []
|
| 286 |
+
|
| 287 |
+
def _format_location_memory_for_action_selection(self, loc_name: str) -> str:
|
| 288 |
+
"""Format location memory for action selection prompt."""
|
| 289 |
+
if loc_name not in self.location_memory:
|
| 290 |
+
return "=== LOCATION MEMORY: First visit - no memory yet ===\n"
|
| 291 |
+
|
| 292 |
+
mem = self.location_memory[loc_name]
|
| 293 |
+
visit_count = mem["visited"]
|
| 294 |
+
valid_actions = mem["valid_actions"]
|
| 295 |
+
tried_actions = mem["tried_actions"]
|
| 296 |
+
results = mem["results"]
|
| 297 |
+
promising_actions = mem.get("promising_actions", [])
|
| 298 |
+
|
| 299 |
+
parts = [f"=== LOCATION MEMORY: {loc_name} (visited {visit_count} times) ==="]
|
| 300 |
+
|
| 301 |
+
# Valid actions from game engine
|
| 302 |
+
parts.append(f"\nVALID ACTIONS ({len(valid_actions)} available):")
|
| 303 |
+
parts.append(f"{', '.join(sorted(valid_actions))}")
|
| 304 |
+
parts.append("NOTE: These are location-specific actions verified by game engine.")
|
| 305 |
+
parts.append("Universal commands (look, inventory, wait) also work but aren't listed here.")
|
| 306 |
+
|
| 307 |
+
# Previous promising actions (if any)
|
| 308 |
+
if promising_actions:
|
| 309 |
+
parts.append(f"\nPREVIOUS PROMISING ACTIONS:")
|
| 310 |
+
parts.append(f"{', '.join(promising_actions)}")
|
| 311 |
+
|
| 312 |
+
# Tried actions with SUMMARIES (not full observations)
|
| 313 |
+
if results:
|
| 314 |
+
parts.append(f"\nTRIED ACTIONS ({len(tried_actions)} total):")
|
| 315 |
+
# Show most recent 8 with summaries
|
| 316 |
+
for action, action_result in list(results.items())[-8:]:
|
| 317 |
+
summary = action_result.get("summary", "No summary")
|
| 318 |
+
success = action_result.get("success", "unknown")
|
| 319 |
+
key_info = action_result.get("key_info", "")
|
| 320 |
+
|
| 321 |
+
status_icon = "+" if success == "yes" else "-" if success == "no" else "~"
|
| 322 |
+
parts.append(f" [{status_icon}] {action}")
|
| 323 |
+
+                parts.append(f"   → {summary}")
+                if key_info:
+                    parts.append(f"   [*] {key_info}")
+        else:
+            parts.append("\nTRIED ACTIONS: (none yet at this location)")
+
+        return "\n".join(parts)
+
+    def _build_action_selection_prompt(self, observation: str, current_location: str) -> str:
+        """Build prompt for Phase 1: Action Selection."""
+        parts = []
+
+        # Game state
+        parts.append("=== GAME STATE ===")
+        parts.append(f"Score: {self.score} | Moves: {self.moves}")
+        parts.append(f"Locations visited: {len(self.locations_visited)}")
+
+        if self.inventory_items:
+            parts.append(f"Inventory: {', '.join(sorted(self.inventory_items))}")
+
+        # Location memory (key context)
+        parts.append("\n" + self._format_location_memory_for_action_selection(current_location))
+
+        # Current observation
+        parts.append("\n=== CURRENT OBSERVATION ===")
+        parts.append(observation)
+
+        # Add strategic hints based on game state
+        hints = []
+
+        # Stagnation warning
+        moves_since_progress = self.moves - self.last_score_change_move
+        if moves_since_progress > 10:
+            hints.append(f"[!] No score progress in {moves_since_progress} moves!")
+            hints.append("    Consider: exploring new locations or trying different action types")
+
+        # Check for "not see that there" patterns
+        if current_location in self.location_memory:
+            mem = self.location_memory[current_location]
+            recent_failures = sum(1 for action, result in list(mem["results"].items())[-3:]
+                                  if "not see that there" in result.get("observation", "").lower())
+            if recent_failures >= 2:
+                hints.append("[!] Multiple 'not see that there' errors - object likely moved elsewhere!")
+
+        # Check for object movement in observation
+        obs_lower = observation.lower()
+        if any(phrase in obs_lower for phrase in ["ran to", "run to", "went to", "moved to"]):
+            hints.append("[+] Object movement detected in observation - consider following it!")
+
+        if hints:
+            parts.append("\n=== STRATEGIC HINTS ===")
+            parts.extend(hints)
+
+        parts.append("\n=== YOUR TASK ===")
+        parts.append("Select up to 10 promising actions and choose the best one to execute.")
+
+        return "\n".join(parts)
+
+    def _build_outcome_summary_prompt(self, action: str, observation: str) -> str:
+        """Build prompt for Phase 2: Outcome Summarization."""
+        return f"""Action executed: "{action}"
+
+Observation received:
+{observation}
+
+Analyze this outcome and provide a concise summary."""
+
+    async def run(self, client, game: str, max_steps: int, seed: int, verbose: bool = False) -> RunResult:
+        result_history: list[tuple[str, str, str]] = []
+        moves_used = 0
+
+        # Discover available tools
+        tools = await client.list_tools()
+        tool_names = {t.name for t in tools}
+
+        # Initial look
+        try:
+            res = await client.call_tool("play_action", {"action": "look"})
+            obs = self._extract_result(res)
+            moves_used += 1
+            self._update_score_moves(obs)
+        except Exception as e:
+            return RunResult(self.score, 350, moves_used, self.locations_visited, False, error=str(e), history=result_history)
+
+        if verbose:
+            print(f"\n{obs}")
+
+        # Initial inventory check
+        if "inventory" in tool_names:
+            try:
+                inv_res = await client.call_tool("inventory", {})
+                inv_text = self._extract_result(inv_res).lower()
+                moves_used += 1
+
+                for item in ["torch", "lamp", "lantern", "sword", "key"]:
+                    if item in inv_text:
+                        self.inventory_items.add(item)
+
+                if verbose and self.inventory_items:
+                    print(f"[Starting inventory: {', '.join(self.inventory_items)}]")
+            except Exception:
+                pass
+
+        # Main game loop
+        for step in range(1, max_steps - moves_used + 1):
+            # Get current location
+            current_location = await self._get_current_location_name(client)
+            self.locations_visited.add(current_location)
+
+            # Initialize or update location memory
+            if current_location not in self.location_memory:
+                valid_actions = await self._get_valid_actions_for_location(client)
+
+                self.location_memory[current_location] = {
+                    "tried_actions": set(),
+                    "valid_actions": valid_actions,
+                    "promising_actions": [],
+                    "visited": 1,
+                    "results": {},
+                }
+
+                if verbose:
+                    print(f"\n[New location: {current_location}]")
+                    print(f"[Valid actions: {len(valid_actions)}]")
+            else:
+                self.location_memory[current_location]["visited"] += 1
+
+            # ========================================================
+            # PHASE 1: ACTION SELECTION (LLM Call #1)
+            # ========================================================
+            prompt1 = self._build_action_selection_prompt(obs, current_location)
+            llm_response1 = call_llm(prompt1, ACTION_SELECTION_SYSTEM_PROMPT, seed + step, max_tokens=512)
+
+            thought, promising_actions, chosen_action = self._parse_action_selection(llm_response1)
+
+            # Store promising actions in memory
+            self.location_memory[current_location]["promising_actions"] = promising_actions
+
+            if verbose:
+                print(f"\n--- Step {step} ---")
+                print(f"THOUGHT: {thought}")
+                print(f"PROMISING: {promising_actions}")
+                print(f"CHOSEN: {chosen_action}")
+
+            # ========================================================
+            # EXECUTE ACTION
+            # ========================================================
+            try:
+                res = await client.call_tool("play_action", {"action": chosen_action})
+                obs = self._extract_result(res)
+                moves_used += 1
+
+                # Track score changes
+                old_score = self.score
+                self._update_score_moves(obs)
+                if self.score > old_score:
+                    self.last_score_change_move = moves_used
+
+                if verbose:
+                    print(f"Observation: {obs}")
+
+                # ========================================================
+                # PHASE 2: OUTCOME SUMMARIZATION (LLM Call #2)
+                # ========================================================
+                prompt2 = self._build_outcome_summary_prompt(chosen_action, obs)
+                llm_response2 = call_llm(prompt2, OUTCOME_SUMMARY_SYSTEM_PROMPT, seed + step + 10000, max_tokens=256)
+
+                summary, success, key_info = self._parse_outcome_summary(llm_response2)
+
+                if verbose:
+                    print(f"SUMMARY: {summary}")
+                    print(f"SUCCESS: {success}")
+                    if key_info:
+                        print(f"KEY_INFO: {key_info}")
+
+                # ========================================================
+                # UPDATE MEMORY with summarized outcome
+                # ========================================================
+                mem = self.location_memory[current_location]
+                mem["tried_actions"].add(chosen_action)
+                mem["results"][chosen_action] = {
+                    "observation": obs,    # Full text preserved
+                    "summary": summary,    # LLM-generated summary
+                    "success": success,    # yes/no/partial
+                    "key_info": key_info,  # Important detail
+                }
+
+                # Update inventory tracking
+                if "take" in chosen_action.lower() and success == "yes":
+                    words = chosen_action.split()
+                    if len(words) >= 2:
+                        item = words[-1]
+                        self.inventory_items.add(item)
+
+                # Record in history
+                result_history.append((thought, f"play_action({chosen_action})", obs))
+
+                if self._is_game_over(obs):
+                    break
+
+            except Exception as e:
+                result_history.append((thought, f"play_action({chosen_action})", f"Error: {e}"))
+                return RunResult(self.score, 350, moves_used, self.locations_visited, False, error=str(e), history=result_history)
+
+            if moves_used >= max_steps:
+                break
+
+        return RunResult(
+            final_score=self.score,
+            max_score=350,
+            moves=moves_used,
+            locations_visited=self.locations_visited,
+            game_completed=self._is_game_over(obs),
+            history=result_history,
+        )
 
 
 # =============================================================================
 
 async def test_agent():
     """Test the agent locally."""
     from fastmcp import Client
+
     agent = StudentAgent()
+
+    async with Client("mcp_server.py") as client:
         result = await agent.run(
             client=client,
+            game="lostpig",
+            max_steps=50,
             seed=42,
             verbose=True,
         )
+
     print(f"\nFinal Score: {result.final_score}")
     print(f"Moves: {result.moves}")
+    print(f"Locations: {len(result.locations_visited)}")
 
 
 if __name__ == "__main__":
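The per-location memory record that `run()` builds can be exercised in isolation. The sketch below mirrors the `location_memory` entry layout from the loop above; the `untried_actions` helper is hypothetical (not part of the committed agent) and just shows how the tried/untried split falls out of the stored sets:

```python
# Sketch of one location_memory entry as created in run(); the helper
# untried_actions() is an illustrative assumption, not agent code.

def untried_actions(mem: dict) -> list[str]:
    """Valid actions at this location that have not been executed yet."""
    return [a for a in mem["valid_actions"] if a not in mem["tried_actions"]]

mem = {
    "tried_actions": {"take sword"},
    "valid_actions": ["take sword", "north", "open mailbox"],
    "promising_actions": [],
    "visited": 1,
    "results": {
        "take sword": {
            "observation": "Taken.",
            "summary": "Picked up the sword",
            "success": "yes",
            "key_info": "",
        },
    },
}

print(untried_actions(mem))  # ['north', 'open mailbox']
```

Because `tried_actions` is a set and `valid_actions` comes fresh from Jericho on first visit, the difference between the two is what the Phase 1 prompt can offer as unexplored options.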
mcp_server.py
CHANGED

@@ -49,41 +49,67 @@ class GameManager:
     """
     Manages the text adventure game state.
 
-    - Action history (for memory tool)
     - Current score and moves
     """
 
     def __init__(self):
         self.env: TextAdventureEnv = None
         self.state = None
         self.game_name: str = ""
-        #
-        # self.current_location: str = ""
 
     def initialize(self, game: str = "zork1"):
         """Initialize or reset the game."""
         self.game_name = game
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
-
         return self.state.observation
 
     def step(self, action: str) -> str:
         """Execute an action and return the result."""
         if self.env is None:
             self.initialize()
-
         self.state = self.env.step(action)
 
-        # Update location tracking, etc.
-
-        return self.state.observation
 
     def get_score(self) -> int:
         """Get current score."""
@@ -97,17 +123,6 @@ class GameManager:
 # Global game manager
 _game = GameManager()
 
-
-def get_game() -> GameManager:
-    """Get or initialize the game manager."""
-    global _game
-    if _game.env is None:
-        # Get game from environment variable (set by evaluator)
-        game = os.environ.get("GAME", "zork1")
-        _game.initialize(game)
-    return _game
-
-
 # =============================================================================
 # MCP Tools - IMPLEMENT THESE
 # =============================================================================
@@ -132,72 +147,186 @@ def play_action(action: str) -> str:
     """
     game = get_game()
 
-    #
-    #
 
 # =============================================================================
     """
     Manages the text adventure game state.
 
+    Tracks:
+    - Action history (for memory tool, though agent manages its own memory)
+    - Current location (using Jericho API)
     - Current score and moves
+
+    Note: The agent handles its own location_memory system and doesn't rely
+    on server-side tracking beyond the core MCP tools.
     """
 
     def __init__(self):
         self.env: TextAdventureEnv = None
         self.state = None
         self.game_name: str = ""
+        # State tracking (for optional tools - agent manages its own memory)
+        self.history: list[tuple[str, str]] = []  # (action, observation)
+        self.current_location: str = "Unknown"
 
     def initialize(self, game: str = "zork1"):
         """Initialize or reset the game."""
         self.game_name = game
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
+
+        # Reset tracking data
+        self.history = []
+        self.current_location = self._get_player_location_internal()
         return self.state.observation
 
+    def _get_player_location_internal(self) -> str:
+        """Get the current player location name using the Jericho API."""
+        if self.env and hasattr(self.env, 'env') and self.env.env:
+            try:
+                # Access Jericho's get_player_location() which returns a ZObject
+                loc_obj = self.env.env.get_player_location()
+                # ZObject has a .name attribute
+                if hasattr(loc_obj, 'name'):
+                    return loc_obj.name
+            except Exception:
+                pass
+        return "Unknown"
+
     def step(self, action: str) -> str:
         """Execute an action and return the result."""
         if self.env is None:
             self.initialize()
+
         self.state = self.env.step(action)
+        obs = self.state.observation
+
+        # Record history
+        self.history.append((action, obs))
+        if len(self.history) > 50:
+            self.history = self.history[-50:]
+
+        # Update current location using Jericho API
+        self.current_location = self._get_player_location_internal()
 
+        return obs
+
 
     def get_score(self) -> int:
         """Get current score."""
 
 # Global game manager
 _game = GameManager()
 
 # =============================================================================
 # MCP Tools - IMPLEMENT THESE
 # =============================================================================
     """
     game = get_game()
 
+    obs = game.step(action)
+
+    # Append score/moves info for the agent
+    score_info = f"\n\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
+
+    # If the environment exposes reward/done, include them (optional but helpful)
+    try:
+        if getattr(game.state, "reward", 0) and game.state.reward > 0:
+            score_info = f"\n\n+{game.state.reward} points! (Total: {game.get_score()})"
+    except Exception:
+        pass
+
+    done_info = ""
+    try:
+        if getattr(game.state, "done", False):
+            done_info = "\n\nGAME OVER"
+    except Exception:
+        pass
+
+    return obs + score_info + done_info
+
+
+@mcp.tool()
+def memory() -> str:
+    """
+    Get a summary of the current game state.
+
+    Returns:
+        A summary including current location, score, moves, and recent history
+    """
+    game = get_game()
+
+    recent = game.history[-5:] if game.history else []
+    if recent:
+        recent_str = "\n".join([f"  > {a} -> {obs[:80]}..." for a, obs in recent])
+    else:
+        recent_str = "  (none yet)"
+
+    return (
+        "Current State:\n"
+        f"- Location: {game.current_location}\n"
+        f"- Score: {game.get_score()} points\n"
+        f"- Moves: {game.get_moves()}\n"
+        f"- Game: {game.game_name}\n\n"
+        "Recent Actions:\n"
+        f"{recent_str}\n\n"
+        "Current Observation:\n"
+        f"{game.state.observation if game.state else ''}"
+    )
+
+
+@mcp.tool()
+def get_map() -> str:
+    """
+    Get a map of explored locations.
+
+    Note: This tool is not used by the current agent implementation.
+    The agent manages its own location memory internally.
+
+    Returns:
+        A message indicating the agent doesn't use this tool
+    """
+    game = get_game()
+    return (
+        f"Current location: {game.current_location}\n\n"
+        "Note: The agent manages location tracking internally via its location_memory system."
+    )
+
+
+@mcp.tool()
+def inventory() -> str:
+    """Check what items you are currently carrying."""
+    game = get_game()
+
+    items = []
+    try:
+        if getattr(game.state, "inventory", None):
+            items = game.state.inventory
+    except Exception:
+        items = []
+
+    if not items:
+        return "Inventory: You are empty-handed."
+
+    # Convert items to readable names
+    item_names = []
+    for item in items:
+        s = str(item)
+        s_lower = s.lower()
+        if "parent" in s_lower:
+            idx = s_lower.index("parent")
+            name = s[:idx].strip()
+            if ":" in name:
+                name = name.split(":", 1)[1].strip()
+            item_names.append(name)
+        elif ":" in s:
+            item_names.append(s.split(":", 1)[1].strip())
+        else:
+            item_names.append(s)
+
+    return "Inventory: " + ", ".join(item_names)
+
+
+def get_game() -> GameManager:
+    """Get or initialize the game manager."""
+    global _game
+
+    game_name = os.environ.get("GAME", "zork1")
+
+    # (Re)initialize on first use or when the GAME env var changes
+    if _game.env is None or _game.game_name != game_name:
+        _game.initialize(game_name)
+
+    return _game
+
+
+@mcp.tool()
+def get_player_location() -> str:
+    """
+    Get the current player location name from Jericho's location tracking.
+
+    Returns:
+        The name of the current location (e.g., "West of House", "Forest")
+    """
+    game = get_game()
+
+    if game.env and hasattr(game.env, 'env') and game.env.env:
+        try:
+            # Access Jericho's get_player_location() which returns a ZObject
+            loc_obj = game.env.env.get_player_location()
+            # ZObject has a .name attribute
+            if hasattr(loc_obj, 'name'):
+                return loc_obj.name
+        except Exception:
+            pass
+
+    # Fallback: use the manager's tracked location
+    return game.current_location
+
+
+@mcp.tool()
+def get_valid_actions() -> str:
+    """
+    Get valid actions from Jericho's action space at the current location.
+
+    Returns:
+        JSON string with valid actions: {"available": true, "actions": [...], "count": N}
+    """
+    import json
+
+    game = get_game()
+
+    if game.env and hasattr(game.env, 'env') and game.env.env:
+        try:
+            # CRITICAL: use_parallel=False to prevent deadlock on Lost Pig
+            valid_actions = game.env.env.get_valid_actions(
+                use_object_tree=True,
+                use_ctypes=True,
+                use_parallel=False  # Prevent multiprocessing deadlock
+            )
+
+            return json.dumps({
+                "available": True,
+                "actions": valid_actions,
+                "count": len(valid_actions),
+                "source": "jericho"
+            })
+        except Exception as e:
+            return json.dumps({
+                "available": False,
+                "error": str(e),
+                "actions": [],
+                "count": 0
+            })
+
+    return json.dumps({
+        "available": False,
+        "error": "Game environment not initialized",
+        "actions": [],
+        "count": 0
+    })
 
 
 # =============================================================================
z-machine-games-master
ADDED

@@ -0,0 +1 @@
+../z-machine-games-master