Spaces:
Sleeping
Chloé Court committed on
Commit ·
9c0dbe0
1
Parent(s): 7a36b3c
Submission
Browse files
- README.md +61 -7
- agent.py +597 -210
- mcp_server.py +192 -119
- requirements.txt +15 -7
- utils.py +42 -0
README.md
CHANGED
|
@@ -10,19 +10,72 @@ pinned: false
|
|
| 10 |
license: mit
|
| 11 |
---
|
| 12 |
|
| 13 |
-
# Text Adventure Agent
|
| 14 |
|
| 15 |
## Overview
|
|
|
|
| 16 |
|
| 17 |
-
|
| 18 |
|
| 19 |
-
|
| 20 |
|
| 21 |
-
<!-- Describe your approach here -->
|
| 22 |
|
| 23 |
-
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
|
| 27 |
## Files
|
| 28 |
|
|
@@ -30,6 +83,7 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
|
|
| 30 |
|------|-------------|
|
| 31 |
| `agent.py` | ReAct agent with `StudentAgent` class |
|
| 32 |
| `mcp_server.py` | MCP server with game interaction tools |
|
|
|
|
| 33 |
| `app.py` | Gradio interface for HF Space |
|
| 34 |
| `requirements.txt` | Additional dependencies |
|
| 35 |
|
|
|
|
| 10 |
license: mit
|
| 11 |
---
|
| 12 |
|
| 13 |
+
# Autonomous Text Adventure Agent
|
| 14 |
|
| 15 |
## Overview
|
| 16 |
+
This project implements an autonomous text adventure agent designed to master parser-based interactive fiction (e.g., *Zork*). Unlike simple scripted bots, this agent utilizes a **ReAct-style reasoning loop** paired with an **MCP (Model Context Protocol) server** to manage structured memory and strategic planning.
|
| 17 |
|
| 18 |
+
### Primary Objectives
|
| 19 |
+
* **Systematic Exploration:** Map and traverse complex game worlds.
|
| 20 |
+
* **Logic Puzzle Solving:** Interact with objects to unlock progression.
|
| 21 |
+
* **Loop Prevention:** Identify and break repetitive cycles or stagnant states.
|
| 22 |
+
* **State Consistency:** Maintain an accurate, persistent mental model of the world.
|
| 23 |
+
* **Efficiency:** Maximize the game score while minimizing unnecessary moves.
|
| 24 |
|
| 25 |
+
---
|
| 26 |
+
|
| 27 |
+
## Core Architecture
|
| 28 |
+
The agent operates on a three-layer decision model that ensures every action is grounded in observation and strategic intent.
|
| 29 |
+
|
| 30 |
+
|
| 31 |
+
1. **Observation Input:** Raw text from the game engine is parsed.
|
| 32 |
+
2. **Planner & Memory Update:** The LLM updates the cumulative world state.
|
| 33 |
+
3. **Tool Selection:** Reasoning logic picks the best tool/action based on policy constraints.
|
| 34 |
+
4. **Environment Interaction:** The command is executed via the MCP interface.
|
| 35 |
+
|
| 36 |
+
---
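The three-layer loop above can be sketched in a few lines; this is an illustrative skeleton (names are hypothetical), the real loop lives in `StudentAgent.run()` in `agent.py`:

```python
# Skeleton of the decision loop: observation in, command out.
# The callables stand in for the planner LLM and the MCP client.
def react_step(observation, memory, choose_tool, execute):
    # 1. observation arrives as raw game text
    memory.update(observation)           # 2. planner & memory update
    tool, args = choose_tool(memory)     # 3. tool selection under policy
    return execute(tool, args)           # 4. environment interaction via MCP
```

Each step returns the next observation, which feeds the next iteration of the loop.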
|
| 37 |
+
|
| 38 |
+
## Structured Memory System
|
| 39 |
+
The agent treats each location as an independent world substate. Memory is **incremental**, meaning it evolves with the agent's discoveries rather than being wiped.
|
| 40 |
+
|
| 41 |
+
### Location Memory Schema
|
| 42 |
+
For every discovered room, the agent tracks:
|
| 43 |
+
* **Objects:** Visible and interactable items.
|
| 44 |
+
* **Action History:** Commands already attempted and their results.
|
| 45 |
+
* **Topology:** Explored vs. unexplored directions.
|
| 46 |
+
* **Context:** Cumulative summaries and strategic hints.
|
| 47 |
+
|
| 48 |
+
> **Key Principle:** Preserve previously known facts unless an observation explicitly contradicts them (e.g., "The door is now open").
|
| 49 |
+
|
| 50 |
+
---
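Concretely, each room gets one record with these fields; this sketch mirrors the structure initialised in `StudentAgent.run()` in `agent.py`:

```python
# Per-location memory record, one per discovered room.
def new_location_record(observation: str) -> dict:
    return {
        "objects_seen": set(),         # visible / interactable items
        "actions_done": set(),         # commands already attempted here
        "directions_explored": set(),  # (direction, destination) pairs
        "promising_hints": set(),      # strategic hints from the planner
        "memory": observation,         # cumulative textual summary
        "observations_seen": {observation},
        "valid_actions": set(),        # from get_valid_actions
    }
```

Because the record is only ever updated in place, previously known facts survive until an observation contradicts them.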
|
| 51 |
|
|
|
|
| 52 |
|
| 53 |
+
## Anti-Loop & Stagnation Policy
|
| 54 |
+
To prevent getting "stuck," the agent follows strict rules:
|
| 55 |
+
* **No Oscillation:** Tools cannot be toggled more than twice consecutively.
|
| 56 |
+
* **Action Blacklisting:** Actions that have already been done are logged and avoided until the environment state changes.
|
| 57 |
+
* **Stagnation Escape:** If progress halts, the agent is forced to switch interaction verbs or backtrack to the "least recently visited" area.
|
| 58 |
+
|
| 59 |
+
---
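The oscillation rule reduces to a small check; this is a sketch (the helper name and tool-history list are illustrative, not taken from `agent.py`):

```python
# If the two most recent steps both used a non-play_action tool,
# force a play_action exploration step instead.
NON_PLAY_TOOLS = {"memory", "get_map", "inventory", "get_valid_actions"}

def should_force_exploration(recent_tools: list[str]) -> bool:
    last_two = recent_tools[-2:]
    return len(last_two) == 2 and all(t in NON_PLAY_TOOLS for t in last_two)
```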
|
| 60 |
+
|
| 61 |
+
## MCP Tool Interface
|
| 62 |
+
The agent interacts with the game through a standardized toolset:
|
| 63 |
+
|
| 64 |
+
* `play_action`: Executes commands (e.g., "north", "take lamp").
|
| 65 |
+
* `memory`: Retrieves the structured world state.
|
| 66 |
+
* `inventory`: Lists currently held items.
|
| 67 |
+
* `get_map`: Visualizes explored connections for navigation.
|
| 68 |
+
* `get_valid_actions`: Filters plausible commands to reduce hallucinations.
|
| 69 |
+
|
| 70 |
+
---
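Tool calls are extracted from the LLM's `THOUGHT`/`TOOL`/`ARGS` reply before being dispatched; a minimal parsing sketch (the exact regexes in `agent.py` may differ):

```python
import json
import re

# Parse one "THOUGHT / TOOL / ARGS" reply into a dispatchable tool call.
def parse_tool_call(response: str) -> tuple[str, str, dict]:
    thought = re.search(r"THOUGHT:\s*(.*)", response)
    tool = re.search(r"TOOL:\s*(\w+)", response)
    args = re.search(r"ARGS:\s*(\{.*\})", response, re.DOTALL)
    return (
        thought.group(1).strip() if thought else "",
        tool.group(1) if tool else "play_action",  # safe fallback
        json.loads(args.group(1)) if args else {},
    )

reply = 'THOUGHT: I should look around.\nTOOL: play_action\nARGS: {"action": "look"}'
thought, tool, args = parse_tool_call(reply)
```

The fallback to `play_action` keeps the agent moving even when the reply is malformed.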
|
| 71 |
+
|
| 72 |
+
## Performance Metrics
|
| 73 |
+
Progress is measured by an **Efficiency Ratio**:
|
| 74 |
+
$$Efficiency = \frac{Score}{\max(1, Moves)}$$
|
| 75 |
+
|
| 76 |
+
The agent also tracks unique object discoveries and the total percentage of the map explored.
|
| 77 |
+
|
| 78 |
+
---
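The ratio maps directly to one line of Python, matching the computation at the end of `StudentAgent.run()`:

```python
# Score per move, guarded against division by zero before the first move.
def efficiency(score: int, moves: int) -> float:
    return score / max(1, moves)
```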
|
| 79 |
|
| 80 |
## Files
|
| 81 |
|
|
|
|
| 83 |
|------|-------------|
|
| 84 |
| `agent.py` | ReAct agent with `StudentAgent` class |
|
| 85 |
| `mcp_server.py` | MCP server with game interaction tools |
|
| 86 |
+
| `utils.py` | Useful shared functions |
|
| 87 |
| `app.py` | Gradio interface for HF Space |
|
| 88 |
| `requirements.txt` | Additional dependencies |
|
| 89 |
|
agent.py
CHANGED
|
@@ -24,256 +24,643 @@ Tips:
|
|
| 24 |
"""
|
| 25 |
|
| 26 |
import json
|
| 27 |
-
import os
|
| 28 |
import re
|
| 29 |
from dataclasses import dataclass, field
|
| 30 |
from typing import Optional
|
| 31 |
|
| 32 |
-
from dotenv import load_dotenv
|
| 33 |
-
from huggingface_hub import InferenceClient
|
| 34 |
-
|
| 35 |
-
# Load environment variables
|
| 36 |
-
load_dotenv()
|
| 37 |
|
| 38 |
# =============================================================================
|
| 39 |
-
# LLM Configuration
|
| 40 |
# =============================================================================
|
| 41 |
|
| 42 |
-
# Model to use (fixed for fair evaluation)
|
| 43 |
-
LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
|
| 44 |
-
|
| 45 |
-
# Initialize the LLM client (uses HF_TOKEN from environment)
|
| 46 |
-
_hf_token = os.getenv("HF_TOKEN")
|
| 47 |
-
if not _hf_token:
|
| 48 |
-
raise ValueError("HF_TOKEN not found. Set it in your .env file.")
|
| 49 |
-
|
| 50 |
-
LLM_CLIENT = InferenceClient(token=_hf_token)
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
|
| 54 |
-
"""
|
| 55 |
-
Call the LLM with the given prompt. Use this function in your agent.
|
| 56 |
-
|
| 57 |
-
Args:
|
| 58 |
-
prompt: The user prompt (current game state, history, etc.)
|
| 59 |
-
system_prompt: The system prompt (instructions for the agent)
|
| 60 |
-
seed: Random seed for reproducibility
|
| 61 |
-
max_tokens: Maximum tokens in response (default: 300)
|
| 62 |
-
|
| 63 |
-
Returns:
|
| 64 |
-
The LLM's response text
|
| 65 |
-
|
| 66 |
-
Example:
|
| 67 |
-
response = call_llm(
|
| 68 |
-
prompt="You are in a forest. What do you do?",
|
| 69 |
-
system_prompt=SYSTEM_PROMPT,
|
| 70 |
-
seed=42,
|
| 71 |
-
)
|
| 72 |
-
"""
|
| 73 |
-
messages = [
|
| 74 |
-
{"role": "system", "content": system_prompt},
|
| 75 |
-
{"role": "user", "content": prompt},
|
| 76 |
-
]
|
| 77 |
-
|
| 78 |
-
response = LLM_CLIENT.chat.completions.create(
|
| 79 |
-
model=LLM_MODEL,
|
| 80 |
-
messages=messages,
|
| 81 |
-
temperature=0.0, # Deterministic for reproducibility
|
| 82 |
-
max_tokens=max_tokens,
|
| 83 |
-
seed=seed,
|
| 84 |
-
)
|
| 85 |
-
|
| 86 |
-
return response.choices[0].message.content
|
| 87 |
-
|
| 88 |
|
| 89 |
@dataclass
|
| 90 |
class RunResult:
|
| 91 |
-
"""Result of running the agent. Do not modify this class."""
|
| 92 |
final_score: int
|
| 93 |
max_score: int
|
| 94 |
moves: int
|
| 95 |
locations_visited: set[str]
|
| 96 |
game_completed: bool
|
|
| 97 |
error: Optional[str] = None
|
| 98 |
-
history: list[dict] = field(default_factory=list)
|
| 99 |
|
| 100 |
|
| 101 |
# =============================================================================
|
| 102 |
-
# System Prompt
|
| 103 |
# =============================================================================
|
| 104 |
|
| 105 |
-
SYSTEM_PROMPT = """
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
|
| 121 |
TOOL: <tool_name>
|
| 122 |
-
ARGS: <JSON arguments>
|
| 123 |
-
|
| 124 |
-
Example:
|
| 125 |
-
THOUGHT: I should look around to see where I am.
|
| 126 |
-
TOOL: play_action
|
| 127 |
-
ARGS: {"action": "look"}
|
| 128 |
"""
|
| 129 |
|
| 130 |
-
|
| 131 |
# =============================================================================
|
| 132 |
-
#
|
| 133 |
# =============================================================================
|
| 134 |
|
| 135 |
class StudentAgent:
|
| 136 |
-
"""
|
| 137 |
-
Your ReAct agent implementation.
|
| 138 |
-
|
| 139 |
-
TODO:
|
| 140 |
-
1. Implement the run() method with the ReAct loop
|
| 141 |
-
2. Parse LLM responses to extract tool calls
|
| 142 |
-
3. Track state and avoid loops
|
| 143 |
-
|
| 144 |
-
Use the provided call_llm() function to interact with the LLM.
|
| 145 |
-
"""
|
| 146 |
-
|
| 147 |
def __init__(self):
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
) -> RunResult:
|
| 162 |
-
"""
|
| 163 |
-
Run the agent for a game session.
|
| 164 |
-
|
| 165 |
-
Args:
|
| 166 |
-
client: FastMCP Client connected to your MCP server
|
| 167 |
-
game: Name of the game being played (e.g., "zork1")
|
| 168 |
-
max_steps: Maximum number of steps to take
|
| 169 |
-
seed: Random seed for reproducibility (use for LLM calls)
|
| 170 |
-
verbose: Whether to print detailed output
|
| 171 |
-
|
| 172 |
-
Returns:
|
| 173 |
-
RunResult with final score and statistics
|
| 174 |
-
"""
|
| 175 |
-
# TODO: Implement your ReAct loop here
|
| 176 |
-
#
|
| 177 |
-
# Basic structure:
|
| 178 |
-
# 1. Get initial observation (call play_action with "look")
|
| 179 |
-
# 2. Loop for max_steps:
|
| 180 |
-
# a. Build prompt with current observation and history
|
| 181 |
-
# b. Call LLM to get thought and action
|
| 182 |
-
# c. Parse the response to extract tool and args
|
| 183 |
-
# d. Call the tool via client.call_tool(tool_name, args)
|
| 184 |
-
# e. Update history and state
|
| 185 |
-
# f. Check for game over
|
| 186 |
-
# 3. Return RunResult with final statistics
|
| 187 |
-
|
| 188 |
-
# Example of calling a tool:
|
| 189 |
-
# result = await client.call_tool("play_action", {"action": "look"})
|
| 190 |
-
# observation = result[0].text if result else "No response"
|
| 191 |
-
|
| 192 |
-
# Example of calling the LLM:
|
| 193 |
-
# response = call_llm(
|
| 194 |
-
# prompt="Current observation: " + observation,
|
| 195 |
-
# system_prompt=SYSTEM_PROMPT,
|
| 196 |
-
# seed=seed,
|
| 197 |
-
# )
|
| 198 |
-
|
| 199 |
-
# Placeholder implementation - replace with your code
|
| 200 |
-
locations_visited = set()
|
| 201 |
history = []
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
#
|
| 207 |
-
|
|
| 208 |
return RunResult(
|
| 209 |
-
final_score=
|
| 210 |
-
max_score=350,
|
| 211 |
moves=moves,
|
| 212 |
-
locations_visited=
|
| 213 |
-
game_completed=
|
|
|
|
| 214 |
history=history,
|
| 215 |
)
|
| 216 |
-
|
| 217 |
-
def
|
| 218 |
-
"""
|
| 219 |
-
Build the prompt for the LLM.
|
| 220 |
-
|
| 221 |
-
TODO: Implement this to create effective prompts
|
| 222 |
-
"""
|
| 223 |
-
# TODO: Combine system prompt, history, and current observation
|
| 224 |
-
pass
|
| 225 |
-
|
| 226 |
-
def _parse_response(self, response: str) -> tuple[str, str, dict]:
|
| 227 |
-
"""
|
| 228 |
-
Parse LLM response to extract thought, tool name, and arguments.
|
| 229 |
-
|
| 230 |
-
TODO: Implement robust parsing
|
| 231 |
-
|
| 232 |
-
Returns:
|
| 233 |
-
Tuple of (thought, tool_name, args_dict)
|
| 234 |
-
"""
|
| 235 |
-
# TODO: Parse the response format:
|
| 236 |
-
# THOUGHT: ...
|
| 237 |
-
# TOOL: ...
|
| 238 |
-
# ARGS: {...}
|
| 239 |
-
pass
|
| 240 |
-
|
| 241 |
-
def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
|
| 242 |
"""
|
| 243 |
-
Call the LLM
|
| 244 |
-
|
| 245 |
-
|
|
|
|
| 246 |
"""
|
| 247 |
-
|
| 248 |
|
|
|
|
| 249 |
|
| 250 |
-
# =============================================================================
|
| 251 |
-
# For local testing
|
| 252 |
-
# =============================================================================
|
| 253 |
|
| 254 |
-
|
| 255 |
-
|
| 256 |
-
|
| 257 |
-
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
|
| 261 |
-
|
| 262 |
|
| 263 |
-
|
| 264 |
-
|
| 265 |
-
|
| 266 |
-
|
| 267 |
-
|
| 268 |
-
|
| 269 |
-
|
| 270 |
-
|
| 271 |
|
| 272 |
-
|
| 273 |
-
|
| 274 |
-
|
| 275 |
-
|
| 276 |
|
| 277 |
-
if __name__ == "__main__":
|
| 278 |
-
import asyncio
|
| 279 |
-
asyncio.run(test_agent())
|
|
|
|
| 24 |
"""
|
| 25 |
|
| 26 |
import json
|
|
|
|
| 27 |
import re
|
| 28 |
from dataclasses import dataclass, field
|
| 29 |
from typing import Optional
|
| 30 |
|
| 31 |
+
from utils import call_llm, extract_location, is_new_location
|
| 32 |
|
| 33 |
# =============================================================================
|
| 34 |
+
# LLM Configuration
|
| 35 |
# =============================================================================
|
| 36 |
|
| 37 |
|
| 38 |
@dataclass
|
| 39 |
class RunResult:
|
|
|
|
| 40 |
final_score: int
|
| 41 |
max_score: int
|
| 42 |
moves: int
|
| 43 |
locations_visited: set[str]
|
| 44 |
game_completed: bool
|
| 45 |
+
unique_objects: int = 0
|
| 46 |
+
puzzles_solved: int = 0
|
| 47 |
+
efficiency: float = 0.0
|
| 48 |
error: Optional[str] = None
|
| 49 |
+
history: list[dict] = field(default_factory=list)
|
| 50 |
|
| 51 |
|
| 52 |
# =============================================================================
|
| 53 |
+
# System Prompt
|
| 54 |
# =============================================================================
|
| 55 |
|
| 56 |
+
SYSTEM_PROMPT = """
|
| 57 |
+
You are an expert text adventure game player. Your objective is to explore efficiently, collect treasures, solve puzzles, and maximize your score.
|
| 58 |
+
**Random movement is forbidden.** Always plan actions using context and memory.
|
| 59 |
+
|
| 60 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 61 |
+
AVAILABLE TOOLS (exactly ONE per step)
|
| 62 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 63 |
+
1. memory - Check current state, items, objects, locations, and past actions.
|
| 64 |
+
2. play_action - Execute a game command.
|
| 65 |
+
3. get_map - Return to a previously visited location or get a map of explored areas.
|
| 66 |
+
4. inventory - Check current inventory.
|
| 67 |
+
5. get_valid_actions - Get likely valid actions from the current location.
|
| 68 |
+
|
| 69 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 70 |
+
TOOL PRIORITY RULE
|
| 71 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 72 |
+
Choose tool in this order:
|
| 73 |
+
1. If local puzzle interaction is possible → play_action
|
| 74 |
+
2. If interactable object is visible → play_action
|
| 75 |
+
3. If inventory contains potentially useful item → inventory
|
| 76 |
+
4. If location understanding is uncertain → memory
|
| 77 |
+
5. If planning navigation to solve puzzle → get_map
|
| 78 |
+
6. Exploration of world → play_action movement
|
| 79 |
+
|
| 80 |
+
**CRITICAL:**
|
| 81 |
+
- Do NOT use any tool other than play_action more than 2 times in a row.
|
| 82 |
+
- **DO NOT repeat an action that has already been attempted in the current location, unless the state clearly changed and it is necessary.**
|
| 83 |
+
|
| 84 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 85 |
+
VALID GAME COMMANDS for play_action
|
| 86 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 87 |
+
Movement:
|
| 88 |
+
north, south, east, west, up, down, enter, exit
|
| 89 |
+
Objects:
|
| 90 |
+
take <item>, drop <item>, open <thing>, close <thing>, examine <thing>,
|
| 91 |
+
push <thing>, pull <thing>, move <thing>, lift <thing>, turn <thing>, press <thing>
|
| 92 |
+
Light:
|
| 93 |
+
turn on lamp, turn off lamp
|
| 94 |
+
Combat:
|
| 95 |
+
attack <enemy> with <weapon>
|
| 96 |
+
Other:
|
| 97 |
+
inventory, look, read <thing>, wait
|
| 98 |
+
Forbidden:
|
| 99 |
+
check, inspect, search, grab, use, help
|
| 100 |
+
|
| 101 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 102 |
+
STRATEGIC RULES
|
| 103 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 104 |
+
• **Avoid repeating actions:**
|
| 105 |
+
- **NEVER** repeat an action that has already been attempted in the current location.
|
| 106 |
+
- If an action failed or produced no progress, **do not try it again** in the same context.
|
| 107 |
+
- Track failed actions per location to avoid loops.
|
| 108 |
+
|
| 109 |
+
• Before leaving a location:
|
| 110 |
+
- Collect all useful items.
|
| 111 |
+
- Interact with all interesting objects (push/pull/move/lift/open) if "examine" yields nothing.
|
| 112 |
+
- Solve local puzzles before moving away.
|
| 113 |
+
- Check if there are valid actions related to visible objects or inventory items that haven't been tried yet.
|
| 114 |
+
|
| 115 |
+
• **Systematic exploration > random movement.**
|
| 116 |
+
• Avoid overusing "examine": if it yields nothing, try physical interactions (push/pull/move/lift/open/turn/press).
|
| 117 |
+
• If the previous observation indicates a failed action, **avoid that action and similar ones** in the future.
|
| 118 |
+
|
| 119 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 120 |
+
ANTI-REPETITION RULE (CRITICAL)
|
| 121 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 122 |
+
**STRICT POLICY:**
|
| 123 |
+
1. **Track all attempted actions per location** in memory.
|
| 124 |
+
2. **Never repeat an action** that has already been tried in the current location.
|
| 125 |
+
3. If an action fails (e.g., "The door is locked"), **do not attempt it again** unless new context suggests it might now work (e.g., you found a key).
|
| 126 |
+
4. If no progress is made after 3 actions, **change strategy** (e.g., try a different object or direction).
|
| 127 |
+
|
| 128 |
+
**Example:**
|
| 129 |
+
- If "open door" fails, **do not try it again** unless you acquire a key or new information.
|
| 130 |
+
- If "examine table" yields "nothing special," **try physical interactions** (push/pull/move) instead of repeating "examine."
|
| 131 |
+
|
| 132 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 133 |
+
INTERACTION STRATEGY
|
| 134 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 135 |
+
When you see an object:
|
| 136 |
+
1. If it is a container → try **open** (only once).
|
| 137 |
+
2. If large/fixed → try **move**, **push**, **pull**, or **lift** (only once each).
|
| 138 |
+
3. If "examine" gives no useful info → try **one** physical interaction (e.g., turn/press).
|
| 139 |
+
4. If enterable → try **enter** (only once).
|
| 140 |
+
5. **Never repeat the same interaction** on the same object in the same location.
|
| 141 |
+
|
| 142 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 143 |
+
EXPLORATION RULE
|
| 144 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 145 |
+
- If no immediate objectives:
|
| 146 |
+
- Explore **unexplored directions systematically**.
|
| 147 |
+
- Prefer directions **not previously taken** from this location.
|
| 148 |
+
- **Do not wander randomly**: Always have a reason for movement (e.g., "The path east was not explored yet").
|
| 149 |
+
- Use **get_map** only to return to a location with unsolved puzzles or uncollected items.
|
| 150 |
+
|
| 151 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 152 |
+
RESPONSE FORMAT (STRICT — NO MARKDOWN)
|
| 153 |
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 154 |
+
THOUGHT: <brief reasoning referencing memory, map, or inventory if applicable>
|
| 155 |
TOOL: <tool_name>
|
| 156 |
+
ARGS: <JSON arguments>
|
| 157 |
"""
|
| 158 |
|
|
|
|
| 159 |
# =============================================================================
|
| 160 |
+
# StudentAgent
|
| 161 |
# =============================================================================
|
| 162 |
|
| 163 |
class StudentAgent:
|
| 164 |
def __init__(self):
|
| 165 |
+
self.history = []
|
| 166 |
+
self.current_location = None
|
| 167 |
+
self.score = 0
|
| 168 |
+
self.recent_actions = []
|
| 169 |
+
self.last_tool = None
|
| 170 |
+
# structured memory
|
| 171 |
+
self.locations = {}
|
| 172 |
+
|
| 173 |
+
# =======================================
|
| 174 |
+
# Run
|
| 175 |
+
# =======================================
|
| 176 |
+
async def run(self, client, game: str, max_steps: int, seed: int, verbose: bool = False):
|
| 177 |
+
|
| 178 |
history = []
|
| 179 |
+
|
| 180 |
+
tools = await client.list_tools()
|
| 181 |
+
tool_names = [t.name for t in tools]
|
| 182 |
+
|
| 183 |
+
# ---------------------------------
|
| 184 |
+
# Initial observation
|
| 185 |
+
# ---------------------------------
|
| 186 |
+
tool_name, tool_args = "play_action", {"action": "look"}
|
| 187 |
+
self.last_tool = tool_name
|
| 188 |
+
|
| 189 |
+
result = await client.call_tool(tool_name, tool_args)
|
| 190 |
+
observation = self._extract_result(result)
|
| 191 |
+
|
| 192 |
+
# Detect starting location
|
| 193 |
+
self.current_location = extract_location(observation)
|
| 194 |
+
|
| 195 |
+
# Initialize location memory
|
| 196 |
+
self.locations[self.current_location] = {
|
| 197 |
+
"objects_seen": set(),
|
| 198 |
+
"actions_done": set(),
|
| 199 |
+
"directions_explored": set(),
|
| 200 |
+
"promising_hints": set(),
|
| 201 |
+
"memory": observation,
|
| 202 |
+
"observations_seen": set(),
|
| 203 |
+
"valid_actions": set()
|
| 204 |
+
}
|
| 205 |
+
|
| 206 |
+
self.locations[self.current_location]["observations_seen"].add(observation)
|
| 207 |
+
|
| 208 |
+
# Fetch valid actions
|
| 209 |
+
valid_actions = await client.call_tool("get_valid_actions", {})
|
| 210 |
+
parsed = self._extract_result(valid_actions)
|
| 211 |
+
|
| 212 |
+
self.locations[self.current_location]["valid_actions"] = set(
|
| 213 |
+
a.strip() for a in parsed.split(",") if a.strip()
|
| 214 |
+
)
|
| 215 |
+
|
| 216 |
+
if verbose:
|
| 217 |
+
print(observation)
|
| 218 |
+
|
| 219 |
+
# =====================================
|
| 220 |
+
# MAIN LOOP
|
| 221 |
+
# =====================================
|
| 222 |
+
for step in range(1, max_steps + 1):
|
| 223 |
+
# -------------------------
|
| 224 |
+
# Location detection
|
| 225 |
+
# -------------------------
|
| 226 |
+
try:
|
| 227 |
+
if is_new_location(observation, set(self.locations.keys()), self.last_tool):
|
| 228 |
+
|
| 229 |
+
new_location = extract_location(observation)
|
| 230 |
+
|
| 231 |
+
self.locations[self.current_location]["directions_explored"].add(
|
| 232 |
+
("look", new_location)
|
| 233 |
+
)
|
| 234 |
+
|
| 235 |
+
self.current_location = new_location
|
| 236 |
+
|
| 237 |
+
if new_location not in self.locations.keys():
|
| 238 |
+
self.locations[new_location] = {
|
| 239 |
+
"objects_seen": set(),
|
| 240 |
+
"actions_done": set(),
|
| 241 |
+
"directions_explored": set(),
|
| 242 |
+
"promising_hints": set(),
|
| 243 |
+
"memory": observation,
|
| 244 |
+
"observations_seen": set(),
|
| 245 |
+
"valid_actions": set(),
|
| 246 |
+
}
|
| 247 |
+
|
| 248 |
+
# Fetch valid actions on entering location
|
| 249 |
+
try:
|
| 250 |
+
valid_actions = await client.call_tool(
|
| 251 |
+
"get_valid_actions",
|
| 252 |
+
{}
|
| 253 |
+
)
|
| 254 |
+
|
| 255 |
+
parsed = self._extract_result(valid_actions)
|
| 256 |
+
|
| 257 |
+
self.locations[self.current_location]["valid_actions"] = set(
|
| 258 |
+
a.strip() for a in parsed.split(",") if a.strip()
|
| 259 |
+
)
|
| 260 |
+
|
| 261 |
+
except Exception:
|
| 262 |
+
pass
|
| 263 |
+
|
| 264 |
+
except Exception:
|
| 265 |
+
pass
|
| 266 |
+
|
| 267 |
+
# Prevent tool oscillation
|
| 268 |
+
if len(self.history) >= 2:
|
| 269 |
+
actions = ["memory", "get_map", "inventory"]
|
| 270 |
+
# avoid using one of the non-play_action tools more than 2 times in a row
|
| 271 |
+
if any(self.last_tool == a for a in actions):
|
| 272 |
+
# Force exploration action instead of map query
|
| 273 |
+
self.forced_prompt_hint = "\nYou should choose play_action to explore instead of using the same tool again."
|
| 274 |
+
else:
|
| 275 |
+
self.forced_prompt_hint = ""
|
| 276 |
+
|
| 277 |
+
# -------------------------
|
| 278 |
+
# LLM decision step (pre-call for memory, objects, actions)
|
| 279 |
+
# -------------------------
|
| 280 |
+
if self.last_tool == "play_action":
|
| 281 |
+
planner_data = await self._call_planner_llm(observation)
|
| 282 |
+
print(f"\n[PLANNER LLM RESPONSE]\n{planner_data}\n")
|
| 283 |
+
print(f"[VALID ACTIONS]\n{self.locations[self.current_location]['valid_actions']}\n")
|
| 284 |
+
|
| 285 |
+
# Update memory with LLM-generated data
|
| 286 |
+
self.locations[self.current_location]["memory"] = planner_data["memory"]
|
| 287 |
+
|
| 288 |
+
actions = set(planner_data["promising_hints"])
|
| 289 |
+
actions -= self.locations[self.current_location]["actions_done"]
|
| 290 |
+
self.locations[self.current_location]["promising_hints"] = list(actions)
|
| 291 |
+
|
| 292 |
+
objects_seen_before = self.locations[self.current_location]["objects_seen"]
|
| 293 |
+
self.locations[self.current_location]["objects_seen"].update(planner_data["objects_seen"])
|
| 294 |
+
|
| 295 |
+
if objects_seen_before != self.locations[self.current_location]["objects_seen"]:
|
| 296 |
+
# Update valid actions
|
| 297 |
+
valid_actions = await client.call_tool("get_valid_actions", {})
|
| 298 |
+
parsed = self._extract_result(valid_actions)
|
| 299 |
+
|
| 300 |
+
self.locations[self.current_location]["valid_actions"] = set(
|
| 301 |
+
a.strip() for a in parsed.split(",") if a.strip()
|
| 302 |
+
)
|
| 303 |
+
|
| 304 |
+
# -------------------------
|
| 305 |
+
# Build prompt for tool selection (without calling LLM again)
|
| 306 |
+
# -------------------------
|
| 307 |
+
prompt = self._build_prompt(observation)
|
| 308 |
+
|
| 309 |
+
# Call LLM ONLY for tool selection (not for memory/objects/actions)
|
| 310 |
+
response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
|
| 311 |
+
thought, tool_name, tool_args = self._parse_response(response)
|
| 312 |
+
|
| 313 |
+
            tool_name, tool_args = self._validate_tool_call(
                tool_name,
                tool_args,
                tool_names
            )
            self.last_tool = tool_name

            if tool_name == "play_action":
                self.locations[self.current_location]["actions_done"].add(
                    tool_args.get("action", "look")
                )

            if verbose:
                print(f"\nStep {step}")
                print(f"Location: {self.current_location}")
                print(f"Thought: {thought}")
                print(f"Tool: {tool_name}")
                print(f"Args: {tool_args}")

            # -------------------------
            # Tool execution
            # -------------------------
            try:
                result = await client.call_tool(tool_name, tool_args)
                observation = self._extract_result(result)
                self.locations[self.current_location]["observations_seen"].add(observation)
            except Exception as e:
                observation = str(e)

            # -------------------------
            # Score tracking
            # -------------------------
            self._update_score(observation)

            self.history.append({
                "step": step,
                "thought": thought,
                "tool": tool_name,
                "args": tool_args,
                "result": observation
            })

            history.append((thought, f"{tool_name}({tool_args})", observation))

            # Keep only a rolling window of the last 10 steps
            if len(self.history) > 10:
                self.history = self.history[-10:]

            if verbose:
                print(f"[RESULT] {observation}")
                print(f"[SCORE] {self.score}")

            if self._is_game_over(observation):
                break

        # =====================================
        # Final result
        # =====================================
        moves = len(self.history)
        efficiency = self.score / max(1, moves)

        return RunResult(
            final_score=self.score,
            max_score=350,
            moves=moves,
            locations_visited=self.locations,
            game_completed=self._is_game_over(observation),
            efficiency=efficiency,
            history=history,
        )
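The loop's bookkeeping (rolling history window, efficiency as score per move) can be sketched in isolation. Note that because the history is capped at 10 entries, `moves` reflects at most the last 10 steps. The step values below are illustrative, not from an actual run:

```python
# Rolling history window and efficiency metric, mirroring the agent's loop.
history = []
score = 0

for step in range(1, 16):
    # Hypothetical step record; a real run stores thought/tool/args/result.
    history.append({"step": step, "result": f"observation {step}"})
    # Keep only the 10 most recent entries, as the agent does.
    if len(history) > 10:
        history = history[-10:]
    score = step  # stand-in for the score parsed from observations

moves = len(history)              # capped at 10 by the window above
efficiency = score / max(1, moves)
```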
    async def _call_planner_llm(self, observation: str) -> dict:
        """
        Call the LLM to:
        1. Update the location memory.
        2. Extract interactable objects from the observation.
        3. Generate promising actions grounded in the observation.
        """
        current_data = self.locations.get(self.current_location, {})

        prompt = """
You are an expert text adventure agent. Your **only** goal is to maximize progress by:
- Solving puzzles (e.g., "use <object> on <thing>").
- Collecting useful items (e.g., "take <object>").
- Exploring new areas (e.g., "enter").
- Avoiding redundant or vague actions.

---

### CURRENT CONTEXT
**Location:**
{location}

**Current Observation:**
{observation}

**Current Memory of this Location:**
{memory}

---

### STRICT INSTRUCTIONS
Your task is to:
1. **Update the memory**.
2. **Extract interactable objects** (only those explicitly mentioned in the observation).
3. **Generate ≤5 promising actions** (strictly grounded in the observation and the valid actions).

---

#### 1. LOCATION MEMORY UPDATE

You are maintaining a cumulative memory of this location.

Goal:
Update the existing location description by merging it with the new observation,
while ensuring that the final description reflects the CURRENT STATE of the location.

Rules:

1. Preserve all previously known environmental facts unless explicitly contradicted.
2. Add any new information from the new observation.
3. Remove facts that are clearly invalidated by the new observation.
4. If an object is taken, it is no longer present in the location.
5. If an object is dropped, it becomes present in the location.
6. If an object changes state (opened, closed, locked, unlocked, broken, etc.), replace the old state with the new one.
7. Only the CURRENT state of each object should appear in the final description.
8. Do not keep outdated state history (e.g., do not keep both "closed" and "opened").
9. Do NOT rewrite stylistically.
10. Do not duplicate information.
11. Keep it concise while preserving all relevant environmental details.

The final description must represent the current true state of the location,
not a history of past states.

#### 2. OBJECTS SEEN
List **only** objects that are:
- Explicitly mentioned in the observation.
- Clearly interactable (e.g., "a shiny key on the table" → "key"; "a path" → not an object).
- Potentially required for puzzle-solving.
Keep only the name of the object, without adjectives or extra description.

#### 3. PROMISING HINTS
- Suggest **strategic hints** (not direct actions) that are strictly supported by the current observation and the valid actions for this location.
- Do not suggest actions already done in this location: {actions_done}.
- Do not suggest actions that do not seem possible (e.g., "take key" if the key is not mentioned in the observation, or "open locked door").
- Hints must be directly supported by the current observation.
- Each hint should be a concise suggestion of what to try next, grounded in the current context (e.g., "The door is open, maybe you can enter it" → "try entering the door").
- Use the following action verbs if applicable: take, open, close, push, pull, move, lift, turn, press, enter, ... with the relevant object.

- Focus on:
  * Potential puzzle solutions
  * Object interactions
  * Hidden opportunities
- Forbidden:
  - Vague hints ("There might be something interesting")
  - Repeats of already done actions
  - Random movement without reason

- Movement rules:
  - Do NOT suggest movement if there are still meaningful interactions available in the current location.
  - If all useful local interactions have been exhausted, suggest exploring an unexplored direction.
  - Prefer unexplored directions over previously visited ones.

### OUTPUT FORMAT (STRICT JSON) with no markdown or explanations:
{{
  "memory": "<updated_memory>",
  "promising_hints": ["<hint1>", "<hint2>"],
  "objects_seen": ["<object1>", "<object2>"]
}}
""".format(
            observation=observation,
            location=self.current_location,
            memory=current_data.get("memory", ""),
            actions_done=list(current_data.get("actions_done", set())),
        )

        response = call_llm(prompt=prompt, seed=42)

        try:
            data = json.loads(response)
            json_data = {
                "memory": data.get("memory", ""),
                "promising_hints": data.get("promising_hints", []),
                "objects_seen": data.get("objects_seen", [])
            }

            # Remove promising hints that repeat already-done actions
            done_actions = self.locations[self.current_location].get("actions_done", set())
            json_data["promising_hints"] = list(
                set(json_data["promising_hints"]) - set(done_actions)
            )
            return json_data

        except json.JSONDecodeError:
            return {
                "memory": "",
                "promising_hints": [],
                "objects_seen": []
            }
def _build_prompt(self, observation: str) -> str:
|
| 514 |
+
"""Build the prompt for the LLM, using pre-filled memory/objects/actions."""
|
| 515 |
+
current_location_data = self.locations.get(self.current_location, {})
|
| 516 |
+
|
| 517 |
+
prompt = f"""
|
| 518 |
+
OBSERVATION:
|
| 519 |
+
{observation}
|
| 520 |
+
|
| 521 |
+
LOCATION:
|
| 522 |
+
{self.current_location}
|
| 523 |
+
|
| 524 |
+
LOCATION MEMORY:
|
| 525 |
+
{current_location_data.get("memory", "None")}
|
| 526 |
+
|
| 527 |
+
OBJECTS_SEEN:
|
| 528 |
+
{list(current_location_data.get("objects_seen", set()))}
|
| 529 |
+
|
| 530 |
+
PROMISING_HINTS:
|
| 531 |
+
{", ".join(current_location_data.get("promising_hints", []))}
|
| 532 |
+
|
| 533 |
+
VALID_ACTIONS:
|
| 534 |
+
{list(current_location_data.get("valid_actions", set()))}
|
| 535 |
+
|
| 536 |
+
ACTIONS ALREADY DONE IN THIS LOCATION:
|
| 537 |
+
{list(current_location_data.get("actions_done", set()))}
|
| 538 |
+
AVOID REPEATING THESE ACTIONS.
|
| 539 |
+
|
| 540 |
+
HINT:
|
| 541 |
+
{self.forced_prompt_hint if hasattr(self, 'forced_prompt_hint') else ""}
|
| 542 |
+
"""
|
| 543 |
+
return prompt
|
| 544 |
|
| 545 |
    def _parse_response(self, response: str) -> tuple[str, str, dict]:
        thought = "No reasoning provided"
        tool_name = "play_action"
        tool_args = {"action": "look"}

        lines = response.strip().split("\n")
        for line in lines:
            line_clean = line.strip()
            line_upper = line_clean.upper()

            if line_upper.startswith("THOUGHT:"):
                thought = line_clean.split(":", 1)[1].strip()
            elif line_upper.startswith("TOOL:"):
                raw_tool = line_clean.split(":", 1)[1].strip().lower()
                raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
                raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
                tool_name = raw_tool
            elif line_upper.startswith("ARGS:"):
                args_part = line_clean.split(":", 1)[1].strip()
                try:
                    args_part = args_part.replace("'", '"')
                    tool_args = json.loads(args_part)
                except json.JSONDecodeError:
                    match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
                    if match:
                        tool_args = {"action": match.group(1)}
                    else:
                        tool_args = {"action": "look"}

        return thought, tool_name, tool_args
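The THOUGHT/TOOL/ARGS parsing above can be exercised outside the class. This standalone sketch reproduces the same line-by-line logic on a hypothetical LLM reply:

```python
import json

def parse_react_reply(response: str):
    # Defaults mirror _parse_response: fall back to a safe "look" action.
    thought, tool_name, tool_args = "No reasoning provided", "play_action", {"action": "look"}
    for line in response.strip().split("\n"):
        line_clean = line.strip()
        upper = line_clean.upper()
        if upper.startswith("THOUGHT:"):
            thought = line_clean.split(":", 1)[1].strip()
        elif upper.startswith("TOOL:"):
            raw = line_clean.split(":", 1)[1].strip().lower().replace("`", "")
            tool_name = raw.split()[0] if raw else "play_action"
        elif upper.startswith("ARGS:"):
            try:
                # LLMs often emit single-quoted pseudo-JSON; normalize it first.
                tool_args = json.loads(line_clean.split(":", 1)[1].strip().replace("'", '"'))
            except json.JSONDecodeError:
                pass
    return thought, tool_name, tool_args

reply = "THOUGHT: The mailbox may hold something.\nTOOL: play_action\nARGS: {'action': 'open mailbox'}"
thought, tool, args = parse_react_reply(reply)
```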
    def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
        """Robust tool call validator."""

        # --------------------------------------------------
        # Ensure tool_args is a dictionary (the LLM can hallucinate)
        # --------------------------------------------------
        if not isinstance(tool_args, dict):
            tool_args = {}

        # --------------------------------------------------
        # Normalize tool name
        # --------------------------------------------------
        tool_name = str(tool_name).lower().strip()

        tool_alias_map = {
            "action": "play_action",
            "do": "play_action",
            "command": "play_action",
            "map": "get_map",
            "location": "get_map",
            "mem": "memory",
            "state": "memory",
            "status": "memory",
            "inv": "inventory",
            "items": "inventory",
        }

        if tool_name in tool_alias_map:
            tool_name = tool_alias_map[tool_name]

        if tool_name not in valid_tools:
            tool_name = "play_action"

        # --------------------------------------------------
        # Fix play_action argument schema
        # --------------------------------------------------
        if tool_name == "play_action":
            action = tool_args.get("action")
            if not isinstance(action, str) or not action:
                action = "look"
            action = action.lower()

            # Normalize verb aliases
            invalid_verb_map = {
                "check": "examine",
                "inspect": "examine",
                "search": "look",
                "grab": "take",
                "pick": "take",
                "use": "examine",
                "investigate": "examine",
            }

            words = action.split()
            if words and words[0] in invalid_verb_map:
                words[0] = invalid_verb_map[words[0]]
                action = " ".join(words)

            # Remove markdown artifacts
            action = action.replace("**", "").replace("*", "").replace("`", "")

            # Normalize whitespace
            action = " ".join(action.strip().split())

            tool_args = {"action": action}
        else:
            # Non-action tools should have empty args
            tool_args = {}

        return tool_name, tool_args
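The verb-normalization step above maps verbs that parser games commonly reject onto safer equivalents. A standalone demonstration of the same table (the input command is illustrative):

```python
# Verb aliases mapped to safe verbs, same table as _validate_tool_call.
invalid_verb_map = {
    "check": "examine", "inspect": "examine", "search": "look",
    "grab": "take", "pick": "take", "use": "examine", "investigate": "examine",
}

def normalize_action(action: str) -> str:
    # Lowercase, strip markdown artifacts, swap the leading verb if needed.
    words = action.lower().replace("*", "").replace("`", "").split()
    if words and words[0] in invalid_verb_map:
        words[0] = invalid_verb_map[words[0]]
    return " ".join(words)

normalized = normalize_action("Grab  the lantern")
```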
    def _extract_result(self, result) -> str:
        if hasattr(result, 'content') and result.content:
            return result.content[0].text
        if isinstance(result, list) and result:
            return result[0].text if hasattr(result[0], 'text') else str(result[0])
        return str(result)

    def _update_score(self, text: str) -> None:
        patterns = [r'Score:\s*(\d+)', r'score[:\s]+(\d+)', r'\[Score:\s*(\d+)']
        for pattern in patterns:
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                self.score = max(self.score, int(match.group(1)))

    def _is_game_over(self, text: str) -> bool:
        phrases = ["game over", "you have died", "you are dead", "*** you have died ***"]
        text_lower = text.lower()
        return any(p in text_lower for p in phrases)

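The score and game-over parsing relies on regexes over the raw observation text. A self-contained sketch of the same patterns (the observation strings below are illustrative):

```python
import re

# Same patterns as _update_score: tolerate "Score: 25" and "[Score: 25" forms.
patterns = [r'Score:\s*(\d+)', r'score[:\s]+(\d+)', r'\[Score:\s*(\d+)']

def parse_score(text: str, current: int = 0) -> int:
    # Keep the highest score seen so far, as the agent does.
    for pattern in patterns:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            current = max(current, int(match.group(1)))
    return current

def game_over(text: str) -> bool:
    phrases = ["game over", "you have died", "you are dead"]
    return any(p in text.lower() for p in phrases)

score = parse_score("West of House\n\n[Score: 25, Moves: 40]")
done = game_over("*** You have died ***")
```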
mcp_server.py
CHANGED
@@ -24,77 +24,121 @@
@@ -107,10 +151,9 @@ def get_game() -> GameManager:
@@ -133,77 +176,107 @@ def play_action(action: str) -> str:
Then open the MCP Inspector in your browser to test the tools interactively.
"""

import sys
import os
import re
from utils import is_new_location, extract_location

# Add parent directory to path to import games module
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from fastmcp import FastMCP
from games.zork_env import TextAdventureEnv

# =========================================================
# Server Initialization
# =========================================================

mcp = FastMCP("Student Text Adventure Server")


# =========================================================
# Game State Manager
# =========================================================

class GameManager:

    def __init__(self):
        self.env: TextAdventureEnv | None = None
        self.state = None

        self.history = []
        self.locations = {}
        self.current_location = ""

        self.inventory = set()

    # -----------------------------------------------------

    def initialize(self, game="zork1"):
        self.env = TextAdventureEnv(game)
        self.state = self.env.reset()

        self.history.clear()
        self.locations.clear()

        # Initial observation
        self.state = self.env.step("look")
        obs = self.state.observation

        self.current_location = extract_location(obs)
        self.locations[self.current_location] = {
            "objects": set(),
            "actions": set(),
            "directions": set(),
            "observations": set(),
            "summary": ""
        }

        self.inventory = set()
        return obs

    # -----------------------------------------------------

    def step(self, action: str):
        if not self.env:
            return "Game not initialized."

        self.state = self.env.step(action)
        obs = self.state.observation
        action_lower = action.lower()

        # Location detection: register the new room and the edge that led to it
        if is_new_location(obs, set(self.locations.keys()), "play_action") and action != "inventory":
            previous_location = self.current_location
            self.current_location = extract_location(obs)

            self.locations[previous_location]["directions"].add(
                (action_lower, self.current_location)
            )
            self.locations[self.current_location] = {
                "objects": set(),
                "actions": set(),
                "directions": set(),
                "observations": set(),
                "summary": ""
            }

        # Track action history (server level only)
        self.history.append((action, obs))
        if len(self.history) > 20:
            self.history = self.history[-20:]

        return obs

    # -----------------------------------------------------

    def get_score(self):
        return self.state.score if self.state else 0

    def get_moves(self):
        """Get number of moves taken."""
        return self.state.moves if self.state else 0


# =========================================================
# Global Game Instance
# =========================================================

_game = GameManager()


def get_game() -> GameManager:
    # ... (unchanged lines elided in the diff) ...
    _game.initialize(game)
    return _game
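The `directions` sets tracked per location form a small exploration graph: each entry is an `(action, destination)` edge. A sketch of how edges accumulate as the player moves (the location names are illustrative, not from an actual run):

```python
# Each location stores (action, destination) edges, as GameManager.step does.
locations = {
    "west of house": {"directions": set()},
}

def record_move(locations: dict, prev: str, action: str, new_loc: str) -> None:
    # Add the edge from the previous room and register the new room.
    locations[prev]["directions"].add((action, new_loc))
    locations.setdefault(new_loc, {"directions": set()})

record_move(locations, "west of house", "north", "north of house")
record_move(locations, "north of house", "east", "behind house")
```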
# =========================================================
# Tools (Execution Only)
# =========================================================

@mcp.tool()
def play_action(action: str) -> str:
    # ... (docstring elided in the diff) ...
    game = get_game()

    # TODO: You might want to add action validation here

    # Execute the action
    result = game.step(action)

    # Append score info to every observation
    return f"{result}\n\n[Score: {game.get_score()}, Moves: {game.get_moves()}]"

# ---------------------------------------------------------

@mcp.tool()
def memory(query: str = "") -> str:
    """
    State viewer only.
    No LLM inference.
    """
    game = get_game()

    if not game.state:
        return "Game not initialized."

    loc = game.current_location

    return f"""
STATE
Location: {loc}
Score: {game.get_score()}
Moves: {game.get_moves()}

RECENT HISTORY
{game.history[-10:]}
""".strip()

# ---------------------------------------------------------

@mcp.tool()
def get_map() -> str:
    """
    Exploration graph dump.
    """
    game = get_game()

    if not game.locations:
        return "No map discovered."

    text = "EXPLORED MAP\n"
    for loc, data in game.locations.items():
        text += f"\n[{loc}]\n"
        for direction, dest in data.get("directions", set()):
            text += f"  {direction} -> {dest}\n"

    return text.strip()

# ---------------------------------------------------------

@mcp.tool()
def inventory() -> str:
    """
    Inventory viewer using the game command.
    """
    game = get_game()

    if not game.env:
        return "Game not initialized."

    try:
        state = game.env.step("inventory")
        return state.observation
    except Exception:
        return "Unable to retrieve inventory."

# ---------------------------------------------------------

@mcp.tool()
def get_valid_actions() -> str:
    """
    Environment hint helper.
    """
    game = get_game()

    if game.env and game.env.env:
        valid = game.env.env.get_valid_actions()
        return ", ".join(valid) if valid else "No valid actions."

    return "Environment not available."


# =========================================================
# Run Server
# =========================================================

if __name__ == "__main__":
    mcp.run()
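The `get_map` tool renders the exploration graph as a plain-text listing, one block per room with its outgoing edges. A self-contained sketch of that rendering (the graph contents are illustrative):

```python
# Render the exploration graph the way the get_map tool does.
locations = {
    "west of house": {"directions": {("north", "north of house")}},
    "north of house": {"directions": set()},
}

text = "EXPLORED MAP\n"
for loc, data in locations.items():
    text += f"\n[{loc}]\n"
    for direction, dest in data.get("directions", set()):
        text += f"  {direction} -> {dest}\n"

map_text = text.strip()
```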
requirements.txt
CHANGED
@@ -1,9 +1,17 @@

# Core dependencies
jericho
python-dotenv
spacy

torch
spaces
transformers
accelerate

# MCP Server
fastmcp

# Function calling (optional, for the alternative approach)
langchain-core

huggingface_hub
utils.py
ADDED
@@ -0,0 +1,42 @@

from huggingface_hub import InferenceClient
import os
from dotenv import load_dotenv

load_dotenv()

LLM_MODEL = "Qwen/Qwen2.5-7B-Instruct"

_hf_token = os.getenv("HF_TOKEN")
if not _hf_token:
    raise ValueError("HF_TOKEN not found. Set it in your .env file.")

LLM_CLIENT = InferenceClient(token=_hf_token)


def call_llm(prompt: str, system_prompt: str = "", seed: int = 0, max_tokens: int = 300) -> str:
    messages = []

    if system_prompt.strip():
        messages.append({"role": "system", "content": system_prompt})

    messages.append({"role": "user", "content": prompt})

    response = LLM_CLIENT.chat.completions.create(
        model=LLM_MODEL,
        messages=messages,
        temperature=0.0,
        max_tokens=max_tokens,
        seed=seed,
    )

    return response.choices[0].message.content


def is_new_location(observation: str, known_locations: set, last_tool: str) -> bool:
    if last_tool != "play_action":
        return False
    location = extract_location(observation)
    # Room names are short titles; full sentences end with punctuation.
    if location.strip().endswith(('.', '!', '?', ')')) or location in known_locations:
        return False
    return True


def extract_location(observation: str) -> str:
    return observation.lower().split("\n")[0].strip()
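The location heuristic relies on the convention that parser games print the room name as the first line of a description, while failure messages are full sentences ending in punctuation. A quick standalone check of that heuristic (the observation text is illustrative):

```python
def extract_location(observation: str) -> str:
    # First line of a room description is usually the room name.
    return observation.lower().split("\n")[0].strip()

def looks_like_location(name: str, known: set) -> bool:
    # Sentences ("You can't go that way.") end with punctuation; room names don't.
    return not name.endswith(('.', '!', '?', ')')) and name not in known

obs = "West of House\nYou are standing in an open field."
name = extract_location(obs)
is_new = looks_like_location(name, known=set())
refused = looks_like_location(extract_location("You can't go that way."), known=set())
```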