stephecw committed on
Commit e4885d4 · verified · 1 Parent(s): 9dee17a

Upload 6 files

Files changed (6)
  1. README.md +53 -7
  2. agent.py +1996 -0
  3. app.py +36 -0
  4. explanations.md +379 -0
  5. mcp_server.py +819 -0
  6. requirements.txt +9 -0
README.md CHANGED
@@ -1,13 +1,59 @@
  ---
- title: Text Game Agent
- emoji: 🏆
- colorFrom: gray
- colorTo: indigo
+ title: Text Adventure Agent Submission
+ emoji: "\U0001F5FA"
+ colorFrom: green
+ colorTo: blue
  sdk: gradio
- sdk_version: 6.5.1
+ sdk_version: "5.12.0"
  app_file: app.py
  pinned: false
- short_description: MCP ReAct aent playing text-based games
+ license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Text Adventure Agent Submission
+
+ ## Overview
+
+ This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.
+
+ ## Approach
+
+ <!-- Describe your approach here -->
+
+ - What strategy does your agent use?
+ - What tools did you implement in your MCP server?
+ - Any interesting techniques or optimizations?
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `agent.py` | ReAct agent with `StudentAgent` class |
+ | `mcp_server.py` | MCP server with game interaction tools |
+ | `app.py` | Gradio interface for HF Space |
+ | `requirements.txt` | Additional dependencies |
+
+ ## How to Submit
+
+ 1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
+ 2. Clone your fork locally
+ 3. Implement your agent in `agent.py` and `mcp_server.py`
+ 4. Test locally (see below)
+ 5. Push your changes to your Space
+ 6. Submit your Space URL on the course platform
+
+ ## Local Testing
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Test the MCP server interactively
+ fastmcp dev mcp_server.py
+
+ # Run your agent on a game
+ python run_agent.py --agent . --game lostpig -v -n 20
+
+ # Run evaluation
+ python -m evaluation.evaluate -s . -g lostpig -t 3
+ ```
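The ReAct loop the Overview alludes to (the LLM picks an action, the agent calls an MCP tool, the observation feeds back into history) can be sketched with stand-ins. `FakeClient` and the hard-coded action choice below are hypothetical placeholders for the real FastMCP client and `call_llm`, so the shape is runnable without a server:

```python
import asyncio

class FakeClient:
    """Stand-in for the FastMCP client: one tool, canned observations."""
    async def call_tool(self, name, args):
        assert name == "play_action"
        return f"You {args['action']}. Nothing happens."

async def react_loop(client, max_steps=3):
    history = []
    for step in range(max_steps):
        # A real agent would ask the LLM here; we stub the decision.
        action = "look" if step == 0 else "north"
        obs = await client.call_tool("play_action", {"action": action})
        history.append((action, obs))
    return history

history = asyncio.run(react_loop(FakeClient()))
```

The real agent replaces the stubbed decision with an LLM call and a parser for the tool-call format its system prompt mandates.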
agent.py ADDED
@@ -0,0 +1,1996 @@
+ """
+ Student Agent for Text Adventure Games
+
+ This is your submission file. Implement the StudentAgent class to play
+ text adventure games using the MCP server you also implement.
+
+ Your agent should:
+ 1. Connect to the MCP server via the provided client
+ 2. Use the ReAct pattern (Thought -> Action -> Observation)
+ 3. Call MCP tools to interact with the game
+ 4. Maximize the game score within the step limit
+
+ Required method:
+ async def run(self, client, game, max_steps, seed, verbose) -> RunResult
+
+ The 'client' is a FastMCP Client already connected to your MCP server.
+ Use it to call tools like: await client.call_tool("play_action", {"action": "look"})
+
+ Tips:
+ - Start by looking around and understanding your environment
+ - Keep track of visited locations to avoid loops
+ - Pick up useful items (lamp, sword, etc.)
+ - The seed parameter should be used to set your LLM's seed for reproducibility
+ """
+
+ import json
+ import os
+ import re
+ from dataclasses import dataclass, field
+ from typing import Optional
+
+ from collections import deque
+
+ from dotenv import load_dotenv
+ from huggingface_hub import InferenceClient
+
+ # Load environment variables
+ load_dotenv()
+
+ # =============================================================================
+ # LLM Configuration - DO NOT MODIFY
+ # =============================================================================
+
+ # Model to use (fixed for fair evaluation)
+ LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
+
+ # Initialize the LLM client (uses HF_TOKEN from environment)
+ _hf_token = os.getenv("HF_TOKEN")
+ if not _hf_token:
+     raise ValueError("HF_TOKEN not found. Set it in your .env file.")
+
+ LLM_CLIENT = InferenceClient(token=_hf_token)
+
+
+ def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
+     """
+     Call the LLM with the given prompt. Use this function in your agent.
+
+     Args:
+         prompt: The user prompt (current game state, history, etc.)
+         system_prompt: The system prompt (instructions for the agent)
+         seed: Random seed for reproducibility
+         max_tokens: Maximum tokens in response (default: 300)
+
+     Returns:
+         The LLM's response text
+
+     Example:
+         response = call_llm(
+             prompt="You are in a forest. What do you do?",
+             system_prompt=SYSTEM_PROMPT,
+             seed=42,
+         )
+     """
+     messages = [
+         {"role": "system", "content": system_prompt},
+         {"role": "user", "content": prompt},
+     ]
+
+     response = LLM_CLIENT.chat.completions.create(
+         model=LLM_MODEL,
+         messages=messages,
+         temperature=0.0,  # Deterministic for reproducibility
+         max_tokens=max_tokens,
+         seed=seed,
+     )
+
+     return response.choices[0].message.content
+
+ SYNTH_SYSTEM = """You are a memory manager for a Zork-like agent.
+ Your job: compress recent experience into DECISION-USEFUL memory: what happened AND why it matters.
+
+ Return ONLY valid JSON with keys:
+ facts, blocking, inventory_goals, open_threads, visited, last_update_move.
+
+ Rules:
+ - Each entry must be <= 12 words, start with a strong noun/verb, no filler.
+ - Prefer durable, actionable info: "X is locked -> need key" beats "saw a door".
+ - Do NOT restate raw room description unless it implies a new affordance.
+ - Track failed attempts as blocking only if they should NOT be retried.
+ - If something changed (inventory, location access, score), capture the consequence.
+ - Deduplicate aggressively across all lists.
+ - Keep each list length <= 6. Keep only highest-value items.
+
+ Interpretation guide:
+ facts: stable world knowledge learned (locations, items, mechanics).
+ blocking: obstacles + what is needed; include "avoid retry" if relevant.
+ inventory_goals: items/tools to seek next (lamp, key, etc.).
+ open_threads: unresolved leads worth returning to.
+ visited: important locations only (not every room).
+ last_update_move: copy STATE_MINIMAL.moves if present; else use prior value.
+ """
+
+ def build_synth_prompt(mem_json, recent_history, state_obj):
+     state_obj = state_obj or {}
+     minimal_state = {
+         "location": state_obj.get("location"),
+         "moves": state_obj.get("moves"),
+         "score": state_obj.get("score"),
+         "inventory": state_obj.get("inventory"),
+         "visible_objects": state_obj.get("visible_objects"),
+         "last_observation_head": (state_obj.get("last_observation") or "")[:220],
+     }
+
+     return f"""
+ CURRENT_MEMORY_JSON:
+ {json.dumps(mem_json, ensure_ascii=False)}
+
+ RECENT_STEPS (action -> observation head):
+ {recent_history}
+
+ STATE_MINIMAL (json):
+ {json.dumps(minimal_state, ensure_ascii=False)}
+
+ Update the memory JSON. Only output JSON.
+ """
+
+ PLANNER_SYSTEM = """You are an objective planner for a Zork-like agent.
+ You DO NOT act in the game. You only output a plan.
+
+ Return ONLY valid JSON with keys:
+ - objectives: list of {type, description, priority, status, evidence}
+ - suggested_actions: list of strings (game commands)
+ - notes: short string
+
+ Rules:
+ - Keep objectives <= 8, deduplicate, prefer durable goals.
+ - priority: 0 (highest) .. 5 (lowest)
+ - status: "open" | "done" | "blocked"
+ - evidence: <= 12 words
+ - suggested_actions: max 3 actions; MUST respect the agent command grammar.
+ - Use short nouns from observation (mailbox, leaflet, grating, egg, etc.)
+ - If valid_actions_list is provided, prefer actions from it exactly.
+ """
+
+ def build_planner_prompt(
+     observation: str,
+     state_obj: dict,
+     synth_memory: dict,
+     objectives_text: str,
+     valid_actions_list: list[str],
+     tried_here: list[str],
+ ) -> str:
+     return f"""
+ OBSERVATION:
+ {observation}
+
+ STATE (json):
+ {json.dumps(state_obj or {}, ensure_ascii=False)}
+
+ SYNTH_MEMORY (json):
+ {json.dumps(synth_memory or {}, ensure_ascii=False)}
+
+ CURRENT_OBJECTIVES (text):
+ {objectives_text}
+
+ VALID_ACTIONS_LIST:
+ {json.dumps(valid_actions_list or [], ensure_ascii=False)}
+
+ TRIED_ACTIONS_HERE:
+ {json.dumps(tried_here or [], ensure_ascii=False)}
+
+ Update objectives and propose up to 3 suggested_actions.
+ Output ONLY JSON.
+ """
+
+
+ @dataclass
+ class RunResult:
+     """Result of running the agent. Do not modify this class."""
+     final_score: int
+     max_score: int
+     moves: int
+     locations_visited: set[str]
+     game_completed: bool
+     error: Optional[str] = None
+     history: list[tuple[str, str, str]] = field(default_factory=list)
+
+
+ @dataclass
+ class Objective:
+     id: str
+     type: str  # "explore", "get_item", "unlock", "solve", "return"
+     description: str
+     priority: int  # 0 = top
+     status: str  # "open" | "done" | "blocked"
+     evidence: list[str]  # short evidence strings
+
+ class ObjectiveManager:
+     def __init__(self):
+         self.objectives = []
+         self.counter = 0
+
+     def add(self, type_, desc, priority, evidence=""):
+         oid = f"obj{self.counter}"
+         self.counter += 1
+         self.objectives.append(Objective(oid, type_, desc, priority, "open", [evidence] if evidence else []))
+
+     def update_from_observation(self, obs: str, state_obj: dict):
+         low = (obs or "").lower()
+         vis = [str(x).lower() for x in (state_obj.get("visible_objects") or [])]
+         inv = " ".join([str(x).lower() for x in (state_obj.get("inventory") or [])])
+
+         # darkness
+         if "dark" in low and "lamp" not in inv and "lantern" not in inv:
+             if not self._has_open("get_item", "lamp"):
+                 self.add("get_item", "Find a lamp/lantern", priority=0, evidence="It is dark")
+
+         # grating locked
+         if "grating" in low and "locked" in low:
+             if not self._has_open("get_item", "key"):
+                 self.add("get_item", "Find a key", priority=1, evidence="Grating locked")
+             if not self._has_open("unlock", "grating"):
+                 self.add("unlock", "Unlock the grating", priority=2, evidence="Grating locked")
+
+         # containers
+         for c in ["mailbox", "chest", "box"]:
+             if c in low and not self._has_open("open", c):
+                 self.add("open", f"Open the {c}", priority=2, evidence=f"Seen {c}")
+
+         # visited
+         loc = (state_obj.get("location") or "").strip()
+         if loc and not self._has_open("visit", loc) and not self._has_done("visit", loc):
+             # not necessarily an "objective", but useful if you want "return" goals
+             pass
+
+     def propose_actions(self, state_obj: dict, valid_actions_list: list[str]) -> list[str]:
+         """Return ordered action candidates."""
+         cands = []
+         # Sort objectives by priority then FIFO
+         open_objs = sorted([o for o in self.objectives if o.status == "open"], key=lambda o: (o.priority, o.id))
+
+         for o in open_objs[:3]:
+             if o.type == "get_item":
+                 target = "lamp" if "lamp" in o.description.lower() else "key"
+                 # propose "take lamp" if visible
+                 vis = [str(x).lower() for x in (state_obj.get("visible_objects") or [])]
+                 if target in vis:
+                     cands.append(f"take {target}")
+             elif o.type == "open":
+                 noun = o.description.split()[-1]
+                 cands.append(f"open {noun}")
+             elif o.type == "unlock":
+                 noun = o.description.split()[-1]
+                 cands.append(f"unlock {noun}")
+
+         if valid_actions_list:
+             va = {re.sub(r"\s+", " ", a.strip().lower()) for a in valid_actions_list}
+             cands = [a for a in cands if re.sub(r"\s+", " ", a.strip().lower()) in va]
+         return cands
+
+     def _has_open(self, type_, keyword):
+         k = keyword.lower()
+         return any(o.status == "open" and o.type == type_ and k in o.description.lower() for o in self.objectives)
+
+     def _has_done(self, type_, keyword):
+         k = keyword.lower()
+         return any(o.status == "done" and o.type == type_ and k in o.description.lower() for o in self.objectives)
+
+     def render(self, k: int = 6) -> str:
+         open_objs = sorted([o for o in self.objectives if o.status == "open"], key=lambda o: (o.priority, o.id))
+         if not open_objs:
+             return "- (none)"
+         lines = []
+         for o in open_objs[:k]:
+             lines.append(f"- [{o.priority}] {o.type}: {o.description}")
+         return "\n".join(lines)
+
+     def mark_done_if_progress(self, before_state: dict, after_state: dict, action: str, obs: str):
+         a = (action or "").lower().strip()
+         low = (obs or "").lower()
+
+         # mark "open X" as done if it didn't say "closed/locked/can't"
+         if a.startswith("open "):
+             noun = a.split(" ", 1)[1]
+             if "can't" not in low and "locked" not in low and "does not" not in low:
+                 for o in self.objectives:
+                     if o.status == "open" and o.type == "open" and noun in o.description.lower():
+                         o.status = "done"
+
+     def replace_from_llm(self, llm_objectives: list[dict]):
+         """
+         Replace internal objectives with the list coming from the planner LLM.
+         Expected dict keys: type, description, priority, status, evidence
+         """
+         self.objectives = []
+         self.counter = 0
+
+         if not llm_objectives:
+             return
+
+         for o in llm_objectives[:12]:
+             try:
+                 type_ = str(o.get("type", "explore")).strip()
+                 desc = str(o.get("description", "")).strip()
+                 if not desc:
+                     continue
+                 pr = int(o.get("priority", 3))
+                 st = str(o.get("status", "open")).strip()
+                 ev = o.get("evidence", "")
+                 if isinstance(ev, list):
+                     evidence = [str(x)[:80] for x in ev[:2]]
+                 else:
+                     evidence = [str(ev)[:80]] if ev else []
+
+                 oid = f"obj{self.counter}"
+                 self.counter += 1
+                 self.objectives.append(Objective(
+                     id=oid,
+                     type=type_,
+                     description=desc,
+                     priority=max(0, min(pr, 5)),
+                     status=st if st in {"open", "done", "blocked"} else "open",
+                     evidence=evidence
+                 ))
+             except Exception:
+                 continue
+
+
+ # =============================================================================
+ # System Prompt - Customize this for your agent
+ # =============================================================================
+
+ SYSTEM_PROMPT = """You are playing a Zork-style text adventure.
+
+ GOAL:
+ Explore, solve puzzles, collect treasures, maximize score.
+
+ YOU CONTROL THE GAME ONLY USING TOOLS.
+ You never speak to the game directly.
+
+ ============================================================
+ TOOLS (ONLY THESE EXIST)
+ - play_action
+ - memory
+ - get_map
+ - inventory
+ - valid_actions
+ - tried_actions
+ - hint
+ - state
+ - exits
+ - graph
+ - checkpoint_save
+ - checkpoint_restore
+ - action_probe
+
+ ARGS RULE:
+ - play_action -> {"action": "<command>"}
+ - checkpoint_save/checkpoint_restore -> {"name": "<string>"} (optional)
+ - action_probe -> {"action": "<command>"}
+ - all others -> {}
+
+ ABSOLUTE TOOL RULE:
+ TOOL must be exactly one of the 13 names above.
+ Everything else (look, north, open mailbox, etc.) is a GAME COMMAND used only with play_action.
+ ============================================================
+
+ OUTPUT FORMAT (MANDATORY, EXACT)
+ THOUGHT: <1 short sentence>
+ TOOL: <tool_name>
+ ARGS: <json>
+
+ ============================================================
+ COMMAND GRAMMAR
+ Normally, your play_action command CAN be one of:
+
+ A) Movement (single word only):
+ north / south / east / west / up / down / in / out / northeast / northwest / southeast / southwest
+
+ B) Simple verb + noun (2–3 words max):
+ look
+ inventory
+ take <noun>
+ drop <noun>
+ open <noun>
+ examine <noun>
+ read <noun>
+ climb <noun>
+ enter <noun>
+ pull <noun>
+ push <noun>
+ unlock <noun>
+
+ FORBIDDEN (never use):
+ - "look around"
+ - "go north", "go west", "go northwest"
+ - "look south"
+ - placeholders like "<item>", "<thing>", "<object>"
+
+ SPECIAL EXCEPTION:
+ If (and only if) you previously called valid_actions, you may use a multi-word command ONLY if it appears EXACTLY in that valid_actions list.
+ Example: if valid_actions includes "go around forest", then you may use "go around forest".
+ Otherwise, do not invent it.
+
+ NOUN RULE:
+ Use the shortest noun from the latest observation (egg, nest, tree, grating, mailbox, leaflet).
+ Do not invent adjectives (say "egg", not "jewel-encrusted egg").
+ ============================================================
+
+ TURN POLICY (ANTI-SPAM)
+ - Default tool is play_action.
+ - Do NOT call valid_actions unless you hit an error or you are stuck.
+ - Do NOT call memory unless confused. Never call memory twice in a row.
+ - get_map is occasional (only if lost).
+ - Call tried_actions only when stuck/looping OR when you have valid_actions and you want to pick a NEW action not yet tried in this location.
+ - Call hint when you are stuck or after a parser failure / loop OR after a special description with new possibilities.
+
+ ============================================================
+ TREASURE RULE (CRITICAL)
+ If you see something valuable/rare (jewels, gold, treasure, ornate, precious, encrusted, crystal, egg, crown, painting):
+ YOUR NEXT ACTION MUST BE: take <item>.
+ Secure it first. Open/examine later.
+
+ If you try open/examine and the game says locked / no tools / no expertise:
+ STOP trying. KEEP the item. Leave to search for tools/keys elsewhere.
+ Do not retry the same blocked action.
+
+ ============================================================
+ LOCAL BEFORE LEAVING (CRITICAL)
+ When entering a location:
+ 1) If full description is not shown, do: look
+ 2) Interact locally ONCE with the most important object(s):
+    - take treasure
+    - open container
+    - examine new object
+ 3) Only then move.
+
+ ============================================================
+ VALID_ACTIONS EXPLORATION (IMPORTANT)
+ When you have a valid_actions list for the current location:
+ - Before leaving the location, try at most 1–2 NEW high-value actions from that list that you have not tried here yet.
+ - High-value actions (try in this order): take*, open*, unlock*, enter*, climb*, up, down, pull*, push*, read*, examine*.
+ - Avoid low-value management actions unless clearly needed: "put down ...", "put ... in ...", "close ...".
+ - Never repeat the same action in the same location if it produced no progress or an error message.
+ - Exception: you may retry an action ONLY if your inventory has changed since the last attempt.
+ - Use tried_actions to know which actions you already attempted in this location.
+
+ ============================================================
+ EXPLICIT POSSIBILITY OVERRIDE (CRITICAL)
+
+ If the observation explicitly says something is possible/available
+ (e.g., "It is possible to climb down", "You can enter", "A door leads ..."):
+ TRY the corresponding canonical command EVEN IF it is not listed in valid_actions.
+
+ Mapping (canonical):
+ - "possible to climb down" / "climb down" / "descend" -> down
+ - "possible to climb up" / "climb up" / "ascend" -> up
+ - "possible to enter" / "you can enter" / "way in" / "entrance" -> in
+ - "possible to leave" / "way out" -> out
+
+ Do this only once per location; if it fails, do not spam it—switch strategy or call valid_actions.
+
+ ============================================================
+ MOVEMENT PRIORITY (IMPORTANT)
+
+ If you decide to MOVE and multiple movement actions are available, use this priority order:
+
+ 1) Prefer "in", then "up", then "down" (these often unlock new areas/puzzles).
+ 2) Then prefer a movement you have NOT tried recently from this location.
+ 3) Only then choose cardinal directions: north / east / south / west (and diagonals if present).
+
+ Notes:
+ - This is only a preference when you are moving (not a rule to always move).
+ - If you just arrived in a room, follow LOCAL BEFORE LEAVING first (look + one local interaction), then move.
+
+ Examples:
+ - If valid_actions includes: in, up, north, east -> choose "in" (unless you just tried it and it failed).
+ - If valid_actions includes: up, north, south -> choose "up" (unless you just tried it and it failed).
+
+ ============================================================
+ EXAMINE POLICY (ANTI-SPAM, CRITICAL)
+
+ - Do NOT use "examine X" as a default action.
+ - Use "examine X" ONLY if:
+   A) X is NEW in the latest observation, OR
+   B) X looks interactive/blocking (door, window, grating, trapdoor, gate, chest, mailbox, leaves/pile, rope, lever, button), OR
+   C) you just got a blocking message ("locked", "not enough to allow entry", "can't", etc.) and you need more detail.
+
+ - If the game replied "nothing special" (or equivalent) for the same object at the same location:
+   DO NOT examine it again there. Change strategy (open/take/pull/enter/move).
+
+ - Informational items (leaflet, note, inscription):
+   Read/examine ONCE, then ignore. Never put them in containers.
+ ============================================================
+
+ ============================================================
+ ERROR RECOVERY (CRITICAL)
+ If the game replies:
+ - "I don't know the word ..."
+ - "That sentence isn't one I recognize"
+ - "You can't see any X here"
+ - "locked" / "no tools"
+
+ Then:
+ 1) Do NOT repeat the same command.
+ 2) Simplify: shorter noun, simpler verb (look / examine <noun> / take <noun>).
+ 3) If still stuck: call valid_actions {} ONCE, then pick ONE action from that list.
+
+ ============================================================
+ ANTI-LOOP (CRITICAL)
+ If you already tried the same interaction with the same object and it gave no progress:
+ STOP interacting with that object.
+ Move to a new location.
+
+ Informational items:
+ If an item only prints text (like a leaflet), read once then ignore it (do not put in containers, do not shuffle it).
+ ============================================================
+
+ EXAMPLES
+
+ THOUGHT: There is a mailbox; it may contain something.
+ TOOL: play_action
+ ARGS: {"action": "open mailbox"}
+
+ THOUGHT: Valuable item spotted; secure it first.
+ TOOL: play_action
+ ARGS: {"action": "take egg"}
+
+ THOUGHT: My last command failed; I need valid options.
+ TOOL: valid_actions
+ ARGS: {}
+
+ ============================================================
+ STRATEGY TIPS
+ 1. Explore systematically, but prefer in/up/down if available; otherwise try one new direction at a time.
+ 2. Read documents once. Examine only new/blocking/valuable objects.
+ 3. Use get_map() to track explored locations
+ 4. Light is essential - find a light source before dark areas!
+ 5. Manage inventory - you can only carry limited items
+
+ """
553
+
554
+
555
+ # =============================================================================
556
+ # Student Agent - IMPLEMENT THIS CLASS
557
+ # =============================================================================
558
+
559
+ class StudentAgent:
560
+ """
561
+ Your ReAct agent implementation.
562
+
563
+ TODO:
564
+ 1. Implement the run() method with the ReAct loop
565
+ 2. Parse LLM responses to extract tool calls
566
+ 3. Track state and avoid loops
567
+
568
+ Use the provided call_llm() function to interact with the LLM.
569
+ """
570
+
571
+ def __init__(self):
572
+ """Initialize your agent here."""
573
+
574
+ # Internal trace (used to build prompts)
575
+ # Each entry: {"thought": str, "tool": str, "args": dict, "result": str}
576
+ self.history: list[dict] = []
577
+
578
+ # Stats/state (for RunResult)
579
+ self.locations_visited: set[str] = set()
580
+
581
+ # Track room changes (so we can reset recommendation cache on new rooms)
582
+ self._last_room_line: str | None = None
583
+
584
+ # Keep last play_action to prevent trivial repeats
585
+ self._last_action: str | None = None
586
+
587
+ # Loop detection based on server state hash
588
+ self._recent_state_hashes = deque(maxlen=20)
589
+
590
+ # Checkpoint management
591
+ self._checkpoint_enabled = True
592
+ self._checkpoint_best = "best"
593
+ self._checkpoint_loop = "loop"
594
+ self._last_score_seen: int | None = None
595
+
596
+ # synthetic memory
597
+ self.synth_memory = {
598
+ "facts": [],
599
+ "blocking": [],
600
+ "inventory_goals": [],
601
+ "open_threads": [],
602
+ "visited": [],
603
+ "last_update_move": 0
604
+ }
605
+
606
+ # objective manager
607
+ self.objman = ObjectiveManager()
608
+ # LLM planner cache
609
+ self._planner_last_step = 0
610
+ self._planner_cooldown = 5 # run planner at most every 5 steps (tweak)
611
+ self._planner_suggested_actions: list[str] = []
612
+ self._planner_notes: str = ""
613
+
614
+ async def run(
615
+ self,
616
+ client, # FastMCP Client connected to your MCP server
617
+ game: str,
618
+ max_steps: int,
619
+ seed: int,
620
+ verbose: bool = False,
621
+ ) -> RunResult:
622
+ """
623
+ Run the agent for a game session.
624
+
625
+ Args:
626
+ client: FastMCP Client connected to your MCP server
627
+ game: Name of the game being played (e.g., "zork1")
628
+ max_steps: Maximum number of steps to take
629
+ seed: Random seed for reproducibility (use for LLM calls)
630
+ verbose: Whether to print detailed output
631
+
632
+ Returns:
633
+ RunResult with final score and statistics
634
+ """
+
+        # Utilities for robustness
+        def _tool_text(res) -> str:
+            """
+            FastMCP returns different shapes depending on version:
+            - sometimes an object with .content[0].text
+            - sometimes a list of parts with .text
+            - sometimes already a string
+            """
+            if res is None:
+                return ""
+            if isinstance(res, str):
+                return res
+            if isinstance(res, dict):
+                return json.dumps(res)
+            # Newer fastmcp style: result.content[0].text
+            content = getattr(res, "content", None)
+            if content:
+                try:
+                    if isinstance(content, list) and content and hasattr(content[0], "text"):
+                        return content[0].text or ""
+                except Exception:
+                    pass
+            # Older / alternate: list of parts
+            if isinstance(res, list) and res:
+                try:
+                    if hasattr(res[0], "text"):
+                        return res[0].text or ""
+                except Exception:
+                    pass
+            # Fallback
+            return str(res)
+
+        def _extract_location(obs: str) -> str | None:
+            """Heuristic: first plausible room-title line."""
+            if not obs:
+                return None
+            for line in obs.splitlines():
+                s = line.strip()
+                if not s:
+                    continue
+                low = s.lower()
+
+                # Skip common headers
+                if low.startswith(("copyright", "revision", "serial number")):
+                    continue
+                if "trademark" in low:
+                    continue
+
+                # Zork titles: short, not a full sentence
+                if len(s) > 50:
+                    continue
+                if s.endswith((".", "!", "?", ":", ";")):
+                    continue
+
+                bad_starts = (
+                    "you ", "it ", "i ", "there ", "the ", "a ", "an ",
+                    "what ", "can't ", "i don't", "unknown", "error"
+                )
+                if low.startswith(bad_starts):
+                    continue
+
+                return s
+            return None
+
+        def _parse_score_moves_from_memory(mem: str) -> tuple[int | None, int | None]:
+            """Parse lines like 'Score: X' / 'Moves: Y' (best-effort)."""
+            if not mem:
+                return (None, None)
+            score = None
+            moves = None
+            m = re.search(r"\bScore:\s*(\d+)\b", mem)
+            if m:
+                score = int(m.group(1))
+            m = re.search(r"\bMoves:\s*(\d+)\b", mem)
+            if m:
+                moves = int(m.group(1))
+            return (score, moves)
+
+        async def _force_valid_actions_feedback(msg: str) -> str:
+            """
+            Return the given feedback plus the list of valid actions (if available).
+            """
+            va_text = ""
+            if "valid_actions" in available_tool_names:
+                try:
+                    va_text = _tool_text(await client.call_tool("valid_actions", {}))
+                except Exception as e:
+                    va_text = f"(valid_actions failed: {e})"
+            return f"{msg}\n\nValid actions:\n{va_text}".strip()
+
+        def should_summarize(step_idx, observation, state_obj):
+            if step_idx % 10 == 0:
+                return True
+            low = (observation or "").lower()
+            triggers = [
+                "locked", "dark", "can't", "you don't know", "you can't see",
+                "grating", "trapdoor", "door", "key", "lamp"
+            ]
+            return any(t in low for t in triggers)
+
+        def _print_step(step_idx: int, thought: str, tool: str, args: dict):
+            if not verbose:
+                return
+            print("\n" + "─" * 40)
+            print(f"Step {step_idx}/{max_steps}")
+            print("THOUGHT:", thought)
+            print("TOOL:", tool)
+            print("ARGS:", args)
+
+        def _parse_points_from_obs(obs: str) -> tuple[int | None, int | None]:
+            """
+            Returns (delta_points, total_points) if present in the observation, else (None, None).
+            Matches patterns like: "+10 points! (Total: 15)"
+            """
+            if not obs:
+                return (None, None)
+            delta = None
+            total = None
+            m = re.search(r"\+(\d+)\s*point(?:s)?!", obs)
+            if m:
+                delta = int(m.group(1))
+            m = re.search(r"\(Total:\s*(\d+)\)", obs)
+            if m:
+                total = int(m.group(1))
+            return (delta, total)
+
+        # Discover tools
+        default_tools = {
+            "play_action",
+            "memory",
+            "inventory",
+            "get_map",
+            "valid_actions",
+            "tried_actions",
+            "hint",
+            "state",
+            "exits",
+            "graph",
+            "checkpoint_save",
+            "checkpoint_restore",
+            "action_probe",
+        }
+
+        available_tool_names = set(default_tools)
+        if hasattr(client, "list_tools"):
+            try:
+                tools = await client.list_tools()
+                # tools can be a list of objects with `.name`
+                names = []
+                for t in tools or []:
+                    n = getattr(t, "name", None)
+                    if n:
+                        names.append(n.strip().lower())
+                if names:
+                    available_tool_names = set(names)
+            except Exception:
+                # If list_tools isn't available / fails, keep defaults
+                pass
+
+        # Session initialization
+        self.history.clear()
+        self.locations_visited.clear()
+        self._last_room_line = None
+        self._last_action = None
+        self._recent_state_hashes.clear()
+        self._last_score_seen = None
+        self.objman = ObjectiveManager()
+
+        # 1) Initial look
+        try:
+            res = await client.call_tool("play_action", {"action": "look"})
+            observation = _tool_text(res)
+        except Exception as e:
+            return RunResult(
+                final_score=0,
+                max_score=350 if game == "zork1" else 0,
+                moves=0,
+                locations_visited=set(),
+                game_completed=False,
+                error=f"Initial call_tool failed: {e}",
+                history=[],
+            )
+
+        loc = _extract_location(observation)
+        if loc:
+            self.locations_visited.add(loc)
+            self._last_room_line = loc.strip().lower()
+
+        # Save an initial checkpoint if supported (for loop recovery)
+        if self._checkpoint_enabled and "checkpoint_save" in available_tool_names:
+            try:
+                await client.call_tool("checkpoint_save", {"name": self._checkpoint_loop})
+            except Exception:
+                pass
+
+        if verbose:
+            print("=" * 60)
+            print(f"Starting agent on game={game} | max_steps={max_steps}")
+            print("=" * 60)
+            print("\nInitial observation:\n", observation)
+
+        # If we detect loops / no-ops, force valid actions next
+        force_valid_actions_next = False
+
+        # Track run history for grading
+        run_history: list[tuple[str, str, str]] = []
+
+        # We'll keep the best-known score/moves
+        best_score: int | None = None
+        best_moves: int | None = None
+
+        # Anti-loop state:
+        # actions that produced "no progress" recently in this location
+        blocked_actions_by_loc: dict[str, set[str]] = {}
+        # recommended actions in this location (to avoid repeating the same suggestions)
+        recommended_actions_by_loc: dict[str, set[str]] = {}
+
+        def _result_short(txt: str) -> str:
+            return re.sub(r"\s+", " ", (txt or "").strip())[:180]
+
+        def _cur_loc_key() -> str:
+            # Use the last known room line if available, else fall back to the observation
+            if self._last_room_line:
+                return self._last_room_line
+            return "unknown:" + _result_short(observation)
+
+        def _is_no_progress_result(txt: str) -> bool:
+            t = (txt or "").lower()
+            triggers = [
+                "but thing not happen",
+                "not see any way",
+                "too heavy",
+                "stuck",
+                "not happen",
+                "that not thing",
+                "grunk not see that there",
+                "you can't see any",
+                "not know how",
+                "nothing special",
+            ]
+            return any(x in t for x in triggers)
+
+        # 2) ReAct loop
+        for step_idx in range(1, max_steps + 1):
+            memory_text = None
+            map_text = None
+            valid_actions_text = None
+            hint_text = None
+            state_text = None
+            state_obj = None
+
+            if "state" in available_tool_names:
+                try:
+                    state_text = _tool_text(await client.call_tool("state", {}))
+                    state_obj = json.loads(state_text) if state_text else None
+                except Exception:
+                    state_obj = None
+
+            # Update objectives from the latest observation/state
+            if state_obj and isinstance(state_obj, dict):
+                try:
+                    self.objman.update_from_observation(observation, state_obj)
+                except Exception:
+                    pass
+
+            # Deterministic overrides (before the LLM)
+            if state_obj and isinstance(state_obj, dict):
+                visible = [str(x).lower() for x in (state_obj.get("visible_objects") or [])]
+                inv = " ".join(str(x).lower() for x in (state_obj.get("inventory") or []))
+
+                # Treasure rule (simple keyword scan)
+                treasure_words = {"treasure", "gold", "jewel", "jewels", "diamond", "emerald",
+                                  "ruby", "sapphire", "crown", "painting", "egg", "crystal"}
+                if any(w in visible for w in treasure_words):
+                    # Pick the first matching visible object
+                    item = next((x for x in visible if x in treasure_words), None)
+                    if item:
+                        # Force play_action "take <item>" and execute immediately (skip the LLM)
+                        tool_name = "play_action"
+                        tool_args = {"action": f"take {item}"}
+                        thought = "Valuable item spotted; secure it first."
+                        _print_step(step_idx, thought, tool_name, tool_args)
+                        res = await client.call_tool(tool_name, tool_args)
+                        observation = _tool_text(res)
+                        self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                        run_history.append((thought, f"{tool_name} {json.dumps(tool_args)}", observation))
+                        continue
+
+                # Darkness handling
+                obs_low = (state_obj.get("last_observation") or "").lower()
+                if "dark" in obs_low and ("lamp" in inv or "lantern" in inv):
+                    # Try turning the lamp on if valid_actions allows it, else skip
+                    # (only if you decide to allow multi-word exotic actions via valid_actions)
+                    pass
+
+            # Loop detection using state_hash (server-side)
+            if isinstance(state_obj, dict) and state_obj.get("state_hash"):
+                h = str(state_obj["state_hash"])
+                self._recent_state_hashes.append(h)
+
+                # If the exact same hash repeats 3 times IN A ROW, we are looping
+                if len(self._recent_state_hashes) >= 3 and \
+                   self._recent_state_hashes[-1] == self._recent_state_hashes[-2] == self._recent_state_hashes[-3]:
+                    force_valid_actions_next = True
+
+                    # Try to roll back to the last checkpoint if possible (loop detected)
+                    if self._checkpoint_enabled and "checkpoint_restore" in available_tool_names:
+                        try:
+                            # --- DEBUG: before restore ---
+                            if verbose:
+                                before_score = state_obj.get("score") if isinstance(state_obj, dict) else None
+                                before_moves = state_obj.get("moves") if isinstance(state_obj, dict) else None
+                                before_hash = state_obj.get("state_hash") if isinstance(state_obj, dict) else None
+                                print(
+                                    f"[DEBUG] RESTORE requested checkpoint={self._checkpoint_loop} "
+                                    f"at step={step_idx} (before score={before_score}, moves={before_moves}, hash={before_hash})"
+                                )
+
+                            # --- Restore checkpoint (capture tool output) ---
+                            restore_res = await client.call_tool("checkpoint_restore", {"name": self._checkpoint_loop})
+                            if verbose:
+                                print("[DEBUG] checkpoint_restore result:", _tool_text(restore_res))
+
+                            # --- Refresh observation after restore ---
+                            res = await client.call_tool("play_action", {"action": "look"})
+                            observation = _tool_text(res)
+                            self._recent_state_hashes.clear()
+
+                            # --- Fetch state AFTER restore (for debug + a correct state_obj) ---
+                            after_state_obj = None
+                            if "state" in available_tool_names:
+                                try:
+                                    st = _tool_text(await client.call_tool("state", {}))
+                                    after_state_obj = json.loads(st) if st else None
+                                    if isinstance(after_state_obj, dict):
+                                        state_obj = after_state_obj
+                                        try:
+                                            if "score" in after_state_obj:
+                                                self._last_score_seen = int(after_state_obj["score"])
+                                        except Exception:
+                                            pass
+                                    if verbose:
+                                        after_score = after_state_obj.get("score") if isinstance(after_state_obj, dict) else None
+                                        after_moves = after_state_obj.get("moves") if isinstance(after_state_obj, dict) else None
+                                        after_hash = after_state_obj.get("state_hash") if isinstance(after_state_obj, dict) else None
+                                        print(
+                                            f"[DEBUG] state after restore: score={after_score}, moves={after_moves}, hash={after_hash}"
+                                        )
+                                except Exception as e:
+                                    after_state_obj = None
+                                    if verbose:
+                                        print("[DEBUG] state after restore failed:", e)
+
+                            # --- Optional: mark objective progress using the AFTER state (not before) ---
+                            try:
+                                if isinstance(after_state_obj, dict):
+                                    self.objman.mark_done_if_progress({}, after_state_obj, self._last_action or "", observation)
+                            except Exception:
+                                pass
+
+                        except Exception as e:
+                            if verbose:
+                                print("[DEBUG] checkpoint_restore block failed:", e)
+
+            # Occasional tools (only if available)
+            if step_idx % 10 == 0 and "memory" in available_tool_names:
+                try:
+                    memory_text = _tool_text(await client.call_tool("memory", {}))
+                    s, m = _parse_score_moves_from_memory(memory_text)
+                    if s is not None:
+                        best_score = s
+                    if m is not None:
+                        best_moves = m
+                except Exception:
+                    memory_text = None
+
+            if (force_valid_actions_next or step_idx % 25 == 0) and "get_map" in available_tool_names:
+                try:
+                    map_text = _tool_text(await client.call_tool("get_map", {}))
+                except Exception:
+                    map_text = None
+
+            if force_valid_actions_next and "hint" in available_tool_names:
+                try:
+                    hint_text = _tool_text(await client.call_tool("hint", {}))
+                except Exception:
+                    hint_text = None
+
+            tried_here_cached: set[str] | None = None
+            # Forced valid_actions on loop / parser failure
+            force_before = force_valid_actions_next
+            if force_valid_actions_next and "valid_actions" in available_tool_names:
+                try:
+                    valid_actions_text = _tool_text(await client.call_tool("valid_actions", {}))
+                    va_list = self._extract_valid_actions(valid_actions_text)
+
+                    tried_here = set()
+                    if "tried_actions" in available_tool_names:
+                        try:
+                            tried_text = _tool_text(await client.call_tool("tried_actions", {}))
+                            tried_here = self._extract_tried_actions_for_current_location(tried_text)
+                        except Exception:
+                            tried_here = set()
+                    tried_here_cached = tried_here
+
+                    loc_key = _cur_loc_key()
+                    blocked_here = blocked_actions_by_loc.setdefault(loc_key, set())
+                    recommended_here = recommended_actions_by_loc.setdefault(loc_key, set())
+
+                    # Candidates: not tried, not blocked, not already recommended (if possible)
+                    cands = [a for a in va_list
+                             if self._norm(a) not in tried_here
+                             and self._norm(a) not in blocked_here
+                             and self._norm(a) not in recommended_here]
+
+                    if not cands:
+                        # Relax the "recommended_here" constraint
+                        cands = [a for a in va_list
+                                 if self._norm(a) not in tried_here
+                                 and self._norm(a) not in blocked_here]
+
+                    if not cands:
+                        cands = va_list
+
+                    best_act = await self._choose_with_probe(client, cands, available_tool_names) if cands else None
+                    if best_act:
+                        recommended_here.add(self._norm(best_act))
+
+                    # Inject a recommendation BUT do not execute anything
+                    valid_actions_text = (
+                        valid_actions_text.strip()
+                        + ("\n\nAlready tried here:\n- " + "\n- ".join(sorted(tried_here)) if tried_here else "")
+                        + ("\n\nBlocked here:\n- " + "\n- ".join(sorted(blocked_here)) if blocked_here else "")
+                        + (f"\n\nRECOMMENDED NEXT (choose exactly ONE from valid_actions):\n- {best_act}" if best_act else "")
+                        + "\n\nSYSTEM: If the recommended action fails, choose a DIFFERENT action from valid_actions."
+                    )
+
+                except Exception as e:
+                    valid_actions_text = f"(valid_actions failed: {e})"
+
+            force_valid_actions_next = False
+
+            # Build helpers for planner inputs
+            va_list = self._extract_valid_actions(valid_actions_text) if valid_actions_text else []
+            tried_here = []
+            if tried_here_cached is not None:
+                tried_here = sorted(tried_here_cached)
+            elif "tried_actions" in available_tool_names:
+                try:
+                    tried_text = _tool_text(await client.call_tool("tried_actions", {}))
+                    tried_here = sorted(self._extract_tried_actions_for_current_location(tried_text))
+                except Exception:
+                    tried_here = []
+
+            # Update synthesized memory
+            if should_summarize(step_idx, observation, state_obj):
+                recent = self.history[-8:]
+                recent_lines = "\n".join(
+                    f"- {h['args'].get('action', '')} -> {(h['result'].splitlines()[0] if h['result'] else '')}"
+                    for h in recent if h.get("tool") == "play_action"
+                )
+                prompt = build_synth_prompt(self.synth_memory, recent_lines, state_obj or {})
+                try:
+                    txt = call_llm(prompt, SYNTH_SYSTEM, seed=seed + 10_000 + step_idx, max_tokens=350)
+                    new_mem = json.loads(txt)
+
+                    for k in ["facts", "blocking", "inventory_goals", "open_threads", "visited"]:
+                        if k not in new_mem or not isinstance(new_mem[k], list):
+                            new_mem[k] = []
+                    new_mem["last_update_move"] = int((state_obj or {}).get("moves", step_idx))
+
+                    self.synth_memory = new_mem
+                except Exception:
+                    pass
+
+            # Run the planner after the synth memory update
+            if self._planner_should_run(step_idx, observation, force_before):
+                self._run_planner_llm(
+                    observation=observation,
+                    state_obj=state_obj or {},
+                    valid_actions_list=va_list,
+                    tried_here_list=tried_here,
+                    seed=seed,
+                    step_idx=step_idx,
+                )
+
+            # Build the prompt and call the LLM
+            prompt = self._build_prompt(
+                observation=observation,
+                memory_text=memory_text,
+                map_text=map_text,
+                valid_actions_text=valid_actions_text,
+                hint_text=hint_text,
+            )
+
+            llm_response = self._call_llm(prompt=prompt, system_prompt=SYSTEM_PROMPT, seed=seed + step_idx)
+            thought, tool_name, tool_args = self._parse_response(llm_response)
+            tool_name = (tool_name or "").strip().lower()
+
+            if valid_actions_text:
+                va_list = self._extract_valid_actions(valid_actions_text)
+                if tool_name == "play_action":
+                    act = self._norm(tool_args.get("action", ""))
+                    if va_list and not self._is_allowed_exotic(act, va_list) and not self._is_canonical_action(act):
+                        # Non-canonical action not in valid_actions -> re-trigger recovery
+                        force_valid_actions_next = True
+
+                if force_valid_actions_next:
+                    observation = "SYSTEM FEEDBACK: Non-canonical action not in valid_actions. Recomputing valid_actions."
+                    continue
+
+            # Avoid calling tried_actions/valid_actions twice in a row
+            if self.history:
+                last_tool = (self.history[-1].get("tool") or "").strip().lower()
+                if tool_name in {"tried_actions", "valid_actions"} and last_tool == tool_name:
+                    observation = (
+                        f"SYSTEM FEEDBACK: Do not call {tool_name} twice in a row. "
+                        "Choose ONE concrete play_action from the last valid_actions list, or move."
+                    )
+                    self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                    run_history.append((thought, f"{tool_name} {tool_args}", observation))
+                    force_valid_actions_next = True
+                    continue
+
+            # Hard tool validation: if unknown tool, coach and continue without spending a move
+            if tool_name not in available_tool_names:
+                va_text = ""
+                if "valid_actions" in available_tool_names:
+                    try:
+                        va_text = _tool_text(await client.call_tool("valid_actions", {}))
+                    except Exception as e:
+                        va_text = f"(valid_actions failed: {e})"
+
+                observation = (
+                    "SYSTEM FEEDBACK: You requested an UNKNOWN TOOL.\n"
+                    f"Tool must be one of: {', '.join(sorted(available_tool_names))}.\n"
+                    "Use play_action with ARGS {\"action\": \"...\"} for game commands.\n\n"
+                    f"Suggested valid actions:\n{va_text}"
+                )
+                self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                run_history.append((thought, f"{tool_name} {tool_args}", observation))
+                force_valid_actions_next = True
+                continue
+
+            # Enforce args shape
+            if tool_name not in {"play_action", "checkpoint_save", "checkpoint_restore", "action_probe"}:
+                tool_args = {}
+
+            if tool_name in {"checkpoint_save", "checkpoint_restore"}:
+                if not isinstance(tool_args, dict):
+                    tool_args = {}
+                if "name" in tool_args and not isinstance(tool_args["name"], str):
+                    tool_args["name"] = "auto"
+
+            if tool_name == "action_probe":
+                if not isinstance(tool_args, dict):
+                    tool_args = {"action": "look"}
+                if not isinstance(tool_args.get("action", ""), str) or not tool_args["action"].strip():
+                    tool_args["action"] = "look"
+
+            # Normalize / validate the play_action command a bit
+            if tool_name == "play_action":
+                if not isinstance(tool_args, dict):
+                    # Instead of defaulting to "look", coach the model toward valid_actions
+                    observation = await _force_valid_actions_feedback(
+                        "SYSTEM FEEDBACK: Invalid ARGS for play_action. Call valid_actions and pick ONE exact action."
+                    )
+                    self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                    run_history.append((thought, f"{tool_name} {tool_args}", observation))
+                    force_valid_actions_next = True
+                    continue
+
+                raw_action = str(tool_args.get("action", "") or "").strip()
+                if not raw_action:
+                    # Instead of defaulting to "look", coach the model toward valid_actions
+                    observation = await _force_valid_actions_feedback(
+                        "SYSTEM FEEDBACK: Missing action. Call valid_actions and pick ONE exact action."
+                    )
+                    self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                    run_history.append((thought, f"{tool_name} {tool_args}", observation))
+                    force_valid_actions_next = True
+                    continue
+
+                action = self._normalize_action(raw_action) if hasattr(self, "_normalize_action") else raw_action.lower()
+                action = action.strip().lower()
+
+                if not action:
+                    observation = await _force_valid_actions_feedback(
+                        "SYSTEM FEEDBACK: Empty/invalid action after normalization. Call valid_actions and pick ONE exact action."
+                    )
+                    self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                    run_history.append((thought, f"{tool_name} {tool_args}", observation))
+                    force_valid_actions_next = True
+                    continue
+
+                tool_args["action"] = action
+
+                # Movement priority (in/up/down) when valid_actions is known
+                if valid_actions_text and self._is_move(action):
+                    va_list = [self._norm(x) for x in self._extract_valid_actions(valid_actions_text)]
+                    if action in {"north", "south", "east", "west"}:
+                        if any(m in va_list for m in {"in", "up", "down"}):
+                            observation = (
+                                "SYSTEM FEEDBACK: Movement priority: prefer in/up/down when available. "
+                                "Call valid_actions and pick one of those if present."
+                            )
+                            self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                            run_history.append((thought, f"{tool_name} {tool_args}", observation))
+                            force_valid_actions_next = True
+                            continue
+
+                # If not canonical, allow only if it appears in valid_actions (when available).
+                # If we don't have valid_actions_text yet, fetch it once and decide.
+                if not self._is_canonical_action(action):
+                    va_text = valid_actions_text
+                    va_list: list[str] = self._extract_valid_actions(va_text) if va_text else []
+
+                    if not va_list and "valid_actions" in available_tool_names:
+                        try:
+                            va_text = _tool_text(await client.call_tool("valid_actions", {}))
+                            va_list = self._extract_valid_actions(va_text)
+                        except Exception:
+                            va_text = None
+                            va_list = []
+
+                    if not va_list or not self._is_allowed_exotic(action, va_list):
+                        # Coach and force valid_actions next
+                        observation = (
+                            "SYSTEM FEEDBACK: Your command is not canonical and is not allowed unless it appears "
+                            "EXACTLY in valid_actions. Simplify (look / north / take X / open X / examine X / read X / etc.) "
+                            "or call valid_actions then pick ONE action from it.\n"
+                        )
+                        if va_text:
+                            observation += f"\nvalid_actions for this location:\n{va_text}"
+
+                        self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                        run_history.append((thought, f"{tool_name} {tool_args}", observation))
+                        force_valid_actions_next = True
+                        continue
+
+                # Simple anti-repeat (soft): if same as the last action with no new info, force valid actions
+                if self._last_action and action.strip().lower() == self._last_action:
+                    # Don't always block; only when the last observation looked near-identical
+                    prev_short = (self.history[-1]["result"] if self.history else "")[:200].strip()
+                    cur_short = (observation or "")[:200].strip()
+                    if prev_short and prev_short == cur_short:
+                        observation = (
+                            "SYSTEM FEEDBACK: You are repeating the same action with no progress. "
+                            "Call valid_actions and choose ONE different action."
+                        )
+                        self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                        run_history.append((thought, f"{tool_name} {tool_args}", observation))
+                        force_valid_actions_next = True
+                        continue
+
+            # Per-location blocked actions (avoid "pull lever" spam etc.)
+            if tool_name == "play_action":
+                loc_key = _cur_loc_key()
+                blocked = blocked_actions_by_loc.setdefault(loc_key, set())
+                act_norm = self._norm(tool_args.get("action", ""))
+
+                if act_norm in blocked:
+                    observation = (
+                        "SYSTEM FEEDBACK: This action already produced no progress in this location. "
+                        "Do NOT repeat it. Call valid_actions and pick a different action (prefer the recommended one)."
+                    )
+                    self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+                    run_history.append((thought, f"{tool_name} {tool_args}", observation))
+                    force_valid_actions_next = True
+                    continue
+
+            if verbose:
+                print("\n" + "─" * 40)
+                print(f"Step {step_idx}/{max_steps}")
+                print("THOUGHT:", thought)
+                print("TOOL:", tool_name)
+                print("ARGS:", tool_args)
+
+            # Execute tool
+            prev_observation = observation
+            try:
+                res = await client.call_tool(tool_name, tool_args)
+                observation = _tool_text(res)
+
+                # Immediate checkpoint on score gain (robust)
+                if tool_name == "play_action" and self._checkpoint_enabled and "checkpoint_save" in available_tool_names:
+                    delta_pts, total_pts = _parse_points_from_obs(observation)
+
+                    if delta_pts is not None and delta_pts > 0:
+                        low_obs = (observation or "").lower()
+                        if "game over" not in low_obs and "grue" not in low_obs:
+                            # Prefer the authoritative score from state if available
+                            cur_total = None
+                            try:
+                                if isinstance(state_obj, dict) and "score" in state_obj:
+                                    cur_total = int(state_obj["score"])
+                            except Exception:
+                                cur_total = None
+
+                            # Fall back to the parsed total if present
+                            if cur_total is None and total_pts is not None:
+                                cur_total = total_pts
+
+                            # Monotone rule (only if we have a total); otherwise save anyway
+                            should_save = (cur_total is None) or (self._last_score_seen is None) or (cur_total > self._last_score_seen)
+
+                            if should_save:
+                                if cur_total is not None:
+                                    self._last_score_seen = cur_total
+                                try:
+                                    if verbose:
+                                        print(f"[DEBUG] CHECKPOINT_SAVE after +{delta_pts} (total={cur_total}) at step={step_idx}")
+
+                                    # 1) best-score checkpoint
+                                    await client.call_tool("checkpoint_save", {"name": self._checkpoint_best})
+
+                                    # 2) loop-safe position
+                                    try:
+                                        await client.call_tool("checkpoint_save", {"name": self._checkpoint_loop})
+                                    except Exception:
+                                        pass
+
+                                except Exception as e:
+                                    if verbose:
+                                        print("[DEBUG] checkpoint_save after points failed:", e)
+
+            except Exception as e:
+                observation = f"Error calling tool {tool_name}: {e}"
+                if tool_name == "play_action":
+                    loc_key = _cur_loc_key()
+                    blocked_actions_by_loc.setdefault(loc_key, set()).add(self._norm(tool_args.get("action", "")))
+                force_valid_actions_next = True
+
+            if self._looks_like_parser_failure(observation):
+                force_valid_actions_next = True
+
+            if tool_name == "play_action":
+                loc_key = _cur_loc_key()
+                act_norm = self._norm(tool_args.get("action", ""))
+                rs = _result_short(observation)
+
+                if _is_no_progress_result(observation):
+                    blocked_actions_by_loc.setdefault(loc_key, set()).add(act_norm)
+
+            # Track location changes
+            loc = _extract_location(observation)
+            if loc:
+                self.locations_visited.add(loc)
+                loc_line = loc.strip().lower()
+                if self._last_room_line is None or loc_line != self._last_room_line:
+                    self._last_room_line = loc_line
+
+            # Loop detection: same play_action 3x with the same short observation
+            if tool_name == "play_action":
+                act = str(tool_args.get("action", "")).strip().lower()
+                self._last_action = act
+
+                last_plays = [
+                    h for h in reversed(self.history)
+                    if h.get("tool") == "play_action" and isinstance(h.get("args"), dict)
+                ][:2]
+
+                if len(last_plays) == 2:
+                    last_act1 = str(last_plays[0]["args"].get("action", "")).strip().lower()
+                    last_act2 = str(last_plays[1]["args"].get("action", "")).strip().lower()
+
+                    cur_short = (observation or "").strip()[:200]
+                    last_obs1 = (last_plays[0].get("result") or "").strip()[:200]
+                    last_obs2 = (last_plays[1].get("result") or "").strip()[:200]
+
+                    if act and act == last_act1 == last_act2 and cur_short and cur_short == last_obs1 == last_obs2:
+                        observation = (
+                            "SYSTEM FEEDBACK: You repeated the same action 3 times with no new info. "
+                            "Stop repeating. Call valid_actions and pick ONE different action."
+                        )
+                        force_valid_actions_next = True
+
+            # No-op movement detection: movement but the observation is unchanged
+            if tool_name == "play_action":
+                a = str(tool_args.get("action", "")).strip().lower()
+                if a in {"north", "south", "east", "west", "up", "down", "in", "out"}:
+                    prev_short = (prev_observation or "").strip()[:200]
+                    cur_short = (observation or "").strip()[:200]
+                    if prev_short and cur_short == prev_short:
+                        force_valid_actions_next = True
+
+            if verbose:
+                print("\nRESULT:\n", observation)
+
+            # Save traces
+            self.history.append({"thought": thought, "tool": tool_name, "args": tool_args, "result": observation})
+            run_history.append((thought, f"{tool_name} {json.dumps(tool_args) if isinstance(tool_args, dict) else tool_args}", observation))
+
+            # Stop conditions ("*** game over ***" also contains this substring)
+            if "game over" in (observation or "").lower():
+                break
+
+        # 3) Final stats
+        # Prefer the memory tool for the authoritative score/moves if available
+        final_score = best_score if best_score is not None else 0
+        moves = best_moves if best_moves is not None else 0
+
+        if "memory" in available_tool_names:
+            try:
+                mem = _tool_text(await client.call_tool("memory", {}))
+                s, m = _parse_score_moves_from_memory(mem)
+                if s is not None:
+                    final_score = s
+                if m is not None:
+                    moves = m
+            except Exception:
+                pass
+
+        game_completed = "game over" in (observation or "").lower()
+
+        max_score = 350 if game == "zork1" else 0  # keep it simple; add per-game max scores later if needed
+
+        return RunResult(
+            final_score=final_score,
+            max_score=max_score,
+            moves=moves,
+            locations_visited=set(self.locations_visited),
+            game_completed=game_completed,
+            history=run_history,
+        )
+
+    def _build_prompt(
+        self,
+        observation: str,
+        memory_text: str | None = None,
+        map_text: str | None = None,
+        valid_actions_text: str | None = None,
+        hint_text: str | None = None,
+    ) -> str:
+        """Build the prompt for the LLM."""
+        parts: list[str] = []
+
+        if memory_text:
+            parts.append("Game memory (authoritative):")
+            parts.append(memory_text.strip())
+            parts.append("")
+
+        if map_text:
+            parts.append("Explored map:")
+            parts.append(map_text.strip())
+            parts.append("")
+
+        if valid_actions_text:
+            parts.append("Suggested valid actions (choose EXACTLY if you use one):")
+            parts.append(valid_actions_text.strip())
+            parts.append("")
+
+        if hint_text:
+            parts.append("Hint (non-spoiler):")
+            parts.append(hint_text.strip())
+            parts.append("")
+
+        # Short recent history: last 2 interactions
+        if getattr(self, "history", None):
+            last = self.history[-2:]
+            if last:
+                parts.append("Recent actions (most recent last):")
+                for h in last:
+                    tool = h.get("tool", "")
+                    args = h.get("args", {})
+                    # Keep the result short to avoid prompt bloat
+                    res = (h.get("result") or "").strip().replace("\n", " ")
+                    if len(res) > 160:
+                        res = res[:160] + "..."
+                    parts.append(f"- {tool} {args} -> {res}")
+                parts.append("")
+
+        # Current observation always last
+        parts.append("Current observation:")
+        parts.append((observation or "").strip())
+
+        # Tiny nudges based on common patterns
+        low = (observation or "").lower()
+        if "contains:" in low:
+            parts.append("")
+            parts.append("Hint: If a container contains items, try 'take <noun>' using the exact noun shown.")
+        if "is closed" in low:
+            parts.append("")
+            parts.append("Hint: If something is closed, try 'open <noun>' using the exact noun shown.")
+        if "dark" in low:
+            parts.append("")
+            parts.append("Hint: If it is dark, prioritize finding/using a light source (take lamp, turn on lamp).")
+
+        parts.append("Synthesized memory (high signal):")
+        parts.append(json.dumps(self.synth_memory, ensure_ascii=False, indent=2))
+        parts.append("")
+
+        parts.append("Current objectives (highest priority first):")
+        parts.append(self.objman.render())  # short text
+        parts.append("")
+
+        # LLM planner suggestions (do not auto-execute)
+        if getattr(self, "_planner_suggested_actions", None):
1548
+ parts.append("Planner suggestions (DO NOT auto-execute; pick one if sensible):")
1549
+ for a in self._planner_suggested_actions[:3]:
1550
+ parts.append(f"- {a}")
1551
+ if getattr(self, "_planner_notes", ""):
1552
+ parts.append(f"Planner notes: {self._planner_notes}")
1553
+ parts.append("")
1554
+
1555
+
1556
+ parts.append("")
1557
+ parts.append("What do you do next? Remember the required output format.")
1558
+ return "\n".join(parts)
1559
+
1560
+ def _parse_response(self, response: str) -> tuple[str, str, dict]:
1561
+ """
1562
+ Parse LLM response to extract thought, tool name, and arguments.
1563
+
1564
+ Returns:
1565
+ Tuple of (thought, tool_name, args_dict)
1566
+ """
1567
+ thought = ""
1568
+ tool_name = "play_action"
1569
+ tool_args: dict = {"action": "look"}
1570
+
1571
+ if not response:
1572
+ return ("", "play_action", {"action": "look"})
1573
+
1574
+ text = response.strip()
1575
+
1576
+ # Fast path: try regex extraction that works even with extra text/noise
1577
+ thought_m = re.search(r"(?im)^\s*THOUGHT\s*:\s*(.+?)\s*$", text)
1578
+ tool_m = re.search(r"(?im)^\s*TOOL\s*:\s*([a-zA-Z0-9_]+)\s*$", text)
1579
+ args_m = re.search(r"(?im)^\s*ARGS\s*:\s*(\{.*\}|\[.*\]|.+?)\s*$", text)
1580
+
1581
+ if thought_m:
1582
+ thought = thought_m.group(1).strip()
1583
+
1584
+ if tool_m:
1585
+ tool_name = tool_m.group(1).strip().lower()
1586
+
1587
+ # Parse ARGS (best-effort)
1588
+ raw_args = None
1589
+ if args_m:
1590
+ raw_args = args_m.group(1).strip()
1591
+
1592
+ # If ARGS line exists but JSON is on next lines, try to capture a JSON block
1593
+ if raw_args is None:
1594
+ # Try to find the first JSON object after "ARGS:"
1595
+ idx = text.lower().find("args:")
1596
+ if idx != -1:
1597
+ tail = text[idx + 5 :].strip()
1598
+ # If tail doesn't start with '{', try to find one
1599
+ jstart = tail.find("{")
1600
+ if jstart != -1:
1601
+ tail2 = tail[jstart:]
1602
+ # naive brace matching
1603
+ depth = 0
1604
+ end = None
1605
+ for i, ch in enumerate(tail2):
1606
+ if ch == "{":
1607
+ depth += 1
1608
+ elif ch == "}":
1609
+ depth -= 1
1610
+ if depth == 0:
1611
+ end = i + 1
1612
+ break
1613
+ if end is not None:
1614
+ raw_args = tail2[:end].strip()
1615
+
1616
+ if raw_args is not None:
1617
+ try:
1618
+ parsed = json.loads(raw_args)
1619
+ if isinstance(parsed, dict):
1620
+ tool_args = parsed
1621
+ else:
1622
+ # If model gave a list/string, treat as invalid
1623
+ tool_args = {"action": "look"} if tool_name == "play_action" else {}
1624
+ except Exception:
1625
+ tool_args = {"action": "look"} if tool_name == "play_action" else {}
1626
+
1627
+ # Enforce schema expectations
1628
+ if tool_name == "play_action":
1629
+ if not isinstance(tool_args, dict):
1630
+ tool_args = {"action": "look"}
1631
+ action = tool_args.get("action", "")
1632
+ if not isinstance(action, str) or not action.strip():
1633
+ tool_args["action"] = "look"
1634
+ elif tool_name in {"checkpoint_save", "checkpoint_restore"}:
1635
+ if not isinstance(tool_args, dict):
1636
+ tool_args = {}
1637
+ # name is optional; server default = "auto"
1638
+ if "name" in tool_args and not isinstance(tool_args["name"], str):
1639
+ tool_args["name"] = "auto"
1640
+ elif tool_name == "action_probe":
1641
+ if not isinstance(tool_args, dict):
1642
+ tool_args = {}
1643
+ if not isinstance(tool_args.get("action"), str) or not tool_args.get("action", "").strip():
1644
+ tool_args["action"] = "look"
1645
+ else:
1646
+ tool_args = {}
1647
+
1648
+
1649
+ return thought, tool_name, tool_args
1650
+
1651
+ def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
1652
+ """
1653
+ Call the LLM with the given prompt.
1654
+
1655
+ This is a convenience wrapper - you can also use call_llm() directly.
1656
+ """
1657
+ return call_llm(prompt, system_prompt, seed)
1658
+
1659
+ def _normalize_action(self, action: str) -> str:
1660
+ """Soft normalizer: only for very safe rewrites."""
1661
+ a = (action or "").strip().lower()
1662
+
1663
+ # "look around ..." -> "look"
1664
+ if a.startswith("look around"):
1665
+ return "look"
1666
+
1667
+ # common harmless variants -> "look"
1668
+ if a in {"l", "look.", "look!", "look?"}:
1669
+ return "look"
1670
+
1671
+ # "go north" -> "north" (only for cardinal/up/down)
1672
+ if a.startswith("go "):
1673
+ rest = a[3:].strip()
1674
+ if rest in {
1675
+ "north", "south", "east", "west",
1676
+ "up", "down", "in", "out",
1677
+ "northeast", "northwest", "southeast", "southwest",
1678
+ }:
1679
+ return rest
1680
+
1681
+ # "look at X" -> "examine X"
1682
+ m = re.match(r"look at (.+)", a)
1683
+ if m:
1684
+ return f"examine {m.group(1).strip()}"
1685
+
1686
+ # "look X" -> "examine X" ONLY when it's not a "look <preposition> ..." form
1687
+ m = re.match(r"^look\s+(.+)$", a)
1688
+ if m:
1689
+ tail = m.group(1).strip()
1690
+
1691
+ # Keep Zork-ish / parser forms like: look for/in/under/behind/through...
1692
+ first = tail.split(" ", 1)[0]
1693
+ if first in {"for", "in", "inside", "under", "behind", "through", "around", "over", "on", "at"}:
1694
+ # note: "look at X" is handled above already, so here we just keep it
1695
+ return a
1696
+
1697
+ # Otherwise: treat "look <noun>" as "examine <noun>"
1698
+ if tail:
1699
+ return f"examine {tail}"
1700
+
1701
+ return a
1702
+
1703
+ def _is_canonical_action(self, action: str) -> bool:
1704
+ """True if action matches strict canonical grammar."""
1705
+ a = (action or "").strip().lower()
1706
+
1707
+ # single-word commands
1708
+ if a in {
1709
+ "look", "inventory",
1710
+ "north", "south", "east", "west",
1711
+ "up", "down", "in", "out",
1712
+ "northeast", "northwest", "southeast", "southwest",
1713
+ }:
1714
+ return True
1715
+
1716
+ # verb + noun (2-3 tokens max)
1717
+ parts = a.split()
1718
+ if len(parts) in (2, 3):
1719
+ verb = parts[0]
1720
+ if verb in {"take", "drop", "open", "examine", "read", "climb", "enter", "pull", "push", "unlock"}:
1721
+ # forbid placeholders
1722
+ if any(tok.startswith("<") and tok.endswith(">") for tok in parts[1:]):
1723
+ return False
1724
+ return True
1725
+
1726
+ return False
1727
+
1728
+ def _is_allowed_exotic(self, action: str, valid_actions_list: list[str]) -> bool:
1729
+ """Exotic commands are allowed only if they appear EXACTLY in valid_actions (spacing/case-tolerant)."""
1730
+ if not action:
1731
+ return False
1732
+ a_norm = re.sub(r"\s+", " ", action.strip().lower())
1733
+ for va in valid_actions_list:
1734
+ va_norm = re.sub(r"\s+", " ", va.strip().lower())
1735
+ if a_norm == va_norm:
1736
+ return True
1737
+ return False
1738
+
1739
+ def _looks_like_parser_failure(self, obs: str) -> bool:
1740
+ """Detect common parser failure messages."""
1741
+ if not obs:
1742
+ return False
1743
+ o = obs.lower()
1744
+ triggers = [
1745
+ "i don't know the word",
1746
+ "that sentence isn't one i recognize",
1747
+ "you used the word",
1748
+ "there was no verb",
1749
+ "i don't understand",
1750
+ "you must tell me how to do that",
1751
+ "you can't see any",
1752
+ ]
1753
+ return any(t in o for t in triggers)
1754
+
1755
+
1756
+ def _extract_valid_actions(self, valid_actions_text: str) -> list[str]:
1757
+ """
1758
+ Parse MCP valid_actions output into a list of exact commands.
1759
+ Supports formats like:
1760
+ 'Valid actions:\n- close mailbox\n- north\n...'
1761
+ """
1762
+ if not valid_actions_text:
1763
+ return []
1764
+ lines = [ln.strip() for ln in valid_actions_text.splitlines()]
1765
+ actions: list[str] = []
1766
+ for ln in lines:
1767
+ if ln.startswith("- "):
1768
+ actions.append(ln[2:].strip())
1769
+ return actions
1770
+
1771
+ def _norm(self, s: str) -> str:
1772
+ return re.sub(r"\s+", " ", (s or "").strip().lower())
1773
+
1774
+ def _is_move(self, action: str) -> bool:
1775
+ return self._norm(action) in {
1776
+ "north", "south", "east", "west", "up", "down", "in", "out",
1777
+ "northwest", "northeast", "southwest", "southeast",
1778
+ }
1779
+
1780
+ def _extract_tried_actions_for_current_location(self, tried_actions_text: str) -> set[str]:
1781
+ """
1782
+ Parse output of tried_actions() from the server and return the set of actions
1783
+ already attempted in the current location (best-effort).
1784
+ """
1785
+ if not tried_actions_text:
1786
+ return set()
1787
+
1788
+ cur = (self._last_room_line or "").strip().lower()
1789
+ if not cur:
1790
+ return set()
1791
+
1792
+ lines = tried_actions_text.splitlines()
1793
+
1794
+ # Look for a block:
1795
+ # - <Location>:
1796
+ # - action
1797
+ in_block = False
1798
+ tried = set()
1799
+
1800
+ for ln in lines:
1801
+ s = ln.rstrip("\n")
1802
+
1803
+ # Start of a location block
1804
+ if re.match(r"^\-\s+.+:\s*$", s.strip()):
1805
+ loc_name = s.strip()[2:-1].strip().lower()
1806
+ in_block = (loc_name == cur)
1807
+ continue
1808
+
1809
+ # Action lines in the block (format " - xxx")
1810
+ if in_block:
1811
+ st = s.strip()
1812
+ if st.startswith("- "):
1813
+ act = st[2:].strip().lower()
1814
+ if act:
1815
+ tried.add(self._norm(act))
1816
+
1817
+ return tried
1818
+
1819
+ def _rank_action_candidate(self, action: str) -> int:
1820
+ """
1821
+ Smaller is better. Gives a deterministic ranking for probing/choosing.
1822
+ """
1823
+ a = self._norm(action)
1824
+ if a.startswith("take "): return 0
1825
+ if a.startswith("open "): return 1
1826
+ if a.startswith("unlock "): return 2
1827
+ if a.startswith("enter "): return 3
1828
+ if a in {"in", "up", "down"}: return 4
1829
+ if a.startswith("read "): return 5
1830
+ if a.startswith("examine "): return 6
1831
+ if a in {"north","east","south","west","northeast","northwest","southeast","southwest"}: return 7
1832
+ if a == "look": return 8
1833
+ if a == "inventory": return 9
1834
+ return 50
1835
+
1836
+ async def _choose_with_probe(
1837
+ self,
1838
+ client,
1839
+ candidates: list[str],
1840
+ available_tool_names: set[str],
1841
+ ) -> str | None:
1842
+ """
1843
+ Use action_probe to select the best candidate.
1844
+ Best = positive score_delta, else state_hash change, else first candidate.
1845
+ Probes at most 2 actions to stay cheap.
1846
+ """
1847
+ if not candidates:
1848
+ return None
1849
+ if "action_probe" not in available_tool_names:
1850
+ return candidates[0]
1851
+
1852
+ # Sort candidates by heuristic rank, then probe top 2
1853
+ candidates_sorted = sorted(candidates, key=lambda x: self._rank_action_candidate(x))
1854
+ to_probe = candidates_sorted[:2]
1855
+
1856
+ best = None
1857
+ best_tuple = None # (score_delta, hash_changed, reward_delta)
1858
+
1859
+ for act in to_probe:
1860
+ try:
1861
+ rep_raw = await client.call_tool("action_probe", {"action": act})
1862
+ rep_txt = self._tool_text_any(rep_raw)
1863
+ rep = json.loads(rep_txt) if rep_txt else {}
1864
+
1865
+ sd = int(rep.get("score_delta", 0) or 0)
1866
+ rd = int(rep.get("reward_delta", 0) or 0)
1867
+ hc = bool(rep.get("hash_changed")) # not perfect
1868
+ tup = (sd, hc, rd)
1869
+
1870
+ if best_tuple is None or tup > best_tuple:
1871
+ best_tuple = tup
1872
+ best = act
1873
+ except Exception:
1874
+ continue
1875
+
1876
+ return best or candidates_sorted[0]
1877
+
1878
+ def _tool_text_any(self, res) -> str:
1879
+ if res is None:
1880
+ return ""
1881
+ if isinstance(res, str):
1882
+ return res
1883
+ if isinstance(res, dict):
1884
+ return json.dumps(res)
1885
+ content = getattr(res, "content", None)
1886
+ if content:
1887
+ try:
1888
+ if isinstance(content, list) and content and hasattr(content[0], "text"):
1889
+ return content[0].text or ""
1890
+ except Exception:
1891
+ pass
1892
+ if isinstance(res, list) and res:
1893
+ try:
1894
+ if hasattr(res[0], "text"):
1895
+ return res[0].text or ""
1896
+ except Exception:
1897
+ pass
1898
+ return str(res)
1899
+
1900
+ def _planner_should_run(self, step_idx: int, observation: str, force: bool) -> bool:
1901
+ if force:
1902
+ return True
1903
+ if (step_idx - self._planner_last_step) < self._planner_cooldown:
1904
+ return False
1905
+ low = (observation or "").lower()
1906
+ triggers = ["locked", "dark", "can't", "i don't know the word", "that sentence isn't one i recognize"]
1907
+ return any(t in low for t in triggers) or (step_idx % self._planner_cooldown == 0)
1908
+
1909
+ def _filter_planner_actions(self, actions: list[str], valid_actions_list: list[str]) -> list[str]:
1910
+ """
1911
+ Keep only actions that are:
1912
+ - canonical OR appear exactly in valid_actions_list (if provided)
1913
+ - non-empty
1914
+ """
1915
+ out = []
1916
+ va = [self._norm(x) for x in (valid_actions_list or [])]
1917
+ for a in (actions or []):
1918
+ a = (a or "").strip().lower()
1919
+ if not a:
1920
+ continue
1921
+ if self._is_canonical_action(a):
1922
+ out.append(a)
1923
+ continue
1924
+ # exotic allowed only if in valid_actions
1925
+ if va and self._is_allowed_exotic(a, valid_actions_list):
1926
+ out.append(a)
1927
+ # max 3
1928
+ return out[:3]
1929
+
1930
+ def _run_planner_llm(
1931
+ self,
1932
+ observation: str,
1933
+ state_obj: dict,
1934
+ valid_actions_list: list[str],
1935
+ tried_here_list: list[str],
1936
+ seed: int,
1937
+ step_idx: int,
1938
+ ) -> None:
1939
+ prompt = build_planner_prompt(
1940
+ observation=observation,
1941
+ state_obj=state_obj or {},
1942
+ synth_memory=self.synth_memory or {},
1943
+ objectives_text=self.objman.render(),
1944
+ valid_actions_list=valid_actions_list or [],
1945
+ tried_here=tried_here_list or [],
1946
+ )
1947
+ try:
1948
+ txt = call_llm(prompt, PLANNER_SYSTEM, seed=seed + 50_000 + step_idx, max_tokens=450)
1949
+ plan = json.loads(txt)
1950
+
1951
+ llm_objs = plan.get("objectives", [])
1952
+ if isinstance(llm_objs, list):
1953
+ self.objman.replace_from_llm(llm_objs)
1954
+
1955
+ sugg = plan.get("suggested_actions", [])
1956
+ if not isinstance(sugg, list):
1957
+ sugg = []
1958
+ self._planner_suggested_actions = self._filter_planner_actions(sugg, valid_actions_list)
1959
+ self._planner_notes = str(plan.get("notes", "") or "")[:200]
1960
+ self._planner_last_step = step_idx
1961
+ except Exception:
1962
+ # planner failure should be silent (don’t break run)
1963
+ self._planner_suggested_actions = []
1964
+ self._planner_notes = ""
1965
+
1966
+
1967
+ # =============================================================================
1968
+ # For local testing
1969
+ # =============================================================================
1970
+
1971
+ async def test_agent():
1972
+ """Test the agent locally."""
1973
+ from fastmcp import Client
1974
+
1975
+ # Path to your MCP server
1976
+ server_path = "mcp_server.py"
1977
+
1978
+ agent = StudentAgent()
1979
+
1980
+ async with Client(server_path) as client:
1981
+ result = await agent.run(
1982
+ client=client,
1983
+ game="zork1",
1984
+ max_steps=10,
1985
+ seed=42,
1986
+ verbose=True,
1987
+ )
1988
+
1989
+ print(f"\nFinal Score: {result.final_score}")
1990
+ print(f"Moves: {result.moves}")
1991
+ print(f"Locations: {result.locations_visited}")
1992
+
1993
+
1994
+ if __name__ == "__main__":
1995
+ import asyncio
1996
+ asyncio.run(test_agent())
app.py ADDED
@@ -0,0 +1,36 @@
 
1
+ """
2
+ Hugging Face Space - Text Adventure Agent Submission
3
+
4
+ This is a code-only Space for submitting your agent implementation.
5
+ The evaluation is run separately.
6
+
7
+ Files in this submission:
8
+ - agent.py: Your ReAct agent implementation
9
+ - mcp_server.py: Your MCP server implementation
10
+ - requirements.txt: Additional dependencies
11
+
12
+ To test locally:
13
+ fastmcp dev mcp_server.py
14
+ python agent.py
15
+ """
16
+
17
+ import gradio as gr
18
+ from pathlib import Path
19
+
20
+ # Create the Gradio interface
21
+ with gr.Blocks(title="Text Adventure Agent Submission") as demo:
22
+ gr.Markdown("# Text Adventure Agent Submission")
23
+ gr.Markdown(
24
+ "This Space contains a template submission for the Text Adventure Agent assignment. "
25
+ )
26
+
27
+ gr.Markdown(
28
+ "---\n"
29
+ "**Note:** This is a code submission Space. "
30
+ "Evaluation is performed using the evaluation script.\n\n"
31
+ "[Back to main assignment page](https://huggingface.co/spaces/LLM-course/Agentic-zork)"
32
+ )
33
+
34
+
35
+ if __name__ == "__main__":
36
+ demo.launch()
explanations.md ADDED
@@ -0,0 +1,379 @@
 
1
+ # Building a Reliable MCP Agent for Zork-Style Text Adventures
2
+
3
+ Text adventures sound trivial: you read a paragraph, type a command, get a new paragraph.
4
+ But once you put an LLM in that loop, you learn quickly that the hardest enemies aren’t in the dungeon—they’re in the interface.
5
+
6
+ What kills most LLM agents in Zork-like games is a predictable set of failure modes:
7
+
8
+ - **Parser brittleness**: the game rejects slightly-wrong phrasing.
9
+ - **Looping**: the model repeats actions, rooms, or “no-op” moves.
10
+ - **Move budget waste**: doing “admin” actions that consume moves.
11
+ - **Prompt bloat**: raw history gets too long and too noisy.
12
+ - **Goal drift**: the model forgets what it was trying to do.
13
+
14
+ Some of these ideas are also exposed in the article "TextQuests: How Good are LLMs at Text-Based Video Games?" (https://arxiv.org/pdf/2507.23701) namely memory, coherence and planning.
15
+
16
+ So we didn’t build “a prompt.” We built a **system** with two main components:
17
+ - an **MCP server** that exposes the game through robust tools and instrumentation
18
+ - and an **agent** that treats the LLM as one component among others (memory, planning, recovery policies)
19
+
20
+ Our focus was on the previous failure modes, and how to design around them with tools and guardrails.
21
+
22
+ This is a high-level tour of the approach, focusing on the big ideas, without getting into implementation details.
23
+
24
+ The code is available in the HuggingFace space:
25
+
26
+ ---
27
+
28
+ ## The Setup: Two Pieces, One Loop
29
+
30
+ ### 1) `mcp_server.py` — the game adapter + instrumentation layer
31
+ The MCP server acts like the game interface for the agent. It:
32
+ - owns the environment (`TextAdventureEnv`)
33
+ - runs commands (`play_action`)
34
+ - tracks exploration metadata (rooms, transitions, tried actions)
35
+ - exposes tools that help reasoning **without spending moves**
36
+ - provides safety mechanisms like checkpoints and action simulation
37
+
38
+ ### 2) `agent.py` — the policy engine + ReAct decision-maker
39
+ The agent:
40
+ - outputs strict **ReAct** steps (THOUGHT -> TOOL -> ARGS)
41
+ - can only interact via MCP tools (never “talks to the game” directly)
42
+ - uses guardrails to keep the LLM from hallucinating tools/commands, looping, spamming, etc.
43
+ - uses *two additional LLM calls* as specialized modules:
44
+ - **memory compression** (long-term, high-signal memory)
45
+ - **objective planning** (goal updates + suggested next actions)
46
+
47
+ ---
48
+
49
+ ## Why “Tooling” is important: The MCP Server as a Game Interface
50
+
51
+ A Zork parser is not a friendly API. If the model invents commands like *“look around carefully”*, the game will often respond with something like:
52
+ > “That sentence isn’t one I recognize.”
53
+
54
+ If you only expose `play_action`, the agent becomes a guessing machine.
55
+
56
+ So the MCP server provides a richer interface that makes the world “legible”:
57
+
58
+ - **Structured state** (score, moves, inventory, room, “done”, a stable hash)
59
+ - **Inventory without spending a move**
60
+ - **Valid actions** (best-effort list) for recovery
61
+ - **A map/graph** of explored rooms and transitions
62
+ - **Actions tried per room** to avoid repeating
63
+ - **Checkpoints** to rollback after loops or risky moves
64
+ - **Action probing** (simulate before committing)
65
+
66
+ This set of tools is what turns the text game into something the agent can navigate reliably.
67
+
68
+ ---
69
+
70
+ # Part 1 — The MCP Server: Turning a Game into a Usable API
71
+
72
+ ## The Server’s Core Idea: Track More Than the Game Tracks
73
+
74
+ The environment gives you:
75
+ - observation text
76
+ - score/moves (usually)
77
+ - maybe inventory (depending on wrapper)
78
+
79
+ But it *doesn’t* give you the extra structure an agent needs to be efficient:
80
+ - Where have I been?
81
+ - What did I already try here?
82
+ - How do rooms connect?
83
+ - Am I stuck in a loop?
84
+
85
+ So the server maintains that meta-state itself:
86
+ - a short **history** of actions and results
87
+ - a set of **locations** (rooms) discovered
88
+ - a **transition graph** (`room --action--> room`)
89
+ - an index of **actions tried per location**
90
+ - checkpoint snapshots for rollback
91
+ - a stable-ish **state hash** used to detect loops
92
+
93
+ This is *not* just logging. It becomes actionable tool output the agent can rely on.
94
+
95
+ ---
96
+
97
+ ## Room Awareness: The Small Heuristic That Makes Everything Work
98
+
99
+ Most downstream reasoning depends on “what room am I in?”
100
+
101
+ The server uses a heuristic to extract the room title from the observation:
102
+ - pick the first plausible “header-like” line
103
+ - ignore copyright/revision boilerplate
104
+ - ignore long narrative sentences
105
+
106
+ This matters because room identity powers:
107
+ - mapping
108
+ - “tried actions” grouping
109
+ - loop detection context
110
+ - objective tracking (“return to grating”, “open mailbox”, etc.)
111
+
112
+ If you don’t have stable room identity, the agent’s memory becomes confused.
113
+
114
+ ---
115
+
116
+ ## The Minimal but Critical Tools
117
+
118
+ ### `play_action(action)`
119
+ The main interaction tool:
120
+ - runs the command
121
+ - returns the observation
122
+ - appends optional “+points” signals and “GAME OVER”
123
+ - never crashes the tool (so the run doesn’t die on edge cases)
124
+
125
+ This tool is deliberately boring—but highly reliable.
126
+
127
+ ### `inventory()`
128
+ A huge move-saver: it returns inventory **without advancing the game**.
129
+ In text adventures, calling `inventory` as a game command costs a move in many setups, so treating inventory as a *tool query* is a big advantage.
130
+
131
+ ### `memory()`
132
+ A compact summary tool that provides “authoritative state”:
133
+ - location
134
+ - score/moves
135
+ - recent action heads
136
+ - last observation
137
+
138
+ It’s a sanity anchor when the agent gets confused.
139
+
140
+ ### `valid_actions()`
141
+ An helpful tool when stuck:
142
+ - tries to fetch the actual valid actions if the environment exposes them
143
+ - otherwise falls back to a canonical action menu
144
+
145
+ The agent uses it sparingly—only when stuck or after parser failures.
146
+
147
+ ### `tried_actions()`
148
+ The anti-loop tool:
149
+ - returns actions already attempted in each room
150
+ - helps the agent choose *new* high-value actions instead of repeating `open mailbox` 10 times
151
+
152
+ ### `get_map()` and `graph()`
153
+ These expose exploration as:
154
+ - a human-readable map (for prompts)
155
+ - a structured JSON graph (for future logic/visualization)
156
+
157
+ Mapping gives the agent an explicit “where have I been?” memory that the LLM doesn’t have to hallucinate.
158
+
159
+ ---
160
+
161
+ ## Guardrail Tools That Make the System Feel "Serious"
162
+
163
+ ### Checkpoints (`checkpoint_save`, `checkpoint_restore`)
164
+ Checkpoints are a reliability hack with real impact:
165
+ - if the agent detects a loop or makes a catastrophic move, it can rollback
166
+ - we keep at least one “loop” checkpoint as a stable anchor
167
+ - we can also maintain a “best” checkpoint after scoring gains
168
+
169
+ This transforms the exploration strategy:
170
+ - you can take risks, because you can recover
171
+
172
+ ### `action_probe(action)` — action simulation without commitment
173
+ This is one of the more original parts of the server.
174
+
175
+ The idea:
176
+ - save a snapshot
177
+ - perform the action
178
+ - record deltas (score, moves, hash, location changes)
179
+ - restore the snapshot
180
+ - restore tracking metadata too (so probing doesn’t poison history/map)
181
+
182
+ It returns a compact JSON “what would happen if…?” report.
183
+
184
+ This enables a surprisingly strong behavior:
185
+ > choose between candidates without committing a move (when snapshot/restore succeeds)
186
+
187
+ We keep it cheap (probe only a couple of actions) but it’s an excellent tie-breaker when stuck.
188
+
189
+ ---
190
+
191
+ # Part 2 — The Agent: ReAct, But Constrained and Safe
192
+
193
+ ## Strict ReAct as a Contract (Not a Style)
194
+
195
+ The agent uses a strict format:
196
+ - THOUGHT: one short sentence
197
+ - TOOL: one of the allowed tool names
198
+ - ARGS: valid JSON
199
+
200
+ That format is useful for stability:
201
+ - the agent becomes machine-parseable
202
+ - tool calls are consistent
203
+
204
+ ---
205
+
206
+ ## Important Policy: Command Grammar Discipline
207
+
208
+ Text adventure parsers punish creativity.
209
+
210
+ So the agent enforces a tight grammar:
211
+ - movement is single-word: `north`, `in`, `up`, …
212
+ - interaction is short verb+noun: `open mailbox`, `take lamp`, …
213
+ - exotic multiword commands are allowed **only if** they appear exactly in `valid_actions`
214
+
215
+ That last rule is a big deal:
216
+ - it prevents the LLM from inventing fancy commands
217
+ - it converts “language” into “API calls”
218
+ - it makes the agent much more robust across seeds
219
+
220
+ ---
221
+
222
+ ## The Agent’s Guardrails: How We Stop Thrashing
223
+
224
+ Here are the big guardrail categories (conceptually, not line-by-line):
225
+
226
+ ### 1) Tool validation
227
+ If the model requests an unknown tool:
228
+ - we don’t execute it
229
+ - we inject feedback listing allowed tools
230
+ - we force recovery behavior next
231
+
232
+ ### 2) Parser failure detection
233
+ If the observation looks like a parser error (“I don’t know the word…”, “sentence isn’t recognized”):
234
+ - we switch into recovery mode
235
+ - we fetch valid actions (once)
236
+ - we force a simpler action selection
237
+
238
+ ### 3) Anti-repeat behavior (local)
239
+ We track:
240
+ - the last action
241
+ - actions blocked in the current room
242
+ - actions tried in the current room
243
+
244
+ If the model repeats a no-progress action:
245
+ - we refuse it
246
+ - we force a new choice
247
+
248
+ ### 4) Loop detection (global)
249
+ The agent uses the server’s `state_hash`:
250
+ - if the same hash repeats several times, we’re looping
251
+
252
+ Then we can:
253
+ - restore a checkpoint
254
+ - re-orient with `look`
255
+ - switch strategy
256
+
257
+ ### 5) Movement bias (Zork-specific optimization)
258
+ When multiple movement options exist:
259
+ - “in / up / down” tend to unlock deeper progress
260
+ - cardinal directions tend to be broad exploration
261
+
262
+ So we bias toward `in/up/down` (especially after seeing them in valid actions).
263
+
264
+ It’s a small heuristic that often pays off.
265
+
266
+ ---
267
+
268
+ ## Two Specialized LLM Modules: Memory and Planning
269
+
270
+ This is where the project becomes more than a typical ReAct agent.
271
+
272
+ ### Specialized module #1: Memory Compression (Long-Term Memory)
273
+ Raw history is short-term memory. It’s verbose, expensive, and noisy.
274
+
275
+ So we maintain a **synthesized memory JSON**, updated periodically by an LLM whose only job is to compress experience into decision-useful facts:
276
+
277
+ - durable facts learned
278
+ - obstacles + what is needed
279
+ - what items/tools to search for
280
+ - open threads worth returning to
281
+ - important visited places
282
+
283
+ We keep it:
284
+ - short
285
+ - deduplicated
286
+ - structured
287
+ - bounded (so it doesn’t explode)
288
+
289
+ If that LLM call fails or returns invalid JSON:
290
+ - we simply skip the update
291
+ - the run continues safely
292
+
293
+ The goal is to make the agent stay coherent over long runs.”
294
+
295
+ ### Specialized module #2: Objective Planning (Goal Management)
296
+ Action selection is short-horizon.
297
+ But Zork requires long-horizon intent.
298
+
299
+ So we run a separate “planner” LLM that:
300
+ - updates objectives (explore, open, unlock, acquire key/lamp, return somewhere)
301
+ - proposes up to a few suggested next actions
302
+ - provides short evidence
303
+
304
+ Crucially:
305
+ - planner suggestions are **not auto-executed**
306
+ - they are injected into the prompt as guidance
307
+ - the main ReAct decision still chooses the next tool/action
308
+
309
+ This separation reduces goal drift:
310
+ - the agent behaves like it has a mental TODO list
311
+ - and doesn’t wander aimlessly as often
312
+
313
+ ---
314
+
315
+ ## Deterministic Overrides: Sometimes We Don’t Ask the LLM
316
+
317
+ Some policies are too important to leave to “model mood.”
318
+
319
+ Example: **treasure acquisition**
320
+ If we see obvious treasure nouns in visible objects:
321
+ - we immediately `take <item>`
322
+ - no debate, no planning, no cleverness
323
+
324
+ ---
325
+
326
## Checkpoints as a Strategy, Not Just a Feature

The agent uses checkpoints the way a speedrunner would:
- keep a “loop” checkpoint as a stable anchor
- save a “best” checkpoint after scoring gains

That means:
- progress is protected
- exploration can be more aggressive
- loop recovery is fast

It’s a pragmatic way to make the system resilient under a move budget.
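Sketched as client-side logic (assuming a hypothetical `call_tool` helper that invokes the server’s `checkpoint_save`/`checkpoint_restore` tools):

```python
def after_step(call_tool, score: int, best_score: int, stuck: bool) -> int:
    """Speedrunner-style checkpoint policy; returns the updated best score."""
    if score > best_score:
        call_tool("checkpoint_save", name="best")     # protect progress
        best_score = score
    if stuck:
        call_tool("checkpoint_restore", name="loop")  # fast loop recovery
    else:
        call_tool("checkpoint_save", name="loop")     # stable anchor
    return best_score
```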
338

---

# What You Get From This Approach

Compared to a vanilla “LLM + play_action” loop, this system is:

- **more reliable** (fewer parser deaths, fewer infinite loops)
- **more efficient** (fewer wasted moves, fewer repeated actions)
- **more scalable** (memory doesn’t balloon)
- **more coherent** (objectives keep the agent on track)
- **more intentional** (action_probe and valid_actions are used strategically)

---

## Final Takeaway

Text adventures punish the exact things LLMs love:
- improvisation in language
- repetition
- vague intent
- verbose context

So we respond with the opposite:
- strict grammar
- structured state
- explicit recovery
- bounded but long-term memory
- deliberate planning

---

# Evaluations

The evaluation was run for 200 steps over 2 seeds, using lostpig and zork1 as test games.
373

# Potential Improvements

- **Navigation tool**: a `go_to(location)` tool that uses the transition graph to find a sequence of moves from the current location to the target (with BFS, for example) and applies them automatically, instead of letting the LLM guess the path. This would reduce move waste and improve reliability.
mcp_server.py ADDED
@@ -0,0 +1,819 @@
"""
Student MCP Server for Text Adventure Games

This is your MCP server submission. Implement the tools that your agent
will use to play text adventure games.

Required tool:
    play_action(action: str) -> str
        Execute a game command and return the result.

Recommended tools:
    memory() -> str
        Return current game state, score, and recent history.

    inventory() -> str
        Return the player's current inventory.

    get_map() -> str
        Return a map of explored locations.

Test your server with:
    fastmcp dev submission_template/mcp_server.py

Then open the MCP Inspector in your browser to test the tools interactively.
"""

import sys
import os
import re
from collections import defaultdict
import json
import hashlib
from copy import deepcopy

# Add parent directory to path to import games module
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from fastmcp import FastMCP
from games.zork_env import TextAdventureEnv


# =============================================================================
# Create the MCP Server
# =============================================================================

mcp = FastMCP("Student Text Adventure Server")


# =============================================================================
# Game State Management
# =============================================================================

class GameManager:
    """
    Manages the text adventure game state.

    Tracks:
    - Action history (for the memory tool)
    - Explored locations and transitions (for mapping)
    - Current score and moves
    """

    def __init__(self):
        self.env: TextAdventureEnv | None = None
        self.state = None
        self.game_name: str = ""

        # History
        self.max_history = 50  # Max number of recent actions to store
        self.history: list[tuple[str, str]] = []

        # Checkpoints
        self.checkpoints = {}  # name -> opaque state snapshot
        self.last_reward = 0

        # Map tracking
        self.locations = set()  # Set of explored locations
        self.current_location: str | None = None

        # Transitions
        self.transitions = defaultdict(dict)  # location -> action -> new_location

        # Action tracking
        self.actions_tried_by_location = defaultdict(list)  # location -> list of actions tried
        self._actions_tried_set = defaultdict(set)

    def initialize(self, game: str = "zork1"):
        """Initialize or reset the game."""
        self.game_name = game
        self.env = TextAdventureEnv(game)
        self.state = self.env.reset()

        # Reset tracking
        self.history = []
        self.locations = set()
        self.transitions = defaultdict(dict)
        self.actions_tried_by_location = defaultdict(list)
        self._actions_tried_set = defaultdict(set)

        # Set initial location
        obs = (self.state.observation or "")
        self.current_location = self._extract_location(obs)
        if self.current_location:
            self.locations.add(self.current_location)
        return obs

    def step(self, action: str) -> str:
        """Execute an action and return the result."""
        if self.env is None:
            self.initialize()

        action_clean = (action or "").strip().lower()

        from_location = self.current_location

        # Track "action tried" in the room
        if from_location and action_clean not in self._actions_tried_set[from_location]:
            self.actions_tried_by_location[from_location].append(action_clean)
            self._actions_tried_set[from_location].add(action_clean)

        # Execute the requested action
        self.state = self.env.step(action)
        raw_obs = self.state.observation or ""

        # No forced look: avoid consuming extra moves
        result_obs = raw_obs

        # Track history (single action only)
        self.history.append((action, result_obs))

        # Cap history
        while len(self.history) > self.max_history:
            self.history.pop(0)

        # Update last reward
        try:
            self.last_reward = getattr(self.state, "reward", 0) or 0
        except Exception:
            self.last_reward = 0

        # Track locations + transitions using the best available observation (result_obs)
        new_location = self._extract_location(result_obs)
        if new_location:
            self.locations.add(new_location)

            # Record transition only if the location actually changed
            if from_location and new_location != from_location:
                # Store canonical mapping: from -> action -> to (overwrite is OK)
                self.transitions[from_location][action_clean] = new_location

            # Update current location
            self.current_location = new_location

        return result_obs

    def _extract_location(self, observation: str) -> str | None:
        """Extract the current location name from the observation text."""
        # Heuristic that works for Zork and similar games, where the location
        # name appears as a short title line at the start of the observation.
        if not observation:
            return None

        for line in observation.splitlines():
            s = line.strip()
            if not s:
                continue

            low = s.lower()

            # Filter common non-room headers / system lines
            if low.startswith("copyright"):
                continue
            if "trademark" in low:
                continue
            if low.startswith("revision"):
                continue
            if low.startswith("serial number"):
                continue
            if "revision" in low and "serial" in low:
                continue

            # Room titles in Zork are typically short and NOT full sentences
            if len(s) > 50:
                continue
            if s.endswith((".", "!", "?", ":", ";")):
                continue

            # Also avoid lines that look like status messages
            bad_starts = (
                "you ", "it ", "i ", "there ", "the ", "a ", "an ",
                "what ", "can't ", "i don't", "unknown", "error"
            )
            if low.startswith(bad_starts):
                continue

            return s

        return None

    def get_memory(self, last_k: int = 10) -> str:
        """Return a short summary of state + recent history."""
        loc = self.current_location or "Unknown"
        score = self.get_score()
        moves = self.get_moves()
        obs = (self.state.observation or "").strip() if self.state else ""

        recent = self.history[-last_k:] if self.history else []
        if recent:
            recent_lines = "\n".join(
                f"- {a} -> {(o.splitlines()[0] if o else '')}"
                for a, o in recent
            )
        else:
            recent_lines = "(none)"

        return (
            f"Game: {self.game_name}\n"
            f"Location: {loc}\n"
            f"Score: {score}\n"
            f"Moves: {moves}\n\n"
            f"Recent actions:\n{recent_lines}\n\n"
            f"Last observation:\n{obs}"
        )

    def get_score(self) -> int:
        """Get current score."""
        return self.state.score if self.state else 0

    def get_moves(self) -> int:
        """Get number of moves taken."""
        return self.state.moves if self.state else 0

    def get_map(self) -> str:
        """Return a simple text map of explored locations with action-labeled transitions."""
        if not self.locations:
            return "No locations explored yet."

        lines = [f"Current location: {self.current_location or 'Unknown'}", ""]

        lines.append("Explored locations:")
        for loc in sorted(self.locations):
            lines.append(f"- {loc}")

        lines.append("")
        lines.append("Transitions (from --action--> to):")

        any_edge = False
        for frm in sorted(self.transitions.keys()):
            for act, to in sorted(self.transitions[frm].items()):
                any_edge = True
                lines.append(f"- {frm} --{act}--> {to}")

        if not any_edge:
            lines.append("- (none yet)")

        return "\n".join(lines)

    def _item_name(self, item) -> str:
        """Best-effort: extract a human-friendly name from a Jericho item object."""
        for attr in ("name", "label", "noun", "text"):
            v = getattr(item, attr, None)
            if isinstance(v, str) and v.strip():
                return v.strip()

        s = str(item)
        m = re.search(r"Obj\d+:\s*([^\s]+)", s)
        if m:
            return m.group(1)

        return s.strip() if s.strip() else "unknown"

    def get_inventory(self) -> str:
        """
        Return inventory WITHOUT advancing the game (does not call env.step).
        If state.inventory doesn't exist, returns a fallback message.
        """
        if not self.state:
            return "Inventory not available (game not initialized)."

        inv = getattr(self.state, "inventory", None)

        # Case 0: inventory exposed as a string
        if isinstance(inv, str):
            return inv.strip() if inv.strip() else "You are not carrying anything."

        # Case 1: inventory exposed as a list/tuple of objects
        if isinstance(inv, (list, tuple)):
            if len(inv) == 0:
                return "You are not carrying anything."
            pretty = [self._item_name(x) for x in inv]
            return "You are carrying:\n" + "\n".join(f"- {name}" for name in pretty)

        return "Inventory not available from state (no state.inventory)."

    def get_valid_actions(self, max_actions: int = 30) -> str:
        """Return a best-effort list of valid actions for the current state."""
        try:
            # Option A: the wrapper exposes it
            if self.env is not None and hasattr(self.env, "get_valid_actions"):
                valid = self.env.get_valid_actions()
            # Option B: the underlying Jericho env
            elif self.env is not None and hasattr(self.env, "env") and hasattr(self.env.env, "get_valid_actions"):
                valid = self.env.env.get_valid_actions()
            else:
                valid = None

            if isinstance(valid, (list, tuple)) and valid:
                valid = [str(v) for v in valid][:max_actions]
                return "Valid actions:\n" + "\n".join(f"- {v}" for v in valid)
        except Exception:
            pass

        return (
            "Valid actions (fallback):\n"
            "- look\n- inventory\n- north/south/east/west/up/down/in/out\n"
            "- take <noun>\n- drop <noun>\n- open <noun>\n- examine <noun>\n- read <noun>\n"
        )

    def get_actions_tried(self, limit_per_room: int = 50) -> str:
        """Return actions tried per location (most recent last)."""
        if not self.actions_tried_by_location:
            return "No actions tracked yet."

        lines = [
            f"Current location: {self.current_location or 'Unknown'}",
            "",
            "Actions tried by location:",
        ]

        for loc in sorted(self.actions_tried_by_location.keys()):
            acts = self.actions_tried_by_location[loc]
            if not acts:
                continue
            shown = acts[-limit_per_room:]
            lines.append(f"- {loc}:")
            for a in shown:
                lines.append(f"  - {a}")

        return "\n".join(lines)

    def _snapshot(self):
        """
        Best-effort snapshot. Tries env/state native methods if available, else deepcopies state.
        """
        if self.env is None:
            return None

        # 1) Native env snapshot if it exists
        for obj in (self.env, getattr(self.env, "env", None)):
            if obj is None:
                continue
            if hasattr(obj, "get_state") and callable(obj.get_state):
                try:
                    return ("native", obj.get_state())
                except Exception:
                    pass

        # 2) Fallback: deepcopy the state object (works often, not always)
        try:
            return ("deepcopy", deepcopy(self.state))
        except Exception:
            # 3) Last resort: keep nothing (restore impossible)
            return ("none", None)

    def _restore_snapshot(self, snap):
        """
        Best-effort restore of a snapshot created by _snapshot().
        """
        if self.env is None or snap is None:
            return False

        kind, payload = snap
        if kind == "native":
            for obj in (self.env, getattr(self.env, "env", None)):
                if obj is None:
                    continue
                if hasattr(obj, "set_state") and callable(obj.set_state):
                    try:
                        obj.set_state(payload)
                        # Re-sync wrapper state if needed
                        if hasattr(self.env, "state"):
                            try:
                                self.state = self.env.state
                            except Exception:
                                pass
                        return True
                    except Exception:
                        pass
            return False

        if kind == "deepcopy":
            try:
                self.state = payload
                # If the wrapper uses internal state, try to set it too
                if hasattr(self.env, "state"):
                    try:
                        self.env.state = payload
                    except Exception:
                        pass
                return True
            except Exception:
                return False

        return False

    def _state_hash(self) -> str:
        """
        Stable-ish hash to detect loops. Prefer an env-provided hash; else hash
        observation + inventory + location + score + moves.
        """
        # If Jericho exposes something like state.hash or a world-state hash, use it (best-effort).
        for obj in (self.state, self.env, getattr(self.env, "env", None)):
            if obj is None:
                continue
            for attr in ("hash", "state_hash", "world_hash"):
                if hasattr(obj, attr):
                    try:
                        v = getattr(obj, attr)
                        if callable(v):
                            v = v()
                        if isinstance(v, (str, int)):
                            return str(v)
                    except Exception:
                        pass

        loc = self.current_location or ""
        obs = (getattr(self.state, "observation", "") or "")
        score = self.get_score()
        moves = self.get_moves()
        inv = getattr(self.state, "inventory", None)

        inv_str = ""
        if isinstance(inv, str):
            inv_str = inv
        elif isinstance(inv, (list, tuple)):
            inv_str = "|".join(self._item_name(x) for x in inv)

        payload = f"{loc}\n{score}\n{moves}\n{inv_str}\n{obs[:500]}"
        return hashlib.sha1(payload.encode("utf-8", errors="ignore")).hexdigest()

    def _extract_visible_objects_heuristic(self, observation: str) -> list[str]:
        """
        Heuristic object noun extraction. Not perfect but useful.
        Keeps short nouns; favors known Zork-ish interactables.
        """
        if not observation:
            return []

        obs = observation.lower()

        # Quick whitelist of common objects
        common = [
            "mailbox", "leaflet", "door", "window", "grating", "lamp", "lantern", "sword", "knife",
            "trapdoor", "chest", "box", "table", "rug", "mat", "rope", "key", "keys", "bottle", "water",
            "egg", "nest", "tree", "stairs", "staircase", "gate"
        ]
        found = [w for w in common if w in obs]

        # De-duplicate while preserving order
        out = []
        seen = set()
        for x in found:
            if x not in seen:
                out.append(x)
                seen.add(x)
        return out

    def get_state_struct(self) -> dict:
        """Return the current state as a structured dict."""
        obs = (getattr(self.state, "observation", "") or "")
        inv = getattr(self.state, "inventory", None)

        inv_list = []
        if isinstance(inv, str):
            # Can't parse reliably => keep as one string
            inv_list = [inv.strip()] if inv.strip() else []
        elif isinstance(inv, (list, tuple)):
            inv_list = [self._item_name(x) for x in inv]

        return {
            "game": self.game_name,
            "location": self.current_location or "Unknown",
            "score": self.get_score(),
            "moves": self.get_moves(),
            "done": bool(getattr(self.state, "done", False)) if self.state else False,
            "last_reward": int(getattr(self, "last_reward", 0) or 0),
            "state_hash": self._state_hash(),
            "inventory": inv_list,
            "visible_objects": self._extract_visible_objects_heuristic(obs),
            "last_observation": obs,
        }


# Global game manager
_game = GameManager()


def get_game() -> GameManager:
    """Get or initialize the game manager."""
    global _game
    if _game.env is None:
        # Get game from environment variable (set by evaluator)
        game = os.environ.get("GAME", "zork1")
        _game.initialize(game)
    return _game

# =============================================================================
# MCP Tools
# =============================================================================

@mcp.tool()
def play_action(action: str) -> str:
    """
    Execute a game command and return the result.

    This is the main tool for interacting with the game.

    Args:
        action: The command to execute (e.g., "north", "take lamp", "open mailbox")

    Returns:
        The game's response to the action

    Valid commands include:
    - Movement: north, south, east, west, up, down, enter, exit
    - Objects: take <item>, drop <item>, open <thing>, examine <thing>
    - Other: look, inventory, read <thing>, turn on lamp
    """
    game = get_game()

    # Basic validation / normalization
    action = (action or "").strip()
    if not action:
        return "I didn't receive an action. Try: look, north, open mailbox, take lamp."

    # Execute
    result = game.step(action)

    # Optional: append score deltas + game over
    try:
        reward = getattr(game.state, "reward", 0) or 0
        score = getattr(game.state, "score", None)
        done = bool(getattr(game.state, "done", False))

        if reward > 0 and score is not None:
            result += f"\n\n+{reward} points! (Total: {score})"

        if done:
            result += "\n\n*** GAME OVER ***"
    except Exception:
        # Never crash the tool -- keep returning the observation
        pass

    return result


@mcp.tool()
def memory() -> str:
    """
    Return a compact summary of the current game state:
    location, score, moves, recent history, last observation.
    """
    game = get_game()
    return game.get_memory(last_k=10)


@mcp.tool()
def get_map() -> str:
    """
    Return a simple map of explored locations + known transitions.
    """
    game = get_game()
    return game.get_map()


@mcp.tool()
def inventory() -> str:
    """
    Return the player's inventory WITHOUT advancing the game.
    """
    game = get_game()
    return game.get_inventory()


@mcp.tool()
def valid_actions() -> str:
    """
    Return a list of likely valid actions (best-effort).
    """
    game = get_game()
    return game.get_valid_actions(max_actions=30)


@mcp.tool()
def tried_actions() -> str:
    """
    Return actions tried, grouped by location, to avoid loops.
    """
    game = get_game()
    return game.get_actions_tried(limit_per_room=50)

@mcp.tool()
def hint() -> str:
    """
    Get non-spoiler hints based on the current observation/inventory/location.
    """
    game = get_game()

    observation = (getattr(game.state, "observation", "") or "")
    obs = observation.lower()
    loc = (game.current_location or "").lower()

    # Best-effort inventory WITHOUT advancing the game
    inv_lower = ""
    inv = getattr(game.state, "inventory", None)
    if isinstance(inv, str):
        inv_lower = inv.lower()
    elif isinstance(inv, (list, tuple)):
        names = []
        for item in inv:
            try:
                names.append(game._item_name(item).lower())
            except Exception:
                names.append(str(item).lower())
        inv_lower = " ".join(names)

    hints: list[str] = []

    # Darkness / light
    if ("dark" in obs) or ("pitch black" in obs) or ("dark" in loc):
        hints.append("It is dangerous to move around in the dark. You need a light source.")
        if "lamp" in inv_lower or "lantern" in inv_lower:
            hints.append("You seem to have a lamp/lantern. Try turning it on if that action is available.")
        else:
            hints.append("If you see a lamp or lantern anywhere, pick it up immediately.")

    # Window
    if "window" in obs:
        if "ajar" in obs or "open" in obs:
            hints.append("An open/ajar window may be an entry point. Try 'enter window' or 'in' if allowed.")
        else:
            hints.append("A window often leads somewhere. Try 'open window' or examine it more closely.")

    # Leaves
    if "pile of leaves" in obs or "leaves" in obs:
        hints.append("A pile of leaves often hides something. Try moving or taking them.")

    # Grating
    if "grating" in obs:
        hints.append("A grating is usually a passage. Try opening or unlocking it, or inspect nearby objects.")

    # Containers
    containers = ["mailbox", "chest", "box", "container", "cabinet", "case", "sack"]
    if any(w in obs for w in containers):
        hints.append("Try opening containers. They often contain useful items.")

    # Trees / climbing
    if "tree" in obs or "trees" in obs:
        hints.append("Trees may be climbable. Look for branches or try climbing if possible.")
    if "climbable" in obs or "you can climb" in obs:
        hints.append("Climbing may lead to new areas. Try climbing up or down if available.")

    # Keys / weapons
    if "key" in obs and "key" not in inv_lower:
        hints.append("Keys are important. Pick one up if you can.")
    if ("sword" in obs or "knife" in obs) and ("sword" not in inv_lower and "knife" not in inv_lower):
        hints.append("A weapon may be useful later. Consider taking it.")

    # Explicit possibility cues from the narration
    if "possible to climb down" in obs or "you can climb down" in obs:
        hints.append("The narration says you can climb down here -- try: 'down'.")
    if "possible to climb up" in obs or "you can climb up" in obs:
        hints.append("The narration says you can climb up here -- try: 'up'.")
    if "possible to enter" in obs or "you can enter" in obs or "way in" in obs:
        hints.append("The narration suggests an entry is possible -- try: 'in'.")
    if "way out" in obs or "possible to leave" in obs or "you can leave" in obs:
        hints.append("The narration suggests an exit -- try: 'out'.")

    if not hints:
        hints.append("If you feel stuck, call valid_actions and try 1-2 new high-value actions (take/open/enter/climb/pull).")
        hints.append("Avoid repeating actions that produced no new information in the same location.")

    return "Hints:\n" + "\n".join(f"- {h}" for h in hints)


@mcp.tool()
def state() -> str:
    """
    Structured state as a JSON string.
    """
    game = get_game()
    return json.dumps(game.get_state_struct(), ensure_ascii=False, indent=2)

@mcp.tool()
def exits() -> str:
    """
    Return possible movement actions from valid_actions (best-effort).
    """
    game = get_game()
    va = game.get_valid_actions(max_actions=80)
    directions = {
        "north", "south", "east", "west", "up", "down", "in", "out",
        "northeast", "northwest", "southeast", "southwest",
    }
    moves = []
    for line in va.splitlines():
        line = line.strip()
        if line.startswith("- "):
            act = line[2:].strip().lower()
            if act in directions:
                moves.append(act)
    return json.dumps({"location": game.current_location or "Unknown", "exits": moves}, ensure_ascii=False, indent=2)


@mcp.tool()
def graph() -> str:
    """
    Return the explored graph as JSON (nodes + edges).
    """
    game = get_game()
    nodes = sorted(list(game.locations))
    edges = []
    for frm, d in game.transitions.items():
        for act, to in d.items():
            edges.append({"from": frm, "action": act, "to": to})
    payload = {"current": game.current_location or "Unknown", "nodes": nodes, "edges": edges}
    return json.dumps(payload, ensure_ascii=False, indent=2)


@mcp.tool()
def checkpoint_save(name: str = "auto") -> str:
    """
    Save an environment snapshot under 'name'.
    """
    game = get_game()
    snap = game._snapshot()
    game.checkpoints[name] = snap
    ok = snap is not None and snap[0] != "none"
    return json.dumps({"ok": bool(ok), "name": name, "kind": snap[0] if snap else "none"}, ensure_ascii=False, indent=2)


@mcp.tool()
def checkpoint_restore(name: str = "auto") -> str:
    """
    Restore a previously saved snapshot.
    """
    game = get_game()
    snap = game.checkpoints.get(name)
    ok = game._restore_snapshot(snap)
    # Re-derive location after restore
    if ok and game.state:
        game.current_location = game._extract_location(getattr(game.state, "observation", "") or "") or game.current_location
        if game.current_location:
            game.locations.add(game.current_location)
    return json.dumps({"ok": bool(ok), "name": name}, ensure_ascii=False, indent=2)


@mcp.tool()
def action_probe(action: str) -> str:
    """
    Simulate an action: save -> step(action) -> capture -> restore.
    Returns a JSON report without committing the action.
    """
    game = get_game()
    snap = game._snapshot()
    tracking_backup = {
        "history": list(game.history),
        "locations": set(game.locations),
        "current_location": game.current_location,
        "transitions": deepcopy(game.transitions),
        "actions_tried_by_location": deepcopy(game.actions_tried_by_location),
        "_actions_tried_set": deepcopy(game._actions_tried_set),
        "last_reward": game.last_reward,
    }
    before = game.get_state_struct()

    obs = game.step(action)
    after = game.get_state_struct()

    # Attempt restore
    restored = game._restore_snapshot(snap)
    if restored and game.state:
        game.current_location = game._extract_location(getattr(game.state, "observation", "") or "") or game.current_location

    # Restore tracking too (avoid probe side-effects)
    game.history = tracking_backup["history"]
    game.locations = tracking_backup["locations"]
    game.current_location = tracking_backup["current_location"]
    game.transitions = tracking_backup["transitions"]
    game.actions_tried_by_location = tracking_backup["actions_tried_by_location"]
    game._actions_tried_set = tracking_backup["_actions_tried_set"]
    game.last_reward = tracking_backup["last_reward"]

    report = {
        "action": (action or "").strip(),
        "ok": True,
        "restored": bool(restored),
        "reward_delta": int(after.get("last_reward", 0) or 0),
        "score_delta": int(after.get("score", 0) - before.get("score", 0)),
        "moves_delta": int(after.get("moves", 0) - before.get("moves", 0)),
        "done": bool(after.get("done", False)),
        "new_location": after.get("location"),
        "state_hash": after.get("state_hash"),
        "observation_head": (obs or "").strip().splitlines()[0] if (obs or "").strip() else "",
        "hash_changed": before.get("state_hash") != after.get("state_hash"),
    }
    return json.dumps(report, ensure_ascii=False, indent=2)


# =============================================================================
# Run the server
# =============================================================================

if __name__ == "__main__":
    # This runs the server with stdio transport (for MCP clients)
    mcp.run()
requirements.txt ADDED
@@ -0,0 +1,9 @@
# HF Spaces already has gradio and huggingface_hub pre-installed
# Do not add them here or you may get version conflicts

# Agent dependencies (these are provided by the evaluation infrastructure)
# Do not add jericho, fastmcp here - they are installed during evaluation

# Add any additional packages your agent needs below:
# numpy
# requests