Ryn11H committed on
Commit 3b082d0 · 1 Parent(s): 615a63b

Final submission
.ipynb_checkpoints/README-checkpoint.md ADDED
---
title: Text Adventure Agent Submission
emoji: "\U0001F5FA"
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.12.0"
app_file: app.py
pinned: false
license: mit
---

# Text Adventure Agent Submission

## Overview

This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

## Approach

## My Report: an MCP-Based Text Adventure Agent

**Structured State Design, Guarded ReAct Reasoning, and Stability Improvements**

This project implements a fully functional MCP (Model Context Protocol) server and an LLM-driven ReAct agent for text adventure games. While a baseline was provided, this submission significantly extends and stabilizes that template by redesigning state exposure, improving tool structure, and introducing multiple guardrails against common LLM failure modes.

The primary focus of this work was not brute-force performance tuning, but architectural improvement, robustness, and reasoning stability.

---

## 1. MCP Server Improvements

The original template exposed minimal game interaction. I redesigned the MCP server to provide structured, reliable, and LLM-friendly state representations.

### 1.1 Robust Location Extraction

Instead of relying solely on the first line of the observation, the server now:

- Filters out status-like lines (score, moves, headers, bracketed text)
- Detects likely room titles heuristically
- Falls back gracefully when uncertain

This improves compatibility across different text adventure engines.
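The heuristic can be sketched roughly as follows (an illustrative simplification, not the submission's exact code; the function name and patterns are assumptions):

```python
import re

# Lines like "[Score: 10  Moves: 3]" or "Score: 10" are status bars, not room titles
STATUS_RE = re.compile(r"(score|moves)\s*[:=]?\s*\d+", re.IGNORECASE)

def extract_location(observation: str) -> str:
    """Best-effort room-title extraction from a raw observation (illustrative)."""
    for line in observation.splitlines():
        line = line.strip()
        if not line:
            continue
        # Skip status bars, bracketed headers, and prompt characters
        if STATUS_RE.search(line) or line.startswith(("[", ">")):
            continue
        # Room titles tend to be short and start with a capital letter
        if len(line.split()) <= 6 and line[0].isupper():
            return line
    return "Unknown"  # graceful fallback when nothing looks like a title
```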

---

### 1.2 Structured Memory Output

The `memory()` tool was redesigned to provide:

- Current game
- Location
- Score and moves
- Extracted visible objects (best-effort heuristics)
- Mentioned exits
- Recent action history
- Full current observation

This structured format reduces hallucination and anchors the LLM in grounded state information. It transforms raw narrative text into usable reasoning signals.
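A minimal sketch of how such a snapshot could be rendered (field names and section headers are assumptions; the real server's format may differ):

```python
def format_memory(state: dict) -> str:
    """Render a structured state snapshot for the LLM (illustrative sketch)."""
    lines = [
        "=== STATE ===",
        f"Game: {state.get('game', 'unknown')}",
        f"Location: {state.get('location', 'Unknown')}",
        f"Score: {state.get('score', 0)}  Moves: {state.get('moves', 0)}",
        f"Objects: {', '.join(state.get('objects', [])) or 'none seen'}",
        f"Exits: {', '.join(state.get('exits', [])) or 'none mentioned'}",
        "=== RECENT ===",
        *[f"> {a}" for a in state.get('recent_actions', [])[-5:]],
        "=== OBSERVATION ===",
        state.get('observation', ''),
    ]
    return "\n".join(lines)
```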

---

### 1.3 Intelligent Map Construction

Movement tracking is no longer naive. A move is recorded only if:

- The location actually changes, and
- The observation does not contain known movement-failure phrases.

This prevents corrupt map edges and keeps spatial reasoning reliable.

The resulting `get_map()` tool exposes clean directional transitions without noise from failed attempts.
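The two conditions above can be sketched like this (failure phrases and function name are illustrative assumptions):

```python
# Typical parser responses to a failed move (illustrative list)
FAILURE_PHRASES = ("you can't go that way", "there is a wall", "you can't go there")

def record_move(game_map: dict, old_loc: str, direction: str,
                new_loc: str, observation: str) -> bool:
    """Record a map edge only when the move demonstrably succeeded."""
    obs = observation.lower()
    if new_loc == old_loc or any(p in obs for p in FAILURE_PHRASES):
        return False  # failed move: keep the map free of corrupt edges
    game_map.setdefault(old_loc, {})[direction] = new_loc
    return True
```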

---

### 1.4 Robust Inventory Handling

Inventory retrieval now:

- Uses structured state inventory when available
- Falls back to issuing the `inventory` command
- Cleans and normalizes item strings

This ensures cross-game compatibility.
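The string cleanup step might look like the following (a heuristic sketch; the exact filters are assumptions):

```python
def clean_inventory(raw: str) -> list[str]:
    """Normalize an 'inventory' response into plain item strings (heuristic)."""
    items = []
    for line in raw.splitlines():
        line = line.strip().lstrip("-* ").rstrip(".")
        if not line or line.lower().startswith(("you are carrying", "inventory")):
            continue  # skip headers, keep only item lines
        # Drop leading articles so items compare consistently across games
        words = line.lower().split()
        if words and words[0] in ("a", "an", "the", "some"):
            words = words[1:]
        items.append(" ".join(words))
    return items
```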

---

## 2. Agent-Side Stability and Reasoning Enhancements

The ReAct loop was significantly extended to address common LLM failure modes.

---

### 2.1 Context Refresh Strategy

The agent periodically refreshes:

- `memory()` (state grounding)
- `inventory()` (after item acquisition)
- `get_map()` (navigation support)

This improves decision consistency without consuming extra game moves.

---

### 2.2 Action Validation and Normalization

Before execution:

- Tool names are validated
- Invalid verbs are mapped to supported equivalents
- Formatting noise is removed
- Actions are normalized to a consistent lower-case grammar

This dramatically reduces invalid command generation.
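The normalization steps above can be sketched in a few lines (the verb map mirrors the one in `agent.py`; the function name is illustrative):

```python
# Unsupported verbs remapped to parser-friendly equivalents
VERB_MAP = {"check": "examine", "inspect": "examine", "grab": "take", "pick": "take"}

def normalize_action(raw: str) -> str:
    """Strip markdown noise, lower-case, collapse whitespace, remap verbs."""
    action = raw.replace("**", "").replace("*", "").replace("`", "")
    words = action.lower().split()
    if words and words[0] in VERB_MAP:
        words[0] = VERB_MAP[words[0]]
    return " ".join(words)
```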

---

### 2.3 Multi-Layer Anti-Loop Mechanisms

Several defensive layers were introduced:

#### (A) Action Repetition Guard
If the same action appears three times consecutively, the agent forces a reset (`look`).

#### (B) Location-Aware Movement Failure Blocking
Movement attempts are tracked per `(location, direction)` pair. If a direction fails multiple times from the same location, it is blocked.

#### (C) Thought + Action + Location Blocking
A normalized thought signature is computed. If the same thought leads to the same action in the same location more than once, the agent is forced to change strategy (a memory/map call).

This addresses the subtle ReAct issue where the reasoning itself becomes cyclic.
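The three layers can be condensed into a toy guard (illustrative class and thresholds; the real agent spreads this logic across its ReAct loop):

```python
from collections import Counter

class LoopGuard:
    """Toy version of the three anti-loop layers (A/B/C) described above."""

    def __init__(self):
        self.recent: list[str] = []
        self.move_failures: Counter = Counter()   # (location, direction) -> fail count
        self.thought_sigs: Counter = Counter()    # (location, action, thought sig) -> count

    def check(self, location: str, action: str, thought: str):
        """Return a replacement action when a layer fires, else None."""
        self.recent = (self.recent + [action])[-3:]
        if len(self.recent) == 3 and len(set(self.recent)) == 1:
            return "look"                          # (A) same action three times in a row
        if self.move_failures[(location, action)] >= 2:
            return "look"                          # (B) blocked repeatedly-failed move
        sig = (location, action, " ".join(thought.lower().split()[:12]))
        self.thought_sigs[sig] += 1
        if self.thought_sigs[sig] >= 2:
            return "memory"                        # (C) cyclic thought+action+location
        return None
```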

---

### 2.4 Controlled Movement Policy

The agent avoids random wandering by:

- Encouraging local interaction before movement
- Prioritizing dominant objects in the observation
- Blocking repeated failed transitions

This reduces wasted exploration steps.

---

## 3. Design Philosophy

The key improvements are architectural rather than game-specific:

- Clear separation between the environment (MCP server) and reasoning (LLM agent)
- Structured state exposure instead of raw narrative text
- Defensive programming against repetition and invalid behavior
- Heuristic generalization instead of hardcoded walkthrough logic

The system is modular, interpretable, and extensible.

---

## 4. Conclusion

Compared to the baseline template, this implementation introduces:

- Structured memory representation
- Robust location extraction
- Intelligent map tracking
- Inventory normalization
- Multi-layer loop prevention
- Location-aware movement validation
- Thought-action repetition blocking
- A controlled exploration policy

The result is a significantly more stable, grounded, and architecturally improved MCP-based text adventure agent.

## Files

| File | Description |
|------|-------------|
| `agent.py` | ReAct agent with `StudentAgent` class |
| `mcp_server.py` | MCP server with game interaction tools |
| `app.py` | Gradio interface for the HF Space |
| `requirements.txt` | Additional dependencies |

## How to Submit

1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
2. Clone your fork locally
3. Implement your agent in `agent.py` and `mcp_server.py`
4. Test locally (see below)
5. Push your changes to your Space
6. Submit your Space URL on the course platform

## Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Test the MCP server interactively
fastmcp dev mcp_server.py

# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20

# Run evaluation
python -m evaluation.evaluate -s . -g lostpig -t 3
```
.ipynb_checkpoints/agent-checkpoint.py ADDED
"""
MCP ReAct Agent (adapted for your MCP server)

Key upgrades:
- Actually calls memory/get_map/inventory periodically (doesn't cost "moves")
- Injects those outputs into the LLM prompt (LLM-friendly context)
- Updates score from BOTH play_action output and memory output
- Keeps loop detection + action normalization
"""

import json
import os
import re
from dataclasses import dataclass, field
from typing import Optional

from dotenv import load_dotenv
from huggingface_hub import InferenceClient

load_dotenv()

# =============================================================================
# LLM Configuration - DO NOT MODIFY
# =============================================================================

LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"

_hf_token = os.getenv("HF_TOKEN")
if not _hf_token:
    raise ValueError("HF_TOKEN not found. Set it in your .env file.")

LLM_CLIENT = InferenceClient(token=_hf_token)


def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
    """Call the LLM with the given prompt."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]

    response = LLM_CLIENT.chat.completions.create(
        model=LLM_MODEL,
        messages=messages,
        temperature=0.0,
        max_tokens=max_tokens,
        seed=seed,
    )

    return response.choices[0].message.content


@dataclass
class RunResult:
    """Result of running the agent. Do not modify this class."""
    final_score: int
    max_score: int
    moves: int
    locations_visited: set[str]
    game_completed: bool
    error: Optional[str] = None
    history: list[tuple[str, str, str]] = field(default_factory=list)


# =============================================================================
# System Prompt
# =============================================================================
SYSTEM_PROMPT = """You are an intelligent text adventure game agent.

Your goal is to solve the main problem of the game efficiently and maximize score within 100 moves.

This game is small and objective-focused. Avoid unnecessary wandering.

AVAILABLE TOOLS (use via MCP):
1. play_action - Execute valid game commands.
2. memory - Get structured summary of current state and recent actions.
3. get_map - See explored locations.
4. inventory - Check carried items.

VALID ACTION STYLE:
Movement:
- north, south, east, west, up, down
- n, s, e, w, u, d

Core actions:
- look
- examine <thing>
- take <item>, drop <item>
- open <thing>, close <thing>
- talk to <character>
- give <item> to <character>
- use specific verbs mentioned in observation

AVOID:
- generic verbs like "use"
- random movement without purpose
- repeating failed actions

--------------------------------------------------
CORE STRATEGY (IMPORTANT)
--------------------------------------------------

1) DOMINANT OBJECT RULE (VERY IMPORTANT):
If a specific object or character is repeatedly mentioned in the observation,
treat it as the main objective.

Do NOT leave the area until you:
- examine it
- try multiple meaningful interactions
- or confirm no new interaction is possible

Stay focused before exploring elsewhere.

2) PROBLEM-SOLVING PRIORITY:
If the game clearly revolves around one main goal,
prioritize actions that directly affect that goal instead of exploring new rooms.

3) CONTROLLED MOVEMENT:
Only move if:
- you have exhausted interactions in the current room
- or memory/map suggests a new unexplored path is necessary

4) LIMITED RETRIES:
If an action fails once, try a different verb.
Do NOT repeat the same failed action more than once.

5) OBJECT TRANSFORMATION FOCUS:
If an object seems central, try actions that might change its state:
- examine
- open
- give something
- use appropriate verbs mentioned in text
- interact from different angles

--------------------------------------------------
TOOL USAGE RULES
--------------------------------------------------

- Use memory() when uncertain or before repeating behavior.
- Use get_map() only if navigation becomes necessary.
- Use inventory() after obtaining items.

--------------------------------------------------
OUTPUT FORMAT (STRICT)
--------------------------------------------------

THOUGHT: <brief reasoning>
TOOL: <tool_name>
ARGS: <JSON arguments>

Keep THOUGHT short (1-2 sentences).
Do not repeat the same action multiple times.
Prefer solving over wandering.
"""

# =============================================================================
# Student Agent Implementation
# =============================================================================
class StudentAgent:
    """
    MCP ReAct Agent adapted to your MCP server outputs:
    - memory() returns STATE / RECENT / OBSERVATION
    - get_map() returns MAP ...
    - inventory() returns INVENTORY ...
    """

    def __init__(self):
        self.history: list[dict] = []
        self.recent_actions: list[str] = []
        self.score: int = 0

        # Cached tool outputs
        self.last_memory: str = ""
        self.last_map: str = ""
        self.last_inventory: str = ""
        self.last_observation: str = ""

        # Exploration / anti-loop state
        self.visit_counts: dict[str, int] = {}
        self.loc_move_failures: dict[tuple[str, str], int] = {}
        self.pending_move: Optional[tuple[str, str]] = None

        # NEW: prevent repeating same thought+action at same location
        self.loc_action_thought_counts: dict[tuple[str, str, str], int] = {}

    # ------------------------------------------------------------
    # Thought normalization helper
    # ------------------------------------------------------------
    def _thought_sig(self, thought: str) -> str:
        t = (thought or "").lower()
        t = re.sub(r"[^a-z0-9\s]", " ", t)
        t = re.sub(r"\s+", " ", t).strip()
        return " ".join(t.split()[:12])

    async def run(
        self,
        client,
        game: str,
        max_steps: int,
        seed: int,
        verbose: bool = False,
    ) -> RunResult:

        locations_visited = set()
        history = []
        moves = 0

        MOVE_CMDS = {"north", "south", "east", "west", "up", "down",
                     "enter", "exit", "n", "s", "e", "w", "u", "d"}

        # Available tools
        tools = await client.list_tools()
        tool_names = [t.name for t in tools]

        # Initial observation
        result = await client.call_tool("play_action", {"action": "look"})
        observation = self._extract_result(result)
        self.last_observation = observation

        location = observation.split("\n")[0] if observation else "Unknown"
        locations_visited.add(location)
        self.visit_counts[location] = self.visit_counts.get(location, 0) + 1

        # Prime context (no moves)
        if "memory" in tool_names:
            self.last_memory = self._extract_result(await client.call_tool("memory", {}))
            self._update_score(self.last_memory)

        if "inventory" in tool_names:
            self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))

        if verbose:
            print(f"\n{observation}")

        for step in range(1, max_steps + 1):
            await self._refresh_context_tools(client, tool_names, step, verbose)

            prompt = self._build_prompt()
            response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
            thought, tool_name, tool_args = self._parse_response(response, tool_names)

            if verbose:
                print(f"\n--- Step {step} ---")
                print(f"[THOUGHT] {thought}")
                print(f"[TOOL] {tool_name}({tool_args})")

            tool_name, tool_args = self._validate_tool_call(tool_name, tool_args, tool_names)

            # ------------------------------------------------------------
            # Block SAME (location + action + thought)
            # ------------------------------------------------------------
            if tool_name == "play_action":
                current_loc = (
                    self.last_observation.split("\n")[0].strip()
                    if self.last_observation else "Unknown"
                )
                action_norm = tool_args.get("action", "look").strip().lower()
                t_sig = self._thought_sig(thought)

                triple = (current_loc, action_norm, t_sig)
                self.loc_action_thought_counts[triple] = (
                    self.loc_action_thought_counts.get(triple, 0) + 1
                )

                if self.loc_action_thought_counts[triple] >= 2:
                    if verbose:
                        print(f"[ANTI-REPEAT] Blocking repeated thought+action at '{current_loc}'")
                    if "get_map" in tool_names:
                        tool_name, tool_args = "get_map", {}
                    elif "memory" in tool_names:
                        tool_name, tool_args = "memory", {}
                    else:
                        tool_name, tool_args = "play_action", {"action": "look"}

            # ------------------------------------------------------------
            # Loop detection (same action spam)
            # ------------------------------------------------------------
            if tool_name == "play_action":
                action = tool_args.get("action", "look")
                self.recent_actions.append(action)
                if len(self.recent_actions) > 5:
                    self.recent_actions = self.recent_actions[-5:]

                if len(self.recent_actions) >= 3 and len(set(self.recent_actions[-3:])) == 1:
                    if verbose:
                        print("[WARNING] Loop detected - forcing 'look'")
                    tool_args = {"action": "look"}

            # ------------------------------------------------------------
            # Anti-backtracking: block only FAILED moves
            # ------------------------------------------------------------
            self.pending_move = None

            if tool_name == "play_action":
                action_norm = tool_args.get("action", "look").strip().lower()

                if action_norm in MOVE_CMDS:
                    current_loc = (
                        self.last_observation.split("\n")[0].strip()
                        if self.last_observation else "Unknown"
                    )
                    key = (current_loc, action_norm)

                    if self.loc_move_failures.get(key, 0) >= 2:
                        if verbose:
                            print(f"[GUARD] Blocking failed move '{action_norm}' from '{current_loc}'")
                        if "get_map" in tool_names:
                            tool_name, tool_args = "get_map", {}
                        elif "memory" in tool_names:
                            tool_name, tool_args = "memory", {}
                        else:
                            tool_name, tool_args = "play_action", {"action": "look"}
                    else:
                        self.pending_move = (current_loc, action_norm)

            # ------------------------------------------------------------
            # Count moves
            # ------------------------------------------------------------
            if tool_name == "play_action":
                moves += 1

            # ------------------------------------------------------------
            # Execute tool
            # ------------------------------------------------------------
            try:
                result = await client.call_tool(tool_name, tool_args)
                out_text = self._extract_result(result)

                if tool_name == "play_action":
                    observation = out_text
                    self.last_observation = observation
                elif tool_name == "memory":
                    self.last_memory = out_text
                elif tool_name == "get_map":
                    self.last_map = out_text
                elif tool_name == "inventory":
                    self.last_inventory = out_text

                if verbose:
                    print(f"[RESULT] {out_text[:200]}...")

            except Exception as e:
                out_text = f"Error: {e}"
                observation = out_text
                self.last_observation = observation
                if verbose:
                    print(f"[ERROR] {e}")

            # ------------------------------------------------------------
            # Post-move update
            # ------------------------------------------------------------
            if tool_name == "play_action":
                new_location = observation.split("\n")[0] if observation else "Unknown"

                if self.pending_move is not None:
                    prev_loc, prev_action = self.pending_move
                    key = (prev_loc, prev_action)

                    if new_location == prev_loc:
                        self.loc_move_failures[key] = self.loc_move_failures.get(key, 0) + 1
                    else:
                        self.loc_move_failures[key] = 0

                    self.pending_move = None

                location = new_location
                locations_visited.add(location)
                self.visit_counts[location] = self.visit_counts.get(location, 0) + 1

                self._update_score(observation)

                if re.search(r"\bTaken\b|\byou are now carrying\b", observation, re.IGNORECASE):
                    if "inventory" in tool_names:
                        self.last_inventory = self._extract_result(
                            await client.call_tool("inventory", {})
                        )

            # ------------------------------------------------------------
            # History
            # ------------------------------------------------------------
            self.history.append({
                "step": step,
                "thought": thought,
                "tool": tool_name,
                "args": tool_args,
                "result": out_text[:200]
            })
            if len(self.history) > 10:
                self.history = self.history[-10:]

            history.append((thought, f"{tool_name}({tool_args})", out_text[:100]))

            if self._is_game_over(observation):
                if verbose:
                    print("\n*** GAME OVER ***")
                break

        return RunResult(
            final_score=self.score,
            max_score=350,
            moves=moves,
            locations_visited=locations_visited,
            game_completed=self._is_game_over(self.last_observation),
            history=history,
        )

    async def _refresh_context_tools(self, client, tool_names: list[str], step: int, verbose: bool) -> None:
        """
        Pull structured context from MCP server without spending moves.
        Tuned to your server outputs:
        - memory() is the best single summary
        - get_map() helps navigation
        - inventory() helps object planning
        """
        # Memory: often (every 4 steps) so LLM doesn't forget state
        if "memory" in tool_names and (step == 1 or step % 4 == 0):
            try:
                self.last_memory = self._extract_result(await client.call_tool("memory", {}))
                self._update_score(self.last_memory)
            except Exception:
                pass

        # Map: occasionally (every 6 steps), and also if we moved a lot recently
        if "get_map" in tool_names and (step % 6 == 0):
            try:
                self.last_map = self._extract_result(await client.call_tool("get_map", {}))
            except Exception:
                pass

        # Inventory: occasionally (every 10 steps)
        if "inventory" in tool_names and (step == 1 or step % 10 == 0):
            try:
                self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))
            except Exception:
                pass

    def _build_prompt(self) -> str:
        """
        Build prompt that is aligned with your MCP server:
        - memory() has STATE/RECENT/OBSERVATION
        - get_map() starts with MAP
        - inventory() starts with INVENTORY
        """
        parts = []
        parts.append(f"Current best-known score: {self.score}")

        # Give the model your server-side memory snapshot (truncate to keep prompt lean)
        if self.last_memory:
            mem = self._truncate(self.last_memory, 1200)
            parts.append("\n=== MEMORY (from MCP server) ===\n" + mem)

        if self.last_inventory:
            inv = self._truncate(self.last_inventory, 400)
            parts.append("\n=== INVENTORY (from MCP server) ===\n" + inv)

        if self.last_map:
            mp = self._truncate(self.last_map, 700)
            parts.append("\n=== MAP (from MCP server) ===\n" + mp)

        # Recent local history (anti-loop)
        if self.history:
            parts.append("\n=== RECENT LOCAL ACTIONS (agent) ===")
            for entry in self.history[-3:]:
                action = entry.get("args", {}).get("action", entry["tool"])
                result_short = entry["result"][:100] + "..." if len(entry["result"]) > 100 else entry["result"]
                parts.append(f" > {action} -> {result_short}")

        if self.recent_actions and len(set(self.recent_actions[-3:])) == 1:
            parts.append(f"\n[WARNING: repeated '{self.recent_actions[-1]}'. Choose a different action.]")

        # Always include the most recent raw observation
        parts.append("\n=== LATEST OBSERVATION (play_action) ===\n" + self._truncate(self.last_observation, 900))
        parts.append("\nWhat do you do next?")

        return "\n".join(parts)

    def _truncate(self, text: str, limit: int) -> str:
        text = text or ""
        if len(text) <= limit:
            return text
        return text[:limit] + "\n...[truncated]"

    def _parse_response(self, response: str, valid_tools: list[str]) -> tuple[str, str, dict]:
        thought = "No reasoning provided"
        tool_name = "play_action"
        tool_args = {"action": "look"}

        lines = response.strip().split("\n")
        for line in lines:
            line_clean = line.strip()
            line_upper = line_clean.upper()

            if line_upper.startswith("THOUGHT:"):
                thought = line_clean.split(":", 1)[1].strip()

            elif line_upper.startswith("TOOL:"):
                raw_tool = line_clean.split(":", 1)[1].strip().lower()
                raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
                raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
                tool_name = raw_tool

            elif line_upper.startswith("ARGS:"):
                args_part = line_clean.split(":", 1)[1].strip()
                if not args_part:
                    tool_args = {}
                    continue
                try:
                    args_part = args_part.replace("'", '"')
                    tool_args = json.loads(args_part)
                except json.JSONDecodeError:
                    match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
                    if match:
                        tool_args = {"action": match.group(1)}
                    else:
                        tool_args = {"action": "look"}

        return thought, tool_name, tool_args

    def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
        if tool_name not in valid_tools:
            if tool_name in ["action", "do", "command"]:
                tool_name = "play_action"
            elif tool_name in ["map", "location"]:
                tool_name = "get_map"
            elif tool_name in ["mem", "state", "status"]:
                tool_name = "memory"
            elif tool_name in ["inv", "items"]:
                tool_name = "inventory"
            else:
                tool_name = "play_action"

        if tool_name == "play_action":
            action = tool_args.get("action", "look")

            invalid_verb_map = {
                "check": "examine",
                "inspect": "examine",
                "search": "look",
                "grab": "take",
                "pick": "take",
                "use": "examine",
                "investigate": "examine",
            }

            words = action.lower().split()
            if words and words[0] in invalid_verb_map:
                words[0] = invalid_verb_map[words[0]]
                action = " ".join(words)

            action = action.lower().strip()
            action = action.replace("**", "").replace("*", "").replace("`", "")
            action = " ".join(action.split())

            tool_args["action"] = action

        return tool_name, tool_args

    def _extract_result(self, result) -> str:
        if hasattr(result, 'content') and result.content:
            return result.content[0].text
        if isinstance(result, list) and result:
            return result[0].text if hasattr(result[0], 'text') else str(result[0])
        return str(result)

    def _update_score(self, text: str) -> None:
        patterns = [
            r'\[Score:\s*(\d+)',
            r'Score:\s*(\d+)\b',
        ]
        for pattern in patterns:
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                self.score = max(self.score, int(match.group(1)))

    def _is_game_over(self, text: str) -> bool:
        game_over_phrases = [
            "game over",
            "you have died",
            "you are dead",
            "*** you have died ***",
        ]
        text_lower = (text or "").lower()
        return any(phrase in text_lower for phrase in game_over_phrases)


# =============================================================================
# Local Testing
# =============================================================================

async def test_agent():
    from fastmcp import Client

    agent = StudentAgent()

    async with Client("mcp_server.py") as client:
        result = await agent.run(
            client=client,
            game="zork1",
            max_steps=20,
            seed=42,
            verbose=True,
        )

    print(f"\n{'=' * 50}")
    print(f"Final Score: {result.final_score}")
    print(f"Moves: {result.moves}")
    print(f"Locations: {len(result.locations_visited)}")


if __name__ == "__main__":
    import asyncio
    asyncio.run(test_agent())
.ipynb_checkpoints/app-checkpoint.py ADDED
"""
Hugging Face Space - Text Adventure Agent Submission

This is a code-only Space for submitting your agent implementation.
The evaluation is run separately.

Files in this submission:
- agent.py: Your ReAct agent implementation
- mcp_server.py: Your MCP server implementation
- requirements.txt: Additional dependencies

To test locally:
    fastmcp dev mcp_server.py
    python agent.py
"""

import gradio as gr
from pathlib import Path

# Create the Gradio interface
with gr.Blocks(title="Text Adventure Agent Submission") as demo:
    gr.Markdown("# Text Adventure Agent Submission")
    gr.Markdown(
        "This Space contains a template submission for the Text Adventure Agent assignment. "
    )

    gr.Markdown(
        "---\n"
        "**Note:** This is a code submission Space. "
        "Evaluation is performed using the evaluation script.\n\n"
        "[Back to main assignment page](https://huggingface.co/spaces/LLM-course/Agentic-zork)"
    )


if __name__ == "__main__":
    demo.launch()
.ipynb_checkpoints/mcp_server-checkpoint.py ADDED
1
+ """
2
+ Student MCP Server for Text Adventure Games
3
+
4
+ This is your MCP server submission. Implement the tools that your agent
5
+ will use to play text adventure games.
6
+
7
+ Required tool:
8
+ play_action(action: str) -> str
9
+ Execute a game command and return the result.
10
+
11
+ Recommended tools:
12
+ memory() -> str
13
+ Return current game state, score, and recent history.
14
+
15
+ inventory() -> str
16
+ Return the player's current inventory.
17
+
18
+ get_map() -> str
19
+ Return a map of explored locations.
20
+
21
+ Test your server with:
22
+ fastmcp dev submission_template/mcp_server.py
23
+
24
+ Then open the MCP Inspector in your browser to test the tools interactively.
25
+ """
26
+
27
+ import sys
28
+ import os
29
+
30
+ # Add parent directory to path to import games module
31
+ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
32
+
33
+ from fastmcp import FastMCP
34
+ from games.zork_env import TextAdventureEnv
35
+
36
+
37
+ # =============================================================================
38
+ # Create the MCP Server
39
+ # =============================================================================
40
+
41
+ mcp = FastMCP("Student Text Adventure Server")
42
+
43
+
44
+ # =============================================================================
45
+ # Game State Management
46
+ # =============================================================================
47
+
48
+ import re
49
+ from typing import Optional
50
+
51
+ class GameManager:
52
+ """
53
+ Manages the text adventure game state.
54
+
55
+ Extended tracking:
56
+ - Action history (for memory tool)
57
+ - Explored locations (for mapping)
58
+ - Current score and moves
59
+ - Current location (best-effort, robust across games)
60
+ """
61
+
62
+ # Lines that are often NOT room titles across many IF games
63
+ _HEADER_LIKE_PATTERNS = [
64
+ r"^\s*score\s*[:=]\s*\d+",
65
+ r"^\s*moves?\s*[:=]\s*\d+",
66
+ r"^\s*turns?\s*[:=]\s*\d+",
67
+ r"^\s*time\s*[:=]\s*",
68
+ r"^\s*health\s*[:=]\s*\d+",
69
+ r"^\s*location\s*[:=]\s*",
70
+ r"^\s*\[.*\]\s*$", # bracket-only status lines
71
+ r"^\s*\(.*\)\s*$", # parenthetical-only lines
72
+ r"^\s*you\s+(are|see|can)\b", # narrative sentence starters
73
+ ]
74
+ # Movement commands we consider for mapping (Zork-style + abbreviations)
75
+ _MOVE_CMDS = {
76
+ "north", "south", "east", "west", "up", "down", "enter", "exit",
77
+ "n", "s", "e", "w", "u", "d"
78
+ }
79
+
80
+ # Common failure phrases when trying to move (best-effort, not perfect)
81
+ _MOVE_FAIL_PHRASES = [
82
+ "you can't go", "you cannot go", "can't go that way", "cannot go that way",
83
+ "you can't go that way", "you cannot go that way",
84
+ "you can't", "you cannot",
85
+ "there is no way", "you can't see any way", "you see no way",
86
+ "blocked", "closed", "won't open", "is locked", "locked",
87
+ "too dark", "pitch black"
88
+ ]
89
+
90
+ def _is_movement_action(self, action: str) -> bool:
91
+ """Return True if this action is a movement command we track."""
92
+ a = (action or "").strip().lower()
93
+ return a in self._MOVE_CMDS
94
+
95
+ def _move_likely_succeeded(self, old_loc: str, new_loc: str, observation: str) -> bool:
96
+ """
97
+ Decide whether a move likely succeeded.
98
+ Strong signal: location label changed.
99
+ Negative signal: failure phrases in observation.
100
+ """
101
+ if new_loc and old_loc and new_loc != old_loc:
102
+ return True
103
+
104
+ text = (observation or "").lower()
105
+ if any(phrase in text for phrase in self._MOVE_FAIL_PHRASES):
106
+ return False
107
+
108
+ # If location didn't change and no clear failure phrase, treat as "not sure" → don't add edge
109
+ return False
110
+
111
+ def _update_map(self, action: str, old_loc: str, new_loc: str) -> None:
112
+ """Record a directed edge old_loc --action--> new_loc in explored_locations."""
113
+ if not old_loc or not new_loc:
114
+ return
115
+ self.explored_locations.setdefault(old_loc, set()).add(f"{action} -> {new_loc}")
116
+
117
+
118
+ def __init__(self):
119
+ self.env: TextAdventureEnv = None
120
+ self.state = None
121
+ self.game_name: str = ""
122
+
123
+ # Tracking for agent-support tools
124
+ self.history: list[tuple[str, str]] = []
125
+ self.explored_locations: dict[str, set[str]] = {}
126
+ self.current_location: str = "Unknown"
127
+
128
+ def initialize(self, game: str = "zork1"):
129
+ """Initialize or reset the game."""
130
+ self.game_name = game
131
+ self.env = TextAdventureEnv(game)
132
+ self.state = self.env.reset()
133
+
134
+ # Reset tracking
135
+ self.history = []
136
+ self.explored_locations = {}
137
+ self.current_location = self._extract_location(self.state.observation, fallback="Unknown")
138
+
139
+ return self.state.observation
140
+
141
+ def _extract_location(self, observation: str, fallback: Optional[str] = None) -> str:
142
+ """
143
+ Best-effort location extraction from the observation text.
144
+
145
+ Strategy:
146
+ 1) Split into lines, skip empties
147
+ 2) Skip lines that look like status bars / headers / pure brackets
148
+ 3) Prefer a short, title-like line (room name)
149
+ 4) If nothing confident, return fallback (usually previous location)
150
+ """
151
+ if not observation:
152
+ return fallback or "Unknown"
153
+
154
+ lines = [ln.strip() for ln in observation.splitlines() if ln.strip()]
155
+ if not lines:
156
+ return fallback or "Unknown"
157
+
158
+ header_res = [re.compile(pat, re.IGNORECASE) for pat in self._HEADER_LIKE_PATTERNS]
159
+
160
+ def looks_like_header(line: str) -> bool:
161
+ return any(rx.search(line) for rx in header_res)
162
+
163
+ def looks_like_title(line: str) -> bool:
164
+ # Many room titles are short and not ending with punctuation.
165
+ if len(line) > 60:
166
+ return False
167
+ if line.endswith((".", "!", "?", ";", ":")):
168
+ return False
169
+ # Too many digits usually means a status line.
170
+ if sum(ch.isdigit() for ch in line) >= 3:
171
+ return False
172
+ return True
173
+
174
+ # First pass: first "title-like" line that isn't header-like
175
+ for line in lines[:8]: # only inspect top chunk; titles are usually early
176
+ if looks_like_header(line):
177
+ continue
178
+ if looks_like_title(line):
179
+ return line
180
+
181
+ # Second pass: first non-header line
182
+ for line in lines[:8]:
183
+ if not looks_like_header(line):
184
+ return line
185
+
186
+ return fallback or "Unknown"
187
+
188
+ def step(self, action: str) -> str:
189
+ """Execute an action and return the result."""
190
+ if self.env is None:
191
+ self.initialize()
192
+
193
+ # Save old location before action
194
+ old_location = self.current_location
195
+
196
+ # Apply action to the real game
197
+ self.state = self.env.step(action)
198
+ obs = self.state.observation
199
+
200
+ # Track history (keep last 50)
201
+ self.history.append((action, obs))
202
+ if len(self.history) > 50:
203
+ self.history = self.history[-50:]
204
+
205
+ # Extract new location (fallback to old)
206
+ new_location = self._extract_location(obs, fallback=old_location)
207
+
208
+ # Update map only if it was a movement attempt AND it likely succeeded
209
+ action_norm = (action or "").strip().lower()
210
+ if self._is_movement_action(action_norm) and self._move_likely_succeeded(old_location, new_location, obs):
211
+ self._update_map(action_norm, old_location, new_location)
212
+
213
+ # Finally update current location
214
+ self.current_location = new_location
215
+
216
+ return obs
217
+
218
+
219
+ def get_score(self) -> int:
220
+ """Get current score."""
221
+ return self.state.score if self.state else 0
222
+
223
+ def get_moves(self) -> int:
224
+ """Get number of moves taken."""
225
+ return self.state.moves if self.state else 0
226
+ def _extract_facts(self, observation: str) -> dict:
227
+ """
228
+ Best-effort extraction of useful 'facts' from the current observation text.
229
+ This is intentionally heuristic so it can work across many games.
230
+ """
231
+ obs = observation or ""
232
+ text = obs.strip()
233
+ lower = text.lower()
234
+
235
+ # --- Exits mentioned (simple direction scan) ---
236
+ directions = ["north", "south", "east", "west", "up", "down", "in", "out"]
237
+ exits_found = []
238
+ for d in directions:
239
+ # We detect directions as whole words to reduce false matches
240
+ if re.search(rf"\b{re.escape(d)}\b", lower):
241
+ exits_found.append(d)
242
+ exits_found = sorted(set(exits_found))
243
+
244
+ # --- Visible things (very light heuristics) ---
245
+ # We look for common IF patterns like "You see ... here." / "There is ... here."
246
+ visible_candidates: list[str] = []
247
+
248
+ patterns = [
249
+ r"you see (.+?) here\.",
250
+ r"you can see (.+?) here\.",
251
+ r"there is (.+?) here\.",
252
+ r"there are (.+?) here\.",
253
+ r"you notice (.+?)\.",
254
+ ]
255
+ for pat in patterns:
256
+ for m in re.finditer(pat, lower):
257
+ chunk = m.group(1).strip()
258
+ if chunk:
259
+ visible_candidates.append(chunk)
260
+
261
+ # Clean visible candidates a bit (split simple lists, avoid huge strings)
262
+ visible = []
263
+ for chunk in visible_candidates:
264
+ # Split on commas and "and" to get smaller pieces
265
+ parts = re.split(r",|\band\b", chunk)
266
+ for p in parts:
267
+ item = p.strip(" .;:!?\t")
268
+ if 1 <= len(item) <= 40:
269
+ visible.append(item)
270
+
271
+ # Deduplicate and limit (so memory stays compact)
272
+ visible = sorted(set(visible))[:10]
273
+
274
+ return {
275
+ "exits_mentioned": exits_found,
276
+ "visible": visible,
277
+ }
278
+
279
+ def get_memory(self) -> str:
280
+ """
281
+ LLM-friendly summary of current game state.
282
+ Format: Facts first, then recent actions, then the raw observation.
283
+ """
284
+ game = self.game_name or "Unknown"
285
+ location = self.current_location or "Unknown"
286
+ score = self.get_score()
287
+ moves = self.get_moves()
288
+
289
+ # Recent actions (keep short and anti-loop)
290
+ recent = self.history[-5:] if self.history else []
291
+ if recent:
292
+ recent_lines = []
293
+ for a, r in recent:
294
+ snippet = (r or "").replace("\n", " ").strip()
295
+ if len(snippet) > 80:
296
+ snippet = snippet[:80] + "..."
297
+ recent_lines.append(f"- {a} -> {snippet}")
298
+ recent_str = "\n".join(recent_lines)
299
+ else:
300
+ recent_str = "(none yet)"
301
+
302
+ # Facts extracted from current observation
303
+ obs = self.state.observation if self.state else ""
304
+ facts = self._extract_facts(obs)
305
+
306
+ exits_txt = ", ".join(facts["exits_mentioned"]) if facts["exits_mentioned"] else "(none detected)"
307
+ visible_txt = ", ".join(facts["visible"]) if facts["visible"] else "(none detected)"
308
+
309
+ return (
310
+ "STATE\n"
311
+ f"Game: {game}\n"
312
+ f"Location: {location}\n"
313
+ f"Score: {score} Moves: {moves}\n"
314
+ f"Visible (best effort): {visible_txt}\n"
315
+ f"Exits mentioned (best effort): {exits_txt}\n"
316
+ "\n"
317
+ "RECENT\n"
318
+ f"{recent_str}\n"
319
+ "\n"
320
+ "OBSERVATION\n"
321
+ f"{obs}"
322
+ )
323
+ def get_map(self) -> str:
324
+ """
325
+ Return a readable map of explored locations.
326
+ Uses explored_locations built during movement actions.
327
+
328
+ Output is stable + compact for LLM use.
329
+ """
330
+ if not self.explored_locations:
331
+ return "MAP\n(no locations recorded yet — try moving with north/south/east/west/etc.)"
332
+
333
+ lines = ["MAP", "Explored locations and exits:"]
334
+ for loc in sorted(self.explored_locations.keys()):
335
+ exits = sorted(self.explored_locations[loc])
336
+ lines.append(f"\n* {loc}")
337
+ for e in exits:
338
+ lines.append(f" - {e}")
339
+
340
+ lines.append(f"\n[Current] {self.current_location}")
341
+ return "\n".join(lines)
342
+ def get_inventory(self) -> str:
343
+ """
344
+ Return inventory in a robust way across different games/envs.
345
+
346
+ Strategy:
347
+ 1) If state.inventory exists and is non-empty -> format it
348
+ 2) Otherwise, fall back to issuing the command "inventory"
349
+ through the environment and return that observation
350
+ """
351
+ # 1) Try structured inventory if provided by env
352
+ items = []
353
+ if self.state is not None and hasattr(self.state, "inventory"):
354
+ inv = getattr(self.state, "inventory")
355
+ if inv:
356
+ # Normalize to strings
357
+ try:
358
+ items = [str(x).strip() for x in inv if str(x).strip()]
359
+ except Exception:
360
+ items = []
361
+
362
+ if items:
363
+ # Keep it simple and safe: just join a cleaned list
364
+ # (Avoid overly aggressive parsing that breaks across games)
365
+ items = sorted(set(items))
366
+ return "INVENTORY\n" + ", ".join(items)
367
+
368
+ # 2) Fallback: ask the game directly. The "inventory" command does not change
+ # what is carried, but some engines still count it as a turn, so the
+ # structured state is preferred when available. This is a server-side
+ # query and is deliberately not recorded in agent history or the map.
370
+ if self.env is None:
371
+ self.initialize()
372
+
373
+ try:
374
+ tmp_state = self.env.step("inventory")
375
+ inv_text = tmp_state.observation if tmp_state else "Inventory: (no response)"
376
+ except Exception:
377
+ inv_text = "Inventory: (unable to retrieve)"
378
+
379
+ return "INVENTORY\n" + inv_text.strip()
380
+
381
+
382
+ # Global game manager
383
+ _game = GameManager()
384
+
385
+
386
+ def get_game() -> GameManager:
387
+ """Get or initialize the game manager."""
388
+ global _game
389
+ if _game.env is None:
390
+ # Get game from environment variable (set by evaluator)
391
+ game = os.environ.get("GAME", "zork1")
392
+ _game.initialize(game)
393
+ return _game
394
+
395
+
396
+ # =============================================================================
397
+ # MCP Tools - IMPLEMENT THESE
398
+ # =============================================================================
399
+
400
+ @mcp.tool()
401
+ def play_action(action: str) -> str:
402
+ """
403
+ Execute a game command and return the result.
404
+
405
+ This is the main tool for interacting with the game.
406
+
407
+ Args:
408
+ action: The command to execute (e.g., "north", "take lamp", "open mailbox")
409
+
410
+ Returns:
411
+ The game's response to the action
412
+
413
+ Valid commands include:
414
+ - Movement: north, south, east, west, up, down, enter, exit
415
+ - Objects: take <item>, drop <item>, open <thing>, examine <thing>
416
+ - Other: look, inventory, read <thing>, turn on lamp
417
+ """
418
+ game = get_game()
419
+
420
+ # TODO: You might want to add action validation here
421
+ # TODO: You might want to include score changes in the response
422
+
423
+ result = game.step(action)
424
+
425
+ # Append score/moves for clearer feedback (LLM-friendly, low noise)
426
+ result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
427
+ return result
428
+
432
+
433
+ @mcp.tool()
434
+ def memory() -> str:
435
+ """
436
+ Return an LLM-friendly summary of the current game state.
437
+ """
438
+ game = get_game()
439
+ return game.get_memory()
440
+ @mcp.tool()
441
+ def get_map() -> str:
442
+ """
443
+ Return a map of explored locations and recorded exits.
444
+ """
445
+ game = get_game()
446
+ return game.get_map()
447
+
448
+ @mcp.tool()
449
+ def inventory() -> str:
450
+ """
451
+ Return the player's inventory in a robust way.
452
+ """
453
+ game = get_game()
454
+ return game.get_inventory()
455
+
456
+
457
498
+ # @mcp.tool()
499
+ # def get_valid_actions() -> str:
500
+ # """
501
+ # Get a list of likely valid actions from the current location.
502
+ #
503
+ # Returns:
504
+ # List of actions that might work here
505
+ # """
506
+ # # This is a hint: Jericho provides get_valid_actions()
507
+ # game = get_game()
508
+ # if game.env and game.env.env:
509
+ # valid = game.env.env.get_valid_actions()
510
+ # return "Valid actions: " + ", ".join(valid[:20])
511
+ # return "Could not determine valid actions"
512
+
513
+
514
+ # =============================================================================
515
+ # Run the server
516
+ # =============================================================================
517
+
518
+ if __name__ == "__main__":
519
+ # This runs the server with stdio transport (for MCP clients)
520
+ mcp.run()
README.md CHANGED
@@ -18,11 +18,164 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
18
 
19
  ## Approach
20
 
21
- <!-- Describe your approach here -->
22
 
23
- - What strategy does your agent use?
24
- - What tools did you implement in your MCP server?
25
- - Any interesting techniques or optimizations?
26
 
27
  ## Files
28
 
 
18
 
19
  ## Approach
20
 
 
21
 
22
+ # My Report (MCP-Based Text Adventure Agent )
23
+ ## Structured State Design, Guarded ReAct Reasoning, and Stability Improvements
24
+
25
+ ## Overview
26
+
27
+ This project implements a fully functional MCP (Model Context Protocol) server and an LLM-driven ReAct agent for text adventure games. While a baseline was provided, this submission significantly extends and stabilizes that template by redesigning state exposure, improving tool structure, and introducing multiple guardrails against common LLM failure modes.
28
+
29
+ The primary focus of this work was not brute-force performance tuning, but architectural improvement, robustness, and reasoning stability.
30
+
31
+ ---
32
+
33
+ ## 1. MCP Server Improvements
34
+
35
+ The original template exposed minimal game interaction. I redesigned the MCP server to provide structured, reliable, and LLM-friendly state representations.
36
+
37
+ ### 1.1 Robust Location Extraction
38
+
39
+ Instead of relying solely on the first line of the observation, the server now:
40
+
41
+ - Filters out status-like lines (score, moves, headers, bracketed text)
42
+ - Detects likely room titles heuristically
43
+ - Falls back gracefully when uncertain
44
+
45
+ This improves compatibility across different text adventure engines.
46
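A minimal sketch of this filtering strategy, simplified from the server's `_extract_location` (the pattern list here is only a subset of the server's):

```python
import re

# Subset of the status-bar patterns used by the server
HEADER_PATTERNS = [
    r"^\s*score\s*[:=]\s*\d+",
    r"^\s*moves?\s*[:=]\s*\d+",
    r"^\s*\[.*\]\s*$",
    r"^\s*you\s+(are|see|can)\b",
]

def extract_location(observation: str, fallback: str = "Unknown") -> str:
    """Return the first short, title-like line that is not a status header."""
    headers = [re.compile(p, re.IGNORECASE) for p in HEADER_PATTERNS]
    lines = [ln.strip() for ln in observation.splitlines() if ln.strip()]
    for line in lines[:8]:  # room titles usually appear near the top
        if any(rx.search(line) for rx in headers):
            continue
        if len(line) <= 60 and not line.endswith((".", "!", "?", ";", ":")):
            return line  # looks like a room title
    return fallback  # nothing confident: keep the previous location

print(extract_location("Score: 10  Moves: 3\nKitchen\nA table is here."))  # -> Kitchen
```

Returning the fallback instead of guessing is what keeps the location stable when a game prints only narrative text.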
+
+ ---
+
+ ### 1.2 Structured Memory Output
+
+ The `memory()` tool was redesigned to provide:
+
+ - Current game
+ - Location
+ - Score and moves
+ - Extracted visible objects (best-effort heuristics)
+ - Mentioned exits
+ - Recent action history
+ - Full current observation
+
+ This structured format reduces hallucination and anchors the LLM in grounded state information. It transforms raw narrative text into usable reasoning signals.
+
+ ---
+
+ ### 1.3 Intelligent Map Construction
+
+ Movement tracking is no longer naive. A move is recorded only if:
+
+ - The location actually changes, and
+ - The observation does not contain known movement failure phrases.
+
+ This prevents corrupt map edges and keeps spatial reasoning reliable.
+
+ The resulting `get_map()` tool exposes clean directional transitions without noise from failed attempts.
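The rule can be sketched as follows; this is a simplified version of the server's `_move_likely_succeeded` and `_update_map`, with an abbreviated failure-phrase list:

```python
# Abbreviated failure-phrase list; the server checks a longer one
MOVE_FAIL_PHRASES = ("you can't go", "can't go that way", "there is no way", "is locked")

def record_move(graph: dict, action: str, old_loc: str, new_loc: str, observation: str) -> None:
    """Add a directed edge old_loc --action--> new_loc only on a likely success."""
    moved = bool(old_loc and new_loc and old_loc != new_loc)
    failed = any(p in observation.lower() for p in MOVE_FAIL_PHRASES)
    if moved and not failed:
        graph.setdefault(old_loc, set()).add(f"{action} -> {new_loc}")

graph = {}
record_move(graph, "north", "Forest", "Clearing", "Clearing\nYou are in a small clearing.")
record_move(graph, "east", "Clearing", "Clearing", "You can't go that way.")
print(graph)  # only the successful move is recorded
```

Treating "location unchanged, no failure phrase" as uncertain (and recording nothing) is the conservative choice that keeps bad edges out of the map.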
+
+ ---
+
+ ### 1.4 Robust Inventory Handling
+
+ Inventory retrieval now:
+
+ - Uses structured state inventory when available
+ - Falls back to issuing the `inventory` command
+ - Cleans and normalizes item strings
+
+ This ensures cross-game compatibility.
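A sketch of that fallback order, assuming a Jericho-style state object with an optional `inventory` attribute and an `env.step()` that returns a state with an `observation` field:

```python
def format_inventory(state, env) -> str:
    """Prefer the structured inventory; otherwise ask the game itself."""
    raw = getattr(state, "inventory", None) or []
    items = sorted({str(x).strip() for x in raw if str(x).strip()})
    if items:
        return "INVENTORY\n" + ", ".join(items)
    # Fallback: issue the in-game command (may cost a turn in some engines)
    result = env.step("inventory")
    return "INVENTORY\n" + result.observation.strip()
```

Deduplicating and sorting keeps the tool's output stable across calls, which makes it easier for the LLM to compare successive inventory snapshots.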
+
+ ---
+
+ ## 2. Agent-Side Stability and Reasoning Enhancements
+
+ The ReAct loop was significantly extended to address common LLM failure modes.
+
+ ---
+
+ ### 2.1 Context Refresh Strategy
+
+ The agent periodically refreshes:
+
+ - `memory()` (state grounding)
+ - `inventory()` (after item acquisition)
+ - `get_map()` (navigation support)
+
+ This improves decision consistency without consuming extra game moves.
+
+ ---
+
+ ### 2.2 Action Validation and Normalization
+
+ Before execution:
+
+ - Tool names are validated
+ - Invalid verbs are mapped to supported equivalents
+ - Formatting noise is removed
+ - Actions are normalized to consistent lower-case grammar
+
+ This dramatically reduces invalid command generation.
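For illustration, a sketch of such a normalization step (the verb table below is hypothetical, not the agent's actual mapping):

```python
import re

# Hypothetical synonym table for illustration
VERB_MAP = {"grab": "take", "get": "take", "check": "examine", "go": ""}

def normalize_action(raw: str) -> str:
    """Strip formatting noise, lower-case, and map unsupported verbs."""
    action = raw.strip().strip('"`*.').lower()
    action = re.sub(r"\s+", " ", action)  # collapse runs of whitespace
    words = action.split()
    if words and words[0] in VERB_MAP:
        mapped = VERB_MAP[words[0]]
        words = ([mapped] if mapped else []) + words[1:]  # "" drops the verb
    return " ".join(words)

print(normalize_action('  "Grab  LAMP"  '))  # -> take lamp
print(normalize_action("Go north"))          # -> north
```

Mapping "go" to the empty string turns "go north" into the bare direction most parsers expect.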
+
+ ---
+
+ ### 2.3 Multi-Layer Anti-Loop Mechanisms
+
+ Several defensive layers were introduced:
+
+ #### (A) Action Repetition Guard
+ If the same action appears three times consecutively, the agent forces a reset (`look`).
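This guard can be sketched as follows (the class name and threshold are illustrative, not the agent's actual code):

```python
from collections import deque

class RepetitionGuard:
    """Replace the action with 'look' after `limit` identical actions in a row."""
    def __init__(self, limit: int = 3):
        self.limit = limit
        self.recent = deque(maxlen=limit)

    def filter(self, action: str) -> str:
        self.recent.append(action)
        if len(self.recent) == self.limit and len(set(self.recent)) == 1:
            self.recent.clear()  # reset so the forced 'look' breaks the streak
            return "look"
        return action

guard = RepetitionGuard()
print([guard.filter(a) for a in ["north", "north", "north"]])  # third call forced to 'look'
```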
+
+ #### (B) Location-Aware Movement Failure Blocking
+ Movement attempts are tracked per `(location, direction)` pair.
+ If a direction fails multiple times from the same location, it is blocked.
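A sketch of the per-pair tracking (the threshold and names here are illustrative):

```python
from collections import Counter

class MoveBlocker:
    """Block a direction from a location after repeated failed attempts."""
    def __init__(self, max_failures: int = 2):
        self.max_failures = max_failures
        self.failures = Counter()  # keyed by (location, direction)

    def record_failure(self, location: str, direction: str) -> None:
        self.failures[(location, direction)] += 1

    def is_blocked(self, location: str, direction: str) -> bool:
        return self.failures[(location, direction)] >= self.max_failures

blocker = MoveBlocker()
blocker.record_failure("Kitchen", "east")
blocker.record_failure("Kitchen", "east")
print(blocker.is_blocked("Kitchen", "east"), blocker.is_blocked("Kitchen", "west"))
```

Keying on the pair rather than the direction alone means "east" stays usable from other rooms.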
+
+ #### (C) Thought + Action + Location Blocking
+ A normalized thought signature is computed.
+ If the same thought leads to the same action in the same location more than once, the agent is forced to change strategy (memory/map call).
+
+ This addresses the subtle ReAct issue where reasoning itself becomes cyclic.
+
+ ---
+
+ ### 2.4 Controlled Movement Policy
+
+ The agent avoids random wandering by:
+
+ - Encouraging local interaction before movement
+ - Prioritizing dominant objects in the observation
+ - Blocking repeated failed transitions
+
+ This reduces wasted exploration steps.
+
+ ---
+
+ ## 3. Design Philosophy
+
+ The key improvements are architectural rather than game-specific:
+
+ - Clear separation between environment (MCP server) and reasoning (LLM agent)
+ - Structured state exposure instead of raw narrative text
+ - Defensive programming against repetition and invalid behavior
+ - Heuristic generalization instead of hardcoded walkthrough logic
+
+ The system is modular, interpretable, and extensible.
+
+ ---
+
+ ## 4. Conclusion
+
+ Compared to the baseline template, this implementation introduces:
+
+ - Structured memory representation
+ - Robust location extraction
+ - Intelligent map tracking
+ - Inventory normalization
+ - Multi-layer loop prevention
+ - Location-aware movement validation
+ - Thought-action repetition blocking
+ - Controlled exploration policy
+
+ The result is a significantly more stable, grounded, and architecturally improved MCP-based text adventure agent.
 
180
  ## Files
181
 
agent.py CHANGED
@@ -1,26 +1,11 @@
1
  """
2
- Student Agent for Text Adventure Games
3
 
4
- This is your submission file. Implement the StudentAgent class to play
5
- text adventure games using the MCP server you also implement.
6
-
7
- Your agent should:
8
- 1. Connect to the MCP server via the provided client
9
- 2. Use the ReAct pattern (Thought -> Action -> Observation)
10
- 3. Call MCP tools to interact with the game
11
- 4. Maximize the game score within the step limit
12
-
13
- Required method:
14
- async def run(self, client, game, max_steps, seed, verbose) -> RunResult
15
-
16
- The 'client' is a FastMCP Client already connected to your MCP server.
17
- Use it to call tools like: await client.call_tool("play_action", {"action": "look"})
18
-
19
- Tips:
20
- - Start by looking around and understanding your environment
21
- - Keep track of visited locations to avoid loops
22
- - Pick up useful items (lamp, sword, etc.)
23
- - The seed parameter should be used to set your LLM's seed for reproducibility
24
  """
25
 
26
  import json
@@ -32,79 +17,32 @@ from typing import Optional
32
  from dotenv import load_dotenv
33
  from huggingface_hub import InferenceClient
34
 
35
- # Load environment variables
36
  load_dotenv()
37
 
38
- # Set USE_LOCAL_MODEL=1 in your .env to use a locally downloaded model
39
- USE_LOCAL_MODEL = os.getenv("USE_LOCAL_MODEL", "0").strip() in ("1", "true", "yes")
40
- LOCAL_MODEL_ID = os.getenv("LOCAL_MODEL_ID", "Qwen/Qwen2.5-3B-Instruct")
41
-
42
  # =============================================================================
43
  # LLM Configuration - DO NOT MODIFY
44
  # =============================================================================
45
 
46
- # Model to use (fixed for fair evaluation)
47
  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
48
 
49
- # Initialize the LLM client based on mode
50
- _local_pipeline = None
51
-
52
- if USE_LOCAL_MODEL:
53
- import torch
54
- from transformers import pipeline as _hf_pipeline
55
 
56
- _local_pipeline = _hf_pipeline(
57
- "text-generation",
58
- model=LOCAL_MODEL_ID,
59
- torch_dtype=torch.bfloat16,
60
- device_map="auto",
61
- )
62
- LLM_CLIENT = None
63
- else:
64
- _hf_token = os.getenv("HF_TOKEN")
65
- if not _hf_token:
66
- raise ValueError("HF_TOKEN not found. Set it in your .env file.")
67
- LLM_CLIENT = InferenceClient(token=_hf_token)
68
 
69
 
70
  def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
71
- """
72
- Call the LLM with the given prompt. Use this function in your agent.
73
-
74
- Args:
75
- prompt: The user prompt (current game state, history, etc.)
76
- system_prompt: The system prompt (instructions for the agent)
77
- seed: Random seed for reproducibility
78
- max_tokens: Maximum tokens in response (default: 300)
79
-
80
- Returns:
81
- The LLM's response text
82
-
83
- Example:
84
- response = call_llm(
85
- prompt="You are in a forest. What do you do?",
86
- system_prompt=SYSTEM_PROMPT,
87
- seed=42,
88
- )
89
- """
90
  messages = [
91
  {"role": "system", "content": system_prompt},
92
  {"role": "user", "content": prompt},
93
  ]
94
 
95
- if USE_LOCAL_MODEL and _local_pipeline is not None:
96
- outputs = _local_pipeline(
97
- messages,
98
- max_new_tokens=max_tokens,
99
- temperature=0.0001, # Near-deterministic (0.0 unsupported by some backends)
100
- do_sample=True,
101
- )
102
- return outputs[0]["generated_text"][-1]["content"]
103
-
104
  response = LLM_CLIENT.chat.completions.create(
105
  model=LLM_MODEL,
106
  messages=messages,
107
- temperature=0.0, # Deterministic for reproducibility
108
  max_tokens=max_tokens,
109
  seed=seed,
110
  )
@@ -125,179 +63,550 @@ class RunResult:
125
 
126
 
127
  # =============================================================================
128
- # System Prompt - Customize this for your agent
129
  # =============================================================================
 
130
 
131
- SYSTEM_PROMPT = """You are playing a classic text adventure game.
132
 
133
- GOAL: Explore the world, solve puzzles, and maximize your score.
134
 
135
  AVAILABLE TOOLS (use via MCP):
136
- - play_action: Execute a game command (north, take lamp, open mailbox, etc.)
137
- - memory: Get current game state and history (if implemented)
138
- - inventory: Check what you're carrying (if implemented)
139
-
140
- VALID GAME COMMANDS for play_action:
141
- - Movement: north, south, east, west, up, down, enter, exit
142
- - Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
143
- - Other: look, inventory, read <thing>, turn on lamp
144
-
145
- RESPOND IN THIS EXACT FORMAT (no markdown):
146
- THOUGHT: <your reasoning about what to do next>
 
147
  TOOL: <tool_name>
148
- ARGS: <JSON arguments, e.g., {"action": "look"}>
149
 
150
- Example:
151
- THOUGHT: I should look around to see where I am.
152
- TOOL: play_action
153
- ARGS: {"action": "look"}
154
  """
155
 
156
-
157
  # =============================================================================
158
- # Student Agent - IMPLEMENT THIS CLASS
159
  # =============================================================================
160
-
161
  class StudentAgent:
162
  """
163
- Your ReAct agent implementation.
164
-
165
- TODO:
166
- 1. Implement the run() method with the ReAct loop
167
- 2. Parse LLM responses to extract tool calls
168
- 3. Track state and avoid loops
169
-
170
- Use the provided call_llm() function to interact with the LLM.
171
  """
172
-
173
  def __init__(self):
174
- """Initialize your agent here."""
175
- # TODO: Initialize any state tracking you need
176
- # self.history = []
177
- # self.visited_locations = set()
178
- pass
179
180
  async def run(
181
  self,
182
- client, # FastMCP Client connected to your MCP server
183
  game: str,
184
  max_steps: int,
185
  seed: int,
186
  verbose: bool = False,
187
  ) -> RunResult:
188
- """
189
- Run the agent for a game session.
190
-
191
- Args:
192
- client: FastMCP Client connected to your MCP server
193
- game: Name of the game being played (e.g., "zork1")
194
- max_steps: Maximum number of steps to take
195
- seed: Random seed for reproducibility (use for LLM calls)
196
- verbose: Whether to print detailed output
197
-
198
- Returns:
199
- RunResult with final score and statistics
200
- """
201
- # TODO: Implement your ReAct loop here
202
- #
203
- # Basic structure:
204
- # 1. Get initial observation (call play_action with "look")
205
- # 2. Loop for max_steps:
206
- # a. Build prompt with current observation and history
207
- # b. Call LLM to get thought and action
208
- # c. Parse the response to extract tool and args
209
- # d. Call the tool via client.call_tool(tool_name, args)
210
- # e. Update history and state
211
- # f. Check for game over
212
- # 3. Return RunResult with final statistics
213
-
214
- # Example of calling a tool:
215
- # result = await client.call_tool("play_action", {"action": "look"})
216
- # observation = result[0].text if result else "No response"
217
-
218
- # Example of calling the LLM:
219
- # response = call_llm(
220
- # prompt="Current observation: " + observation,
221
- # system_prompt=SYSTEM_PROMPT,
222
- # seed=seed,
223
- # )
224
-
225
- # Placeholder implementation - replace with your code
226
  locations_visited = set()
227
  history = []
228
- final_score = 0
229
  moves = 0
230
-
231
- # TODO: Your implementation here
232
- # ...
233
-
  return RunResult(
- final_score=final_score,
- max_score=350, # Zork1 max score, adjust if needed
  moves=moves,
  locations_visited=locations_visited,
- game_completed=False,
  history=history,
  )
-
- def _build_prompt(self, observation: str, history: list) -> str:
- """
- Build the prompt for the LLM.
-
- TODO: Implement this to create effective prompts
- """
- # TODO: Combine system prompt, history, and current observation
- pass
-
- def _parse_response(self, response: str) -> tuple[str, str, dict]:
  """
- Parse LLM response to extract thought, tool name, and arguments.
-
- TODO: Implement robust parsing
-
- Returns:
- Tuple of (thought, tool_name, args_dict)
  """
- # TODO: Parse the response format:
- # THOUGHT: ...
- # TOOL: ...
- # ARGS: {...}
- pass
-
- def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
  """
- Call the LLM with the given prompt.
-
- This is a convenience wrapper - you can also use call_llm() directly.
  """
- return call_llm(prompt, system_prompt, seed)


  # =============================================================================
- # For local testing
  # =============================================================================

  async def test_agent():
- """Test the agent locally."""
  from fastmcp import Client
-
- # Path to your MCP server
- server_path = "mcp_server.py"
-
  agent = StudentAgent()
-
- async with Client(server_path) as client:
  result = await agent.run(
  client=client,
  game="zork1",
- max_steps=10,
  seed=42,
  verbose=True,
  )
-
- print(f"\nFinal Score: {result.final_score}")
  print(f"Moves: {result.moves}")
- print(f"Locations: {result.locations_visited}")


  if __name__ == "__main__":
 
  """
+ MCP ReAct Agent (adapted for your MCP server)

+ Key upgrades:
+ - Actually calls memory/get_map/inventory periodically (doesn't cost "moves")
+ - Injects those outputs into the LLM prompt (LLM-friendly context)
+ - Updates score from BOTH play_action output and memory output
+ - Keeps loop detection + action normalization
  """
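The upgrades listed in the docstring all wrap the same core cycle: observe, think, act, observe again. A minimal sketch of that cycle, with hypothetical `fake_llm`/`fake_tool` stand-ins for `call_llm()` and `client.call_tool()` (the real `run()` adds anti-loop guards and periodic context refreshes around this core):

```python
# Minimal ReAct cycle sketch. `fake_llm` and `fake_tool` are hypothetical
# stand-ins for call_llm() and client.call_tool().
def react_loop(llm, tool, max_steps):
    observation = tool("look")           # initial observation
    history = []
    for step in range(1, max_steps + 1):
        action = llm(observation)        # THOUGHT/TOOL/ARGS distilled to one action
        observation = tool(action)       # execute and observe
        history.append((step, action, observation))
    return history

# Stubbed world: "look" shows the Kitchen, any other action leads to the Hallway.
def fake_llm(obs):
    return "north" if "Kitchen" in obs else "look"

def fake_tool(action):
    return "Kitchen" if action == "look" else "Hallway"

trace = react_loop(fake_llm, fake_tool, 3)
```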

  import json

  from dotenv import load_dotenv
  from huggingface_hub import InferenceClient

  load_dotenv()

  # =============================================================================
  # LLM Configuration - DO NOT MODIFY
  # =============================================================================

  LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"

+ _hf_token = os.getenv("HF_TOKEN")
+ if not _hf_token:
+ raise ValueError("HF_TOKEN not found. Set it in your .env file.")

+ LLM_CLIENT = InferenceClient(token=_hf_token)


  def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
+ """Call the LLM with the given prompt."""
  messages = [
  {"role": "system", "content": system_prompt},
  {"role": "user", "content": prompt},
  ]

  response = LLM_CLIENT.chat.completions.create(
  model=LLM_MODEL,
  messages=messages,
+ temperature=0.0,
  max_tokens=max_tokens,
  seed=seed,
  )
 

  # =============================================================================
+ # System Prompt
  # =============================================================================
+ SYSTEM_PROMPT = """You are an intelligent text adventure game agent.

+ Your goal is to solve the main problem of the game efficiently and maximize score within 100 moves.

+ This game is small and objective-focused. Avoid unnecessary wandering.

  AVAILABLE TOOLS (use via MCP):
+ 1. play_action - Execute valid game commands.
+ 2. memory - Get structured summary of current state and recent actions.
+ 3. get_map - See explored locations.
+ 4. inventory - Check carried items.
+
+ VALID ACTION STYLE:
+ Movement:
+ - north, south, east, west, up, down
+ - n, s, e, w, u, d
+
+ Core actions:
+ - look
+ - examine <thing>
+ - take <item>, drop <item>
+ - open <thing>, close <thing>
+ - talk to <character>
+ - give <item> to <character>
+ - use specific verbs mentioned in observation
+
+ AVOID:
+ - generic verbs like "use"
+ - random movement without purpose
+ - repeating failed actions
+
+ --------------------------------------------------
+ CORE STRATEGY (IMPORTANT)
+ --------------------------------------------------
+
+ 1) DOMINANT OBJECT RULE (VERY IMPORTANT):
+ If a specific object or character is repeatedly mentioned in the observation,
+ treat it as the main objective.
+
+ Do NOT leave the area until you:
+ - examine it
+ - try multiple meaningful interactions
+ - or confirm no new interaction is possible
+
+ Stay focused before exploring elsewhere.
+
+ 2) PROBLEM-SOLVING PRIORITY:
+ If the game clearly revolves around one main goal,
+ prioritize actions that directly affect that goal instead of exploring new rooms.
+
+ 3) CONTROLLED MOVEMENT:
+ Only move if:
+ - you have exhausted interactions in the current room
+ - or memory/map suggests a new unexplored path is necessary
+
+ 4) LIMITED RETRIES:
+ If an action fails once, try a different verb.
+ Do NOT repeat the same failed action more than once.
+
+ 5) OBJECT TRANSFORMATION FOCUS:
+ If an object seems central, try actions that might change its state:
+ - examine
+ - open
+ - give something
+ - use appropriate verbs mentioned in text
+ - interact from different angles
+
+ --------------------------------------------------
+ TOOL USAGE RULES
+ --------------------------------------------------
+
+ - Use memory() when uncertain or before repeating behavior.
+ - Use get_map() only if navigation becomes necessary.
+ - Use inventory() after obtaining items.
+
+ --------------------------------------------------
+ OUTPUT FORMAT (STRICT)
+ --------------------------------------------------
+
+ THOUGHT: <brief reasoning>
  TOOL: <tool_name>
+ ARGS: <JSON arguments>

+ Keep THOUGHT short (1-2 sentences).
+ Do not repeat the same action multiple times.
+ Prefer solving over wandering.
  """
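The strict THOUGHT/TOOL/ARGS contract at the end of the prompt is what makes parsing tractable. A hypothetical standalone parser for that format looks like this (the agent's own `_parse_response()` is more defensive, adding markdown stripping and a regex fallback for malformed JSON):

```python
import json

# Parse a reply that follows the strict output format:
#   THOUGHT: <reasoning>
#   TOOL: <tool_name>
#   ARGS: <JSON arguments>
def parse_react_reply(reply: str):
    thought, tool, args = None, None, {}
    for line in reply.splitlines():
        line = line.strip()
        upper = line.upper()
        if upper.startswith("THOUGHT:"):
            thought = line.split(":", 1)[1].strip()
        elif upper.startswith("TOOL:"):
            tool = line.split(":", 1)[1].strip()
        elif upper.startswith("ARGS:"):
            args = json.loads(line.split(":", 1)[1].strip())
    return thought, tool, args

reply = (
    "THOUGHT: The mailbox looks important.\n"
    "TOOL: play_action\n"
    'ARGS: {"action": "open mailbox"}'
)
parsed = parse_react_reply(reply)
```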

  # =============================================================================
+ # Student Agent Implementation
  # =============================================================================

  class StudentAgent:
  """
+ MCP ReAct Agent adapted to your MCP server outputs:
+ - memory() returns STATE / RECENT / OBSERVATION
+ - get_map() returns MAP ...
+ - inventory() returns INVENTORY ...
  """
+
  def __init__(self):
+ self.history: list[dict] = []
+ self.recent_actions: list[str] = []
+ self.score: int = 0
+
+ # Cached tool outputs
+ self.last_memory: str = ""
+ self.last_map: str = ""
+ self.last_inventory: str = ""
+ self.last_observation: str = ""
+
+ # Exploration / anti-loop state
+ self.visit_counts: dict[str, int] = {}
+ self.loc_move_failures: dict[tuple[str, str], int] = {}
+ self.pending_move: Optional[tuple[str, str]] = None
+
+ # NEW: prevent repeating same thought+action at same location
+ self.loc_action_thought_counts: dict[tuple[str, str, str], int] = {}
+
+ # ------------------------------------------------------------
+ # Thought normalization helper
+ # ------------------------------------------------------------
+ def _thought_sig(self, thought: str) -> str:
+ t = (thought or "").lower()
+ t = re.sub(r"[^a-z0-9\s]", " ", t)
+ t = re.sub(r"\s+", " ", t).strip()
+ return " ".join(t.split()[:12])
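The signature helper above exists so that near-identical thoughts collide on the same key in `loc_action_thought_counts`. A standalone copy of the same logic (the 12-word cap is the agent's own choice of how much of a thought matters for deduplication):

```python
import re

# Normalize a thought into a short signature: lowercase, strip
# punctuation, collapse whitespace, keep the first 12 words.
def thought_sig(thought: str) -> str:
    t = (thought or "").lower()
    t = re.sub(r"[^a-z0-9\s]", " ", t)
    t = re.sub(r"\s+", " ", t).strip()
    return " ".join(t.split()[:12])

a = thought_sig("I should OPEN the mailbox!")
b = thought_sig("i should open the mailbox")
```

Both variants map to the same signature, so a second attempt at the same thought+action pair is caught by the anti-repeat guard.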
+
  async def run(
  self,
+ client,
  game: str,
  max_steps: int,
  seed: int,
  verbose: bool = False,
  ) -> RunResult:
+
  locations_visited = set()
  history = []
  moves = 0
+
+ MOVE_CMDS = {"north","south","east","west","up","down","enter","exit","n","s","e","w","u","d"}
+
+ # Available tools
+ tools = await client.list_tools()
+ tool_names = [t.name for t in tools]
+
+ # Initial observation
+ result = await client.call_tool("play_action", {"action": "look"})
+ observation = self._extract_result(result)
+ self.last_observation = observation
+
+ location = observation.split("\n")[0] if observation else "Unknown"
+ locations_visited.add(location)
+ self.visit_counts[location] = self.visit_counts.get(location, 0) + 1
+
+ # Prime context (no moves)
+ if "memory" in tool_names:
+ self.last_memory = self._extract_result(await client.call_tool("memory", {}))
+ self._update_score(self.last_memory)
+
+ if "inventory" in tool_names:
+ self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))
+
+ if verbose:
+ print(f"\n{observation}")
+
+ for step in range(1, max_steps + 1):
+ await self._refresh_context_tools(client, tool_names, step, verbose)
+
+ prompt = self._build_prompt()
+ response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
+ thought, tool_name, tool_args = self._parse_response(response, tool_names)
+
+ if verbose:
+ print(f"\n--- Step {step} ---")
+ print(f"[THOUGHT] {thought}")
+ print(f"[TOOL] {tool_name}({tool_args})")
+
+ tool_name, tool_args = self._validate_tool_call(tool_name, tool_args, tool_names)
+
+ # ------------------------------------------------------------
+ # Block SAME (location + action + thought)
+ # ------------------------------------------------------------
+ if tool_name == "play_action":
+ current_loc = (
+ self.last_observation.split("\n")[0].strip()
+ if self.last_observation else "Unknown"
+ )
+ action_norm = tool_args.get("action", "look").strip().lower()
+ t_sig = self._thought_sig(thought)
+
+ triple = (current_loc, action_norm, t_sig)
+ self.loc_action_thought_counts[triple] = (
+ self.loc_action_thought_counts.get(triple, 0) + 1
+ )
+
+ if self.loc_action_thought_counts[triple] >= 2:
+ if verbose:
+ print(f"[ANTI-REPEAT] Blocking repeated thought+action at '{current_loc}'")
+ if "get_map" in tool_names:
+ tool_name, tool_args = "get_map", {}
+ elif "memory" in tool_names:
+ tool_name, tool_args = "memory", {}
+ else:
+ tool_name, tool_args = "play_action", {"action": "look"}
+
+ # ------------------------------------------------------------
+ # Loop detection (same action spam)
+ # ------------------------------------------------------------
+ if tool_name == "play_action":
+ action = tool_args.get("action", "look")
+ self.recent_actions.append(action)
+ if len(self.recent_actions) > 5:
+ self.recent_actions = self.recent_actions[-5:]
+
+ if len(self.recent_actions) >= 3 and len(set(self.recent_actions[-3:])) == 1:
+ if verbose:
+ print("[WARNING] Loop detected - forcing 'look'")
+ tool_args = {"action": "look"}
+
+ # ------------------------------------------------------------
+ # Anti-backtracking: block only FAILED moves
+ # ------------------------------------------------------------
+ self.pending_move = None
+
+ if tool_name == "play_action":
+ action_norm = tool_args.get("action", "look").strip().lower()
+
+ if action_norm in MOVE_CMDS:
+ current_loc = (
+ self.last_observation.split("\n")[0].strip()
+ if self.last_observation else "Unknown"
+ )
+ key = (current_loc, action_norm)
+
+ if self.loc_move_failures.get(key, 0) >= 2:
+ if verbose:
+ print(f"[GUARD] Blocking failed move '{action_norm}' from '{current_loc}'")
+ if "get_map" in tool_names:
+ tool_name, tool_args = "get_map", {}
+ elif "memory" in tool_names:
+ tool_name, tool_args = "memory", {}
+ else:
+ tool_name, tool_args = "play_action", {"action": "look"}
+ else:
+ self.pending_move = (current_loc, action_norm)
+
+ # ------------------------------------------------------------
+ # Count moves
+ # ------------------------------------------------------------
+ if tool_name == "play_action":
+ moves += 1
+
+ # ------------------------------------------------------------
+ # Execute tool
+ # ------------------------------------------------------------
+ try:
+ result = await client.call_tool(tool_name, tool_args)
+ out_text = self._extract_result(result)
+
+ if tool_name == "play_action":
+ observation = out_text
+ self.last_observation = observation
+ elif tool_name == "memory":
+ self.last_memory = out_text
+ elif tool_name == "get_map":
+ self.last_map = out_text
+ elif tool_name == "inventory":
+ self.last_inventory = out_text
+
+ if verbose:
+ print(f"[RESULT] {out_text[:200]}...")
+
+ except Exception as e:
+ out_text = f"Error: {e}"
+ observation = out_text
+ self.last_observation = observation
+ if verbose:
+ print(f"[ERROR] {e}")
+
+ # ------------------------------------------------------------
+ # Post-move update
+ # ------------------------------------------------------------
+ if tool_name == "play_action":
+ new_location = observation.split("\n")[0] if observation else "Unknown"
+
+ if self.pending_move is not None:
+ prev_loc, prev_action = self.pending_move
+ key = (prev_loc, prev_action)
+
+ if new_location == prev_loc:
+ self.loc_move_failures[key] = self.loc_move_failures.get(key, 0) + 1
+ else:
+ self.loc_move_failures[key] = 0
+
+ self.pending_move = None
+
+ location = new_location
+ locations_visited.add(location)
+ self.visit_counts[location] = self.visit_counts.get(location, 0) + 1
+
+ self._update_score(observation)
+
+ if re.search(r"\bTaken\b|\byou are now carrying\b", observation, re.IGNORECASE):
+ if "inventory" in tool_names:
+ self.last_inventory = self._extract_result(
+ await client.call_tool("inventory", {})
+ )
+
+ # ------------------------------------------------------------
+ # History
+ # ------------------------------------------------------------
+ self.history.append({
+ "step": step,
+ "thought": thought,
+ "tool": tool_name,
+ "args": tool_args,
+ "result": out_text[:200]
+ })
+ if len(self.history) > 10:
+ self.history = self.history[-10:]
+
+ history.append((thought, f"{tool_name}({tool_args})", out_text[:100]))
+
+ if self._is_game_over(observation):
+ if verbose:
+ print("\n*** GAME OVER ***")
+ break
+
  return RunResult(
+ final_score=self.score,
+ max_score=350,
  moves=moves,
  locations_visited=locations_visited,
+ game_completed=self._is_game_over(self.last_observation),
  history=history,
  )
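The two guards inside `run()` reduce to simple bookkeeping: three identical actions in a row trip loop detection, and a (location, direction) pair is blocked after two failed move attempts. A pure-function sketch (names here are illustrative, not the agent's exact API):

```python
# Loop detection: the last three actions are identical.
def loop_detected(recent_actions):
    tail = recent_actions[-3:]
    return len(tail) == 3 and len(set(tail)) == 1

# Failed-move bookkeeping, mirroring loc_move_failures in run():
# a success resets the counter, a failure increments it.
failures = {}

def record_move(loc, direction, moved):
    key = (loc, direction)
    failures[key] = 0 if moved else failures.get(key, 0) + 1

def move_blocked(loc, direction):
    return failures.get((loc, direction), 0) >= 2

# Two failed attempts to go north from the Cellar block that move.
record_move("Cellar", "north", moved=False)
record_move("Cellar", "north", moved=False)
```

Resetting the counter on success matters: a direction that failed because of a temporary obstacle (a closed door, darkness) becomes usable again once a later attempt works.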
+
+
+ async def _refresh_context_tools(self, client, tool_names: list[str], step: int, verbose: bool) -> None:
  """
+ Pull structured context from MCP server without spending moves.
+ Tuned to your server outputs:
+ - memory() is the best single summary
+ - get_map() helps navigation
+ - inventory() helps object planning
  """
+ # Memory: often (every 4 steps) so LLM doesn't forget state
+ if "memory" in tool_names and (step == 1 or step % 4 == 0):
+ try:
+ self.last_memory = self._extract_result(await client.call_tool("memory", {}))
+ self._update_score(self.last_memory)
+ except Exception:
+ pass
+
+ # Map: occasionally (every 6 steps)
+ if "get_map" in tool_names and (step % 6 == 0):
+ try:
+ self.last_map = self._extract_result(await client.call_tool("get_map", {}))
+ except Exception:
+ pass
+
+ # Inventory: occasionally (every 10 steps)
+ if "inventory" in tool_names and (step == 1 or step % 10 == 0):
+ try:
+ self.last_inventory = self._extract_result(await client.call_tool("inventory", {}))
+ except Exception:
+ pass
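The refresh cadence above can be stated as a pure function: step 1 primes memory and inventory, then the tools fire every 4th, 6th, and 10th step respectively. This makes the schedule easy to check in isolation:

```python
# Which context tools fire on a given step, matching the cadence in
# _refresh_context_tools(): memory on step 1 and every 4th step,
# get_map every 6th step, inventory on step 1 and every 10th step.
def context_tools_for(step: int) -> list[str]:
    tools = []
    if step == 1 or step % 4 == 0:
        tools.append("memory")
    if step % 6 == 0:
        tools.append("get_map")
    if step == 1 or step % 10 == 0:
        tools.append("inventory")
    return tools
```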
+
+ def _build_prompt(self) -> str:
  """
+ Build prompt that is aligned with your MCP server:
+ - memory() has STATE/RECENT/OBSERVATION
+ - get_map() starts with MAP
+ - inventory() starts with INVENTORY
  """
+ parts = []
+ parts.append(f"Current best-known score: {self.score}")
+
+ # Give the model your server-side memory snapshot (truncate to keep prompt lean)
+ if self.last_memory:
+ mem = self._truncate(self.last_memory, 1200)
+ parts.append("\n=== MEMORY (from MCP server) ===\n" + mem)
+
+ if self.last_inventory:
+ inv = self._truncate(self.last_inventory, 400)
+ parts.append("\n=== INVENTORY (from MCP server) ===\n" + inv)
+
+ if self.last_map:
+ mp = self._truncate(self.last_map, 700)
+ parts.append("\n=== MAP (from MCP server) ===\n" + mp)
+
+ # Recent local history (anti-loop)
+ if self.history:
+ parts.append("\n=== RECENT LOCAL ACTIONS (agent) ===")
+ for entry in self.history[-3:]:
+ action = entry.get("args", {}).get("action", entry["tool"])
+ result_short = entry["result"][:100] + "..." if len(entry["result"]) > 100 else entry["result"]
+ parts.append(f" > {action} -> {result_short}")
+
+ if self.recent_actions and len(set(self.recent_actions[-3:])) == 1:
+ parts.append(f"\n[WARNING: repeated '{self.recent_actions[-1]}'. Choose a different action.]")
+
+ # Always include the most recent raw observation
+ parts.append("\n=== LATEST OBSERVATION (play_action) ===\n" + self._truncate(self.last_observation, 900))
+ parts.append("\nWhat do you do next?")
+
+ return "\n".join(parts)
+
+ def _truncate(self, text: str, limit: int) -> str:
+ text = text or ""
+ if len(text) <= limit:
+ return text
+ return text[:limit] + "\n...[truncated]"
+
+ def _parse_response(self, response: str, valid_tools: list[str]) -> tuple[str, str, dict]:
+ thought = "No reasoning provided"
+ tool_name = "play_action"
+ tool_args = {"action": "look"}
+
+ lines = response.strip().split("\n")
+ for line in lines:
+ line_clean = line.strip()
+ line_upper = line_clean.upper()
+
+ if line_upper.startswith("THOUGHT:"):
+ thought = line_clean.split(":", 1)[1].strip()
+
+ elif line_upper.startswith("TOOL:"):
+ raw_tool = line_clean.split(":", 1)[1].strip().lower()
+ raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
+ raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
+ tool_name = raw_tool
+
+ elif line_upper.startswith("ARGS:"):
+ args_part = line_clean.split(":", 1)[1].strip()
+ if not args_part:
+ tool_args = {}
+ continue
+ try:
+ args_part = args_part.replace("'", '"')
+ tool_args = json.loads(args_part)
+ except json.JSONDecodeError:
+ match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
+ if match:
+ tool_args = {"action": match.group(1)}
+ else:
+ tool_args = {"action": "look"}
+
+ return thought, tool_name, tool_args
+
+ def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
+
+ if tool_name not in valid_tools:
+ if tool_name in ["action", "do", "command"]:
+ tool_name = "play_action"
+ elif tool_name in ["map", "location"]:
+ tool_name = "get_map"
+ elif tool_name in ["mem", "state", "status"]:
+ tool_name = "memory"
+ elif tool_name in ["inv", "items"]:
+ tool_name = "inventory"
+ else:
+ tool_name = "play_action"
+
+ if tool_name == "play_action":
+ action = tool_args.get("action", "look")
+
+ invalid_verb_map = {
+ "check": "examine",
+ "inspect": "examine",
+ "search": "look",
+ "grab": "take",
+ "pick": "take",
+ "use": "examine",
+ "investigate": "examine",
+ }
+
+ words = action.lower().split()
+ if words and words[0] in invalid_verb_map:
+ words[0] = invalid_verb_map[words[0]]
+ action = " ".join(words)
+
+ action = action.lower().strip()
+ action = action.replace("**", "").replace("*", "").replace("`", "")
+ action = " ".join(action.split())
+
+ tool_args["action"] = action
+
+ return tool_name, tool_args
+
+ def _extract_result(self, result) -> str:
+ if hasattr(result, 'content') and result.content:
+ return result.content[0].text
+ if isinstance(result, list) and result:
+ return result[0].text if hasattr(result[0], 'text') else str(result[0])
+ return str(result)
+
+ def _update_score(self, text: str) -> None:
+ patterns = [
+ r'\[Score:\s*(\d+)',
+ r'Score:\s*(\d+)\b',
+ ]
+ for pattern in patterns:
+ match = re.search(pattern, text, re.IGNORECASE)
+ if match:
+ self.score = max(self.score, int(match.group(1)))
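Taking the max over every match is what lets the agent update the score from both `play_action` output and `memory` output: a stale snapshot can never lower the tracked score. A standalone version of the same scrape:

```python
import re

# Scrape a score from game text, keeping the best value seen so far.
# Mirrors _update_score(): matches "[Score: N" and "Score: N" forms.
def best_score(text: str, current: int = 0) -> int:
    for pattern in (r"\[Score:\s*(\d+)", r"Score:\s*(\d+)\b"):
        m = re.search(pattern, text or "", re.IGNORECASE)
        if m:
            current = max(current, int(m.group(1)))
    return current
```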
+
+ def _is_game_over(self, text: str) -> bool:
+ game_over_phrases = [
+ "game over",
+ "you have died",
+ "you are dead",
+ "*** you have died ***",
+ ]
+ text_lower = (text or "").lower()
+ return any(phrase in text_lower for phrase in game_over_phrases)


  # =============================================================================
+ # Local Testing
  # =============================================================================

  async def test_agent():
  from fastmcp import Client
+
  agent = StudentAgent()
+
+ async with Client("mcp_server.py") as client:
  result = await agent.run(
  client=client,
  game="zork1",
+ max_steps=20,
  seed=42,
  verbose=True,
  )
+
+ print(f"\n{'=' * 50}")
+ print(f"Final Score: {result.final_score}")
  print(f"Moves: {result.moves}")
+ print(f"Locations: {len(result.locations_visited)}")


  if __name__ == "__main__":
mcp_server.py CHANGED
@@ -45,53 +45,338 @@ mcp = FastMCP("Student Text Adventure Server")
  # Game State Management
  # =============================================================================

  class GameManager:
  """
  Manages the text adventure game state.
-
- TODO: Extend this class to track:
  - Action history (for memory tool)
  - Explored locations (for mapping)
  - Current score and moves
  """
-
  def __init__(self):
  self.env: TextAdventureEnv = None
  self.state = None
  self.game_name: str = ""
- # TODO: Add more state tracking
- # self.history: list[tuple[str, str]] = []
- # self.explored_locations: dict[str, set[str]] = {}
- # self.current_location: str = ""
-
  def initialize(self, game: str = "zork1"):
  """Initialize or reset the game."""
  self.game_name = game
  self.env = TextAdventureEnv(game)
  self.state = self.env.reset()
- # TODO: Reset your state tracking here
  return self.state.observation
-
  def step(self, action: str) -> str:
  """Execute an action and return the result."""
  if self.env is None:
  self.initialize()
-
  self.state = self.env.step(action)
-
- # TODO: Update your state tracking here
- # self.history.append((action, self.state.observation))
- # Update location tracking, etc.
-
- return self.state.observation
-
  def get_score(self) -> int:
  """Get current score."""
  return self.state.score if self.state else 0
-
  def get_moves(self) -> int:
  """Get number of moves taken."""
  return self.state.moves if self.state else 0

  # Global game manager
@@ -136,11 +421,37 @@ def play_action(action: str) -> str:
  # TODO: You might want to include score changes in the response

  result = game.step(action)

  # Optional: Append score info
  # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"

- return result

  # TODO: Implement additional tools to help your agent

  # Game State Management
  # =============================================================================

+ import re
+ from typing import Optional
+
  class GameManager:
  """
  Manages the text adventure game state.
+
+ Extended tracking:
  - Action history (for memory tool)
  - Explored locations (for mapping)
  - Current score and moves
+ - Current location (best-effort, robust across games)
  """
+
+ # Lines that are often NOT room titles across many IF games
+ _HEADER_LIKE_PATTERNS = [
+ r"^\s*score\s*[:=]\s*\d+",
+ r"^\s*moves?\s*[:=]\s*\d+",
+ r"^\s*turns?\s*[:=]\s*\d+",
+ r"^\s*time\s*[:=]\s*",
+ r"^\s*health\s*[:=]\s*\d+",
+ r"^\s*location\s*[:=]\s*",
+ r"^\s*\[.*\]\s*$", # bracket-only status lines
+ r"^\s*\(.*\)\s*$", # parenthetical-only lines
+ r"^\s*you\s+(are|see|can)\b", # narrative sentence starters
+ ]
+ # Movement commands we consider for mapping (Zork-style + abbreviations)
+ _MOVE_CMDS = {
+ "north", "south", "east", "west", "up", "down", "enter", "exit",
+ "n", "s", "e", "w", "u", "d"
+ }
+
+ # Common failure phrases when trying to move (best-effort, not perfect)
+ _MOVE_FAIL_PHRASES = [
+ "you can't go", "you cannot go", "can't go that way", "cannot go that way",
+ "you can't go that way", "you cannot go that way",
+ "you can't", "you cannot",
+ "there is no way", "you can't see any way", "you see no way",
+ "blocked", "closed", "won't open", "is locked", "locked",
+ "too dark", "pitch black"
+ ]
+
+ def _is_movement_action(self, action: str) -> bool:
+ """Return True if this action is a movement command we track."""
+ a = (action or "").strip().lower()
+ return a in self._MOVE_CMDS
+
+ def _move_likely_succeeded(self, old_loc: str, new_loc: str, observation: str) -> bool:
+ """
+ Decide whether a move likely succeeded.
+ Strong signal: location label changed.
+ Negative signal: failure phrases in observation.
+ """
+ if new_loc and old_loc and new_loc != old_loc:
+ return True
+
+ text = (observation or "").lower()
+ if any(phrase in text for phrase in self._MOVE_FAIL_PHRASES):
+ return False
+
+ # If location didn't change and no clear failure phrase, treat as "not sure" -> don't add edge
+ return False
+
+ def _update_map(self, action: str, old_loc: str, new_loc: str) -> None:
+ """Record a directed edge old_loc --action--> new_loc in explored_locations."""
+ if not old_loc or not new_loc:
+ return
+ self.explored_locations.setdefault(old_loc, set()).add(f"{action} -> {new_loc}")
+
+
  def __init__(self):
  self.env: TextAdventureEnv = None
  self.state = None
  self.game_name: str = ""
+
+ # Tracking for agent-support tools
+ self.history: list[tuple[str, str]] = []
+ self.explored_locations: dict[str, set[str]] = {}
+ self.current_location: str = "Unknown"
+
  def initialize(self, game: str = "zork1"):
  """Initialize or reset the game."""
  self.game_name = game
  self.env = TextAdventureEnv(game)
  self.state = self.env.reset()
+
+ # Reset tracking
+ self.history = []
+ self.explored_locations = {}
+ self.current_location = self._extract_location(self.state.observation, fallback="Unknown")
+
  return self.state.observation
+
+ def _extract_location(self, observation: str, fallback: Optional[str] = None) -> str:
+ """
+ Best-effort location extraction from the observation text.
+
+ Strategy:
+ 1) Split into lines, skip empties
+ 2) Skip lines that look like status bars / headers / pure brackets
+ 3) Prefer a short, title-like line (room name)
+ 4) If nothing confident, return fallback (usually previous location)
+ """
+ if not observation:
+ return fallback or "Unknown"
+
+ lines = [ln.strip() for ln in observation.splitlines() if ln.strip()]
+ if not lines:
+ return fallback or "Unknown"
+
+ header_res = [re.compile(pat, re.IGNORECASE) for pat in self._HEADER_LIKE_PATTERNS]
+
+ def looks_like_header(line: str) -> bool:
+ return any(rx.search(line) for rx in header_res)
+
+ def looks_like_title(line: str) -> bool:
+ # Many room titles are short and not ending with punctuation.
+ if len(line) > 60:
+ return False
+ if line.endswith((".", "!", "?", ";", ":")):
+ return False
+ # Too many digits usually means a status line.
+ if sum(ch.isdigit() for ch in line) >= 3:
+ return False
+ return True
+
+ # First pass: first "title-like" line that isn't header-like
+ for line in lines[:8]: # only inspect top chunk; titles are usually early
+ if looks_like_header(line):
+ continue
+ if looks_like_title(line):
+ return line
+
+ # Second pass: first non-header line
+ for line in lines[:8]:
+ if not looks_like_header(line):
+ return line
+
+ return fallback or "Unknown"
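Applied to a typical Zork-style observation, the heuristic behaves as follows. This is a trimmed-down standalone sketch (it keeps the status-bar skip and the short-title preference but drops the digit check and second pass for brevity):

```python
import re

# Status-bar-looking lines that are not room titles (subset of the
# server's _HEADER_LIKE_PATTERNS).
HEADER_PATTERNS = [
    r"^\s*score\s*[:=]\s*\d+",
    r"^\s*moves?\s*[:=]\s*\d+",
    r"^\s*\[.*\]\s*$",
]

def extract_location(observation: str, fallback: str = "Unknown") -> str:
    lines = [ln.strip() for ln in (observation or "").splitlines() if ln.strip()]
    for line in lines[:8]:
        if any(re.search(p, line, re.IGNORECASE) for p in HEADER_PATTERNS):
            continue
        # Short, punctuation-free line -> likely the room title.
        if len(line) <= 60 and not line.endswith((".", "!", "?", ";", ":")):
            return line
    return fallback

loc = extract_location("[Score: 0 | Moves: 1]\nWest of House\nYou are standing in an open field.")
```

The bracketed status line is skipped, "West of House" is accepted as a title, and the narrative sentence (which ends with a period) would never be mistaken for one.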
187
+
188
     def step(self, action: str) -> str:
         """Execute an action and return the result."""
         if self.env is None:
             self.initialize()
+
+        # Save the old location before the action
+        old_location = self.current_location
+
+        # Apply the action to the real game
         self.state = self.env.step(action)
+        obs = self.state.observation
+
+        # Track history (keep the last 50 entries)
+        self.history.append((action, obs))
+        if len(self.history) > 50:
+            self.history = self.history[-50:]
+
+        # Extract the new location (falling back to the old one)
+        new_location = self._extract_location(obs, fallback=old_location)
+
+        # Update the map only if this was a movement attempt AND it likely succeeded
+        action_norm = (action or "").strip().lower()
+        if self._is_movement_action(action_norm) and self._move_likely_succeeded(old_location, new_location, obs):
+            self._update_map(action_norm, old_location, new_location)
+
+        # Finally, update the current location
+        self.current_location = new_location
+
+        return obs
+
     def get_score(self) -> int:
         """Get current score."""
         return self.state.score if self.state else 0
+
     def get_moves(self) -> int:
         """Get number of moves taken."""
         return self.state.moves if self.state else 0
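The `step` hunk above relies on `_is_movement_action` and `_move_likely_succeeded`, which are defined elsewhere in the file. A plausible standalone sketch of that guarded-movement check (the direction set and failure phrases are assumptions, not the repo's exact code):

```python
# Hypothetical direction vocabulary and failure phrases for illustration only
DIRECTIONS = {"north", "south", "east", "west", "up", "down", "in", "out",
              "n", "s", "e", "w", "u", "d"}
FAILURE_PHRASES = ("you can't go that way", "there is a wall", "too dark")

def is_movement_action(action: str) -> bool:
    # Treat bare directions and "go <dir>" commands as movement attempts
    return action in DIRECTIONS or action.startswith("go ")

def move_likely_succeeded(old_loc: str, new_loc: str, obs: str) -> bool:
    lower = obs.lower()
    # An explicit refusal means the move failed even if parsing went wrong
    if any(p in lower for p in FAILURE_PHRASES):
        return False
    # A changed location title is the strongest success signal
    return new_loc != old_loc

print(is_movement_action("north"))                                            # True
print(move_likely_succeeded("Kitchen", "Kitchen", "You can't go that way."))  # False
print(move_likely_succeeded("Kitchen", "Hall", "Hall\nA wide hall."))         # True
```

Gating the map update on both checks keeps failed moves (walls, locked doors) from corrupting the explored-locations graph.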
+
+    def _extract_facts(self, observation: str) -> dict:
+        """
+        Best-effort extraction of useful 'facts' from the current observation text.
+        This is intentionally heuristic so it can work across many games.
+        """
+        obs = observation or ""
+        text = obs.strip()
+        lower = text.lower()
+
+        # --- Exits mentioned (simple direction scan) ---
+        directions = ["north", "south", "east", "west", "up", "down", "in", "out"]
+        exits_found = []
+        for d in directions:
+            # Match directions as whole words to reduce false positives
+            if re.search(rf"\b{re.escape(d)}\b", lower):
+                exits_found.append(d)
+        exits_found = sorted(set(exits_found))
+
+        # --- Visible things (very light heuristics) ---
+        # Look for common IF patterns like "You see ... here." / "There is ... here."
+        visible_candidates: list[str] = []
+
+        patterns = [
+            r"you see (.+?) here\.",
+            r"you can see (.+?) here\.",
+            r"there is (.+?) here\.",
+            r"there are (.+?) here\.",
+            r"you notice (.+?)\.",
+        ]
+        for pat in patterns:
+            for m in re.finditer(pat, lower):
+                chunk = m.group(1).strip()
+                if chunk:
+                    visible_candidates.append(chunk)
+
+        # Clean the candidates a bit (split simple lists, avoid huge strings)
+        visible = []
+        for chunk in visible_candidates:
+            # Split on commas and "and" to get smaller pieces
+            parts = re.split(r",|\band\b", chunk)
+            for p in parts:
+                item = p.strip(" .;:!?\t")
+                if 1 <= len(item) <= 40:
+                    visible.append(item)
+
+        # Deduplicate and limit (so memory stays compact)
+        visible = sorted(set(visible))[:10]
+
+        return {
+            "exits_mentioned": exits_found,
+            "visible": visible,
+        }
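The visible-items heuristic can be tried in isolation; here is a minimal standalone sketch of the same pattern-scan-then-split approach (a reduced pattern list and a made-up observation, for illustration):

```python
import re

# A subset of the "There is ... here." style patterns used above
PATTERNS = [
    r"you see (.+?) here\.",
    r"there is (.+?) here\.",
    r"there are (.+?) here\.",
]

def extract_visible(observation: str) -> list[str]:
    lower = observation.lower()
    found: list[str] = []
    for pat in PATTERNS:
        for m in re.finditer(pat, lower):
            # Split simple lists on commas and "and", then trim punctuation
            for part in re.split(r",|\band\b", m.group(1)):
                item = part.strip(" .;:!?\t")
                if 1 <= len(item) <= 40:
                    found.append(item)
    return sorted(set(found))

print(extract_visible("There is a brass lantern and a rope here."))
# ['a brass lantern', 'a rope']
```

The non-greedy groups keep a single sentence from swallowing the rest of the observation, and the length bounds drop fragments that are clearly not object names.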
+
+    def get_memory(self) -> str:
+        """
+        LLM-friendly summary of the current game state.
+        Format: facts first, then recent actions, then the raw observation.
+        """
+        game = self.game_name or "Unknown"
+        location = self.current_location or "Unknown"
+        score = self.get_score()
+        moves = self.get_moves()
+
+        # Recent actions (kept short, to discourage loops)
+        recent = self.history[-5:] if self.history else []
+        if recent:
+            recent_lines = []
+            for a, r in recent:
+                snippet = (r or "").replace("\n", " ").strip()
+                if len(snippet) > 80:
+                    snippet = snippet[:80] + "..."
+                recent_lines.append(f"- {a} -> {snippet}")
+            recent_str = "\n".join(recent_lines)
+        else:
+            recent_str = "(none yet)"
+
+        # Facts extracted from the current observation
+        obs = self.state.observation if self.state else ""
+        facts = self._extract_facts(obs)
+
+        exits_txt = ", ".join(facts["exits_mentioned"]) if facts["exits_mentioned"] else "(none detected)"
+        visible_txt = ", ".join(facts["visible"]) if facts["visible"] else "(none detected)"
+
+        return (
+            "STATE\n"
+            f"Game: {game}\n"
+            f"Location: {location}\n"
+            f"Score: {score}  Moves: {moves}\n"
+            f"Visible (best effort): {visible_txt}\n"
+            f"Exits mentioned (best effort): {exits_txt}\n"
+            "\n"
+            "RECENT\n"
+            f"{recent_str}\n"
+            "\n"
+            "OBSERVATION\n"
+            f"{obs}"
+        )
+
+    def get_map(self) -> str:
+        """
+        Return a readable map of explored locations.
+        Uses explored_locations built during movement actions.
+
+        The output is stable and compact for LLM use.
+        """
+        if not self.explored_locations:
+            return "MAP\n(no locations recorded yet — try moving with north/south/east/west/etc.)"
+
+        lines = ["MAP", "Explored locations and exits:"]
+        for loc in sorted(self.explored_locations.keys()):
+            exits = sorted(self.explored_locations[loc])
+            lines.append(f"\n* {loc}")
+            for e in exits:
+                lines.append(f"  - {e}")
+
+        lines.append(f"\n[Current] {self.current_location}")
+        return "\n".join(lines)
+
+    def get_inventory(self) -> str:
+        """
+        Return inventory in a way that is robust across different games/envs.
+
+        Strategy:
+        1) If state.inventory exists and is non-empty -> format it
+        2) Otherwise, fall back to issuing the command "inventory"
+           through the environment and return that observation
+        """
+        # 1) Try the structured inventory if the env provides one
+        items = []
+        if self.state is not None and hasattr(self.state, "inventory"):
+            inv = getattr(self.state, "inventory")
+            if inv:
+                # Normalize to strings
+                try:
+                    items = [str(x).strip() for x in inv if str(x).strip()]
+                except Exception:
+                    items = []
+
+        if items:
+            # Keep it simple and safe: just join a cleaned list
+            # (avoid overly aggressive parsing that breaks across games)
+            items = sorted(set(items))
+            return "INVENTORY\n" + ", ".join(items)
+
+        # 2) Fallback: ask the game directly (prints the inventory without changing it)
+        # NOTE: this is a server-side query; it is not recorded in agent history or on the map.
+        if self.env is None:
+            self.initialize()
+
+        try:
+            tmp_state = self.env.step("inventory")
+            inv_text = tmp_state.observation if tmp_state else "Inventory: (no response)"
+        except Exception:
+            inv_text = "Inventory: (unable to retrieve)"
+
+        return "INVENTORY\n" + inv_text.strip()
 
 
 # Global game manager
 
     # TODO: You might want to include score changes in the response
 
     result = game.step(action)
+
+    # Append score/moves for clearer feedback (LLM-friendly, low noise)
+    result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
+    return result
 
     # Optional: Append score info
     # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
 
+
+@mcp.tool()
+def memory() -> str:
+    """
+    Return an LLM-friendly summary of the current game state.
+    """
+    game = get_game()
+    return game.get_memory()
+
+
+@mcp.tool()
+def get_map() -> str:
+    """
+    Return a map of explored locations and recorded exits.
+    """
+    game = get_game()
+    return game.get_map()
+
+
+@mcp.tool()
+def inventory() -> str:
+    """
+    Return the player's inventory in a robust way.
+    """
+    game = get_game()
+    return game.get_inventory()
 
 
 # TODO: Implement additional tools to help your agent
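One natural candidate for an additional guardrail, in the spirit of the anti-loop note in `get_memory`, is a repeated-action check on the agent side. A minimal sketch (the class name, window size, and thresholds are hypothetical, not part of this commit):

```python
from collections import deque

class LoopGuard:
    """Reject an action that was already tried twice recently in the same location."""

    def __init__(self, window: int = 4):
        # Only the last `window` (action, location) pairs are remembered
        self.recent = deque(maxlen=window)

    def allow(self, action: str, location: str) -> bool:
        key = (action.strip().lower(), location)
        if self.recent.count(key) >= 2:
            # Seen twice already in the window: likely a loop, block it
            return False
        self.recent.append(key)
        return True

guard = LoopGuard()
print(guard.allow("north", "Forest"))  # True  (first try)
print(guard.allow("north", "Forest"))  # True  (second try)
print(guard.allow("north", "Forest"))  # False (looping)
```

Because the deque is bounded, an action blocked now becomes available again once newer actions push it out of the window, so the guard throttles loops without permanently banning commands.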