Chloé Court committed on
Commit 9c0dbe0 · 1 Parent(s): 7a36b3c

Submission

Files changed (5):
  1. README.md +61 -7
  2. agent.py +597 -210
  3. mcp_server.py +192 -119
  4. requirements.txt +15 -7
  5. utils.py +42 -0
README.md CHANGED
@@ -10,19 +10,72 @@ pinned: false
  license: mit
  ---

- # Text Adventure Agent Submission

  ## Overview

- This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.

- ## Approach

- <!-- Describe your approach here -->

- - What strategy does your agent use?
- - What tools did you implement in your MCP server?
- - Any interesting techniques or optimizations?

  ## Files

@@ -30,6 +83,7 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
  |------|-------------|
  | `agent.py` | ReAct agent with `StudentAgent` class |
  | `mcp_server.py` | MCP server with game interaction tools |
  | `app.py` | Gradio interface for HF Space |
  | `requirements.txt` | Additional dependencies |

  license: mit
  ---

+ # Autonomous Text Adventure Agent

  ## Overview
+ This project implements an autonomous text adventure agent designed to master parser-based interactive fiction (e.g., *Zork*). Unlike simple scripted bots, this agent uses a **ReAct-style reasoning loop** paired with an **MCP (Model Context Protocol) server** to manage structured memory and strategic planning.

+ ### Primary Objectives
+ * **Systematic Exploration:** Map and traverse complex game worlds.
+ * **Logic Puzzle Solving:** Interact with objects to unlock progression.
+ * **Loop Prevention:** Identify and break repetitive cycles or stagnant states.
+ * **State Consistency:** Maintain an accurate, persistent mental model of the world.
+ * **Efficiency:** Maximize the game score while minimizing unnecessary moves.

+ ---
+
+ ## Core Architecture
+ The agent operates on a four-stage decision loop that ensures every action is grounded in observation and strategic intent.
+
+ 1. **Observation Input:** Raw text from the game engine is parsed.
+ 2. **Planner & Memory Update:** The LLM updates the cumulative world state.
+ 3. **Tool Selection:** Reasoning logic picks the best tool/action based on policy constraints.
+ 4. **Environment Interaction:** The command is executed via the MCP interface.
+
+ ---
+
+ ## Structured Memory System
+ The agent treats each location as an independent world substate. Memory is **incremental**: it evolves with the agent's discoveries rather than being wiped.
+
+ ### Location Memory Schema
+ For every discovered room, the agent tracks:
+ * **Objects:** Visible and interactable items.
+ * **Action History:** Commands already attempted and their results.
+ * **Topology:** Explored vs. unexplored directions.
+ * **Context:** Cumulative summaries and strategic hints.
+
+ > **Key Principle:** Preserve previously known facts unless an observation explicitly contradicts them (e.g., "The door is now open").
+
+ ---
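The schema above can be sketched as a plain Python record; this is an illustrative sketch, and the helper names `new_location_record` and `merge_observation` are ours, not the submission's:

```python
# Illustrative sketch of the per-room memory record described above.
# Helper names are hypothetical; they mirror the schema, not the actual code.

def new_location_record(description: str) -> dict:
    """Create an empty memory record for a newly discovered room."""
    return {
        "objects_seen": set(),         # visible, interactable items
        "actions_done": set(),         # commands already attempted here
        "directions_explored": set(),  # exits already taken
        "memory": description,         # cumulative textual summary
    }

def merge_observation(record: dict, observation: str, contradicted: set) -> dict:
    """Apply the key principle: keep known facts, drop only contradicted ones."""
    record["objects_seen"] -= contradicted  # e.g. an item that was just taken
    record["memory"] = observation          # planner rewrites the summary
    return record
```

Incremental updates then reduce to calling `merge_observation` with whatever the latest observation invalidated, leaving every other remembered fact intact.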


+ ## Anti-Loop & Stagnation Policy
+ To prevent getting "stuck," the agent follows strict rules:
+ * **No Oscillation:** Non-gameplay tools (memory, map, inventory queries) cannot be used more than twice in a row.
+ * **Action Blacklisting:** Actions that have already been attempted are logged and avoided until the environment state changes.
+ * **Stagnation Escape:** If progress halts, the agent is forced to switch interaction verbs or backtrack to the least recently visited area.
+
+ ---
+
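The oscillation rule can be enforced with a small guard over the recent tool calls; a minimal sketch (the helper name `must_force_play_action` is ours):

```python
# Sketch of the no-oscillation guard: if the last two tool calls were both
# non-gameplay tools, the next step is forced back to play_action.
NON_GAMEPLAY_TOOLS = {"memory", "get_map", "inventory", "get_valid_actions"}

def must_force_play_action(recent_tools: list, limit: int = 2) -> bool:
    """Return True when the last `limit` calls all avoided play_action."""
    if len(recent_tools) < limit:
        return False
    return all(t in NON_GAMEPLAY_TOOLS for t in recent_tools[-limit:])
```

A guard like this can either hard-override the LLM's choice or, more gently, inject a hint into the next prompt telling it to pick `play_action`.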
+ ## MCP Tool Interface
+ The agent interacts with the game through a standardized toolset:
+
+ * `play_action`: Executes commands (e.g., "north", "take lamp").
+ * `memory`: Retrieves the structured world state.
+ * `inventory`: Lists currently held items.
+ * `get_map`: Visualizes explored connections for navigation.
+ * `get_valid_actions`: Filters plausible commands to reduce hallucinations.
+
+ ---
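Assuming a FastMCP-style client (as the repo's test harness uses), a single tool round-trip might look like this sketch:

```python
# Hedged sketch of one tool round-trip against the toolset above, assuming a
# FastMCP-style client that exposes `await client.call_tool(name, args)` and
# returns a list of content parts, each carrying a `.text` attribute.

async def one_step(client, command: str) -> str:
    """Send a single game command and return the textual observation."""
    result = await client.call_tool("play_action", {"action": command})
    return result[0].text if result else "No response"
```

The same call shape works for the other tools (`memory`, `inventory`, `get_map`, `get_valid_actions`), most of which take an empty argument dict.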
+
+ ## Performance Metrics
+ Progress is measured by an **Efficiency Ratio**:
+ $$\text{Efficiency} = \frac{\text{Score}}{\max(1, \text{Moves})}$$
+
+ The agent also tracks unique object discoveries and the total percentage of the map explored.
+
+ ---
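In code the ratio is a one-liner; the `max(1, …)` guard in the denominator avoids division by zero on a zero-move run (the function name is ours):

```python
def efficiency_ratio(score: int, moves: int) -> float:
    """Score per move; max(1, moves) guards the zero-move case."""
    return score / max(1, moves)
```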

  ## Files

  |------|-------------|
  | `agent.py` | ReAct agent with `StudentAgent` class |
  | `mcp_server.py` | MCP server with game interaction tools |
+ | `utils.py` | Useful shared functions |
  | `app.py` | Gradio interface for HF Space |
  | `requirements.txt` | Additional dependencies |

agent.py CHANGED
@@ -24,256 +24,643 @@ Tips:
  """

  import json
- import os
  import re
  from dataclasses import dataclass, field
  from typing import Optional

- from dotenv import load_dotenv
- from huggingface_hub import InferenceClient
-
- # Load environment variables
- load_dotenv()

  # =============================================================================
- # LLM Configuration - DO NOT MODIFY
  # =============================================================================

- # Model to use (fixed for fair evaluation)
- LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
-
- # Initialize the LLM client (uses HF_TOKEN from environment)
- _hf_token = os.getenv("HF_TOKEN")
- if not _hf_token:
-     raise ValueError("HF_TOKEN not found. Set it in your .env file.")
-
- LLM_CLIENT = InferenceClient(token=_hf_token)
-
-
- def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
-     """
-     Call the LLM with the given prompt. Use this function in your agent.
-
-     Args:
-         prompt: The user prompt (current game state, history, etc.)
-         system_prompt: The system prompt (instructions for the agent)
-         seed: Random seed for reproducibility
-         max_tokens: Maximum tokens in response (default: 300)
-
-     Returns:
-         The LLM's response text
-
-     Example:
-         response = call_llm(
-             prompt="You are in a forest. What do you do?",
-             system_prompt=SYSTEM_PROMPT,
-             seed=42,
-         )
-     """
-     messages = [
-         {"role": "system", "content": system_prompt},
-         {"role": "user", "content": prompt},
-     ]
-
-     response = LLM_CLIENT.chat.completions.create(
-         model=LLM_MODEL,
-         messages=messages,
-         temperature=0.0,  # Deterministic for reproducibility
-         max_tokens=max_tokens,
-         seed=seed,
-     )
-
-     return response.choices[0].message.content
-

  @dataclass
  class RunResult:
-     """Result of running the agent. Do not modify this class."""
      final_score: int
      max_score: int
      moves: int
      locations_visited: set[str]
      game_completed: bool
      error: Optional[str] = None
-     history: list[tuple[str, str, str]] = field(default_factory=list)


  # =============================================================================
- # System Prompt - Customize this for your agent
  # =============================================================================

- SYSTEM_PROMPT = """You are playing a classic text adventure game.
-
- GOAL: Explore the world, solve puzzles, and maximize your score.
-
- AVAILABLE TOOLS (use via MCP):
- - play_action: Execute a game command (north, take lamp, open mailbox, etc.)
- - memory: Get current game state and history (if implemented)
- - inventory: Check what you're carrying (if implemented)
-
- VALID GAME COMMANDS for play_action:
- - Movement: north, south, east, west, up, down, enter, exit
- - Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
- - Other: look, inventory, read <thing>, turn on lamp
-
- RESPOND IN THIS EXACT FORMAT (no markdown):
- THOUGHT: <your reasoning about what to do next>
  TOOL: <tool_name>
- ARGS: <JSON arguments, e.g., {"action": "look"}>
-
- Example:
- THOUGHT: I should look around to see where I am.
- TOOL: play_action
- ARGS: {"action": "look"}
  """

-
  # =============================================================================
- # Student Agent - IMPLEMENT THIS CLASS
  # =============================================================================

  class StudentAgent:
-     """
-     Your ReAct agent implementation.
-
-     TODO:
-     1. Implement the run() method with the ReAct loop
-     2. Parse LLM responses to extract tool calls
-     3. Track state and avoid loops
-
-     Use the provided call_llm() function to interact with the LLM.
-     """
-
      def __init__(self):
-         """Initialize your agent here."""
-         # TODO: Initialize any state tracking you need
-         # self.history = []
-         # self.visited_locations = set()
-         pass
-
-     async def run(
-         self,
-         client,  # FastMCP Client connected to your MCP server
-         game: str,
-         max_steps: int,
-         seed: int,
-         verbose: bool = False,
-     ) -> RunResult:
-         """
-         Run the agent for a game session.
-
-         Args:
-             client: FastMCP Client connected to your MCP server
-             game: Name of the game being played (e.g., "zork1")
-             max_steps: Maximum number of steps to take
-             seed: Random seed for reproducibility (use for LLM calls)
-             verbose: Whether to print detailed output
-
-         Returns:
-             RunResult with final score and statistics
-         """
-         # TODO: Implement your ReAct loop here
-         #
-         # Basic structure:
-         # 1. Get initial observation (call play_action with "look")
-         # 2. Loop for max_steps:
-         #    a. Build prompt with current observation and history
-         #    b. Call LLM to get thought and action
-         #    c. Parse the response to extract tool and args
-         #    d. Call the tool via client.call_tool(tool_name, args)
-         #    e. Update history and state
-         #    f. Check for game over
-         # 3. Return RunResult with final statistics
-
-         # Example of calling a tool:
-         # result = await client.call_tool("play_action", {"action": "look"})
-         # observation = result[0].text if result else "No response"
-
-         # Example of calling the LLM:
-         # response = call_llm(
-         #     prompt="Current observation: " + observation,
-         #     system_prompt=SYSTEM_PROMPT,
-         #     seed=seed,
-         # )
-
-         # Placeholder implementation - replace with your code
-         locations_visited = set()
          history = []
-         final_score = 0
-         moves = 0
-
-         # TODO: Your implementation here
-         # ...
-
          return RunResult(
-             final_score=final_score,
-             max_score=350,  # Zork1 max score, adjust if needed
              moves=moves,
-             locations_visited=locations_visited,
-             game_completed=False,
              history=history,
          )
-
-     def _build_prompt(self, observation: str, history: list) -> str:
-         """
-         Build the prompt for the LLM.
-
-         TODO: Implement this to create effective prompts
-         """
-         # TODO: Combine system prompt, history, and current observation
-         pass
-
-     def _parse_response(self, response: str) -> tuple[str, str, dict]:
-         """
-         Parse LLM response to extract thought, tool name, and arguments.
-
-         TODO: Implement robust parsing
-
-         Returns:
-             Tuple of (thought, tool_name, args_dict)
-         """
-         # TODO: Parse the response format:
-         # THOUGHT: ...
-         # TOOL: ...
-         # ARGS: {...}
-         pass
-
-     def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
          """
-         Call the LLM with the given prompt.
-
-         This is a convenience wrapper - you can also use call_llm() directly.
          """
-         return call_llm(prompt, system_prompt, seed)


- # =============================================================================
- # For local testing
- # =============================================================================

- async def test_agent():
-     """Test the agent locally."""
-     from fastmcp import Client
-
-     # Path to your MCP server
-     server_path = "mcp_server.py"
-
-     agent = StudentAgent()

-     async with Client(server_path) as client:
-         result = await agent.run(
-             client=client,
-             game="zork1",
-             max_steps=10,
-             seed=42,
-             verbose=True,
-         )

-     print(f"\nFinal Score: {result.final_score}")
-     print(f"Moves: {result.moves}")
-     print(f"Locations: {result.locations_visited}")
-

- if __name__ == "__main__":
-     import asyncio
-     asyncio.run(test_agent())

24
  """
25
 
26
  import json
 
27
  import re
28
  from dataclasses import dataclass, field
29
  from typing import Optional
30
 
31
+ from utils import call_llm, extract_location, is_new_location
 
 
 
 
32
 
33
  # =============================================================================
34
+ # LLM Configuration
35
  # =============================================================================
36
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
 
38
  @dataclass
39
  class RunResult:
 
40
  final_score: int
41
  max_score: int
42
  moves: int
43
  locations_visited: set[str]
44
  game_completed: bool
45
+ unique_objects: int = 0
46
+ puzzles_solved: int = 0
47
+ efficiency: float = 0.0
48
  error: Optional[str] = None
49
+ history: list[dict] = field(default_factory=list)
50
 
51
 
52
  # =============================================================================
53
+ # System Prompt
54
  # =============================================================================
55
 
56
+ SYSTEM_PROMPT = """
57
+ You are an expert text adventure game player. Your objective is to explore efficiently, collect treasures, solve puzzles, and maximize your score.
58
+ **Random movement is forbidden.** Always plan actions using context and memory.
59
+
60
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
61
+ AVAILABLE TOOLS (exactly ONE per step)
62
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
63
+ 1. memory - Check current state, items, objects, locations, and past actions.
64
+ 2. play_action - Execute a game command.
65
+ 3. get_map - Return to a previously visited location or get a map of explored areas.
66
+ 4. inventory - Check current inventory.
67
+ 5. get_valid_actions - Get likely valid actions from the current location.
68
+
69
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
70
+ TOOL PRIORITY RULE
71
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
72
+ Choose tool in this order:
73
+ 1. If local puzzle interaction is possible → play_action
74
+ 2. If interactable object is visible → play_action
75
+ 3. If inventory contains potentially useful item → inventory
76
+ 4. If location understanding is uncertain → memory
77
+ 5. If planning navigation to solve puzzle → get_map
78
+ 6. Exploration of world → play_action movement
79
+
80
+ **CRITICAL:**
81
+ - Do NOT use any tool other than play_action more than 2 times in a row.
82
+ - **DO NOT repeat an action that has already been attempted in the current location, unless the state clearly changed and it is necessary.**
83
+
84
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━���━━━
85
+ VALID GAME COMMANDS for play_action
86
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
87
+ Movement:
88
+ north, south, east, west, up, down, enter, exit
89
+ Objects:
90
+ take <item>, drop <item>, open <thing>, close <thing>, examine <thing>,
91
+ push <thing>, pull <thing>, move <thing>, lift <thing>, turn <thing>, press <thing>
92
+ Light:
93
+ turn on lamp, turn off lamp
94
+ Combat:
95
+ attack <enemy> with <weapon>
96
+ Other:
97
+ inventory, look, read <thing>, wait
98
+ Forbidden:
99
+ check, inspect, search, grab, use, help
100
+
101
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
102
+ STRATEGIC RULES
103
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
104
+ • **Avoid repeating actions:**
105
+ - **NEVER** repeat an action that has already been attempted in the current location.
106
+ - If an action failed or produced no progress, **do not try it again** in the same context.
107
+ - Track failed actions per location to avoid loops.
108
+
109
+ • Before leaving a location:
110
+ - Collect all useful items.
111
+ - Interact with all interesting objects (push/pull/move/lift/open) if "examine" yields nothing.
112
+ - Solve local puzzles before moving away.
113
+ - Check if there are valid actions related to visible objects or inventory items that haven't been tried yet.
114
+
115
+ • **Systematic exploration > random movement.**
116
+ • Avoid overusing "examine": if it yields nothing, try physical interactions (push/pull/move/lift/open/turn/press).
117
+ • If the previous observation indicates a failed action, **avoid that action and similar ones** in the future.
118
+
119
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
120
+ ANTI-REPETITION RULE (CRITICAL)
121
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
122
+ **STRICT POLICY:**
123
+ 1. **Track all attempted actions per location** in memory.
124
+ 2. **Never repeat an action** that has already been tried in the current location.
125
+ 3. If an action fails (e.g., "The door is locked"), **do not attempt it again** unless new context suggests it might now work (e.g., you found a key).
126
+ 4. If no progress is made after 3 actions, **change strategy** (e.g., try a different object or direction).
127
+
128
+ **Example:**
129
+ - If "open door" fails, **do not try it again** unless you acquire a key or new information.
130
+ - If "examine table" yields "nothing special," **try physical interactions** (push/pull/move) instead of repeating "examine."
131
+
132
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
133
+ INTERACTION STRATEGY
134
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
135
+ When you see an object:
136
+ 1. If it is a container → try **open** (only once).
137
+ 2. If large/fixed → try **move**, **push**, **pull**, or **lift** (only once each).
138
+ 3. If "examine" gives no useful info → try **one** physical interaction (e.g., turn/press).
139
+ 4. If enterable → try **enter** (only once).
140
+ 5. **Never repeat the same interaction** on the same object in the same location.
141
+
142
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
143
+ EXPLORATION RULE
144
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
145
+ - If no immediate objectives:
146
+ - Explore **unexplored directions systematically**.
147
+ - Prefer directions **not previously taken** from this location.
148
+ - **Do not wander randomly**: Always have a reason for movement (e.g., "The path east was not explored yet").
149
+ - Use **get_map** only to return to a location with unsolved puzzles or uncollected items.
150
+
151
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
152
+ RESPONSE FORMAT (STRICT — NO MARKDOWN)
153
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
154
+ THOUGHT: <brief reasoning referencing memory, map, or inventory if applicable>
155
  TOOL: <tool_name>
156
+ ARGS: <JSON arguments>
 
 
 
 
 
157
  """
158
 
 
159
  # =============================================================================
160
+ # StudentAgent
161
  # =============================================================================
162
 
163
  class StudentAgent:
 
 
 
 
 
 
 
 
 
 
 
164
  def __init__(self):
165
+ self.history = []
166
+ self.current_location = None
167
+ self.score = 0
168
+ self.recent_actions = []
169
+ self.last_tool = None
170
+ # structured memory
171
+ self.locations = {}
172
+
173
+ # =======================================
174
+ # Run
175
+ # =======================================
176
+ async def run(self, client, game: str, max_steps: int, seed: int, verbose: bool = False):
177
+
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
178
  history = []
179
+
180
+ tools = await client.list_tools()
181
+ tool_names = [t.name for t in tools]
182
+
183
+ # ---------------------------------
184
+ # Initial observation
185
+ # ---------------------------------
186
+ tool_name, tool_args = "play_action", {"action": "look"}
187
+ self.last_tool = tool_name
188
+
189
+ result = await client.call_tool(tool_name, tool_args)
190
+ observation = self._extract_result(result)
191
+
192
+ # Detect starting location
193
+ self.current_location = extract_location(observation)
194
+
195
+ # Initialize location memory
196
+ self.locations[self.current_location] = {
197
+ "objects_seen": set(),
198
+ "actions_done": set(),
199
+ "directions_explored": set(),
200
+ "promising_hints": set(),
201
+ "memory": observation,
202
+ "observations_seen": set(),
203
+ "valid_actions": set()
204
+ }
205
+
206
+ self.locations[self.current_location]["observations_seen"].add(observation)
207
+
208
+ # Fetch valid actions
209
+ valid_actions = await client.call_tool("get_valid_actions", {})
210
+ parsed = self._extract_result(valid_actions)
211
+
212
+ self.locations[self.current_location]["valid_actions"] = set(
213
+ a.strip() for a in parsed.split(",") if a.strip()
214
+ )
215
+
216
+ if verbose:
217
+ print(observation)
218
+
219
+ # =====================================
220
+ # MAIN LOOP
221
+ # =====================================
222
+ for step in range(1, max_steps + 1):
223
+ # -------------------------
224
+ # Location detection
225
+ # -------------------------
226
+ try:
227
+ if is_new_location(observation, set(self.locations.keys()), self.last_tool):
228
+
229
+ new_location = extract_location(observation)
230
+
231
+ self.locations[self.current_location]["directions_explored"].add(
232
+ ("look", new_location)
233
+ )
234
+
235
+ self.current_location = new_location
236
+
237
+ if new_location not in self.locations.keys():
238
+ self.locations[new_location] = {
239
+ "objects_seen": set(),
240
+ "actions_done": set(),
241
+ "directions_explored": set(),
242
+ "promising_hints": set(),
243
+ "memory": observation,
244
+ "observations_seen": set(),
245
+ "valid_actions": set(),
246
+ }
247
+
248
+ # Fetch valid actions on entering location
249
+ try:
250
+ valid_actions = await client.call_tool(
251
+ "get_valid_actions",
252
+ {}
253
+ )
254
+
255
+ parsed = self._extract_result(valid_actions)
256
+
257
+ self.locations[self.current_location]["valid_actions"] = set(
258
+ a.strip() for a in parsed.split(",") if a.strip()
259
+ )
260
+
261
+ except Exception:
262
+ pass
263
+
264
+ except Exception:
265
+ pass
266
+
267
+ # Prevent tool oscillation
268
+ if len(self.history) >= 2:
269
+ actions = ["memory", "get_map", "inventory"]
270
+ # avoid using one of the non-play_action tools more than 2 times in a row
271
+ if any(self.last_tool == a for a in actions):
272
+ # Force exploration action instead of map query
273
+ self.forced_prompt_hint = "\nYou should choose play_action to explore instead of using the same tool again."
274
+ else:
275
+ self.forced_prompt_hint = ""
276
+
277
+ # -------------------------
278
+ # LLM decision step (pre-call for memory, objects, actions)
279
+ # -------------------------
280
+ if self.last_tool == "play_action":
281
+ planner_data = await self._call_planner_llm(observation)
282
+ print(f"\n[PLANNER LLM RESPONSE]\n{planner_data}\n")
283
+ print(f"[VALID ACTIONS]\n{self.locations[self.current_location]['valid_actions']}\n")
284
+
285
+ # Update memory with LLM-generated data
286
+ self.locations[self.current_location]["memory"] = planner_data["memory"]
287
+
288
+ actions = set(planner_data["promising_hints"])
289
+ actions -= self.locations[self.current_location]["actions_done"]
290
+ self.locations[self.current_location]["promising_hints"] = list(actions)
291
+
292
+ objects_seen_before = self.locations[self.current_location]["objects_seen"]
293
+ self.locations[self.current_location]["objects_seen"].update(planner_data["objects_seen"])
294
+
295
+ if objects_seen_before != self.locations[self.current_location]["objects_seen"]:
296
+ # Update valid actions
297
+ valid_actions = await client.call_tool("get_valid_actions", {})
298
+ parsed = self._extract_result(valid_actions)
299
+
300
+ self.locations[self.current_location]["valid_actions"] = set(
301
+ a.strip() for a in parsed.split(",") if a.strip()
302
+ )
303
+
304
+ # -------------------------
305
+ # Build prompt for tool selection (without calling LLM again)
306
+ # -------------------------
307
+ prompt = self._build_prompt(observation)
308
+
309
+ # Call LLM ONLY for tool selection (not for memory/objects/actions)
310
+ response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
311
+ thought, tool_name, tool_args = self._parse_response(response)
312
+
313
+ tool_name, tool_args = self._validate_tool_call(
314
+ tool_name,
315
+ tool_args,
316
+ tool_names
317
+ )
318
+ self.last_tool = tool_name
319
+
320
+ if tool_name == "play_action":
321
+ self.locations[self.current_location]["actions_done"].add(tool_args.get("action", "look"))
322
+
323
+ if verbose:
324
+ print(f"\nStep {step}")
325
+ print(f"Location: {self.current_location}")
326
+ print(f"Thought: {thought}")
327
+ print(f"Tool: {tool_name}")
328
+ print(f"Args: {tool_args}")
329
+
330
+ # -------------------------
331
+ # Tool execution
332
+ # -------------------------
333
+ try:
334
+ result = await client.call_tool(tool_name, tool_args)
335
+ observation = self._extract_result(result)
336
+ self.locations[self.current_location]["observations_seen"].add(observation)
337
+
338
+ except Exception as e:
339
+ observation = str(e)
340
+
341
+ # -------------------------
342
+ # Score tracking
343
+ # -------------------------
344
+ self._update_score(observation)
345
+
346
+ self.history.append({
347
+ "step": step,
348
+ "thought": thought,
349
+ "tool": tool_name,
350
+ "args": tool_args,
351
+ "result": observation
352
+ })
353
+
354
+ history.append((thought, f"{tool_name}({tool_args})", observation))
355
+
356
+ if len(self.history) > 10:
357
+ self.history = self.history[-10:]
358
+
359
+ if verbose:
360
+ print(f"[RESULT] {observation}")
361
+ print(f"[SCORE] {self.score}")
362
+
363
+ if self._is_game_over(observation):
364
+ break
365
+
366
+ # =====================================
367
+ # Final result
368
+ # =====================================
369
+ moves = len(self.history)
370
+ efficiency = self.score / max(1, moves)
371
+
372
  return RunResult(
373
+ final_score=self.score,
374
+ max_score=350,
375
  moves=moves,
376
+ locations_visited=self.locations,
377
+ game_completed=self._is_game_over(observation),
378
+ efficiency=efficiency,
379
  history=history,
380
  )
381
+
382
+ async def _call_planner_llm(self, observation: str) -> dict:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
383
  """
384
+ Call the LLM to:
385
+ 1. Update the location memory.
386
+ 2. Extract interactable objects from the observation.
387
+ 3. Generate promising actions grounded in the observation.
388
  """
389
+ current_data = self.locations.get(self.current_location, {})
390
+
391
+ prompt = """
392
+ You are an expert text adventure agent. Your **only** goal is to maximize progress by:
393
+ - Solving puzzles (e.g., "use <object> on <thing>").
394
+ - Collecting useful items (e.g., "take <object>").
395
+ - Exploring new areas (e.g., "enter").
396
+ - Avoiding redundant or vague actions.
397
+
398
+ ---
399
+
400
+ ### CURRENT CONTEXT
401
+ **Location:**
402
+ {location}
403
+
404
+ **Current Observation:**
405
+ {observation}
406
+
407
+ **Current Memory of this Location:**
408
+ {memory}
409
+
410
+ ---
411
+
412
+ ### STRICT INSTRUCTIONS
413
+ Your task is to:
414
+ 1. **Update the memory**.
415
+ 2. **Extract interactable objects** (only explicitly mentioned in the observation).
416
+ 3. **Generate ≤5 promising actions** (strictly grounded in the observation + valid actions).
417
+
418
+ ---
419
+
420
+ #### 1. LOCATION MEMORY UPDATE
421
+
422
+ You are maintaining a cumulative memory of this location.
423
+
424
+ Goal:
425
+ Update the existing location description by merging it with the new observation,
426
+ while ensuring that the final description reflects the CURRENT STATE of the location.
427
+
428
+ Rules:
429
+
430
+ 1. Preserve all previously known environmental facts unless explicitly contradicted.
431
+ 2. Add any new information from the new observation.
432
+ 3. Remove facts that are clearly invalidated by the new observation.
433
+ 4. If an object is taken, it is no longer present in the location.
434
+ 5. If an object is dropped, it becomes present in the location.
435
+ 6. If an object changes state (opened, closed, locked, unlocked, broken, etc.), replace the old state with the new one.
436
+ 7. Only the CURRENT state of each object should appear in the final description.
437
+ 8. Do not keep outdated state history (e.g., do not keep both "closed" and "opened").
438
+ 9. Do NOT rewrite stylistically.
439
+ 10. Do not duplicate information.
440
+        11. Keep it concise while preserving all relevant environmental details.
+
+        The final description must represent the current true state of the location,
+        not a history of past states.
+
+        #### 2. OBJECTS SEEN
+        List **only** objects that are:
+        - Explicitly mentioned in the observation.
+        - Clearly interactable (e.g., "a shiny key on the table" → "key"; "a path" → not an object).
+        - Potentially required for puzzle-solving.
+        Keep only the name of the object, without adjectives or extra description.
+
+        #### 3. PROMISING HINTS
+        - Suggest **strategic hints** (not direct actions) that are strictly supported by the current observation and the valid actions for this location.
+        - Do not suggest actions already done in this location: {actions_done}.
+        - Do not suggest actions that do not seem possible (e.g., "take key" if the key is not mentioned in the observation, or "open locked door").
+        - Hints must be directly supported by the current observation.
+        - Each hint should be a concise suggestion of what to try next, grounded in the current context (e.g., "The door is open, maybe you can enter it" → "try entering the door").
+        - Use the following action verbs if applicable: take, open, close, push, pull, move, lift, turn, press, enter, ... with the relevant object.
+
+        - Focus on:
+          * Potential puzzle solutions
+          * Object interactions
+          * Hidden opportunities
+        - Forbidden:
+          * Vague hints ("There might be something interesting")
+          * Repeats of already done actions
+          * Random movement without reason
+
+        - Movement rules:
+          - Do NOT suggest movement if there are still meaningful interactions available in the current location.
+          - If all useful local interactions have been exhausted, suggest exploring an unexplored direction.
+          - Prefer unexplored directions over previously visited ones.
+
+        ### OUTPUT FORMAT (STRICT JSON, no markdown or explanations):
+        {{
+            "memory": "<updated_memory>",
+            "promising_hints": ["<hint1>", "<hint2>"],
+            "objects_seen": ["<object1>", "<object2>"]
+        }}
+        """.format(
+            observation=observation,
+            location=self.current_location,
+            memory=current_data.get("memory", ""),
+            actions_done=list(current_data.get("actions_done", set())),
+        )
+
+        response = call_llm(prompt=prompt, seed=42)
+
+        try:
+            data = json.loads(response)
+            json_data = {
+                "memory": data.get("memory", ""),
+                "promising_hints": data.get("promising_hints", []),
+                "objects_seen": data.get("objects_seen", [])
+            }
+
+            # Remove promising hints that have already been tried
+            done_actions = self.locations[self.current_location].get("actions_done", set())
+            json_data["promising_hints"] = list(
+                set(json_data["promising_hints"]) - set(done_actions)
+            )
+            return json_data
+
+        except json.JSONDecodeError:
+            # Fall back to the existing memory rather than wiping it
+            return {
+                "memory": current_data.get("memory", ""),
+                "promising_hints": [],
+                "objects_seen": []
+            }
+
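The `json.JSONDecodeError` fallback above discards the reply entirely, yet models frequently return otherwise valid JSON wrapped in a markdown code fence despite the "no markdown" instruction. A small pre-cleaning helper (a sketch, not part of the submission; the function name is made up) can rescue those replies before falling back:

```python
import json
import re

def parse_llm_json(response: str):
    """Parse an LLM reply as JSON, tolerating a surrounding markdown code fence."""
    text = response.strip()
    # Strip a ```json ... ``` (or plain ```) wrapper if present
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

A fenced reply then parses to the same dict as the bare JSON, and anything unparseable still returns `None` so the caller's fallback logic is unchanged.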
+    def _build_prompt(self, observation: str) -> str:
+        """Build the prompt for the LLM, using pre-filled memory/objects/actions."""
+        current_location_data = self.locations.get(self.current_location, {})
+
+        prompt = f"""
+        OBSERVATION:
+        {observation}
+
+        LOCATION:
+        {self.current_location}
+
+        LOCATION MEMORY:
+        {current_location_data.get("memory", "None")}
+
+        OBJECTS_SEEN:
+        {list(current_location_data.get("objects_seen", set()))}
+
+        PROMISING_HINTS:
+        {", ".join(current_location_data.get("promising_hints", []))}
+
+        VALID_ACTIONS:
+        {list(current_location_data.get("valid_actions", set()))}
+
+        ACTIONS ALREADY DONE IN THIS LOCATION:
+        {list(current_location_data.get("actions_done", set()))}
+        AVOID REPEATING THESE ACTIONS.
+
+        HINT:
+        {getattr(self, "forced_prompt_hint", "")}
+        """
+        return prompt
+    def _parse_response(self, response: str) -> tuple[str, str, dict]:
+        thought = "No reasoning provided"
+        tool_name = "play_action"
+        tool_args = {"action": "look"}
+
+        lines = response.strip().split("\n")
+        for line in lines:
+            line_clean = line.strip()
+            line_upper = line_clean.upper()
+            if line_upper.startswith("THOUGHT:"):
+                thought = line_clean.split(":", 1)[1].strip()
+            elif line_upper.startswith("TOOL:"):
+                raw_tool = line_clean.split(":", 1)[1].strip().lower()
+                raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
+                raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
+                tool_name = raw_tool
+            elif line_upper.startswith("ARGS:"):
+                args_part = line_clean.split(":", 1)[1].strip()
+                try:
+                    args_part = args_part.replace("'", '"')
+                    tool_args = json.loads(args_part)
+                except json.JSONDecodeError:
+                    match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
+                    if match:
+                        tool_args = {"action": match.group(1)}
+                    else:
+                        tool_args = {"action": "look"}
+        return thought, tool_name, tool_args
+
+    def _validate_tool_call(self, tool_name: str, tool_args: dict, valid_tools: list[str]) -> tuple[str, dict]:
+        """Robust tool call validator."""
+
+        # Ensure tool_args is a dictionary (the LLM can hallucinate other types)
+        if not isinstance(tool_args, dict):
+            tool_args = {}
+
+        # Normalize tool name
+        tool_name = str(tool_name).lower().strip()
+
+        tool_alias_map = {
+            "action": "play_action",
+            "do": "play_action",
+            "command": "play_action",
+            "map": "get_map",
+            "location": "get_map",
+            "mem": "memory",
+            "state": "memory",
+            "status": "memory",
+            "inv": "inventory",
+            "items": "inventory",
+        }
+
+        if tool_name in tool_alias_map:
+            tool_name = tool_alias_map[tool_name]
+
+        if tool_name not in valid_tools:
+            tool_name = "play_action"
+
+        # Fix play_action argument schema
+        if tool_name == "play_action":
+            action = tool_args.get("action")
+
+            if not isinstance(action, str) or not action:
+                action = "look"
+
+            action = action.lower()
+
+            # Normalize verb aliases
+            invalid_verb_map = {
+                "check": "examine",
+                "inspect": "examine",
+                "search": "look",
+                "grab": "take",
+                "pick": "take",
+                "use": "examine",
+                "investigate": "examine",
+            }
+
+            words = action.split()
+            if words and words[0] in invalid_verb_map:
+                words[0] = invalid_verb_map[words[0]]
+                action = " ".join(words)
+
+            # Remove markdown artifacts
+            action = action.replace("**", "").replace("*", "").replace("`", "")
+
+            # Normalize whitespace
+            action = " ".join(action.strip().split())
+
+            tool_args = {"action": action}
+
+        else:
+            # Non-action tools should have empty args
+            tool_args = {}
+
+        return tool_name, tool_args
+
+    def _extract_result(self, result) -> str:
+        if hasattr(result, 'content') and result.content:
+            return result.content[0].text
+        if isinstance(result, list) and result:
+            return result[0].text if hasattr(result[0], 'text') else str(result[0])
+        return str(result)
+
+    def _update_score(self, text: str) -> None:
+        patterns = [r'Score:\s*(\d+)', r'score[:\s]+(\d+)', r'\[Score:\s*(\d+)']
+        for pattern in patterns:
+            match = re.search(pattern, text, re.IGNORECASE)
+            if match:
+                self.score = max(self.score, int(match.group(1)))
+
+    def _is_game_over(self, text: str) -> bool:
+        phrases = ["game over", "you have died", "you are dead", "*** you have died ***"]
+        text_lower = text.lower()
+        return any(p in text_lower for p in phrases)
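The score-tracking patterns in `_update_score` above can be exercised in isolation; a minimal standalone sketch with hypothetical observation strings (the helper name is made up, the regexes are the same three used by the agent):

```python
import re

# The same three patterns _update_score tries against each observation
SCORE_PATTERNS = [r"Score:\s*(\d+)", r"score[:\s]+(\d+)", r"\[Score:\s*(\d+)"]

def best_score(text: str, current: int = 0) -> int:
    """Return the highest score mentioned in text, never lower than current."""
    best = current
    for pattern in SCORE_PATTERNS:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            best = max(best, int(match.group(1)))
    return best

print(best_score("You enter the attic.\n[Score: 45, Moves: 102]"))  # → 45
```

Taking the max against the current score means a stale or partial match can never regress the tracked score, which matters because Zork-style status lines vary in format.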
 
 
 
 
mcp_server.py CHANGED
@@ -24,77 +24,121 @@ Test your server with:
 Then open the MCP Inspector in your browser to test the tools interactively.
 """

 import sys
 import os

-# Add parent directory to path to import games module
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

 from fastmcp import FastMCP
 from games.zork_env import TextAdventureEnv

-
-# =============================================================================
-# Create the MCP Server
-# =============================================================================

 mcp = FastMCP("Student Text Adventure Server")


-# =============================================================================
-# Game State Management
-# =============================================================================

 class GameManager:
-    """
-    Manages the text adventure game state.
-
-    TODO: Extend this class to track:
-    - Action history (for memory tool)
-    - Explored locations (for mapping)
-    - Current score and moves
-    """
-
     def __init__(self):
-        self.env: TextAdventureEnv = None
         self.state = None
-        self.game_name: str = ""
-        # TODO: Add more state tracking
-        # self.history: list[tuple[str, str]] = []
-        # self.explored_locations: dict[str, set[str]] = {}
-        # self.current_location: str = ""
-
-    def initialize(self, game: str = "zork1"):
-        """Initialize or reset the game."""
-        self.game_name = game
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
-        # TODO: Reset your state tracking here
-        return self.state.observation
-
-    def step(self, action: str) -> str:
-        """Execute an action and return the result."""
-        if self.env is None:
-            self.initialize()
-
         self.state = self.env.step(action)
-
-        # TODO: Update your state tracking here
-        # self.history.append((action, self.state.observation))
-        # Update location tracking, etc.
-
-        return self.state.observation
-
-    def get_score(self) -> int:
-        """Get current score."""
         return self.state.score if self.state else 0
-
-    def get_moves(self) -> int:
-        """Get number of moves taken."""
         return self.state.moves if self.state else 0


-# Global game manager
 _game = GameManager()


@@ -107,10 +151,9 @@ def get_game() -> GameManager:
     _game.initialize(game)
     return _game

-
-# =============================================================================
-# MCP Tools - IMPLEMENT THESE
-# =============================================================================

 @mcp.tool()
 def play_action(action: str) -> str:
@@ -133,77 +176,107 @@ def play_action(action: str) -> str:
     game = get_game()

     # TODO: You might want to add action validation here
-    # TODO: You might want to include score changes in the response

     result = game.step(action)
-
-    # Optional: Append score info
-    # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
-
     return result


-# TODO: Implement additional tools to help your agent
-
-# @mcp.tool()
-# def memory() -> str:
-#     """
-#     Get the current game state summary.
-#
-#     Returns:
-#         A summary including current location, score, moves, and recent history
-#     """
-#     game = get_game()
-#     # TODO: Return useful state information
-#     pass
-
-
-# @mcp.tool()
-# def inventory() -> str:
-#     """
-#     Check what the player is carrying.
-#
-#     Returns:
-#         List of items in the player's inventory
-#     """
-#     game = get_game()
-#     result = game.step("inventory")
-#     return result
-
-
-# @mcp.tool()
-# def get_map() -> str:
-#     """
-#     Get a map of explored locations.
-#
-#     Returns:
-#         A text representation of explored locations and connections
-#     """
-#     game = get_game()
-#     # TODO: Return map of explored locations
-#     pass
-
-
-# @mcp.tool()
-# def get_valid_actions() -> str:
-#     """
-#     Get a list of likely valid actions from the current location.
-#
-#     Returns:
-#         List of actions that might work here
-#     """
-#     # This is a hint: Jericho provides get_valid_actions()
-#     game = get_game()
-#     if game.env and game.env.env:
-#         valid = game.env.env.get_valid_actions()
-#         return "Valid actions: " + ", ".join(valid[:20])
-#     return "Could not determine valid actions"


-# =============================================================================
-# Run the server
-# =============================================================================

 if __name__ == "__main__":
-    # This runs the server with stdio transport (for MCP clients)
-    mcp.run()
 Then open the MCP Inspector in your browser to test the tools interactively.
 """
+
 import sys
 import os
+import re
+from utils import is_new_location, extract_location

 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

 from fastmcp import FastMCP
 from games.zork_env import TextAdventureEnv

+# =========================================================
+# Server Initialization
+# =========================================================

 mcp = FastMCP("Student Text Adventure Server")

+# =========================================================
+# Game State Manager
+# =========================================================

 class GameManager:
+
     def __init__(self):
+        self.env: TextAdventureEnv | None = None
         self.state = None
+
+        self.history = []
+        self.locations = {}
+        self.current_location = ""
+
+        self.inventory = set()
+
+    # -----------------------------------------------------
+
+    def initialize(self, game="zork1"):
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
+
+        self.history.clear()
+        self.locations.clear()
+
+        # Initial observation
+        self.state = self.env.step("look")
+        obs = self.state.observation
+
+        self.current_location = extract_location(obs)
+
+        self.locations[self.current_location] = {
+            "objects": set(),
+            "actions": set(),
+            "directions": set(),
+            "observations": set(),
+            "summary": ""
+        }
+
+        self.inventory = set()
+
+        return obs
+
+    # -----------------------------------------------------
+
+    def step(self, action: str):
+        if not self.env:
+            return "Game not initialized."
+
         self.state = self.env.step(action)
+        obs = self.state.observation
+        action_lower = action.lower()
+
+        # Location detection
+        if is_new_location(obs, set(self.locations.keys()), "play_action") and action != "inventory":
+            previous_location = self.current_location
+            self.current_location = extract_location(obs)
+
+            self.locations[previous_location]["directions"].add(
+                (action_lower, self.current_location)
+            )
+
+            self.locations[self.current_location] = {
+                "objects": set(),
+                "actions": set(),
+                "directions": set(),
+                "observations": set(),
+                "summary": ""
+            }
+
+        # Track action history (server level only)
+        self.history.append((action, obs))
+        if len(self.history) > 20:
+            self.history = self.history[-20:]
+
+        return obs
+
+    # -----------------------------------------------------
+
+    def get_score(self):
         return self.state.score if self.state else 0
+
+    def get_moves(self):
         return self.state.moves if self.state else 0

+# =========================================================
+# Global Game Instance
+# =========================================================
+
 _game = GameManager()

     _game.initialize(game)
     return _game

+# =========================================================
+# Tools (Execution Only)
+# =========================================================

 @mcp.tool()
 def play_action(action: str) -> str:

     game = get_game()

     # TODO: You might want to add action validation here
+    # Execute the action
     result = game.step(action)
+
+    # Append score info so the agent sees progress with every observation
+    return f"{result}\n\n[Score: {game.get_score()}, Moves: {game.get_moves()}]"
+
+# ---------------------------------------------------------
+
+@mcp.tool()
+def memory(query: str = "") -> str:
+    """
+    State viewer only. No LLM inference.
+    """
+    game = get_game()
+
+    if not game.state:
+        return "Game not initialized."
+
+    loc = game.current_location
+
+    return f"""
+STATE
+Location: {loc}
+Score: {game.get_score()}
+Moves: {game.get_moves()}
+
+RECENT HISTORY
+{game.history[-10:]}
+""".strip()
+
+# ---------------------------------------------------------
+
+@mcp.tool()
+def get_map() -> str:
+    """
+    Exploration graph dump.
+    """
+    game = get_game()
+
+    if not game.locations:
+        return "No map discovered."
+
+    text = "EXPLORED MAP\n"
+    for loc, data in game.locations.items():
+        text += f"\n[{loc}]\n"
+        for direction, dest in data.get("directions", set()):
+            text += f"  {direction} -> {dest}\n"
+
+    return text.strip()
+
+# ---------------------------------------------------------
+
+@mcp.tool()
+def inventory() -> str:
+    """
+    Inventory viewer using the game command.
+    """
+    game = get_game()
+
+    if not game.env:
+        return "Game not initialized."
+
+    try:
+        state = game.env.step("inventory")
+        return state.observation
+    except Exception:
+        return "Unable to retrieve inventory."
+
+# ---------------------------------------------------------
+
+@mcp.tool()
+def get_valid_actions() -> str:
+    """
+    Environment hint helper.
+    """
+    game = get_game()
+
+    if game.env and game.env.env:
+        valid = game.env.env.get_valid_actions()
+        return ", ".join(valid) if valid else "No valid actions."
+
+    return "Environment not available."

+# =========================================================
+# Run Server
+# =========================================================

 if __name__ == "__main__":
+    mcp.run()
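The `get_map` tool above dumps a graph stored as `{location: {"directions": {(action, destination), ...}}}`. The same rendering can be sketched standalone (helper name and sample locations are made up; sorting is added here for deterministic output, whereas the server iterates the raw set):

```python
def render_map(locations: dict) -> str:
    """Render an exploration graph as indented text, one location per block."""
    lines = ["EXPLORED MAP"]
    for loc, data in locations.items():
        lines.append(f"\n[{loc}]")
        # Sorted for a deterministic dump
        for direction, dest in sorted(data.get("directions", set())):
            lines.append(f"  {direction} -> {dest}")
    return "\n".join(lines)

world = {
    "west of house": {"directions": {("north", "north of house")}},
    "north of house": {"directions": set()},
}
print(render_map(world))
```

Storing edges as `(action, destination)` tuples keeps the map a plain adjacency list, so the agent can later plan routes by walking the tuples without parsing any text.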
requirements.txt CHANGED
@@ -1,9 +1,17 @@
-# HF Spaces already has gradio and huggingface_hub pre-installed
-# Do not add them here or you may get version conflicts
-
-# Agent dependencies (these are provided by the evaluation infrastructure)
-# Do not add jericho, fastmcp here - they are installed during evaluation
-
-# Add any additional packages your agent needs below:
-# numpy
-# requests
+# Core dependencies
+jericho
+python-dotenv
+spacy
+
+torch
+spaces
+transformers
+accelerate
+
+# MCP Server
+fastmcp
+
+# Function calling (optional, for the alternative approach)
+langchain-core
+
+huggingface_hub
utils.py ADDED
@@ -0,0 +1,42 @@
+from huggingface_hub import InferenceClient
+import os
+from dotenv import load_dotenv
+
+load_dotenv()
+
+LLM_MODEL = "Qwen/Qwen2.5-7B-Instruct"
+
+_hf_token = os.getenv("HF_TOKEN")
+if not _hf_token:
+    raise ValueError("HF_TOKEN not found. Set it in your .env file.")
+
+LLM_CLIENT = InferenceClient(token=_hf_token)
+
+def call_llm(prompt: str, system_prompt: str = "", seed: int = 0, max_tokens: int = 300) -> str:
+    messages = []
+
+    if system_prompt.strip():
+        messages.append({"role": "system", "content": system_prompt})
+
+    messages.append({"role": "user", "content": prompt})
+
+    response = LLM_CLIENT.chat.completions.create(
+        model=LLM_MODEL,
+        messages=messages,
+        temperature=0.0,
+        max_tokens=max_tokens,
+        seed=seed,
+    )
+
+    return response.choices[0].message.content
+
+def is_new_location(observation: str, known_locations: set, last_tool: str) -> bool:
+    # Only play_action can move the player to a new room
+    if last_tool != "play_action":
+        return False
+    location = extract_location(observation)
+    # First lines ending in punctuation are messages, not room names
+    if location.strip().endswith(('.', '!', '?', ')')) or location in known_locations:
+        return False
+    return True
+
+def extract_location(observation: str) -> str:
+    # Room name heuristic: the first line of a "look" observation
+    return observation.lower().split("\n")[0].strip()
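The two location helpers in utils.py rely on a simple heuristic: the first line of a Jericho `look` observation is the room name, while a first line ending in punctuation ("Taken.") is a message. A self-contained sketch of that heuristic (the helpers copied verbatim, with a made-up observation):

```python
def extract_location(observation: str) -> str:
    # Room name heuristic: first line of a "look" observation, lowercased
    return observation.lower().split("\n")[0].strip()

def is_new_location(observation: str, known_locations: set, last_tool: str) -> bool:
    # Only play_action can move the player; first lines ending in
    # punctuation are messages ("Taken.") rather than room names.
    if last_tool != "play_action":
        return False
    location = extract_location(observation)
    if location.endswith((".", "!", "?", ")")) or location in known_locations:
        return False
    return True

obs = "West of House\nYou are standing in an open field west of a white house."
print(extract_location(obs))                       # → west of house
print(is_new_location(obs, set(), "play_action"))  # → True
```

The `known_locations` check means re-entering a mapped room is not treated as a discovery, which is what lets the server add map edges only on genuine transitions.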