LLM-course/Agentic-zork

nathanael-fijalkow committed on Feb 2

Commit

450ea3f

1 Parent(s): 8562e41

Major refactoring


Files changed (28)

.gitignore +4 -8
README.md +54 -109
agents/__init__.py +0 -9
agents/base_agent.py +0 -78
agents/react_agent.py +0 -243
app.py +12 -6
evaluation/__init__.py +14 -0
evaluation/evaluate.py +559 -0
evaluation/metrics.py +151 -0
evaluation/runner.py +188 -0
example_submission/README.md +28 -0
agents/mcp_react_agent.py → example_submission/agent.py +190 -263
mcp_server/zork_server.py → example_submission/mcp_server.py +16 -240
function_calling/controller.py +0 -291
function_calling/simple_controller.py +0 -268
function_calling/tools.py +0 -127
mcp_server/README.md +0 -83
mcp_server/__init__.py +0 -1
mcp_server/mcp_config.json +0 -9
requirements.txt +4 -1
run_agent.py +125 -251
submission_template/README.md +31 -0
submission_template/agent.py +279 -0
submission_template/app.py +71 -0
templates/mcp_server_template.py → submission_template/mcp_server.py +117 -55
submission_template/requirements.txt +8 -0
templates/README.md +0 -129
templates/react_agent_template.py +0 -303

.gitignore CHANGED

@@ -1,5 +1,5 @@
-master.zip
 .github/
 # Byte-compiled / optimized / DLL files
 __pycache__/
@@ -140,12 +140,8 @@ dmypy.json
 *.swo
 *~
-# Game files
-z-machine-games-master/
-*.z3
-*.z4
-*.z5
-*.z8
 # Temp files
 .mcp_config_temp.json

 .github/
+hidden_submission/
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.swo
 *~
 # Temp files
 .mcp_config_temp.json
+# Z-machine game files
+z-machine-games-master/

README.md CHANGED

@@ -20,8 +20,9 @@ This project provides:
 1. **MCP Server** - Exposes text adventure games as MCP tools using FastMCP
 2. **ReAct Agent** - An agent that uses MCP tools to play games with reasoning
-3. **Templates** - Starter code for students to implement their own solutions
-4. **57 Games** - Zork trilogy, Infocom classics, and many more Z-machine games
 ## Architecture
@@ -63,109 +64,78 @@ Get your HuggingFace token at: https://huggingface.co/settings/tokens
 ### 2. Run an Agent
 ```bash
-# MCP mode (recommended) - uses FastMCP Client
-python run_agent.py --mode mcp
-# Basic ReAct agent (direct game interaction)
-python run_agent.py --mode react
-# Function calling mode
-python run_agent.py --mode function --simple
 ```
 ## Project Structure
 ```
 .
-+-- run_agent.py              # Unified agent runner
-+-- mcp_server/
-|   +-- zork_server.py        # Full MCP server with all tools
-+-- agents/
-|   +-- base_agent.py         # Abstract base class
-|   +-- react_agent.py        # Basic ReAct agent (no MCP)
-|   +-- mcp_react_agent.py    # MCP-enabled ReAct agent
-+-- templates/                # Student templates
 |   +-- README.md             # Assignment instructions
-|   +-- mcp_server_template.py    # MCP server starter
-|   +-- react_agent_template.py   # Agent starter
-+-- function_calling/         # Alternative: function calling
-|   +-- controller.py
-|   +-- simple_controller.py
-|   +-- tools.py
 +-- games/
 |   +-- zork_env.py           # Jericho wrapper
 +-- z-machine-games-master/   # Game files
 ```
-## Agent Modes
-| Mode | Description | Command |
-|------|-------------|---------|
-| `mcp` | MCP ReAct agent (FastMCP Client) | `--mode mcp` |
-| `react` | Basic ReAct (direct game) | `--mode react` |
-| `function` | Function calling (API) | `--mode function` |
-| `function --simple` | Function calling (text) | `--mode function --simple` |
-### Examples
-```bash
-# Run MCP agent with verbose output
-python run_agent.py --mode mcp -v
-# Run with different model
-python run_agent.py --mode mcp --model google/gemma-2-2b-it
-# Limit steps
-python run_agent.py --mode mcp -n 50
-# Play different games
-python run_agent.py --mode mcp --game zork2
-python run_agent.py --mode mcp --game advent     # Colossal Cave Adventure
-python run_agent.py --mode mcp --game enchanter  # Infocom classic
-python run_agent.py --mode mcp --game hhgg       # Hitchhiker's Guide
-# List all 57 available games
-python run_agent.py --list-games
-```
-## MCP Server Tools
-The MCP server exposes these tools:
-| Tool | Description |
-|------|-------------|
-| `play_action(action)` | Execute a game command (north, take lamp, etc.) |
-| `memory()` | Get current state (location, score, history) |
-| `get_map()` | View explored locations and connections |
-| `inventory()` | Check items you're carrying |
-| `valid_actions()` | Get command hints |
-| `reset_game(game)` | Start over or switch games |
-| `list_games()` | See all 57 available games |
-| `hint()` | Get contextual hints |
-### Testing the MCP Server
 ```bash
-# Run server directly (stdio transport) - default game is zork1
-python mcp_server/zork_server.py
-# Run with a specific game
-GAME=advent python mcp_server/zork_server.py
-# Use MCP Inspector for interactive testing
-npx @modelcontextprotocol/inspector python mcp_server/zork_server.py
-# Use FastMCP dev mode
-fastmcp dev mcp_server/zork_server.py
 ```
-## Student Assignment
-See [templates/README.md](templates/README.md) for the assignment.
-Students implement:
-1. **MCP Server** (`mcp_server_template.py`) - Expose game functionality as MCP tools
-2. **ReAct Agent** (`react_agent_template.py`) - Play text adventures using MCP
 ## Configuration
@@ -176,39 +146,14 @@ Create `.env` from `.env.example`:
 ```bash
 # Required: HuggingFace token
 HF_TOKEN=hf_your_token_here
-# Optional: Model override (default: meta-llama/Llama-3.2-3B-Instruct)
-HF_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
-### Recommended Models
-| Model | Notes |
-|-------|-------|
-| `meta-llama/Llama-3.2-3B-Instruct` | Default, good balance |
-| `google/gemma-2-2b-it` | Smaller, faster |
-| `Qwen/Qwen2.5-7B-Instruct` | Good instruction following |
-## Evaluation
-Run the evaluator to test agent performance:
-```bash
-python evaluate.py --mode mcp --games zork1 --runs 3
-```
-Metrics:
-- **Score**: Points earned in-game
-- **Score %**: Score / Max possible score
-- **Steps**: Number of actions taken
-- **Time**: Elapsed time
-## Resources
-- [FastMCP Documentation](https://gofastmcp.com/)
-- [MCP Protocol](https://modelcontextprotocol.io/)
-- [Jericho (Text Adventures)](https://github.com/microsoft/jericho)
-- [HuggingFace Inference API](https://huggingface.co/docs/huggingface_hub/guides/inference)
 ## License

 1. **MCP Server** - Exposes text adventure games as MCP tools using FastMCP
 2. **ReAct Agent** - An agent that uses MCP tools to play games with reasoning
+3. **Submission Template** - Starter code for students to implement their own solutions
+4. **Evaluation System** - Deterministic evaluation with seeded runs
+5. **57 Games** - Zork trilogy, Infocom classics, and many more Z-machine games
 ## Architecture
 ### 2. Run an Agent
 ```bash
+# Run the example MCP agent
+python run_agent.py
+# Play a different game
+python run_agent.py --game advent
+# Verbose output
+python run_agent.py -v
+# Limit steps
+python run_agent.py -n 50
+# List all 57 games
+python run_agent.py --list-games
 ```
 ## Project Structure
 ```
 .
++-- run_agent.py              # Agent runner
++-- app.py                    # Gradio interface
++-- evaluation/               # Evaluation system
+|   +-- evaluate.py           # Main CLI script
+|   +-- runner.py             # Agent execution
+|   +-- metrics.py            # Result tracking
++-- example_submission/       # Working example submission
+|   +-- agent.py              # Full ReAct agent implementation
+|   +-- mcp_server.py         # Full MCP server implementation
++-- submission_template/      # Student templates
 |   +-- README.md             # Assignment instructions
+|   +-- agent.py              # Agent starter code
+|   +-- mcp_server.py         # MCP server starter code
+|   +-- app.py                # HF Spaces app
 +-- games/
 |   +-- zork_env.py           # Jericho wrapper
 +-- z-machine-games-master/   # Game files
 ```
+## Assignment
+See [submission_template/README.md](submission_template/README.md) for the assignment instructions.
+You need to implement:
+1. **MCP Server** (`mcp_server.py`) - Expose game functionality as MCP tools
+2. **ReAct Agent** (`agent.py`) - Play text adventures using MCP tools
+A working example is provided in `example_submission/`.
+## Evaluation
+Run the evaluator to test submissions:
 ```bash
+# Evaluate a submission
+python evaluation/evaluate.py -s ./submission_template -g zork1 -t 5
+# Evaluate the example
+python evaluation/evaluate.py -s ./example_submission -g zork1 -t 3
+# Evaluate multiple games
+python evaluation/evaluate.py -s ./example_submission -g zork1 advent enchanter -t 3
+# Save results to JSON
+python evaluation/evaluate.py -s ./example_submission -g zork1 -t 3 -o results.json
 ```
+Metrics:
+- **Score**: Points earned in-game (averaged over trials)
+- **Score %**: Score / Max possible score
+- **Steps**: Number of actions taken
+- **Time**: Elapsed time
 ## Configuration
 ```bash
 # Required: HuggingFace token
 HF_TOKEN=hf_your_token_here
 ```
+### Fixed Model
+All submissions use the same model for fairness:
+- **Model**: `Qwen/Qwen2.5-72B-Instruct`
+- **Temperature**: `0.0` (deterministic)
+- **Seed**: Provided for reproducibility
 ## License
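The metrics listed in the README's Evaluation section (score averaged over trials, and score as a percentage of the maximum) can be illustrated with a short sketch. This is not the code from `evaluation/metrics.py`, which this part of the diff does not show: the `summarize` helper, its dictionary keys, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def summarize(scores: list[int], max_score: int) -> dict:
    """Aggregate per-trial scores (hypothetical helper, not from the repo)."""
    if not scores:
        # No trials: report zeros rather than dividing by zero.
        return {"mean_score": 0.0, "std_score": 0.0, "score_pct": 0.0}
    avg = mean(scores)
    return {
        "mean_score": avg,
        # Assumption: population standard deviation; the repo may use another.
        "std_score": pstdev(scores),
        # Score %: mean score divided by the maximum possible score.
        "score_pct": 100.0 * avg / max_score if max_score > 0 else 0.0,
    }


summary = summarize([10, 20, 30], max_score=350)
print(summary)
```

Guarding against `max_score == 0` matters because a trial that fails before the game starts may report no maximum score at all.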

agents/__init__.py DELETED

@@ -1,9 +0,0 @@
-from .base_agent import BaseAgent, AgentConfig
-from .react_agent import ReActAgent, ReActConfig
-from .mcp_react_agent import MCPReActAgent, MCPAgentConfig
-__all__ = [
-    "BaseAgent", "AgentConfig",
-    "ReActAgent", "ReActConfig",
-    "MCPReActAgent", "MCPAgentConfig",
-]

agents/base_agent.py DELETED

@@ -1,78 +0,0 @@
-"""
-Base Agent Abstract Class
-Defines the interface that all text adventure agents must implement.
-"""
-from abc import ABC, abstractmethod
-from dataclasses import dataclass
-from games.zork_env import GameState
-@dataclass
-class AgentConfig:
-    """Configuration for an agent."""
-    name: str = "BaseAgent"
-    max_history: int = 20  # Maximum number of past interactions to remember
-    verbose: bool = False
-class BaseAgent(ABC):
-    """
-    Abstract base class for text adventure agents.
-    Students should extend this class and implement the `choose_action` method.
-    """
-    def __init__(self, config: AgentConfig = None):
-        self.config = config or AgentConfig()
-        self.history: list[tuple[str, str, GameState]] = []  # (action, observation, state)
-    @abstractmethod
-    def choose_action(self, observation: str, game_state: GameState) -> str:
-        """
-        Choose the next action based on the current observation and game state.
-        Args:
-            observation: The text observation from the game
-            game_state: The current GameState object with score, inventory, etc.
-        Returns:
-            A string action to take in the game (e.g., "go north", "take lamp")
-        """
-        pass
-    def update_history(self, action: str, observation: str, game_state: GameState):
-        """
-        Update the agent's history after taking an action.
-        Args:
-            action: The action that was taken
-            observation: The resulting observation
-            game_state: The resulting game state
-        """
-        self.history.append((action, observation, game_state))
-        # Keep history bounded
-        if len(self.history) > self.config.max_history:
-            self.history = self.history[-self.config.max_history:]
-    def reset(self):
-        """Reset the agent's internal state for a new game."""
-        self.history = []
-    def get_history_text(self) -> str:
-        """Get a text summary of recent history for context."""
-        if not self.history:
-            return "No previous actions taken."
-        lines = []
-        for action, observation, state in self.history[-10:]:  # Last 10 actions
-            lines.append(f"> {action}")
-            # Truncate long observations
-            obs_preview = observation[:200] + "..." if len(observation) > 200 else observation
-            lines.append(obs_preview)
-            lines.append(f"[Score: {state.score}, Moves: {state.moves}]")
-            lines.append("")
-        return "\n".join(lines)

agents/react_agent.py DELETED

@@ -1,243 +0,0 @@
-"""
-ReAct Agent for Text Adventure Games
-Implements a ReAct (Reasoning + Acting) loop using an LLM to play text adventures.
-The agent thinks about its situation, decides on an action, and learns from the result.
-"""
-import os
-from dataclasses import dataclass
-from huggingface_hub import InferenceClient
-from dotenv import load_dotenv
-from agents.base_agent import BaseAgent, AgentConfig
-from games.zork_env import GameState
-@dataclass
-class ReActConfig(AgentConfig):
-    """Configuration for the ReAct agent."""
-    name: str = "ReActAgent"
-    model: str = "meta-llama/Llama-3.2-3B-Instruct"
-    temperature: float = 0.7
-    max_tokens: int = 300
-    max_history: int = 15
-SYSTEM_PROMPT = """You are playing a classic text adventure game.
-GOAL: Explore the world, solve puzzles, collect treasures, and maximize your score.
-VALID COMMANDS:
-- Movement: north, south, east, west, up, down, enter, exit
-- Looking: look, examine <thing>, read <thing>
-- Objects: take <item>, drop <item>, open <thing>, close <thing>
-- Light: turn on lamp, light match
-- Combat: attack <enemy> with <weapon>
-- Other: inventory, wait, push <thing>, move <thing>
-INVALID COMMANDS (do NOT use): check, inspect, search, grab, use, help
-TIPS:
-- Explore systematically - try all directions
-- Examine interesting objects and read documents
-- Pick up useful items (lamp, keys, weapons)
-- Open containers to find hidden items
-You MUST respond in EXACTLY this format (no markdown, no extra text):
-THOUGHT: <your reasoning in one sentence>
-ACTION: <one valid command>
-Example response:
-THOUGHT: I see a container here, I should check what is inside.
-ACTION: open container"""
-class ReActAgent(BaseAgent):
-    """
-    A ReAct (Reasoning + Acting) agent that uses an LLM to play text adventures.
-    Uses Hugging Face Hub's Inference API.
-    """
-    def __init__(self, config: ReActConfig = None, token: str = None):
-        super().__init__(config or ReActConfig())
-        self.config: ReActConfig = self.config
-        # Load token from environment if not provided
-        load_dotenv()
-        token = token or os.getenv("HF_TOKEN")
-        if not token:
-            raise ValueError("HF_TOKEN not found. Set HF_TOKEN environment variable or pass token parameter.")
-        # Override model from environment if set
-        env_model = os.getenv("HF_MODEL")
-        if env_model:
-            self.config.model = env_model
-        self.client = InferenceClient(token=token)
-        self.thoughts: list[str] = []  # Store reasoning history
-    def choose_action(self, observation: str, game_state: GameState) -> str:
-        """
-        Use the LLM to reason about the situation and choose an action.
-        """
-        # Build the prompt with context
-        prompt = self._build_prompt(observation, game_state)
-        # Call the LLM
-        response = self._call_llm(prompt)
-        # Parse the response
-        thought, action = self._parse_response(response)
-        # Store the thought for history
-        self.thoughts.append(thought)
-        if self.config.verbose:
-            print(f"\n[Thought] {thought}")
-            print(f"[Action] {action}")
-        return action
-    def _build_prompt(self, observation: str, game_state: GameState) -> str:
-        """Build the prompt for the LLM with current context."""
-        parts = []
-        # Current status (compact for small models)
-        parts.append(f"Score: {game_state.score}/{game_state.max_score} | Moves: {game_state.moves}")
-        if game_state.inventory:
-            parts.append(f"Inventory: {', '.join(game_state.inventory)}")
-        # Recent history (only last 3 for small models)
-        if self.history:
-            parts.append("\nRecent:")
-            recent_actions = []
-            for action, obs, state in self.history[-3:]:
-                obs_short = obs[:150] + "..." if len(obs) > 150 else obs
-                parts.append(f"> {action}\n{obs_short}")
-                recent_actions.append(action)
-            # Warn about repeated actions
-            if len(recent_actions) >= 2 and len(set(recent_actions)) == 1:
-                parts.append(f"\n[WARNING: You've done '{recent_actions[0]}' multiple times. Try something different!]")
-        # Current observation
-        parts.append(f"\nNow:\n{observation}")
-        parts.append("\nWhat do you do next? (Try a NEW action)")
-        return "\n".join(parts)
-    def _call_llm(self, prompt: str) -> str:
-        """Call the Hugging Face Inference API."""
-        try:
-            messages = [
-                {"role": "system", "content": SYSTEM_PROMPT},
-                {"role": "user", "content": prompt}
-            ]
-            response = self.client.chat.completions.create(
-                model=self.config.model,
-                messages=messages,
-                temperature=self.config.temperature,
-                max_tokens=self.config.max_tokens,
-            )
-            return response.choices[0].message.content
-        except Exception as e:
-            print(f"Error calling LLM: {e}")
-            return "THOUGHT: Error occurred, trying a safe action.\nACTION: look"
-    def _parse_response(self, response: str) -> tuple[str, str]:
-        """Parse the LLM response to extract thought and action."""
-        thought = ""
-        action = "look"  # Default fallback action
-        lines = response.strip().split("\n")
-        for i, line in enumerate(lines):
-            line_upper = line.upper().strip()
-            if line_upper.startswith("THOUGHT:"):
-                # Extract thought (may span multiple lines until ACTION)
-                thought_parts = [line.split(":", 1)[1].strip()]
-                for j in range(i + 1, len(lines)):
-                    if lines[j].upper().strip().startswith("ACTION:"):
-                        break
-                    thought_parts.append(lines[j].strip())
-                thought = " ".join(thought_parts).strip()
-            elif line_upper.startswith("ACTION:"):
-                action = line.split(":", 1)[1].strip().lower()
-                # Clean up the action - remove quotes, markdown, and extra whitespace
-                action = action.strip('"\'')
-                # Remove markdown bold/italic markers
-                action = action.replace("**", "").replace("*", "").replace("__", "").replace("_", " ")
-                # Remove backticks
-                action = action.replace("`", "")
-                # Clean up whitespace
-                action = " ".join(action.split())
-                break
-        # Validate action isn't empty
-        if not action or action.isspace():
-            action = "look"
-        return thought, action
-    def reset(self):
-        """Reset the agent for a new game."""
-        super().reset()
-        self.thoughts = []
-    def get_summary(self) -> str:
-        """Get a summary of the agent's reasoning."""
-        if not self.thoughts:
-            return "No thoughts recorded yet."
-        return "\n---\n".join(self.thoughts[-5:])
-# Example usage and testing
-if __name__ == "__main__":
-    import sys
-    from games.zork_env import TextAdventureEnv
-    # Use command line arg or default to zork1
-    game = sys.argv[1] if len(sys.argv) > 1 else "zork1"
-    # Quick test
-    config = ReActConfig(verbose=True)
-    try:
-        agent = ReActAgent(config)
-        env = TextAdventureEnv(game)
-        state = env.reset()
-        print("=" * 50)
-        print(f"{game.upper()} (using {agent.config.model})")
-        print("=" * 50)
-        print(state.observation)
-        # Run a few steps
-        for step in range(5):
-            print(f"\n{'=' * 50}")
-            print(f"Step {step + 1}")
-            print("=" * 50)
-            action = agent.choose_action(state.observation, state)
-            print(f"\n> {action}")
-            state = env.step(action)
-            print(f"\n{state.observation}")
-            print(f"\nScore: {state.score}/{state.max_score}")
-            agent.update_history(action, state.observation, state)
-            if state.done:
-                print("\nGAME OVER!")
-                break
-    except ValueError as e:
-        print(f"Setup error: {e}")
-        print("Make sure to set your HF_TOKEN in .env file")
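The `THOUGHT:` / `ACTION:` response format handled by the deleted `_parse_response` method can also be parsed with regular expressions. The following is a minimal sketch under that same format assumption; `parse_react` and the two pattern names are hypothetical and do not appear in the repository.

```python
import re

# THOUGHT runs from its label up to the next line that starts with ACTION:.
THOUGHT_RE = re.compile(
    r"^\s*THOUGHT:\s*(.*?)(?=^\s*ACTION:|\Z)",
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)
# ACTION is the remainder of the line that carries the label.
ACTION_RE = re.compile(r"^\s*ACTION:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def parse_react(response: str, fallback: str = "look") -> tuple[str, str]:
    """Return (thought, action); use `fallback` when no usable action is found."""
    thought_match = THOUGHT_RE.search(response)
    action_match = ACTION_RE.search(response)

    # Collapse a multi-line thought into a single line.
    thought = " ".join(thought_match.group(1).split()) if thought_match else ""

    action = fallback
    if action_match:
        # Drop markdown emphasis and backticks, then surrounding quotes.
        cleaned = re.sub(r"[*`]", "", action_match.group(1))
        cleaned = cleaned.strip().strip("\"'").lower()
        cleaned = " ".join(cleaned.split())
        action = cleaned or fallback
    return thought, action


print(parse_react("THOUGHT: I see a mailbox.\nACTION: **Open Mailbox**"))
```

Falling back to `look` mirrors the default used by the deleted method, so a malformed response still produces a harmless game command.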

app.py CHANGED

@@ -49,16 +49,22 @@ Get your HuggingFace token at: https://huggingface.co/settings/tokens
 ### 4. Explore the Templates
-The templates are in the `templates/` folder:
-- `mcp_server_template.py` - MCP server starter code
-- `react_agent_template.py` - ReAct agent starter code
 ### 5. Test Your Implementation
 ```bash
-# Run your agent
-python run_agent.py --mode mcp -n 20
 # List available games (57 total!)
 python run_agent.py --list-games
@@ -66,7 +72,7 @@ python run_agent.py --list-games
 ## Resources
-- [Assignment Instructions](templates/README.md)
 - [FastMCP Documentation](https://gofastmcp.com/)
 - [MCP Protocol](https://modelcontextprotocol.io/)
 """

 ### 4. Explore the Templates
+The submission template is in the `submission_template/` folder:
+- `agent.py` - Your agent implementation (implement the StudentAgent class)
+- `mcp_server.py` - Your MCP server implementation (add tools)
+- `README.md` - Detailed instructions
+A working example is in `example_submission/`.
 ### 5. Test Your Implementation
 ```bash
+# Run the example agent
+python run_agent.py
+# Run with a different game
+python run_agent.py --game advent
 # List available games (57 total!)
 python run_agent.py --list-games
 ## Resources
+- [Submission Instructions](submission_template/README.md)
 - [FastMCP Documentation](https://gofastmcp.com/)
 - [MCP Protocol](https://modelcontextprotocol.io/)
 """

evaluation/__init__.py ADDED

@@ -0,0 +1,14 @@

+"""
+Evaluation package for Text Adventure Agents.
+"""
+from evaluation.metrics import EvaluationResult, TrialResult
+from evaluation.runner import RunConfig, RunResult, run_agent_with_server
+__all__ = [
+    "EvaluationResult",
+    "TrialResult",
+    "RunConfig",
+    "RunResult",
+    "run_agent_with_server",
+]

evaluation/evaluate.py ADDED

	@@ -0,0 +1,559 @@

+#!/usr/bin/env python3
+"""
+Evaluation Script for Text Adventure Agents
+Evaluates student submissions by running their agent + MCP server
+on a text adventure game for multiple trials and averaging scores.
+Usage:
+    # Evaluate a student submission
+    python evaluation/evaluate.py \\
+        --submission path/to/student/submission \\
+        --game zork1 \\
+        --trials 5 \\
+        --max-steps 100
+    # Evaluate with reference agent comparison
+    python evaluation/evaluate.py \\
+        --submission path/to/student/submission \\
+        --game zork1 \\
+        --reference
+    # Evaluate from a Hugging Face Space
+    python evaluation/evaluate.py \\
+        --hf-space username/space-name \\
+        --game zork1
+    # Batch evaluate multiple submissions
+    python evaluation/evaluate.py \\
+        --submissions-dir path/to/all/submissions \\
+        --game zork1 \\
+        --output results.json
+Examples:
+    # Quick test with 3 trials
+    python evaluation/evaluate.py -s ./submission_template -g zork1 -t 3
+    # Full evaluation for grading
+    python evaluation/evaluate.py -s ./submission_template -g advent -t 5 --max-steps 150
+"""
+import argparse
+import asyncio
+import json
+import os
+import random
+import sys
+import tempfile
+import warnings
+from datetime import datetime
+from pathlib import Path
+# Suppress asyncio subprocess cleanup warnings
+warnings.filterwarnings("ignore", message=".*Event loop is closed.*")
+warnings.filterwarnings("ignore", category=UserWarning, module="multiprocessing.resource_tracker")
+# Add parent directory to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+from evaluation.metrics import EvaluationResult, TrialResult
+from evaluation.runner import RunConfig, run_agent_with_server, run_reference_agent
+from games.zork_env import list_available_games
+def generate_seeds(base_seed: int, num_trials: int) -> list[int]:
+    """Generate deterministic seeds for each trial."""
+    random.seed(base_seed)
+    return [random.randint(0, 2**32 - 1) for _ in range(num_trials)]
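`generate_seeds` calls `random.seed(base_seed)`, which reseeds Python's module-level generator for the whole process. A variant that keeps the same deterministic behaviour without touching global state could use a private `random.Random` instance. This is only a sketch; `generate_seeds_local` is a hypothetical name, not part of the commit.

```python
import random


def generate_seeds_local(base_seed: int, num_trials: int) -> list[int]:
    """Derive one seed per trial from a private generator."""
    # A dedicated Random instance leaves the module-level generator untouched.
    rng = random.Random(base_seed)
    return [rng.randint(0, 2**32 - 1) for _ in range(num_trials)]


seeds = generate_seeds_local(42, 5)
print(seeds)
```

The same `base_seed` always produces the same list, so every submission is evaluated on identical trial seeds.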
+async def evaluate_submission(
+    submission_path: Path,
+    game: str,
+    num_trials: int = 5,
+    max_steps: int = 100,
+    base_seed: int = 42,
+    verbose: bool = False,
+) -> EvaluationResult:
+    """
+    Evaluate a student submission across multiple trials.
+    Args:
+        submission_path: Path to student's submission directory
+        game: Name of the game to evaluate on
+        num_trials: Number of trials to run (default: 5)
+        max_steps: Maximum steps per trial (default: 100)
+        base_seed: Base seed for reproducibility (default: 42)
+        verbose: Print detailed output
+    Returns:
+        EvaluationResult with aggregated metrics
+    """
+    # Locate agent and server files
+    agent_path = submission_path / "agent.py"
+    server_path = submission_path / "mcp_server.py"
+    # Extract student ID from path or README
+    student_id = submission_path.name
+    readme_path = submission_path / "README.md"
+    if readme_path.exists():
+        content = readme_path.read_text()
+        # Try to extract student name from README
+        for line in content.split("\n"):
+            if line.startswith("# ") or "name:" in line.lower():
+                student_id = line.replace("#", "").replace("name:", "").strip()[:50]
+                break
+    # Initialize results
+    result = EvaluationResult(
+        student_id=student_id,
+        game=game,
+        num_trials=num_trials,
+        max_steps=max_steps,
+    )
+    # Generate deterministic seeds
+    seeds = generate_seeds(base_seed, num_trials)
+    print(f"\nEvaluating: {student_id}")
+    print(f"Game: {game}")
+    print(f"Trials: {num_trials}")
+    print(f"Max steps: {max_steps}")
+    print(f"Seeds: {seeds}")
+    print("-" * 50)
+    for i, seed in enumerate(seeds):
+        trial_num = i + 1
+        print(f"\nTrial {trial_num}/{num_trials} (seed={seed})...")
+        config = RunConfig(
+            agent_path=agent_path,
+            server_path=server_path,
+            game=game,
+            max_steps=max_steps,
+            seed=seed,
+            verbose=verbose,
+        )
+        try:
+            run_result = await run_agent_with_server(config)
+            trial = TrialResult(
+                trial_number=trial_num,
+                final_score=run_result.final_score,
+                max_score=run_result.max_score,
+                moves=run_result.moves,
+                locations_visited=len(run_result.locations_visited),
+                game_completed=run_result.game_completed,
+                error=run_result.error,
+            )
+            if run_result.error:
+                print(f"  Error: {run_result.error[:100]}...")
+            else:
+                print(f"  Score: {run_result.final_score}")
+                print(f"  Moves: {run_result.moves}")
+                print(f"  Locations: {len(run_result.locations_visited)}")
+        except Exception as e:
+            trial = TrialResult(
+                trial_number=trial_num,
+                final_score=0,
+                max_score=0,
+                moves=0,
+                locations_visited=0,
+                game_completed=False,
+                error=str(e),
+            )
+            print(f"  Exception: {e}")
+        result.add_trial(trial)
+    return result
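One detail in the student ID extraction inside `evaluate_submission`: the check uses `"name:" in line.lower()`, but the cleanup removes only the lowercase literal `name:`, so a line such as `Name: Alice` keeps its prefix. A case-insensitive variant might look like the sketch below; `extract_student_id` and `NAME_RE` are hypothetical names, and the 50-character cap is copied from the code above.

```python
import re

# Matches "name:" in any letter case and captures what follows on the line.
NAME_RE = re.compile(r"name:\s*(.+)", re.IGNORECASE)


def extract_student_id(readme_text: str, default: str) -> str:
    """Return the first heading or name field found, else `default`."""
    for line in readme_text.splitlines():
        if line.startswith("# "):
            # A top-level Markdown heading: drop the marker, keep the title.
            return line.lstrip("#").strip()[:50] or default
        match = NAME_RE.search(line)
        if match:
            return match.group(1).strip()[:50] or default
    return default


print(extract_student_id("Name: Alice\n", "submission_dir"))
```

Passing the directory name as `default` keeps the behaviour of the original code when the README has neither a heading nor a name field.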
+async def evaluate_with_reference(
+    submission_path: Path,
+    game: str,
+    num_trials: int = 5,
+    max_steps: int = 100,
+    base_seed: int = 42,
+    verbose: bool = False,
+) -> tuple[EvaluationResult, EvaluationResult]:
+    """
+    Evaluate student submission and compare with reference agent.
+    Returns:
+        Tuple of (student_result, reference_result)
+    """
+    # Evaluate student
+    student_result = await evaluate_submission(
+        submission_path=submission_path,
+        game=game,
+        num_trials=num_trials,
+        max_steps=max_steps,
+        base_seed=base_seed,
+        verbose=verbose,
+    )
+    # Evaluate reference agent (from examples/mcp_react)
+    print("\n" + "=" * 50)
+    print("Running reference agent for comparison...")
+    print("=" * 50)
+    seeds = generate_seeds(base_seed, num_trials)
+    reference_result = EvaluationResult(
+        student_id="reference_agent",
+        game=game,
+        num_trials=num_trials,
+        max_steps=max_steps,
+    )
+    for i, seed in enumerate(seeds):
+        trial_num = i + 1
+        print(f"\nReference Trial {trial_num}/{num_trials} (seed={seed})...")
+        try:
+            run_result = await run_reference_agent(
+                game=game,
+                max_steps=max_steps,
+                seed=seed,
+                verbose=verbose,
+            )
+            trial = TrialResult(
+                trial_number=trial_num,
+                final_score=run_result.final_score,
+                max_score=run_result.max_score,
+                moves=run_result.moves,
+                locations_visited=len(run_result.locations_visited),
+                game_completed=run_result.game_completed,
+                error=run_result.error,
+            )
+            if run_result.error:
+                print(f"  Error: {run_result.error[:100]}...")
+            else:
+                print(f"  Score: {run_result.final_score}")
+        except Exception as e:
+            trial = TrialResult(
+                trial_number=trial_num,
+                final_score=0,
+                max_score=0,
+                moves=0,
+                locations_visited=0,
+                game_completed=False,
+                error=str(e),
+            )
+            print(f"  Exception: {e}")
+        reference_result.add_trial(trial)
+    return student_result, reference_result
+def clone_hf_space(space_id: str, target_dir: Path) -> Path:
+    """Clone a Hugging Face Space to local directory."""
+    import subprocess
+    # HF Spaces are git repos at huggingface.co/spaces/
+    repo_url = f"https://huggingface.co/spaces/{space_id}"
+    print(f"Cloning {repo_url}...")
+    subprocess.run(
+        ["git", "clone", "--depth", "1", repo_url, str(target_dir)],
+        check=True,
+        capture_output=True,
+    )
+    return target_dir
+async def batch_evaluate(
+    submissions_dir: Path,
+    game: str,
+    num_trials: int = 5,
+    max_steps: int = 100,
+    base_seed: int = 42,
+    output_path: Path = None,
+    verbose: bool = False,
+) -> list[EvaluationResult]:
+    """Evaluate all submissions in a directory."""
+    results = []
+    # Find all submission directories (those containing agent.py)
+    submission_dirs = [
+        d for d in submissions_dir.iterdir()
+        if d.is_dir() and (d / "agent.py").exists()
+    ]
+    print(f"Found {len(submission_dirs)} submissions")
+    for submission_path in sorted(submission_dirs):
+        try:
+            result = await evaluate_submission(
+                submission_path=submission_path,
+                game=game,
+                num_trials=num_trials,
+                max_steps=max_steps,
+                base_seed=base_seed,
+                verbose=verbose,
+            )
+            results.append(result)
+        except Exception as e:
+            print(f"Failed to evaluate {submission_path}: {e}")
+    # Sort by mean score (descending)
+    results.sort(key=lambda r: r.mean_score, reverse=True)
+    # Save results
+    if output_path:
+        output_data = {
+            "evaluation_date": datetime.now().isoformat(),
+            "game": game,
+            "num_trials": num_trials,
+            "max_steps": max_steps,
+            "base_seed": base_seed,
+            "results": [r.to_dict() for r in results],
+            "leaderboard": [
+                {
+                    "rank": i + 1,
+                    "student_id": r.student_id,
+                    "mean_score": round(r.mean_score, 2),
+                    "std_score": round(r.std_score, 2),
+                }
+                for i, r in enumerate(results)
+            ],
+        }
+        with open(output_path, "w") as f:
+            json.dump(output_data, f, indent=2)
+        print(f"\nResults saved to {output_path}")
+    return results
+def print_comparison(student: EvaluationResult, reference: EvaluationResult):
+    """Print a comparison between student and reference results."""
+    print("\n" + "=" * 60)
+    print("EVALUATION COMPARISON")
+    print("=" * 60)
+    print(f"\n{'Metric':<25} {'Student':<15} {'Reference':<15}")
+    print("-" * 55)
+    print(f"{'Mean Score':<25} {student.mean_score:<15.2f} {reference.mean_score:<15.2f}")
+    print(f"{'Std Score':<25} {student.std_score:<15.2f} {reference.std_score:<15.2f}")
+    print(f"{'Min Score':<25} {student.min_score:<15} {reference.min_score:<15}")
+    print(f"{'Max Score':<25} {student.max_score_achieved:<15} {reference.max_score_achieved:<15}")
+    print(f"{'Mean Moves':<25} {student.mean_moves:<15.1f} {reference.mean_moves:<15.1f}")
+    print(f"{'Mean Locations':<25} {student.mean_locations:<15.1f} {reference.mean_locations:<15.1f}")
+    print(f"{'Successful Trials':<25} {student.successful_trials:<15} {reference.successful_trials:<15}")
+    # Performance ratio
+    if reference.mean_score > 0:
+        ratio = student.mean_score / reference.mean_score * 100
+        print(f"\nStudent performance: {ratio:.1f}% of reference")
+def main():
+    parser = argparse.ArgumentParser(
+        description="Evaluate text adventure agent submissions",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog=__doc__,
+    )
+    # Input options (mutually exclusive)
+    input_group = parser.add_mutually_exclusive_group(required=True)
+    input_group.add_argument(
+        "-s", "--submission",
+        type=Path,
+        help="Path to student submission directory",
+    )
+    input_group.add_argument(
+        "--hf-space",
+        type=str,
+        help="Hugging Face Space ID (e.g., username/space-name)",
+    )
+    input_group.add_argument(
+        "--submissions-dir",
+        type=Path,
+        help="Directory containing multiple submissions (for batch evaluation)",
+    )
+    # Evaluation parameters
+    parser.add_argument(
+        "-g", "--game",
+        type=str,
+        default="lostpig",
+        help="Game to evaluate on (default: lostpig)",
+    )
+    parser.add_argument(
+        "-t", "--trials",
+        type=int,
+        default=5,
+        help="Number of trials to run (default: 5)",
+    )
+    parser.add_argument(
+        "--max-steps",
+        type=int,
+        default=100,
+        help="Maximum steps per trial (default: 100)",
+    )
+    parser.add_argument(
+        "--seed",
+        type=int,
+        default=42,
+        help="Base random seed for reproducibility (default: 42)",
+    )
+    # Reference comparison
+    parser.add_argument(
+        "-r", "--reference",
+        action="store_true",
+        help="Also run reference agent (from example_submission) for comparison",
+    )
+    # Output options
+    parser.add_argument(
+        "-o", "--output",
+        type=Path,
+        help="Output file for results (JSON)",
+    )
+    parser.add_argument(
+        "-v", "--verbose",
+        action="store_true",
+        help="Print detailed output",
+    )
+    parser.add_argument(
+        "--list-games",
+        action="store_true",
+        help="List available games and exit",
+    )
+    args = parser.parse_args()
+    # List games if requested
+    if args.list_games:
+        games = list_available_games()
+        print(f"Available games ({len(games)}):")
+        for game in games:
+            print(f"  - {game}")
+        return
+    # Validate game
+    available_games = list_available_games()
+    if args.game not in available_games:
+        print(f"Error: Unknown game '{args.game}'")
+        print(f"Available: {', '.join(available_games[:10])}...")
+        sys.exit(1)
+    # Handle HF Space input
+    if args.hf_space:
+        with tempfile.TemporaryDirectory() as tmpdir:
+            submission_path = clone_hf_space(args.hf_space, Path(tmpdir) / "submission")
+            if args.reference:
+                student_result, reference_result = asyncio.run(
+                    evaluate_with_reference(
+                        submission_path=submission_path,
+                        game=args.game,
+                        num_trials=args.trials,
+                        max_steps=args.max_steps,
+                        base_seed=args.seed,
+                        verbose=args.verbose,
+                    )
+                )
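+                print_comparison(student_result, reference_result)
+                # Save results if output specified
+                if args.output:
+                    output_data = {
+                        "evaluation_date": datetime.now().isoformat(),
+                        "student": student_result.to_dict(),
+                        "reference": reference_result.to_dict(),
+                    }
+                    with open(args.output, "w") as f:
+                        json.dump(output_data, f, indent=2)
+                    print(f"\nResults saved to {args.output}")
+            else:
+                result = asyncio.run(
+                    evaluate_submission(
+                        submission_path=submission_path,
+                        game=args.game,
+                        num_trials=args.trials,
+                        max_steps=args.max_steps,
+                        base_seed=args.seed,
+                        verbose=args.verbose,
+                    )
+                )
+                print("\n" + result.summary_str())
+                # Save results if output specified
+                if args.output:
+                    with open(args.output, "w") as f:
+                        json.dump(result.to_dict(), f, indent=2)
+                    print(f"\nResults saved to {args.output}")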
+    # Handle batch evaluation
+    elif args.submissions_dir:
+        results = asyncio.run(
+            batch_evaluate(
+                submissions_dir=args.submissions_dir,
+                game=args.game,
+                num_trials=args.trials,
+                max_steps=args.max_steps,
+                base_seed=args.seed,
+                output_path=args.output,
+                verbose=args.verbose,
+            )
+        )
+        # Print leaderboard
+        print("\n" + "=" * 60)
+        print("LEADERBOARD")
+        print("=" * 60)
+        print(f"\n{'Rank':<6} {'Student':<30} {'Mean Score':<12} {'Std':<10}")
+        print("-" * 58)
+        for i, r in enumerate(results):
+            print(f"{i+1:<6} {r.student_id:<30} {r.mean_score:<12.2f} {r.std_score:<10.2f}")
+    # Handle single submission
+    else:
+        submission_path = args.submission
+        if not submission_path.exists():
+            print(f"Error: Submission path not found: {submission_path}")
+            sys.exit(1)
+        if args.reference:
+            student_result, reference_result = asyncio.run(
+                evaluate_with_reference(
+                    submission_path=submission_path,
+                    game=args.game,
+                    num_trials=args.trials,
+                    max_steps=args.max_steps,
+                    base_seed=args.seed,
+                    verbose=args.verbose,
+                )
+            )
+            print_comparison(student_result, reference_result)
+            # Save results if output specified
+            if args.output:
+                output_data = {
+                    "evaluation_date": datetime.now().isoformat(),
+                    "student": student_result.to_dict(),
+                    "reference": reference_result.to_dict(),
+                }
+                with open(args.output, "w") as f:
+                    json.dump(output_data, f, indent=2)
+                print(f"\nResults saved to {args.output}")
+        else:
+            result = asyncio.run(
+                evaluate_submission(
+                    submission_path=submission_path,
+                    game=args.game,
+                    num_trials=args.trials,
+                    max_steps=args.max_steps,
+                    base_seed=args.seed,
+                    verbose=args.verbose,
+                )
+            )
+            print("\n" + result.summary_str())
+            # Save results if output specified
+            if args.output:
+                with open(args.output, "w") as f:
+                    json.dump(result.to_dict(), f, indent=2)
+                print(f"\nResults saved to {args.output}")
+if __name__ == "__main__":
+    main()
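
Review note: the ranking step in `batch_evaluate` can be sketched on its own. The block below is a minimal sketch, not the evaluator itself. `Entry` is a hypothetical stand-in for `EvaluationResult` with precomputed statistics, and `build_leaderboard` is an illustrative name that does not appear in the commit.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for EvaluationResult (illustration only)."""
    student_id: str
    mean_score: float
    std_score: float


def build_leaderboard(results: list[Entry]) -> list[dict]:
    """Rank entries by mean score, highest first, as batch_evaluate does."""
    # sorted() is stable, so tied entries keep their input order.
    ranked = sorted(results, key=lambda r: r.mean_score, reverse=True)
    return [
        {
            "rank": i + 1,
            "student_id": r.student_id,
            "mean_score": round(r.mean_score, 2),
            "std_score": round(r.std_score, 2),
        }
        for i, r in enumerate(ranked)
    ]


entries = [
    Entry("alice", 12.5, 1.0),
    Entry("bob", 30.0, 4.0),
    Entry("carol", 30.0, 2.5),
]
leaderboard = build_leaderboard(entries)
for row in leaderboard:
    print(row["rank"], row["student_id"], row["mean_score"])
```

With this scheme, tied mean scores still receive distinct ranks (here `bob` is ranked above `carol` only because of input order). If ties matter for grading, a secondary sort key such as the standard deviation would be one option.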

evaluation/metrics.py ADDED

	@@ -0,0 +1,151 @@

+"""
+Evaluation Metrics for Text Adventure Agents
+Tracks scores across multiple trials and computes statistics.
+"""
+import statistics
+from dataclasses import dataclass, field
+from typing import Optional
+@dataclass
+class TrialResult:
+    """Result of a single evaluation trial."""
+    trial_number: int
+    final_score: int
+    max_score: int
+    moves: int
+    locations_visited: int
+    game_completed: bool
+    error: Optional[str] = None
+    @property
+    def score_percentage(self) -> float:
+        """Score as percentage of max possible."""
+        if self.max_score == 0:
+            return 0.0
+        return (self.final_score / self.max_score) * 100
+    def to_dict(self) -> dict:
+        """Convert to dictionary for JSON serialization."""
+        return {
+            "trial_number": self.trial_number,
+            "final_score": self.final_score,
+            "max_score": self.max_score,
+            "score_percentage": round(self.score_percentage, 2),
+            "moves": self.moves,
+            "locations_visited": self.locations_visited,
+            "game_completed": self.game_completed,
+            "error": self.error,
+        }
+@dataclass
+class EvaluationResult:
+    """Aggregated results across all trials."""
+    student_id: str
+    game: str
+    num_trials: int
+    max_steps: int
+    trials: list[TrialResult] = field(default_factory=list)
+    @property
+    def scores(self) -> list[int]:
+        """List of final scores from all trials."""
+        return [t.final_score for t in self.trials if t.error is None]
+    @property
+    def mean_score(self) -> float:
+        """Average score across trials."""
+        if not self.scores:
+            return 0.0
+        return statistics.mean(self.scores)
+    @property
+    def std_score(self) -> float:
+        """Standard deviation of scores."""
+        if len(self.scores) < 2:
+            return 0.0
+        return statistics.stdev(self.scores)
+    @property
+    def min_score(self) -> int:
+        """Minimum score achieved."""
+        if not self.scores:
+            return 0
+        return min(self.scores)
+    @property
+    def max_score_achieved(self) -> int:
+        """Maximum score achieved."""
+        if not self.scores:
+            return 0
+        return max(self.scores)
+    @property
+    def successful_trials(self) -> int:
+        """Number of trials that completed without error."""
+        return len([t for t in self.trials if t.error is None])
+    @property
+    def mean_moves(self) -> float:
+        """Average number of moves across trials."""
+        moves = [t.moves for t in self.trials if t.error is None]
+        if not moves:
+            return 0.0
+        return statistics.mean(moves)
+    @property
+    def mean_locations(self) -> float:
+        """Average number of locations visited."""
+        locs = [t.locations_visited for t in self.trials if t.error is None]
+        if not locs:
+            return 0.0
+        return statistics.mean(locs)
+    def add_trial(self, trial: TrialResult) -> None:
+        """Add a trial result."""
+        self.trials.append(trial)
+    def to_dict(self) -> dict:
+        """Convert to dictionary for JSON serialization."""
+        return {
+            "student_id": self.student_id,
+            "game": self.game,
+            "num_trials": self.num_trials,
+            "max_steps": self.max_steps,
+            "successful_trials": self.successful_trials,
+            "summary": {
+                "mean_score": round(self.mean_score, 2),
+                "std_score": round(self.std_score, 2),
+                "min_score": self.min_score,
+                "max_score": self.max_score_achieved,
+                "mean_moves": round(self.mean_moves, 2),
+                "mean_locations": round(self.mean_locations, 2),
+            },
+            "trials": [t.to_dict() for t in self.trials],
+        }
+    def summary_str(self) -> str:
+        """Human-readable summary."""
+        lines = [
+            f"Evaluation Results: {self.student_id}",
+            f"{'=' * 50}",
+            f"Game: {self.game}",
+            f"Trials: {self.successful_trials}/{self.num_trials} successful",
+            f"Max steps per trial: {self.max_steps}",
+            f"",
+            f"Score Statistics:",
+            f"  Mean:  {self.mean_score:.2f}",
+            f"  Std:   {self.std_score:.2f}",
+            f"  Min:   {self.min_score}",
+            f"  Max:   {self.max_score_achieved}",
+            f"",
+            f"Exploration:",
+            f"  Mean moves:     {self.mean_moves:.1f}",
+            f"  Mean locations: {self.mean_locations:.1f}",
+            f"",
+            f"Per-Trial Scores: {self.scores}",
+        ]
+        return "\n".join(lines)
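
Review note: the aggregation rules in `EvaluationResult` can be checked with a small worked example. This is a sketch under simplifying assumptions. `Trial` and `summarize` are hypothetical names that reduce `TrialResult` and the score properties to the parts needed here.

```python
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """Simplified stand-in for TrialResult (hypothetical, illustration only)."""
    final_score: int
    error: Optional[str] = None


def summarize(trials: list[Trial]) -> dict:
    """Aggregate scores the way the `scores`, `mean_score`, and `std_score` properties do."""
    # Only error-free trials contribute to the statistics.
    scores = [t.final_score for t in trials if t.error is None]
    return {
        "successful": len(scores),
        "mean": statistics.mean(scores) if scores else 0.0,
        # The sample standard deviation needs at least two data points.
        "std": statistics.stdev(scores) if len(scores) >= 2 else 0.0,
    }


trials = [Trial(10), Trial(20), Trial(30), Trial(0, error="timeout")]
summary = summarize(trials)
print(summary)
```

Because failed trials are excluded rather than counted as zero, a submission that crashes in most trials can still report a high mean. The mean is therefore best read together with the successful-trial count.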

evaluation/runner.py ADDED

	@@ -0,0 +1,188 @@

+"""
+Agent Runner for Evaluation
+Handles spawning the MCP server subprocess and running the agent.
+Provides isolation between trials and proper cleanup.
+"""
+import asyncio
+import importlib.util
+import os
+import subprocess
+import sys
+import time
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Optional
+from fastmcp import Client
+from fastmcp.client.transports import StdioTransport
+# Add parent directory to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+from games.zork_env import list_available_games
+@dataclass
+class RunConfig:
+    """Configuration for a single agent run."""
+    agent_path: Path
+    server_path: Path
+    game: str
+    max_steps: int
+    seed: int
+    verbose: bool = False
+@dataclass
+class RunResult:
+    """Result of a single agent run."""
+    final_score: int
+    max_score: int
+    moves: int
+    locations_visited: set[str]
+    game_completed: bool
+    error: Optional[str] = None
+    history: list[tuple[str, str, str]] = None  # (thought, action, result)
+    def __post_init__(self):
+        if self.history is None:
+            self.history = []
+def load_agent_class(agent_path: Path):
+    """
+    Dynamically load the agent class from student's agent.py.
+    Expects the student file to define a class called 'StudentAgent'
+    with an async method 'run(client, game, max_steps, seed, verbose)'.
+    """
+    spec = importlib.util.spec_from_file_location("student_agent", agent_path)
+    module = importlib.util.module_from_spec(spec)
+    # Add the submission directory to path so relative imports work
+    submission_dir = str(agent_path.parent)
+    if submission_dir not in sys.path:
+        sys.path.insert(0, submission_dir)
+    spec.loader.exec_module(module)
+    if not hasattr(module, "StudentAgent"):
+        raise ValueError(
+            f"Agent file {agent_path} must define a 'StudentAgent' class"
+        )
+    return module.StudentAgent
+async def run_agent_with_server(config: RunConfig) -> RunResult:
+    """
+    Run the student's agent with their MCP server.
+    1. Spawns the MCP server as a subprocess
+    2. Connects the agent via FastMCP Client
+    3. Runs the agent for max_steps
+    4. Collects and returns results
+    """
+    # Validate paths
+    if not config.agent_path.exists():
+        return RunResult(
+            final_score=0,
+            max_score=0,
+            moves=0,
+            locations_visited=set(),
+            game_completed=False,
+            error=f"Agent file not found: {config.agent_path}"
+        )
+    if not config.server_path.exists():
+        return RunResult(
+            final_score=0,
+            max_score=0,
+            moves=0,
+            locations_visited=set(),
+            game_completed=False,
+            error=f"Server file not found: {config.server_path}"
+        )
+    # Validate game
+    available_games = list_available_games()
+    if config.game not in available_games:
+        return RunResult(
+            final_score=0,
+            max_score=0,
+            moves=0,
+            locations_visited=set(),
+            game_completed=False,
+            error=f"Unknown game: {config.game}. Available: {available_games[:10]}..."
+        )
+    try:
+        # Load the student's agent class
+        AgentClass = load_agent_class(config.agent_path)
+        agent = AgentClass()
+        # Create transport for the MCP server
+        # Set environment variable for the game
+        env = os.environ.copy()
+        env["GAME"] = config.game
+        transport = StdioTransport(
+            command=sys.executable,
+            args=[str(config.server_path)],
+            env=env,
+        )
+        # Connect to the server and run the agent
+        async with Client(transport) as client:
+            result = await agent.run(
+                client=client,
+                game=config.game,
+                max_steps=config.max_steps,
+                seed=config.seed,
+                verbose=config.verbose,
+            )
+            return result
+    except Exception as e:
+        import traceback
+        return RunResult(
+            final_score=0,
+            max_score=0,
+            moves=0,
+            locations_visited=set(),
+            game_completed=False,
+            error=f"{type(e).__name__}: {str(e)}\n{traceback.format_exc()}"
+        )
+async def run_reference_agent(
+    game: str,
+    max_steps: int,
+    seed: int,
+    verbose: bool = False,
+) -> RunResult:
+    """
+    Run the reference agent (from example_submission) for baseline comparison.
+    """
+    # Use the example as the reference
+    examples_dir = Path(__file__).parent.parent / "example_submission"
+    agent_path = examples_dir / "agent.py"
+    server_path = examples_dir / "mcp_server.py"
+    config = RunConfig(
+        agent_path=agent_path,
+        server_path=server_path,
+        game=game,
+        max_steps=max_steps,
+        seed=seed,
+        verbose=verbose,
+    )
+    return await run_agent_with_server(config)
+def run_single_trial(config: RunConfig) -> RunResult:
+    """Synchronous wrapper for running a single trial."""
+    return asyncio.run(run_agent_with_server(config))
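
Review note: the dynamic loading contract used by `load_agent_class` can be exercised without an MCP server. The sketch below writes a throwaway `agent.py` to a temporary directory and loads it with `importlib`. The agent body and its dictionary return value are placeholders (assumptions for illustration). The real runner expects `run` to return an object with the fields of `RunResult`.

```python
import asyncio
import importlib.util
import tempfile
from pathlib import Path

# Placeholder submission: defines the class name and method signature
# that the runner looks for. The returned dict is illustrative only.
AGENT_SOURCE = '''
class StudentAgent:
    async def run(self, client, game, max_steps, seed, verbose=False):
        # A real agent would call MCP tools through `client` here.
        return {"game": game, "steps": 0}
'''


def load_student_agent(agent_path: Path):
    """Load the StudentAgent class from a file path, as load_agent_class does."""
    spec = importlib.util.spec_from_file_location("student_agent", agent_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not hasattr(module, "StudentAgent"):
        raise ValueError(f"{agent_path} must define a 'StudentAgent' class")
    return module.StudentAgent


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "agent.py"
    path.write_text(AGENT_SOURCE)
    agent = load_student_agent(path)()
    # No server is involved in this sketch, so the client is left as None.
    outcome = asyncio.run(
        agent.run(client=None, game="lostpig", max_steps=5, seed=42)
    )

print(outcome)
```

Loading by file path rather than by package import keeps each submission independent of the evaluator's own module layout, which is presumably why the runner takes this route.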

example_submission/README.md ADDED

	@@ -0,0 +1,28 @@

+# Example: MCP ReAct Agent
+This is a complete, working example submission that demonstrates a ReAct agent using MCP.
+## Approach
+This agent uses the full ReAct pattern:
+1. **Thought**: Reason about the current situation
+2. **Tool**: Choose and call an MCP tool
+3. **Observation**: Process the result
+Features:
+- Loop detection (avoids repeating the same action)
+- Action validation (fixes common invalid verbs)
+- Score tracking
+- History management
+## Files
+- `agent.py` - ReAct agent with full implementation
+- `mcp_server.py` - MCP server with memory, map, and inventory tools
+## Testing
+```bash
+# Test locally
+python agent.py
+```
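
Review note: the loop detection listed under Features can be sketched as a standalone function. The check mirrors the one in the agent code (the last three actions are identical). The names `detect_loop` and `window` are illustrative and do not appear in the submission.

```python
def detect_loop(recent_actions: list[str], window: int = 3) -> bool:
    """Return True when the last `window` actions are all the same."""
    if len(recent_actions) < window:
        return False
    # A set collapses duplicates, so one element means a repeated action.
    return len(set(recent_actions[-window:])) == 1


history = ["north", "open door", "open door", "open door"]
if detect_loop(history):
    # The agent in this commit falls back to "look" in this situation.
    history.append("look")

print(history)
```

This only catches an action repeated back to back. Alternating patterns such as `north`, `south`, `north`, `south` pass the check, and the `look` fallback does not help when `look` is itself the repeated action.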

agents/mcp_react_agent.py → example_submission/agent.py RENAMED

@@ -1,40 +1,68 @@
 """
-MCP ReAct Agent for Text Adventure Games
-A production-ready ReAct agent that uses FastMCP Client to play text adventures via MCP tools.
-This agent connects to the Text Adventure MCP server and uses the LLM to reason and act.
-Features:
-- FastMCP Client integration for MCP server communication
-- ReAct loop (Thought -> Tool -> Observation)
-- Loop detection and action validation
-- History tracking and memory management
-- Score tracking and game over detection
 """
-import asyncio
 import json
 import os
 import re
-import sys
 from dataclasses import dataclass, field
-from huggingface_hub import InferenceClient
 from dotenv import load_dotenv
-from fastmcp import Client
-from fastmcp.client.transports import StdioTransport
 @dataclass
-class MCPAgentConfig:
-    """Configuration for the MCP ReAct agent."""
-    model: str = "meta-llama/Llama-3.2-3B-Instruct"
-    game: str = "zork1"  # Default game to play
-    temperature: float = 0.7
-    max_tokens: int = 300
-    max_history: int = 10
-    verbose: bool = True
 SYSTEM_PROMPT = """You are an expert text adventure game player. Your goal is to explore, collect treasures, and maximize your score.
 AVAILABLE TOOLS (use these via MCP):
@@ -42,9 +70,6 @@ AVAILABLE TOOLS (use these via MCP):
 2. memory - Get current game state, score, and recent history
 3. get_map - See explored locations and connections
 4. inventory - Check what you're carrying
-5. hint - Get a hint if stuck
-6. list_games - See available games
-7. reset_game - Switch to a different game
 VALID GAME COMMANDS for play_action:
 - Movement: north, south, east, west, up, down, enter, exit
@@ -84,176 +109,145 @@ STRATEGY:
 DO NOT repeat the same action multiple times in a row."""
-class MCPReActAgent:
     """
-    A ReAct agent that plays text adventure games using MCP tools via FastMCP Client.
-    This is the robust/production version with:
-    - Full MCP integration
     - Loop detection
     - Action validation
-    - Score tracking
     """
-    def __init__(self, mcp_server_path: str, config: MCPAgentConfig = None):
-        """
-        Initialize the MCP ReAct agent.
-        Args:
-            mcp_server_path: Path to the MCP server script
-            config: Agent configuration
-        """
-        load_dotenv()
-        self.mcp_server_path = mcp_server_path
-        self.config = config or MCPAgentConfig()
-        # Override model from environment if set
-        env_model = os.getenv("HF_MODEL")
-        if env_model:
-            self.config.model = env_model
-        # Initialize LLM client
-        token = os.getenv("HF_TOKEN")
-        if not token:
-            raise ValueError("HF_TOKEN not found. Set it in your .env file.")
-        self.llm = InferenceClient(token=token)
-        # Agent state
         self.history: list[dict] = []
-        self.thoughts: list[str] = []
         self.score: int = 0
-        self.max_score: int = 350
-        self.recent_actions: list[str] = []  # For loop detection
-    async def run(self, max_steps: int = 100) -> dict:
-        """
-        Run the ReAct agent loop.
-        Args:
-            max_steps: Maximum number of steps to run
-        Returns:
-            Dictionary with game results
-        """
-        import time
-        start_time = time.time()
-        step = 0
-        game_over = False
-        game_name = self.config.game
-        print("=" * 60)
-        print(f"MCP ReAct Agent - Playing {game_name.upper()}")
-        print(f"Model: {self.config.model}")
-        print("=" * 60)
-        # Set game as environment variable for the server
-        env = os.environ.copy()
-        env["GAME"] = game_name
-        # Create transport with environment variables
-        transport = StdioTransport(
-            command=sys.executable,
-            args=[self.mcp_server_path],
-            env=env,
-        )
-        # Connect to MCP server with game environment
-        async with Client(transport) as client:
-            # List available tools
-            tools = await client.list_tools()
-            tool_names = [t.name for t in tools]
-            print(f"\nConnected to MCP server. Tools: {tool_names}")
-            # Get initial observation
-            result = await client.call_tool("play_action", {"action": "look"})
-            observation = self._extract_result(result)
-            print(f"\n{observation}\n")
-            # Parse initial score
-            self._update_score(observation)
-            # Main ReAct loop
-            for step in range(1, max_steps + 1):
-                print(f"\n{'─' * 50}")
-                print(f"Step {step}/{max_steps} | Score: {self.score}")
-                print("─" * 50)
-                # Build prompt with context
-                prompt = self._build_prompt(observation)
-                # Call LLM for reasoning
-                response = self._call_llm(prompt)
-                # Parse response
-                thought, tool_name, tool_args = self._parse_response(response, tool_names)
-                self.thoughts.append(thought)
-                if self.config.verbose:
-                    print(f"\n[THOUGHT] {thought}")
-                    print(f"[TOOL] {tool_name}({tool_args})")
-                # Validate and fix common issues
-                tool_name, tool_args = self._validate_tool_call(tool_name, tool_args, tool_names)
-                # Check for loops
-                if tool_name == "play_action":
-                    action = tool_args.get("action", "look")
-                    self.recent_actions.append(action)
-                    if len(self.recent_actions) > 5:
-                        self.recent_actions = self.recent_actions[-5:]
-                    # Detect loops
-                    if len(self.recent_actions) >= 3 and len(set(self.recent_actions[-3:])) == 1:
-                        print(f"\n[WARNING] Loop detected - repeating '{action}'")
-                        # Force a different action
-                        tool_args = {"action": "look"}
-                        self.recent_actions.append("look")
-                # Execute tool via MCP
-                try:
-                    result = await client.call_tool(tool_name, tool_args)
-                    observation = self._extract_result(result)
-                    print(f"\n{observation}")
-                except Exception as e:
-                    observation = f"Error executing tool: {e}"
-                    print(f"\n[ERROR] {e}")
-                # Update history
-                self.history.append({
-                    "step": step,
-                    "thought": thought,
-                    "tool": tool_name,
-                    "args": tool_args,
-                    "result": observation[:200]
-                })
-                if len(self.history) > self.config.max_history:
-                    self.history = self.history[-self.config.max_history:]
-                # Update score
-                self._update_score(observation)
-                # Check for game over
-                if self._is_game_over(observation):
-                    game_over = True
-                    print("\n" + "=" * 60)
-                    print("GAME OVER!")
-                    break
-        elapsed_time = time.time() - start_time
-        # Print summary
-        return self._print_summary(step, elapsed_time, game_over)
     def _build_prompt(self, observation: str) -> str:
         """Build the prompt for the LLM with context."""
         parts = []
-        # Score info
-        parts.append(f"Current Score: {self.score}/{self.max_score}")
-        # Recent history (compact)
         if self.history:
             parts.append("\nRecent actions:")
             for entry in self.history[-3:]:
@@ -265,31 +259,11 @@ class MCPReActAgent:
             if self.recent_actions and len(set(self.recent_actions[-3:])) == 1:
                 parts.append(f"\n[WARNING: You've been doing '{self.recent_actions[-1]}' repeatedly. TRY SOMETHING DIFFERENT!]")
-        # Current observation
         parts.append(f"\nCurrent situation:\n{observation}")
         parts.append("\nWhat do you do next?")
         return "\n".join(parts)
-    def _call_llm(self, prompt: str) -> str:
-        """Call the LLM for reasoning."""
-        try:
-            messages = [
-                {"role": "system", "content": SYSTEM_PROMPT},
-                {"role": "user", "content": prompt}
-            ]
-            response = self.llm.chat.completions.create(
-                model=self.config.model,
-                messages=messages,
-                temperature=self.config.temperature,
-                max_tokens=self.config.max_tokens,
-            )
-            return response.choices[0].message.content
-        except Exception as e:
-            print(f"[LLM Error] {e}")
-            return "THOUGHT: LLM error, trying look.\nTOOL: play_action\nARGS: {\"action\": \"look\"}"
     def _parse_response(self, response: str, valid_tools: list[str]) -> tuple[str, str, dict]:
         """Parse the LLM response to extract thought, tool, and arguments."""
         thought = "No reasoning provided"
@@ -298,7 +272,7 @@ class MCPReActAgent:
         lines = response.strip().split("\n")
-        for i, line in enumerate(lines):
             line_clean = line.strip()
             line_upper = line_clean.upper()
@@ -307,7 +281,6 @@ class MCPReActAgent:
             elif line_upper.startswith("TOOL:"):
                 raw_tool = line_clean.split(":", 1)[1].strip().lower()
-                # Clean up common issues
                 raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
                 raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
                 tool_name = raw_tool
@@ -315,16 +288,13 @@ class MCPReActAgent:
             elif line_upper.startswith("ARGS:"):
                 args_part = line_clean.split(":", 1)[1].strip()
                 try:
-                    # Handle various JSON formats
                     args_part = args_part.replace("'", '"')
                     tool_args = json.loads(args_part)
                 except json.JSONDecodeError:
-                    # Try to extract action from text
                     match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
                     if match:
                         tool_args = {"action": match.group(1)}
                     else:
-                        # Fallback: try to use the whole thing as action
                         tool_args = {"action": "look"}
         return thought, tool_name, tool_args
@@ -333,7 +303,6 @@ class MCPReActAgent:
         """Validate and fix common tool call issues."""
         # Fix tool name
         if tool_name not in valid_tools:
-            # Try common alternatives
             if tool_name in ["action", "do", "command"]:
                 tool_name = "play_action"
             elif tool_name in ["map", "location"]:
@@ -345,11 +314,10 @@ class MCPReActAgent:
             else:
                 tool_name = "play_action"
-        # Fix action in args
         if tool_name == "play_action":
             action = tool_args.get("action", "look")
-            # Fix invalid verbs
             invalid_verb_map = {
                 "check": "examine",
                 "inspect": "examine",
@@ -365,7 +333,6 @@ class MCPReActAgent:
                 words[0] = invalid_verb_map[words[0]]
                 action = " ".join(words)
-            # Clean up action
             action = action.lower().strip()
             action = action.replace("**", "").replace("*", "").replace("`", "")
             action = " ".join(action.split())
@@ -378,25 +345,22 @@ class MCPReActAgent:
         """Extract text from MCP tool result."""
         if hasattr(result, 'content') and result.content:
             return result.content[0].text
         return str(result)
     def _update_score(self, text: str) -> None:
         """Update score from game text."""
-        # Look for score patterns
         patterns = [
-            r'\+(\d+) points',
             r'Score:\s*(\d+)',
-            r'Total:\s*(\d+)',
         ]
         for pattern in patterns:
             match = re.search(pattern, text, re.IGNORECASE)
             if match:
-                score = int(match.group(1))
-                if "+" in pattern:
-                    self.score += score
-                else:
-                    self.score = max(self.score, score)
     def _is_game_over(self, text: str) -> bool:
         """Check if the game is over."""
@@ -408,70 +372,33 @@ class MCPReActAgent:
         ]
         text_lower = text.lower()
         return any(phrase in text_lower for phrase in game_over_phrases)
-    def _print_summary(self, step: int, elapsed_time: float, game_over: bool) -> dict:
-        """Print game summary and return results."""
-        print("\n" + "=" * 60)
-        print("GAME SUMMARY")
-        print("=" * 60)
-        print(f"Final Score: {self.score}/{self.max_score} ({100*self.score/self.max_score:.1f}%)")
-        print(f"Steps Taken: {step}")
-        print(f"Time Elapsed: {elapsed_time:.1f} seconds")
-        print(f"Game Over: {game_over}")
-        print("=" * 60)
-        return {
-            "final_score": self.score,
-            "max_score": self.max_score,
-            "score_percentage": 100 * self.score / self.max_score,
-            "steps": step,
-            "elapsed_time": elapsed_time,
-            "game_over": game_over,
-        }
 # =============================================================================
-# Main
 # =============================================================================
-async def main():
-    """Run the MCP ReAct agent."""
-    import argparse
-    parser = argparse.ArgumentParser(description="Run the MCP ReAct Text Adventure Agent")
-    parser.add_argument(
-        "--server", "-s",
-        default="mcp_server/zork_server.py",
-        help="Path to the MCP server script"
-    )
-    parser.add_argument(
-        "--max-steps", "-n",
-        type=int,
-        default=100,
-        help="Maximum steps to run"
-    )
-    parser.add_argument(
-        "--model",
-        type=str,
-        default=None,
-        help="HuggingFace model to use"
-    )
-    parser.add_argument(
-        "--verbose", "-v",
-        action="store_true",
-        default=True,
-        help="Show detailed output"
-    )
-    args = parser.parse_args()
-    config = MCPAgentConfig(verbose=args.verbose)
-    if args.model:
-        config.model = args.model
-    agent = MCPReActAgent(args.server, config)
-    return await agent.run(max_steps=args.max_steps)
 if __name__ == "__main__":
-    asyncio.run(main())

 """
+Example: MCP ReAct Agent
+A complete ReAct agent that uses MCP tools to play text adventure games.
+This is a working example students can learn from.
 """
 import json
 import os
 import re
 from dataclasses import dataclass, field
+from typing import Optional
 from dotenv import load_dotenv
+from huggingface_hub import InferenceClient
+load_dotenv()
+# =============================================================================
+# LLM Configuration - DO NOT MODIFY
+# =============================================================================
+LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
+_hf_token = os.getenv("HF_TOKEN")
+if not _hf_token:
+    raise ValueError("HF_TOKEN not found. Set it in your .env file.")
+LLM_CLIENT = InferenceClient(token=_hf_token)
+def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
+    """Call the LLM with the given prompt."""
+    messages = [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": prompt},
+    ]
+    response = LLM_CLIENT.chat.completions.create(
+        model=LLM_MODEL,
+        messages=messages,
+        temperature=0.0,
+        max_tokens=max_tokens,
+        seed=seed,
+    )
+    return response.choices[0].message.content
 @dataclass
+class RunResult:
+    """Result of running the agent. Do not modify this class."""
+    final_score: int
+    max_score: int
+    moves: int
+    locations_visited: set[str]
+    game_completed: bool
+    error: Optional[str] = None
+    history: list[tuple[str, str, str]] = field(default_factory=list)
+# =============================================================================
+# System Prompt
+# =============================================================================
 SYSTEM_PROMPT = """You are an expert text adventure game player. Your goal is to explore, collect treasures, and maximize your score.
 AVAILABLE TOOLS (use these via MCP):
 2. memory - Get current game state, score, and recent history
 3. get_map - See explored locations and connections
 4. inventory - Check what you're carrying
 VALID GAME COMMANDS for play_action:
 - Movement: north, south, east, west, up, down, enter, exit
 DO NOT repeat the same action multiple times in a row."""
+# =============================================================================
+# Student Agent Implementation
+# =============================================================================
+class StudentAgent:
     """
+    MCP ReAct Agent - A complete working example.
+    This agent demonstrates:
+    - ReAct loop (Thought -> Tool -> Observation)
     - Loop detection
     - Action validation
+    - Score tracking via memory tool
     """
+    def __init__(self):
+        """Initialize the agent state."""
         self.history: list[dict] = []
+        self.recent_actions: list[str] = []
         self.score: int = 0
+    async def run(
+        self,
+        client,
+        game: str,
+        max_steps: int,
+        seed: int,
+        verbose: bool = False,
+    ) -> RunResult:
+        """Run the agent for a game session."""
+        locations_visited = set()
+        history = []
+        moves = 0
+        # Get list of available tools
+        tools = await client.list_tools()
+        tool_names = [t.name for t in tools]
+        # Get initial observation
+        result = await client.call_tool("play_action", {"action": "look"})
+        observation = self._extract_result(result)
+        # Track initial location
+        location = observation.split("\n")[0] if observation else "Unknown"
+        locations_visited.add(location)
+        if verbose:
+            print(f"\n{observation}")
+        # Main ReAct loop
+        for step in range(1, max_steps + 1):
+            # Build prompt with context
+            prompt = self._build_prompt(observation)
+            # Call LLM for reasoning (use step-based seed for variety)
+            response = call_llm(prompt, SYSTEM_PROMPT, seed + step)
+            # Parse the response
+            thought, tool_name, tool_args = self._parse_response(response, tool_names)
+            if verbose:
+                print(f"\n--- Step {step} ---")
+                print(f"[THOUGHT] {thought}")
+                print(f"[TOOL] {tool_name}({tool_args})")
+            # Validate and fix common issues
+            tool_name, tool_args = self._validate_tool_call(tool_name, tool_args, tool_names)
+            # Loop detection
+            if tool_name == "play_action":
+                action = tool_args.get("action", "look")
+                self.recent_actions.append(action)
+                if len(self.recent_actions) > 5:
+                    self.recent_actions = self.recent_actions[-5:]
+                # Detect loops - if same action 3 times, force "look"
+                if len(self.recent_actions) >= 3 and len(set(self.recent_actions[-3:])) == 1:
+                    if verbose:
+                        print("[WARNING] Loop detected - forcing 'look'")
+                    tool_args = {"action": "look"}
+                    self.recent_actions.append("look")
+                moves += 1
+            # Execute the tool
+            try:
+                result = await client.call_tool(tool_name, tool_args)
+                observation = self._extract_result(result)
+                if verbose:
+                    print(f"[RESULT] {observation[:200]}...")
+            except Exception as e:
+                observation = f"Error: {e}"
+                if verbose:
+                    print(f"[ERROR] {e}")
+            # Track location
+            location = observation.split("\n")[0] if observation else "Unknown"
+            locations_visited.add(location)
+            # Update history
+            self.history.append({
+                "step": step,
+                "thought": thought,
+                "tool": tool_name,
+                "args": tool_args,
+                "result": observation[:200]
+            })
+            if len(self.history) > 10:
+                self.history = self.history[-10:]
+            # Track score from observation
+            self._update_score(observation)
+            # Record in result history
+            history.append((thought, f"{tool_name}({tool_args})", observation[:100]))
+            # Check for game over
+            if self._is_game_over(observation):
+                if verbose:
+                    print("\n*** GAME OVER ***")
+                break
+        return RunResult(
+            final_score=self.score,
+            max_score=350,  # Zork I's maximum score; hardcoded regardless of the `game` argument
+            moves=moves,
+            locations_visited=locations_visited,
+            game_completed=self._is_game_over(observation),
+            history=history,
+        )
     def _build_prompt(self, observation: str) -> str:
         """Build the prompt for the LLM with context."""
         parts = []
+        parts.append(f"Current Score: {self.score}")
+        # Recent history
         if self.history:
             parts.append("\nRecent actions:")
             for entry in self.history[-3:]:
             if self.recent_actions and len(set(self.recent_actions[-3:])) == 1:
                 parts.append(f"\n[WARNING: You've been doing '{self.recent_actions[-1]}' repeatedly. TRY SOMETHING DIFFERENT!]")
         parts.append(f"\nCurrent situation:\n{observation}")
         parts.append("\nWhat do you do next?")
         return "\n".join(parts)
     def _parse_response(self, response: str, valid_tools: list[str]) -> tuple[str, str, dict]:
         """Parse the LLM response to extract thought, tool, and arguments."""
         thought = "No reasoning provided"
         lines = response.strip().split("\n")
+        for line in lines:
             line_clean = line.strip()
             line_upper = line_clean.upper()
             elif line_upper.startswith("TOOL:"):
                 raw_tool = line_clean.split(":", 1)[1].strip().lower()
                 raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
                 raw_tool = raw_tool.split()[0] if raw_tool else "play_action"
                 tool_name = raw_tool
             elif line_upper.startswith("ARGS:"):
                 args_part = line_clean.split(":", 1)[1].strip()
                 try:
                     args_part = args_part.replace("'", '"')
                     tool_args = json.loads(args_part)
                 except json.JSONDecodeError:
                     match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
                     if match:
                         tool_args = {"action": match.group(1)}
                     else:
                         tool_args = {"action": "look"}
         return thought, tool_name, tool_args
         """Validate and fix common tool call issues."""
         # Fix tool name
         if tool_name not in valid_tools:
             if tool_name in ["action", "do", "command"]:
                 tool_name = "play_action"
             elif tool_name in ["map", "location"]:
             else:
                 tool_name = "play_action"
+        # Fix action verbs
         if tool_name == "play_action":
             action = tool_args.get("action", "look")
             invalid_verb_map = {
                 "check": "examine",
                 "inspect": "examine",
                 words[0] = invalid_verb_map[words[0]]
                 action = " ".join(words)
             action = action.lower().strip()
             action = action.replace("**", "").replace("*", "").replace("`", "")
             action = " ".join(action.split())
         """Extract text from MCP tool result."""
         if hasattr(result, 'content') and result.content:
             return result.content[0].text
+        if isinstance(result, list) and result:
+            return result[0].text if hasattr(result[0], 'text') else str(result[0])
         return str(result)
     def _update_score(self, text: str) -> None:
         """Update score from game text."""
         patterns = [
             r'Score:\s*(\d+)',
+            r'score[:\s]+(\d+)',
+            r'\[Score:\s*(\d+)',
         ]
         for pattern in patterns:
             match = re.search(pattern, text, re.IGNORECASE)
             if match:
+                self.score = max(self.score, int(match.group(1)))
     def _is_game_over(self, text: str) -> bool:
         """Check if the game is over."""
         ]
         text_lower = text.lower()
         return any(phrase in text_lower for phrase in game_over_phrases)
 # =============================================================================
+# Local Testing
 # =============================================================================
+async def test_agent():
+    """Test the agent locally."""
+    from fastmcp import Client
+    agent = StudentAgent()
+    async with Client("mcp_server.py") as client:
+        result = await agent.run(
+            client=client,
+            game="zork1",
+            max_steps=20,
+            seed=42,
+            verbose=True,
+        )
+        print(f"\n{'=' * 50}")
+        print(f"Final Score: {result.final_score}")
+        print(f"Moves: {result.moves}")
+        print(f"Locations: {len(result.locations_visited)}")
 if __name__ == "__main__":
+    import asyncio
+    asyncio.run(test_agent())
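
Note on `_parse_response`: the diff view elides part of this method, so the line-based format it expects is easier to see in isolation. The sketch below is a standalone approximation, not the committed code. The `THOUGHT:` prefix and the defaults for `tool_name` and `tool_args` are assumptions, since that branch is not visible in the hunk. The function name `parse_react_response` is hypothetical, and the `isinstance` guard is an addition.

```python
import json
import re


def parse_react_response(response: str) -> tuple[str, str, dict]:
    """Parse 'THOUGHT: / TOOL: / ARGS:' lines into (thought, tool_name, tool_args)."""
    thought = "No reasoning provided"
    tool_name = "play_action"        # assumed default, not shown in the diff
    tool_args = {"action": "look"}   # assumed default, not shown in the diff

    for line in response.strip().split("\n"):
        line_clean = line.strip()
        line_upper = line_clean.upper()

        if line_upper.startswith("THOUGHT:"):  # assumed prefix (branch elided in the diff)
            thought = line_clean.split(":", 1)[1].strip()
        elif line_upper.startswith("TOOL:"):
            raw_tool = line_clean.split(":", 1)[1].strip().lower()
            # Strip markdown emphasis and backticks the model may add
            raw_tool = raw_tool.replace("**", "").replace("*", "").replace("`", "")
            tool_name = raw_tool.split()[0] if raw_tool else "play_action"
        elif line_upper.startswith("ARGS:"):
            # Normalise single quotes so Python-style dicts parse as JSON
            args_part = line_clean.split(":", 1)[1].strip().replace("'", '"')
            try:
                parsed = json.loads(args_part)
                # Guard added in this sketch: only accept a JSON object
                tool_args = parsed if isinstance(parsed, dict) else {"action": "look"}
            except json.JSONDecodeError:
                # Fall back to pulling out the action string from malformed JSON
                match = re.search(r'"action"\s*:\s*"([^"]+)"', args_part)
                tool_args = {"action": match.group(1)} if match else {"action": "look"}

    return thought, tool_name, tool_args


if __name__ == "__main__":
    reply = (
        "THOUGHT: The mailbox may hold something.\n"
        "TOOL: `play_action`\n"
        "ARGS: {'action': 'open mailbox'}"
    )
    print(parse_react_response(reply))
```

Falling back to a regex and then to `look` means a malformed reply costs one harmless move rather than raising, which appears to be the intent of the original method.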

mcp_server/zork_server.py → example_submission/mcp_server.py RENAMED

@@ -1,27 +1,8 @@
 """
-Text Adventure MCP Server - Exposes text adventure games via Model Context Protocol.
-This server allows any MCP-compatible agent to play Zork and other text adventure
-games using tools for game actions, memory, mapping, and inventory.
-Uses FastMCP for simple, Pythonic MCP server implementation.
-Usage:
-    # Run directly (stdio transport) - default game is zork1
-    python mcp_server/zork_server.py
-    # Run with a different game
-    GAME=zork2 python mcp_server/zork_server.py
-    GAME=advent python mcp_server/zork_server.py
-    GAME=enchanter python mcp_server/zork_server.py
-    # Use with FastMCP dev tools
-    fastmcp dev mcp_server/zork_server.py
-    # Connect from an MCP client
-    from fastmcp import Client
-    async with Client("mcp_server/zork_server.py") as client:
-        result = await client.call_tool("play_action", {"action": "look"})
 """
 import sys
@@ -49,7 +30,7 @@ class GameState:
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
         self.history: list[tuple[str, str]] = []
-        self.explored_locations: dict[str, set[str]] = {}  # location -> set of exits
         self.current_location: str = self._extract_location(self.state.observation)
     def _extract_location(self, observation: str) -> str:
@@ -82,7 +63,7 @@ class GameState:
     def get_memory(self) -> str:
         """Get a summary of current game state."""
         recent = self.history[-5:] if self.history else []
-        recent_str = "\n".join([f"  > {a} → {r[:60]}..." for a, r in recent]) if recent else "  (none yet)"
         return f"""Current State:
 - Location: {self.current_location}
@@ -120,13 +101,10 @@ Current Observation:
         item_names = []
         for item in items:
             item_str = str(item)
-            # Handle Jericho's object format: "leaflet Parent4 Sibling0..."
-            # Look for "Parent" (case-insensitive) to find where metadata starts
             item_lower = item_str.lower()
             if "parent" in item_lower:
                 idx = item_lower.index("parent")
                 name = item_str[:idx].strip()
-                # Remove leading "obj123: " if present
                 if ":" in name:
                     name = name.split(":", 1)[1].strip()
                 item_names.append(name)
@@ -137,19 +115,9 @@ Current Observation:
                 item_names.append(item_str)
         return f"Inventory: {', '.join(item_names)}"
-    def get_valid_actions(self) -> str:
-        """Get list of valid actions in current state."""
-        try:
-            valid = self.env.get_valid_actions() if hasattr(self.env, 'get_valid_actions') else []
-            if valid:
-                return f"Valid actions: {', '.join(valid[:20])}"
-        except Exception:
-            pass
-        return "Valid actions: Try standard commands like look, north, south, east, west, take <item>, open <thing>"
-# Global game state (initialized on first use)
 _game_state: GameState | None = None
@@ -161,23 +129,15 @@ def get_game() -> GameState:
     return _game_state
-# ============================================================================
 # MCP Tools
-# ============================================================================
 @mcp.tool()
 def play_action(action: str) -> str:
     """
     Execute a game action in the text adventure.
-    Common commands:
-    - Movement: north, south, east, west, up, down, enter, exit (or n, s, e, w, u, d)
-    - Objects: take <item>, drop <item>, open <thing>, close <thing>, put <item> in <container>
-    - Look: look, examine <thing>, read <thing>
-    - Combat: attack <enemy> with <weapon>
-    - Light: turn on lamp, light match
-    - Other: wait, score, inventory
     Args:
         action: The command to execute (e.g., 'north', 'take lamp', 'open mailbox')
@@ -187,8 +147,9 @@ def play_action(action: str) -> str:
     game = get_game()
     result = game.take_action(action)
-    # Add score info if points were earned
-    score_info = ""
     if game.state.reward > 0:
         score_info = f"\n\n+{game.state.reward} points! (Total: {game.state.score})"
@@ -204,9 +165,7 @@ def memory() -> str:
     """
     Get a summary of the current game state.
-    Returns your location, score, moves, recent actions, and current observation.
-    Use this to understand where you are and what happened recently.
-    Very useful for avoiding loops and tracking progress.
     """
     return get_game().get_memory()
@@ -214,10 +173,9 @@ def memory() -> str:
 @mcp.tool()
 def get_map() -> str:
     """
-    Get a map showing all locations you have explored and the connections between them.
-    Useful for navigation and planning routes back to previous locations.
-    The map builds up as you explore more of the game world.
     """
     return get_game().get_map()
@@ -226,195 +184,13 @@ def get_map() -> str:
 def inventory() -> str:
     """
     Check what items you are currently carrying.
-    Essential before trying to use, drop, or interact with items.
-    Most games have an inventory limit, so manage your items wisely.
     """
     return get_game().get_inventory()
-@mcp.tool()
-def valid_actions() -> str:
-    """
-    Get a list of valid actions available in the current game state.
-    Helpful when stuck or unsure what commands the game accepts.
-    Note: This may not include all possible actions, just common ones.
-    """
-    return get_game().get_valid_actions()
-@mcp.tool()
-def reset_game(game: str = "zork1") -> str:
-    """
-    Reset the game to the beginning or switch to a different game.
-    Use this to start over if you get stuck, die, or want to try a different game.
-    Args:
-        game: Game name (e.g., 'zork1', 'zork2', 'advent', 'enchanter')
-              Use list_games() to see available options.
-    Returns:
-        The initial game text
-    """
-    global _game_state
-    try:
-        _game_state = GameState(game)
-        return f"Game reset to {game}.\n\n{_game_state.state.observation}"
-    except ValueError as e:
-        return f"Error: {e}"
-@mcp.tool()
-def list_games() -> str:
-    """
-    List all available text adventure games.
-    Returns:
-        List of game names that can be passed to reset_game()
-    """
-    games = list_available_games()
-    return f"Available games ({len(games)} total):\n" + ", ".join(games)
-@mcp.tool()
-def hint() -> str:
-    """
-    Get a hint about what to do next based on your current situation.
-    Provides general guidance without spoiling puzzle solutions.
-    """
-    game = get_game()
-    location = game.current_location.lower()
-    inv = game.get_inventory().lower()
-    observation = game.state.observation.lower()
-    hints = []
-    # Darkness detection (common in many games)
-    if "dark" in location or "dark" in observation or "pitch black" in observation:
-        hints.append("It's dangerous in the dark! You need a light source.")
-        hints.append("If you have a lamp, try 'turn on lamp'.")
-    # Common items to look for
-    if "lamp" in observation and "lamp" not in inv:
-        hints.append("There's a lamp here - light sources are essential!")
-    if "lantern" in observation and "lantern" not in inv:
-        hints.append("There's a lantern here - you'll need light for dark areas!")
-    if "sword" in observation and "sword" not in inv:
-        hints.append("A sword might be useful for combat encounters.")
-    if "key" in observation and "key" not in inv:
-        hints.append("A key might unlock something important.")
-    # Container hints
-    if any(word in observation for word in ["mailbox", "chest", "box", "container", "cabinet"]):
-        hints.append("Try opening containers to find hidden items.")
-    # Door/window hints
-    if "door" in observation or "window" in observation:
-        hints.append("There might be a way in or out here. Try 'open' commands.")
-    # General hints if nothing specific found
-    if not hints:
-        hints.append("Explore all directions: north, south, east, west, up, down.")
-        hints.append("Examine interesting objects with 'examine <thing>'.")
-        hints.append("Pick up useful items with 'take <item>'.")
-        hints.append("Open containers and read documents for clues.")
-    return "Hints:\\n" + "\\n".join(f"  - {h}" for h in hints)
-# ============================================================================
-# MCP Resources
-# ============================================================================
-@mcp.resource("game://state")
-def get_state_resource() -> str:
-    """Current game state as a resource."""
-    return get_game().get_memory()
-@mcp.resource("game://history")
-def get_history_resource() -> str:
-    """Complete action history as a resource."""
-    game = get_game()
-    if not game.history:
-        return "No actions taken yet."
-    lines = [f"{i+1}. {action} -> {result[:80]}..." for i, (action, result) in enumerate(game.history)]
-    return "\n".join(lines)
-@mcp.resource("game://map")
-def get_map_resource() -> str:
-    """Explored map as a resource."""
-    return get_game().get_map()
-# ============================================================================
-# Game Prompt (for agents)
-# ============================================================================
-GAME_PROMPT = """You are playing a classic text adventure game.
-## YOUR GOAL
-Explore the world, solve puzzles, collect treasures, and maximize your score.
-## VALID COMMANDS (use ONLY these exact verbs)
-Movement:
-  north, south, east, west, up, down (or n, s, e, w, u, d)
-  enter, exit, climb, cross, go <direction>
-Looking:
-  look, examine <thing>, look at <thing>, look in <thing>, read <thing>
-Objects:
-  take <item>, drop <item>, pick up <item>
-  open <thing>, close <thing>, unlock <thing> with <key>
-  put <item> in <container>, give <item> to <person>
-Light:
-  turn on lamp, turn off lamp, light match
-Combat:
-  attack <enemy> with <weapon>, kill <enemy> with <weapon>
-Other:
-  inventory (or i), wait (or z), score
-  push <thing>, pull <thing>, move <thing>
-  tie <rope> to <thing>, eat <food>, wave <item>
-## FORBIDDEN VERBS (these will NOT work):
-  check, inspect, search, investigate, grab, pick, use, interact,
-  go to, walk to, head to, travel, proceed
-## STRATEGY TIPS
-1. Explore systematically - check all directions
-2. Read everything - open containers, read documents, examine objects
-3. Use get_map() to track explored locations
-4. Light is essential - find a light source before dark areas!
-5. Manage inventory - you can only carry limited items
-## GETTING STARTED
-1. Call memory() to see your current state
-2. Explore your starting area thoroughly
-3. Pick up useful items (light sources, weapons, keys)
-Good luck!
-"""
-def get_game_prompt(game: str = "zork1") -> str:
-    """Get the system prompt for playing text adventures."""
-    prompt = GAME_PROMPT
-    prompt += f"\n\nNote: Currently playing {game}. Use list_games() to see all 57 available games."
-    return prompt
-# ============================================================================
 # Main
-# ============================================================================
 if __name__ == "__main__":
     mcp.run()

 """
+Example: MCP Server for Text Adventures
+A complete MCP server that exposes text adventure games via tools.
+This demonstrates a full-featured server with memory, mapping, and inventory.
 """
 import sys
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
         self.history: list[tuple[str, str]] = []
+        self.explored_locations: dict[str, set[str]] = {}
         self.current_location: str = self._extract_location(self.state.observation)
     def _extract_location(self, observation: str) -> str:
     def get_memory(self) -> str:
         """Get a summary of current game state."""
         recent = self.history[-5:] if self.history else []
+        recent_str = "\n".join([f"  > {a} -> {r[:60]}..." for a, r in recent]) if recent else "  (none yet)"
         return f"""Current State:
 - Location: {self.current_location}
         item_names = []
         for item in items:
             item_str = str(item)
             item_lower = item_str.lower()
             if "parent" in item_lower:
                 idx = item_lower.index("parent")
                 name = item_str[:idx].strip()
                 if ":" in name:
                     name = name.split(":", 1)[1].strip()
                 item_names.append(name)
                 item_names.append(item_str)
         return f"Inventory: {', '.join(item_names)}"
+# Global game state
 _game_state: GameState | None = None
     return _game_state
+# =============================================================================
 # MCP Tools
+# =============================================================================
 @mcp.tool()
 def play_action(action: str) -> str:
     """
     Execute a game action in the text adventure.
     Args:
         action: The command to execute (e.g., 'north', 'take lamp', 'open mailbox')
     game = get_game()
     result = game.take_action(action)
+    # Add score info (replaced by the "+N points!" message below when a reward was earned)
+    score_info = f"\n\n[Score: {game.state.score} | Moves: {game.state.moves}]"
     if game.state.reward > 0:
         score_info = f"\n\n+{game.state.reward} points! (Total: {game.state.score})"
     """
     Get a summary of the current game state.
+    Returns location, score, moves, recent actions, and current observation.
     """
     return get_game().get_memory()
 @mcp.tool()
 def get_map() -> str:
     """
+    Get a map showing explored locations and connections.
+    Useful for navigation and avoiding getting lost.
     """
     return get_game().get_map()
 def inventory() -> str:
     """
     Check what items you are currently carrying.
     """
     return get_game().get_inventory()
+# =============================================================================
 # Main
+# =============================================================================
 if __name__ == "__main__":
     mcp.run()
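
Note on the score footer: the `[Score: N | Moves: M]` suffix that `play_action` now appends is what the agent's `_update_score` patterns read. The sketch below reproduces both sides in isolation to show how they interact. The function names `format_footer` and `update_score` are hypothetical, and the server side is reduced to the three lines visible in the hunk. Anything the server appends in the elided lines is not modelled here.

```python
import re

# Patterns copied from the agent's _update_score
SCORE_PATTERNS = [
    r'Score:\s*(\d+)',
    r'score[:\s]+(\d+)',
    r'\[Score:\s*(\d+)',
]


def format_footer(score: int, moves: int, reward: int) -> str:
    """Mimic the visible part of play_action: footer, overwritten on reward."""
    score_info = f"\n\n[Score: {score} | Moves: {moves}]"
    if reward > 0:
        score_info = f"\n\n+{reward} points! (Total: {score})"
    return score_info


def update_score(current: int, text: str) -> int:
    """Return the highest score seen so far, as the agent's max() logic does."""
    for pattern in SCORE_PATTERNS:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            current = max(current, int(match.group(1)))
    return current


if __name__ == "__main__":
    plain_step = "Taken." + format_footer(score=10, moves=5, reward=0)
    reward_step = "Taken." + format_footer(score=10, moves=5, reward=10)
    print(update_score(0, plain_step))
    print(update_score(0, reward_step))
```

Under these assumptions, a step that earns points carries `(Total: N)` instead of `Score: N`, so none of the three patterns match on that step and the agent only picks up the new total from the footer of a later step. Because the agent takes `max()`, the tracked score never decreases.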

function_calling/controller.py DELETED

@@ -1,291 +0,0 @@
-"""
-Function-Calling Controller for Zork (API-Based)
-This controller uses the HuggingFace API's native function calling feature.
-The model is given tool schemas and can call them via the tools API.
-Model: Llama 3.2 3B Instruct (supports native function calling)
-Compare with simple_controller.py which uses text-based "parsing" approach.
-"""
-import os
-import json
-from dotenv import load_dotenv
-from huggingface_hub import InferenceClient
-from tools import ALL_TOOLS, set_game_state, add_to_history
-# Add parent directory to path to import games module
-import sys
-sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from games.zork_env import ZorkEnvironment
-# System prompt for the agent
-SYSTEM_PROMPT = """You are playing Zork, a classic text adventure game.
-## YOUR GOAL
-Explore, collect treasures (bring them to the trophy case), and maximize your score.
-## VALID COMMANDS (use ONLY these exact verbs)
-Movement:
-  north, south, east, west, up, down (or n, s, e, w, u, d)
-  enter, exit, climb, cross, go <direction>
-Looking:
-  look, examine <thing>, look at <thing>, look in <thing>, read <thing>
-Objects:
-  take <item>, drop <item>, pick up <item>
-  open <thing>, close <thing>, unlock <thing> with <key>
-  put <item> in <container>, give <item> to <person>
-Light:
-  turn on lamp, turn off lamp, light match
-Combat:
-  attack <enemy> with <weapon>, kill <enemy> with <weapon>
-Other:
-  inventory (or i), wait (or z), score, save, restore
-  push <thing>, pull <thing>, move <thing>, tie <rope> to <thing>
-  eat <food>, drink <liquid>, wave <item>
-## FORBIDDEN (these will NOT work):
-  check, inspect, search, investigate, grab, pick, use, interact,
-  go to, walk to, head to, travel, proceed
-## YOUR TOOLS
-  memory()    - See current state and recent actions
-  get_map()   - See explored locations
-  inventory() - Check what you're carrying
-## RESPONSE FORMAT
-When you want to take a game action, respond with:
-  ACTION: <command>
-Examples:
-  ACTION: open mailbox
-  ACTION: north
-  ACTION: take lamp
-  ACTION: examine leaflet"""
-# Valid Zork command verbs for validation
-VALID_VERBS = {
-    "north", "south", "east", "west", "up", "down", "n", "s", "e", "w", "u", "d",
-    "look", "l", "examine", "x", "read",
-    "take", "get", "drop", "put", "give",
-    "open", "close", "unlock", "lock",
-    "turn", "light", "extinguish", "blow",
-    "attack", "kill", "fight", "hit",
-    "enter", "exit", "go", "climb", "jump",
-    "inventory", "i", "wait", "z", "score",
-    "move", "push", "pull", "tie", "untie",
-    "eat", "drink", "smell", "touch", "rub",
-    "wave", "raise", "lower", "pour",
-    "say", "answer", "yes", "no",
-    "pray", "odysseus", "echo", "hello",
-}
-def validate_action(action: str) -> str:
-    """Validate and potentially fix an action."""
-    action = action.strip().lower()
-    if not action:
-        return "look"
-    verb = action.split()[0]
-    if verb in VALID_VERBS:
-        return action
-    # Common corrections
-    corrections = {
-        "check": "examine",
-        "inspect": "examine",
-        "search": "examine",
-        "grab": "take",
-        "pick": "take",
-        "see": "look",
-        "view": "look",
-        "walk": "go",
-    }
-    if verb in corrections:
-        return corrections[verb] + action[len(verb):]
-    return "look"  # Default fallback
-def build_tool_schemas():
-    """Convert LangChain tools to OpenAI function schemas."""
-    schemas = []
-    for tool in ALL_TOOLS:
-        schema = {
-            "type": "function",
-            "function": {
-                "name": tool.name,
-                "description": tool.description,
-                "parameters": {
-                    "type": "object",
-                    "properties": {},
-                    "required": []
-                }
-            }
-        }
-        schemas.append(schema)
-    return schemas
-def run_tool(tool_name: str) -> str:
-    """Execute a tool by name and return its result."""
-    for tool in ALL_TOOLS:
-        if tool.name == tool_name:
-            return tool.invoke({})
-    return f"Unknown tool: {tool_name}"
-class FunctionCallingController:
-    """Controller using LLM API-based function calling."""
-    def __init__(self, model: str = "meta-llama/Llama-3.2-3B-Instruct"):
-        load_dotenv()
-        token = os.getenv("HF_TOKEN")
-        if not token:
-            raise ValueError("HF_TOKEN not set in environment")
-        self.client = InferenceClient(token=token)
-        self.model = os.getenv("HF_MODEL", model)
-        self.tool_schemas = build_tool_schemas()
-    def get_action(self, observation: str, game_state) -> str:
-        """Get the next action from the LLM."""
-        # Update tool state
-        set_game_state(
-            observation=observation,
-            inventory=list(game_state.inventory) if game_state.inventory else [],
-            score=game_state.score,
-            moves=game_state.moves
-        )
-        # Build messages fresh each time (simpler than managing tool history)
-        messages = [
-            {"role": "system", "content": SYSTEM_PROMPT},
-            {"role": "user", "content": f"Game output:\n{observation}\n\nWhat do you do?"}
-        ]
-        # Allow up to 3 tool calls before requiring action
-        for _ in range(3):
-            response = self.client.chat.completions.create(
-                model=self.model,
-                messages=messages,
-                tools=self.tool_schemas,
-                tool_choice="auto",
-                max_tokens=300,
-            )
-            message = response.choices[0].message
-            # Check if model wants to use a tool
-            if message.tool_calls:
-                tool_call = message.tool_calls[0]
-                tool_name = tool_call.function.name
-                print(f"  [Tool] {tool_name}")
-                tool_result = run_tool(tool_name)
-                print(f"  {tool_result[:100]}...")
-                # Add tool interaction to messages for next iteration
-                messages.append({
-                    "role": "assistant",
-                    "content": None,
-                    "tool_calls": [{
-                        "id": tool_call.id,
-                        "type": "function",
-                        "function": {"name": tool_name, "arguments": "{}"}
-                    }]
-                })
-                messages.append({
-                    "role": "tool",
-                    "tool_call_id": tool_call.id,
-                    "content": tool_result
-                })
-                # Continue to get the actual action
-                continue
-            # Model responded with text - extract action
-            content = message.content or ""
-            # Look for ACTION: in response
-            if "ACTION:" in content.upper():
-                for line in content.split('\n'):
-                    if "ACTION:" in line.upper():
-                        action = line.split(":", 1)[1].strip().lower()
-                        validated = validate_action(action)
-                        if validated:
-                            return validated
-                        else:
-                            print(f"  [Warning] Invalid action '{action}', defaulting to 'look'")
-                            return "look"
-            # If no ACTION found, try to extract a command from the response
-            content_lower = content.lower().strip()
-            validated = validate_action(content_lower)
-            if validated:
-                return validated
-            # Default
-            return "look"
-        # After 3 tool calls, just return look
-        return "look"
-def main():
-    """Run the API-based function-calling controller."""
-    print("=" * 60)
-    print("Zork - API Function Calling Controller")
-    print("   (using Llama 3.2 3B with native tool calling)")
-    print("=" * 60)
-    controller = FunctionCallingController()
-    env = ZorkEnvironment("zork1")
-    state = env.reset()
-    print(f"\n{state.observation}\n")
-    max_steps = 30
-    for step in range(max_steps):
-        print(f"\n{'─' * 50}")
-        print(f"Step {step + 1}/{max_steps} | Score: {state.score}")
-        print("─" * 50)
-        action = controller.get_action(state.observation, state)
-        print(f"\n> ACTION: {action}")
-        # Take action in game
-        state = env.step(action)
-        add_to_history(action, state.observation)
-        print(f"\n{state.observation}")
-        if state.reward > 0:
-            print(f"\n+{state.reward} points!")
-        if state.done:
-            print("\nGAME OVER!")
-            break
-    print(f"\n{'=' * 60}")
-    print(f"Final Score: {state.score}")
-    print("=" * 60)
-if __name__ == "__main__":
-    main()
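
Note on the removed `validate_action` in `function_calling/controller.py`: it falls back to `"look"` for every unknown verb, so it always returns a non-empty string and the `else` branch guarded by `if validated:` in `get_action` can never run. A minimal sketch of the alternative, assuming we want the caller to see the failure: the validator returns `None` and the fallback is applied one level up. The shortened `VALID_VERBS` and `CORRECTIONS` tables and the `choose_action` helper are illustrative and were not part of the removed module.

```python
from typing import Optional

# Abbreviated tables for illustration; the removed module listed many more verbs.
VALID_VERBS = {"north", "south", "look", "examine", "take", "open", "go"}
CORRECTIONS = {"check": "examine", "inspect": "examine", "grab": "take", "walk": "go"}


def validate_action(action: str) -> Optional[str]:
    """Return a normalized command, or None if the verb is not recognized."""
    action = action.strip().lower()
    if not action:
        return None
    # Split the verb from the rest of the command.
    verb, _, rest = action.partition(" ")
    if verb in VALID_VERBS:
        return action
    if verb in CORRECTIONS:
        # Swap the verb and keep the remainder of the command unchanged.
        return (CORRECTIONS[verb] + " " + rest).strip()
    return None


def choose_action(raw: str, fallback: str = "look") -> str:
    """Hypothetical caller: apply the fallback only when validation fails."""
    validated = validate_action(raw)
    return validated if validated is not None else fallback
```

With this split, the caller can log a warning or ask the model to retry before falling back, which is what the text-based controller in `simple_controller.py` did.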

function_calling/simple_controller.py DELETED Viewed

@@ -1,268 +0,0 @@
-"""
-Function-Calling Controller for Zork (Text-Based)
-This controller uses text-based "function calling" - the LLM outputs
-TOOL: <name> or ACTION: <command> and we parse the text response.
-Model: Qwen 2.5 7B Instruct (any chat model works)
-This approach is:
-- Simpler and more reliable than API-based function calling
-- Works with any chat model (no special support needed)
-Compare with controller.py which uses API-based tool calling.
-"""
-import os
-import re
-from dotenv import load_dotenv
-from huggingface_hub import InferenceClient
-from tools import ALL_TOOLS, set_game_state, add_to_history
-# Add parent directory to path
-import sys
-sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from games.zork_env import ZorkEnvironment
-SYSTEM_PROMPT = """You are playing Zork, a classic text adventure game.
-## YOUR GOAL
-Explore, collect treasures (bring them to the trophy case), and maximize your score.
-## VALID COMMANDS (use ONLY these exact verbs)
-Movement:
-  north, south, east, west, up, down (or n, s, e, w, u, d)
-  enter, exit, climb, cross, go <direction>
-Looking:
-  look, examine <thing>, look at <thing>, look in <thing>, read <thing>
-Objects:
-  take <item>, drop <item>, pick up <item>
-  open <thing>, close <thing>, unlock <thing> with <key>
-  put <item> in <container>, give <item> to <person>
-Light:
-  turn on lamp, turn off lamp, light match
-Combat:
-  attack <enemy> with <weapon>, kill <enemy> with <weapon>
-Other:
-  inventory (or i), wait (or z), score, save, restore
-  push <thing>, pull <thing>, move <thing>, tie <rope> to <thing>
-  eat <food>, drink <liquid>, wave <item>
-## FORBIDDEN (these will NOT work):
-  check, inspect, search, investigate, grab, pick, use, interact,
-  go to, walk to, head to, travel, proceed
-## YOUR TOOLS
-  TOOL: memory    - See current state and recent actions
-  TOOL: get_map   - See explored locations
-  TOOL: inventory - Check what you're carrying
-## RESPONSE FORMAT
-Either use a tool:
-  TOOL: memory
-Or take a game action:
-  ACTION: open mailbox
-Always respond with TOOL: or ACTION: followed by your choice."""
-# Valid Zork command verbs for validation
-VALID_VERBS = {
-    "north", "south", "east", "west", "up", "down", "n", "s", "e", "w", "u", "d",
-    "look", "l", "examine", "x", "read",
-    "take", "get", "drop", "put", "give",
-    "open", "close", "unlock", "lock",
-    "turn", "light", "extinguish", "blow",
-    "attack", "kill", "fight", "hit",
-    "enter", "exit", "go", "climb", "jump",
-    "inventory", "i", "wait", "z", "score",
-    "move", "push", "pull", "tie", "untie",
-    "eat", "drink", "smell", "touch", "rub",
-    "wave", "raise", "lower", "pour",
-    "say", "answer", "yes", "no",
-    "pray", "odysseus", "echo", "hello",
-}
-def run_tool(tool_name: str) -> str:
-    """Execute a tool by name."""
-    tool_name = tool_name.strip().lower().replace(" ", "_")
-    for tool in ALL_TOOLS:
-        if tool.name == tool_name:
-            return tool.invoke({})
-    return f"Unknown tool: {tool_name}. Available: memory, get_map, inventory"
-class SimpleController:
-    """Controller using text-based tool calling."""
-    def __init__(self, model: str = "Qwen/Qwen2.5-7B-Instruct"):
-        load_dotenv()
-        token = os.getenv("HF_TOKEN")
-        if not token:
-            raise ValueError("HF_TOKEN not set in environment")
-        self.client = InferenceClient(token=token)
-        self.model = os.getenv("HF_MODEL", model)
-        self.messages = []
-    def _call_llm(self, user_message: str) -> str:
-        """Call the LLM and get response."""
-        self.messages.append({"role": "user", "content": user_message})
-        # Keep conversation short
-        if len(self.messages) > 15:
-            self.messages = self.messages[-15:]
-        response = self.client.chat.completions.create(
-            model=self.model,
-            messages=[{"role": "system", "content": SYSTEM_PROMPT}] + self.messages,
-            max_tokens=150,
-            temperature=0.7,
-        )
-        reply = response.choices[0].message.content or ""
-        self.messages.append({"role": "assistant", "content": reply})
-        return reply
-    def _validate_action(self, action: str) -> str | None:
-        """Validate and potentially fix an action. Returns None if invalid."""
-        action = action.strip().lower()
-        if not action:
-            return None
-        # Get the first word (verb)
-        verb = action.split()[0]
-        # Check if it's a valid verb
-        if verb in VALID_VERBS:
-            return action
-        # Try common corrections
-        corrections = {
-            "check": "examine",
-            "inspect": "examine",
-            "search": "examine",
-            "grab": "take",
-            "pick": "take",  # "pick up" -> "take"
-            "see": "look",
-            "view": "look",
-            "walk": "go",
-        }
-        if verb in corrections:
-            fixed = corrections[verb] + action[len(verb):]
-            print(f"  [Correcting] '{verb}' -> '{corrections[verb]}'")
-            return fixed
-        return None
-    def get_action(self, observation: str, game_state) -> str:
-        """Get the next action, allowing tool use."""
-        # Update tool state
-        set_game_state(
-            observation=observation,
-            inventory=list(game_state.inventory) if game_state.inventory else [],
-            score=game_state.score,
-            moves=game_state.moves
-        )
-        prompt = f"Game:\n{observation}\n\nRespond with TOOL: or ACTION:"
-        # Allow up to 3 tool calls before requiring an action
-        for _ in range(3):
-            response = self._call_llm(prompt)
-            # Check for TOOL:
-            tool_match = re.search(r'TOOL:\s*(\w+)', response, re.IGNORECASE)
-            if tool_match:
-                tool_name = tool_match.group(1)
-                print(f"  [Tool] {tool_name}")
-                result = run_tool(tool_name)
-                print(f"  {result[:80]}...")
-                # Feed result back
-                prompt = f"Tool result:\n{result}\n\nNow respond with TOOL: or ACTION:"
-                continue
-            # Check for ACTION:
-            action_match = re.search(r'ACTION:\s*(.+)', response, re.IGNORECASE)
-            if action_match:
-                action = action_match.group(1).strip().lower()
-                # Clean up action (remove quotes, extra text)
-                action = action.split('\n')[0].strip('"\'')
-                # Validate the action
-                validated = self._validate_action(action)
-                if validated:
-                    return validated
-                else:
-                    print(f"  [Warning] Invalid action '{action}', asking for retry...")
-                    prompt = f"'{action}' is not a valid Zork command. Use verbs like: look, examine, take, open, north, south, etc.\n\nRespond with ACTION:"
-                    continue
-            # If neither, try to extract a command
-            words = response.lower().split()
-            for cmd in ["north", "south", "east", "west", "up", "down",
-                       "look", "take", "open", "enter", "examine"]:
-                if cmd in words:
-                    idx = words.index(cmd)
-                    return " ".join(words[idx:idx+3])
-            return "look"
-        return "look"
-def main():
-    """Run the simple controller."""
-    print("=" * 60)
-    print("Zork - Simple Function Calling Demo")
-    print("=" * 60)
-    controller = SimpleController()
-    env = ZorkEnvironment("zork1")
-    state = env.reset()
-    print(f"\n{state.observation}\n")
-    max_steps = 30
-    for step in range(max_steps):
-        print(f"\n{'─' * 50}")
-        print(f"Step {step + 1}/{max_steps} | Score: {state.score}")
-        print("─" * 50)
-        action = controller.get_action(state.observation, state)
-        print(f"\n> ACTION: {action}")
-        state = env.step(action)
-        add_to_history(action, state.observation)
-        print(f"\n{state.observation}")
-        if state.reward > 0:
-            print(f"\n+{state.reward} points!")
-        if state.done:
-            print("\nGAME OVER!")
-            break
-    print(f"\n{'=' * 60}")
-    print(f"Final Score: {state.score}")
-    print("=" * 60)
-if __name__ == "__main__":
-    main()
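
The removed `SimpleController.get_action` relies on two regular expressions to decide whether a reply is a tool request or a game command. The sketch below isolates that parsing step so it can be read on its own. The `parse_reply` name and its `(kind, value)` return shape are assumptions made for this illustration; the regular expressions are the ones from the removed code.

```python
import re
from typing import Tuple

# Same patterns as the removed controller; matching is case-insensitive.
TOOL_RE = re.compile(r"TOOL:\s*(\w+)", re.IGNORECASE)
ACTION_RE = re.compile(r"ACTION:\s*(.+)", re.IGNORECASE)


def parse_reply(reply: str) -> Tuple[str, str]:
    """Classify an LLM reply as ("tool", name), ("action", command) or ("none", "")."""
    # A tool request takes priority over an action, as in the removed loop.
    tool = TOOL_RE.search(reply)
    if tool:
        return ("tool", tool.group(1).lower())
    action = ACTION_RE.search(reply)
    if action:
        # ".+" stops at the end of the line, so only one line is captured.
        command = action.group(1).strip().strip("\"'").lower()
        return ("action", command)
    return ("none", "")
```

Because `TOOL:` is checked first, a reply that contains both markers is treated as a tool call and the action is ignored for that turn.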

function_calling/tools.py DELETED Viewed

@@ -1,127 +0,0 @@
-"""
-Simple tools for the Zork agent using LangChain's tool decorator.
-"""
-from langchain_core.tools import tool
-# Game state that tools can access (set by the controller)
-_game_state = {
-    "observation": "",
-    "inventory": [],
-    "score": 0,
-    "moves": 0,
-    "history": [],  # List of (action, result) tuples
-}
-def set_game_state(observation: str, inventory: list, score: int, moves: int):
-    """Update the game state (called by controller after each action)."""
-    _game_state["observation"] = observation
-    _game_state["inventory"] = inventory
-    _game_state["score"] = score
-    _game_state["moves"] = moves
-def add_to_history(action: str, result: str):
-    """Add an action and its result to history."""
-    _game_state["history"].append((action, result))
-    # Keep only last 10 actions
-    if len(_game_state["history"]) > 10:
-        _game_state["history"] = _game_state["history"][-10:]
-@tool
-def memory() -> str:
-    """Get a summary of the current game state including location, score, and recent actions."""
-    obs = _game_state["observation"]
-    score = _game_state["score"]
-    moves = _game_state["moves"]
-    # Extract location (first line of observation)
-    lines = obs.strip().split('\n')
-    location = lines[0] if lines else "Unknown"
-    # Recent actions
-    recent = _game_state["history"][-5:] if _game_state["history"] else []
-    recent_str = "\n".join([f"  > {a} → {r[:50]}..." for a, r in recent]) if recent else "  (none yet)"
-    return f"""Current State:
-- Location: {location}
-- Score: {score} points
-- Moves: {moves}
-Recent Actions:
-{recent_str}
-Current Observation:
-{obs}"""
-@tool
-def get_map() -> str:
-    """Get a map showing known locations and connections based on exploration history."""
-    # Build a simple map from history
-    locations = set()
-    connections = []
-    prev_loc = None
-    for action, result in _game_state["history"]:
-        # Extract location from result
-        lines = result.strip().split('\n')
-        if lines:
-            loc = lines[0]
-            locations.add(loc)
-            # If this was a movement action, record connection
-            if action in ["north", "south", "east", "west", "up", "down", "enter", "exit"]:
-                if prev_loc and prev_loc != loc:
-                    connections.append(f"  {prev_loc} --{action}--> {loc}")
-                prev_loc = loc
-    if not locations:
-        return "Map: No locations explored yet. Try moving around!"
-    loc_list = "\n".join([f"  - {loc}" for loc in sorted(locations)])
-    conn_list = "\n".join(connections[-10:]) if connections else "  (no connections recorded)"
-    return f"""Known Locations:
-{loc_list}
-Connections:
-{conn_list}"""
-@tool
-def inventory() -> str:
-    """Get the list of items currently in your inventory."""
-    items = _game_state["inventory"]
-    if not items:
-        return "Inventory: You are empty-handed."
-    # Clean up item names (Jericho returns objects with metadata)
-    item_names = []
-    for item in items:
-        item_str = str(item)
-        # Handle Jericho's object format: "leaflet Parent4 Sibling0..."
-        # Look for "Parent" (case-insensitive) to find where metadata starts
-        item_lower = item_str.lower()
-        if "parent" in item_lower:
-            idx = item_lower.index("parent")
-            name = item_str[:idx].strip()
-            # Remove leading "obj123: " if present
-            if ":" in name:
-                name = name.split(":", 1)[1].strip()
-            item_names.append(name)
-        elif ":" in item_str:
-            name = item_str.split(":")[1].strip()
-            item_names.append(name)
-        else:
-            item_names.append(item_str)
-    return f"Inventory: {', '.join(item_names)}"
-# Export all tools
-ALL_TOOLS = [memory, get_map, inventory]
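
The name clean-up inside the removed `inventory` tool can be expressed as one small helper. This is a sketch under assumptions: `clean_item_name` is a hypothetical name, and the sample strings in the usage note are modelled on the formats mentioned in the code comments (`"leaflet Parent4 Sibling0..."`, `"obj123: "`), not taken from real Jericho output.

```python
def clean_item_name(item: object) -> str:
    """Strip object-tree metadata and an optional "objN:" prefix from an item."""
    text = str(item)
    # Metadata starts at the first "parent" (case-insensitive); drop it.
    idx = text.lower().find("parent")
    if idx != -1:
        text = text[:idx]
    # Remove a leading "obj123: " style prefix if one is present.
    if ":" in text:
        text = text.split(":", 1)[1]
    return text.strip()
```

For example, `clean_item_name("Obj87: leaflet Parent4 Sibling0 Child0")` yields `"leaflet"`. Like the removed code, this cuts at the first occurrence of "parent", so an item whose own name contains that substring would be truncated.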

mcp_server/README.md DELETED Viewed

@@ -1,83 +0,0 @@
-# Zork MCP Server
-This directory contains an MCP (Model Context Protocol) server that exposes Zork game tools to LLM agents.
-## Overview
-The MCP server wraps the Jericho Zork environment and provides tools that any MCP-compatible agent (like Mini SWE Agent) can use to play the game.
-## Tools Available
-| Tool | Description |
-|------|-------------|
-| `play_action(action)` | Execute a game command (e.g., "north", "take lamp") |
-| `memory()` | Get current state summary (location, score, recent actions) |
-| `get_map()` | View explored locations and connections |
-| `inventory()` | Check items you're carrying |
-| `valid_actions()` | Get hints on available commands |
-| `reset_game(game)` | Start over with zork1, zork2, or zork3 |
-| `hint()` | Get contextual hints for your situation |
-## Resources
-The server also exposes MCP resources:
-- `zork://state` - Current game state
-- `zork://history` - Complete action history
-- `zork://map` - Explored locations map
-## Running the Server
-### Standalone (for testing)
-```bash
-python mcp_server/zork_server.py
-```
-### With MCP Inspector (for debugging)
-```bash
-npx @modelcontextprotocol/inspector python mcp_server/zork_server.py
-```
-### With Mini SWE Agent
-```bash
-python play_zork.py
-```
-## Configuration
-The `mcp_config.json` file configures the server for use with MCP clients:
-```json
-{
-  "mcpServers": {
-    "zork": {
-      "command": "python",
-      "args": ["mcp_server/zork_server.py"]
-    }
-  }
-}
-```
-## Architecture
-```
-┌─────────────────────────────────────────┐
-│         MCP Client (Agent)              │
-│   (Mini SWE Agent / Claude / etc.)      │
-└──────────────────┬──────────────────────┘
-                   │ MCP Protocol (stdio)
-                   ▼
-┌─────────────────────────────────────────┐
-│         Zork MCP Server                 │
-│   (FastMCP - zork_server.py)            │
-│                                         │
-│   Tools: play_action, memory, map,      │
-│          inventory, valid_actions,      │
-│          reset_game, hint               │
-└──────────────────┬──────────────────────┘
-                   │
-                   ▼
-┌─────────────────────────────────────────┐
-│     Jericho + Frotz                     │
-│   (Z-machine game interpreter)          │
-└─────────────────────────────────────────┘
-```

mcp_server/__init__.py DELETED Viewed

@@ -1 +0,0 @@
-# Text Adventure MCP Server

mcp_server/mcp_config.json DELETED Viewed

@@ -1,9 +0,0 @@
-{
-  "mcpServers": {
-    "zork": {
-      "command": "python",
-      "args": ["mcp_server/zork_server.py"],
-      "cwd": "${workspaceFolder}"
-    }
-  }
-}

requirements.txt CHANGED Viewed

@@ -1,9 +1,12 @@
 # Core dependencies
 jericho
 python-dotenv
 # MCP Server
 fastmcp
 # Function calling (optional, for the alternative approach)
-langchain-core

 # Core dependencies
 jericho
 python-dotenv
+spacy
 # MCP Server
 fastmcp
 # Function calling (optional, for the alternative approach)
+langchain-core
+huggingface_hub

run_agent.py CHANGED Viewed

@@ -1,258 +1,138 @@
 #!/usr/bin/env python3
 """
-Unified Text Adventure Agent Runner
-Run different types of LLM agents to play text adventure games:
-  - react:     Basic ReAct agent with HuggingFace models
-  - function:  Function-calling controller (API-based or text-based)
-  - mcp:       MCP ReAct agent using FastMCP Client
 Usage:
-    python run_agent.py --mode react
-    python run_agent.py --mode function
-    python run_agent.py --mode mcp
 Examples:
-    # Run the basic ReAct agent
-    python run_agent.py --mode react
-    # Run the function-calling controller (API-based)
-    python run_agent.py --mode function
-    # Run the function-calling controller (text-based, works with any model)
-    python run_agent.py --mode function --simple
-    # Run with MCP ReAct agent (uses FastMCP Client)
-    python run_agent.py --mode mcp
-    # Play a different game
-    python run_agent.py --mode mcp --game advent
 """
 import argparse
 import sys
 import os
-import time
 from pathlib import Path
 # Add games module to path for discovering available games
 sys.path.insert(0, str(Path(__file__).parent))
-from games.zork_env import list_available_games, TextAdventureEnv
-# =============================================================================
-# Mode: ReAct Agent
-# =============================================================================
-def run_react_agent(args):
-    """Run the basic ReAct agent."""
-    from agents.react_agent import ReActAgent, ReActConfig
-    print("\n[ReAct] Running ReAct Agent")
-    print(f"   Game: {args.game}")
-    print(f"   Model: {args.model}")
-    print()
-    env = TextAdventureEnv(args.game)
-    config = ReActConfig(verbose=args.verbose, model=args.model)
-    agent = ReActAgent(config)
-    return run_game_loop(env, agent, args.max_steps, args.verbose)
-def run_game_loop(env, agent, max_steps: int, verbose: bool) -> dict:
-    """Common game loop for ReAct-style agents."""
-    state = env.reset()
-    agent.reset()
-    print("=" * 60)
-    print(f"{env.game.upper()} - Starting Game")
-    print(f"Max Score: {state.max_score}")
-    print("=" * 60)
-    print(f"\n{state.observation}\n")
-    start_time = time.time()
-    step = 0
-    try:
-        for step in range(1, max_steps + 1):
-            print(f"\n{'─' * 40}")
-            print(f"Step {step}")
-            print("─" * 40)
-            action = agent.choose_action(state.observation, state)
-            print(f"\n> {action}")
-            state = env.step(action)
-            print(f"\n{state.observation}")
-            if state.reward > 0:
-                print(f"\n+{state.reward} points! (Total: {state.score}/{state.max_score})")
-            elif state.reward < 0:
-                print(f"\n{state.reward} points! (Total: {state.score}/{state.max_score})")
-            else:
-                print(f"\nScore: {state.score}/{state.max_score}")
-            agent.update_history(action, state.observation, state)
-            if state.done:
-                print("\n" + "=" * 60)
-                print("GAME OVER!")
-                break
-    except KeyboardInterrupt:
-        print("\n\nGame interrupted by user")
-    elapsed_time = time.time() - start_time
-    return print_summary(env.game, state, step, elapsed_time)
-# =============================================================================
-# Mode: MCP ReAct Agent
-# =============================================================================
-def run_mcp_agent(args):
-    """Run MCP ReAct Agent using FastMCP Client."""
-    import asyncio
-    from agents.mcp_react_agent import MCPReActAgent, MCPAgentConfig
-    print("\n[MCP] Running MCP ReAct Agent with FastMCP")
     print(f"   Game: {args.game}")
-    print(f"   Model: {args.model}")
-    print(f"   Server: mcp_server/zork_server.py")
     print()
-    config = MCPAgentConfig(verbose=args.verbose, model=args.model, game=args.game)
-    agent = MCPReActAgent("mcp_server/zork_server.py", config)
-    return asyncio.run(agent.run(max_steps=args.max_steps))
-# =============================================================================
-# Mode: Function Calling
-# =============================================================================
-def run_function_calling(args):
-    """Run the function-calling controller."""
-    # Import the appropriate controller
-    sys.path.insert(0, str(Path(__file__).parent / "function_calling"))
-    from tools import add_to_history
-    if args.simple:
-        from simple_controller import SimpleController
-        print("\n[Function] Running Function Calling Controller (text-based)")
-        controller = SimpleController(model=args.model)
-    else:
-        from controller import FunctionCallingController
-        print("\n[Function] Running Function Calling Controller (API-based)")
-        controller = FunctionCallingController(model=args.model)
-    print(f"   Game: {args.game}")
-    print(f"   Model: {args.model}")
-    print()
-    env = TextAdventureEnv(args.game)
-    state = env.reset()
-    print("=" * 60)
-    print(f"{args.game.upper()} - Function Calling Mode")
-    print("=" * 60)
-    print(f"\n{state.observation}\n")
-    start_time = time.time()
-    step = 0
-    try:
-        for step in range(1, args.max_steps + 1):
-            print(f"\n{'─' * 50}")
-            print(f"Step {step}/{args.max_steps} | Score: {state.score}")
-            print("─" * 50)
-            action = controller.get_action(state.observation, state)
-            print(f"\n> ACTION: {action}")
-            state = env.step(action)
-            add_to_history(action, state.observation)
-            print(f"\n{state.observation}")
-            if state.reward > 0:
-                print(f"\n+{state.reward} points!")
-            if state.done:
-                print("\nGAME OVER!")
-                break
-    except KeyboardInterrupt:
-        print("\n\nGame interrupted by user")
-    elapsed_time = time.time() - start_time
-    return print_summary(args.game, state, step, elapsed_time)
-# =============================================================================
-# Common Utilities
-# =============================================================================
-def print_summary(game: str, state, step: int, elapsed_time: float) -> dict:
-    """Print game summary and return results dict."""
-    print("\n" + "=" * 60)
-    print("GAME SUMMARY")
-    print("=" * 60)
-    print(f"Game: {game}")
-    print(f"Final Score: {state.score}/{state.max_score} ({100*state.score/state.max_score:.1f}%)")
-    print(f"Total Moves: {state.moves}")
-    print(f"Steps Taken: {step}")
-    print(f"Time Elapsed: {elapsed_time:.1f} seconds")
-    print("=" * 60)
-    return {
-        "game": game,
-        "final_score": state.score,
-        "max_score": state.max_score,
-        "score_percentage": 100 * state.score / state.max_score,
-        "moves": state.moves,
-        "steps": step,
-        "elapsed_time": elapsed_time,
-        "game_over": state.done,
-    }
 def main():
     parser = argparse.ArgumentParser(
-        description="Run an LLM agent to play text adventure games",
         formatter_class=argparse.RawDescriptionHelpFormatter,
-        epilog="""
-Modes:
-  react     Basic ReAct agent (direct game interaction)
-  function  Function-calling controller (use --simple for text-based)
-  mcp       MCP ReAct agent using FastMCP Client (recommended)
 Examples:
-  python run_agent.py --mode react
-  python run_agent.py --mode function
-  python run_agent.py --mode function --simple  # text-based, any model
-  python run_agent.py --mode mcp                # MCP with FastMCP
-  python run_agent.py --mode mcp --game advent  # Play different game
-  python run_agent.py --mode mcp --model google/gemma-2-2b-it
         """
     )
     # Get available games for help text
     available_games = list_available_games()
     game_help = f"Game to play (default: zork1). {len(available_games)} games available."
     parser.add_argument(
-        "--mode", "-m",
         type=str,
-        default="react",
-        choices=["react", "function", "mcp"],
-        help="Which agent mode to use (default: react)"
     )
     parser.add_argument(
         "--game", "-g",
         type=str,
-        default="zork1",
         help=game_help
     )
     parser.add_argument(
@@ -260,31 +140,34 @@ Examples:
         action="store_true",
         help="List all available games and exit"
     )
     parser.add_argument(
         "--max-steps", "-n",
         type=int,
         default=100,
         help="Maximum number of steps to run (default: 100)"
     )
-    parser.add_argument(
-        "--model",
-        type=str,
-        default=None,
-        help="Model to use (default: meta-llama/Llama-3.2-3B-Instruct)"
-    )
     parser.add_argument(
         "--verbose", "-v",
         action="store_true",
         help="Show detailed reasoning from the agent"
     )
-    parser.add_argument(
-        "--simple",
-        action="store_true",
-        help="Use text-based function calling (works with any model, only for --mode function)"
-    )
     args = parser.parse_args()
     # Handle --list-games
     if args.list_games:
         print(f"\nAvailable games ({len(available_games)} total):\n")
@@ -295,41 +178,32 @@ Examples:
             print("  " + "  ".join(f"{g:<15}" for g in row))
         print()
         sys.exit(0)
     # Validate game choice
     if args.game.lower() not in available_games:
         print(f"\nError: Unknown game '{args.game}'")
         print(f"Use --list-games to see {len(available_games)} available options.")
         sys.exit(1)
-    # Get default model from environment
-    default_model = os.getenv("HF_MODEL", "meta-llama/Llama-3.2-3B-Instruct")
-    # Set model if not specified
-    if args.model is None:
-        args.model = default_model
     print("\n" + "=" * 60)
-    print("Text Adventure LLM Agent Runner")
     print("=" * 60)
-    print(f"Mode: {args.mode}" + (" (simple)" if args.simple else ""))
     print(f"Game: {args.game}")
     print(f"Max Steps: {args.max_steps}")
-    print(f"Model: {args.model}")
     print(f"Verbose: {args.verbose}")
-    # Run the selected mode
     try:
-        if args.mode == "react":
-            results = run_react_agent(args)
-        elif args.mode == "function":
-            results = run_function_calling(args)
-        elif args.mode == "mcp":
-            results = run_mcp_agent(args)
-        else:
-            print(f"Unknown mode: {args.mode}")
-            sys.exit(1)
     except FileNotFoundError as e:
         print(f"\n[Error] {e}")
         sys.exit(1)
@@ -344,7 +218,7 @@ Examples:
         print("\nMake sure to install dependencies:")
         print("  pip install -r requirements.txt")
         sys.exit(1)
     return results

 #!/usr/bin/env python3
 """
+Text Adventure Agent Runner
+Run the MCP ReAct agent to play text adventure games like Zork.
 Usage:
+    python run_agent.py
+    python run_agent.py --game advent
+    python run_agent.py --max-steps 50
+    python run_agent.py --agent hidden_submission
 Examples:
+    # Run on Zork 1 with example agent (default)
+    python run_agent.py
+    # Play a different game
+    python run_agent.py --game advent
+    # Use a different agent folder
+    python run_agent.py --agent hidden_submission
+    # List all available games
+    python run_agent.py --list-games
+    # Run with verbose output
+    python run_agent.py -v
 """
 import argparse
 import sys
 import os
+import asyncio
 from pathlib import Path
 # Add games module to path for discovering available games
 sys.path.insert(0, str(Path(__file__).parent))
+from games.zork_env import list_available_games
+def find_agent_folders() -> list[str]:
+    """Find all folders containing agent.py and mcp_server.py."""
+    project_root = Path(__file__).parent
+    agent_folders = []
+    for folder in project_root.iterdir():
+        if folder.is_dir():
+            agent_file = folder / "agent.py"
+            server_file = folder / "mcp_server.py"
+            if agent_file.exists() and server_file.exists():
+                agent_folders.append(folder.name)
+    return sorted(agent_folders)
+async def run_mcp_agent(args):
+    """Run MCP ReAct Agent from the specified folder."""
+    agent_folder = Path(__file__).parent / args.agent
+    agent_file = agent_folder / "agent.py"
+    server_file = agent_folder / "mcp_server.py"
+    # Validate folder structure
+    if not agent_folder.exists():
+        raise FileNotFoundError(f"Agent folder not found: {agent_folder}")
+    if not agent_file.exists():
+        raise FileNotFoundError(f"agent.py not found in {agent_folder}")
+    if not server_file.exists():
+        raise FileNotFoundError(f"mcp_server.py not found in {agent_folder}")
+    # Import from the specified folder
+    sys.path.insert(0, str(agent_folder))
+    from agent import StudentAgent
+    from fastmcp import Client
+    from fastmcp.client.transports import StdioTransport
+    print("\n[MCP] Running Student Agent with FastMCP")
+    print(f"   Agent: {args.agent}/")
     print(f"   Game: {args.game}")
     print()
+    agent = StudentAgent()
+    # Create transport for the MCP server
+    env_vars = os.environ.copy()
+    env_vars["GAME"] = args.game
+    transport = StdioTransport(
+        command=sys.executable,
+        args=[str(server_file)],
+        env=env_vars,
+    )
+    async with Client(transport) as client:
+        return await agent.run(
+            client=client,
+            game=args.game,
+            max_steps=args.max_steps,
+            seed=42,  # Using a fixed seed for direct running
+            verbose=args.verbose,
+        )
 def main():
+    # Find available agent folders
+    agent_folders = find_agent_folders()
     parser = argparse.ArgumentParser(
+        description="Run the MCP ReAct agent to play text adventure games",
         formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
 Examples:
+  python run_agent.py                           # Play the default game (lostpig)
+  python run_agent.py --game advent             # Play Adventure
+  python run_agent.py --agent hidden_submission # Use hidden agent
+  python run_agent.py --list-games              # List all games
+  python run_agent.py --list-agents             # List all agent folders
+  python run_agent.py -v                        # Verbose output
         """
     )
     # Get available games for help text
     available_games = list_available_games()
     game_help = f"Game to play (default: lostpig). {len(available_games)} games available."
+    agent_help = f"Agent folder to use (default: example_submission). Available: {', '.join(agent_folders)}"
     parser.add_argument(
+        "--agent", "-a",
         type=str,
+        default="example_submission",
+        help=agent_help
     )
     parser.add_argument(
         "--game", "-g",
         type=str,
+        default="lostpig",
         help=game_help
     )
     parser.add_argument(
         action="store_true",
         help="List all available games and exit"
     )
+    parser.add_argument(
+        "--list-agents",
+        action="store_true",
+        help="List all available agent folders and exit"
+    )
     parser.add_argument(
         "--max-steps", "-n",
         type=int,
         default=100,
         help="Maximum number of steps to run (default: 100)"
     )
     parser.add_argument(
         "--verbose", "-v",
         action="store_true",
         help="Show detailed reasoning from the agent"
     )
     args = parser.parse_args()
+    # Handle --list-agents
+    if args.list_agents:
+        print(f"\nAvailable agent folders ({len(agent_folders)} total):\n")
+        for folder in agent_folders:
+            print(f"  {folder}/")
+        print("\nEach folder must contain agent.py and mcp_server.py")
+        print()
+        sys.exit(0)
     # Handle --list-games
     if args.list_games:
         print(f"\nAvailable games ({len(available_games)} total):\n")
             print("  " + "  ".join(f"{g:<15}" for g in row))
         print()
         sys.exit(0)
+    # Validate agent choice
+    if args.agent not in agent_folders:
+        print(f"\nError: Unknown agent folder '{args.agent}'")
+        print(f"Available: {', '.join(agent_folders)}")
+        print("Use --list-agents to see details.")
+        sys.exit(1)
     # Validate game choice
     if args.game.lower() not in available_games:
         print(f"\nError: Unknown game '{args.game}'")
         print(f"Use --list-games to see {len(available_games)} available options.")
         sys.exit(1)
     print("\n" + "=" * 60)
+    print("Text Adventure MCP Agent Runner")
     print("=" * 60)
+    print(f"Agent: {args.agent}/")
     print(f"Game: {args.game}")
     print(f"Max Steps: {args.max_steps}")
     print(f"Verbose: {args.verbose}")
+    # Run the agent
     try:
+        results = asyncio.run(run_mcp_agent(args))
     except FileNotFoundError as e:
         print(f"\n[Error] {e}")
         sys.exit(1)
         print("\nMake sure to install dependencies:")
         print("  pip install -r requirements.txt")
         sys.exit(1)
     return results

submission_template/README.md ADDED

	@@ -0,0 +1,31 @@

+# Student Submission: Text Adventure Agent
+> Replace this with your name and student ID
+## Overview
+This is my submission for the Text Adventure Agent assignment. My agent uses the ReAct pattern to play text adventure games via MCP.
+## Approach
+<!-- Describe your approach here -->
+- What strategy does your agent use?
+- What tools did you implement in your MCP server?
+- Any interesting techniques or optimizations?
+## Files
+- `agent.py` - ReAct agent implementation
+- `mcp_server.py` - MCP server with game tools
+- `requirements.txt` - Additional dependencies (if any)
+## Local Testing
+```bash
+# Test the MCP server
+fastmcp dev mcp_server.py
+# Run the agent
+python agent.py
+```

submission_template/agent.py ADDED

	@@ -0,0 +1,279 @@

+"""
+Student Agent for Text Adventure Games
+This is your submission file. Implement the StudentAgent class to play
+text adventure games using the MCP server you also implement.
+Your agent should:
+1. Connect to the MCP server via the provided client
+2. Use the ReAct pattern (Thought -> Action -> Observation)
+3. Call MCP tools to interact with the game
+4. Maximize the game score within the step limit
+Required method:
+    async def run(self, client, game, max_steps, seed, verbose) -> RunResult
+The 'client' is a FastMCP Client already connected to your MCP server.
+Use it to call tools like: await client.call_tool("play_action", {"action": "look"})
+Tips:
+- Start by looking around and understanding your environment
+- Keep track of visited locations to avoid loops
+- Pick up useful items (lamp, sword, etc.)
+- The seed parameter should be used to set your LLM's seed for reproducibility
+"""
+import json
+import os
+import re
+from dataclasses import dataclass, field
+from typing import Optional
+from dotenv import load_dotenv
+from huggingface_hub import InferenceClient
+# Load environment variables
+load_dotenv()
+# =============================================================================
+# LLM Configuration - DO NOT MODIFY
+# =============================================================================
+# Model to use (fixed for fair evaluation)
+LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
+# Initialize the LLM client (uses HF_TOKEN from environment)
+_hf_token = os.getenv("HF_TOKEN")
+if not _hf_token:
+    raise ValueError("HF_TOKEN not found. Set it in your .env file.")
+LLM_CLIENT = InferenceClient(token=_hf_token)
+def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
+    """
+    Call the LLM with the given prompt. Use this function in your agent.
+    Args:
+        prompt: The user prompt (current game state, history, etc.)
+        system_prompt: The system prompt (instructions for the agent)
+        seed: Random seed for reproducibility
+        max_tokens: Maximum tokens in response (default: 300)
+    Returns:
+        The LLM's response text
+    Example:
+        response = call_llm(
+            prompt="You are in a forest. What do you do?",
+            system_prompt=SYSTEM_PROMPT,
+            seed=42,
+        )
+    """
+    messages = [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": prompt},
+    ]
+    response = LLM_CLIENT.chat.completions.create(
+        model=LLM_MODEL,
+        messages=messages,
+        temperature=0.0,  # Deterministic for reproducibility
+        max_tokens=max_tokens,
+        seed=seed,
+    )
+    return response.choices[0].message.content
+@dataclass
+class RunResult:
+    """Result of running the agent. Do not modify this class."""
+    final_score: int
+    max_score: int
+    moves: int
+    locations_visited: set[str]
+    game_completed: bool
+    error: Optional[str] = None
+    history: list[tuple[str, str, str]] = field(default_factory=list)
+# =============================================================================
+# System Prompt - Customize this for your agent
+# =============================================================================
+SYSTEM_PROMPT = """You are playing a classic text adventure game.
+GOAL: Explore the world, solve puzzles, and maximize your score.
+AVAILABLE TOOLS (use via MCP):
+- play_action: Execute a game command (north, take lamp, open mailbox, etc.)
+- memory: Get current game state and history (if implemented)
+- inventory: Check what you're carrying (if implemented)
+VALID GAME COMMANDS for play_action:
+- Movement: north, south, east, west, up, down, enter, exit
+- Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
+- Other: look, inventory, read <thing>, turn on lamp
+RESPOND IN THIS EXACT FORMAT (no markdown):
+THOUGHT: <your reasoning about what to do next>
+TOOL: <tool_name>
+ARGS: <JSON arguments, e.g., {"action": "look"}>
+Example:
+THOUGHT: I should look around to see where I am.
+TOOL: play_action
+ARGS: {"action": "look"}
+"""
+# =============================================================================
+# Student Agent - IMPLEMENT THIS CLASS
+# =============================================================================
+class StudentAgent:
+    """
+    Your ReAct agent implementation.
+    TODO:
+    1. Implement the run() method with the ReAct loop
+    2. Parse LLM responses to extract tool calls
+    3. Track state and avoid loops
+    Use the provided call_llm() function to interact with the LLM.
+    """
+    def __init__(self):
+        """Initialize your agent here."""
+        # TODO: Initialize any state tracking you need
+        # self.history = []
+        # self.visited_locations = set()
+        pass
+    async def run(
+        self,
+        client,  # FastMCP Client connected to your MCP server
+        game: str,
+        max_steps: int,
+        seed: int,
+        verbose: bool = False,
+    ) -> RunResult:
+        """
+        Run the agent for a game session.
+        Args:
+            client: FastMCP Client connected to your MCP server
+            game: Name of the game being played (e.g., "zork1")
+            max_steps: Maximum number of steps to take
+            seed: Random seed for reproducibility (use for LLM calls)
+            verbose: Whether to print detailed output
+        Returns:
+            RunResult with final score and statistics
+        """
+        # TODO: Implement your ReAct loop here
+        #
+        # Basic structure:
+        # 1. Get initial observation (call play_action with "look")
+        # 2. Loop for max_steps:
+        #    a. Build prompt with current observation and history
+        #    b. Call LLM to get thought and action
+        #    c. Parse the response to extract tool and args
+        #    d. Call the tool via client.call_tool(tool_name, args)
+        #    e. Update history and state
+        #    f. Check for game over
+        # 3. Return RunResult with final statistics
+        # Example of calling a tool:
+        # result = await client.call_tool("play_action", {"action": "look"})
+        # observation = result[0].text if result else "No response"
+        # Example of calling the LLM:
+        # response = call_llm(
+        #     prompt="Current observation: " + observation,
+        #     system_prompt=SYSTEM_PROMPT,
+        #     seed=seed,
+        # )
+        # Placeholder implementation - replace with your code
+        locations_visited = set()
+        history = []
+        final_score = 0
+        moves = 0
+        # TODO: Your implementation here
+        # ...
+        return RunResult(
+            final_score=final_score,
+            max_score=350,  # Zork1 max score, adjust if needed
+            moves=moves,
+            locations_visited=locations_visited,
+            game_completed=False,
+            history=history,
+        )
+    def _build_prompt(self, observation: str, history: list) -> str:
+        """
+        Build the prompt for the LLM.
+        TODO: Implement this to create effective prompts
+        """
+        # TODO: Combine system prompt, history, and current observation
+        pass
+    def _parse_response(self, response: str) -> tuple[str, str, dict]:
+        """
+        Parse LLM response to extract thought, tool name, and arguments.
+        TODO: Implement robust parsing
+        Returns:
+            Tuple of (thought, tool_name, args_dict)
+        """
+        # TODO: Parse the response format:
+        # THOUGHT: ...
+        # TOOL: ...
+        # ARGS: {...}
+        pass
+    def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
+        """
+        Call the LLM with the given prompt.
+        This is a convenience wrapper - you can also use call_llm() directly.
+        """
+        return call_llm(prompt, system_prompt, seed)
+# =============================================================================
+# For local testing
+# =============================================================================
+async def test_agent():
+    """Test the agent locally."""
+    from fastmcp import Client
+    # Path to your MCP server
+    server_path = "mcp_server.py"
+    agent = StudentAgent()
+    async with Client(server_path) as client:
+        result = await agent.run(
+            client=client,
+            game="zork1",
+            max_steps=10,
+            seed=42,
+            verbose=True,
+        )
+        print(f"\nFinal Score: {result.final_score}")
+        print(f"Moves: {result.moves}")
+        print(f"Locations: {result.locations_visited}")
+if __name__ == "__main__":
+    import asyncio
+    asyncio.run(test_agent())
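
The template leaves `_parse_response` as a TODO. The following is a minimal sketch of one way to parse the THOUGHT/TOOL/ARGS format requested in `SYSTEM_PROMPT`, not the course's reference solution. The standalone function name `parse_response` and the fallback constants are assumptions made for illustration: when no `TOOL:` line is found it falls back to `play_action` with `look`, and when `ARGS:` is missing or not a JSON object it uses empty arguments.

```python
import json
import re

# Hypothetical fallback used when the reply contains no TOOL line.
FALLBACK_TOOL = "play_action"
FALLBACK_ARGS = {"action": "look"}


def parse_response(response: str) -> tuple[str, str, dict]:
    """Extract (thought, tool_name, args) from a THOUGHT/TOOL/ARGS reply."""
    # The thought runs from "THOUGHT:" up to the next line starting with "TOOL:".
    thought_match = re.search(
        r"THOUGHT:\s*(.*?)(?=^\s*TOOL:|\Z)", response, re.DOTALL | re.MULTILINE
    )
    tool_match = re.search(r"^\s*TOOL:\s*([A-Za-z_]\w*)", response, re.MULTILINE)
    args_match = re.search(r"^\s*ARGS:\s*(\{.*\})", response, re.DOTALL | re.MULTILINE)

    thought = thought_match.group(1).strip() if thought_match else ""
    if tool_match is None:
        # Nothing usable: fall back to a harmless action.
        return thought, FALLBACK_TOOL, dict(FALLBACK_ARGS)

    args: dict = {}
    if args_match:
        try:
            parsed = json.loads(args_match.group(1))
            if isinstance(parsed, dict):
                args = parsed
        except json.JSONDecodeError:
            pass  # malformed JSON: keep empty arguments
    return thought, tool_match.group(1), args
```

Falling back to `look` keeps the loop alive on malformed replies at the cost of one step; a stricter agent might instead re-prompt the model.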

submission_template/app.py ADDED

	@@ -0,0 +1,71 @@

+"""
+Hugging Face Space - Text Adventure Agent Submission
+This is a code-only Space for submitting your agent implementation.
+The evaluation is run separately.
+Files in this submission:
+- agent.py: Your ReAct agent implementation
+- mcp_server.py: Your MCP server implementation
+- requirements.txt: Additional dependencies
+To test locally:
+    fastmcp dev mcp_server.py
+    python agent.py
+"""
+import gradio as gr
+from pathlib import Path
+def read_readme():
+    """Read the README content."""
+    readme_path = Path(__file__).parent / "README.md"
+    if readme_path.exists():
+        return readme_path.read_text()
+    return "# Submission\n\nNo README.md found."
+def read_file_content(filename: str) -> str:
+    """Read a source file's content."""
+    file_path = Path(__file__).parent / filename
+    if file_path.exists():
+        return file_path.read_text()
+    return f"# File not found: {filename}"
+# Create the Gradio interface
+with gr.Blocks(title="Text Adventure Agent Submission") as demo:
+    gr.Markdown("# Text Adventure Agent Submission")
+    gr.Markdown(
+        "This Space contains a student submission for the Text Adventure Agent assignment. "
+        "Use the tabs below to view the submitted code."
+    )
+    with gr.Tabs():
+        with gr.Tab("README"):
+            gr.Markdown(read_readme())
+        with gr.Tab("Agent Code"):
+            gr.Code(
+                value=read_file_content("agent.py"),
+                language="python",
+                label="agent.py",
+            )
+        with gr.Tab("MCP Server Code"):
+            gr.Code(
+                value=read_file_content("mcp_server.py"),
+                language="python",
+                label="mcp_server.py",
+            )
+    gr.Markdown(
+        "---\n"
+        "**Note:** This is a code submission Space. "
+        "Evaluation is performed using the evaluation script."
+    )
+if __name__ == "__main__":
+    demo.launch()

templates/mcp_server_template.py → submission_template/mcp_server.py RENAMED

@@ -1,15 +1,27 @@
 """
-MCP Server Template for Text Adventure Games
-This is a starter template for building your text adventure MCP server.
-Your task is to implement the tools that allow an AI agent to play text adventures.
-FastMCP makes it easy to create MCP servers - just decorate functions!
-TODO:
-1. Implement the play_action tool (required)
-2. Add helper tools like memory, get_map, inventory (recommended)
-3. Test your server with: fastmcp dev templates/mcp_server_template.py
 """
 import sys
@@ -26,122 +38,172 @@ from games.zork_env import TextAdventureEnv
 # Create the MCP Server
 # =============================================================================
-# TODO: Create a FastMCP server instance
-# Hint: mcp = FastMCP("Your Server Name")
-mcp = FastMCP("Text Adventure Server")
 # =============================================================================
 # Game State Management
 # =============================================================================
-class GameState:
     """
     Manages the text adventure game state.
-    TODO: You may want to extend this class to track:
-    - Action history (for context)
     - Explored locations (for mapping)
-    - Current location name
     """
-    def __init__(self, game: str = "zork1"):
         self.game_name = game
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
-        # TODO: Add more state tracking here
-        # self.history = []
-        # self.explored_locations = {}
-    def take_action(self, action: str) -> str:
-        """Execute a game action and return the result."""
         self.state = self.env.step(action)
         # TODO: Update your state tracking here
         return self.state.observation
-# Global game instance (created on first use)
-_game: GameState | None = None
-def get_game() -> GameState:
-    """Get or create the game instance."""
     global _game
-    if _game is None:
-        _game = GameState()
     return _game
 # =============================================================================
-# MCP Tools - IMPLEMENT THESE!
 # =============================================================================
 @mcp.tool()
 def play_action(action: str) -> str:
     """
-    Execute a game action in the text adventure.
     This is the main tool for interacting with the game.
-    Common commands:
-    - Movement: north, south, east, west, up, down
-    - Objects: take <item>, drop <item>, open <thing>
-    - Look: look, examine <thing>
     Args:
-        action: The command to execute (e.g., 'north', 'take lamp')
     Returns:
-        The game's response to your action
     """
-    # TODO: Implement this tool
-    # Hint: Use get_game().take_action(action)
     game = get_game()
-    result = game.take_action(action)
-    # TODO: Optionally add score info or game over detection
     return result
-# TODO: Implement additional helper tools
-# These are optional but will help your agent play better!
 # @mcp.tool()
 # def memory() -> str:
 #     """
-#     Get a summary of the current game state.
 #
-#     Returns location, score, recent actions, and current observation.
-#     Use this to understand where you are and what happened recently.
 #     """
-#     # TODO: Implement this
 #     pass
 # @mcp.tool()
 # def get_map() -> str:
 #     """
 #     Get a map of explored locations.
 #
-#     Useful for navigation and avoiding getting lost.
 #     """
-#     # TODO: Implement this
 #     pass
 # @mcp.tool()
-# def inventory() -> str:
 #     """
-#     Check what items you are carrying.
 #     """
-#     # TODO: Implement this
-#     pass
 # =============================================================================
-# Main - Run the server
 # =============================================================================
 if __name__ == "__main__":
-    # This runs the server using stdio transport (for local testing)
     mcp.run()

 """
+Student MCP Server for Text Adventure Games
+This is your MCP server submission. Implement the tools that your agent
+will use to play text adventure games.
+Required tool:
+    play_action(action: str) -> str
+        Execute a game command and return the result.
+Recommended tools:
+    memory() -> str
+        Return current game state, score, and recent history.
+    inventory() -> str
+        Return the player's current inventory.
+    get_map() -> str
+        Return a map of explored locations.
+Test your server with:
+    fastmcp dev submission_template/mcp_server.py
+Then open the MCP Inspector in your browser to test the tools interactively.
 """
 import sys
 # Create the MCP Server
 # =============================================================================
+mcp = FastMCP("Student Text Adventure Server")
 # =============================================================================
 # Game State Management
 # =============================================================================
+class GameManager:
     """
     Manages the text adventure game state.
+    TODO: Extend this class to track:
+    - Action history (for memory tool)
     - Explored locations (for mapping)
+    - Current score and moves
     """
+    def __init__(self):
+        self.env: TextAdventureEnv | None = None
+        self.state = None
+        self.game_name: str = ""
+        # TODO: Add more state tracking
+        # self.history: list[tuple[str, str]] = []
+        # self.explored_locations: dict[str, set[str]] = {}
+        # self.current_location: str = ""
+    def initialize(self, game: str = "zork1"):
+        """Initialize or reset the game."""
         self.game_name = game
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
+        # TODO: Reset your state tracking here
+        return self.state.observation
+    def step(self, action: str) -> str:
+        """Execute an action and return the result."""
+        if self.env is None:
+            self.initialize()
         self.state = self.env.step(action)
         # TODO: Update your state tracking here
+        # self.history.append((action, self.state.observation))
+        # Update location tracking, etc.
         return self.state.observation
+    def get_score(self) -> int:
+        """Get current score."""
+        return self.state.score if self.state else 0
+    def get_moves(self) -> int:
+        """Get number of moves taken."""
+        return self.state.moves if self.state else 0
+# Global game manager
+_game = GameManager()
+def get_game() -> GameManager:
+    """Get or initialize the game manager."""
     global _game
+    if _game.env is None:
+        # Get game from environment variable (set by evaluator)
+        game = os.environ.get("GAME", "zork1")
+        _game.initialize(game)
     return _game
 # =============================================================================
+# MCP Tools - IMPLEMENT THESE
 # =============================================================================
 @mcp.tool()
 def play_action(action: str) -> str:
     """
+    Execute a game command and return the result.
     This is the main tool for interacting with the game.
     Args:
+        action: The command to execute (e.g., "north", "take lamp", "open mailbox")
     Returns:
+        The game's response to the action
+    Valid commands include:
+        - Movement: north, south, east, west, up, down, enter, exit
+        - Objects: take <item>, drop <item>, open <thing>, examine <thing>
+        - Other: look, inventory, read <thing>, turn on lamp
     """
     game = get_game()
+    # TODO: You might want to add action validation here
+    # TODO: You might want to include score changes in the response
+    result = game.step(action)
+    # Optional: Append score info
+    # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
     return result
+# TODO: Implement additional tools to help your agent
 # @mcp.tool()
 # def memory() -> str:
 #     """
+#     Get the current game state summary.
 #
+#     Returns:
+#         A summary including current location, score, moves, and recent history
 #     """
+#     game = get_game()
+#     # TODO: Return useful state information
 #     pass
+# @mcp.tool()
+# def inventory() -> str:
+#     """
+#     Check what the player is carrying.
+#
+#     Returns:
+#         List of items in the player's inventory
+#     """
+#     game = get_game()
+#     result = game.step("inventory")
+#     return result
 # @mcp.tool()
 # def get_map() -> str:
 #     """
 #     Get a map of explored locations.
 #
+#     Returns:
+#         A text representation of explored locations and connections
 #     """
+#     game = get_game()
+#     # TODO: Return map of explored locations
 #     pass
 # @mcp.tool()
+# def get_valid_actions() -> str:
 #     """
+#     Get a list of likely valid actions from the current location.
+#
+#     Returns:
+#         List of actions that might work here
 #     """
+#     # This is a hint: Jericho provides get_valid_actions()
+#     game = get_game()
+#     if game.env and game.env.env:
+#         valid = game.env.env.get_valid_actions()
+#         return "Valid actions: " + ", ".join(valid[:20])
+#     return "Could not determine valid actions"
 # =============================================================================
+# Run the server
 # =============================================================================
 if __name__ == "__main__":
+    # This runs the server with stdio transport (for MCP clients)
     mcp.run()
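
The `memory` tool above is left as a commented stub. As a rough sketch of the state tracking it could rely on, the helper below keeps a bounded list of recent actions plus the locations seen so far and renders them as text. `HistoryTracker`, its methods, and the summary layout are hypothetical names chosen for illustration, and the caller is assumed to supply the location string itself, since the diff does not show how the current location is obtained.

```python
from collections import deque


class HistoryTracker:
    """Hypothetical helper: recent (action, observation) pairs and visited locations."""

    def __init__(self, max_recent: int = 5):
        # Only the last `max_recent` steps are kept to bound the summary size.
        self.recent: deque[tuple[str, str]] = deque(maxlen=max_recent)
        self.visited: list[str] = []  # insertion order, no duplicates

    def record(self, action: str, observation: str, location: str = "") -> None:
        """Store one step; `location` is whatever name the caller extracted."""
        self.recent.append((action, observation))
        if location and location not in self.visited:
            self.visited.append(location)

    def summary(self, score: int, moves: int) -> str:
        """Render a short text summary suitable for returning from a tool."""
        lines = [f"Score: {score} | Moves: {moves}"]
        lines.append("Visited: " + (", ".join(self.visited) or "(none)"))
        lines.append("Recent actions:")
        for action, observation in self.recent:
            stripped = observation.strip()
            first_line = stripped.splitlines()[0] if stripped else ""
            lines.append(f"  > {action}: {first_line[:80]}")
        return "\n".join(lines)
```

Inside `GameManager.step`, one could call `record(...)` after each `env.step`, and have `memory()` return `summary(game.get_score(), game.get_moves())`.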

submission_template/requirements.txt ADDED

	@@ -0,0 +1,8 @@

+# Core dependencies (provided by course infrastructure)
+# jericho
+# python-dotenv
+# fastmcp
+# huggingface_hub
+# Required for HF Space
+gradio

templates/README.md DELETED

@@ -1,129 +0,0 @@
-# Text Adventure LLM Agent Templates
-This folder contains starter templates for building your own AI agent to play text adventure games.
-## Assignment Overview
-You need to implement two components:
-1. **MCP Server** (`mcp_server_template.py`) - Exposes game functionality as tools
-2. **ReAct Agent** (`react_agent_template.py`) - Uses the MCP server to play the game
-## Architecture
-```
-+-------------------+     MCP Protocol     +------------------+
-|                   | <------------------> |                  |
-|   ReAct Agent     |    (tools/calls)     |   MCP Server     |
-|   (Your Agent)    |                      |   (Your Server)  |
-|                   |                      |                  |
-+-------------------+                      +------------------+
-        |                                           |
-        | LLM API                                   | Game API
-        v                                           v
-+-------------------+                      +------------------+
-|                   |                      |                  |
-|   HuggingFace     |                      |  Text Adventure  |
-|   Inference API   |                      |   (Jericho)      |
-+-------------------+                      +------------------+
-```
-## Getting Started
-### 1. Set Up Environment
-```bash
-# Create virtual environment
-uv venv
-source .venv/bin/activate
-# Install dependencies
-uv pip install -r requirements.txt
-# Copy environment file and add your HuggingFace token
-cp .env.example .env
-# Edit .env and add HF_TOKEN=your_token_here
-```
-### 2. Implement the MCP Server
-Start with `mcp_server_template.py`. Your server needs to:
-1. Create a FastMCP server instance
-2. Implement at least the `play_action` tool to send commands to the game
-3. Optionally add helper tools (memory, map, inventory, hints)
-Test your server:
-```bash
-# Run the server directly (will use stdio transport)
-python templates/mcp_server_template.py
-# Or use FastMCP's development tools
-fastmcp dev templates/mcp_server_template.py
-```
-### 3. Implement the ReAct Agent
-Start with `react_agent_template.py`. Your agent needs to:
-1. Connect to your MCP server using FastMCP Client
-2. Implement a ReAct loop (Thought -> Action -> Observation)
-3. Use the LLM to decide what tools to call
-4. Parse the LLM's response and execute the chosen tool
-Test your agent:
-```bash
-python templates/react_agent_template.py
-```
-## MCP Protocol Basics
-MCP (Model Context Protocol) is a standard for LLM-tool communication:
-- **Tools**: Functions the LLM can call (e.g., `play_action`, `get_inventory`)
-- **Resources**: Read-only data (e.g., game state, map)
-- **Prompts**: Reusable prompt templates
-FastMCP makes it easy:
-```python
-# Server side - define a tool
-from fastmcp import FastMCP
-mcp = FastMCP("My Server")
-@mcp.tool()
-def my_tool(arg: str) -> str:
-    """Tool description for the LLM."""
-    return f"Result: {arg}"
-# Client side - call a tool
-from fastmcp import Client
-async with Client(mcp) as client:
-    result = await client.call_tool("my_tool", {"arg": "hello"})
-```
-## Evaluation Criteria
-Your implementation will be evaluated on:
-1. **Correctness**: Does it work? Can it play text adventure games?
-2. **Score**: How many points does your agent achieve?
-3. **Code Quality**: Is your code clean, documented, and well-structured?
-4. **Creativity**: Did you add interesting features or optimizations?
-## Tips
-1. Start simple - get a basic loop working first
-2. Use `memory()` and `get_map()` tools to help the agent track state
-3. Add loop detection to avoid repeating the same actions
-4. Test with verbose output to debug the agent's reasoning
-5. The LLM may generate invalid commands - handle errors gracefully
-## Resources
-- [FastMCP Documentation](https://gofastmcp.com/)
-- [MCP Protocol Specification](https://modelcontextprotocol.io/)
-- [Jericho (Text Adventures)](https://github.com/microsoft/jericho)
-- [HuggingFace Inference API](https://huggingface.co/docs/huggingface_hub/guides/inference)
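
The tip about loop detection in the README above also applies to the new submission template. A minimal sketch, assuming the agent knows its current location and the action it is about to take: count how often the same (location, action) pair appears in a sliding window of recent steps. `LoopDetector`, `window`, and `max_repeats` are hypothetical names, not part of the repository.

```python
from collections import Counter, deque


class LoopDetector:
    """Hypothetical helper: flags (location, action) pairs repeated within a window."""

    def __init__(self, window: int = 6, max_repeats: int = 3):
        self.steps: deque[tuple[str, str]] = deque(maxlen=window)
        self.max_repeats = max_repeats

    def record(self, location: str, action: str) -> bool:
        """Record a step; return True if this pair now repeats too often."""
        key = (location, action.strip().lower())  # normalise case and whitespace
        self.steps.append(key)
        return Counter(self.steps)[key] >= self.max_repeats
```

When `record` returns True, the agent could add a note to the next prompt asking the model to try a different action.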

templates/react_agent_template.py DELETED

@@ -1,303 +0,0 @@
-"""
-ReAct Agent Template for Text Adventure Games
-This is a starter template for building a ReAct agent that plays text adventures using MCP.
-ReAct (Reasoning + Acting) is a simple but effective agent pattern:
-1. THINK: Reason about the current situation
-2. ACT: Choose and execute a tool
-3. OBSERVE: See the result
-4. Repeat until goal is achieved
-Your task is to implement:
-1. Connect to the MCP server
-2. Implement the ReAct loop
-3. Use the LLM to generate thoughts and choose actions
-TODO:
-1. Set up the MCP client connection
-2. Implement the agent loop
-3. Parse LLM responses to extract tool calls
-"""
-import asyncio
-import os
-from huggingface_hub import InferenceClient
-from dotenv import load_dotenv
-# FastMCP client for connecting to MCP servers
-from fastmcp import Client
-# =============================================================================
-# Configuration
-# =============================================================================
-# Load environment variables
-load_dotenv()
-# LLM Configuration
-MODEL = os.getenv("HF_MODEL", "meta-llama/Llama-3.2-3B-Instruct")
-HF_TOKEN = os.getenv("HF_TOKEN")
-if not HF_TOKEN:
-    raise ValueError("HF_TOKEN not found. Set it in your .env file.")
-# =============================================================================
-# System Prompt - Instructions for the LLM
-# =============================================================================
-SYSTEM_PROMPT = """You are playing a classic text adventure game.
-GOAL: Explore the world, solve puzzles, collect treasures, and maximize your score.
-AVAILABLE TOOLS:
-- play_action: Execute a game command (north, take lamp, open mailbox, etc.)
-- memory: Get current game state summary (optional, if implemented)
-- get_map: See explored locations (optional, if implemented)
-- inventory: Check your items (optional, if implemented)
-VALID GAME COMMANDS:
-- Movement: north, south, east, west, up, down
-- Objects: take <item>, drop <item>, open <thing>, examine <thing>
-- Light: turn on lamp
-RESPOND IN THIS EXACT FORMAT:
-THOUGHT: <your reasoning>
-TOOL: <tool_name>
-ARGS: <arguments as JSON, or empty {} if no args>
-Example:
-THOUGHT: I see a container. I should open it to see what's inside.
-TOOL: play_action
-ARGS: {"action": "open container"}
-"""
-# =============================================================================
-# ReAct Agent Class
-# =============================================================================
-class ReActAgent:
-    """
-    A ReAct agent that uses MCP tools to play text adventures.
-    TODO: Complete this implementation!
-    """
-    def __init__(self, mcp_server_path: str):
-        """
-        Initialize the agent.
-        Args:
-            mcp_server_path: Path to the MCP server script
-        """
-        self.mcp_server_path = mcp_server_path
-        self.llm = InferenceClient(token=HF_TOKEN)
-        self.history: list[dict] = []
-    async def run(self, max_steps: int = 50, verbose: bool = True):
-        """
-        Run the ReAct agent loop.
-        TODO: Implement the main agent loop!
-        Steps:
-        1. Connect to MCP server using FastMCP Client
-        2. Get initial observation (call play_action with "look")
-        3. Loop:
-           a. Build prompt with current observation
-           b. Call LLM to get thought and tool choice
-           c. Parse the response
-           d. Execute the chosen tool via MCP
-           e. Update history with observation
-           f. Check if done
-        """
-        # TODO: Implement the agent loop
-        # Hint: Use `async with Client(self.mcp_server_path) as client:`
-        print("=" * 60)
-        print("Starting Text Adventure ReAct Agent")
-        print("=" * 60)
-        # Connect to the MCP server
-        async with Client(self.mcp_server_path) as client:
-            # List available tools
-            tools = await client.list_tools()
-            print(f"\nAvailable tools: {[t.name for t in tools]}")
-            # Get initial observation
-            result = await client.call_tool("play_action", {"action": "look"})
-            observation = result.content[0].text
-            print(f"\nInitial observation:\n{observation}\n")
-            # Main loop
-            for step in range(1, max_steps + 1):
-                print(f"\n{'─' * 40}")
-                print(f"Step {step}")
-                print("─" * 40)
-                # TODO: Build prompt for LLM
-                prompt = self._build_prompt(observation)
-                # TODO: Call LLM
-                response = self._call_llm(prompt)
-                # TODO: Parse response to get tool and arguments
-                thought, tool_name, tool_args = self._parse_response(response)
-                if verbose:
-                    print(f"\nTHOUGHT: {thought}")
-                    print(f"TOOL: {tool_name}")
-                    print(f"ARGS: {tool_args}")
-                # TODO: Execute the tool via MCP
-                try:
-                    result = await client.call_tool(tool_name, tool_args)
-                    observation = result.content[0].text
-                    print(f"\nRESULT:\n{observation}")
-                except Exception as e:
-                    observation = f"Error: {e}"
-                    print(f"\nERROR: {e}")
-                # TODO: Update history
-                self.history.append({
-                    "thought": thought,
-                    "tool": tool_name,
-                    "args": tool_args,
-                    "result": observation
-                })
-                # Check for game over
-                if "GAME OVER" in observation.upper():
-                    print("\n\nGame Over!")
-                    break
-        print("\n" + "=" * 60)
-        print("Agent finished")
-        print("=" * 60)
-    def _build_prompt(self, observation: str) -> str:
-        """
-        Build the prompt for the LLM.
-        TODO: Customize this to include relevant context!
-        Consider including:
-        - Current observation
-        - Recent history (last few actions and results)
-        - Warnings about repeated actions
-        """
-        parts = []
-        # Add recent history (last 3 actions)
-        if self.history:
-            parts.append("Recent actions:")
-            for entry in self.history[-3:]:
-                parts.append(f"  > {entry['tool']}({entry['args']}) -> {entry['result'][:100]}...")
-            parts.append("")
-        # Current observation
-        parts.append(f"Current observation:\n{observation}")
-        parts.append("\nWhat do you do next?")
-        return "\n".join(parts)
-    def _call_llm(self, prompt: str) -> str:
-        """
-        Call the LLM to get the next action.
-        TODO: Customize LLM parameters if needed.
-        """
-        try:
-            messages = [
-                {"role": "system", "content": SYSTEM_PROMPT},
-                {"role": "user", "content": prompt}
-            ]
-            response = self.llm.chat.completions.create(
-                model=MODEL,
-                messages=messages,
-                temperature=0.7,
-                max_tokens=200,
-            )
-            return response.choices[0].message.content
-        except Exception as e:
-            print(f"LLM Error: {e}")
-            return "THOUGHT: Error occurred.\nTOOL: play_action\nARGS: {\"action\": \"look\"}"
-    def _parse_response(self, response: str) -> tuple[str, str, dict]:
-        """
-        Parse the LLM response to extract thought, tool, and arguments.
-        TODO: Make this more robust!
-        Expected format:
-        THOUGHT: <reasoning>
-        TOOL: <tool_name>
-        ARGS: <json args>
-        """
-        import json
-        import re
-        thought = ""
-        tool_name = "play_action"
-        tool_args = {"action": "look"}
-        lines = response.strip().split("\n")
-        for line in lines:
-            line_upper = line.upper().strip()
-            if line_upper.startswith("THOUGHT:"):
-                thought = line.split(":", 1)[1].strip()
-            elif line_upper.startswith("TOOL:"):
-                tool_name = line.split(":", 1)[1].strip().lower()
-            elif line_upper.startswith("ARGS:"):
-                args_str = line.split(":", 1)[1].strip()
-                try:
-                    tool_args = json.loads(args_str)
-                except json.JSONDecodeError:
-                    # Malformed JSON: try to recover a quoted action value,
-                    # otherwise keep the default {"action": "look"}
-                    match = re.search(r'"action"\s*:\s*"([^"]+)"', args_str)
-                    if match:
-                        tool_args = {"action": match.group(1)}
-        return thought, tool_name, tool_args
-# =============================================================================
-# Main - Run the agent
-# =============================================================================
-async def main():
-    """Run the ReAct agent."""
-    import argparse
-    parser = argparse.ArgumentParser(description="Run the ReAct Text Adventure Agent")
-    parser.add_argument(
-        "--server", "-s",
-        default="templates/mcp_server_template.py",
-        help="Path to the MCP server script"
-    )
-    parser.add_argument(
-        "--max-steps", "-n",
-        type=int,
-        default=50,
-        help="Maximum steps to run"
-    )
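-    parser.add_argument(
-        "--verbose", "-v",
-        # store_true with default=True could never be switched off;
-        # BooleanOptionalAction (Python 3.9+) adds a --no-verbose flag
-        action=argparse.BooleanOptionalAction,
-        default=True,
-        help="Show detailed output (use --no-verbose to disable)"
-    )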
-    args = parser.parse_args()
-    agent = ReActAgent(args.server)
-    await agent.run(max_steps=args.max_steps, verbose=args.verbose)
-if __name__ == "__main__":
-    asyncio.run(main())
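
The template's `_parse_response` carries a "TODO: Make this more robust!" note and only reads `ARGS` when the JSON sits on a single line. Below is one possible sketch of a more tolerant parser, assuming the same `THOUGHT:` / `TOOL:` / `ARGS:` reply format and the same defaults as the template. The function name `parse_react_response` is hypothetical, and this is not the version that ended up in `submission_template/agent.py`.

```python
import json
import re

FLAGS = re.IGNORECASE | re.MULTILINE


def parse_react_response(response: str) -> tuple[str, str, dict]:
    """Extract (thought, tool_name, tool_args) from a THOUGHT/TOOL/ARGS reply."""
    # Same defaults as the template: fall back to a harmless "look".
    thought, tool_name, tool_args = "", "play_action", {"action": "look"}

    m = re.search(r"^\s*THOUGHT:[ \t]*(.*)$", response, FLAGS)
    if m:
        thought = m.group(1).strip()

    m = re.search(r"^\s*TOOL:[ \t]*([\w-]+)", response, FLAGS)
    if m:
        tool_name = m.group(1).lower()

    # Capture everything after ARGS: so multi-line JSON is still seen.
    m = re.search(r"^\s*ARGS:[ \t]*(.*)", response, FLAGS | re.DOTALL)
    if m:
        raw = m.group(1)
        start = raw.find("{")
        if start != -1:
            try:
                # raw_decode parses one JSON value and ignores trailing text.
                parsed, _ = json.JSONDecoder().raw_decode(raw, start)
                if isinstance(parsed, dict):
                    return thought, tool_name, parsed
            except json.JSONDecodeError:
                pass
        # Fallback for malformed JSON such as {action: 'take lamp'}.
        m2 = re.search(r"action[\"']?\s*[:=]\s*[\"']([^\"'\n]+)", raw, re.IGNORECASE)
        if m2:
            tool_args = {"action": m2.group(1).strip()}

    return thought, tool_name, tool_args
```

As in the template, a reply that cannot be parsed at all degrades to `play_action` with `{"action": "look"}`, so the agent keeps running instead of raising.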