janisaiad commited on
Commit
4c9426c
·
1 Parent(s): 59f6ee5

blog changes because not pro

Files changed (3)
  1. README.md +28 -46
  2. app.py +22 -10
  3. mcp_server.py +1 -1
README.md CHANGED
@@ -10,9 +10,28 @@ pinned: false
10
  license: mit
11
  ---
12
 
13
- # Text Adventure Agent Submission
14
 
15
- > **Abstract** We built a ReAct+MCP text-adventure agent and **stress-tested it at scale**: hundreds of non-walkthrough runs, heuristic ablation, UCB vs no-UCB, stagnation tuning, repeat-blocking, and exploration injection. **Results on Zork I:** **5** points and **35 locations** explored in the best non-walkthrough run (no hints); **109** points with hint-informed exploration (walkthrough-style). The 5-point run is reproducible under a tested config (repeat-blocking + forced exploration); we attribute the score to exploration heuristics, not RNG. Full design, run proofs, and learning pipeline are in `refs/`.
16
 
17
  ## Overview
18
 
@@ -41,13 +60,11 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
41
 
42
  ## Blog and report (SFT / RL)
43
 
44
- The full design narrative, learning pipeline, and SFT/RL work are in **`refs/blog.md`** (in this submission). The blog starts with an evaluation/context section (lexicographic score + locations, context management, no repeat of failed actions) and where each appears in the blog. It covers:
45
 
46
- - Game-agnostic design (mechanisms vs hints), overnight trace collection, statistical learning (`learned_*.json`), in-context learning (ICL) over 55 games
47
- - **SFT:** walkthrough-modified traces, thought/reasoning generation (`add_thought_to_traces.py`), training format with return-to-go
48
- - **RL / GRPO:** Group Relative Policy Optimization, turn-wise GRPO, state forking with Jericho, `train_grpo.py`, `prepare_grpo_dataset.py`
49
- - Data: `generate_walkthrough_modified_traces.py`, `generate_random_traces.py`, `generate_all_data.sh`, ~4.8M steps across 55 games; Decision Transformer and reward-conditioned BC
50
- - ICL results (action-only 75, CoT 64), RAG/GRPO next steps, and references to learnable knowledge (see `refs/MCP_AGENT_IMPLEMENTATION.md` §8), `ICL_DESIGN.md`
51
 
52
  ## Short blog post — What we did (Challenge 3)
53
 
@@ -90,48 +107,13 @@ The main differentiator is **heavy context management**: never repeating an acti
90
  - **Non-walkthrough Zork:** One run reached 5/350 under a **tested config** (not RNG): heuristics caused the score; UCB off by default; Zork knowledge, BFS, clear prompt.
91
  - **Game-agnostic design:** Mechanisms in code, hints from data; ICL over 55 games works best action-only (75); finetuning path uses walkthrough-modified and random traces, DT, thought-augmented data.
92
 
93
- The full narrative with every detail is in **`refs/blog.md`** and **`refs/ALL_REFS_ONE_FILE.md`**.
94
-
95
- ## Files
96
-
97
- | File | Description |
98
- |------|-------------|
99
- | `agent.py` | ReAct agent with `StudentAgent` class |
100
- | `mcp_server.py` | MCP server with game interaction tools |
101
- | `app.py` | Gradio interface for HF Space |
102
- | `requirements.txt` | Additional dependencies |
103
- | `refs/` | Blog, run proofs, evaluation/context notes (see `refs/README.md`). Self-contained. |
104
-
105
- ## How to Submit
106
-
107
- 1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
108
- 2. Clone your fork locally
109
- 3. Implement your agent in `agent.py` and `mcp_server.py`
110
- 4. Test locally (see below)
111
- 5. Push your changes to your Space
112
- 6. Submit your Space URL on the course platform
113
-
114
- ## Local Testing
115
-
116
- ```bash
117
- # Install dependencies
118
- uv sync
119
-
120
- # Test the MCP server interactively
121
- fastmcp dev mcp_server.py
122
-
123
- # Run your agent on a game
124
- python run_agent.py --agent . --game lostpig -v -n 20
125
-
126
- # Run evaluation
127
- python -m evaluation.evaluate -s . -g lostpig -t 3
128
- ```
129
 
130
  ## Reproducing the 5-point run (35 locations, Zork I)
131
 
132
  **Important:** Evaluation uses **non-walkthrough** runs only. Any run that feeds the walkthrough to the LLM at test time is **cheating** (the model sees the solution). We ran one LLM walkthrough run **for curiosity only** (see the big repo). Our submitted result is the **non-walkthrough** 5-point run.
133
 
134
- **Run proof in this submission:** `refs/baseline_runs/zork1_20260126_190532_run1/` contains `summary.json` and `detailed_metrics.csv` (201 steps, 35 locations, score 5). Attribution and strategy are in `refs/ALL_REFS_ONE_FILE.md` (Parts 2, 4, and evaluation section).
135
 
136
  To **re-run** the 5-point config (from the **big repo** `challenge3/text-adventure-template` or the repo that contains `challenge3/Agentic-zork`):
137
 
@@ -196,4 +178,4 @@ python visualize_run.py refs/baseline_runs/zork1_20260126_190532_run1 -o refs/ba
196
 
197
  ### Learnable Knowledge (for future RL / SFT)
198
 
199
- Game-specific hints (mailbox→leaflet, kitchen→lantern, move rug→trap door, Gallery painting, etc.) are **not** hardcoded. See **`refs/MCP_AGENT_IMPLEMENTATION.md`** §8 (learnable knowledge catalogue) in this submission. For SFT/RL pipeline (walkthrough-modified traces, thought generation, GRPO, DT), see **`refs/blog.md`** (Parts XVII–XVIII).
 
10
  license: mit
11
  ---
12
 
13
+ <div align="center">
14
 
15
+ # 🖴 Text Adventure Agent Submission
16
+
17
+ *ReAct + MCP · Z-machine · no walkthrough*
18
+
19
+ </div>
20
+
21
+ ---
22
+
23
+ ### Abstract
24
+
25
+ We built a **ReAct+MCP** text-adventure agent and **stress-tested it at scale**: hundreds of non-walkthrough runs, heuristic ablation, UCB vs no-UCB, stagnation tuning, repeat-blocking, and exploration injection.
26
+
27
+ | Mode | Score | Locations |
28
+ |------|-------|-----------|
29
+ | **Non-walkthrough** (no hints) | **5** / 350 | **35** |
30
+ | **Hint-informed** (walkthrough-style) | **109** / 350 | — |
31
+
32
+ The 5-point run is **reproducible** under a tested config (repeat-blocking + forced exploration); we attribute the score to exploration heuristics, not RNG. Full design, run proofs, and learning pipeline → `refs/`.
33
+
34
+ ---
35
 
36
  ## Overview
37
 
 
60
 
61
  ## Blog and report (SFT / RL)
62
 
63
+ Use the **Blog**, **Full refs**, and **MCP Implementation** tabs above (in this Space) to open the full design narrative and refs. They contain:
64
 
65
+ - **Blog** — Full design narrative, learning pipeline, SFT/RL; evaluation/context (lexicographic score + locations, no repeat of failed actions); game-agnostic design (mechanisms vs hints), trace collection, ICL over 55 games, GRPO, walkthrough-modified traces, thought generation, ~4.8M steps; ICL results (action-only 75, CoT 64).
66
+ - **Full refs** — Single-document view of all refs (README, evaluation, run analysis, baseline, MCP implementation, blog).
67
+ - **MCP Implementation** — Server and agent description, heuristic plan (§7), learnable knowledge catalogue (§8).
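The "lexicographic score + locations" criterion mentioned in the Blog bullet above can be sketched in a few lines (the dict fields are illustrative assumptions, not the submission's actual data model): rank runs by game score first, using locations explored only as a tie-breaker.

```python
# Minimal sketch of lexicographic ranking: score first, then locations
# explored as tie-breaker. Field names are illustrative assumptions.
runs = [
    {"score": 5, "locations": 35},
    {"score": 5, "locations": 20},
    {"score": 3, "locations": 40},
]

# Tuples compare element-by-element, which gives the lexicographic order.
best = max(runs, key=lambda r: (r["score"], r["locations"]))
assert best == {"score": 5, "locations": 35}  # score wins, then coverage
```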
 
 
68
 
69
  ## Short blog post — What we did (Challenge 3)
70
 
 
107
  - **Non-walkthrough Zork:** One run reached 5/350 under a **tested config** (not RNG): heuristics caused the score; UCB off by default; Zork knowledge, BFS, clear prompt.
108
  - **Game-agnostic design:** Mechanisms in code, hints from data; ICL over 55 games works best action-only (75); finetuning path uses walkthrough-modified and random traces, DT, thought-augmented data.
109
 
110
+ The full narrative with every detail is in the **Blog** and **Full refs** tabs above.
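The repeat-blocking heuristic credited above for the 5-point run can be illustrated with a hypothetical sketch (names and structure are assumptions for illustration, not the submission's actual code): remember (location, action) pairs that produced no state change and skip them on later visits.

```python
# Hypothetical repeat-blocking sketch (illustrative, not the agent's real
# implementation): never retry an action that already failed in the same
# location; fall back to the first candidate if everything is blocked.
failed: set[tuple[str, str]] = set()

def record_failure(location: str, action: str) -> None:
    """Remember a (location, action) pair that produced no state change."""
    failed.add((location, action))

def choose_action(location: str, candidates: list[str]) -> str:
    """Return the first candidate not known to have failed here."""
    for action in candidates:
        if (location, action) not in failed:
            return action
    return candidates[0]  # everything blocked: act anyway rather than stall

record_failure("West of House", "open window")
assert choose_action("West of House", ["open window", "north"]) == "north"
```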
 
111
 
112
  ## Reproducing the 5-point run (35 locations, Zork I)
113
 
114
  **Important:** Evaluation uses **non-walkthrough** runs only. Any run that feeds the walkthrough to the LLM at test time is **cheating** (the model sees the solution). We ran one LLM walkthrough run **for curiosity only** (see the big repo). Our submitted result is the **non-walkthrough** 5-point run.
115
 
116
+ **Run proof in this submission:** `refs/baseline_runs/zork1_20260126_190532_run1/` contains `summary.json` and `detailed_metrics.csv` (201 steps, 35 locations, score 5). Attribution and strategy are in the **Full refs** tab (Parts 2, 4, and evaluation section).
117
 
118
  To **re-run** the 5-point config (from the **big repo** `challenge3/text-adventure-template` or the repo that contains `challenge3/Agentic-zork`):
119
 
 
178
 
179
  ### Learnable Knowledge (for future RL / SFT)
180
 
181
+ Game-specific hints (mailbox→leaflet, kitchen→lantern, move rug→trap door, Gallery painting, etc.) are **not** hardcoded. See the **MCP Implementation** tab (§8 learnable knowledge catalogue). For SFT/RL pipeline (walkthrough-modified traces, thought generation, GRPO, DT), see the **Blog** tab (Parts XVII–XVIII).
app.py CHANGED
@@ -18,11 +18,11 @@ import gradio as gr
18
  from pathlib import Path
19
 
20
 
21
- def read_readme():
22
  """Read the README content."""
23
  readme_path = Path(__file__).parent / "README.md"
24
  if readme_path.exists():
25
- return readme_path.read_text()
26
  return "# Submission\n\nNo README.md found."
27
 
28
 
@@ -30,16 +30,25 @@ def read_file_content(filename: str) -> str:
30
  """Read a source file's content."""
31
  file_path = Path(__file__).parent / filename
32
  if file_path.exists():
33
- return file_path.read_text()
34
  return f"# File not found: {filename}"
35
 
36
 
37
  # Create the Gradio interface
38
  with gr.Blocks(title="Text Adventure Agent Submission") as demo:
39
  gr.Markdown("# Text Adventure Agent Submission")
40
  gr.Markdown(
41
  "This Space contains a student submission for the Text Adventure Agent assignment. "
42
- "Use the tabs below to view the submitted code."
 
43
  )
44
 
45
  with gr.Tabs():
@@ -59,12 +68,15 @@ with gr.Blocks(title="Text Adventure Agent Submission") as demo:
59
  language="python",
60
  label="mcp_server.py",
61
  )
62
-
63
- gr.Markdown(
64
- "---\n"
65
- "**Note:** This is a code submission Space. "
66
- "Evaluation is performed using the evaluation script."
67
- )
68
 
69
 
70
  if __name__ == "__main__":
 
18
  from pathlib import Path
19
 
20
 
21
+ def read_readme() -> str:
22
  """Read the README content."""
23
  readme_path = Path(__file__).parent / "README.md"
24
  if readme_path.exists():
25
+ return readme_path.read_text(encoding="utf-8", errors="replace")
26
  return "# Submission\n\nNo README.md found."
27
 
28
 
 
30
  """Read a source file's content."""
31
  file_path = Path(__file__).parent / filename
32
  if file_path.exists():
33
+ return file_path.read_text(encoding="utf-8", errors="replace")
34
  return f"# File not found: {filename}"
35
 
36
 
37
+ def read_ref(ref_path: str) -> str:
38
+ """Read a file from refs/."""
39
+ path = Path(__file__).parent / "refs" / ref_path
40
+ if path.exists():
41
+ return path.read_text(encoding="utf-8", errors="replace")
42
+ return f"# File not found: refs/{ref_path}"
43
+
44
+
45
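A quick illustration of why the reads above pass `errors="replace"`: a single stray non-UTF-8 byte in a refs file would otherwise crash tab rendering with `UnicodeDecodeError`. Sketch using a temporary file:

```python
import tempfile
from pathlib import Path

# 0xFF is never valid in UTF-8, so a bare read_text() would raise
# UnicodeDecodeError; errors="replace" substitutes U+FFFD instead,
# letting the Markdown tab render the rest of the file.
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "blog.md"
    p.write_bytes(b"ok \xff end")
    text = p.read_text(encoding="utf-8", errors="replace")

assert text == "ok \ufffd end"
```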
  # Create the Gradio interface
46
  with gr.Blocks(title="Text Adventure Agent Submission") as demo:
47
  gr.Markdown("# Text Adventure Agent Submission")
48
  gr.Markdown(
49
  "This Space contains a student submission for the Text Adventure Agent assignment. "
50
+ "Use the tabs below to view the submitted code.\n\n"
51
+ "## **THE BLOG POST** (and full refs, MCP implementation) **ARE IN THE TABS BELOW.**"
52
  )
53
 
54
  with gr.Tabs():
 
68
  language="python",
69
  label="mcp_server.py",
70
  )
71
+
72
+ with gr.Tab("Blog"):
73
+ gr.Markdown(read_ref("blog.md"))
74
+
75
+ with gr.Tab("Full refs"):
76
+ gr.Markdown(read_ref("ALL_REFS_ONE_FILE.md"))
77
+
78
+ with gr.Tab("MCP Implementation"):
79
+ gr.Markdown(read_ref("MCP_AGENT_IMPLEMENTATION.md"))
80
 
81
 
82
  if __name__ == "__main__":
mcp_server.py CHANGED
@@ -281,7 +281,7 @@ def get_valid_actions() -> str:
281
  game = get_game()
282
  try:
283
  valid = game.get_valid_actions_zmachine()
284
- return "Valid actions: " + ", ".join(valid[:30])
285
  except Exception:
286
  return "Could not get valid actions (spacy may be required)."
287
 
 
281
  game = get_game()
282
  try:
283
  valid = game.get_valid_actions_zmachine()
284
+ return "Valid actions: " + ", ".join(str(a) for a in valid[:30])
285
  except Exception:
286
  return "Could not get valid actions (spacy may be required)."
287
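The `str(a) for a in valid[:30]` change above guards against non-string entries in the valid-action list: `str.join()` raises `TypeError` on any non-`str` item. A minimal demonstration (the mixed list is hypothetical, for illustration only):

```python
# Hypothetical mixed list standing in for a valid-action result that is
# assumed to possibly contain non-string entries.
valid = ["open mailbox", "north", 42]

try:
    ", ".join(valid[:30])  # plain join: TypeError on the int
    raised = False
except TypeError:
    raised = True

# Coercing each element with str() makes the join robust.
result = ", ".join(str(a) for a in valid[:30])
assert raised and result == "open mailbox, north, 42"
```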