janisaiad commited on
Commit
4c9426c
·
1 Parent(s): 59f6ee5

blog changes because not pro

Files changed (3)
  1. README.md +28 -46
  2. app.py +22 -10
  3. mcp_server.py +1 -1
README.md CHANGED
@@ -10,9 +10,28 @@ pinned: false
10
  license: mit
11
  ---
12
 
13
- # Text Adventure Agent Submission
14
 
15
- > **Abstract** We built a ReAct+MCP text-adventure agent and **stress-tested it at scale**: hundreds of non-walkthrough runs, heuristic ablation, UCB vs no-UCB, stagnation tuning, repeat-blocking, and exploration injection. **Results on Zork I:** **5** points and **35 locations** explored in the best non-walkthrough run (no hints); **109** points with hint-informed exploration (walkthrough-style). The 5-point run is reproducible under a tested config (repeat-blocking + forced exploration); we attribute the score to exploration heuristics, not RNG. Full design, run proofs, and learning pipeline are in `refs/`.
16
 
17
  ## Overview
18
 
@@ -41,13 +60,11 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
41
 
42
  ## Blog and report (SFT / RL)
43
 
44
- The full design narrative, learning pipeline, and SFT/RL work are in **`refs/blog.md`** (in this submission). The blog starts with an evaluation/context section (lexicographic score + locations, context management, no repeat of failed actions) and where each appears in the blog. It covers:
45
 
46
- - Game-agnostic design (mechanisms vs hints), overnight trace collection, statistical learning (`learned_*.json`), in-context learning (ICL) over 55 games
47
- - **SFT:** walkthrough-modified traces, thought/reasoning generation (`add_thought_to_traces.py`), training format with return-to-go
48
- - **RL / GRPO:** Group Relative Policy Optimization, turn-wise GRPO, state forking with Jericho, `train_grpo.py`, `prepare_grpo_dataset.py`
49
- - Data: `generate_walkthrough_modified_traces.py`, `generate_random_traces.py`, `generate_all_data.sh`, ~4.8M steps across 55 games; Decision Transformer and reward-conditioned BC
50
- - ICL results (action-only 75, CoT 64), RAG/GRPO next steps, and references to learnable knowledge (see `refs/MCP_AGENT_IMPLEMENTATION.md` §8), `ICL_DESIGN.md`
51
 
52
  ## Short blog post — What we did (Challenge 3)
53
 
@@ -90,48 +107,13 @@ The main differentiator is **heavy context management**: never repeating an acti
90
  - **Non-walkthrough Zork:** One run reached 5/350 under a **tested config** (not RNG): heuristics caused the score; UCB off by default; Zork knowledge, BFS, clear prompt.
91
  - **Game-agnostic design:** Mechanisms in code, hints from data; ICL over 55 games works best action-only (75); finetuning path uses walkthrough-modified and random traces, DT, thought-augmented data.
92
 
93
- The full narrative with every detail is in **`refs/blog.md`** and **`refs/ALL_REFS_ONE_FILE.md`**.
94
-
95
- ## Files
96
-
97
- | File | Description |
98
- |------|-------------|
99
- | `agent.py` | ReAct agent with `StudentAgent` class |
100
- | `mcp_server.py` | MCP server with game interaction tools |
101
- | `app.py` | Gradio interface for HF Space |
102
- | `requirements.txt` | Additional dependencies |
103
- | `refs/` | Blog, run proofs, evaluation/context notes (see `refs/README.md`). Self-contained. |
104
-
105
- ## How to Submit
106
-
107
- 1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
108
- 2. Clone your fork locally
109
- 3. Implement your agent in `agent.py` and `mcp_server.py`
110
- 4. Test locally (see below)
111
- 5. Push your changes to your Space
112
- 6. Submit your Space URL on the course platform
113
-
114
- ## Local Testing
115
-
116
- ```bash
117
- # Install dependencies
118
- uv sync
119
-
120
- # Test the MCP server interactively
121
- fastmcp dev mcp_server.py
122
-
123
- # Run your agent on a game
124
- python run_agent.py --agent . --game lostpig -v -n 20
125
-
126
- # Run evaluation
127
- python -m evaluation.evaluate -s . -g lostpig -t 3
128
- ```
129
 
130
  ## Reproducing the 5-point run (35 locations, Zork I)
131
 
132
  **Important:** Evaluation uses **non-walkthrough** runs only. Any run that feeds the walkthrough to the LLM at test time is **cheating** (the model sees the solution). We ran one LLM walkthrough run **for curiosity only** (see the big repo). Our submitted result is the **non-walkthrough** 5-point run.
133
 
134
- **Run proof in this submission:** `refs/baseline_runs/zork1_20260126_190532_run1/` contains `summary.json` and `detailed_metrics.csv` (201 steps, 35 locations, score 5). Attribution and strategy are in `refs/ALL_REFS_ONE_FILE.md` (Parts 2, 4, and evaluation section).
135
 
136
  To **re-run** the 5-point config (from the **big repo** `challenge3/text-adventure-template` or the repo that contains `challenge3/Agentic-zork`):
137
 
@@ -196,4 +178,4 @@ python visualize_run.py refs/baseline_runs/zork1_20260126_190532_run1 -o refs/ba
196
 
197
  ### Learnable Knowledge (for future RL / SFT)
198
 
199
- Game-specific hints (mailbox→leaflet, kitchen→lantern, move rug→trap door, Gallery painting, etc.) are **not** hardcoded. See **`refs/MCP_AGENT_IMPLEMENTATION.md`** §8 (learnable knowledge catalogue) in this submission. For SFT/RL pipeline (walkthrough-modified traces, thought generation, GRPO, DT), see **`refs/blog.md`** (Parts XVII–XVIII).
 
10
  license: mit
11
  ---
12
 
13
+ <div align="center">
14
 
15
+ # 🖴 Text Adventure Agent Submission
16
+
17
+ *ReAct + MCP · Z-machine · no walkthrough*
18
+
19
+ </div>
20
+
21
+ ---
22
+
23
+ ### Abstract
24
+
25
+ We built a **ReAct+MCP** text-adventure agent and **stress-tested it at scale**: hundreds of non-walkthrough runs, heuristic ablation, UCB vs no-UCB, stagnation tuning, repeat-blocking, and exploration injection.
26
+
27
+ | Mode | Score | Locations |
28
+ |------|-------|-----------|
29
+ | **Non-walkthrough** (no hints) | **5** / 350 | **35** |
30
+ | **Hint-informed** (walkthrough-style) | **109** / 350 | — |
31
+
32
+ The 5-point run is **reproducible** under a tested config (repeat-blocking + forced exploration); we attribute the score to exploration heuristics, not RNG. Full design, run proofs, and learning pipeline → `refs/`.
33
+
34
+ ---
35
 
36
  ## Overview
37
 
 
60
 
61
  ## Blog and report (SFT / RL)
62
 
63
+ Use the **Blog**, **Full refs**, and **MCP Implementation** tabs above (in this Space) to open the full design narrative and refs. They contain:
64
 
65
+ - **Blog** — Full design narrative, learning pipeline, SFT/RL; evaluation/context (lexicographic score + locations, no repeat of failed actions); game-agnostic design (mechanisms vs hints), trace collection, ICL over 55 games, GRPO, walkthrough-modified traces, thought generation, ~4.8M steps; ICL results (action-only 75, CoT 64).
66
+ - **Full refs** — Single-document view of all refs (README, evaluation, run analysis, baseline, MCP implementation, blog).
67
+ - **MCP Implementation** — Server and agent description, heuristic plan (§7), learnable knowledge catalogue (§8).
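The "lexicographic score + locations" criterion mentioned in the Blog bullet above can be sketched in a few lines (the dict fields are illustrative assumptions, not the submission's actual data model): rank runs by game score first, using locations explored only as a tie-breaker.

```python
# Minimal sketch of lexicographic ranking: score first, then locations
# explored as tie-breaker. Field names are illustrative assumptions.
runs = [
    {"score": 5, "locations": 35},
    {"score": 5, "locations": 20},
    {"score": 3, "locations": 40},
]

# Tuples compare element-by-element, which gives the lexicographic order.
best = max(runs, key=lambda r: (r["score"], r["locations"]))
assert best == {"score": 5, "locations": 35}  # score wins, then coverage
```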
 
 
68
 
69
  ## Short blog post — What we did (Challenge 3)
70
 
 
107
  - **Non-walkthrough Zork:** One run reached 5/350 under a **tested config** (not RNG): heuristics caused the score; UCB off by default; Zork knowledge, BFS, clear prompt.
108
  - **Game-agnostic design:** Mechanisms in code, hints from data; ICL over 55 games works best action-only (75); finetuning path uses walkthrough-modified and random traces, DT, thought-augmented data.
109
 
110
+ The full narrative with every detail is in the **Blog** and **Full refs** tabs above.
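The repeat-blocking heuristic credited above for the 5-point run can be illustrated with a hypothetical sketch (names and structure are assumptions for illustration, not the submission's actual code): remember (location, action) pairs that produced no state change and skip them on later visits.

```python
# Hypothetical repeat-blocking sketch (illustrative, not the agent's real
# implementation): never retry an action that already failed in the same
# location; fall back to the first candidate if everything is blocked.
failed: set[tuple[str, str]] = set()

def record_failure(location: str, action: str) -> None:
    """Remember a (location, action) pair that produced no state change."""
    failed.add((location, action))

def choose_action(location: str, candidates: list[str]) -> str:
    """Return the first candidate not known to have failed here."""
    for action in candidates:
        if (location, action) not in failed:
            return action
    return candidates[0]  # everything blocked: act anyway rather than stall

record_failure("West of House", "open window")
assert choose_action("West of House", ["open window", "north"]) == "north"
```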
 
111
 
112
  ## Reproducing the 5-point run (35 locations, Zork I)
113
 
114
  **Important:** Evaluation uses **non-walkthrough** runs only. Any run that feeds the walkthrough to the LLM at test time is **cheating** (the model sees the solution). We ran one LLM walkthrough run **for curiosity only** (see the big repo). Our submitted result is the **non-walkthrough** 5-point run.
115
 
116
+ **Run proof in this submission:** `refs/baseline_runs/zork1_20260126_190532_run1/` contains `summary.json` and `detailed_metrics.csv` (201 steps, 35 locations, score 5). Attribution and strategy are in the **Full refs** tab (Parts 2, 4, and evaluation section).
117
 
118
  To **re-run** the 5-point config (from the **big repo** `challenge3/text-adventure-template` or the repo that contains `challenge3/Agentic-zork`):
119
 
 
178
 
179
  ### Learnable Knowledge (for future RL / SFT)
180
 
181
+ Game-specific hints (mailbox→leaflet, kitchen→lantern, move rug→trap door, Gallery painting, etc.) are **not** hardcoded. See the **MCP Implementation** tab (§8 learnable knowledge catalogue). For SFT/RL pipeline (walkthrough-modified traces, thought generation, GRPO, DT), see the **Blog** tab (Parts XVII–XVIII).
app.py CHANGED
@@ -18,11 +18,11 @@ import gradio as gr
18
  from pathlib import Path
19
 
20
 
21
- def read_readme():
22
  """Read the README content."""
23
  readme_path = Path(__file__).parent / "README.md"
24
  if readme_path.exists():
25
- return readme_path.read_text()
26
  return "# Submission\n\nNo README.md found."
27
 
28
 
@@ -30,16 +30,25 @@ def read_file_content(filename: str) -> str:
30
  """Read a source file's content."""
31
  file_path = Path(__file__).parent / filename
32
  if file_path.exists():
33
- return file_path.read_text()
34
  return f"# File not found: {filename}"
35
 
36
 
37
  # Create the Gradio interface
38
  with gr.Blocks(title="Text Adventure Agent Submission") as demo:
39
  gr.Markdown("# Text Adventure Agent Submission")
40
  gr.Markdown(
41
  "This Space contains a student submission for the Text Adventure Agent assignment. "
42
- "Use the tabs below to view the submitted code."
 
43
  )
44
 
45
  with gr.Tabs():
@@ -59,12 +68,15 @@ with gr.Blocks(title="Text Adventure Agent Submission") as demo:
59
  language="python",
60
  label="mcp_server.py",
61
  )
62
-
63
- gr.Markdown(
64
- "---\n"
65
- "**Note:** This is a code submission Space. "
66
- "Evaluation is performed using the evaluation script."
67
- )
68
 
69
 
70
  if __name__ == "__main__":
 
18
  from pathlib import Path
19
 
20
 
21
+ def read_readme() -> str:
22
  """Read the README content."""
23
  readme_path = Path(__file__).parent / "README.md"
24
  if readme_path.exists():
25
+ return readme_path.read_text(encoding="utf-8", errors="replace")
26
  return "# Submission\n\nNo README.md found."
27
 
28
 
 
30
  """Read a source file's content."""
31
  file_path = Path(__file__).parent / filename
32
  if file_path.exists():
33
+ return file_path.read_text(encoding="utf-8", errors="replace")
34
  return f"# File not found: {filename}"
35
 
36
 
37
+ def read_ref(ref_path: str) -> str:
38
+ """Read a file from refs/."""
39
+ path = Path(__file__).parent / "refs" / ref_path
40
+ if path.exists():
41
+ return path.read_text(encoding="utf-8", errors="replace")
42
+ return f"# File not found: refs/{ref_path}"
43
+
44
+
45
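A quick illustration of why the reads above pass `errors="replace"`: a single stray non-UTF-8 byte in a refs file would otherwise crash tab rendering with `UnicodeDecodeError`. Sketch using a temporary file:

```python
import tempfile
from pathlib import Path

# 0xFF is never valid in UTF-8, so a bare read_text() would raise
# UnicodeDecodeError; errors="replace" substitutes U+FFFD instead,
# letting the Markdown tab render the rest of the file.
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "blog.md"
    p.write_bytes(b"ok \xff end")
    text = p.read_text(encoding="utf-8", errors="replace")

assert text == "ok \ufffd end"
```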
  # Create the Gradio interface
46
  with gr.Blocks(title="Text Adventure Agent Submission") as demo:
47
  gr.Markdown("# Text Adventure Agent Submission")
48
  gr.Markdown(
49
  "This Space contains a student submission for the Text Adventure Agent assignment. "
50
+ "Use the tabs below to view the submitted code.\n\n"
51
+ "## **THE BLOG POST** (and full refs, MCP implementation) **ARE IN THE TABS BELOW.**"
52
  )
53
 
54
  with gr.Tabs():
 
68
  language="python",
69
  label="mcp_server.py",
70
  )
71
+
72
+ with gr.Tab("Blog"):
73
+ gr.Markdown(read_ref("blog.md"))
74
+
75
+ with gr.Tab("Full refs"):
76
+ gr.Markdown(read_ref("ALL_REFS_ONE_FILE.md"))
77
+
78
+ with gr.Tab("MCP Implementation"):
79
+ gr.Markdown(read_ref("MCP_AGENT_IMPLEMENTATION.md"))
80
 
81
 
82
  if __name__ == "__main__":
mcp_server.py CHANGED
@@ -281,7 +281,7 @@ def get_valid_actions() -> str:
281
  game = get_game()
282
  try:
283
  valid = game.get_valid_actions_zmachine()
284
- return "Valid actions: " + ", ".join(valid[:30])
285
  except Exception:
286
  return "Could not get valid actions (spacy may be required)."
287
 
 
281
  game = get_game()
282
  try:
283
  valid = game.get_valid_actions_zmachine()
284
+ return "Valid actions: " + ", ".join(str(a) for a in valid[:30])
285
  except Exception:
286
  return "Could not get valid actions (spacy may be required)."
287
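The `str(a) for a in valid[:30]` change above guards against non-string entries in the valid-action list: `str.join()` raises `TypeError` on any non-`str` item. A minimal demonstration (the mixed list is hypothetical, for illustration only):

```python
# Hypothetical mixed list standing in for a valid-action result that is
# assumed to possibly contain non-string entries.
valid = ["open mailbox", "north", 42]

try:
    ", ".join(valid[:30])  # plain join: TypeError on the int
    raised = False
except TypeError:
    raised = True

# Coercing each element with str() makes the join robust.
result = ", ".join(str(a) for a in valid[:30])
assert raised and result == "open mailbox, north, 42"
```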