Commit 4eb5e8b · Parent(s): ac36746
Updated README file

README.md CHANGED
````diff
@@ -14,15 +14,27 @@ license: mit
 
 ## Overview
 
-This is my submission for the Text Adventure Agent assignment.
 
 ## Approach
 
-
 
-- What strategy does your agent use?
-- What tools did you implement in your MCP server?
-- Any interesting techniques or optimizations?
 
 ## Files
 
@@ -34,14 +46,6 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
 | `app.py` | Gradio interface for HF Space |
 | `requirements.txt` | Additional dependencies |
 
-## How to Submit
-
-1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
-2. Clone your fork locally
-3. Implement your agent in `agent.py` and `mcp_server.py`
-4. Test locally (see below)
-5. Push your changes to your Space
-6. Submit your Space URL on the course platform
 
 ## Local Testing
 
@@ -54,7 +58,4 @@ fastmcp dev mcp_server.py
 
 # Run your agent on a game
 python run_agent.py --agent . --game lostpig -v -n 20
-
-# Run evaluation
-python -m evaluation.evaluate -s . -g lostpig -t 3
 ```
````
## Overview

This is my submission for the Text Adventure Agent assignment. The agent builds upon the baseline Agentic-Zork implementation, introducing agentic framework improvements and a Just-In-Time Reinforcement Learning (JitRL) mechanism for cross-episode learning without gradient updates.
## Approach

### Agentic Framework

- **ReAct agent** with structured prompting: the agent proposes multiple candidate actions with confidence scores and reasoning at each step
- **History summarization**: a dedicated LLM call generates a structured summary (`[SUMMARY]`, `[PROGRESS]`, `[LOCATION]`) at each step to provide compact context without overloading the context window
- **Valid action constraining**: the agent is restricted to the set of valid actions provided by the Jericho engine, eliminating invalid-command errors
- **Inventory injection**: the current inventory is included directly in the prompt, avoiding unnecessary tool calls
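Taken together, the framework steps above can be sketched as one candidate-proposal routine. This is a minimal illustration only, not the submission's actual code: the LLM call is stubbed out, and the names (`propose_candidates`, the JSON reply format) are assumptions.

```python
import json
from dataclasses import dataclass

@dataclass
class Candidate:
    action: str          # must come from Jericho's valid-action list
    confidence: float    # model-reported confidence in [0, 1]
    reasoning: str       # short justification emitted by the model

def propose_candidates(llm, summary: str, inventory: str,
                       valid_actions: list[str]) -> list[Candidate]:
    """Ask the model for scored candidate actions, constrained to valid ones."""
    prompt = (
        f"[SUMMARY] {summary}\n"
        f"[INVENTORY] {inventory}\n"
        f"Valid actions: {valid_actions}\n"
        'Reply as JSON: [{"action": ..., "confidence": ..., "reasoning": ...}]'
    )
    raw = json.loads(llm(prompt))
    # Valid-action constraining: drop anything the engine would reject,
    # then rank remaining candidates by confidence.
    cands = [Candidate(**c) for c in raw if c["action"] in valid_actions]
    return sorted(cands, key=lambda c: c.confidence, reverse=True)

# Stubbed LLM for illustration; the second action is deliberately invalid.
fake_llm = lambda _: json.dumps([
    {"action": "take lantern", "confidence": 0.9, "reasoning": "light needed"},
    {"action": "fly", "confidence": 0.8, "reasoning": "hallucinated"},
])
best = propose_candidates(fake_llm, "At cave mouth.", "nothing",
                          ["take lantern", "go north"])
```

The filtering step is what makes the constraint hard rather than advisory: even a high-confidence hallucinated action never reaches the game engine.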
### JitRL Cross-Episode Memory

- After each episode, an LLM-based evaluator assigns step-level rewards based on long-term impact
- Discounted returns $G_t$ are computed for each (state, action) pair and stored in a FAISS vector index
- At each step, similar past states are retrieved and used to estimate action advantages $\hat{A}(s, a) = \hat{Q}(s, a) - \hat{V}(s)$
- Candidate action scores are updated following the JitRL rule: $z'(s, a) = z(s, a) + \beta \hat{A}(s, a)$
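The return computation and score update follow directly from these formulas. The sketch below is illustrative, not the actual implementation: FAISS retrieval is abstracted into a plain `retrieved` mapping from action to the returns stored at similar past states, and the constants are made-up values.

```python
GAMMA, BETA = 0.95, 0.5   # discount factor and JitRL step size (illustrative)

def discounted_returns(rewards):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + GAMMA * g
        out.append(g)
    return out[::-1]

def jitrl_adjust(z, retrieved):
    """Update candidate scores z(s, a) from returns retrieved for similar states.

    Q-hat(s, a) is the mean retrieved return per action; V-hat(s) is the mean
    over all retrieved returns, so A-hat = Q-hat - V-hat.
    """
    all_g = [g for gs in retrieved.values() for g in gs]
    if not all_g:
        return dict(z)                               # no memory yet: no update
    v_hat = sum(all_g) / len(all_g)
    out = {}
    for action, score in z.items():
        gs = retrieved.get(action)
        q_hat = sum(gs) / len(gs) if gs else v_hat   # no data -> zero advantage
        out[action] = score + BETA * (q_hat - v_hat) # z' = z + beta * A-hat
    return out

returns = discounted_returns([0, 0, 10])             # reward only at the end
scores = jitrl_adjust({"go north": 0.6, "open door": 0.5},
                      {"go north": [4.0], "open door": [1.0]})
```

Because only the candidate scores are shifted, the update steers the frozen policy toward actions that paid off in past episodes without any gradient step.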
### MCP Server Tools

- `play_action`: executes a game command and returns the full game state
- `reset_game`: initializes or resets the game environment
- `get_valid_actions`: returns the list of valid actions at the current state
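As a rough sketch of what one of these tools returns, here is `play_action` wrapped around a stand-in environment. The real server registers the function with FastMCP (`@mcp.tool()`) and drives a Jericho `FrotzEnv`; both are replaced by a dummy here, and the payload field names are assumptions rather than the submission's actual schema.

```python
class DummyEnv:
    """Stand-in for jericho.FrotzEnv so the sketch runs without game files."""
    def reset(self):
        return "At cave mouth.", {}
    def step(self, action):
        # Jericho's step returns (observation, reward, done, info).
        return f"You {action}.", 0, False, {"moves": 1, "score": 0}
    def get_valid_actions(self):
        return ["go north", "take lantern"]

env = DummyEnv()

def play_action(action: str) -> dict:
    """Tool body: execute one command and return the full game state."""
    obs, reward, done, info = env.step(action)
    return {
        "observation": obs,
        "reward": reward,
        "done": done,
        "score": info.get("score", 0),
        "valid_actions": env.get_valid_actions(),
    }

state = play_action("go north")
```

Returning the valid-action list inside the same payload is what lets the agent constrain its next proposal without a second tool call.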
## Files

| `app.py` | Gradio interface for HF Space |
| `requirements.txt` | Additional dependencies |
## Local Testing

```
# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20
```