Commit 4eb5e8b · Parent(s): ac36746
Updated README file

README.md CHANGED
````diff
@@ -14,15 +14,27 @@ license: mit
 
 ## Overview
 
-This is my submission for the Text Adventure Agent assignment.
 
 ## Approach
 
-
 
-- What strategy does your agent use?
-- What tools did you implement in your MCP server?
-- Any interesting techniques or optimizations?
 
 ## Files
 
@@ -34,14 +46,6 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
 | `app.py` | Gradio interface for HF Space |
 | `requirements.txt` | Additional dependencies |
 
-## How to Submit
-
-1. Fork the template Space: `https://huggingface.co/spaces/LLM-course/text-adventure-template`
-2. Clone your fork locally
-3. Implement your agent in `agent.py` and `mcp_server.py`
-4. Test locally (see below)
-5. Push your changes to your Space
-6. Submit your Space URL on the course platform
 
 ## Local Testing
 
@@ -54,7 +58,4 @@ fastmcp dev mcp_server.py
 
 # Run your agent on a game
 python run_agent.py --agent . --game lostpig -v -n 20
-
-# Run evaluation
-python -m evaluation.evaluate -s . -g lostpig -t 3
 ```
````
## Overview

This is my submission for the Text Adventure Agent assignment. The agent builds upon the baseline Agentic-Zork implementation, introducing agentic framework improvements and a Just-In-Time Reinforcement Learning (JitRL) mechanism for cross-episode learning without gradient updates.
## Approach

### Agentic Framework

- **ReAct agent** with structured prompting: the agent proposes multiple candidate actions with confidence scores and reasoning at each step
- **History summarization**: a dedicated LLM call generates a structured summary (`[SUMMARY]`, `[PROGRESS]`, `[LOCATION]`) at each step to provide compact context without overloading the context window
- **Valid action constraining**: the agent is restricted to the set of valid actions provided by the Jericho engine, eliminating invalid-command errors
- **Inventory injection**: the current inventory is included directly in the prompt, avoiding unnecessary tool calls
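Taken together, the framework steps above can be sketched as one candidate-proposal routine. This is a minimal illustration only, not the submission's actual code: the LLM call is stubbed out, and the names (`propose_candidates`, the JSON reply format) are assumptions.

```python
import json
from dataclasses import dataclass

@dataclass
class Candidate:
    action: str          # must come from Jericho's valid-action list
    confidence: float    # model-reported confidence in [0, 1]
    reasoning: str       # short justification emitted by the model

def propose_candidates(llm, summary: str, inventory: str,
                       valid_actions: list[str]) -> list[Candidate]:
    """Ask the model for scored candidate actions, constrained to valid ones."""
    prompt = (
        f"[SUMMARY] {summary}\n"
        f"[INVENTORY] {inventory}\n"
        f"Valid actions: {valid_actions}\n"
        'Reply as JSON: [{"action": ..., "confidence": ..., "reasoning": ...}]'
    )
    raw = json.loads(llm(prompt))
    # Valid-action constraining: drop anything the engine would reject,
    # then rank remaining candidates by confidence.
    cands = [Candidate(**c) for c in raw if c["action"] in valid_actions]
    return sorted(cands, key=lambda c: c.confidence, reverse=True)

# Stubbed LLM for illustration; the second action is deliberately invalid.
fake_llm = lambda _: json.dumps([
    {"action": "take lantern", "confidence": 0.9, "reasoning": "light needed"},
    {"action": "fly", "confidence": 0.8, "reasoning": "hallucinated"},
])
best = propose_candidates(fake_llm, "At cave mouth.", "nothing",
                          ["take lantern", "go north"])
```

The filtering step is what makes the constraint hard rather than advisory: even a high-confidence hallucinated action never reaches the game engine.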
### JitRL Cross-Episode Memory

- After each episode, an LLM-based evaluator assigns step-level rewards based on long-term impact
- Discounted returns $G_t$ are computed for each (state, action) pair and stored in a FAISS vector index
- At each step, similar past states are retrieved and used to estimate action advantages $\hat{A}(s, a) = \hat{Q}(s, a) - \hat{V}(s)$
- Candidate action scores are updated following the JitRL rule: $z'(s, a) = z(s, a) + \beta \hat{A}(s, a)$
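The return computation and score update follow directly from these formulas. The sketch below is illustrative, not the actual implementation: FAISS retrieval is abstracted into a plain `retrieved` mapping from action to the returns stored at similar past states, and the constants are made-up values.

```python
GAMMA, BETA = 0.95, 0.5   # discount factor and JitRL step size (illustrative)

def discounted_returns(rewards):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + GAMMA * g
        out.append(g)
    return out[::-1]

def jitrl_adjust(z, retrieved):
    """Update candidate scores z(s, a) from returns retrieved for similar states.

    Q-hat(s, a) is the mean retrieved return per action; V-hat(s) is the mean
    over all retrieved returns, so A-hat = Q-hat - V-hat.
    """
    all_g = [g for gs in retrieved.values() for g in gs]
    if not all_g:
        return dict(z)                               # no memory yet: no update
    v_hat = sum(all_g) / len(all_g)
    out = {}
    for action, score in z.items():
        gs = retrieved.get(action)
        q_hat = sum(gs) / len(gs) if gs else v_hat   # no data -> zero advantage
        out[action] = score + BETA * (q_hat - v_hat) # z' = z + beta * A-hat
    return out

returns = discounted_returns([0, 0, 10])             # reward only at the end
scores = jitrl_adjust({"go north": 0.6, "open door": 0.5},
                      {"go north": [4.0], "open door": [1.0]})
```

Because only the candidate scores are shifted, the update steers the frozen policy toward actions that paid off in past episodes without any gradient step.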
### MCP Server Tools

- `play_action`: executes a game command and returns the full game state
- `reset_game`: initializes or resets the game environment
- `get_valid_actions`: returns the list of valid actions at the current state
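As a rough sketch of what one of these tools returns, here is `play_action` wrapped around a stand-in environment. The real server registers the function with FastMCP (`@mcp.tool()`) and drives a Jericho `FrotzEnv`; both are replaced by a dummy here, and the payload field names are assumptions rather than the submission's actual schema.

```python
class DummyEnv:
    """Stand-in for jericho.FrotzEnv so the sketch runs without game files."""
    def reset(self):
        return "At cave mouth.", {}
    def step(self, action):
        # Jericho's step returns (observation, reward, done, info).
        return f"You {action}.", 0, False, {"moves": 1, "score": 0}
    def get_valid_actions(self):
        return ["go north", "take lantern"]

env = DummyEnv()

def play_action(action: str) -> dict:
    """Tool body: execute one command and return the full game state."""
    obs, reward, done, info = env.step(action)
    return {
        "observation": obs,
        "reward": reward,
        "done": done,
        "score": info.get("score", 0),
        "valid_actions": env.get_valid_actions(),
    }

state = play_action("go north")
```

Returning the valid-action list inside the same payload is what lets the agent constrain its next proposal without a second tool call.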
## Files

| `app.py` | Gradio interface for HF Space |
| `requirements.txt` | Additional dependencies |
## Local Testing

```
# Run your agent on a game
python run_agent.py --agent . --game lostpig -v -n 20
```