# AGENTS.md – Zero-Memory Survival Guide
> **You have no persistent memory. Read this first. Do not trust your internal knowledge.**
---
## What This Is (10 seconds)
- **Challenge**: TIL-26-AE – train a Bomberman agent (`agent_0`) via RL
- **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
- **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
- **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning
---
## The 6 Unbreakable Rules
| # | Rule | Violation Cost |
|---|---|---|
| 1 | **NEVER train in sandboxes >30 min** | ~$5/hr wasted on empty, recycled containers |
| 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails instantly – git ignores `HF_TOKEN` |
| 3 | **NEVER pass inline scripts larger than a few KB to `hf_jobs.script`** | Delivery chokes at ~20KB |
| 4 | **ALWAYS upload script to Hub BEFORE submitting job** | Job fetches script from Hub URL, not sandbox |
| 5 | **ALWAYS update docs BEFORE starting long jobs** | Next you has zero memory; stale docs = duplicated work |
| 6 | **ALWAYS smoke-test before multi-hour jobs** | 5-min test saves hours of failed compute |
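Rule 3's size limit can be checked mechanically before submitting. A minimal sketch – the ~20 KB figure is the one from the table above, and `must_upload_first` is a hypothetical helper, not a tool API:

```python
import os

# Inline delivery chokes at roughly 20 KB (rule 3); treat that as a hard cap.
MAX_INLINE_BYTES = 20 * 1024

def must_upload_first(script_path: str) -> bool:
    """True if the script is too big to pass inline and must go via the Hub."""
    return os.path.getsize(script_path) > MAX_INLINE_BYTES
```

In practice rule 4 says to upload every script anyway; this check only tells you when skipping the upload is guaranteed to fail.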
---
## Session Startup (do this now)
1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
2. Check `hf_jobs ps` for running jobs
3. Check latest checkpoint on Hub (`phase*_ckpt_*.zip`)
4. Determine: current phase, remaining steps, next action
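The startup checks above can be scripted once the state file is downloaded. A sketch, assuming a hypothetical shape for `session_state.json` – the real keys may differ, so check the actual file first:

```python
import json

# Hypothetical shape of session_state.json (assumed; verify against the real file).
state = json.loads("""
{
  "phase": 2,
  "steps_done": 300000,
  "steps_total": 500000,
  "last_checkpoint": "phase2_ckpt_300000.zip",
  "running_job_id": null
}
""")

# Remaining steps in the current phase.
remaining = state["steps_total"] - state["steps_done"]

# If no job is running, the next action is to resume training; otherwise poll it.
next_action = "resume" if state["running_job_id"] is None else "poll"
```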
---
## Development Workflow (follow exactly)
1. **Write on `cpu-basic`** – code, docs, scripts, planning. Never touch GPU sandboxes for editing.
2. **Smoke-test on GPU sandbox** (`t4-small` or `a10g-small`) – run the script for 5-10 minutes to verify it loads the env, runs training steps, and can push a checkpoint. **Stop the GPU sandbox immediately** after pass or fail. Never leave it idle.
3. **If smoke test fails** – look up Hugging Face documentation (`explore_hf_docs`, `fetch_hf_docs`) or relevant docs to diagnose the issue. Iterate based on what you learn. Go back to step 1.
4. **If smoke test passes** – update `docs/ae.md` with current project status and update `AGENTS.md` with anything new you learned. Push both to the Hub before proceeding.
5. **Submit the real Job** (`a10g-small`, `a10g-large`, etc.). Immediately check `hf_jobs logs` to confirm it starts successfully. **Poll the job every 5 minutes** until the user interrupts you. During polling downtime, work on docs or scripts for upcoming phases, but keep checking the job.
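Step 5's polling loop, sketched as plain Python. The status names and the `get_status` callable are placeholders for whatever `hf_jobs ps`/`hf_jobs logs` actually report; a real loop would shell out to those tools between sleeps:

```python
import time

# Assumed terminal states; check the real status strings hf_jobs reports.
TERMINAL = {"COMPLETED", "ERROR", "CANCELLED"}

def poll_until_done(get_status, interval_s=300):
    """Call get_status() every interval_s seconds until a terminal state appears."""
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in TERMINAL:
            return status, polls
        time.sleep(interval_s)  # 300 s = the 5-minute cadence from step 5
```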
---
## How to Submit an HF Job (the only way that works)
```python
# 1. Write to sandbox
write(path="/app/train.py", content="...")

# 2. UPLOAD TO HUB (critical – job fetches from Hub URL)
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read(),
)

# 3. Submit job
hf_jobs(
    operation="run",
    script="/app/train.py",  # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
    dependencies=[
        "torch", "sb3-contrib", "gymnasium", "pettingzoo",
        "numpy", "huggingface_hub", "pygame", "omegaconf",
        "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil",
    ],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong",
)
```
**Why step 2 matters**: `hf_jobs inspect` reveals the job executes:
```bash
uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```
If the file isn't on the Hub, the job 404s.
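A cheap pre-flight check is to build the exact URL the job will fetch and confirm the upload landed there first. The URL shape below matches the `uv run` line revealed by `hf_jobs inspect`; `hub_raw_url` is an illustrative helper, not a tool API:

```python
def hub_raw_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """The URL an HF Job resolves a script path to (per `hf_jobs inspect`)."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{filename}"

url = hub_raw_url("E-Rong/til-26-ae-agent", "train.py")
# An HTTP GET on this URL should return your script, not a 404, before you submit.
```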
---
## How to Access the Private Env in a Job
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo",
)
# Then walk to find pyproject.toml and pip install -e .
```
`snapshot_download` auto-uses `HF_TOKEN`. `git clone` does not.
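The "walk to find `pyproject.toml`" step can look like the sketch below. The install line is left commented because it mutates the job environment; the path is the `local_dir` used above:

```python
import os

def find_pyproject(root: str):
    """Return the first directory under root containing a pyproject.toml, else None."""
    for dirpath, _dirs, files in os.walk(root):
        if "pyproject.toml" in files:
            return dirpath
    return None

# pkg_dir = find_pyproject("/app/til-26-ae-repo")
# subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", pkg_dir])
```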
---
## Docs Update Checklist (before ANY job >30 min)
- [ ] `session_state.json` – phase, job_id, script name, hardware, timeout, expected completion
- [ ] `AGENTS.md` – any new mistakes/API gotchas learned this session
- [ ] `docs/ae.md` – research results, completed phase metrics
- [ ] Push all three to Hub BEFORE calling `hf_jobs`
---
## Technical Gotchas
| Gotcha | Correct | Wrong |
|---|---|---|
| **Wrapper order** | `ActionMasker(base_env)` then `Monitor(env)` | `ActionMasker(Monitor(base_env))` – masks break |
| **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
| **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
| **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing token manually in git URLs |
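Invalid-action masking (arxiv:2006.14171) reduces to a boolean vector over the discrete action space; the mask function passed to `ActionMasker` returns something shaped like this. The six-action count is an assumption about the Bomberman env (4 moves + bomb + no-op), so check the real action space:

```python
import numpy as np

N_ACTIONS = 6  # assumed: 4 moves + place bomb + no-op; verify against the real env

def action_mask(valid_actions) -> np.ndarray:
    """Boolean mask over the discrete action space; True = action is legal now."""
    mask = np.zeros(N_ACTIONS, dtype=bool)
    mask[list(valid_actions)] = True
    return mask

m = action_mask({0, 2, 5})
```

MaskablePPO consumes this mask at every step, which is why the wrapper-order gotcha above matters: `Monitor` must sit outside `ActionMasker`, not between it and the base env.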
---
## Cost Table
| Hardware | $/hr | Use For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing code, docs, planning |
| `t4-small` | ~$0.40 | Smoke tests ONLY |
| `a10g-small` | ~$1.00 | Training via HF Jobs |
**Stop GPU sandboxes immediately after smoke tests.** An idle GPU sandbox burns $1/hr for nothing.
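To sanity-check a budget before submitting, multiply the table's rates out (rates below are the approximate figures above, not billed prices):

```python
RATES_PER_HR = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}  # approx $/hr

def job_cost(flavor: str, hours: float) -> float:
    """Rough dollar cost of running the given hardware for the given hours."""
    return RATES_PER_HR[flavor] * hours

# A 6-hour a10g-small run costs about $6; a forgotten idle sandbox burns the same.
```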
---
## Curriculum Summary
| Phase | Opponent | Steps | Status |
|---|---|---|---|
| 1 | Random | 500k | ✅ Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check `session_state.json` |
| 3 | Rule-based curriculum | 1M | Pending |
Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).
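The phase boundaries in the table can be turned into a lookup, handy when resuming from a cumulative step count. Step budgets come from the table; the opponent labels are shorthand, not config keys:

```python
# (opponent, steps) per phase, from the curriculum table above.
CURRICULUM = [
    ("random", 500_000),            # phase 1
    ("random+shaping", 500_000),    # phase 2
    ("rule-based", 1_000_000),      # phase 3
]

def phase_for_step(total_steps: int):
    """Map a cumulative step count to (phase number, opponent)."""
    cum = 0
    for phase, (opponent, steps) in enumerate(CURRICULUM, start=1):
        cum += steps
        if total_steps < cum:
            return phase, opponent
    return len(CURRICULUM), CURRICULUM[-1][0]  # past the end: stay on final phase
```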
---
## File Guide
| File | Purpose |
|---|---|
| `session_state.json` | Current phase, checkpoints, mistakes, next steps |
| `docs/ae.md` | Full research, design, results |
| `phase1_final.zip` | Phase 1 complete checkpoint |
| `phase2_ckpt_*.zip` | Phase 2 intermediates |
| `phase2_resume.py` | Working HF Job script |
| `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
| `smoke_test.py` | 5-min validation |
---
## Contact
- **User**: E-Rong | **Org**: E-Rong
- **Billing namespace**: `E-Rong` (required on all `hf_jobs`)
- **You are**: An ephemeral agent with no memory. This file is your only brain.
*Read the files. Check the state. Test before committing compute. Update docs before every job.*