| # AGENTS.md: Zero-Memory Survival Guide |
|
|
| > **You have no persistent memory. Read this first. Do not trust your internal knowledge.** |
|
|
| --- |
|
|
| ## What This Is (10 seconds) |
|
|
| - **Challenge**: TIL-26-AE: train a Bomberman agent (`agent_0`) via RL |
| - **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts) |
| - **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`) |
| - **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning |
|
|
| --- |
|
|
| ## The 6 Unbreakable Rules |
|
|
| | # | Rule | Violation Cost | |
| |---|---|---| |
| | 1 | **NEVER train in sandboxes >30 min** | ~$5/hr wasted on empty, recycled containers | |
| | 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails instantly; git ignores `HF_TOKEN` | |
| | 3 | **NEVER pass inline scripts larger than a few KB to `hf_jobs.script`** | Delivery chokes at ~20KB | |
| | 4 | **ALWAYS upload script to Hub BEFORE submitting job** | Job fetches script from Hub URL, not sandbox | |
| | 5 | **ALWAYS update docs BEFORE starting long jobs** | Next you has zero memory; stale docs = duplicated work | |
| | 6 | **ALWAYS smoke-test before multi-hour jobs** | 5-min test saves hours of failed compute | |
| |
| --- |
| |
| ## Session Startup (do this now) |
| |
| 1. Read `session_state.json` from `E-Rong/til-26-ae-agent` |
| 2. Check `hf_jobs ps` for running jobs |
| 3. Check latest checkpoint on Hub (`phase*_ckpt_*.zip`) |
| 4. Determine: current phase, remaining steps, next action |
| |
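The startup steps above can be sketched in Python. This is a minimal sketch, not the repo's own tooling: `latest_checkpoint` and `load_session_state` are illustrative helpers, and the numeric step suffix in names such as `phase2_ckpt_300000.zip` is an assumption about what the `*` in `phase*_ckpt_*.zip` stands for.

```python
import json
import re

# Assumed naming scheme: phase<N>_ckpt_<steps>.zip (the numeric suffix is a guess).
CKPT_RE = re.compile(r"^phase(\d+)_ckpt_(\d+)\.zip$")


def latest_checkpoint(filenames):
    """Return the checkpoint with the highest (phase, steps) pair, or None."""
    best = None
    for name in filenames:
        match = CKPT_RE.match(name)
        if match is None:
            continue
        key = (int(match.group(1)), int(match.group(2)))
        if best is None or key > best[0]:
            best = (key, name)
    return None if best is None else best[1]


def load_session_state(repo_id="E-Rong/til-26-ae-agent"):
    """Fetch session_state.json and the newest checkpoint name (needs network and HF_TOKEN)."""
    from huggingface_hub import hf_hub_download, list_repo_files

    path = hf_hub_download(repo_id=repo_id, filename="session_state.json")
    with open(path) as fh:
        state = json.load(fh)
    return state, latest_checkpoint(list_repo_files(repo_id))
```

Comparing `(phase, steps)` as integers avoids the trap of sorting filenames as strings, where `ckpt_90000` would sort after `ckpt_300000`.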
| --- |
| |
| ## Development Workflow (follow exactly) |
| |
| 1. **Write on `cpu-basic`**: code, docs, scripts, planning. Never touch GPU sandboxes for editing. |
| |
| 2. **Smoke-test on GPU sandbox** (`t4-small` or `a10g-small`): run the script for 5-10 minutes to verify it loads the env, runs training steps, and can push a checkpoint. **Stop the GPU sandbox immediately** after pass or fail. Never leave it idle. |
| |
| 3. **If smoke test fails**: look up the Hugging Face documentation (`explore_hf_docs`, `fetch_hf_docs`) or other relevant docs to diagnose the issue. Iterate based on what you learn, then go back to step 1. |
| |
| 4. **If smoke test passes**: update `docs/ae.md` with current project status, update `AGENTS.md` with anything new you learned. Push both to the Hub before proceeding. |
| |
| 5. **Submit the real Job** (`a10g-small`, `a10g-large`, etc.). Immediately check `hf_jobs logs` to confirm it starts successfully. **Poll the job every 5 minutes** until the user interrupts you. During polling downtime, work on docs or scripts for upcoming phases, but keep checking the job. |
|
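The polling in step 5 can be sketched as a loop around whatever status check is available. This is a minimal sketch under stated assumptions: `get_status` is a hypothetical callable standing in for the `hf_jobs` status check, and the stage names in `TERMINAL` are guesses rather than a confirmed list.

```python
import time

# Assumed terminal stage names; adjust to whatever the status check actually reports.
TERMINAL = {"COMPLETED", "ERROR", "CANCELED"}


def poll_job(get_status, interval_s=300, max_polls=None, sleep=time.sleep):
    """Call get_status() every interval_s seconds until a terminal stage appears.

    Returns the list of observed stages so the caller can log them.
    """
    seen = []
    while max_polls is None or len(seen) < max_polls:
        stage = get_status()
        seen.append(stage)
        if stage in TERMINAL:
            break
        sleep(interval_s)
    return seen
```

Passing `sleep` as a parameter keeps the loop testable without waiting; the default of 300 seconds matches the 5-minute polling interval in step 5.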
|
| --- |
|
|
| ## How to Submit an HF Job (the only way that works) |
|
|
| ```python |
| # 1. Write to sandbox |
| write(path="/app/train.py", content="...") |
| |
| # 2. UPLOAD TO HUB (critical: the job fetches from the Hub URL) |
| hf_repo_files( |
|     operation="upload", |
|     repo_id="E-Rong/til-26-ae-agent", |
|     path="train.py", |
|     content=open("/app/train.py").read(), |
| ) |
| |
| # 3. Submit job |
| hf_jobs( |
|     operation="run", |
|     script="/app/train.py",  # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py |
|     dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo", |
|                   "numpy", "huggingface_hub", "pygame", "omegaconf", |
|                   "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"], |
|     hardware_flavor="a10g-small", |
|     timeout="6h", |
|     namespace="E-Rong", |
| ) |
| ``` |
|
|
| **Why step 2 matters**: `hf_jobs inspect` reveals the job executes: |
| ```bash |
| uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py |
| ``` |
| If the file isn't on the Hub, the job 404s. |
|
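A pre-submit check for the 404 case might look like the following. This is a sketch only: `raw_url` and `script_is_on_hub` are illustrative helper names, and the URL pattern is inferred from the `uv run` line shown above rather than taken from any documented contract.

```python
def raw_url(repo_id, path, revision="main"):
    """Build the raw Hub URL that the job is expected to fetch (model repo assumed)."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{path}"


def script_is_on_hub(repo_id, path):
    """Return True if the file exists on the Hub (needs network and HF_TOKEN)."""
    from huggingface_hub import file_exists

    return file_exists(repo_id, path)
```

Calling `script_is_on_hub("E-Rong/til-26-ae-agent", "train.py")` right before submitting the job catches a forgotten upload before any compute is billed.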
|
| --- |
|
|
| ## How to Access the Private Env in a Job |
|
|
| ```python |
| from huggingface_hub import snapshot_download |
| snapshot_download( |
|     repo_id="e-rong/til-26-ae", |
|     repo_type="space", |
|     local_dir="/app/til-26-ae-repo", |
| ) |
| # Then walk to find pyproject.toml and pip install -e . |
| ``` |
|
|
| `snapshot_download` auto-uses `HF_TOKEN`. `git clone` does not. |
|
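The "walk to find `pyproject.toml` and `pip install -e .`" step can be sketched as below. The helper names are hypothetical, and the sketch assumes `pip` is importable from the job's interpreter, which may not hold in every job environment.

```python
import os
import subprocess
import sys


def find_project_root(root):
    """Return the first directory under root that contains pyproject.toml, else None."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if "pyproject.toml" in filenames:
            return dirpath
    return None


def install_editable(root="/app/til-26-ae-repo"):
    """Editable-install the project found under root; raise if none is found."""
    project = find_project_root(root)
    if project is None:
        raise FileNotFoundError(f"no pyproject.toml under {root}")
    # Assumes pip is available for this interpreter.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", project])
    return project
```

Failing loudly when no `pyproject.toml` is found makes a broken download show up in the first minute of the job logs instead of as an import error later.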
|
| --- |
|
|
| ## Docs Update Checklist (before ANY job >30 min) |
|
|
| - [ ] `session_state.json`: phase, job_id, script name, hardware, timeout, expected completion |
| - [ ] `AGENTS.md`: any new mistakes/API gotchas learned this session |
| - [ ] `docs/ae.md`: research results, completed phase metrics |
| - [ ] Push all three to Hub BEFORE calling `hf_jobs` |
|
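The `session_state.json` fields in the checklist could be assembled as follows. This is a minimal sketch: the key names are illustrative and may not match the file's real schema, and the expected completion is only an upper bound derived from the timeout.

```python
import json
from datetime import datetime, timedelta, timezone


def build_session_state(phase, job_id, script, hardware, timeout_hours, started_at=None):
    """Assemble the fields the checklist asks for (key names are illustrative)."""
    started_at = started_at or datetime.now(timezone.utc)
    return {
        "phase": phase,
        "job_id": job_id,
        "script": script,
        "hardware": hardware,
        "timeout": f"{timeout_hours}h",
        # Upper bound: the job cannot outlive its timeout.
        "expected_completion": (started_at + timedelta(hours=timeout_hours)).isoformat(),
    }


def write_session_state(state, path="session_state.json"):
    """Write the state dict as indented JSON, ready to upload to the Hub."""
    with open(path, "w") as fh:
        json.dump(state, fh, indent=2)
```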
|
| --- |
|
|
| ## Technical Gotchas |
|
|
| | Gotcha | Correct | Wrong | |
| |---|---|---| |
| | **Wrapper order** | `ActionMasker(base_env)` then `Monitor(env)` | `ActionMasker(Monitor(base_env))` (masks break) | |
| | **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space | |
| | **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file | |
| | **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing token manually in git URLs | |
|
|
| --- |
|
|
| ## Cost Table |
|
|
| | Hardware | $/hr | Use For | |
| |---|---|---| |
| | `cpu-basic` | ~$0.05 | Writing code, docs, planning | |
| | `t4-small` | ~$0.40 | Smoke tests ONLY | |
| | `a10g-small` | ~$1.00 | Training via HF Jobs | |
|
|
| **Stop GPU sandboxes immediately after smoke tests.** An idle GPU sandbox burns roughly $0.40 to $1.00/hr (see the table above) for nothing. |
|
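The arithmetic behind the cost table is a rate multiplied by a duration. A small sketch, using the approximate rates from the table (the helper name `estimate_cost` is illustrative):

```python
# Approximate hourly rates copied from the cost table above.
RATES_PER_HOUR = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}


def estimate_cost(hardware, minutes):
    """Approximate spend in dollars for running `hardware` for `minutes`."""
    return RATES_PER_HOUR[hardware] * minutes / 60
```

Under these rates, a 5-minute smoke test on `t4-small` costs about $0.03, while a full 6-hour job on `a10g-small` costs about $6, which is why the smoke test is cheap insurance.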
|
| --- |
|
|
| ## Curriculum Summary |
|
|
| | Phase | Opponent | Steps | Status | |
| |---|---|---|---| |
| | 1 | Random | 500k | ✅ Complete (92% win rate) | |
| | 2 | Random + exploration shaping | 500k | Check `session_state.json` | |
| | 3 | Rule-based curriculum | 1M | Pending | |
|
|
| Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking). |
|
|
| --- |
|
|
| ## File Guide |
|
|
| | File | Purpose | |
| |---|---| |
| | `session_state.json` | Current phase, checkpoints, mistakes, next steps | |
| | `docs/ae.md` | Full research, design, results | |
| | `phase1_final.zip` | Phase 1 complete checkpoint | |
| | `phase2_ckpt_*.zip` | Phase 2 intermediates | |
| | `phase2_resume.py` | Working HF Job script | |
| | `phase3_curriculum.py` | Ready-to-submit Phase 3 script | |
| | `smoke_test.py` | 5-min validation | |
|
|
| --- |
|
|
| ## Contact |
|
|
| - **User**: E-Rong | **Org**: E-Rong |
| - **Billing namespace**: `E-Rong` (required on all `hf_jobs`) |
| - **You are**: An ephemeral agent with no memory. This file is your only brain. |
|
|
| *Read the files. Check the state. Test before committing compute. Update docs before every job.* |
|
|