# AGENTS.md — Zero-Memory Survival Guide

> **You have no persistent memory. Read this first. Do not trust your internal knowledge.**

---

## What This Is (10 seconds)

- **Challenge**: TIL-26-AE — train a Bomberman agent (`agent_0`) via RL
- **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
- **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
- **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning

---

## The 6 Unbreakable Rules

| # | Rule | Violation Cost |
|---|---|---|
| 1 | **NEVER train in sandboxes >30 min** | ~$5/hr wasted on empty, recycled containers |
| 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails instantly — git ignores `HF_TOKEN` |
| 3 | **NEVER pass inline scripts more than a few KB to `hf_jobs.script`** | Delivery chokes at ~20 KB |
| 4 | **ALWAYS upload the script to the Hub BEFORE submitting a job** | The job fetches the script from a Hub URL, not the sandbox |
| 5 | **ALWAYS update docs BEFORE starting long jobs** | The next you has zero memory; stale docs = duplicated work |
| 6 | **ALWAYS smoke-test before multi-hour jobs** | A 5-min test saves hours of failed compute |

---

## Session Startup (do this now)

1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
2. Check `hf_jobs ps` for running jobs
3. Check the latest checkpoint on the Hub (`phase*_ckpt_*.zip`)
4. Determine: current phase, remaining steps, next action

---

## Development Workflow (follow exactly)

1. **Write on `cpu-basic`** — code, docs, scripts, planning. Never touch GPU sandboxes for editing.
2. **Smoke-test on a GPU sandbox** (`t4-small` or `a10g-small`) — run the script for 5-10 minutes to verify it loads the env, runs training steps, and can push a checkpoint. **Stop the GPU sandbox immediately** after pass or fail. Never leave it idle.
3. **If the smoke test fails** — look up Hugging Face documentation (`explore_hf_docs`, `fetch_hf_docs`) or other relevant docs to diagnose the issue. Iterate based on what you learn. Go back to step 1.
4. **If the smoke test passes** — update `docs/ae.md` with the current project status and update `AGENTS.md` with anything new you learned. Push both to the Hub before proceeding.
5. **Submit the real Job** (`a10g-small`, `a10g-large`, etc.). Immediately check `hf_jobs logs` to confirm it starts successfully. **Poll the job every 5 minutes** until the user interrupts you. During polling downtime, work on docs or scripts for upcoming phases, but keep checking the job.

---

## How to Submit an HF Job (the only way that works)

```python
# 1. Write to sandbox
write(path="/app/train.py", content="...")

# 2. UPLOAD TO HUB (critical — job fetches from Hub URL)
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# 3. Submit job
hf_jobs(
    operation="run",
    script="/app/train.py",  # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo", "numpy",
                  "huggingface_hub", "pygame", "omegaconf", "mazelib", "imageio",
                  "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"
)
```

**Why step 2 matters**: `hf_jobs inspect` reveals that the job executes:

```bash
uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```

If the file isn't on the Hub, the job 404s.

---

## How to Access the Private Env in a Job

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
# Then walk to find pyproject.toml and pip install -e .
```

`snapshot_download` auto-uses `HF_TOKEN`; `git clone` does not.
---

## Docs Update Checklist (before ANY job >30 min)

- [ ] `session_state.json` — phase, job_id, script name, hardware, timeout, expected completion
- [ ] `AGENTS.md` — any new mistakes/API gotchas learned this session
- [ ] `docs/ae.md` — research results, completed phase metrics
- [ ] Push all three to the Hub BEFORE calling `hf_jobs`

---

## Technical Gotchas

| Gotcha | Correct | Wrong |
|---|---|---|
| **Wrapper order** | `ActionMasker(base_env)` then `Monitor(env)` | `ActionMasker(Monitor(base_env))` — masks break |
| **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
| **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
| **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing the token manually in git URLs |

---

## Cost Table

| Hardware | $/hr | Use For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing code, docs, planning |
| `t4-small` | ~$0.40 | Smoke tests ONLY |
| `a10g-small` | ~$1.00 | Training via HF Jobs |

**Stop GPU sandboxes immediately after smoke tests.** An idle GPU sandbox burns $1/hr for nothing.

---

## Curriculum Summary

| Phase | Opponent | Steps | Status |
|---|---|---|---|
| 1 | Random | 500k | ✅ Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check `session_state.json` |
| 3 | Rule-based curriculum | 1M | Pending |

Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).
---

## File Guide

| File | Purpose |
|---|---|
| `session_state.json` | Current phase, checkpoints, mistakes, next steps |
| `docs/ae.md` | Full research, design, results |
| `phase1_final.zip` | Phase 1 complete checkpoint |
| `phase2_ckpt_*.zip` | Phase 2 intermediates |
| `phase2_resume.py` | Working HF Job script |
| `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
| `smoke_test.py` | 5-min validation |

---

## Contact

- **User**: E-Rong | **Org**: E-Rong
- **Billing namespace**: `E-Rong` (required on all `hf_jobs`)
- **You are**: an ephemeral agent with no memory. This file is your only brain.

*Read the files. Check the state. Test before committing compute. Update docs before every job.*