# AGENTS.md — Zero-Memory Survival Guide

> **You have no persistent memory. Read this first. Do not trust your internal knowledge.**

---

## What This Is (10 seconds)

- **Challenge**: TIL-26-AE — train a Bomberman agent (`agent_0`) via RL
- **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
- **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
- **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning

---

## The 6 Unbreakable Rules

| # | Rule | Violation Cost |
|---|---|---|
| 1 | **NEVER train in sandboxes >30 min** | ~$5/hr wasted on empty, recycled containers |
| 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails instantly — git ignores `HF_TOKEN` |
| 3 | **NEVER pass inline scripts more than a few KB to `hf_jobs.script`** | Delivery chokes at ~20 KB |
| 4 | **ALWAYS upload the script to the Hub BEFORE submitting a job** | The job fetches the script from a Hub URL, not the sandbox |
| 5 | **ALWAYS update docs BEFORE starting long jobs** | The next you has zero memory; stale docs = duplicated work |
| 6 | **ALWAYS smoke-test before multi-hour jobs** | A 5-min test saves hours of failed compute |

---

## Session Startup (do this now)

1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
2. Check `hf_jobs ps` for running jobs
3. Check the latest checkpoint on the Hub (`phase*_ckpt_*.zip`)
4. Determine: current phase, remaining steps, next action

---

## Development Workflow (follow exactly)

1. **Write on `cpu-basic`** — code, docs, scripts, planning. Never touch GPU sandboxes for editing.
2. **Smoke-test on a GPU sandbox** (`t4-small` or `a10g-small`) — run the script for 5-10 minutes to verify it loads the env, runs training steps, and can push a checkpoint. **Stop the GPU sandbox immediately** after pass or fail. Never leave it idle.
3. **If the smoke test fails** — look up Hugging Face documentation (`explore_hf_docs`, `fetch_hf_docs`) or other relevant docs to diagnose the issue. Iterate based on what you learn. Go back to step 1.
4. **If the smoke test passes** — update `docs/ae.md` with the current project status and update `AGENTS.md` with anything new you learned. Push both to the Hub before proceeding.
5. **Submit the real Job** (`a10g-small`, `a10g-large`, etc.). Immediately check `hf_jobs logs` to confirm it starts successfully. **Poll the job every 5 minutes** until the user interrupts you. During polling downtime, work on docs or scripts for upcoming phases, but keep checking the job.

---

## How to Submit an HF Job (the only way that works)

```python
# 1. Write to sandbox
write(path="/app/train.py", content="...")

# 2. UPLOAD TO HUB (critical — job fetches from Hub URL)
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# 3. Submit job
hf_jobs(
    operation="run",
    script="/app/train.py",  # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo", "numpy",
                  "huggingface_hub", "pygame", "omegaconf", "mazelib", "imageio",
                  "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"
)
```

**Why step 2 matters**: `hf_jobs inspect` reveals that the job executes:

```bash
uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```

If the file isn't on the Hub, the job 404s.

---

## How to Access the Private Env in a Job

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
# Then walk to find pyproject.toml and pip install -e .
```

`snapshot_download` auto-uses `HF_TOKEN`; `git clone` does not.
---

## Docs Update Checklist (before ANY job >30 min)

- [ ] `session_state.json` — phase, job_id, script name, hardware, timeout, expected completion
- [ ] `AGENTS.md` — any new mistakes/API gotchas learned this session
- [ ] `docs/ae.md` — research results, completed phase metrics
- [ ] Push all three to the Hub BEFORE calling `hf_jobs`

---

## Technical Gotchas

| Gotcha | Correct | Wrong |
|---|---|---|
| **Wrapper order** | `ActionMasker(base_env)` then `Monitor(env)` | `ActionMasker(Monitor(base_env))` — masks break |
| **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
| **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
| **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing the token manually in git URLs |

---

## Cost Table

| Hardware | $/hr | Use For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing code, docs, planning |
| `t4-small` | ~$0.40 | Smoke tests ONLY |
| `a10g-small` | ~$1.00 | Training via HF Jobs |

**Stop GPU sandboxes immediately after smoke tests.** An idle GPU sandbox burns $1/hr for nothing.

---

## Curriculum Summary

| Phase | Opponent | Steps | Status |
|---|---|---|---|
| 1 | Random | 500k | ✅ Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check `session_state.json` |
| 3 | Rule-based curriculum | 1M | Pending |

Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).
---

## File Guide

| File | Purpose |
|---|---|
| `session_state.json` | Current phase, checkpoints, mistakes, next steps |
| `docs/ae.md` | Full research, design, results |
| `phase1_final.zip` | Phase 1 complete checkpoint |
| `phase2_ckpt_*.zip` | Phase 2 intermediates |
| `phase2_resume.py` | Working HF Job script |
| `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
| `smoke_test.py` | 5-min validation |

---

## Contact

- **User**: E-Rong | **Org**: E-Rong
- **Billing namespace**: `E-Rong` (required on all `hf_jobs`)
- **You are**: an ephemeral agent with no memory. This file is your only brain.

*Read the files. Check the state. Test before committing compute. Update docs before every job.*