# AGENTS.md — Zero-Memory Survival Guide

> **You have no persistent memory. Read this first. Do not trust your internal knowledge.**

---

## What This Is (10 seconds)

- **Challenge**: TIL-26-AE — train a Bomberman agent (`agent_0`) via RL
- **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
- **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
- **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning

---

## The 6 Unbreakable Rules

| # | Rule | Violation Cost |
|---|---|---|
| 1 | **NEVER train in sandboxes >30 min** | ~$5/hr wasted on empty, recycled containers |
| 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails instantly — git ignores `HF_TOKEN` |
| 3 | **NEVER pass inline scripts larger than a few KB to `hf_jobs.script`** | Delivery chokes at ~20KB |
| 4 | **ALWAYS upload script to Hub BEFORE submitting job** | Job fetches script from Hub URL, not sandbox |
| 5 | **ALWAYS update docs BEFORE starting long jobs** | Next you has zero memory; stale docs = duplicated work |
| 6 | **ALWAYS smoke-test before multi-hour jobs** | 5-min test saves hours of failed compute |

---

## Session Startup (do this now)

1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
2. Check `hf_jobs ps` for running jobs
3. Check latest checkpoint on Hub (`phase*_ckpt_*.zip`)
4. Determine: current phase, remaining steps, next action
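Step 3 can be sketched as a pure function over the repo's file listing (in a live session the names would come from `huggingface_hub.list_repo_files`); the checkpoint-name pattern is assumed to be exactly `phase<N>_ckpt_<steps>.zip`:

```python
import re

def latest_checkpoint(filenames):
    """Return the newest phase<N>_ckpt_<steps>.zip, ordered by (phase, steps)."""
    best = None
    for name in filenames:
        m = re.fullmatch(r"phase(\d+)_ckpt_(\d+)\.zip", name)
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            if best is None or key > best[0]:
                best = (key, name)
    return best[1] if best else None

# In practice, feed this the output of
# huggingface_hub.list_repo_files("E-Rong/til-26-ae-agent").
files = ["phase1_final.zip", "phase2_ckpt_100000.zip", "phase2_ckpt_400000.zip"]
print(latest_checkpoint(files))  # phase2_ckpt_400000.zip
```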

---

## Development Workflow (follow exactly)

1. **Write on `cpu-basic`** — code, docs, scripts, planning. Never touch GPU sandboxes for editing.

2. **Smoke-test on a GPU sandbox** (`t4-small` or `a10g-small`) — run the script for 5-10 minutes to verify it loads the env, runs training steps, and can push a checkpoint. **Stop the GPU sandbox immediately** after pass or fail. Never leave it idle.

3. **If the smoke test fails** — consult the Hugging Face docs (`explore_hf_docs`, `fetch_hf_docs`) or other relevant documentation to diagnose the issue, iterate on what you learn, and go back to step 1.

4. **If the smoke test passes** — update `docs/ae.md` with the current project status and `AGENTS.md` with anything new you learned. Push both to the Hub before proceeding.

5. **Submit the real Job** (`a10g-small`, `a10g-large`, etc.). Immediately check `hf_jobs logs` to confirm it starts successfully. **Poll the job every 5 minutes** until the user interrupts you. During polling downtime, work on docs or scripts for upcoming phases, but keep checking the job.

---

## How to Submit an HF Job (the only way that works)

```python
# 1. Write to sandbox
write(path="/app/train.py", content="...")

# 2. UPLOAD TO HUB (critical — job fetches from Hub URL)
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# 3. Submit job
hf_jobs(
    operation="run",
    script="/app/train.py",   # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                  "numpy", "huggingface_hub", "pygame", "omegaconf",
                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"
)
```

**Why step 2 matters**: `hf_jobs inspect` reveals the job executes:
```bash
uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```
If the file isn't on the Hub, the job 404s.
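A minimal pre-flight check, assuming the raw-URL pattern shown above holds for any file in the repo: build the URL the job will fetch and verify it resolves before calling `hf_jobs`. The helper name is illustrative:

```python
def raw_url(repo_id, path, revision="main"):
    """Build the Hub raw URL that an HF Job resolves a script from."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{path}"

url = raw_url("E-Rong/til-26-ae-agent", "train.py")
print(url)
# Before submitting, a HEAD request on this URL
# (e.g. requests.head(url).status_code == 200) catches the 404 case up front.
```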

---

## How to Access the Private Env in a Job

```python
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
# Then walk to find pyproject.toml and pip install -e .
```

`snapshot_download` auto-uses `HF_TOKEN`. `git clone` does not.
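The "walk" step can be sketched with `os.walk`; `find_pyproject` is an illustrative helper name, and the install command is left as a comment since it assumes the Job's `pip` environment:

```python
import os

def find_pyproject(root):
    """Return the first directory under root that contains pyproject.toml."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "pyproject.toml" in filenames:
            return dirpath
    return None

# After snapshot_download(..., local_dir="/app/til-26-ae-repo"):
# pkg_dir = find_pyproject("/app/til-26-ae-repo")
# subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", pkg_dir])
```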

---

## Docs Update Checklist (before ANY job >30 min)

- [ ] `session_state.json` β€” phase, job_id, script name, hardware, timeout, expected completion
- [ ] `AGENTS.md` β€” any new mistakes/API gotchas learned this session
- [ ] `docs/ae.md` β€” research results, completed phase metrics
- [ ] Push all three to Hub BEFORE calling `hf_jobs`

---

## Technical Gotchas

| Gotcha | Correct | Wrong |
|---|---|---|
| **Wrapper order** | `ActionMasker(base_env)` then `Monitor(env)` | `ActionMasker(Monitor(base_env))` — masks break |
| **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
| **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
| **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing token manually in git URLs |

---

## Cost Table

| Hardware | $/hr | Use For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing code, docs, planning |
| `t4-small` | ~$0.40 | Smoke tests ONLY |
| `a10g-small` | ~$1.00 | Training via HF Jobs |

**Stop GPU sandboxes immediately after smoke tests.** An idle GPU sandbox burns $1/hr for nothing.
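A quick worst-case spend check before submitting, using the approximate rates from the table (assumed, verify against current pricing):

```python
# Approximate $/hr from the table above (assumed rates).
RATES = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}

def estimated_cost(hardware, timeout_hours):
    """Worst-case spend if the job runs all the way to its timeout."""
    return RATES[hardware] * timeout_hours

print(estimated_cost("a10g-small", 6))  # a 6h a10g-small job caps at $6.00
```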

---

## Curriculum Summary

| Phase | Opponent | Steps | Status |
|---|---|---|---|
| 1 | Random | 500k | ✅ Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check `session_state.json` |
| 3 | Rule-based curriculum | 1M | Pending |

Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).
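The masking idea from `arxiv:2006.14171` in one function: invalid actions get a logit of `-inf`, so their post-softmax probability is exactly zero and the policy can never sample them. A toy sketch with a hypothetical 6-action space (assumes at least one valid action):

```python
import math

def masked_softmax(logits, mask):
    """Zero out invalid actions by setting their logits to -inf
    before the softmax (invalid action masking)."""
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    peak = max(masked)                      # stabilize the exponentials
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Mask says actions 1 and 3 are invalid this step.
probs = masked_softmax([1.0, 2.0, 0.5, 3.0, 0.0, 1.0],
                       [True, False, True, False, True, True])
print(probs)  # the masked actions get probability exactly 0.0
```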

---

## File Guide

| File | Purpose |
|---|---|
| `session_state.json` | Current phase, checkpoints, mistakes, next steps |
| `docs/ae.md` | Full research, design, results |
| `phase1_final.zip` | Phase 1 complete checkpoint |
| `phase2_ckpt_*.zip` | Phase 2 intermediates |
| `phase2_resume.py` | Working HF Job script |
| `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
| `smoke_test.py` | 5-min validation |

---

## Contact

- **User**: E-Rong | **Org**: E-Rong
- **Billing namespace**: `E-Rong` (required on all `hf_jobs`)
- **You are**: An ephemeral agent with no memory. This file is your only brain.

*Read the files. Check the state. Test before committing compute. Update docs before every job.*