	# AGENTS.md — Zero-Memory Survival Guide

	> You have no persistent memory. Read this first. Do not trust your internal knowledge.

	---

	## What This Is (10 seconds)

	- Challenge: TIL-26-AE — train a Bomberman agent (`agent_0`) via RL
	- Repo: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
	- Env source: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
	- Algorithm: MaskablePPO + invalid-action masking + curriculum learning

	---

	## The 6 Unbreakable Rules

	\| # \| Rule \| Violation Cost \|
	\|---\|---\|---\|
	\| 1 \| NEVER train in sandboxes >30 min \| ~$5/hr wasted on empty, recycled containers \|
	\| 2 \| NEVER `git clone` private repos in HF Jobs \| Job fails instantly — git ignores `HF_TOKEN` \|
	\| 3 \| NEVER pass inline scripts larger than a few KB to `hf_jobs.script` \| Delivery chokes at ~20KB \|
	\| 4 \| ALWAYS upload script to Hub BEFORE submitting job \| Job fetches script from Hub URL, not sandbox \|
	\| 5 \| ALWAYS update docs BEFORE starting long jobs \| The next you has zero memory; stale docs = duplicated work \|
	\| 6 \| ALWAYS smoke-test before multi-hour jobs \| 5-min test saves hours of failed compute \|

	---

	## Session Startup (do this now)

	1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
	2. Check `hf_jobs ps` for running jobs
	3. Check latest checkpoint on Hub (`phase_ckpt_.zip`)
	4. Determine: current phase, remaining steps, next action
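Steps 1 and 4 above can be sketched as a small helper that turns the text of the state file into a short summary. This is only an illustration: the key names (`phase`, `steps_done`, `total_steps`, `next_action`) are hypothetical, because the real schema of `session_state.json` is not shown in this guide. Adapt them to the actual file.

```python
import json


def summarize_state(raw_json: str) -> dict:
    """Derive phase, remaining steps, and next action from the state file text.

    All key names used here are hypothetical placeholders, not the real schema.
    """
    state = json.loads(raw_json)
    total = int(state.get("total_steps", 0))
    done = int(state.get("steps_done", 0))
    return {
        "phase": state.get("phase"),
        "remaining_steps": max(total - done, 0),
        "next_action": state.get("next_action", "unknown"),
    }


# Made-up example content, only to show the shape of the result.
example = '{"phase": 2, "steps_done": 200000, "total_steps": 500000, "next_action": "resume"}'
summary = summarize_state(example)
print(summary)
```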

	---

	## Development Workflow (follow exactly)

	1. Write on `cpu-basic` — code, docs, scripts, planning. Never touch GPU sandboxes for editing.

	2. Smoke-test on GPU sandbox (`t4-small` or `a10g-small`) — run the script for 5-10 minutes to verify it loads the env, runs training steps, and can push a checkpoint. Stop the GPU sandbox immediately after pass or fail. Never leave it idle.

	3. If smoke test fails — look up Hugging Face documentation (`explore_hf_docs`, `fetch_hf_docs`) or relevant docs to diagnose the issue. Iterate based on what you learn. Go back to step 1.

	4. If smoke test passes — update `docs/ae.md` with current project status, update `AGENTS.md` with anything new you learned. Push both to the Hub before proceeding.

	5. Submit the real Job (`a10g-small`, `a10g-large`, etc.). Immediately check `hf_jobs logs` to confirm it starts successfully. Poll the job every 5 minutes until the user interrupts you. During polling downtime, work on docs or scripts for upcoming phases, but keep checking the job.
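The polling in step 5 could look roughly like the loop below. It is a sketch under assumptions: `get_status` stands in for whatever call returns the job's status (for example a wrapper around `hf_jobs logs` or `hf_jobs ps`), and the terminal status names are hypothetical, so check what the tool actually reports.

```python
import time
from typing import Callable, List

# Hypothetical terminal status names; confirm against real hf_jobs output.
TERMINAL_STATES = {"COMPLETED", "ERROR", "CANCELED"}


def poll_job(
    get_status: Callable[[], str],
    interval_s: float = 300.0,  # 5 minutes, as in step 5
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call get_status every interval_s seconds until a terminal state or max_polls."""
    seen = []
    for _ in range(max_polls):
        status = get_status()
        seen.append(status)
        if status in TERMINAL_STATES:
            break
        sleep(interval_s)
    return seen


# Usage with a fake status source and a no-op sleep, so nothing actually waits.
fake = iter(["RUNNING", "RUNNING", "COMPLETED"])
history = poll_job(lambda: next(fake), sleep=lambda s: None)
print(history)
```

Injecting `sleep` keeps the loop testable; in a real session the time between polls is where the docs and upcoming-phase scripts get written.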

	---

	## How to Submit an HF Job (the only way that works)

	```python
	# 1. Write to sandbox
	write(path="/app/train.py", content="...")

	# 2. UPLOAD TO HUB (critical — job fetches from Hub URL)
	hf_repo_files(
	    operation="upload",
	    repo_id="E-Rong/til-26-ae-agent",
	    path="train.py",
	    content=open("/app/train.py").read(),
	)

	# 3. Submit job
	hf_jobs(
	    operation="run",
	    script="/app/train.py",  # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
	    dependencies=[
	        "torch", "sb3-contrib", "gymnasium", "pettingzoo",
	        "numpy", "huggingface_hub", "pygame", "omegaconf",
	        "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil",
	    ],
	    hardware_flavor="a10g-small",
	    timeout="6h",
	    namespace="E-Rong",
	)
	```

	Why step 2 matters: `hf_jobs inspect` reveals the job executes:
	```bash
	uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
	```
	If the file isn't on the Hub, the job 404s.
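To double-check which URL a job will request before submitting, the pattern seen in the `hf_jobs inspect` output can be rebuilt by hand. The helper below is a hypothetical convenience, not part of any tool, and it assumes the job always resolves the script against the `main` revision, as in the example above.

```python
def hub_raw_url(repo_id: str, path_in_repo: str, revision: str = "main") -> str:
    """Build the raw-file URL for a file in a Hub model repo."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{path_in_repo.lstrip('/')}"


url = hub_raw_url("E-Rong/til-26-ae-agent", "train.py")
print(url)
```

If that URL does not resolve (with a valid token, since the repo may be private), the upload in step 2 has not happened yet.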

	---

	## How to Access the Private Env in a Job

	```python
	from huggingface_hub import snapshot_download
	snapshot_download(
	    repo_id="e-rong/til-26-ae",
	    repo_type="space",
	    local_dir="/app/til-26-ae-repo",
	)
	# Then walk local_dir to find pyproject.toml and run `pip install -e .` in that directory
	```

	`snapshot_download` auto-uses `HF_TOKEN`. `git clone` does not.
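The "walk to find `pyproject.toml`" step might be written as below. This is a minimal sketch, not the project's actual script: the function names are made up for illustration, and it assumes `pip` is available to the interpreter running the job, which is worth verifying inside a `uv run` environment.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def find_project_root(base: Path) -> Optional[Path]:
    """Return the shallowest directory under base that contains pyproject.toml."""
    matches = sorted(base.rglob("pyproject.toml"), key=lambda p: len(p.parts))
    return matches[0].parent if matches else None


def install_editable(project_root: Path) -> None:
    """Run `pip install -e <project_root>` with the current interpreter."""
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-e", str(project_root)]
    )


# Typical use after snapshot_download (not executed here):
# root = find_project_root(Path("/app/til-26-ae-repo"))
# if root is not None:
#     install_editable(root)
```

Picking the shallowest match avoids installing a nested example or vendored package by mistake when the snapshot contains more than one `pyproject.toml`.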

	---

	## Docs Update Checklist (before ANY job >30 min)

	- [ ] `session_state.json` — phase, job_id, script name, hardware, timeout, expected completion
	- [ ] `AGENTS.md` — any new mistakes/API gotchas learned this session
	- [ ] `docs/ae.md` — research results, completed phase metrics
	- [ ] Push all three to Hub BEFORE calling `hf_jobs`
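The first checklist item can be enforced with a small pre-flight check that refuses to proceed while fields are missing. The key names below are hypothetical stand-ins for the fields listed above (the real `session_state.json` schema is not shown here), and the sample values are placeholders.

```python
from typing import Dict, List

# Hypothetical key names mirroring the checklist fields; the real file may differ.
REQUIRED_KEYS = ["phase", "job_id", "script", "hardware", "timeout", "expected_completion"]


def missing_state_fields(state: Dict[str, object]) -> List[str]:
    """Return the required keys that are absent or empty in the state dict."""
    return [key for key in REQUIRED_KEYS if state.get(key) in (None, "")]


# Placeholder draft that still lacks two fields.
draft = {
    "phase": 2,
    "job_id": "placeholder-id",
    "script": "phase2_resume.py",
    "hardware": "a10g-small",
}
missing = missing_state_fields(draft)
print(missing)
```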

	---

	## Technical Gotchas

	\| Gotcha \| Correct \| Wrong \|
	\|---\|---\|---\|
	\| Wrapper order \| `ActionMasker(base_env)` then `Monitor(env)` \| `ActionMasker(Monitor(base_env))` — masks break \|
	\| Env install \| `snapshot_download` + walk for `pyproject.toml` \| `git clone` of private space \|
	\| Script delivery \| Upload to Hub, submit sandbox path \| Inline 20KB string or sandbox-only file \|
	\| Auth \| `HF_TOKEN` env var (auto-injected in Jobs) \| Passing token manually in git URLs \|

	---

	## Cost Table

	\| Hardware \| $/hr \| Use For \|
	\|---\|---\|---\|
	\| `cpu-basic` \| ~$0.05 \| Writing code, docs, planning \|
	\| `t4-small` \| ~$0.40 \| Smoke tests ONLY \|
	\| `a10g-small` \| ~$1.00 \| Training via HF Jobs \|

	Stop GPU sandboxes immediately after smoke tests. An idle GPU sandbox burns roughly $0.40 to $1.00/hr for nothing.
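For a quick budget check before committing compute, the approximate rates in the table above can be turned into an estimate. The rates are the rough figures from this guide, not authoritative pricing, and the function name is illustrative.

```python
# Approximate hourly rates copied from the cost table above (USD per hour).
RATES_PER_HOUR = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}


def estimate_cost(flavor: str, minutes: float) -> float:
    """Estimate the cost in USD of running `flavor` for `minutes`, rounded to cents."""
    return round(RATES_PER_HOUR[flavor] * minutes / 60.0, 2)


print(estimate_cost("t4-small", 10))     # a 10-minute smoke test
print(estimate_cost("a10g-small", 360))  # a job that runs for a full 6h timeout
```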

	---

	## Curriculum Summary

	\| Phase \| Opponent \| Steps \| Status \|
	\|---\|---\|---\|---\|
	\| 1 \| Random \| 500k \| ✅ Complete (92% win rate) \|
	\| 2 \| Random + exploration shaping \| 500k \| Check `session_state.json` \|
	\| 3 \| Rule-based curriculum \| 1M \| Pending \|

	Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).

	---

	## File Guide

	\| File \| Purpose \|
	\|---\|---\|
	\| `session_state.json` \| Current phase, checkpoints, mistakes, next steps \|
	\| `docs/ae.md` \| Full research, design, results \|
	\| `phase1_final.zip` \| Phase 1 complete checkpoint \|
	\| `phase2_ckpt_*.zip` \| Phase 2 intermediates \|
	\| `phase2_resume.py` \| Working HF Job script \|
	\| `phase3_curriculum.py` \| Ready-to-submit Phase 3 script \|
	\| `smoke_test.py` \| 5-min validation \|

	---

	## Contact

	- User: E-Rong \| Org: E-Rong
	- Billing namespace: `E-Rong` (required on all `hf_jobs`)
	- You are: An ephemeral agent with no memory. This file is your only brain.

	Read the files. Check the state. Test before committing compute. Update docs before every job.