AGENTS.md — Zero-Memory Survival Guide

You have no persistent memory. Read this first. Do not trust your internal knowledge.

What This Is (10 seconds)

Challenge: TIL-26-AE — train a Bomberman agent (agent_0) via RL
Repo: E-Rong/til-26-ae-agent (models, checkpoints, scripts)
Env source: Private Space e-rong/til-26-ae (contains til_environment/)
Algorithm: MaskablePPO + invalid-action masking + curriculum learning

The 6 Unbreakable Rules

| # | Rule | Violation Cost |
|---|------|----------------|
| 1 | NEVER train in sandboxes for more than 30 min | ~$5/hr wasted on empty, recycled containers |
| 2 | NEVER `git clone` private repos in HF Jobs | Job fails instantly: git ignores `HF_TOKEN` |
| 3 | NEVER pass inline scripts larger than a few KB to `hf_jobs.script` | Delivery chokes at ~20KB |
| 4 | ALWAYS upload the script to the Hub BEFORE submitting a job | Job fetches the script from a Hub URL, not the sandbox |
| 5 | ALWAYS update docs BEFORE starting long jobs | The next you has zero memory; stale docs = duplicated work |
| 6 | ALWAYS smoke-test before multi-hour jobs | A 5-min test saves hours of failed compute |

Session Startup (do this now)

1. Read session_state.json from E-Rong/til-26-ae-agent
2. Check hf_jobs ps for running jobs
3. Check latest checkpoint on Hub (phase*_ckpt_*.zip)
4. Determine: current phase, remaining steps, next action
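
The state and checkpoint checks above could be sketched in Python roughly as follows. This is a minimal sketch, not the project's actual tooling: it assumes checkpoint names carry a numeric step count (for example `phase2_ckpt_150000.zip`), which this file does not confirm, and the helper names are hypothetical. The running-jobs check still goes through `hf_jobs ps`.

```python
import re

# Assumption: checkpoint names look like "phase2_ckpt_150000.zip", where the
# trailing number is the step count. The real suffix format is not confirmed.
CKPT_RE = re.compile(r"^phase(\d+)_ckpt_(\d+)\.zip$")


def latest_checkpoint(filenames):
    """Return (phase, steps, filename) for the furthest-along checkpoint, or None."""
    found = []
    for name in filenames:
        match = CKPT_RE.match(name)
        if match:
            found.append((int(match.group(1)), int(match.group(2)), name))
    # Tuples compare by phase first, then by step count.
    return max(found) if found else None


def remaining_steps(total_steps, steps_done):
    """Steps still to train in the current phase (never negative)."""
    return max(total_steps - steps_done, 0)


def load_startup_state(repo_id="E-Rong/til-26-ae-agent"):
    """Fetch session_state.json and the newest checkpoint from the Hub."""
    import json
    from huggingface_hub import HfApi, hf_hub_download  # needs network access

    state_path = hf_hub_download(repo_id=repo_id, filename="session_state.json")
    with open(state_path) as fh:
        state = json.load(fh)
    files = HfApi().list_repo_files(repo_id)
    return state, latest_checkpoint(files)
```

Keeping the filename parsing separate from the Hub calls means the parsing can be checked locally before any network access.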

Development Workflow (follow exactly)

1. Write on cpu-basic: code, docs, scripts, planning. Never use GPU sandboxes for editing.
2. Smoke-test on a GPU sandbox (t4-small or a10g-small): run the script for 5-10 minutes to verify that it loads the env, runs training steps, and can push a checkpoint. Stop the GPU sandbox immediately afterwards, whether the test passed or failed. Never leave it idle.
3. If the smoke test fails: consult the Hugging Face documentation (explore_hf_docs, fetch_hf_docs) or other relevant docs to diagnose the issue, iterate on what you learn, and go back to step 1.
4. If the smoke test passes: update docs/ae.md with the current project status and update AGENTS.md with anything new you learned. Push both to the Hub before proceeding.
5. Submit the real Job (a10g-small, a10g-large, etc.). Immediately check hf_jobs logs to confirm it starts successfully. Poll the job every 5 minutes until the user interrupts you. Between polls, work on docs or scripts for upcoming phases, but keep checking the job.
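
The 5-minute polling in the last step might look like the loop below. It is only a sketch: `get_status` is a hypothetical callable standing in for whatever returns the job's current status, and the terminal status strings are assumptions to check against what hf_jobs actually reports.

```python
import time

# Assumption: these status strings are illustrative; confirm the real values
# reported for a job before relying on them.
TERMINAL = {"COMPLETED", "ERROR", "CANCELED"}


def poll_job(get_status, interval_s=300, max_polls=None, sleep=time.sleep):
    """Call get_status() every interval_s seconds until a terminal status.

    get_status is a hypothetical callable that returns the job's current
    status string. Returns the list of statuses seen, in order.
    """
    seen = []
    polls = 0
    while max_polls is None or polls < max_polls:
        status = get_status()
        seen.append(status)
        polls += 1
        if status in TERMINAL:
            break
        sleep(interval_s)  # 300 s matches the 5-minute cadence above
    return seen
```

Passing `sleep` as a parameter lets the loop be exercised without waiting; with `max_polls=None` it keeps going until the job ends or the caller is interrupted.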

How to Submit an HF Job (the only way that works)

# 1. Write to sandbox
write(path="/app/train.py", content="...")

# 2. UPLOAD TO HUB (critical — job fetches from Hub URL)
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# 3. Submit job
hf_jobs(
    operation="run",
    script="/app/train.py",   # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                "numpy", "huggingface_hub", "pygame", "omegaconf",
                "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"
)

Why step 2 matters: hf_jobs inspect reveals the job executes:

uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py

If the file isn't on the Hub, the job 404s.
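
A pre-submit check along these lines can catch the 404 before any compute is spent. This is a sketch under assumptions: the helper names are hypothetical, and the URL shape is simply copied from the inspect output above.

```python
def raw_script_url(repo_id, path, revision="main"):
    """Build the raw Hub URL in the same shape as the inspect output above."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{path}"


def script_is_on_hub(repo_id, path):
    """True if the file exists in the repo (needs network access)."""
    from huggingface_hub import HfApi

    return HfApi().file_exists(repo_id=repo_id, filename=path)
```

Calling `script_is_on_hub("E-Rong/til-26-ae-agent", "train.py")` right before submitting confirms that the upload in step 2 actually landed.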

How to Access the Private Env in a Job

from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
# Then walk to find pyproject.toml and pip install -e .

snapshot_download picks up HF_TOKEN from the environment automatically; git clone does not.
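
The "walk to find pyproject.toml and pip install -e" step could be sketched as below. The helper names are hypothetical, and the sketch assumes `pip` is available to the job's interpreter; if it is not, a different installer would be needed.

```python
import subprocess
import sys
from pathlib import Path


def find_project_root(root):
    """Return the shallowest directory under root holding a pyproject.toml, or None."""
    candidates = sorted(Path(root).rglob("pyproject.toml"), key=lambda p: len(p.parts))
    return candidates[0].parent if candidates else None


def install_editable(project_dir):
    """Run `pip install -e <project_dir>` with the current interpreter."""
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-e", str(project_dir)]
    )
```

Typical use after the download would be `install_editable(find_project_root("/app/til-26-ae-repo"))`, with a check first that the root is not `None`.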

Docs Update Checklist (before ANY job >30 min)

session_state.json — phase, job_id, script name, hardware, timeout, expected completion
AGENTS.md — any new mistakes/API gotchas learned this session
docs/ae.md — research results, completed phase metrics
Push all three to Hub BEFORE calling hf_jobs
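
One way to assemble the session_state.json fields listed above is sketched here. The key names and the helper are assumptions, not the file's confirmed schema; `expected_completion` is taken as the latest possible finish, assuming the job runs to its timeout, and `job_id` stays `None` until the job has been submitted.

```python
import json
from datetime import datetime, timedelta, timezone


def build_session_state(phase, job_id, script, hardware, timeout_hours, now=None):
    """Assemble the fields the checklist names. Key names are assumptions."""
    now = now or datetime.now(timezone.utc)
    return {
        "phase": phase,
        "job_id": job_id,  # None until the job has actually been submitted
        "script": script,
        "hardware": hardware,
        "timeout": f"{timeout_hours}h",
        "updated_at": now.isoformat(),
        # Latest possible finish: the job cannot outlive its timeout.
        "expected_completion": (now + timedelta(hours=timeout_hours)).isoformat(),
    }


def dump_session_state(state):
    """Serialize the state for upload as session_state.json."""
    return json.dumps(state, indent=2)
```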

Technical Gotchas

| Gotcha | Correct | Wrong |
|--------|---------|-------|
| Wrapper order | `ActionMasker(base_env)` then `Monitor(env)` | `ActionMasker(Monitor(base_env))` (masks break) |
| Env install | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
| Script delivery | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
| Auth | `HF_TOKEN` env var (auto-injected in Jobs) | Passing token manually in git URLs |

Cost Table

| Hardware | $/hr | Use For |
|----------|------|---------|
| `cpu-basic` | ~$0.05 | Writing code, docs, planning |
| `t4-small` | ~$0.40 | Smoke tests ONLY |
| `a10g-small` | ~$1.00 | Training via HF Jobs |

Stop GPU sandboxes immediately after smoke tests. An idle GPU sandbox burns up to ~$1/hr for nothing.
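
For budgeting, the arithmetic is just rate times duration. The sketch below uses the approximate rates from the cost table above; the function name is hypothetical. A 6-hour job on `a10g-small` comes to about $6.00, and a 10-minute smoke test on `t4-small` to about $0.07.

```python
# Approximate hourly rates copied from the cost table above.
RATES_PER_HOUR = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}


def estimated_cost(hardware, minutes):
    """Approximate spend in dollars for running `hardware` for `minutes`."""
    return RATES_PER_HOUR[hardware] * minutes / 60
```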

Curriculum Summary

| Phase | Opponent | Steps | Status |
|-------|----------|-------|--------|
| 1 | Random | 500k | ✅ Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check `session_state.json` |
| 3 | Rule-based curriculum | 1M | Pending |

Key papers: arxiv:2407.00662 (Pommerman curriculum + adaptive annealing), arxiv:2006.14171 (invalid action masking).
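
As a reminder of what invalid-action masking does, here is a standard-library illustration: logits of invalid actions are set to negative infinity before the softmax, so those actions get probability exactly zero. This only illustrates the idea; it is not how MaskablePPO is implemented internally, and the function name is hypothetical.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits; actions with mask[i] == False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get -inf, so exp() maps them to exactly 0.0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

With equal logits and only the first two of four actions valid, the result is `[0.5, 0.5, 0.0, 0.0]`: the policy can never sample a masked action.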

File Guide

| File | Purpose |
|------|---------|
| `session_state.json` | Current phase, checkpoints, mistakes, next steps |
| `docs/ae.md` | Full research, design, results |
| `phase1_final.zip` | Phase 1 complete checkpoint |
| `phase2_ckpt_*.zip` | Phase 2 intermediates |
| `phase2_resume.py` | Working HF Job script |
| `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
| `smoke_test.py` | 5-min validation |

Contact

User: E-Rong | Org: E-Rong
Billing namespace: E-Rong (required on all hf_jobs)
You are: An ephemeral agent with no memory. This file is your only brain.

Read the files. Check the state. Test before committing compute. Update docs before every job.