
AGENTS.md: Zero-Memory Survival Guide

You have no persistent memory. Read this first. Do not trust your internal knowledge.


What This Is (10 seconds)

  • Challenge: TIL-26-AE (train a Bomberman agent, agent_0, via RL)
  • Repo: E-Rong/til-26-ae-agent (models, checkpoints, scripts)
  • Env source: Private Space e-rong/til-26-ae (contains til_environment/)
  • Algorithm: MaskablePPO + invalid-action masking + curriculum learning

The 6 Unbreakable Rules

| # | Rule | Violation cost |
|---|---|---|
| 1 | NEVER train in sandboxes for more than 30 min | ~$5/hr wasted on empty, recycled containers |
| 2 | NEVER git clone private repos in HF Jobs | Job fails instantly: git does not use HF_TOKEN |
| 3 | NEVER pass inline scripts larger than a few KB to hf_jobs(script=...) | Delivery chokes at ~20KB |
| 4 | ALWAYS upload the script to the Hub BEFORE submitting the job | The job fetches the script from its Hub URL, not from the sandbox |
| 5 | ALWAYS update docs BEFORE starting long jobs | The next you has zero memory; stale docs = duplicated work |
| 6 | ALWAYS smoke-test before multi-hour jobs | A 5-min test saves hours of failed compute |

Session Startup (do this now)

  1. Read session_state.json from E-Rong/til-26-ae-agent
  2. Check hf_jobs ps for running jobs
  3. Check latest checkpoint on Hub (phase*_ckpt_*.zip)
  4. Determine: current phase, remaining steps, next action
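
Steps 1 and 3 can also be scripted with huggingface_hub. The sketch below is illustrative and is not the project's own tooling. It makes two assumptions: checkpoints sit at the repo root, and the second wildcard in phase*_ckpt_*.zip is an integer step count (e.g. a hypothetical phase2_ckpt_250000.zip). latest_checkpoint, load_session_state and main are hypothetical helper names. Step 2 (hf_jobs ps) remains an agent tool call.

```python
import json
import re
from typing import Iterable, Optional

REPO_ID = "E-Rong/til-26-ae-agent"

# Assumed naming scheme: phase<N>_ckpt_<steps>.zip. The <steps> part is a guess;
# the guide only gives the glob phase*_ckpt_*.zip.
CKPT_RE = re.compile(r"^phase(\d+)_ckpt_(\d+)\.zip$")


def latest_checkpoint(filenames: Iterable[str]) -> Optional[str]:
    """Return the checkpoint with the highest (phase, steps), or None if none match."""
    best_name: Optional[str] = None
    best_key = (-1, -1)
    for name in filenames:
        match = CKPT_RE.match(name)
        if match is None:
            continue  # skips phase1_final.zip, scripts, docs/ae.md, etc.
        key = (int(match.group(1)), int(match.group(2)))
        if key > best_key:
            best_name, best_key = name, key
    return best_name


def load_session_state(repo_id: str = REPO_ID) -> dict:
    """Download and parse session_state.json (huggingface_hub reads HF_TOKEN itself)."""
    from huggingface_hub import hf_hub_download  # lazy import: latest_checkpoint works without it

    path = hf_hub_download(repo_id=repo_id, filename="session_state.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def main() -> None:
    """Print the saved state and the newest checkpoint on the Hub (needs network access)."""
    from huggingface_hub import HfApi

    print("session state:", load_session_state())
    print("latest checkpoint:", latest_checkpoint(HfApi().list_repo_files(REPO_ID)))
```

The regex exists so the step count is compared as an integer. A plain string max() would pick phase2_ckpt_50000.zip over phase2_ckpt_250000.zip.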

Development Workflow (follow exactly)

  1. Write on cpu-basic: code, docs, scripts, planning. Never touch GPU sandboxes for editing.

  2. Smoke-test on a GPU sandbox (t4-small or a10g-small): run the script for 5-10 minutes to verify that it loads the env, runs training steps, and can push a checkpoint. Stop the GPU sandbox immediately after a pass or a fail. Never leave it idle.

  3. If the smoke test fails: look up the Hugging Face documentation (explore_hf_docs, fetch_hf_docs) or other relevant docs to diagnose the issue. Iterate based on what you learn, then go back to step 1.

  4. If the smoke test passes: update docs/ae.md with the current project status and update AGENTS.md with anything new you learned. Push both to the Hub before proceeding.

  5. Submit the real job (a10g-small, a10g-large, etc.). Immediately check hf_jobs logs to confirm it started successfully. Poll the job every 5 minutes until the user interrupts you. Between polls, work on docs or scripts for upcoming phases, but keep checking the job.
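
Step 5's polling loop can be sketched as follows. get_status is a hypothetical callable that stands in for however you read the job's stage (for example, the status reported by hf_jobs inspect). The terminal stage names are assumptions; check them against real output. Unlike step 5, this loop also stops on its own once the job reaches one of those stages.

```python
import time
from typing import Callable, List

# Assumed terminal stages; verify the exact names against real hf_jobs inspect output.
TERMINAL_STAGES = {"COMPLETED", "ERROR", "CANCELED"}


def poll_job(
    get_status: Callable[[], str],
    interval_s: float = 300.0,  # 5 minutes, per step 5
    max_polls: int = 1000,  # safety cap so the loop cannot run forever
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until the job reaches a terminal stage; return every stage observed."""
    seen: List[str] = []
    for _ in range(max_polls):
        stage = get_status()
        seen.append(stage)
        if stage in TERMINAL_STAGES:
            break
        sleep(interval_s)  # do doc or next-phase work during this gap
    return seen
```

Passing sleep in as a parameter keeps the loop testable. In a live session you would do doc work between polls rather than block.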


How to Submit an HF Job (the only way that works)

# 1. Write to sandbox
write(path="/app/train.py", content="...")

# 2. UPLOAD TO HUB (critical: the job fetches the script from the Hub URL)
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# 3. Submit job
hf_jobs(
    operation="run",
    script="/app/train.py",   # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                "numpy", "huggingface_hub", "pygame", "omegaconf",
                "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"
)

Why step 2 matters: hf_jobs inspect shows that the job executes:

uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py

If the file isn't on the Hub, the job 404s.
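
Outside the agent tools, the same upload plus a pre-submit existence check maps onto huggingface_hub's HfApi.upload_file and HfApi.file_exists. This is a hedged equivalent and is not how hf_repo_files is implemented. raw_script_url and upload_and_verify are hypothetical names, and the URL builder only reproduces the pattern shown by hf_jobs inspect above.

```python
REPO_ID = "E-Rong/til-26-ae-agent"


def raw_script_url(repo_id: str, path_in_repo: str, revision: str = "main") -> str:
    """Build the raw-file URL a job fetches (pattern taken from hf_jobs inspect output)."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{path_in_repo.lstrip('/')}"


def upload_and_verify(local_path: str, path_in_repo: str, repo_id: str = REPO_ID) -> str:
    """Upload a script to the model repo and confirm it exists before any job is submitted."""
    from huggingface_hub import HfApi  # lazy import; HfApi reads HF_TOKEN from the environment

    api = HfApi()
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        commit_message=f"Upload {path_in_repo}",
    )
    if not api.file_exists(repo_id, path_in_repo):
        raise RuntimeError(f"{path_in_repo} is not on the Hub; the job would 404")
    return raw_script_url(repo_id, path_in_repo)
```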


How to Access the Private Env in a Job

from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
# Then walk to find pyproject.toml and pip install -e .

snapshot_download auto-uses HF_TOKEN. git clone does not.
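
The "walk to find pyproject.toml and pip install -e ." comment could be filled in as below. This is a sketch that rests on three assumptions:

  • the Space holds one installable project;
  • the shallowest pyproject.toml is the right one;
  • pip is available in the job's Python, as the pip install -e . step implies.

find_pyproject_dir and install_private_env are hypothetical names.

```python
import os
import subprocess
import sys
from typing import Optional


def find_pyproject_dir(root: str) -> Optional[str]:
    """Return the shallowest directory under root containing pyproject.toml, or None."""
    best: Optional[str] = None
    best_depth = -1
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden dirs (.git, .cache, etc.) and walk in a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        if "pyproject.toml" in filenames:
            rel = os.path.relpath(dirpath, root)
            depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
            if best is None or depth < best_depth:
                best, best_depth = dirpath, depth
    return best


def install_private_env(local_dir: str = "/app/til-26-ae-repo") -> str:
    """Download the private Space and pip-install it in editable mode."""
    from huggingface_hub import snapshot_download  # picks up HF_TOKEN automatically

    snapshot_download(repo_id="e-rong/til-26-ae", repo_type="space", local_dir=local_dir)
    project_dir = find_pyproject_dir(local_dir)
    if project_dir is None:
        raise FileNotFoundError(f"no pyproject.toml found under {local_dir}")
    # Install into the same interpreter that is running this script.
    subprocess.run([sys.executable, "-m", "pip", "install", "-e", project_dir], check=True)
    return project_dir
```

Editable installs are usually wired up through .pth files, which Python reads at interpreter startup. If til_environment does not import in the same process, inserting project_dir into sys.path is a common fallback.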


Docs Update Checklist (before ANY job >30 min)

  • session_state.json: phase, job_id, script name, hardware, timeout, expected completion
  • AGENTS.md: any new mistakes or API gotchas learned this session
  • docs/ae.md: research results, completed phase metrics
  • Push all three to Hub BEFORE calling hf_jobs
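
A minimal sketch of the session_state.json step, assuming the file is a flat JSON object. The key names are hypothetical because the checklist names the fields but not the keys. Expected completion is computed as the upper bound submitted_at + timeout. The helper merges into the existing state instead of overwriting it, because session_state.json also tracks checkpoints, mistakes and next steps.

```python
import json
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_timeout(timeout: str) -> timedelta:
    """Parse '6h', '90m', '1.5h', etc. (format assumed from timeout="6h" above)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smhd])", timeout.strip())
    if match is None:
        raise ValueError(f"unrecognised timeout: {timeout!r}")
    return timedelta(**{_UNITS[match.group(2)]: float(match.group(1))})


def with_job_fields(state: dict, *, phase: int, script: str, hardware: str, timeout: str,
                    submitted_at: Optional[datetime] = None,
                    job_id: Optional[str] = None) -> dict:
    """Return a copy of state with the checklist fields set (key names are hypothetical)."""
    if submitted_at is None:
        submitted_at = datetime.now(timezone.utc)
    latest_end = submitted_at + parse_timeout(timeout)
    return {
        **state,  # keep checkpoints, mistakes, next steps, etc.
        "phase": phase,
        "job_id": job_id,  # may be unknown until hf_jobs returns; push again once it is
        "script": script,
        "hardware": hardware,
        "timeout": timeout,
        "expected_completion": latest_end.isoformat(),  # timeout bound; may finish earlier
    }


def push_session_state(state: dict, repo_id: str = "E-Rong/til-26-ae-agent") -> None:
    """Upload the state as session_state.json (do this BEFORE calling hf_jobs)."""
    from huggingface_hub import HfApi  # lazy import; uses HF_TOKEN

    HfApi().upload_file(
        path_or_fileobj=json.dumps(state, indent=2).encode("utf-8"),
        path_in_repo="session_state.json",
        repo_id=repo_id,
        commit_message=f"Update session state for phase {state.get('phase')}",
    )
```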

Technical Gotchas

| Gotcha | Correct | Wrong |
|---|---|---|
| Wrapper order | ActionMasker(base_env), then Monitor(env) | ActionMasker(Monitor(base_env)): masks break |
| Env install | snapshot_download + walk for pyproject.toml | git clone of the private Space |
| Script delivery | Upload to Hub, submit the sandbox path | Inline 20KB string or sandbox-only file |
| Auth | HF_TOKEN env var (auto-injected in Jobs) | Passing the token manually in git URLs |
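
The "Correct" wrapper order corresponds to the construction below: ActionMasker wraps the base env, and Monitor wraps the result. The sketch assumes a single-agent Gymnasium view of the environment. mask_fn and the valid_action_mask() method it calls are hypothetical placeholders for however the env reports legal moves. ActionMasker (sb3_contrib.common.wrappers) and Monitor (stable_baselines3.common.monitor) are the real classes, and stable-baselines3 is installed as a dependency of sb3-contrib.

```python
import numpy as np


def mask_fn(env) -> np.ndarray:
    """Return a boolean mask of legal actions (valid_action_mask is a hypothetical env method)."""
    return np.asarray(env.valid_action_mask(), dtype=bool)


def make_env(base_env):
    """Wrap in the order this guide lists as correct: ActionMasker first, then Monitor."""
    from sb3_contrib.common.wrappers import ActionMasker
    from stable_baselines3.common.monitor import Monitor

    env = ActionMasker(base_env, mask_fn)  # masking sits directly on the base env
    env = Monitor(env)  # episode statistics are recorded on the outside
    return env
```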

Cost Table

| Hardware | $/hr | Use for |
|---|---|---|
| cpu-basic | ~$0.05 | Writing code, docs, planning |
| t4-small | ~$0.40 | Smoke tests ONLY |
| a10g-small | ~$1.00 | Training via HF Jobs |

Stop GPU sandboxes immediately after smoke tests. An idle GPU sandbox burns up to ~$1/hr for nothing.
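
The table supports quick budget checks. At the approximate rates above:

  • the 6h timeout from the submission example caps an a10g-small job at about $6;
  • a 10-minute t4-small smoke test costs about $0.07;
  • an a10g-small sandbox forgotten for 12 hours costs about $12.

estimate_cost is a hypothetical convenience, and its output is only as accurate as the table's rates.

```python
# Approximate rates from the cost table above (USD per hour).
HOURLY_RATE = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}


def estimate_cost(flavor: str, hours: float) -> float:
    """Rough cost of running `flavor` for `hours`, using the approximate table rates."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return HOURLY_RATE[flavor] * hours
```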


Curriculum Summary

| Phase | Opponent | Steps | Status |
|---|---|---|---|
| 1 | Random | 500k | ✅ Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check session_state.json |
| 3 | Rule-based curriculum | 1M | Pending |

Key papers: arxiv:2407.00662 (Pommerman curriculum + adaptive annealing), arxiv:2006.14171 (invalid action masking).
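
For Session Startup step 4, the curriculum table can become a lookup from a step count to the current phase and its remaining steps. phase_for_step is a hypothetical helper. It assumes the phase budgets run end to end on one cumulative counter (2M steps in total). That would hold if resumed runs call learn(..., reset_num_timesteps=False), so that model.num_timesteps keeps counting. If each phase restarts at zero, compare against the per-phase budget instead.

```python
from typing import List, Optional, Tuple

# (phase, opponent, steps) taken from the curriculum table above.
CURRICULUM: List[Tuple[int, str, int]] = [
    (1, "Random", 500_000),
    (2, "Random + exploration shaping", 500_000),
    (3, "Rule-based curriculum", 1_000_000),
]


def phase_for_step(global_step: int) -> Optional[Tuple[int, str, int]]:
    """Map a cumulative step count to (phase, opponent, steps left in that phase).

    Assumes the phase budgets are laid end to end on a single global counter;
    returns None once the whole curriculum is finished.
    """
    if global_step < 0:
        raise ValueError("global_step must be non-negative")
    start = 0
    for phase, opponent, steps in CURRICULUM:
        end = start + steps
        if global_step < end:
            return phase, opponent, end - global_step
        start = end
    return None
```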


File Guide

| File | Purpose |
|---|---|
| session_state.json | Current phase, checkpoints, mistakes, next steps |
| docs/ae.md | Full research, design, results |
| phase1_final.zip | Final Phase 1 checkpoint |
| phase2_ckpt_*.zip | Phase 2 intermediate checkpoints |
| phase2_resume.py | Working HF Job script |
| phase3_curriculum.py | Ready-to-submit Phase 3 script |
| smoke_test.py | 5-minute validation run |

Contact

  • User: E-Rong | Org: E-Rong
  • Billing namespace: E-Rong (required on all hf_jobs)
  • You are: An ephemeral agent with no memory. This file is your only brain.

Read the files. Check the state. Test before committing compute. Update docs before every job.