# AGENTS.md — Zero-Memory Survival Guide

> **You have no persistent memory. Read this first. Do not trust your internal knowledge.**

---

## What This Is (10 seconds)

- **Challenge**: TIL-26-AE — train a Bomberman agent (`agent_0`) via RL
- **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
- **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
- **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning

---

## The 6 Unbreakable Rules

| # | Rule | Violation Cost |
|---|---|---|
| 1 | **NEVER train in sandboxes >30 min** | ~$5/hr wasted on empty, recycled containers |
| 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails instantly — git ignores `HF_TOKEN` |
| 3 | **NEVER pass inline scripts larger than a few KB to `hf_jobs.script`** | Delivery chokes at ~20KB |
| 4 | **ALWAYS upload script to Hub BEFORE submitting job** | Job fetches script from Hub URL, not sandbox |
| 5 | **ALWAYS update docs BEFORE starting long jobs** | Next you has zero memory; stale docs = duplicated work |
| 6 | **ALWAYS smoke-test before multi-hour jobs** | 5-min test saves hours of failed compute |

---

## Session Startup (do this now)

1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
2. Check `hf_jobs ps` for running jobs
3. Check latest checkpoint on Hub (`phase*_ckpt_*.zip`)
4. Determine: current phase, remaining steps, next action
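Step 3 can be sketched as a pure function over the repo's file listing (in a live session the names would come from `huggingface_hub.list_repo_files`); the checkpoint-name pattern is assumed to be exactly `phase<N>_ckpt_<steps>.zip`:

```python
import re

def latest_checkpoint(filenames):
    """Return the newest phase<N>_ckpt_<steps>.zip, ordered by (phase, steps)."""
    best = None
    for name in filenames:
        m = re.fullmatch(r"phase(\d+)_ckpt_(\d+)\.zip", name)
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            if best is None or key > best[0]:
                best = (key, name)
    return best[1] if best else None

# In practice, feed this the output of
# huggingface_hub.list_repo_files("E-Rong/til-26-ae-agent").
files = ["phase1_final.zip", "phase2_ckpt_100000.zip", "phase2_ckpt_400000.zip"]
print(latest_checkpoint(files))  # phase2_ckpt_400000.zip
```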

---

## Development Workflow (follow exactly)

1. **Write on `cpu-basic`** — code, docs, scripts, planning. Never touch GPU sandboxes for editing.

2. **Smoke-test on a GPU sandbox** (`t4-small` or `a10g-small`) — run the script for 5-10 minutes to verify it loads the env, runs training steps, and can push a checkpoint. **Stop the GPU sandbox immediately** after pass or fail. Never leave it idle.

3. **If the smoke test fails** — consult the Hugging Face docs (`explore_hf_docs`, `fetch_hf_docs`) or other relevant documentation to diagnose the issue, iterate on what you learn, and go back to step 1.

4. **If the smoke test passes** — update `docs/ae.md` with the current project status and `AGENTS.md` with anything new you learned. Push both to the Hub before proceeding.

5. **Submit the real Job** (`a10g-small`, `a10g-large`, etc.). Immediately check `hf_jobs logs` to confirm it starts successfully. **Poll the job every 5 minutes** until the user interrupts you. During polling downtime, work on docs or scripts for upcoming phases, but keep checking the job.

---

## How to Submit an HF Job (the only way that works)

```python
# 1. Write to sandbox
write(path="/app/train.py", content="...")

# 2. UPLOAD TO HUB (critical — job fetches from Hub URL)
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# 3. Submit job
hf_jobs(
    operation="run",
    script="/app/train.py",   # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                  "numpy", "huggingface_hub", "pygame", "omegaconf",
                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"
)
```

**Why step 2 matters**: `hf_jobs inspect` reveals the job executes:
```bash
uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```
If the file isn't on the Hub, the job 404s.
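A minimal pre-flight check, assuming the raw-URL pattern shown above holds for any file in the repo: build the URL the job will fetch and verify it resolves before calling `hf_jobs`. The helper name is illustrative:

```python
def raw_url(repo_id, path, revision="main"):
    """Build the Hub raw URL that an HF Job resolves a script from."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{path}"

url = raw_url("E-Rong/til-26-ae-agent", "train.py")
print(url)
# Before submitting, a HEAD request on this URL
# (e.g. requests.head(url).status_code == 200) catches the 404 case up front.
```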

---

## How to Access the Private Env in a Job

```python
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
# Then walk to find pyproject.toml and pip install -e .
```

`snapshot_download` auto-uses `HF_TOKEN`. `git clone` does not.
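The "walk" step can be sketched with `os.walk`; `find_pyproject` is an illustrative helper name, and the install command is left as a comment since it assumes the Job's `pip` environment:

```python
import os

def find_pyproject(root):
    """Return the first directory under root that contains pyproject.toml."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "pyproject.toml" in filenames:
            return dirpath
    return None

# After snapshot_download(..., local_dir="/app/til-26-ae-repo"):
# pkg_dir = find_pyproject("/app/til-26-ae-repo")
# subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", pkg_dir])
```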

---

## Docs Update Checklist (before ANY job >30 min)

- [ ] `session_state.json` β€” phase, job_id, script name, hardware, timeout, expected completion
- [ ] `AGENTS.md` β€” any new mistakes/API gotchas learned this session
- [ ] `docs/ae.md` β€” research results, completed phase metrics
- [ ] Push all three to Hub BEFORE calling `hf_jobs`

---

## Technical Gotchas

| Gotcha | Correct | Wrong |
|---|---|---|
| **Wrapper order** | `ActionMasker(base_env)` then `Monitor(env)` | `ActionMasker(Monitor(base_env))` — masks break |
| **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
| **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
| **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing token manually in git URLs |

---

## Cost Table

| Hardware | $/hr | Use For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing code, docs, planning |
| `t4-small` | ~$0.40 | Smoke tests ONLY |
| `a10g-small` | ~$1.00 | Training via HF Jobs |

**Stop GPU sandboxes immediately after smoke tests.** An idle GPU sandbox burns $1/hr for nothing.
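A quick worst-case spend check before submitting, using the approximate rates from the table (assumed, verify against current pricing):

```python
# Approximate $/hr from the table above (assumed rates).
RATES = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}

def estimated_cost(hardware, timeout_hours):
    """Worst-case spend if the job runs all the way to its timeout."""
    return RATES[hardware] * timeout_hours

print(estimated_cost("a10g-small", 6))  # a 6h a10g-small job caps at $6.00
```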

---

## Curriculum Summary

| Phase | Opponent | Steps | Status |
|---|---|---|---|
| 1 | Random | 500k | ✅ Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check `session_state.json` |
| 3 | Rule-based curriculum | 1M | Pending |

Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).
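The masking idea from `arxiv:2006.14171` in one function: invalid actions get a logit of `-inf`, so their post-softmax probability is exactly zero and the policy can never sample them. A toy sketch with a hypothetical 6-action space (assumes at least one valid action):

```python
import math

def masked_softmax(logits, mask):
    """Zero out invalid actions by setting their logits to -inf
    before the softmax (invalid action masking)."""
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    peak = max(masked)                      # stabilize the exponentials
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Mask says actions 1 and 3 are invalid this step.
probs = masked_softmax([1.0, 2.0, 0.5, 3.0, 0.0, 1.0],
                       [True, False, True, False, True, True])
print(probs)  # the masked actions get probability exactly 0.0
```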

---

## File Guide

| File | Purpose |
|---|---|
| `session_state.json` | Current phase, checkpoints, mistakes, next steps |
| `docs/ae.md` | Full research, design, results |
| `phase1_final.zip` | Phase 1 complete checkpoint |
| `phase2_ckpt_*.zip` | Phase 2 intermediates |
| `phase2_resume.py` | Working HF Job script |
| `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
| `smoke_test.py` | 5-min validation |

---

## Contact

- **User**: E-Rong | **Org**: E-Rong
- **Billing namespace**: `E-Rong` (required on all `hf_jobs`)
- **You are**: An ephemeral agent with no memory. This file is your only brain.

*Read the files. Check the state. Test before committing compute. Update docs before every job.*