# getting started — EnterpriseHPC-v0

end-to-end setup guide. covers a fresh linux machine, colab, and hugging
face spaces. pick the path that matches your situation.

## tl;dr fastest possible path

```bash
git clone https://github.com/<your-user>/low-taper-fade-openenv-scaler.git
cd low-taper-fade-openenv-scaler
python3.13 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -e '.[dev]'
make gold         # deterministic proof all 6 scenarios are solvable
make bench        # reset-latency benchmark (<3 ms p50 in copy mode)
make eval         # gold vs random vs bad policies, writes runs/eval/leaderboard.md
make reward-demo  # gpu-free reward-curve png, proves reward improvement
make dry          # training rollout smoke test, no gpu required
```

if everything passes, skip to [training paths](#5-training-paths).

## 1 prerequisites

### system packages (linux)

these are only required for the local sandbox. colab and hf jobs handle
them automatically.

```bash
sudo apt update
sudo apt install -y bubblewrap fuse-overlayfs fuse3 tini coreutils
bwrap --version           # >= 0.6 recommended
fuse-overlayfs --version  # optional, copy fallback also works
```

- `bubblewrap` (the `bwrap` binary) provides the user namespace sandbox
- `fuse-overlayfs` gives you sub-1 ms resets. missing it is fine; we fall
  back to a shutil-copy path that still hits ~2.4 ms p50
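
the fallback decision amounts to a PATH check at startup. a minimal sketch of the idea (the function name is illustrative, not the repo's actual API):

```python
import shutil

def pick_reset_backend() -> str:
    """Prefer fuse-overlayfs when both binaries are on PATH, else copy.

    Illustrative only -- the real environment performs an equivalent
    check when it starts.
    """
    if shutil.which("bwrap") and shutil.which("fuse-overlayfs"):
        return "overlay"  # sub-1 ms resets via an overlay mount
    return "copy"         # shutil.copytree fallback, ~2.4 ms p50

# pick_reset_backend() -> "overlay" on a fully provisioned host
```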

### python

- python `>=3.12` is required. python `3.13` is the current unsloth
  default (per their install docs) and the one used in `Dockerfile` +
  `server/Dockerfile`
- `pip install -e '.[dev]'` installs the package in dev mode plus all
  runtime deps (fastapi, uvicorn, gymnasium, pexpect, httpx,
  matplotlib, numpy, etc.) and pytest
- `pip install -e '.[train]'` adds the gpu-training deps (torch,
  transformers, trl, accelerate, peft, bitsandbytes, tensorboard,
  datasets). only needed on the training host
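
the two extras correspond to optional-dependency groups in `pyproject.toml`; a representative fragment (not verbatim, version pins omitted):

```toml
[project.optional-dependencies]
dev = [
  "fastapi", "uvicorn", "gymnasium", "pexpect",
  "httpx", "matplotlib", "numpy", "pytest",
]
train = [
  "torch", "transformers", "trl", "accelerate",
  "peft", "bitsandbytes", "tensorboard", "datasets",
]
```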

## 2 sanity checks (no gpu, 15 seconds)

run these in order. any failure means the environment is misconfigured.

```bash
# proves every scenario is deterministically solvable
python -m tools.verify_gold_trajectory -v

# measures reset latency — should be under 10 ms
python -m bench.bench_reset -n 100

# runs gold/random/bad policies against every scenario,
# writes runs/eval/leaderboard.md
python -m eval.eval_suite --trials 2
```

## 3 run the openenv server locally

```bash
make serve                 # runs the server console script on 0.0.0.0:8000
# or equivalently (after pip install -e .)
server --host 0.0.0.0 --port 8000
```

smoke test in another terminal:

```bash
curl http://127.0.0.1:8000/health
curl http://127.0.0.1:8000/tasks
curl -X POST http://127.0.0.1:8000/reset -H 'content-type: application/json' \
  -d '{"task_id": "hpc_outage"}'
curl -X POST http://127.0.0.1:8000/step -H 'content-type: application/json' \
  -d '{"action": {"command": "sinfo"}}'
```

## 4 deploy to hugging face spaces (for remote training)

this is required if you want to train via `--env-urls https://...`. the
reference deployment lives at
[`huggingmenfordays/enterprise-hpc-openenv`](https://huggingface.co/spaces/huggingmenfordays/enterprise-hpc-openenv)
(public url: `https://huggingmenfordays-enterprise-hpc-openenv.hf.space`).

### first-time push

1. create a new space on huggingface.co — type `Docker`, any hardware tier (or skip this and let the cli command in the next step create it)
2. push this repo to the space:
   ```bash
   hf auth login           # once
   huggingface-cli repo create enterprise-hpc-openenv --type space --space_sdk docker
   git remote add space https://huggingface.co/spaces/<user>/enterprise-hpc-openenv
   git push space main
   ```
3. wait for the build. the space should expose your env at
   `https://<user>-enterprise-hpc-openenv.hf.space`
4. smoke test:
   ```bash
   curl https://<user>-enterprise-hpc-openenv.hf.space/health
   ```

### redeploying updates (orphan-branch trick)

this repo has `.venv/` and `docs/assets/*.png` binaries sitting in git
history that hf xet refuses to accept. a plain
`git push space final-round:main` will be rejected with
`pre-receive hook declined`. force-push a clean orphan snapshot instead:

```bash
hf auth login                                                                  # ensure token is live
git remote set-url space https://huggingface.co/spaces/<user>/enterprise-hpc-openenv

git checkout --orphan space-deploy
git rm -rf --cached .
rm -f docs/assets/reward_curve_demo.png                                        # drop binaries hf xet trips on
git add -A
git commit -m "deploy: clean snapshot for hf space"
git push space space-deploy:main --force

git checkout final-round
git branch -D space-deploy
git checkout HEAD -- docs/assets/reward_curve_demo.png                         # restore the png locally
```

your local `final-round` history stays intact; only the space's `main`
is rewritten. the build takes 5-10 min; hit `/health` to confirm it
came up green.

full guide: [`docs/hf_spaces_deploy.md`](./docs/hf_spaces_deploy.md)

## 5 training paths

### path A — local gpu (colab / single workstation)

```bash
python -m training.train_hpc_outage \
  --model Qwen/Qwen2.5-Coder-7B-Instruct \
  --scenarios hpc_outage,hpc_munge,hpc_pid_stale,hpc_gpu_ecc,hpc_nfs_stale,hpc_ood_apache \
  --group-size 4 --max-turns 12 --num-train-steps 100 \
  --output-dir ./runs/hpc_grpo_local
```

on colab open [`training/hpc_colab.ipynb`](./training/hpc_colab.ipynb) —
it handles all the setup. the t4 free tier works at `--group-size 2`;
l4 / a100 can push `--group-size 4+`.

### path B — remote hosted openenv (multiple spaces = throughput)

```bash
python -m training.hpc_openenv_gemma \
  --env-urls https://<user>-enterprise-hpc-openenv.hf.space \
             https://<user>-enterprise-hpc-openenv-2.hf.space \
  --model Qwen/Qwen2.5-Coder-7B-Instruct \
  --group-size 4 --max-turns 24 --num-train-steps 200 \
  --curriculum --save-adapter-only
```

the pool round-robins across every `--env-urls` entry for parallel
rollouts. as of apr 23 2026 the remote server supports per-episode
sessions (keyed on `episode_id`), so `group_size > 1` against a single
space no longer clobbers episode state. the default `--max-turns` is
now `24` — many scenarios need 10+ turns once format compliance and
diagnostic steps are accounted for.
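
the round-robin behaviour can be pictured as a simple cycle over the url list. a sketch only; the real pool also tracks per-episode sessions keyed on `episode_id`:

```python
import itertools

def make_env_picker(env_urls: list):
    """Return a callable that cycles through --env-urls entries so
    concurrent rollouts spread load across spaces."""
    cycle = itertools.cycle(env_urls)
    return lambda: next(cycle)

pick = make_env_picker([
    "https://<user>-enterprise-hpc-openenv.hf.space",
    "https://<user>-enterprise-hpc-openenv-2.hf.space",
])
# four parallel rollouts alternate between the two spaces
rollout_targets = [pick() for _ in range(4)]
```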

### path C — hf jobs (fully managed, gpu-on-demand)

```bash
python -m training.hf_jobs \
  --env-urls https://<user>-enterprise-hpc-openenv.hf.space \
  --repo-url https://huggingface.co/spaces/<user>/enterprise-hpc-openenv \
  --gpu a10g-large \
  --num-train-steps 300 \
  --hub-repo <user>/hpc-grpo-runs
```

see [`docs/hf_jobs.md`](./docs/hf_jobs.md) for the full guide.

## 6 expected artifacts

every training run produces:

- `runs/<name>/<name>.metrics.jsonl` β€” reward curve time series
- tensorboard event files β€” `tensorboard --logdir ./runs`
- optional wandb run if `--wandb-project` is set
- optional lora adapter weights in `runs/<name>/`

to plot the reward curve locally:

```bash
tensorboard --logdir ./runs
# or use the plot cell at the bottom of training/hpc_colab.ipynb
```
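
to work with the metrics file directly, here is a minimal sketch. the `reward` field name is an assumption; check the first line of your own `.metrics.jsonl` and adjust `key`:

```python
import json
from pathlib import Path

def load_rewards(path: str, key: str = "reward") -> list:
    """Read one reward value per jsonl line (field name assumed)."""
    rewards = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            rewards.append(json.loads(line)[key])
    return rewards

def moving_average(xs: list, window: int = 8) -> list:
    """Trailing-window smoothing for a readable reward curve."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - window + 1)
        out.append(sum(xs[lo:i + 1]) / (i + 1 - lo))
    return out
```

feed the smoothed series to matplotlib (already in the `[dev]` extras) for a local reward-curve plot.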

## 7 troubleshooting

| symptom | fix |
| --- | --- |
| `bwrap: setting up uid map: Permission denied` | enable unprivileged user namespaces: `sudo sysctl -w kernel.unprivileged_userns_clone=1` |
| `fuse-overlayfs: not found` | harmless, we fall back to copy mode. apt install it for <1 ms resets |
| `OSError: out of pty devices` | pexpect cannot allocate a PTY. rerun on a host with `/dev/ptmx` accessible (colab, hf spaces, most linux hosts) |
| `ModuleNotFoundError: gymnasium` / `pexpect` | `pip install -e .` again, or `pip install gymnasium pexpect httpx` |
| HF Space deploy: build fails on `fuse-overlayfs` install | ignore — Spaces have apparmor restrictions, the copy fallback still works |
| `huggingface_hub.run_uv` missing | upgrade: `pip install -U huggingface_hub`. otherwise `--dry-run-local` prints the shell script |
| training OOM on T4 | lower `--group-size 2 --max-new-tokens 256`, or switch to `Qwen/Qwen2.5-Coder-3B-Instruct` / `unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit` |
| "no pty devices" when running training locally in a container | run on a linux host directly, or in colab |

## 8 one-line reproduction for judges

```bash
make help                                         # list all targets
make gold                                         # prove solvable
make bench                                        # reset latency
make eval                                         # policy leaderboard
make dry                                          # training plumbing smoke test
make train                                        # local grpo training
make train-remote ENV_URLS=https://your.hf.space  # remote openenv training
```