File size: 14,382 Bytes
bc35a94
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# EnterpriseHPC-v0 + Qwen2.5-Coder-7B GRPO on Colab / Kaggle\n",
        "\n",
        "This notebook post-trains `Qwen/Qwen2.5-Coder-7B-Instruct` with TRL GRPO\n",
        "on the `EnterpriseHPC-v0` environment. The env simulates a Rocky Linux\n",
        "HPC cluster (login + compute-01 nodes, mock Slurm state machine, Open\n",
        "OnDemand Apache portal, mock NFS share, mock NVIDIA GPUs) inside a\n",
        "single user-namespace sandbox with sub-10 ms overlay resets.\n",
        "\n",
        "Scenarios (six remediation incidents in the **Theme #3.1 World\n",
        "Modeling / Professional Tasks** bucket, aligned with the Scaler AI Labs\n",
        "Multi-App RL Environment sub-theme):\n",
        "- `hpc_outage` β€” broken compute node network route, `slurmd` down\n",
        "- `hpc_munge` β€” corrupt/permission-broken `munge.key`, auth failures\n",
        "- `hpc_pid_stale` β€” stale `/var/run/slurmd.pid` blocks service start\n",
        "- `hpc_gpu_ecc` β€” GPU ECC volatile errors, node drained, need `nvidia-smi -r`\n",
        "- `hpc_nfs_stale` β€” `/mnt/shared` stale NFS handle, umount/remount dance\n",
        "- `hpc_ood_apache` β€” Open OnDemand Apache portal config typo on :8081\n",
        "\n",
        "Three round-1 legacy tasks (`nginx_crash`, `disk_full`, `network_broken`)\n",
        "are retained as a **warm-up curriculum tier** for difficulty ramping,\n",
        "not as a separate theme claim.\n",
        "\n",
        "Two training paths are supported:\n",
        "- **Local**: run the sandbox inside the Colab / Kaggle runtime via `train_hpc_outage.py`\n",
        "- **Remote**: train against one or more Hugging Face Spaces hosting the openenv server via `hpc_openenv_gemma.py`. This is the exact shape of the TRL + OpenEnv launch example (the CARLA driving notebook) but for HPC incidents, with a code-tuned Qwen policy in place of Gemma 4.\n",
        "\n",
        "Prereqs\n",
        "- Colab or Kaggle runtime with a GPU. Qwen2.5-Coder-7B fits in 4-bit QLoRA on a single A100 (Kaggle free tier). On T4/L4 use `--model Qwen/Qwen2.5-Coder-3B-Instruct` and `--group-size 2`. Python 3.12+ is required\n",
        "- `HF_TOKEN` in Colab/Kaggle secrets (model is open but token unlocks uploads)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1 System dependencies"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%%bash\n",
        "set -euxo pipefail\n",
        "apt-get update -qq\n",
        "apt-get install -y -qq bubblewrap fuse-overlayfs fuse3 tini coreutils\n",
        "bwrap --version\n",
        "fuse-overlayfs --version || true"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2 Clone the repo and install python deps"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%%bash\n",
        "set -euxo pipefail\n",
        "if [ ! -d low-taper-fade-openenv-scaler ]; then\n",
        "  git clone https://github.com/your-org/low-taper-fade-openenv-scaler.git\n",
        "fi\n",
        "cd low-taper-fade-openenv-scaler\n",
        "python --version\n",
        "pip install -q --upgrade pip setuptools wheel\n",
        "pip install -q -e '.[train]'\n",
        "pip install -q --no-deps 'unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git'\n",
        "pip install -q 'unsloth-zoo' wandb\n",
        "python -c \"import torch, transformers, trl, unsloth, gymnasium, fastapi; print('torch', torch.__version__, 'transformers', transformers.__version__, 'trl', trl.__version__, 'unsloth', unsloth.__version__, 'gymnasium', gymnasium.__version__, 'fastapi', fastapi.__version__)\""
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%cd low-taper-fade-openenv-scaler"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3 Prove the environment is solvable (gold trajectory verifier)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "!python -m tools.verify_gold_trajectory -v"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4 Benchmark reset latency"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "!python -m bench.bench_reset -n 200"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 5 Leaderboard (gold vs random vs bad policies)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "!python -m eval.eval_suite --trials 3 --output-dir ./runs/eval \\\n",
        "  --scenarios hpc_outage,hpc_munge,hpc_pid_stale,hpc_gpu_ecc,hpc_nfs_stale,hpc_ood_apache\n",
        "!cat ./runs/eval/leaderboard.md"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 6 Reward-curve demo (gpu-free, proves reward improvement)\n",
        "\n",
        "This replays a curriculum-annealed reward probe against the\n",
        "grader and plots `reward_mean`, `solve_rate`, `terminal_health` over\n",
        "simulated policy improvement steps. It is the evidence the judges want\n",
        "under the **Showing Improvement in Rewards (20%)** rubric and it runs\n",
        "in under a minute without a GPU or `bwrap`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "!python -m tools.reward_curve_demo --num-steps 24 --rollouts-per-step 12\n",
        "from IPython.display import Image, display\n",
        "display(Image('docs/assets/reward_curve_demo.png'))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 7 Dry-run rollout inside the real sandbox (no GPU required)"
      ],
      "id": "80b68a95"
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "!python -m training.train_hpc_outage --dry-run --group-size 2 --max-turns 8 --output-dir ./runs/dry \\\n",
        "  --scenarios hpc_outage,hpc_munge,hpc_pid_stale,hpc_gpu_ecc,hpc_nfs_stale,hpc_ood_apache"
      ],
      "execution_count": null,
      "outputs": [],
      "id": "2c7f23b4"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 8 Option A: local GRPO training with Qwen2.5-Coder-7B\n",
        "\n",
        "On a T4 swap to `--model Qwen/Qwen2.5-Coder-3B-Instruct --group-size 2 --max-turns 8`. On a Kaggle / Colab A100 keep the 7B and go `--group-size 4 --max-turns 16`. All six HPC scenarios are mixed into the rollout pool so GRPO learns a single policy across the whole incident catalogue."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%env TRANSFORMERS_VERBOSITY=error\n",
        "%env TOKENIZERS_PARALLELISM=false\n",
        "!python -m training.train_hpc_outage \\\n",
        "  --model Qwen/Qwen2.5-Coder-7B-Instruct \\\n",
        "  --output-dir ./runs/hpc_grpo_local \\\n",
        "  --group-size 4 \\\n",
        "  --max-turns 12 \\\n",
        "  --num-train-steps 100 \\\n",
        "  --max-new-tokens 512 \\\n",
        "  --max-seq-length 8192 \\\n",
        "  --learning-rate 1e-5 \\\n",
        "  --curriculum --save-adapter-only \\\n",
        "  --scenarios hpc_outage,hpc_munge,hpc_pid_stale,hpc_gpu_ecc,hpc_nfs_stale,hpc_ood_apache \\\n",
        "  --report-to tensorboard"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 9 Option B: remote GRPO training against a Hugging Face Space\n",
        "\n",
        "Deploy `Dockerfile` to an HF Space first (see `docs/hf_spaces_deploy.md`). Then:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "os.environ.setdefault('ENV_URLS', 'https://your-user-enterprise-hpc-openenv.hf.space')\n",
        "!python -m training.hpc_openenv_gemma \\\n",
        "  --env-urls ${ENV_URLS} \\\n",
        "  --model Qwen/Qwen2.5-Coder-7B-Instruct \\\n",
        "  --output-dir ./runs/hpc_grpo_remote \\\n",
        "  --group-size 4 --max-turns 20 --num-train-steps 100 \\\n",
        "  --max-new-tokens 512 --max-seq-length 8192 \\\n",
        "  --curriculum --save-adapter-only \\\n",
        "  --scenarios hpc_outage,hpc_munge,hpc_pid_stale,hpc_gpu_ecc,hpc_nfs_stale,hpc_ood_apache \\\n",
        "  --report-to tensorboard"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 10 Plot the real GRPO reward curve"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json, matplotlib.pyplot as plt\n",
        "from pathlib import Path\n",
        "metrics = []\n",
        "for p in Path('./runs').rglob('*.metrics.jsonl'):\n",
        "    for line in p.read_text().strip().splitlines():\n",
        "        m = json.loads(line); m['source']=p.parent.name; metrics.append(m)\n",
        "if not metrics:\n",
        "    print('no metrics found yet β€” run section 8 (local) or section 9 (remote) first')\n",
        "else:\n",
        "    import collections\n",
        "    by_run = collections.defaultdict(list)\n",
        "    for m in metrics: by_run[m['source']].append(m)\n",
        "    fig, ax = plt.subplots(1, 2, figsize=(12,4))\n",
        "    for run, rows in by_run.items():\n",
        "        rows.sort(key=lambda r: r['step'])\n",
        "        ax[0].plot([r['step'] for r in rows], [r['solve_rate'] for r in rows], label=run)\n",
        "        ax[1].plot([r['step'] for r in rows], [r['reward_mean'] for r in rows], label=run)\n",
        "    ax[0].set_title('solve_rate over GRPO steps'); ax[0].legend(); ax[0].set_ylim(0,1)\n",
        "    ax[1].set_title('reward_mean over GRPO steps'); ax[1].legend(); ax[1].set_ylim(0,1)\n",
        "    plt.tight_layout(); plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 11 Inspect the trained agent transcripts\n",
        "\n",
        "Run a single rollout with the trained adapter and save the transcript. These are the clips you want in the pitch and video."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json, os\n",
        "from pathlib import Path\n",
        "from training.rollout import run_interactive_group\n",
        "from hpc_gym import EnterpriseHPCEnv\n",
        "from unsloth import FastLanguageModel\n",
        "import torch\n",
        "\n",
        "ckpt = './runs/hpc_grpo_local'\n",
        "if not Path(ckpt).exists():\n",
        "    ckpt = 'Qwen/Qwen2.5-Coder-7B-Instruct'\n",
        "model, tokenizer = FastLanguageModel.from_pretrained(model_name=ckpt, max_seq_length=4096, load_in_4bit=True)\n",
        "FastLanguageModel.for_inference(model)\n",
        "\n",
        "def generate_fn(batch_messages):\n",
        "    texts = [tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True) for m in batch_messages]\n",
        "    inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=4096).to(model.device)\n",
        "    with torch.inference_mode():\n",
        "        out = model.generate(**inputs, do_sample=True, temperature=0.7, top_p=0.95, max_new_tokens=256,\n",
        "                              pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id)\n",
        "    new = out[:, inputs['input_ids'].shape[1]:]\n",
        "    return tokenizer.batch_decode(new, skip_special_tokens=True)\n",
        "\n",
        "records = run_interactive_group(\n",
        "  group_size=4,\n",
        "  generate_fn=generate_fn,\n",
        "  env_factory=lambda: EnterpriseHPCEnv(scenario_pool=[\n",
        "      'hpc_outage','hpc_munge','hpc_pid_stale',\n",
        "      'hpc_gpu_ecc','hpc_nfs_stale','hpc_ood_apache',\n",
        "  ]),\n",
        "  max_turns=16,\n",
        ")\n",
        "for r in records:\n",
        "    print('task', r.task_id, 'reward', r.reward, 'steps', r.steps, 'health', r.grader_health)\n",
        "\n",
        "os.makedirs('./runs/eval_trained', exist_ok=True)\n",
        "with open('./runs/eval_trained/transcripts.json', 'w') as f:\n",
        "    json.dump([r.__dict__ for r in records], f, indent=2, default=str)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 12 (Optional) push artifacts to the Hub\n",
        "\n",
        "Upload adapter weights, metrics jsonl, and leaderboard to a model repo so judges can load them."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from huggingface_hub import HfApi, create_repo\n",
        "import os\n",
        "repo_id = os.environ.get('HF_HUB_REPO', 'your-user/hpc-grpo-runs')\n",
        "api = HfApi(token=os.environ.get('HF_TOKEN'))\n",
        "create_repo(repo_id, exist_ok=True, token=api.token)\n",
        "api.upload_folder(folder_path='./runs/hpc_grpo_local', repo_id=repo_id, path_in_repo='hpc_grpo_local')"
      ],
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.11"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}