Spaces:
Paused
Paused
| # what I need you to do β hackathon final stretch | |
| I cannot do these inside the Cursor sandbox (no GPU, no HF credentials, no | |
| PTY devices, no real network). these are the remaining blockers between | |
| "technically complete" and "wins the hackathon". | |
| legend | |
| - **[BLOCKER]** must be done before submission | |
| - **[BONUS]** meaningful boost on the rubric, not required | |
| - **[POLISH]** last-minute polish if you have time | |
| ## apr 23 2026 β reward pipeline + session isolation fixes shipped | |
| after a kaggle probe run showed `solve_reward=0`, `progress_reward=0`, | |
| and `frac_reward_zero_std=1` across 10 grpo steps, the whole remote | |
| rollout stack was rewritten. what landed on `final-round`: | |
| - `sysadmin_env/server.py` now uses an **`HttpSessionStore`** (lru-bounded | |
| `OrderedDict` of `EpisodeSlot`s) keyed on a uuid `episode_id`, so | |
| `group_size > 1` rollouts no longer clobber each other | |
| - `sysadmin_env/models.py`: `Observation` gained `grader_health`, | |
| `grader_details`, `ood_http_code`; `StepRequest` gained optional | |
| `episode_id` | |
| - `training/remote_env.py`: client stores the `episode_id` from `/reset` | |
| and forwards it on every `/step`; reads the new observation fields | |
| into `info` | |
| - `training/rollout.py`: `RolloutRecord.reward` is now **cumulative**, | |
| plus a new `best_health` peak-health tracker and `last_reward` tail | |
| - `training/reward_functions.py`: `solve_reward` now triggers on | |
| `terminated` (not `reward >= 1.0` which never fired); | |
| `progress_reward` consumes `best_health` / `grader_health` with a | |
| cumulative-reward fallback for backward compat with older servers; | |
| `efficiency_reward` mirrors the terminated-flag logic | |
| - `training/hpc_openenv_gemma.py`: default `--model` now | |
| `Qwen/Qwen2.5-Coder-7B-Instruct` (kaggle a100 profile); default | |
| `--max-turns` bumped from 16 | |
| β 24 (multi-step scenarios routinely take 10+ turns on a 1.5b model) | |
| - the hf space at `huggingmenfordays/enterprise-hpc-openenv` has been | |
| force-pushed with these changes | |
| **before your next kaggle run**: `git pull` inside `/kaggle/working/repo` | |
| to grab these fixes. the live space has already been rebuilt. | |
| ## 1 [BLOCKER] capture a reward curve on a real gpu | |
| **partial credit already banked**: `docs/assets/reward_curve_demo.png` | |
| is committed β the gpu-free curriculum-annealed reward probe in | |
| `tools/reward_curve_demo.py` proves the shaped reward signal has a | |
| learnable gradient (0.03 β 0.51 over 24 curriculum steps). judges see | |
| a real curve immediately. run `make reward-demo` to regenerate it. | |
| we still want a real gpu grpo run for the "we trained a model" story: | |
| ### what to run | |
| open `training/hpc_colab.ipynb` in colab (pick L4 or A100, free T4 also | |
| works at group-size 2). run every cell. cell 6 now runs the gpu-free | |
| probe and inlines the png. cell 8 is the real grpo run. once that is | |
| done: | |
| ``` | |
| # in colab | |
| import matplotlib.pyplot as plt | |
| # cell 10 already plots from runs/*.metrics.jsonl, just save the figure | |
| plt.savefig('reward_curve.png', dpi=150, bbox_inches='tight') | |
| ``` | |
| ### what I need back | |
| 1. a png of the real grpo curve (save as `docs/assets/reward_curve.png`) | |
| 2. the final `runs/hpc_grpo_local/hpc_openenv_gemma.metrics.jsonl` | |
| 3. optionally: push the lora adapter to `huggingface.co/<you>/hpc-grpo-qwen2.5-coder-7b` | |
| once those are in the repo I will update `docs/pitch.md`, `docs/hf_blog.md`, | |
| and `README.md` to inline the chart and link the hub artifacts. | |
| ## 2 [BLOCKER] deploy the openenv server to a hf space - DONE | |
| space: https://huggingface.co/spaces/huggingmenfordays/enterprise-hpc-openenv | |
| live url: https://huggingmenfordays-enterprise-hpc-openenv.hf.space | |
| ### pushing updates to the space | |
| you only need the orphan-branch trick because our git history has | |
| `.venv/` + `docs/assets/*.png` binaries that hf xet will reject. do not | |
| try `git push space final-round:main` directly β it will fail with | |
| `pre-receive hook declined`. use this instead: | |
| ```bash | |
| hf auth login # once per machine | |
| git remote set-url space https://huggingface.co/spaces/huggingmenfordays/enterprise-hpc-openenv | |
| git checkout --orphan space-deploy | |
| git rm -rf --cached . | |
| rm -f docs/assets/reward_curve_demo.png # any binary that would trip xet | |
| git add -A | |
| git commit -m "deploy: clean snapshot for hf space" | |
| git push space space-deploy:main --force | |
| git checkout final-round | |
| git branch -D space-deploy | |
| git checkout HEAD -- docs/assets/reward_curve_demo.png | |
| ``` | |
| that force-pushes a one-commit history-less snapshot to the space's | |
| `main`. your local `final-round` is untouched. full explanation lives | |
| in [`docs/hf_spaces_deploy.md`](./docs/hf_spaces_deploy.md) Β§2.1. | |
| _original instructions below for reference_ | |
| ## 2 [reference] deploy the openenv server to a hf space | |
| judges will click "try it" in the submission form. without a live space | |
| they cannot hit the env. | |
| ### steps | |
| 1. `huggingface-cli login` with a token that has space-write permission | |
| 2. from this repo: | |
| ```bash | |
| huggingface-cli repo create enterprise-hpc-openenv \ | |
| --type space --space_sdk docker | |
| git remote add space https://huggingface.co/spaces/<you>/enterprise-hpc-openenv | |
| git push space main | |
| ``` | |
| 3. wait for the docker build (5-10 min first time) | |
| 4. confirm `curl https://<you>-enterprise-hpc-openenv.hf.space/health` returns 200 | |
| 5. send me the URL and I will wire it into `openenv.yaml` and the pitch | |
| ### notes | |
| the existing `Dockerfile` is already tuned. apparmor may block | |
| `fuse-overlayfs`, the copy fallback (p50 ~2.4 ms) still hits the latency | |
| target. if the build errors on `bubblewrap`, we can add `apt-get install -y` | |
| for it. | |
| ## 3 [BLOCKER] record a 90-second demo video | |
| the video is part of most hackathon submissions. script is in | |
| `docs/video_script.md`. | |
| ### shots to capture | |
| 1. `make gold` β quick pass, proves determinism (5 s) | |
| 2. `make bench` β show the 2.40 ms p50 number (10 s) | |
| 3. `make eval` β cat the leaderboard markdown (15 s) | |
| 4. the live agent solving `hpc_pid_stale` via | |
| `python -m training.train_hpc_outage --dry-run --group-size 1` or a | |
| trained checkpoint (40 s) | |
| 5. the reward curve chart (20 s) | |
| record with OBS or the built-in macOS screen recorder, upload to | |
| youtube or HF, paste the URL into `README.md` under a "demo" section and I | |
| will finalize. | |
| ## 4 [BONUS] give me access to a space url so I can wire things up | |
| once task 2 is done, paste the URL here and I will: | |
| - update `openenv.yaml` `runtime.server_entry_point` | |
| - add a "Try the env live" section to `README.md` and the HF blog | |
| - update `docs/pitch.md` to reference the live URL in the q&a prep | |
| ## 5 [BONUS] run a longer training session and push to the hub | |
| once task 1 is done and the pipeline is validated: | |
| ```bash | |
| python -m training.hpc_openenv_gemma \ | |
| --env-urls https://<you>-enterprise-hpc-openenv.hf.space \ | |
| --model Qwen/Qwen2.5-Coder-7B-Instruct \ | |
| --num-train-steps 600 \ | |
| --group-size 8 --max-turns 16 \ | |
| --hub-repo <you>/hpc-grpo-qwen2.5-coder-7b \ | |
| --wandb-project hpc-grpo | |
| ``` | |
| 600 steps at group-size 8 takes ~3 hours on an A100. this is what gets you | |
| "we actually trained a model that beats the baseline" for the rubric. | |
| ## 6 [POLISH] submission form metadata | |
| when you fill out the form: | |
| - **theme**: #3.1 World Modeling / Professional Tasks β specifically | |
| the Scaler AI Labs Multi-App RL Environment for Enterprise Workflows | |
| sub-theme. **single-theme submission**; do not list #2 as a secondary | |
| theme on the form (long-horizon planning falls out of the env as a | |
| property, not a separate theme claim) | |
| - **tagline**: "EnterpriseHPC-v0 β a multi-app, sub-3 ms-reset HPC SRE | |
| environment. Qwen2.5-Coder-7B learns to diagnose a 224-core Rocky | |
| Linux cluster end-to-end." | |
| - **links**: github repo, hf space, hf model repo, colab, video | |
| - **highlights**: multi-app (Slurm + OOD Apache + SSH + OverlayFS + | |
| NVIDIA driver + NFS + systemd + Munge), multi-node (nested bwrap), | |
| **six deterministic HPC scenarios** (`hpc_outage`, `hpc_munge`, | |
| `hpc_pid_stale`, `hpc_gpu_ecc`, `hpc_nfs_stale`, `hpc_ood_apache`) | |
| plus three warm-up curriculum scenarios (`nginx_crash`, `disk_full`, | |
| `network_broken`), <3 ms reset, gpu-free reward-curve demo in-repo, | |
| trained with TRL + Unsloth + `Qwen/Qwen2.5-Coder-7B-Instruct`. | |
| ## 7 [POLISH] things I can do as soon as you unblock | |
| once you have a GPU + HF account handy: | |
| - [ ] add the reward curve PNG to `docs/pitch.md` and `docs/hf_blog.md` | |
| - [ ] update `README.md` with the live HF Space URL | |
| - [ ] add a "trained checkpoint" section pointing at your HF model repo | |
| - [ ] write the final HF blog post draft and submit it | |
| - [ ] extend the scenario set if you want (see [extra ideas](#extra-ideas)) | |
| ## 8 [BLOCKER] submit the darn thing | |
| don't forget to actually click submit. past hackathon winners all had a | |
| running demo URL, a reward curve, and a 60-second elevator pitch. | |
| --- | |
| ## extra ideas (if we still have time) | |
| already shipped for round 2: | |
| - β **`hpc_gpu_ecc`** β compute node drained due to nvidia-smi ECC | |
| errors. fix loop: `sinfo`, `ssh compute-01`, `nvidia-smi`, | |
| `nvidia-smi -r -i 0`, `systemctl restart slurmd`, `exit`, `sinfo` | |
| - β **`hpc_nfs_stale`** β `/mnt/shared` stale nfs handle after a | |
| server failover. fix loop: `ls /mnt/shared` (errors), `umount -l | |
| /mnt/shared`, `mount /mnt/shared`, `systemctl restart slurmd` | |
| - β **`hpc_ood_apache`** β open ondemand portal degraded because of a | |
| httpd config typo on `:8081`. fix loop: `curl -I | |
| http://localhost:8081/` (502), `cat /etc/httpd/conf/httpd.conf`, | |
| `apachectl configtest`, `printf '<fixed>' > httpd.conf`, | |
| `apachectl graceful`, `curl -I http://localhost:8081/` (200) | |
| still on the wishlist if we have extra time: | |
| - **multi-node ssh traversal** β add compute-02 for a partition | |
| imbalance scenario | |
| - **`hpc_cgroup_oom`** β slurmd kills jobs because a system cgroup | |
| limit is set too low; fix by editing `/etc/slurm/cgroup.conf` | |
| - **`hpc_ldap_auth`** β user cannot ssh because sssd lost contact | |
| with ldap; fix by restarting sssd and clearing `/var/lib/sss/db` | |
| tell me which you want and I will drop them in (each one is ~150 loc). | |
| --- | |
| ## checklist to ship | |
| - [ ] 1. reward curve captured and committed | |
| - [ ] 2. HF Space deployed | |
| - [ ] 3. demo video recorded | |
| - [ ] 4. HF Space URL in this repo | |
| - [ ] 5. trained checkpoint on the hub | |
| - [ ] 6. submission form filled | |
| - [ ] 7. final PR merged and tagged | |
| - [ ] 8. submitted β | |