---
title: Fusion Design Lab
sdk: docker
app_port: 8000
short_description: OpenEnv stellarator design optimization environment
---
# Fusion Design Lab

Fusion Design Lab is an environment-first [OpenEnv](https://openenv.dev) hackathon project for the `P1` stellarator benchmark.

**Live Environment**: [HF Space](https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab)

**Training Notebook**: [Repository Notebook (GRPO + HF TRL)](training/notebooks/fusion_design_lab_training.ipynb)

## What It Does

An RL environment where agents optimize stellarator fusion reactor designs by adjusting 4 geometric knobs of a low-dimensional boundary family, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:

| Constraint | Bound |
|---|---|
| `aspect_ratio` | ≤ 4.0 |
| `average_triangularity` | ≤ -0.5 |
| `abs(edge_iota_over_nfp)` | ≥ 0.3 |
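The three bounds in the table can be checked with a few lines of Python. The sketch below is illustrative only: the `Metrics` container and the function names are hypothetical, and the normalization (violation divided by the magnitude of the bound, with feasibility taken as the largest normalized violation) is an assumption rather than a confirmed description of the verifier. It does reproduce the figures quoted under Known Gaps: an `average_triangularity` of about `+0.004975` gives a normalized violation of about `1.00995`.

```python
from dataclasses import dataclass

# Bounds copied from the constraint table above.
ASPECT_RATIO_MAX = 4.0
AVERAGE_TRIANGULARITY_MAX = -0.5
EDGE_IOTA_OVER_NFP_MIN = 0.3


@dataclass(frozen=True)
class Metrics:
    """Hypothetical container; the real observation schema may differ."""

    aspect_ratio: float
    average_triangularity: float
    edge_iota_over_nfp: float


def normalized_violations(m: Metrics) -> dict[str, float]:
    """Signed violation per constraint, scaled by the magnitude of its bound.

    Positive means violated; zero or negative means satisfied.
    The scaling is an assumption made for this sketch.
    """
    return {
        "aspect_ratio": (m.aspect_ratio - ASPECT_RATIO_MAX) / abs(ASPECT_RATIO_MAX),
        "average_triangularity": (
            m.average_triangularity - AVERAGE_TRIANGULARITY_MAX
        ) / abs(AVERAGE_TRIANGULARITY_MAX),
        "abs(edge_iota_over_nfp)": (
            EDGE_IOTA_OVER_NFP_MIN - abs(m.edge_iota_over_nfp)
        ) / abs(EDGE_IOTA_OVER_NFP_MIN),
    }


def feasibility(m: Metrics) -> float:
    """Largest normalized violation across the three constraints."""
    return max(normalized_violations(m).values())


def is_feasible(m: Metrics, tolerance: float = 0.0) -> bool:
    """True when no constraint is violated by more than `tolerance`."""
    return feasibility(m) <= tolerance


# Only the triangularity value comes from the README; the other two are made up.
blocked = Metrics(aspect_ratio=3.5, average_triangularity=0.004975, edge_iota_over_nfp=0.35)
repaired = Metrics(aspect_ratio=3.5, average_triangularity=-0.6, edge_iota_over_nfp=0.35)
```

Here `is_feasible(blocked)` is false because the triangularity term dominates, while `is_feasible(repaired)` is true.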

The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the live low-fidelity physics verifier (~0.6s) for every in-environment evaluation. The live environment still exposes **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + `restore_best` + `submit`), and `submit` remains an explicit terminal action on that same reward surface rather than a separate high-fidelity mode.
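The action count follows directly from the product above. As a quick check, the sketch below enumerates the space. The parameter, direction, and magnitude labels are placeholders, since this README gives only the counts; the real names live in `fusion_lab/models.py`.

```python
from itertools import product

# Placeholder labels: only the counts (4, 2, 3) come from the README.
PARAMETERS = ("knob_0", "knob_1", "knob_2", "knob_3")
DIRECTIONS = ("increase", "decrease")
MAGNITUDES = ("small", "medium", "large")


def enumerate_actions() -> list[tuple[str, ...]]:
    """Return every discrete action: 4 * 2 * 3 adjustments plus two specials."""
    adjustments = [
        ("run", parameter, direction, magnitude)
        for parameter, direction, magnitude in product(PARAMETERS, DIRECTIONS, MAGNITUDES)
    ]
    return adjustments + [("restore_best",), ("submit",)]


ACTIONS = enumerate_actions()
```

`len(ACTIONS)` is 26: 24 knob adjustments, then `restore_best` and `submit`.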
## Architecture

- **Environment server** (`server/`): FastAPI app with `/reset`, `/step`, `/health`, `/task` endpoints
- **Physics engine** (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
- **Models** (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
- **Client** (`fusion_lab/client.py`): Typed OpenEnv client for remote interaction
- **Training** (`training/`): GRPO notebook (HF TRL) and PPO smoke test
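For quick manual checks without the typed client, the endpoints can be reached with plain HTTP. The sketch below only builds the requests and does not send them. The HTTP methods (GET when there is no body, POST otherwise), the JSON payload shape, and the `localhost` host are assumptions; only the endpoint paths and port 8000 come from this README.

```python
import json
from typing import Optional
from urllib.request import Request

# Port 8000 matches app_port in the Space metadata; the host is an assumption.
BASE_URL = "http://localhost:8000"


def build_request(endpoint: str, payload: Optional[dict] = None, base_url: str = BASE_URL) -> Request:
    """Build a GET request when there is no payload, otherwise a JSON POST."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    if payload is None:
        return Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


health_request = build_request("/health")
reset_request = build_request("/reset", {})
# Hypothetical action payload; the real schema is defined in fusion_lab/models.py.
step_request = build_request("/step", {"action": {"intent": "submit"}})
```

Each request can then be sent with `urllib.request.urlopen(request)` against a running server.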
## Current Status

- `P1` is locked as the benchmark task with `constellaration` as verifier of record
- The repaired 4-knob low-dimensional boundary family is wired into the runtime path
- Environment deployed to HF Spaces and verified (health, reset, step all operational)
- GRPO training notebook is checked into the repo and aligned with the shared `fusion_lab/llm_agent.py` contract
- LLM rollout tooling can now generate fresh model completions per seed and save fixed-seed reward/outcome summaries
- Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
- The live low-fidelity reward is now `Reward V2`: verifier-native repair shaping plus bounded best-so-far / anti-stagnation terms
- Before/after trained-policy evidence on the current unified low-fidelity workflow is still open
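The `Reward V2` description names three ingredients: repair shaping, a bounded best-so-far term, and an anti-stagnation term. To make their interaction concrete, here is one possible shape for such a reward. This is not the project's formula: every coefficient, argument name, and the exact combination rule below is a hypothetical choice made for illustration.

```python
def shaped_reward(
    prev_violation: float,
    new_violation: float,
    prev_best_score: float,
    new_score: float,
    steps_since_improvement: int,
    *,
    best_bonus_cap: float = 0.5,       # hypothetical bound on the best-so-far bonus
    stagnation_penalty: float = 0.05,  # hypothetical per-step penalty
    stagnation_patience: int = 3,      # hypothetical number of tolerated flat steps
) -> float:
    """Illustrative reward with repair shaping, a bounded bonus, and a stagnation term."""
    # Repair shaping: reward any reduction in constraint violation.
    repair = max(prev_violation, 0.0) - max(new_violation, 0.0)

    # Bounded best-so-far: only feasible designs earn it, and it is capped.
    improvement = max(new_score - prev_best_score, 0.0) if new_violation <= 0.0 else 0.0
    best_bonus = min(improvement, best_bonus_cap)

    # Anti-stagnation: small penalty after too many steps without improvement.
    stagnation = -stagnation_penalty if steps_since_improvement >= stagnation_patience else 0.0

    return repair + best_bonus + stagnation
```

Capping the bonus keeps a single lucky step from dominating an episode's return, which is one plausible reason to bound such a term.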
## Execution Status

- [x] Lock the `P1` contract in code
- [x] Rewrite shared models to the repaired low-dimensional `P1` schema
- [x] Rewrite the environment loop to the repaired low-dimensional `P1` schema
- [x] Update the API/task surface to match `P1`
- [x] Update baseline agents to the `P1` contract
- [x] Add a post-terminal guard so `step()` is a no-op after `done=True`
- [x] Re-run the baseline comparison on the `constellaration`-backed branch state
- [x] Replace the synthetic evaluator with `constellaration`
- [x] Add a runnable Northflank smoke workflow and note
- [x] Pass the Northflank smoke test on the H100 workspace
- [x] Verify the current 3-knob family against the real low-fidelity verifier
- [x] Add a custom low-dimensional boundary builder with an explicit triangularity control knob
- [x] Split boundary construction from boundary evaluation in `server/physics.py`
- [x] Update the action contract from 3 knobs to the repaired low-dimensional family
- [x] Add explicit VMEC failure semantics to the environment contract
- [x] Collapse the live environment to one low-fidelity truth surface while keeping explicit `submit`
- [x] Add tracked `P1` fixtures under `server/data/p1/`
- [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
- [x] Complete paired high-fidelity validation artifacts outside the live environment path
- [x] Refresh the heuristic baseline for the real verifier path
- [x] Deploy the real environment to HF Space
- [x] Add the public training notebook under `training/notebooks`
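The post-terminal guard in the checklist above is a small pattern worth seeing in isolation. The toy class below is not the environment in `server/`; its names, the budget handling, and the zero rewards are all hypothetical. It only shows a `step()` that changes nothing once `done=True`.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    reward: float
    done: bool


class GuardedEnv:
    """Toy environment that demonstrates only the post-terminal guard."""

    def __init__(self, budget: int = 3) -> None:
        self.budget = budget
        self.done = False
        self.evaluations = 0

    def step(self, action: str) -> StepResult:
        # Guard: after termination, return immediately without touching state.
        if self.done:
            return StepResult(reward=0.0, done=True)

        # Assumption for this toy: every step spends one unit of budget.
        self.evaluations += 1
        self.budget -= 1
        if action == "submit" or self.budget <= 0:
            self.done = True
        return StepResult(reward=0.0, done=self.done)


env = GuardedEnv()
env.step("run")
env.step("submit")
evaluations_at_terminal = env.evaluations
late_result = env.step("run")  # ignored by the guard
```

After the late call, `env.evaluations` still equals `evaluations_at_terminal`, so a misbehaving agent cannot spend budget or collect reward past the end of an episode.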
## Known Gaps

- Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity validation checks.
- The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
- The live environment now uses one low-fidelity verifier surface for `run`, `restore_best`, and `submit`. Keep high-fidelity checks in `baselines/high_fidelity_validation.py` and other offline validation artifacts rather than mixing them back into the environment reward loop.
- VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
- Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible submitted finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired offline high-fidelity evidence.
- The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
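Two of the rules in the list above are easy to state and easy to break while tuning: a failed VMEC evaluation still costs budget and is penalized, and running out of budget must pay less than an explicit `submit`. The sketch below encodes both under stated assumptions. The constants, the `Outcome` fields, and the rule that every step spends one unit of budget are hypothetical, not taken from the environment code.

```python
from dataclasses import dataclass

FAILURE_PENALTY = -1.0            # hypothetical penalty for a failed VMEC evaluation
SUBMIT_TERMINAL_SCALE = 1.0       # hypothetical weight on the final score after `submit`
EXHAUSTION_TERMINAL_SCALE = 0.5   # hypothetical, deliberately smaller than the submit weight


@dataclass(frozen=True)
class Outcome:
    reward: float
    budget_left: int
    done: bool
    evaluation_failed: bool  # surfaced so the agent can see the failure


def resolve_step(
    budget_left: int,
    *,
    submitted: bool,
    vmec_failed: bool,
    shaped_reward: float,
    terminal_score: float,
) -> Outcome:
    """Apply the failure and termination rules to one evaluated step."""
    # Assumption: every step spends one unit of budget, whether or not it fails.
    budget_left -= 1
    reward = FAILURE_PENALTY if vmec_failed else shaped_reward

    done = submitted or budget_left <= 0
    if done and not vmec_failed:
        scale = SUBMIT_TERMINAL_SCALE if submitted else EXHAUSTION_TERMINAL_SCALE
        reward += scale * terminal_score
    return Outcome(reward, budget_left, done, vmec_failed)


failed = resolve_step(3, submitted=False, vmec_failed=True, shaped_reward=0.2, terminal_score=1.0)
submitted = resolve_step(3, submitted=True, vmec_failed=False, shaped_reward=0.0, terminal_score=1.0)
exhausted = resolve_step(1, submitted=False, vmec_failed=False, shaped_reward=0.0, terminal_score=1.0)
```

With a positive `terminal_score`, `exhausted.reward` stays below `submitted.reward`, which is the asymmetry that keeps deliberate submission attractive.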

Current mode:

- strategic task choice is already locked
- the next work is reset-seed confirmation, trace export, and deployment
- new planning text should only appear when a real blocker forces a decision change
## Planned Repository Layout

```text
fusion-design-lab/
├── baselines/
├── demo/
├── docs/
├── fusion_lab/
├── server/
├── server/data/p1/
├── training/
├── openenv.yaml
├── pyproject.toml
└── README.md
```
## Setup

Base runtime:

```bash
uv sync
```

Development tooling:

```bash
uv sync --extra dev
pre-commit install
```

Optional local notebook tooling:

```bash
uv sync --extra notebooks
```
## Runtime Assumptions

- Recommended compute workspace: Northflank Jupyter Notebook with PyTorch on the team H100
- OpenEnv deployment target: Hugging Face Spaces
- Submission notebook surface: one public notebook artifact; mirror it to Colab if the submission form still requires Colab specifically
- Required notebook artifact: one public notebook that demonstrates trained-policy behavior against the environment
- Verifier of record: `constellaration.problems.GeometricalProblem`
- Environment style: fresh wiring in this repo, not a port of the old `ai-sci-feasible-designs` harness
- Northflank containers are ephemeral, so persistent storage should be attached before relying on saved models, caches, or fixture data
- Preferred deployment path: push this GitHub repo and let HF Space build from the repo/Docker configuration rather than copying code manually
- Preferred notebook/HF Space connectivity: make the HF Space public for the hackathon unless privacy becomes necessary; if private, document and use an explicit access token in the notebook
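If the Space does become private, the notebook needs to attach a token to every request. A minimal sketch, assuming the token is kept in an `HF_TOKEN` environment variable and passed as a bearer `Authorization` header; the helper name is hypothetical, and the notebook's actual client wiring may differ.

```python
import os
from typing import Mapping, Optional


def auth_headers(env_var: str = "HF_TOKEN", environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a bearer Authorization header, or no headers when no token is set.

    Returning an empty dict keeps the same code path usable for a public Space.
    """
    source = os.environ if environ is None else environ
    token = source.get(env_var, "").strip()
    return {"Authorization": f"Bearer {token}"} if token else {}


# Explicit mappings are used here so the example does not depend on the real environment.
private_headers = auth_headers(environ={"HF_TOKEN": "hf_example_token"})
public_headers = auth_headers(environ={})
```

Reading the token from the environment keeps it out of the public notebook artifact.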
## Immediate Next Steps

- [x] Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
- [x] Pair the tracked low-fidelity fixtures with high-fidelity validation spot checks immediately after the PPO smoke run.
- [x] Run at least one explicit-submit manual trace before any broader training push, then record the first real reward pathology, if any.
- [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
- [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
- [ ] Run one short H100 GRPO pass with the repository notebook on the same unified low-fidelity workflow.
- [ ] Re-run the same seeds after training and save one before/after artifact.
- [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
- [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
- [x] Deploy the environment to HF Space.
- [x] Add the public training notebook under `training/notebooks`.

These are implementation steps, not another planning phase.
## Fixture Policy

This repo may reuse selected JSON artifacts or boundaries as fixed calibration fixtures.

Allowed examples:

- a known-good or near-winning `P1` boundary
- near-boundary cases
- clearly bad cases

Disallowed:

- porting the old planner, governor, or experiment harness into this repo

## Technical Spec

The focused technical plan for the repaired `P1` environment lives in [docs/P1_ENV_CONTRACT_V1.md](docs/P1_ENV_CONTRACT_V1.md).

## Hackathon Working Note

This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.