---
title: Fusion Design Lab
sdk: docker
app_port: 8000
short_description: OpenEnv stellarator design optimization environment
---
# Fusion Design Lab
Fusion Design Lab is an environment-first [OpenEnv](https://openenv.dev) hackathon project for the `P1` stellarator benchmark.
**Live Environment**: [HF Space](https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab)
**Training Notebook**: [Repository Notebook (GRPO + HF TRL)](training/notebooks/fusion_design_lab_training.ipynb)
## What It Does
An RL environment where agents optimize stellarator fusion reactor designs by adjusting 4 geometric knobs of a low-dimensional boundary family, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:
| Constraint | Bound |
|---|---|
| `aspect_ratio` | ≤ 4.0 |
| `average_triangularity` | ≤ -0.5 |
| `abs(edge_iota_over_nfp)` | ≥ 0.3 |
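The three hard constraints in the table above can be expressed as a single feasibility predicate. This is an illustrative sketch, not the environment's actual implementation:

```python
def is_feasible(aspect_ratio: float, average_triangularity: float,
                edge_iota_over_nfp: float) -> bool:
    # The three hard P1 constraints from the table above.
    return (
        aspect_ratio <= 4.0
        and average_triangularity <= -0.5
        and abs(edge_iota_over_nfp) >= 0.3
    )

# A design violating the triangularity bound is infeasible.
assert not is_feasible(3.5, 0.005, 0.35)
# All three constraints satisfied.
assert is_feasible(3.5, -0.6, 0.35)
```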
The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the live low-fidelity physics verifier (~0.6s per evaluation) for every in-environment evaluation. The live environment exposes **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + `restore_best` + `submit`), and `submit` remains an explicit terminal action on that same reward surface rather than a separate high-fidelity mode.
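The 26-action count follows directly from the Cartesian product of knobs, directions, and step magnitudes, plus the two control actions. A minimal enumeration sketch (knob and magnitude names are illustrative, not the environment's real identifiers):

```python
from itertools import product

# Illustrative names; the real environment defines its own 4 parameters.
KNOBS = ["knob_0", "knob_1", "knob_2", "knob_3"]
DIRECTIONS = ["up", "down"]
MAGNITUDES = ["small", "medium", "large"]

# 4 knobs x 2 directions x 3 magnitudes = 24 adjustment actions.
ADJUSTMENTS = [f"{k}:{d}:{m}" for k, d, m in product(KNOBS, DIRECTIONS, MAGNITUDES)]
ACTIONS = ADJUSTMENTS + ["restore_best", "submit"]

assert len(ACTIONS) == 26
```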
## Architecture
- **Environment server** (`server/`): FastAPI app with `/reset`, `/step`, `/health`, `/task` endpoints
- **Physics engine** (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
- **Models** (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
- **Client** (`fusion_lab/client.py`): Typed OpenEnv client for remote interaction
- **Training** (`training/`): GRPO notebook (HF TRL) and PPO smoke test
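The real action schemas live in `fusion_lab/models.py` as Pydantic models; the shape below is a dependency-free dataclass sketch of what such a discrete-action schema might look like, with hypothetical field names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Action:
    kind: str                        # "adjust", "restore_best", or "submit"
    knob: Optional[int] = None       # index of the geometric knob, 0-3
    direction: Optional[int] = None  # +1 or -1
    magnitude: Optional[str] = None  # "small", "medium", or "large"

a = Action(kind="adjust", knob=1, direction=-1, magnitude="small")
assert a.kind == "adjust" and a.knob == 1
```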
## Current Status
- `P1` is locked as the benchmark task with `constellaration` as verifier of record
- The repaired 4-knob low-dimensional boundary family is wired into the runtime path
- Environment deployed to HF Spaces and verified (health, reset, step all operational)
- GRPO training notebook is checked into the repo and aligned with the shared `fusion_lab/llm_agent.py` contract
- LLM rollout tooling can now generate fresh model completions per seed and save fixed-seed reward/outcome summaries
- Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
- The live low-fidelity reward is now `Reward V2`: verifier-native repair shaping plus bounded best-so-far / anti-stagnation terms
- Before/after trained-policy evidence on the current unified low-fidelity workflow is still open
## Execution Status
- [x] Lock the `P1` contract in code
- [x] Rewrite shared models to the repaired low-dimensional `P1` schema
- [x] Rewrite the environment loop to the repaired low-dimensional `P1` schema
- [x] Update the API/task surface to match `P1`
- [x] Update baseline agents to the `P1` contract
- [x] Add a post-terminal guard so `step()` is a no-op after `done=True`
- [x] Re-run the baseline comparison on the `constellaration`-backed branch state
- [x] Replace the synthetic evaluator with `constellaration`
- [x] Add a runnable Northflank smoke workflow and note
- [x] Pass the Northflank smoke test on the H100 workspace
- [x] Verify the current 3-knob family against the real low-fidelity verifier
- [x] Add a custom low-dimensional boundary builder with an explicit triangularity control knob
- [x] Split boundary construction from boundary evaluation in `server/physics.py`
- [x] Update the action contract from 3 knobs to the repaired low-dimensional family
- [x] Add explicit VMEC failure semantics to the environment contract
- [x] Collapse the live environment to one low-fidelity truth surface while keeping explicit `submit`
- [x] Add tracked `P1` fixtures under `server/data/p1/`
- [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
- [x] Complete paired high-fidelity validation artifacts outside the live environment path
- [x] Refresh the heuristic baseline for the real verifier path
- [x] Deploy the real environment to HF Space
- [x] Add the public training notebook under `training/notebooks`
## Known Gaps
- Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity validation checks.
- The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
- The live environment now uses one low-fidelity verifier surface for `run`, `restore_best`, and `submit`. Keep high-fidelity checks in `baselines/high_fidelity_validation.py` and other offline validation artifacts rather than mixing them back into the environment reward loop.
- VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
- Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible submitted finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired offline high-fidelity evidence.
- The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
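The submit-vs-exhaustion asymmetry noted above can be sketched as a terminal-reward rule. The constants here are assumptions for illustration; the real shaping lives in the environment code:

```python
# Illustrative constants, not the environment's actual values.
SUBMIT_BONUS = 0.2       # paid only on an explicit submit
EXHAUSTION_BONUS = 0.0   # strictly smaller, so deliberate submission is preferred

def terminal_reward(score: float, submitted: bool) -> float:
    """Terminal reward: budget exhaustion pays less than an explicit submit."""
    return score + (SUBMIT_BONUS if submitted else EXHAUSTION_BONUS)

assert terminal_reward(0.3, submitted=True) > terminal_reward(0.3, submitted=False)
```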
Current mode:
- strategic task choice is already locked
- the next work is reset-seed confirmation, trace export, and deployment
- new planning text should only appear when a real blocker forces a decision change
## Planned Repository Layout
```text
fusion-design-lab/
├── baselines/
├── demo/
├── docs/
├── fusion_lab/
├── server/
├── server/data/p1/
├── training/
├── openenv.yaml
├── pyproject.toml
└── README.md
```
## Setup
Base runtime:
```bash
uv sync
```
Development tooling:
```bash
uv sync --extra dev
pre-commit install
```
Optional local notebook tooling:
```bash
uv sync --extra notebooks
```
## Runtime Assumptions
- Recommended compute workspace: Northflank Jupyter Notebook with PyTorch on the team H100
- OpenEnv deployment target: Hugging Face Spaces
- Submission notebook surface: one public notebook artifact; mirror it to Colab if the submission form still requires Colab specifically
- Required notebook artifact: one public notebook that demonstrates trained-policy behavior against the environment
- Verifier of record: `constellaration.problems.GeometricalProblem`
- Environment style: fresh wiring in this repo, not a port of the old `ai-sci-feasible-designs` harness
- Northflank containers are ephemeral, so persistent storage should be attached before relying on saved models, caches, or fixture data
- Preferred deployment path: push this GitHub repo and let HF Space build from the repo/Docker configuration rather than copying code manually
- Preferred notebook/HF Space connectivity: make the HF Space public for the hackathon unless privacy becomes necessary; if private, document and use an explicit access token in the notebook
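If the Space does go private, clients need a bearer-token header on every request. A minimal helper, assuming the token is exported as `HF_TOKEN` (the environment-variable name is a convention, not a requirement):

```python
import os
from typing import Dict, Optional

def space_auth_headers(token: Optional[str]) -> Dict[str, str]:
    # Standard bearer-token header for a private HF Space; empty for a public one.
    return {"Authorization": f"Bearer {token}"} if token else {}

headers = space_auth_headers(os.environ.get("HF_TOKEN"))
assert space_auth_headers("hf_example") == {"Authorization": "Bearer hf_example"}
assert space_auth_headers(None) == {}
```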
## Immediate Next Steps
- [x] Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
- [x] Pair the tracked low-fidelity fixtures with high-fidelity validation spot checks immediately after the PPO smoke run.
- [x] Run at least one explicit-submit manual trace before any broader training push, then record the first real reward pathology, if any.
- [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
- [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
- [ ] Run one short H100 GRPO pass with the repository notebook on the same unified low-fidelity workflow.
- [ ] Re-run the same seeds after training and save one before/after artifact.
- [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
- [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
- [x] Deploy the environment to HF Space.
- [x] Add the public training notebook under `training/notebooks`.
These are implementation steps, not another planning phase.
## Fixture Policy
This repo may reuse selected JSON artifacts or boundaries as fixed calibration fixtures.
Allowed examples:
- a known-good or near-winning `P1` boundary
- near-boundary cases
- clearly bad cases
Disallowed:
- porting the old planner, governor, or experiment harness into this repo
## Technical Spec
The focused technical plan for the repaired `P1` environment lives in [docs/P1_ENV_CONTRACT_V1.md](docs/P1_ENV_CONTRACT_V1.md).
## Hackathon Working Note
This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.