---
title: Fusion Design Lab
sdk: docker
app_port: 8000
short_description: OpenEnv stellarator design optimization environment
---
# Fusion Design Lab
Fusion Design Lab is an environment-first [OpenEnv](https://openenv.dev) hackathon project for the `P1` stellarator benchmark.
**Live Environment**: [HF Space](https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab)
**Training Notebook**: [Repository Notebook (GRPO + HF TRL)](training/notebooks/fusion_design_lab_training.ipynb)
## What It Does
An RL environment where agents optimize stellarator fusion reactor designs by adjusting 4 geometric knobs of a low-dimensional boundary family, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:
| Constraint | Bound |
|---|---|
| `aspect_ratio` | ≀ 4.0 |
| `average_triangularity` | ≀ -0.5 |
| `abs(edge_iota_over_nfp)` | β‰₯ 0.3 |
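The three hard constraints above can be expressed as a single feasibility predicate. This is an illustrative sketch only; the verifier of record is `constellaration.problems.GeometricalProblem`, and the function and metric-key names here are assumptions, not the project's actual API.

```python
# Hypothetical P1 feasibility check mirroring the constraint table above.
# Metric keys are illustrative; the real checks live in the constellaration verifier.
def p1_feasible(metrics: dict) -> bool:
    """Return True when all three hard physics constraints hold."""
    return (
        metrics["aspect_ratio"] <= 4.0
        and metrics["average_triangularity"] <= -0.5
        and abs(metrics["edge_iota_over_nfp"]) >= 0.3
    )
```

For example, a design with `aspect_ratio=3.5`, `average_triangularity=-0.6`, and `edge_iota_over_nfp=-0.35` satisfies all three bounds, while raising the aspect ratio above 4.0 makes it infeasible.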
The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the live low-fidelity physics verifier (~0.6 s per call) for every in-environment evaluation. It exposes **26 discrete actions** (4 parameters Γ— 2 directions Γ— 3 magnitudes, plus `restore_best` and `submit`); `submit` is an explicit terminal action scored on that same low-fidelity reward surface, not a separate high-fidelity mode.
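The 26-action count can be sanity-checked by enumerating the combinations. Parameter and magnitude names below are placeholders for the sketch, not the environment's actual identifiers:

```python
from itertools import product

# Illustrative enumeration of the discrete action space described above.
# Knob, direction, and magnitude labels are assumptions for this sketch.
PARAMS = ["knob_0", "knob_1", "knob_2", "knob_3"]  # the 4 geometric knobs
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]

ACTIONS = [
    f"{p}:{d}:{m}" for p, d, m in product(PARAMS, DIRECTIONS, MAGNITUDES)
] + ["restore_best", "submit"]

assert len(ACTIONS) == 4 * 2 * 3 + 2 == 26
```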
## Architecture
- **Environment server** (`server/`): FastAPI app with `/reset`, `/step`, `/health`, `/task` endpoints
- **Physics engine** (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
- **Models** (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
- **Client** (`fusion_lab/client.py`): Typed OpenEnv client for remote interaction
- **Training** (`training/`): GRPO notebook (HF TRL) and PPO smoke test
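As a rough sketch of the client/server contract, the typed client serializes actions to JSON and posts them to the endpoints above. The field names here are assumptions; the authoritative schemas are the Pydantic models in `fusion_lab/models.py`:

```python
import json

# Hypothetical request payloads for the /reset and /step endpoints.
# Field names are illustrative; see fusion_lab/models.py for the real schemas.
reset_request = {"seed": 0}
step_request = {"action": {"action_id": 7}}  # one of the 26 discrete actions

# A typed client would POST these, e.g.:
#   requests.post(f"{base_url}/step", json=step_request)
encoded = json.dumps(step_request)
```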
## Current Status
- `P1` is locked as the benchmark task with `constellaration` as verifier of record
- The repaired 4-knob low-dimensional boundary family is wired into the runtime path
- Environment deployed to HF Spaces and verified (health, reset, step all operational)
- GRPO training notebook is checked into the repo and aligned with the shared `fusion_lab/llm_agent.py` contract
- LLM rollout tooling can now generate fresh model completions per seed and save fixed-seed reward/outcome summaries
- Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
- The live low-fidelity reward is now `Reward V2`: verifier-native repair shaping plus bounded best-so-far / anti-stagnation terms
- Before/after evidence from a trained policy on the current unified low-fidelity workflow is still outstanding
## Execution Status
- [x] Lock the `P1` contract in code
- [x] Rewrite shared models to the repaired low-dimensional `P1` schema
- [x] Rewrite the environment loop to the repaired low-dimensional `P1` schema
- [x] Update the API/task surface to match `P1`
- [x] Update baseline agents to the `P1` contract
- [x] Add a post-terminal guard so `step()` is a no-op after `done=True`
- [x] Re-run the baseline comparison on the `constellaration`-backed branch state
- [x] Replace the synthetic evaluator with `constellaration`
- [x] Add a runnable Northflank smoke workflow and note
- [x] Pass the Northflank smoke test on the H100 workspace
- [x] Verify the current 3-knob family against the real low-fidelity verifier
- [x] Add a custom low-dimensional boundary builder with an explicit triangularity control knob
- [x] Split boundary construction from boundary evaluation in `server/physics.py`
- [x] Update the action contract from 3 knobs to the repaired low-dimensional family
- [x] Add explicit VMEC failure semantics to the environment contract
- [x] Collapse the live environment to one low-fidelity truth surface while keeping explicit `submit`
- [x] Add tracked `P1` fixtures under `server/data/p1/`
- [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
- [x] Complete paired high-fidelity validation artifacts outside the live environment path
- [x] Refresh the heuristic baseline for the real verifier path
- [x] Deploy the real environment to HF Space
- [x] Add the public training notebook under `training/notebooks`
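One checklist item above is the post-terminal guard that makes `step()` a no-op after `done=True`. A minimal sketch of that pattern, with class and field names as assumptions rather than the repo's actual implementation:

```python
# Sketch of a post-terminal guard: once an episode has terminated, further
# step() calls re-emit the final observation instead of mutating state.
class GuardedEnv:
    def __init__(self):
        self.done = False
        self.last_obs = {"reward": 0.0, "done": False}

    def step(self, action):
        if self.done:
            # No-op after terminal: return the final observation unchanged.
            return self.last_obs
        reward = 1.0 if action == "submit" else -1.0  # placeholder dynamics
        self.done = action == "submit"
        self.last_obs = {"reward": reward, "done": self.done}
        return self.last_obs
```

After `submit` terminates the episode, any further `step()` call returns the same terminal observation without spending budget or changing state.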
## Known Gaps
- Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity validation checks.
- The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
- The live environment now uses one low-fidelity verifier surface for `run`, `restore_best`, and `submit`. Keep high-fidelity checks in `baselines/high_fidelity_validation.py` and other offline validation artifacts rather than mixing them back into the environment reward loop.
- VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
- Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible submitted finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast steps are reset-seed confirmation and one presentation-ready comparison trace backed by the paired offline high-fidelity evidence.
- The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
Current mode:
- strategic task choice is already locked
- the next work is reset-seed confirmation, trace export, and deployment
- new planning text should only appear when a real blocker forces a decision change
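One gap noted above is the terminal-reward asymmetry: budget exhaustion must end the episode with a smaller reward than an explicit `submit` of the same design, so agents prefer deliberate submission. A minimal sketch of that shape, with placeholder constants rather than the live `Reward V2` values:

```python
# Placeholder constants; the live Reward V2 values differ.
SUBMIT_BONUS = 1.0
EXHAUSTION_BONUS = 0.25  # deliberately smaller than SUBMIT_BONUS

def terminal_reward(base_score: float, submitted: bool) -> float:
    """Terminal reward: explicit submit always beats budget exhaustion."""
    bonus = SUBMIT_BONUS if submitted else EXHAUSTION_BONUS
    return base_score + bonus
```

Whatever the tuned constants end up being, preserving `SUBMIT_BONUS > EXHAUSTION_BONUS` keeps the intended incentive.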
## Planned Repository Layout
```text
fusion-design-lab/
β”œβ”€β”€ baselines/
β”œβ”€β”€ demo/
β”œβ”€β”€ docs/
β”œβ”€β”€ fusion_lab/
β”œβ”€β”€ server/
β”œβ”€β”€ server/data/p1/
β”œβ”€β”€ training/
β”œβ”€β”€ openenv.yaml
β”œβ”€β”€ pyproject.toml
└── README.md
```
## Setup
Base runtime:
```bash
uv sync
```
Development tooling:
```bash
uv sync --extra dev
pre-commit install
```
Optional local notebook tooling:
```bash
uv sync --extra notebooks
```
## Runtime Assumptions
- Recommended compute workspace: Northflank Jupyter Notebook with PyTorch on the team H100
- OpenEnv deployment target: Hugging Face Spaces
- Submission notebook surface: one public notebook artifact; mirror it to Colab if the submission form still requires Colab specifically
- Required notebook artifact: one public notebook that demonstrates trained-policy behavior against the environment
- Verifier of record: `constellaration.problems.GeometricalProblem`
- Environment style: fresh wiring in this repo, not a port of the old `ai-sci-feasible-designs` harness
- Northflank containers are ephemeral, so persistent storage should be attached before relying on saved models, caches, or fixture data
- Preferred deployment path: push this GitHub repo and let HF Space build from the repo/Docker configuration rather than copying code manually
- Preferred notebook/HF Space connectivity: make the HF Space public for the hackathon unless privacy becomes necessary; if private, document and use an explicit access token in the notebook
## Immediate Next Steps
- [x] Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
- [x] Pair the tracked low-fidelity fixtures with high-fidelity validation spot checks immediately after the PPO smoke run.
- [x] Run at least one explicit-submit manual trace before any broader training push, then record the first real reward pathology, if any.
- [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
- [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
- [ ] Run one short H100 GRPO pass with the repository notebook on the same unified low-fidelity workflow.
- [ ] Re-run the same seeds after training and save one before/after artifact.
- [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
- [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
- [x] Deploy the environment to HF Space.
- [x] Add the public training notebook under `training/notebooks`.
These are implementation steps, not another planning phase.
## Fixture Policy
This repo may reuse selected JSON artifacts or boundaries as fixed calibration fixtures.
Allowed examples:
- a known-good or near-winning `P1` boundary
- near-boundary cases
- clearly bad cases
Disallowed:
- porting the old planner, governor, or experiment harness into this repo
## Technical Spec
The focused technical plan for the repaired `P1` environment lives in [docs/P1_ENV_CONTRACT_V1.md](docs/P1_ENV_CONTRACT_V1.md).
## Hackathon Working Note
This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.