---
title: Fusion Design Lab
sdk: docker
app_port: 8000
short_description: OpenEnv stellarator design optimization environment
---

# Fusion Design Lab
Fusion Design Lab is an environment-first OpenEnv hackathon project for the P1 stellarator benchmark.
**Live Environment:** HF Space · **Training Notebook:** Repository Notebook (GRPO + HF TRL)

## What It Does
An RL environment where agents optimize stellarator fusion reactor designs by adjusting 4 geometric knobs of a low-dimensional boundary family, aiming to minimize max elongation while satisfying 3 hard physics constraints:
| Constraint | Bound |
|---|---|
| `aspect_ratio` | ≤ 4.0 |
| `average_triangularity` | ≤ -0.5 |
| `abs(edge_iota_over_nfp)` | ≥ 0.3 |
The environment uses `constellaration` as the live low-fidelity physics verifier (~0.6 s) for every in-environment evaluation. The live environment exposes 26 discrete actions (4 parameters × 2 directions × 3 magnitudes + `restore_best` + `submit`), and `submit` remains an explicit terminal action on that same reward surface rather than a separate high-fidelity mode.
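The arithmetic behind the 26-action space can be sketched in a few lines. This is illustrative only: the parameter, direction, and magnitude names below are placeholders, not the repo's actual identifiers (see `fusion_lab/models.py` for the real schemas).

```python
# Illustrative enumeration of the 26-action discrete space:
# 4 parameters x 2 directions x 3 magnitudes = 24 tweak actions,
# plus restore_best and submit. All names here are hypothetical.
from itertools import product

PARAMS = ["knob_0", "knob_1", "knob_2", "knob_3"]  # 4 geometric knobs
DIRECTIONS = [+1, -1]                              # increase / decrease
MAGNITUDES = ["small", "medium", "large"]          # 3 step sizes

ACTIONS = [
    {"kind": "tweak", "param": p, "direction": d, "magnitude": m}
    for p, d, m in product(PARAMS, DIRECTIONS, MAGNITUDES)
] + [{"kind": "restore_best"}, {"kind": "submit"}]

assert len(ACTIONS) == 26  # 24 tweaks + restore_best + submit
```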
## Architecture

- Environment server (`server/`): FastAPI app with `/reset`, `/step`, `/health`, `/task` endpoints
- Physics engine (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
- Models (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
- Client (`fusion_lab/client.py`): typed OpenEnv client for remote interaction
- Training (`training/`): GRPO notebook (HF TRL) and PPO smoke test
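As a rough illustration of remote interaction against the documented endpoints, here is a minimal HTTP sketch. The JSON payload shapes (`seed`, `action`) are assumptions for illustration; the real request/response schemas live in `fusion_lab/models.py`, and `fusion_lab/client.py` provides the typed client you would normally use instead.

```python
# Hedged sketch of talking to the environment server over plain HTTP.
# Endpoint paths come from the README; payload fields are assumptions.
import requests

BASE = "http://localhost:8000"  # or the deployed HF Space URL

def demo_episode():
    # Check the server is up before starting an episode.
    assert requests.get(f"{BASE}/health").ok
    # Start a fresh episode (seed field is an assumed parameter).
    obs = requests.post(f"{BASE}/reset", json={"seed": 0}).json()
    # Take one illustrative step, then end with an explicit submit.
    obs = requests.post(f"{BASE}/step", json={"action": {"kind": "tweak"}}).json()
    obs = requests.post(f"{BASE}/step", json={"action": {"kind": "submit"}}).json()
    return obs
```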
## Current Status

- `P1` is locked as the benchmark task with `constellaration` as verifier of record
- The repaired 4-knob low-dimensional boundary family is wired into the runtime path
- Environment deployed to HF Spaces and verified (health, reset, step all operational)
- GRPO training notebook is checked into the repo and aligned with the shared `fusion_lab/llm_agent.py` contract
- LLM rollout tooling can generate fresh model completions per seed and save fixed-seed reward/outcome summaries
- Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
- The live low-fidelity reward is now Reward V2: verifier-native repair shaping plus bounded best-so-far / anti-stagnation terms
- Before/after trained-policy evidence on the current unified low-fidelity workflow is still open
## Execution Status

- Lock the `P1` contract in code
- Rewrite shared models to the repaired low-dimensional `P1` schema
- Rewrite the environment loop to the repaired low-dimensional `P1` schema
- Update the API/task surface to match `P1`
- Update baseline agents to the `P1` contract
- Add a post-terminal guard so `step()` is a no-op after `done=True`
- Re-run the baseline comparison on the `constellaration`-backed branch state
- Replace the synthetic evaluator with `constellaration`
- Add a runnable Northflank smoke workflow and note
- Pass the Northflank smoke test on the H100 workspace
- Verify the current 3-knob family against the real low-fidelity verifier
- Add a custom low-dimensional boundary builder with an explicit triangularity control knob
- Split boundary construction from boundary evaluation in `server/physics.py`
- Update the action contract from 3 knobs to the repaired low-dimensional family
- Add explicit VMEC failure semantics to the environment contract
- Collapse the live environment to one low-fidelity truth surface while keeping explicit `submit`
- Add tracked `P1` fixtures under `server/data/p1/`
- Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
- Complete paired high-fidelity validation artifacts outside the live environment path
- Refresh the heuristic baseline for the real verifier path
- Deploy the real environment to HF Space
- Add the public training notebook under `training/notebooks`
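The post-terminal guard item above is a small but easy-to-miss contract detail. A minimal sketch of the pattern (class and field names here are hypothetical, not the repo's actual implementation):

```python
# Hedged sketch of a post-terminal guard: once done=True, step() returns
# the last state unchanged instead of spending budget or mutating state.
from dataclasses import dataclass

@dataclass
class EpisodeState:
    done: bool = False
    steps: int = 0

class GuardedEnv:
    def __init__(self):
        self.state = EpisodeState()

    def step(self, action: str) -> EpisodeState:
        if self.state.done:
            # No-op after done=True: no budget spent, no state change.
            return self.state
        self.state.steps += 1
        if action == "submit":
            self.state.done = True  # explicit terminal action
        return self.state
```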
## Known Gaps

- Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
- The repaired family now has a first coarse measured sweep note in `docs/P1_MEASURED_SWEEP_NOTE.md`, but reset-seed changes and any budget changes should still wait for paired high-fidelity validation checks.
- The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
- The live environment now uses one low-fidelity verifier surface for `run`, `restore_best`, and `submit`. Keep high-fidelity checks in `baselines/high_fidelity_validation.py` and other offline validation artifacts rather than mixing them back into the environment reward loop.
- VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
- Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with 5/5 feasible submitted finals, a mean final P1 score of 0.291951, and 5/5 wins over random.
- The first low-fidelity manual playtest note is in `docs/P1_MANUAL_PLAYTEST_LOG.md`. The next fail-fast step is reset-seed confirmation and one presentation-ready comparison trace backed by the paired offline high-fidelity evidence.
- The first tiny PPO smoke note is in `docs/P1_PPO_SMOKE_NOTE.md`. The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
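The submit/budget-exhaustion asymmetry noted above can be illustrated with placeholder numbers; the actual reward constants are not specified in this README, so the values below are purely for shape:

```python
# Illustrative terminal-reward asymmetry: running out of budget pays less
# than an explicit submit at the same score, so a policy learns to prefer
# deliberate submission. The constants are placeholder assumptions.
SUBMIT_BONUS = 0.0
BUDGET_EXHAUSTED_PENALTY = 0.1  # assumed fixed gap between the two exits

def terminal_reward(score: float, submitted: bool) -> float:
    if submitted:
        return score + SUBMIT_BONUS
    return score - BUDGET_EXHAUSTED_PENALTY

# At equal score, the explicit-submit exit always dominates.
assert terminal_reward(0.5, True) > terminal_reward(0.5, False)
```

Whatever the real constants are, the invariant to preserve when tuning Reward V2 is the inequality in the final assertion.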
Current mode:
- strategic task choice is already locked
- the next work is reset-seed confirmation, trace export, and deployment
- new planning text should only appear when a real blocker forces a decision change
## Planned Repository Layout

```
fusion-design-lab/
├── baselines/
├── demo/
├── docs/
├── fusion_lab/
├── server/
├── server/data/p1/
├── training/
├── openenv.yaml
├── pyproject.toml
└── README.md
```
## Setup

Base runtime:

```shell
uv sync
```

Development tooling:

```shell
uv sync --extra dev
pre-commit install
```

Optional local notebook tooling:

```shell
uv sync --extra notebooks
```
## Runtime Assumptions

- Recommended compute workspace: Northflank Jupyter Notebook with PyTorch on the team H100
- OpenEnv deployment target: Hugging Face Spaces
- Submission notebook surface: one public notebook artifact; mirror it to Colab if the submission form still requires Colab specifically
- Required notebook artifact: one public notebook that demonstrates trained-policy behavior against the environment
- Verifier of record: `constellaration.problems.GeometricalProblem`
- Environment style: fresh wiring in this repo, not a port of the old `ai-sci-feasible-designs` harness
- Northflank containers are ephemeral, so attach persistent storage before relying on saved models, caches, or fixture data
- Preferred deployment path: push this GitHub repo and let the HF Space build from the repo/Docker configuration rather than copying code manually
- Preferred notebook/HF Space connectivity: keep the HF Space public for the hackathon unless privacy becomes necessary; if private, document and use an explicit access token in the notebook
## Immediate Next Steps

- Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
- Pair the tracked low-fidelity fixtures with high-fidelity validation spot checks immediately after the PPO smoke run.
- Run at least one explicit-submit manual trace before any broader training push, then record the first real reward pathology, if any.
- Decide whether any reset seed should move based on the measured sweep plus those paired checks.
- Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
- Run one short H100 GRPO pass with the repository notebook on the same unified low-fidelity workflow.
- Re-run the same seeds after training and save one before/after artifact.
- Save one presentation-ready comparison trace from the refreshed heuristic baseline.
- Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
- Deploy the environment to HF Space.
- Add the public training notebook under `training/notebooks`.
These are implementation steps, not another planning phase.
## Fixture Policy

This repo may reuse selected JSON artifacts or boundaries as fixed calibration fixtures.

Allowed examples:

- a known-good or near-winning `P1` boundary
- near-boundary cases
- clearly bad cases

Disallowed:

- porting the old planner, governor, or experiment harness into this repo
## Technical Spec

The focused technical plan for the repaired P1 environment lives in `docs/P1_ENV_CONTRACT_V1.md`.
## Hackathon Working Note

This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.