---
title: Fusion Design Lab
sdk: docker
app_port: 8000
short_description: OpenEnv stellarator design optimization environment
---

# Fusion Design Lab

Fusion Design Lab is an environment-first OpenEnv hackathon project for the P1 stellarator benchmark.

  • Live Environment: HF Space
  • Training Notebook: Repository Notebook (GRPO + HF TRL)

## What It Does

An RL environment where agents optimize stellarator fusion reactor designs by adjusting 4 geometric knobs of a low-dimensional boundary family, aiming to minimize max elongation while satisfying 3 hard physics constraints:

| Constraint | Bound |
| --- | --- |
| aspect_ratio | ≤ 4.0 |
| average_triangularity | ≤ -0.5 |
| abs(edge_iota_over_nfp) | ≥ 0.3 |

The environment uses constellaration as the live low-fidelity physics verifier (~0.6 s per call) for every in-environment evaluation. The live environment exposes 26 discrete actions (4 parameters × 2 directions × 3 magnitudes, plus restore_best and submit), and submit remains an explicit terminal action on that same reward surface rather than a separate high-fidelity mode.
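For reference, the three hard constraints can be checked with a small predicate. This is a sketch only: the metric key names are assumptions based on the table above, and the real feasibility logic lives in the constellaration verifier.

```python
def is_feasible(metrics: dict) -> bool:
    """Check the three hard P1 constraints against a metrics dict.

    Key names are assumed for illustration; the verifier of record is
    constellaration.problems.GeometricalProblem.
    """
    return (
        metrics["aspect_ratio"] <= 4.0
        and metrics["average_triangularity"] <= -0.5
        and abs(metrics["edge_iota_over_nfp"]) >= 0.3
    )


# A design comfortably inside all three bounds is feasible.
example = {
    "aspect_ratio": 3.8,
    "average_triangularity": -0.6,
    "edge_iota_over_nfp": -0.35,
}
```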

## Architecture

  • Environment server (server/): FastAPI app with /reset, /step, /health, /task endpoints
  • Physics engine (server/physics.py): constellaration VMEC-backed boundary evaluation
  • Models (fusion_lab/models.py): Pydantic schemas for actions, observations, state
  • Client (fusion_lab/client.py): Typed OpenEnv client for remote interaction
  • Training (training/): GRPO notebook (HF TRL) and PPO smoke test
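The endpoint surface above can be exercised with a few lines of HTTP. The sketch below assumes JSON request and response bodies and guesses the payload field names (`seed`, `action`); the real typed client is fusion_lab/client.py.

```python
import json
from dataclasses import dataclass
from urllib import request


@dataclass
class FusionLabClient:
    """Minimal HTTP sketch for the /reset, /step, /health, /task endpoints.

    Payload shapes are assumptions; use fusion_lab/client.py in practice.
    """

    base_url: str

    def _post(self, path: str, payload: dict) -> dict:
        req = request.Request(
            f"{self.base_url}{path}",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            return json.load(resp)

    def reset(self, seed=None) -> dict:
        return self._post("/reset", {"seed": seed})

    def step(self, action: str) -> dict:
        return self._post("/step", {"action": action})
```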

## Current Status

  • P1 is locked as the benchmark task with constellaration as verifier of record
  • The repaired 4-knob low-dimensional boundary family is wired into the runtime path
  • Environment deployed to HF Spaces and verified (health, reset, step all operational)
  • GRPO training notebook is checked into the repo and aligned with the shared fusion_lab/llm_agent.py contract
  • LLM rollout tooling can now generate fresh model completions per seed and save fixed-seed reward/outcome summaries
  • Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
  • The live low-fidelity reward is now Reward V2: verifier-native repair shaping plus bounded best-so-far / anti-stagnation terms
  • Before/after trained-policy evidence on the current unified low-fidelity workflow is still open

## Execution Status

  • Lock the P1 contract in code
  • Rewrite shared models to the repaired low-dimensional P1 schema
  • Rewrite the environment loop to the repaired low-dimensional P1 schema
  • Update the API/task surface to match P1
  • Update baseline agents to the P1 contract
  • Add a post-terminal guard so step() is a no-op after done=True
  • Re-run the baseline comparison on the constellaration-backed branch state
  • Replace the synthetic evaluator with constellaration
  • Add a runnable Northflank smoke workflow and note
  • Pass the Northflank smoke test on the H100 workspace
  • Verify the current 3-knob family against the real low-fidelity verifier
  • Add a custom low-dimensional boundary builder with an explicit triangularity control knob
  • Split boundary construction from boundary evaluation in server/physics.py
  • Update the action contract from 3 knobs to the repaired low-dimensional family
  • Add explicit VMEC failure semantics to the environment contract
  • Collapse the live environment to one low-fidelity truth surface while keeping explicit submit
  • Add tracked P1 fixtures under server/data/p1/
  • Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
  • Complete paired high-fidelity validation artifacts outside the live environment path
  • Refresh the heuristic baseline for the real verifier path
  • Deploy the real environment to HF Space
  • Add the public training notebook under training/notebooks
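The post-terminal guard from the list above can be sketched as follows. The `Episode` class, field names, and reward values are hypothetical stand-ins, not the repo's actual environment loop; the point is only that step() returns the last observation unchanged once done=True.

```python
class Episode:
    """Toy episode illustrating a post-terminal guard (illustrative only)."""

    def __init__(self, budget: int = 3):
        self.budget = budget
        self.done = False
        self.last_obs = {"reward": 0.0, "done": False}

    def step(self, action: str) -> dict:
        # Guard: after done=True, step() is a no-op that neither mutates
        # state nor spends budget; it just echoes the final observation.
        if self.done:
            return self.last_obs
        self.budget -= 1
        if action == "submit" or self.budget <= 0:
            self.done = True
        self.last_obs = {
            "reward": 1.0 if action == "submit" else 0.0,
            "done": self.done,
        }
        return self.last_obs
```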

## Known Gaps

  • Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept average_triangularity at roughly +0.004975 and p1_feasibility at roughly 1.00995, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
  • The repaired family now has a first coarse measured sweep note in docs/P1_MEASURED_SWEEP_NOTE.md, but reset-seed changes and any budget changes should still wait for paired high-fidelity validation checks.
  • The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in baselines/fixture_high_fidelity_pairs.json.
  • The live environment now uses one low-fidelity verifier surface for run, restore_best, and submit. Keep high-fidelity checks in baselines/high_fidelity_validation.py and other offline validation artifacts rather than mixing them back into the environment reward loop.
  • VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
  • Budget exhaustion now returns a smaller terminal reward than explicit submit; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
  • The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with 5/5 feasible submitted finals, a mean final P1 score of 0.291951, and 5/5 wins over random.
  • The first low-fidelity manual playtest note is in docs/P1_MANUAL_PLAYTEST_LOG.md. The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired offline high-fidelity evidence.
  • The first tiny PPO smoke note is in docs/P1_PPO_SMOKE_NOTE.md. The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.

## Current Mode

  • Strategic task choice is locked.
  • The next work is reset-seed confirmation, trace export, and deployment.
  • New planning text should appear only when a real blocker forces a decision change.

## Planned Repository Layout

```
fusion-design-lab/
├── baselines/
├── demo/
├── docs/
├── fusion_lab/
├── server/
├── server/data/p1/
├── training/
├── openenv.yaml
├── pyproject.toml
└── README.md
```

## Setup

Base runtime:

```bash
uv sync
```

Development tooling:

```bash
uv sync --extra dev
pre-commit install
```

Optional local notebook tooling:

```bash
uv sync --extra notebooks
```

## Runtime Assumptions

  • Recommended compute workspace: Northflank Jupyter Notebook with PyTorch on the team H100
  • OpenEnv deployment target: Hugging Face Spaces
  • Submission notebook surface: one public notebook artifact; mirror it to Colab if the submission form still requires Colab specifically
  • Required notebook artifact: one public notebook that demonstrates trained-policy behavior against the environment
  • Verifier of record: constellaration.problems.GeometricalProblem
  • Environment style: fresh wiring in this repo, not a port of the old ai-sci-feasible-designs harness
  • Northflank containers are ephemeral, so persistent storage should be attached before relying on saved models, caches, or fixture data
  • Preferred deployment path: push this GitHub repo and let HF Space build from the repo/Docker configuration rather than copying code manually
  • Preferred notebook/HF Space connectivity: make the HF Space public for the hackathon unless privacy becomes necessary; if private, document and use an explicit access token in the notebook
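If the Space does go private, the token handling mentioned above amounts to attaching a Bearer header to each request. `SPACE_URL` and `HF_TOKEN` below are assumed environment variables for the notebook, not names this repo defines.

```python
import os
import urllib.request

# Placeholders the notebook is assumed to provide.
SPACE_URL = os.environ.get("SPACE_URL", "http://localhost:8000")
token = os.environ.get("HF_TOKEN", "")

# Only send the Authorization header when a token is actually configured.
headers = {"Authorization": f"Bearer {token}"} if token else {}
req = urllib.request.Request(f"{SPACE_URL}/health", headers=headers)
# urllib.request.urlopen(req) would perform the actual health check.
```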

## Immediate Next Steps

  • Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
  • Pair the tracked low-fidelity fixtures with high-fidelity validation spot checks immediately after the PPO smoke run.
  • Run at least one explicit-submit manual trace before any broader training push, then record the first real reward pathology, if any.
  • Decide whether any reset seed should move based on the measured sweep plus those paired checks.
  • Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
  • Run one short H100 GRPO pass with the repository notebook on the same unified low-fidelity workflow.
  • Re-run the same seeds after training and save one before/after artifact.
  • Save one presentation-ready comparison trace from the refreshed heuristic baseline.
  • Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
  • Deploy the environment to HF Space.
  • Add the public training notebook under training/notebooks.

These are implementation steps, not another planning phase.

## Fixture Policy

This repo may reuse selected JSON artifacts or boundaries as fixed calibration fixtures.

Allowed examples:

  • a known-good or near-winning P1 boundary
  • near-boundary cases
  • clearly bad cases

Disallowed:

  • porting the old planner, governor, or experiment harness into this repo

## Technical Spec

The focused technical plan for the repaired P1 environment lives in docs/P1_ENV_CONTRACT_V1.md.

## Hackathon Working Note

This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.