---
title: Fusion Design Lab
sdk: docker
app_port: 8000
short_description: OpenEnv stellarator design optimization environment
---
# Fusion Design Lab
Fusion Design Lab is an environment-first [OpenEnv](https://openenv.dev) hackathon project for the `P1` stellarator benchmark.
**Live Environment**: [HF Space](https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab)
**Training Notebook**: [Repository Notebook (GRPO + HF TRL)](training/notebooks/fusion_design_lab_training.ipynb)
## What It Does
An RL environment where agents optimize stellarator fusion reactor designs by adjusting 4 geometric knobs of a low-dimensional boundary family, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:
| Constraint | Bound |
|---|---|
| `aspect_ratio` | ≀ 4.0 |
| `average_triangularity` | ≀ -0.5 |
| `abs(edge_iota_over_nfp)` | β‰₯ 0.3 |
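The three hard constraints above can be expressed as a single feasibility predicate. This is an illustrative sketch only; the verifier of record is `constellaration.problems.GeometricalProblem`, and the function and metric-key names here are assumptions, not the project's actual API.

```python
# Hypothetical P1 feasibility check mirroring the constraint table above.
# Metric keys are illustrative; the real checks live in the constellaration verifier.
def p1_feasible(metrics: dict) -> bool:
    """Return True when all three hard physics constraints hold."""
    return (
        metrics["aspect_ratio"] <= 4.0
        and metrics["average_triangularity"] <= -0.5
        and abs(metrics["edge_iota_over_nfp"]) >= 0.3
    )
```

For example, a design with `aspect_ratio=3.5`, `average_triangularity=-0.6`, and `edge_iota_over_nfp=-0.35` satisfies all three bounds, while raising the aspect ratio above 4.0 makes it infeasible.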
The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the live low-fidelity physics verifier (~0.6 s per call) for every in-environment evaluation. It exposes **26 discrete actions** (4 parameters Γ— 2 directions Γ— 3 magnitudes, plus `restore_best` and `submit`); `submit` is an explicit terminal action scored on that same low-fidelity reward surface, not a separate high-fidelity mode.
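The 26-action count can be sanity-checked by enumerating the combinations. Parameter and magnitude names below are placeholders for the sketch, not the environment's actual identifiers:

```python
from itertools import product

# Illustrative enumeration of the discrete action space described above.
# Knob, direction, and magnitude labels are assumptions for this sketch.
PARAMS = ["knob_0", "knob_1", "knob_2", "knob_3"]  # the 4 geometric knobs
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]

ACTIONS = [
    f"{p}:{d}:{m}" for p, d, m in product(PARAMS, DIRECTIONS, MAGNITUDES)
] + ["restore_best", "submit"]

assert len(ACTIONS) == 4 * 2 * 3 + 2 == 26
```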
## Architecture
- **Environment server** (`server/`): FastAPI app with `/reset`, `/step`, `/health`, `/task` endpoints
- **Physics engine** (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
- **Models** (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
- **Client** (`fusion_lab/client.py`): Typed OpenEnv client for remote interaction
- **Training** (`training/`): GRPO notebook (HF TRL) and PPO smoke test
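As a rough sketch of the client/server contract, the typed client serializes actions to JSON and posts them to the endpoints above. The field names here are assumptions; the authoritative schemas are the Pydantic models in `fusion_lab/models.py`:

```python
import json

# Hypothetical request payloads for the /reset and /step endpoints.
# Field names are illustrative; see fusion_lab/models.py for the real schemas.
reset_request = {"seed": 0}
step_request = {"action": {"action_id": 7}}  # one of the 26 discrete actions

# A typed client would POST these, e.g.:
#   requests.post(f"{base_url}/step", json=step_request)
encoded = json.dumps(step_request)
```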
## Current Status
- `P1` is locked as the benchmark task with `constellaration` as verifier of record
- The repaired 4-knob low-dimensional boundary family is wired into the runtime path
- Environment deployed to HF Spaces and verified (health, reset, step all operational)
- GRPO training notebook is checked into the repo and aligned with the shared `fusion_lab/llm_agent.py` contract
- LLM rollout tooling can now generate fresh model completions per seed and save fixed-seed reward/outcome summaries
- Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
- The live low-fidelity reward is now `Reward V2`: verifier-native repair shaping plus bounded best-so-far / anti-stagnation terms
- Before/after evidence from a trained policy on the current unified low-fidelity workflow is still outstanding
## Execution Status
- [x] Lock the `P1` contract in code
- [x] Rewrite shared models to the repaired low-dimensional `P1` schema
- [x] Rewrite the environment loop to the repaired low-dimensional `P1` schema
- [x] Update the API/task surface to match `P1`
- [x] Update baseline agents to the `P1` contract
- [x] Add a post-terminal guard so `step()` is a no-op after `done=True`
- [x] Re-run the baseline comparison on the `constellaration`-backed branch state
- [x] Replace the synthetic evaluator with `constellaration`
- [x] Add a runnable Northflank smoke workflow and note
- [x] Pass the Northflank smoke test on the H100 workspace
- [x] Verify the current 3-knob family against the real low-fidelity verifier
- [x] Add a custom low-dimensional boundary builder with an explicit triangularity control knob
- [x] Split boundary construction from boundary evaluation in `server/physics.py`
- [x] Update the action contract from 3 knobs to the repaired low-dimensional family
- [x] Add explicit VMEC failure semantics to the environment contract
- [x] Collapse the live environment to one low-fidelity truth surface while keeping explicit `submit`
- [x] Add tracked `P1` fixtures under `server/data/p1/`
- [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
- [x] Complete paired high-fidelity validation artifacts outside the live environment path
- [x] Refresh the heuristic baseline for the real verifier path
- [x] Deploy the real environment to HF Space
- [x] Add the public training notebook under `training/notebooks`
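One checklist item above is the post-terminal guard that makes `step()` a no-op after `done=True`. A minimal sketch of that pattern, with class and field names as assumptions rather than the repo's actual implementation:

```python
# Sketch of a post-terminal guard: once an episode has terminated, further
# step() calls re-emit the final observation instead of mutating state.
class GuardedEnv:
    def __init__(self):
        self.done = False
        self.last_obs = {"reward": 0.0, "done": False}

    def step(self, action):
        if self.done:
            # No-op after terminal: return the final observation unchanged.
            return self.last_obs
        reward = 1.0 if action == "submit" else -1.0  # placeholder dynamics
        self.done = action == "submit"
        self.last_obs = {"reward": reward, "done": self.done}
        return self.last_obs
```

After `submit` terminates the episode, any further `step()` call returns the same terminal observation without spending budget or changing state.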
## Known Gaps
- Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity validation checks.
- The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
- The live environment now uses one low-fidelity verifier surface for `run`, `restore_best`, and `submit`. Keep high-fidelity checks in `baselines/high_fidelity_validation.py` and other offline validation artifacts rather than mixing them back into the environment reward loop.
- VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
- Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible submitted finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast steps are reset-seed confirmation and one presentation-ready comparison trace backed by the paired offline high-fidelity evidence.
- The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
Current mode:
- strategic task choice is already locked
- the next work is reset-seed confirmation, trace export, and deployment
- new planning text should only appear when a real blocker forces a decision change
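One gap noted above is the terminal-reward asymmetry: budget exhaustion must end the episode with a smaller reward than an explicit `submit` of the same design, so agents prefer deliberate submission. A minimal sketch of that shape, with placeholder constants rather than the live `Reward V2` values:

```python
# Placeholder constants; the live Reward V2 values differ.
SUBMIT_BONUS = 1.0
EXHAUSTION_BONUS = 0.25  # deliberately smaller than SUBMIT_BONUS

def terminal_reward(base_score: float, submitted: bool) -> float:
    """Terminal reward: explicit submit always beats budget exhaustion."""
    bonus = SUBMIT_BONUS if submitted else EXHAUSTION_BONUS
    return base_score + bonus
```

Whatever the tuned constants end up being, preserving `SUBMIT_BONUS > EXHAUSTION_BONUS` keeps the intended incentive.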
## Planned Repository Layout
```text
fusion-design-lab/
β”œβ”€β”€ baselines/
β”œβ”€β”€ demo/
β”œβ”€β”€ docs/
β”œβ”€β”€ fusion_lab/
β”œβ”€β”€ server/
β”œβ”€β”€ server/data/p1/
β”œβ”€β”€ training/
β”œβ”€β”€ openenv.yaml
β”œβ”€β”€ pyproject.toml
└── README.md
```
## Setup
Base runtime:
```bash
uv sync
```
Development tooling:
```bash
uv sync --extra dev
pre-commit install
```
Optional local notebook tooling:
```bash
uv sync --extra notebooks
```
## Runtime Assumptions
- Recommended compute workspace: Northflank Jupyter Notebook with PyTorch on the team H100
- OpenEnv deployment target: Hugging Face Spaces
- Submission notebook surface: one public notebook artifact; mirror it to Colab if the submission form still requires Colab specifically
- Required notebook artifact: one public notebook that demonstrates trained-policy behavior against the environment
- Verifier of record: `constellaration.problems.GeometricalProblem`
- Environment style: fresh wiring in this repo, not a port of the old `ai-sci-feasible-designs` harness
- Northflank containers are ephemeral, so persistent storage should be attached before relying on saved models, caches, or fixture data
- Preferred deployment path: push this GitHub repo and let HF Space build from the repo/Docker configuration rather than copying code manually
- Preferred notebook/HF Space connectivity: make the HF Space public for the hackathon unless privacy becomes necessary; if private, document and use an explicit access token in the notebook
## Immediate Next Steps
- [x] Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
- [x] Pair the tracked low-fidelity fixtures with high-fidelity validation spot checks immediately after the PPO smoke run.
- [x] Run at least one explicit-submit manual trace before any broader training push, then record the first real reward pathology, if any.
- [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
- [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
- [ ] Run one short H100 GRPO pass with the repository notebook on the same unified low-fidelity workflow.
- [ ] Re-run the same seeds after training and save one before/after artifact.
- [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
- [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
- [x] Deploy the environment to HF Space.
- [x] Add the public training notebook under `training/notebooks`.
These are implementation steps, not another planning phase.
## Fixture Policy
This repo may reuse selected JSON artifacts or boundaries as fixed calibration fixtures.
Allowed examples:
- a known-good or near-winning `P1` boundary
- near-boundary cases
- clearly bad cases
Disallowed:
- porting the old planner, governor, or experiment harness into this repo
## Technical Spec
The focused technical plan for the repaired `P1` environment lives in [docs/P1_ENV_CONTRACT_V1.md](docs/P1_ENV_CONTRACT_V1.md).
## Hackathon Working Note
This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.