CreativeEngineer Claude Opus 4.6 committed on
Commit
cdc237b
·
1 Parent(s): 1ddf57a

feat: reward verifier alignment, notebook hardening, model name fix

Squashed 18 commits for HF Space deployment:
- fix: correct model name from Qwen3.5-4B to Qwen3-4B
- fix: harden notebook dependency bootstrap
- fix: align notebook prompts with chat templates
- fix: clean Dockerfile PYTHONPATH for HF build
- fix: stabilize environment submit flow and training references
- fix: add contract compatibility check to remote HF Space demo
- fix: write full Reward V2 breakdown and harden notebook
- fix: make notebook evaluation reproducible
- fix: tighten reward v2 bookkeeping
- feat: add untrained model baseline and before/after comparison
- chore: remove auto-generated matplotlib PNG artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

server/Dockerfile → Dockerfile RENAMED
@@ -18,7 +18,7 @@ RUN pip install --no-cache-dir \
 
 COPY . /app/env
 
-ENV PYTHONPATH="/app/env:$PYTHONPATH"
+ENV PYTHONPATH="/app/env"
 ENV ENABLE_WEB_INTERFACE=true
 
 EXPOSE 8000
README.md CHANGED
@@ -1,9 +1,16 @@
+---
+title: Fusion Design Lab
+sdk: docker
+app_port: 8000
+short_description: OpenEnv stellarator design optimization environment
+---
+
 # Fusion Design Lab
 
 Fusion Design Lab is an environment-first [OpenEnv](https://openenv.dev) hackathon project for the `P1` stellarator benchmark.
 
 **Live Environment**: [HF Space](https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab)
-**Training Notebook**: [Repository Notebook (GRPO + Unsloth)](training/notebooks/fusion_design_lab_training.ipynb)
+**Training Notebook**: [Repository Notebook (GRPO + HF TRL)](training/notebooks/fusion_design_lab_training.ipynb)
 
 ## What It Does
 
@@ -15,7 +22,7 @@ An RL environment where agents optimize stellarator fusion reactor designs by ad
 | `average_triangularity` | ≤ -0.5 |
 | `abs(edge_iota_over_nfp)` | ≥ 0.3 |
 
-The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the physics verifier: low-fidelity (~0.6s) for the RL inner loop, high-fidelity (~4s) for terminal submit. The live environment still exposes **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + restore_best + submit), but the standard GRPO notebook and `training/llm_rollout.py` `monitor` / `evaluate` workflows stay on the low-fidelity `run` surface and ignore `submit` by default.
+The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the live low-fidelity physics verifier (~0.6s) for every in-environment evaluation. The live environment still exposes **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + restore_best + submit), and `submit` remains an explicit terminal action on that same reward surface rather than a separate high-fidelity mode.
 
 ## Architecture
 
@@ -23,7 +30,7 @@ The environment uses [`constellaration`](https://pypi.org/project/constellaratio
 - **Physics engine** (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
 - **Models** (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
 - **Client** (`fusion_lab/client.py`): Typed OpenEnv client for remote interaction
-- **Training** (`training/`): GRPO notebook (Unsloth + TRL) and PPO smoke test
+- **Training** (`training/`): GRPO notebook (HF TRL) and PPO smoke test
 
 ## Current Status
 
@@ -33,7 +40,8 @@ The environment uses [`constellaration`](https://pypi.org/project/constellaratio
 - GRPO training notebook is checked into the repo and aligned with the shared `fusion_lab/llm_agent.py` contract
 - LLM rollout tooling can now generate fresh model completions per seed and save fixed-seed reward/outcome summaries
 - Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
-- Before/after trained-policy evidence on the current low-fidelity-only workflow is still open
+- The live low-fidelity reward is now `Reward V2`: verifier-native repair shaping plus bounded best-so-far / anti-stagnation terms
+- Before/after trained-policy evidence on the current unified low-fidelity workflow is still open
 
 ## Execution Status
 
@@ -52,11 +60,10 @@ The environment uses [`constellaration`](https://pypi.org/project/constellaratio
 - [x] Split boundary construction from boundary evaluation in `server/physics.py`
 - [x] Update the action contract from 3 knobs to the repaired low-dimensional family
 - [x] Add explicit VMEC failure semantics to the environment contract
-- [x] Label low-fi `run` truth vs high-fi `submit` truth in observations and task docs
-- [x] Separate high-fidelity submit scoring/reporting from low-fidelity rollout score state
+- [x] Collapse the live environment to one low-fidelity truth surface while keeping explicit `submit`
 - [x] Add tracked `P1` fixtures under `server/data/p1/`
 - [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
-- [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
+- [x] Complete paired high-fidelity validation artifacts outside the live environment path
 - [x] Refresh the heuristic baseline for the real verifier path
 - [x] Deploy the real environment to HF Space
 - [x] Add the public training notebook under `training/notebooks`
@@ -64,16 +71,13 @@ The environment uses [`constellaration`](https://pypi.org/project/constellaratio
 ## Known Gaps
 
 - Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
-- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity fixture checks.
+- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity validation checks.
 - The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
-- `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `from_boundary_resolution`; do not present step-time metrics as final submission metrics.
-- The standard LLM training and evaluation workflow is now low-fidelity-only: the repo notebook and `training/llm_rollout.py` `monitor` / `evaluate` ignore `submit` by default. Reserve `submit` for explicit replay/debug work, paired fixture checks, submit-side traces, and final evidence.
+- The live environment now uses one low-fidelity verifier surface for `run`, `restore_best`, and `submit`. Keep high-fidelity checks in `baselines/high_fidelity_validation.py` and other offline validation artifacts rather than mixing them back into the environment reward loop.
 - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
-- Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
-- Observation best-state reporting is now split explicitly between low-fidelity rollout state and high-fidelity submit state; baseline traces and demo copy should use those explicit fields rather than infer a mixed best-state story.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
-- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible high-fidelity finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
-- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired high-fidelity evidence.
+- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible submitted finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
+- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired offline high-fidelity evidence.
 - The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
 
 Current mode:
@@ -134,11 +138,11 @@ uv sync --extra notebooks
 ## Immediate Next Steps
 
 - [x] Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
-- [x] Pair the tracked low-fidelity fixtures with high-fidelity submit spot checks immediately after the PPO smoke run.
-- [x] Run at least one submit-side manual trace before any broader training push, then record the first real reward pathology, if any.
+- [x] Pair the tracked low-fidelity fixtures with high-fidelity validation spot checks immediately after the PPO smoke run.
+- [x] Run at least one explicit-submit manual trace before any broader training push, then record the first real reward pathology, if any.
 - [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
 - [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
-- [ ] Run one short H100 GRPO pass with the repository notebook on the same low-fidelity-only workflow.
+- [ ] Run one short H100 GRPO pass with the repository notebook on the same unified low-fidelity workflow.
 - [ ] Re-run the same seeds after training and save one before/after artifact.
 - [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
 - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
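The README's 26-action layout can be sanity-checked with a short enumeration. This is a sketch: the knob names below are placeholders, not the environment's real parameter identifiers, which live in the `P1` environment contract.

```python
from itertools import product

# Placeholder knob names; the real 4-parameter family lives in the env contract.
PARAMS = ["knob_0", "knob_1", "knob_2", "knob_3"]
DIRECTIONS = ["decrease", "increase"]
MAGNITUDES = ["small", "medium", "large"]

# 4 parameters x 2 directions x 3 magnitudes = 24 tweak actions,
# plus the two control actions restore_best and submit = 26 total.
ACTIONS = [f"{p}:{d}:{m}" for p, d, m in product(PARAMS, DIRECTIONS, MAGNITUDES)]
ACTIONS += ["restore_best", "submit"]

print(len(ACTIONS))  # 26
```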
TODO.md CHANGED
@@ -61,7 +61,7 @@ flowchart TD
 P --> F["Tiny PPO Smoke"]
 F --> E["Fixture Checks"]
 E --> G["Submit-side Manual Playtest"]
-G --> H["Reward V1"]
+G --> H["Reward V2"]
 H --> I["Baselines"]
 I --> J["HF Space Deploy"]
 J --> K["Colab Notebook"]
@@ -112,7 +112,7 @@ flowchart TD
 - [x] Replace the synthetic physics path with `constellaration` wiring
   Files:
   [server/physics.py](server/physics.py),
-  [server/Dockerfile](server/Dockerfile),
+  [Dockerfile](Dockerfile),
   [pyproject.toml](pyproject.toml)
 
 - [x] Update the API/task surface to match `P1`
@@ -220,6 +220,13 @@ flowchart TD
   [AGENTS.md](AGENTS.md),
   [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
 
+- [x] Update reward from `V1` to `V2` after the verifier-native shaping exposed short-horizon gaps
+  Goal:
+  add bounded new-best, near-feasible, and anti-stagnation terms without breaking the verifier-native reward story
+  Related:
+  [AGENTS.md](AGENTS.md),
+  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
 - [x] Write down why `Reward V0` did not survive unchanged
   Goal:
   document the concrete pathology: pure `Δ official_feasibility` hid useful non-dominant repairs because official feasibility is a max over normalized constraint violations
@@ -294,6 +301,6 @@ flowchart TD
 - [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
 - [x] Do not compare high-fidelity submit scores against low-fidelity best/initial score state in the final story
 - [ ] Do not describe the current baseline reset state as feasible or near-feasible
-- [x] Do not force a `Reward V1` story if `Reward V0` survives manual playtesting
+- [x] Do not force a new reward-version story until the previous reward version shows a real pathology
   Note:
-  completed by recording the concrete `Reward V0` pathology and only then moving to `Reward V1`
+  completed by recording the concrete `Reward V0` pathology before `Reward V1`, then recording the concrete short-horizon `Reward V1` gaps before `Reward V2`
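The `V1` to `V2` task above names the new terms: bounded new-best, near-feasible, and anti-stagnation shaping on top of verifier-native repair shaping. A minimal illustrative sketch of that shape, with entirely made-up weights and argument names (the real breakdown lives in the environment's reward code):

```python
def reward_v2_sketch(feasibility_delta: float,
                     new_best: bool,
                     near_feasible: bool,
                     steps_since_improvement: int) -> float:
    """Illustrative Reward V2 shape: verifier-native repair shaping plus
    bounded shaping terms. All weights here are invented for the sketch."""
    r = feasibility_delta      # verifier-native repair shaping
    if new_best:
        r += 0.1               # bounded new-best bonus
    if near_feasible:
        r += 0.05              # bounded near-feasible bonus
    if steps_since_improvement > 5:
        r -= 0.02              # bounded anti-stagnation penalty
    return r

print(reward_v2_sketch(0.2, True, False, 0))   # a repair step that sets a new best
print(reward_v2_sketch(0.0, False, False, 8))  # a stagnating step
```

The point of the bounds is in the task text itself: each shaping term is capped so it cannot outgrow the verifier-native repair signal.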
assets/p1_seeds/creative_best.json ADDED
@@ -0,0 +1,351 @@
+{
+  "r_cos": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      1.0,
+      0.3178024006853376,
+      -0.00494453968429039,
+      -9.008828074894216e-05,
+      0.00034826523984284985,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.0284150340556629e-05,
+      0.0,
+      0.00022783626301464338,
+      -0.002504825752146429,
+      -0.0019791928302356574,
+      -0.05986009847084577,
+      0.2378930884212573,
+      0.07041809177925817,
+      -0.03405649158367229,
+      -0.001255290887707734,
+      0.00015598389817458503,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      1.2017696110684103e-08,
+      0.0,
+      1.0284150340556629e-05,
+      0.0,
+      0.00017649409443341567,
+      -0.0008786797258682598,
+      0.00871051319329453,
+      -0.006108510773329939,
+      0.012799177446456245,
+      0.02540372085366101,
+      0.0061202246568943935,
+      0.005782073039163714,
+      -0.00032573388857629895,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      4.6090364178986136e-05,
+      -0.00018625377697293322,
+      0.0018402574023466017,
+      -0.0028694583032823347,
+      0.003249729685005616,
+      -0.0032546505923570497,
+      0.0028927525886110798,
+      -0.005727300326564687,
+      0.0009349924265612791,
+      -0.00029069423934959806,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      -0.00013796254756594706,
+      -7.113785778163368e-05,
+      -3.70039013071251e-05,
+      -9.26230222850333e-05,
+      -7.55348144625171e-05,
+      -5.890789481852012e-05,
+      -0.00016787611941031008,
+      -0.00013512182402120827,
+      -0.0007577573777222754,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      0.0,
+      7.374458268637783e-06,
+      0.0,
+      7.374458268637783e-06,
+      7.374458268637783e-06,
+      0.0,
+      7.374458268637783e-06,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      1.6454640544890612e-06,
+      1.6454640544890612e-06,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ]
+  ],
+  "z_sin": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.3682437645980358,
+      -0.010313325093545838,
+      0.0009509826733591118,
+      8.731728723274532e-05,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.00021820594976462235,
+      0.00257055829045463,
+      -0.0127795602890544,
+      -0.05705253192342194,
+      0.25012256718258646,
+      0.012207198333313168,
+      0.0340313223723876,
+      0.0003576776007283744,
+      -0.0002845557347781907,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0006581156987357039,
+      0.0001351500860080824,
+      0.021441581178606544,
+      -0.009487259768838647,
+      0.023875626799357026,
+      0.018329471230646432,
+      0.03202330538363405,
+      -0.002402806268419791,
+      -0.00021251611687155453,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      7.374458268637783e-06,
+      0.00042403289373939767,
+      -0.0009267683930298048,
+      0.002547132301658598,
+      0.0018252150534381778,
+      0.0025447442599817994,
+      -0.0006139539204201418,
+      0.0040519500435168615,
+      -0.002119370745245054,
+      0.0006644491009208615,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0003314760556756545,
+      0.0003278655337955934,
+      0.00013156289898347628,
+      -2.5890394467765182e-05,
+      0.0005822073549505646,
+      3.0278685234777375e-05,
+      -0.0001386996202989576,
+      0.0005453186709654603,
+      0.00024046539821892854,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ]
+  ],
+  "n_field_periods": 3,
+  "n_periodicity": 1,
+  "is_stellarator_symmetric": true
+}
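The seed payload above is a plain Fourier-boundary JSON: `r_cos` / `z_sin` mode matrices plus a few scalars. A small helper to summarize such a payload; the inline `demo` dict is a tiny stand-in with the same top-level keys, not the real 9 x 17 seed:

```python
def boundary_summary(boundary: dict) -> dict:
    """Report the mode-matrix shapes and scalars of a seed boundary payload."""
    r_cos, z_sin = boundary["r_cos"], boundary["z_sin"]
    return {
        "r_cos_shape": (len(r_cos), len(r_cos[0])),
        "z_sin_shape": (len(z_sin), len(z_sin[0])),
        "n_field_periods": boundary["n_field_periods"],
        "symmetric": boundary["is_stellarator_symmetric"],
    }

# Tiny stand-in with the same top-level keys as creative_best.json.
demo = {
    "r_cos": [[1.0, 0.3178], [0.0, 0.2379]],
    "z_sin": [[0.0, -0.3682], [0.0, 0.2501]],
    "n_field_periods": 3,
    "is_stellarator_symmetric": True,
}
print(boundary_summary(demo))
```

On the real tracked files, the same helper would report the matrix truncation each seed family uses, which is what distinguishes the `creative_*` and `egodos_*` payloads below.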
assets/p1_seeds/creative_seed.json ADDED
@@ -0,0 +1,351 @@
+{
+  "r_cos": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      1.0,
+      0.3178024006853376,
+      -0.00494453968429039,
+      -9.008828074894216e-05,
+      0.00034826523984284985,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.0284150340556629e-05,
+      0.0,
+      0.00022783626301464338,
+      -0.002504825752146429,
+      -0.0019791928302356574,
+      -0.05986009847084577,
+      0.2378930884212573,
+      0.07041809177925817,
+      -0.03405649158367229,
+      -0.001255290887707734,
+      0.00015598389817458503,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      5.120176961106841e-07,
+      0.0,
+      1.0284150340556629e-05,
+      0.0,
+      0.00017649409443341567,
+      -0.0008786797258682598,
+      0.00871051319329453,
+      -0.006108510773329939,
+      0.012799177446456245,
+      0.02540372085366101,
+      0.0061202246568943935,
+      0.005782073039163714,
+      -0.00032573388857629895,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      4.6090364178986136e-05,
+      -0.00018625377697293322,
+      0.0018402574023466017,
+      -0.0028694583032823347,
+      0.003249729685005616,
+      -0.0032546505923570497,
+      0.0028927525886110798,
+      -0.005727300326564687,
+      0.0009349924265612791,
+      -0.00029069423934959806,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      -0.00013796254756594706,
+      -7.113785778163368e-05,
+      -3.70039013071251e-05,
+      -9.26230222850333e-05,
+      -7.55348144625171e-05,
+      -5.890789481852012e-05,
+      -0.00016787611941031008,
+      -0.00013512182402120827,
+      -0.0007577573777222754,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      0.0,
+      7.374458268637783e-06,
+      0.0,
+      7.374458268637783e-06,
+      7.374458268637783e-06,
+      0.0,
+      7.374458268637783e-06,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      1.6454640544890612e-06,
+      1.6454640544890612e-06,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ]
+  ],
+  "z_sin": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.3682437645980358,
+      -0.010313325093545838,
+      0.0009509826733591118,
+      8.731728723274532e-05,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.00021820594976462235,
+      0.00257055829045463,
+      -0.0127795602890544,
+      -0.05705253192342194,
+      0.25012256718258646,
+      0.012207198333313168,
+      0.0340313223723876,
+      0.0003576776007283744,
+      -0.0002845557347781907,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0006581156987357039,
+      0.0001351500860080824,
+      0.021441581178606544,
+      -0.009487259768838647,
+      0.023875626799357026,
+      0.018329471230646432,
+      0.03202330538363405,
+      -0.002402806268419791,
+      -0.00021251611687155453,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      7.374458268637783e-06,
+      0.00042403289373939767,
+      -0.0009267683930298048,
+      0.002547132301658598,
+      0.0018252150534381778,
+      0.0025447442599817994,
+      -0.0006139539204201418,
+      0.0040519500435168615,
+      -0.002119370745245054,
+      0.0006644491009208615,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0003314760556756545,
+      0.0003278655337955934,
+      0.00013156289898347628,
+      -2.5890394467765182e-05,
+      0.0005822073549505646,
+      3.0278685234777375e-05,
+      -0.0001386996202989576,
+      0.0005453186709654603,
+      0.00024046539821892854,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ]
+  ],
+  "n_field_periods": 3,
+  "n_periodicity": 1,
+  "is_stellarator_symmetric": true
+}
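The tracked seeds above are reset states, not feasible designs. The two P1 thresholds quoted in the README (`average_triangularity` at most -0.5, `abs(edge_iota_over_nfp)` at least 0.3) reduce to a small predicate; this sketch checks only those two listed constraints, not the full `constellaration` verifier:

```python
def p1_thresholds_met(average_triangularity: float,
                      edge_iota_over_nfp: float) -> bool:
    """True when the two README-listed P1 thresholds hold.
    Not the full constellaration feasibility check."""
    return average_triangularity <= -0.5 and abs(edge_iota_over_nfp) >= 0.3

print(p1_thresholds_met(-0.6, 0.35))      # a design meeting both thresholds
print(p1_thresholds_met(0.004975, 0.35))  # the old 3-knob sweep's triangularity
```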
assets/p1_seeds/egodos_seed.json ADDED
@@ -0,0 +1,120 @@
+{
+  "r_cos": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.9889786243438721,
+      0.29489704966545105,
+      -0.029949292540550232,
+      0.007402452640235424,
+      -0.0021754007320851088
+    ],
+    [
+      3.0306517146527767e-05,
+      -0.017258530482649803,
+      0.11244918406009674,
+      -0.00027375295758247375,
+      0.4030463397502899,
+      0.05457010865211487,
+      0.0050158146768808365,
+      -0.009017078205943108,
+      0.00023299557506106794
+    ],
+    [
+      -0.0035085747949779034,
+      -0.007740889210253954,
+      -0.019238369539380074,
+      -0.004338215570896864,
+      -0.01707017421722412,
+      -0.01595107652246952,
+      -0.008797697722911835,
+      -0.0027677465695887804,
+      -0.0003153726283926517
+    ],
+    [
+      0.0012443774612620473,
+      0.0018073361134156585,
+      -0.007023670244961977,
+      0.000234402425121516,
+      0.0017306806985288858,
+      0.003982230089604855,
+      -0.002272964920848608,
+      0.0021430065389722586,
+      -0.0004695240349974483
+    ],
+    [
+      0.0004951803712174296,
+      0.00010301961447112262,
+      0.0006218982161954045,
+      -3.61714992322959e-05,
+      0.000459781993413344,
+      -0.0011883215047419071,
+      0.0015523011097684503,
+      0.001801402191631496,
+      0.0007655859808437526
+    ]
+  ],
+  "z_sin": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.2114829421043396,
+      -0.04368766397237778,
+      0.011270688846707344,
+      -0.0033141719177365303
+    ],
+    [
+      -0.0012609786354005337,
+      -0.008882338181138039,
+      0.04093347489833832,
+      0.06044723838567734,
+      -0.40492475032806396,
+      -0.04713256657123566,
+      -0.00028245686553418636,
+      0.00870777852833271,
+      0.001710485783405602
+    ],
+    [
+      -0.0005412710597738624,
+      -0.009137776680290699,
+      -0.013082815334200859,
+      -0.01053211372345686,
+      -0.01276348065584898,
+      0.017182836309075356,
+      -0.012362353503704071,
+      -0.001533929375000298,
+      -0.0028038574382662773
+    ],
+    [
+      -0.0011634031543508172,
+      0.007165416143834591,
+      -0.014393662102520466,
+      0.0011076449882239103,
+      -0.006598849315196276,
+      0.006964890286326408,
+      -0.008261557668447495,
+      0.0032563884742558002,
+      -0.0006506771314889193
+    ],
+    [
+      -0.0008520428673364222,
+      -0.00014924361312296242,
+      -0.001169409602880478,
+      0.002478198613971472,
+      0.0025256099179387093,
+      -0.001493512187153101,
+      -0.0013979775831103325,
+      0.0012794585200026631,
+      -0.0007043574005365372
+    ]
+  ],
+  "r_sin": null,
+  "z_cos": null,
+  "n_field_periods": 3,
+  "is_stellarator_symmetric": true
+}
assets/p1_seeds/egodos_sparse_rgroup_best.json ADDED
@@ -0,0 +1,120 @@
+ {
+ "r_cos": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.9889786243438721,
+ 0.29489704966545105,
+ -0.024982345699897754,
+ 0.007402452640235424,
+ -0.0021754007320851088
+ ],
+ [
+ 0.000030306517146527767,
+ -0.017258530482649803,
+ 0.11244918406009674,
+ -0.00027375295758247375,
+ 0.3946223671947347,
+ 0.06473962641387607,
+ 0.0050158146768808365,
+ -0.009017078205943108,
+ 0.00023299557506106794
+ ],
+ [
+ -0.0035085747949779034,
+ -0.007740889210253954,
+ -0.019238369539380074,
+ -0.004338215570896864,
+ -0.01707017421722412,
+ -0.01595107652246952,
+ -0.008797697722911835,
+ -0.0027677465695887804,
+ -0.0003153726283926517
+ ],
+ [
+ 0.0012443774612620473,
+ 0.0018073361134156585,
+ -0.007023670244961977,
+ 0.000234402425121516,
+ 0.0017306806985288858,
+ 0.003982230089604855,
+ -0.002272964920848608,
+ 0.0021430065389722586,
+ -0.0004695240349974483
+ ],
+ [
+ 0.0004951803712174296,
+ 0.00010301961447112262,
+ 0.0006218982161954045,
+ -0.0000361714992322959,
+ 0.000459781993413344,
+ -0.0011883215047419071,
+ 0.0015523011097684503,
+ 0.001801402191631496,
+ 0.0007655859808437526
+ ]
+ ],
+ "z_sin": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ -0.2114829421043396,
+ -0.04368766397237778,
+ 0.011270688846707344,
+ -0.0033141719177365303
+ ],
+ [
+ -0.0012609786354005337,
+ -0.008882338181138039,
+ 0.04093347489833832,
+ 0.06044723838567734,
+ -0.40492475032806396,
+ -0.04713256657123566,
+ -0.00028245686553418636,
+ 0.00870777852833271,
+ 0.001710485783405602
+ ],
+ [
+ -0.0005412710597738624,
+ -0.009137776680290699,
+ -0.013082815334200859,
+ -0.01053211372345686,
+ -0.01276348065584898,
+ 0.017182836309075356,
+ -0.012362353503704071,
+ -0.001533929375000298,
+ -0.0028038574382662773
+ ],
+ [
+ -0.0011634031543508172,
+ 0.007165416143834591,
+ -0.014393662102520466,
+ 0.0011076449882239103,
+ -0.006598849315196276,
+ 0.006964890286326408,
+ -0.008261557668447495,
+ 0.0032563884742558002,
+ -0.0006506771314889193
+ ],
+ [
+ -0.0008520428673364222,
+ -0.00014924361312296242,
+ -0.001169409602880478,
+ 0.002478198613971472,
+ 0.0025256099179387093,
+ -0.001493512187153101,
+ -0.0013979775831103325,
+ 0.0012794585200026631,
+ -0.0007043574005365372
+ ]
+ ],
+ "r_sin": null,
+ "z_cos": null,
+ "n_field_periods": 3,
+ "is_stellarator_symmetric": true
+ }
assets/p1_seeds/manifest.json ADDED
@@ -0,0 +1,81 @@
+ {
+ "bundle": "p1_seed_transfer_2026-03-08",
+ "source_repo": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs",
+ "target_repo": "/Users/suhjungdae/code/fusion-design-lab",
+ "selection_principle": "small high-value P1 seed pack for reward design and policy initialization",
+ "entries": [
+ {
+ "file": "creative_best.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_official_high_fidelity_inputs/p04_best_r_cos_2_0_down.json",
+ "family": "creative",
+ "role": "best_endpoint",
+ "origin": "CreativeEngineer micro-perturbation winner family",
+ "score": 0.9701411603598098,
+ "feasibility": 0.009487821019544596,
+ "objective": 1.268729556761712
+ },
+ {
+ "file": "creative_seed.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_official_high_fidelity_inputs/p02_best_submission_seed.json",
+ "family": "creative",
+ "role": "exploitation_anchor",
+ "origin": "CreativeEngineer leaderboard seed",
+ "score": 0.9701409584443864,
+ "feasibility": 0.00949088322352376,
+ "objective": 1.2687313740005226
+ },
+ {
+ "file": "scadena_seed.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_official_high_fidelity_inputs/p01_scadena_seed.json",
+ "family": "scadena",
+ "role": "repair_anchor",
+ "origin": "scadena-pf leaderboard seed",
+ "score": 0.9694573991433482,
+ "feasibility": 0.0001722869358491049,
+ "objective": 1.2748834077098663
+ },
+ {
+ "file": "scadena_repaired_best.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_hf_repaired_scadena_cluster_20260308/s03_DAMXY_t099.json",
+ "family": "scadena",
+ "role": "repaired_diverse_feasible",
+ "origin": "raw HF near-P1 seed repaired along scadena corridor",
+ "score": 0.9696318182039995,
+ "feasibility": 0.008515139661142479,
+ "objective": 1.2733136361640056,
+ "blend_t": 0.99
+ },
+ {
+ "file": "samet_seed.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_non_scadena_seed_pack_20260308/02_samet_exact_feasible_distinct_SametKokoslocke_2026-02-20T16-31-36.961612.json",
+ "family": "samet",
+ "role": "diverse_feasible_anchor",
+ "origin": "SametKokoslocke exact-feasible distinct family",
+ "score": 0.7797358473578075,
+ "feasibility": 0.0,
+ "objective": 2.982377373779732
+ },
+ {
+ "file": "egodos_seed.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_non_scadena_seed_pack_20260308/03_egodos_near_feasible_repair_target_egodos_2026-02-15T19-23-28.679506.json",
+ "family": "egodos",
+ "role": "near_feasible_target",
+ "origin": "best non-scadena near-feasible repair source",
+ "score": 0.0,
+ "feasibility": 0.012140230868772806,
+ "objective": 2.1191483320378808
+ },
+ {
+ "file": "egodos_sparse_rgroup_best.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_non_scadena_frontier_20260308/egodos_rgroup_to_samet_t0p07475.json",
+ "family": "egodos_samet_bridge",
+ "role": "non_scadena_best_new_frontier",
+ "origin": "small sparse low-order r_cos move from egodos toward Samet",
+ "score": 0.8646431004091908,
+ "feasibility": 0.009999602734502733,
+ "objective": 2.2182120963172833,
+ "operator": "sparse_r_group_to_samet",
+ "threshold_t": 0.07475
+ }
+ ]
+ }
assets/p1_seeds/samet_seed.json ADDED
@@ -0,0 +1,82 @@
+ {
+ "r_cos": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 1.0,
+ 0.2998406753373,
+ 0.03649815683272709,
+ -0.0023934004574476322
+ ],
+ [
+ 0.004923194200362755,
+ 0.0019538314697309803,
+ -0.030393426440835376,
+ 0.2903510547261398,
+ 0.19061716901012418,
+ 0.010663078794311078,
+ 0.0004915361883997499
+ ],
+ [
+ 0.004461085403616445,
+ 0.0018480423209113024,
+ 0.008322853001526121,
+ -0.0016888734032345434,
+ 0.029738966870065234,
+ -0.017367616857085766,
+ 0.005096201721111707
+ ],
+ [
+ 0.003660255000690111,
+ -0.0008161620102724097,
+ 0.002659988210077753,
+ -0.0011090243735975415,
+ 0.0014976808264200786,
+ 0.0006788200984889833,
+ 0.003049742178214439
+ ]
+ ],
+ "z_sin": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ -0.18809278786125713,
+ -0.004276023017828923,
+ 0.004322589614437422
+ ],
+ [
+ 0.0034290834645214264,
+ -0.0009718600955131964,
+ -0.04584371642254841,
+ 0.33895425254520006,
+ -0.11542379428754948,
+ -0.006266355167467825,
+ 0.004540925438226553
+ ],
+ [
+ 0.002276104811351958,
+ 0.007696920953052154,
+ -0.00560829420301698,
+ 0.01756845689719225,
+ 0.019251214823981313,
+ 0.02607207531935209,
+ 0.0012839605524015184
+ ],
+ [
+ 0.0005920371380011517,
+ 0.003256903574999701,
+ 0.0007021997737304861,
+ 0.0034139505822126832,
+ 0.0017613753357548154,
+ -0.0013703743967947743,
+ -0.0017751147642294824
+ ]
+ ],
+ "r_sin": null,
+ "z_cos": null,
+ "n_field_periods": 3,
+ "is_stellarator_symmetric": true
+ }
assets/p1_seeds/scadena_repaired_best.json ADDED
@@ -0,0 +1,120 @@
+ {
+ "r_cos": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.9995558823937495,
+ 0.3145330963488296,
+ -0.004852928784117629,
+ -0.00047853952359897824,
+ 0.00014028553286944676
+ ],
+ [
+ 0.000021060845809522364,
+ -0.002388888945772252,
+ -0.0010920699762301598,
+ -0.05996056988781852,
+ 0.23806626206705117,
+ 0.07010624141600151,
+ -0.033894859150088635,
+ -0.0012720520309685694,
+ -0.000050072995382135404
+ ],
+ [
+ -0.00002976790108589305,
+ -0.0010127528261545883,
+ 0.008465486468268894,
+ -0.006310050622106053,
+ 0.0128452270223199,
+ 0.02530181645810451,
+ 0.006091523234460731,
+ 0.00565240857271131,
+ -0.00032247654969053596
+ ],
+ [
+ -0.0003888882937781785,
+ 0.0018218548283231357,
+ -0.002840763720249511,
+ 0.0032172323881555598,
+ -0.0033687428407382773,
+ 0.002863825062724969,
+ -0.005816666077603838,
+ 0.0009256425022956663,
+ -0.0002877872969561021
+ ],
+ [
+ -0.00013658292209028756,
+ -0.00007042647920381733,
+ -0.00006935339102604978,
+ -0.0001244163207941789,
+ -0.00007477946631789192,
+ -0.00005831881587033492,
+ -0.00016619735821620698,
+ -0.00016649013451299212,
+ -0.0007501798039450527
+ ]
+ ],
+ "z_sin": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ -0.36736678272076023,
+ -0.009828810099230176,
+ 0.0004982425415133734,
+ 0.00008644411436041787
+ ],
+ [
+ -0.00021602389026697613,
+ 0.002847751999147193,
+ -0.01208084403849006,
+ -0.05624644212913916,
+ 0.24528675315533016,
+ 0.01182189736817721,
+ 0.0336478912876403,
+ 0.0005027013352311548,
+ -0.00028171017743040865
+ ],
+ [
+ 0.0006188150130163509,
+ 0.000015920855717569947,
+ 0.020426156577576828,
+ -0.009570454955266509,
+ 0.023644042506073312,
+ 0.017426508469963166,
+ 0.031709641217604306,
+ -0.0023390750650408433,
+ -0.00024311048443483493
+ ],
+ [
+ 0.0003870730360700077,
+ -0.0009175007090995068,
+ 0.002521660978642012,
+ 0.001806962902903796,
+ 0.0025192968173819814,
+ -0.0006078143812159401,
+ 0.004011430543081693,
+ -0.0020981770377926034,
+ 0.0006578046099116529
+ ],
+ [
+ 0.00032816129511889795,
+ 0.00032458687845763744,
+ 0.00013024726999364151,
+ -0.00002563149052308753,
+ 0.0005763852814010589,
+ 0.0000299758983824296,
+ -0.00013731262409596802,
+ 0.0005398654842558058,
+ 0.00023806074423673925
+ ]
+ ],
+ "r_sin": null,
+ "z_cos": null,
+ "n_field_periods": 3,
+ "is_stellarator_symmetric": true
+ }
assets/p1_seeds/scadena_seed.json ADDED
@@ -0,0 +1,120 @@
+ {
+ "r_cos": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 1.0,
+ 0.3178024006853376,
+ -0.00494453968429039,
+ -0.0010158379922691344,
+ 0.00014170255845398662
+ ],
+ [
+ 2.1273581625780166e-05,
+ -0.002504825752146429,
+ -0.0019791928302356574,
+ -0.05986009847084577,
+ 0.2378930884212573,
+ 0.07041809177925817,
+ -0.03405649158367229,
+ -0.0012552908877077342,
+ -5.0578783214278185e-05
+ ],
+ [
+ -3.0068586955447527e-05,
+ -0.0008786797258682598,
+ 0.00871051319329453,
+ -0.006108510773329939,
+ 0.012799177446456245,
+ 0.02540372085366101,
+ 0.006120224656894394,
+ 0.005782073039163714,
+ -0.00032573388857629895
+ ],
+ [
+ -0.0003928164583617964,
+ 0.0018402574023466017,
+ -0.0028694583032823347,
+ 0.003249729685005616,
+ -0.00340277054620028,
+ 0.0028927525886110798,
+ -0.0058754202804079175,
+ 0.0009349924265612791,
+ -0.00029069423934959806
+ ],
+ [
+ -0.00013796254756594703,
+ -7.113785778163367e-05,
+ -7.005393032934321e-05,
+ -0.00012567305130725142,
+ -7.55348144625171e-05,
+ -5.890789481852012e-05,
+ -0.00016787611941031008,
+ -0.0001681718530434264,
+ -0.0007577573777222754
+ ]
+ ],
+ "z_sin": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ -0.3682437645980358,
+ -0.010313325093545838,
+ 0.0008028627195158811,
+ 8.731728723274532e-05
+ ],
+ [
+ -0.00021820594976462235,
+ 0.00257055829045463,
+ -0.0127795602890544,
+ -0.05705253192342194,
+ 0.25012256718258646,
+ 0.012207198333313168,
+ 0.0340313223723876,
+ 0.0003576776007283744,
+ -0.0002845557347781906
+ ],
+ [
+ 0.0006250656697134858,
+ 0.0001351500860080824,
+ 0.02077775360036836,
+ -0.009487259768838647,
+ 0.023875626799357026,
+ 0.017665643652408247,
+ 0.03202330538363405,
+ -0.002402806268419791,
+ -0.00024556614589377266
+ ],
+ [
+ 0.0003909828647171795,
+ -0.0009267683930298048,
+ 0.002547132301658598,
+ 0.0018252150534381778,
+ 0.0025447442599817994,
+ -0.0006139539204201416,
+ 0.0040519500435168615,
+ -0.002119370745245054,
+ 0.0006644491009208615
+ ],
+ [
+ 0.0003314760556756545,
+ 0.0003278655337955934,
+ 0.00013156289898347628,
+ -2.5890394467765182e-05,
+ 0.0005822073549505646,
+ 3.0278685234777375e-05,
+ -0.0001386996202989576,
+ 0.0005453186709654603,
+ 0.00024046539821892854
+ ]
+ ],
+ "r_sin": null,
+ "z_cos": null,
+ "n_field_periods": 3,
+ "is_stellarator_symmetric": true
+ }
baselines/README.md CHANGED
@@ -33,7 +33,7 @@ This keeps the baseline on the real verifier path instead of relying on the olde
 - heuristic mean reward: `+5.2825`
 - random mean final `P1` score: `0.000000`
 - heuristic mean final `P1` score: `0.291951`
-- feasible high-fidelity finals: `0/5` random vs `5/5` heuristic
+- feasible submitted finals: `0/5` random vs `5/5` heuristic
 - heuristic wins: `5/5`
 
 The first baseline milestone is:
baselines/compare.py CHANGED
@@ -21,13 +21,13 @@ def main(n_episodes: int = 20) -> None:
 
     for i in range(n_episodes):
         rr, rt = random_episode(env, seed=i)
-        _require_submit_fidelity(rt[-1], baseline_name="random")
+        _require_successful_submit(rt[-1], baseline_name="random")
         random_rewards.append(rr)
         random_final_scores.append(rt[-1]["score"])
         random_feasible.append(1 if rt[-1]["constraints_satisfied"] else 0)
 
         hr, ht = heuristic_episode(env, seed=i)
-        _require_submit_fidelity(ht[-1], baseline_name="heuristic")
+        _require_successful_submit(ht[-1], baseline_name="heuristic")
        heuristic_rewards.append(hr)
         heuristic_final_scores.append(ht[-1]["score"])
         heuristic_feasible.append(1 if ht[-1]["constraints_satisfied"] else 0)
@@ -51,12 +51,14 @@
     print(f"Heuristic wins: {wins}/{n_episodes} episodes ({100 * wins / n_episodes:.0f}%)")
 
 
-def _require_submit_fidelity(final_step: dict[str, object], *, baseline_name: str) -> None:
-    fidelity = final_step["evaluation_fidelity"]
-    if fidelity != "high":
+def _require_successful_submit(final_step: dict[str, object], *, baseline_name: str) -> None:
+    action = final_step.get("action")
+    if action != "submit":
         raise ValueError(
-            f"{baseline_name} baseline ended on {fidelity!r} instead of high-fidelity submit."
+            f"{baseline_name} baseline ended on {action!r} instead of an explicit submit."
         )
+    if bool(final_step.get("evaluation_failed")):
+        raise ValueError(f"{baseline_name} baseline submit ended in evaluation failure.")
 
 
 if __name__ == "__main__":
baselines/fixture_high_fidelity_pairs.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "timestamp_utc": "2026-03-08T12:05:24.982605+00:00",
+  "timestamp_utc": "2026-03-08T15:21:04.110925+00:00",
   "n_field_periods": 3,
   "fixture_count": 3,
   "pass_count": 3,
baselines/heuristic_agent.py CHANGED
@@ -50,7 +50,7 @@ def heuristic_episode(
             "average_triangularity": obs.average_triangularity,
             "edge_iota_over_nfp": obs.edge_iota_over_nfp,
             "reward": obs.reward,
-            "failure": obs.evaluation_failed,
+            "evaluation_failed": obs.evaluation_failed,
         }
     )
 
baselines/high_fidelity_validation.py CHANGED
@@ -368,7 +368,7 @@ def _run_submit_trace(
 ) -> dict[str, Any]:
     env = StellaratorEnvironment()
     obs = env.reset(seed=seed)
-    initial_state = env.state
+    reset_params = env.state.current_params.model_dump()
     actions = _parse_submit_sequence(action_sequence)
 
     trace: list[dict[str, Any]] = [
@@ -387,7 +387,7 @@
             "budget_remaining": obs.budget_remaining,
             "evaluation_fidelity": obs.evaluation_fidelity,
             "done": obs.done,
-            "params": initial_state.current_params.model_dump(),
+            "params": reset_params,
         }
     ]
 
@@ -442,8 +442,6 @@
         "steps": trace,
         "final_best_low_fidelity_score": obs.best_low_fidelity_score,
         "final_best_low_fidelity_feasibility": obs.best_low_fidelity_feasibility,
-        "final_best_high_fidelity_score": obs.best_high_fidelity_score,
-        "final_best_high_fidelity_feasibility": obs.best_high_fidelity_feasibility,
         "final_diagnostics_text": obs.diagnostics_text,
     }
     _write_json(payload, trace_output)
baselines/replay_playtest.py CHANGED
@@ -180,7 +180,7 @@ EPISODE_5 = (
         _run("rotational_transform", "increase", "medium"),  # rt 1.5→1.6 (setup)
         _run("triangularity_scale", "increase", "medium"),  # tri 0.55→0.60 → cross feasibility
         _run("elongation", "decrease", "small"),  # feasible-side objective move
-        _submit(),  # explicit high-fidelity submit from feasible state
+        _submit(),  # explicit terminal submit from feasible state
     ],
 )
 
baselines/submit_side_trace.json CHANGED
@@ -1,14 +1,14 @@
 {
   "trace_label": "submit_side_manual",
   "trace_profile": "run:rotational_transform:increase:medium,run:triangularity_scale:increase:medium,run:elongation:decrease:small,submit",
-  "timestamp_utc": "2026-03-08T07:07:43.478814+00:00",
+  "timestamp_utc": "2026-03-08T15:15:56.795168+00:00",
   "n_field_periods": 3,
   "seed": 0,
-  "total_reward": 5.3296,
-  "final_score": 0.29605869964467535,
+  "total_reward": 6.1653,
+  "final_score": 0.2957311862720885,
   "final_feasibility": 0.0008652388718514148,
   "final_constraints_satisfied": true,
-  "final_evaluation_fidelity": "high",
+  "final_evaluation_fidelity": "low",
   "final_evaluation_failed": false,
   "steps": [
     {
@@ -37,7 +37,7 @@
       "step": 1,
       "intent": "run",
       "action": "rotational_transform increase medium",
-      "reward": -0.1,
+      "reward": -0.0688,
       "score": 0.0,
       "feasibility": 0.05065283822502309,
       "constraints_satisfied": false,
@@ -53,7 +53,7 @@
       "step": 2,
       "intent": "run",
       "action": "triangularity_scale increase medium",
-      "reward": 3.1533,
+      "reward": 4.1026,
       "score": 0.29165951078326,
       "feasibility": 0.0,
       "constraints_satisfied": true,
@@ -69,7 +69,7 @@
       "step": 3,
       "intent": "run",
       "action": "elongation decrease small",
-      "reward": 0.2665,
+      "reward": 0.3195,
       "score": 0.2957311862720885,
       "feasibility": 0.0008652388718514148,
       "constraints_satisfied": true,
@@ -85,22 +85,20 @@
       "step": 4,
       "intent": "submit",
       "action": "submit",
-      "reward": 2.0098,
-      "score": 0.29605869964467535,
+      "reward": 1.812,
+      "score": 0.2957311862720885,
       "feasibility": 0.0008652388718514148,
       "constraints_satisfied": true,
       "feasibility_delta": 0.0,
-      "score_delta": 0.00032751337258685176,
-      "max_elongation": 7.335471703197922,
+      "score_delta": 0.0,
+      "max_elongation": 7.338419323551204,
       "p1_feasibility": 0.0008652388718514148,
-      "budget_remaining": 3,
-      "evaluation_fidelity": "high",
+      "budget_remaining": 2,
+      "evaluation_fidelity": "low",
       "done": true
     }
   ],
   "final_best_low_fidelity_score": 0.2957311862720885,
   "final_best_low_fidelity_feasibility": 0.0008652388718514148,
-  "final_best_high_fidelity_score": 0.29605869964467535,
-  "final_best_high_fidelity_feasibility": 0.0008652388718514148,
-  "final_diagnostics_text": "Submitted current_high_fidelity_score=0.296059, best_high_fidelity_score=0.296059, best_high_fidelity_feasibility=0.000865, constraints=SATISFIED.\n\nevaluation_fidelity=high\nevaluation_status=OK\nmax_elongation=7.3355\naspect_ratio=3.2897 (<= 4.0)\naverage_triangularity=-0.4996 (<= -0.5)\nedge_iota_over_nfp=0.3030 (>= 0.3)\nfeasibility=0.000865\nbest_low_fidelity_score=0.295731\nbest_low_fidelity_feasibility=0.000865\nbest_high_fidelity_score=0.296059\nbest_high_fidelity_feasibility=0.000865\nvacuum_well=-0.8079\nconstraints=SATISFIED\nstep=4 | budget=3/6"
+  "final_diagnostics_text": "Submitted current_score=0.295731, best_score=0.295731, best_feasibility=0.000865, constraints=SATISFIED.\n\nevaluation_fidelity=low\nevaluation_status=OK\nmax_elongation=7.3384\naspect_ratio=3.2897 (<= 4.0)\naverage_triangularity=-0.4996 (<= -0.5)\nedge_iota_over_nfp=0.3030 (abs(.) >= 0.3)\nfeasibility=0.000865\naspect_ratio_violation=0.000000\ntriangularity_violation=0.000865\niota_violation=0.000000\ndominant_constraint=average_triangularity\nbest_low_fidelity_score=0.295731\nbest_low_fidelity_feasibility=0.000865\nno_progress_steps=0\nvacuum_well=-0.8067\nconstraints=SATISFIED\nstep=4 | budget=2/6\nreward_total=+1.8120\nreward_terms=terminal_improvement_bonus=+1.4787, terminal_budget_bonus=+0.3333\naction_clamped=False\naction_no_op=False\naction_repeat_state=False\nepisode_total_reward=+6.1653"
 }
baselines/sweep_results/measured_sweep_20260308T045600Z.json DELETED
@@ -1,1308 +0,0 @@
- {
- "analysis": {
- "total": 81,
- "evaluated": 63,
- "crashed": 18,
- "feasible": 0,
- "crash_rate": 0.2222222222222222,
- "feasibility_rate": 0.0
- },
- "results": [
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2992442990666656,
- "p1_score": 0.0,
- "max_elongation": 3.16130576121601,
- "aspect_ratio_out": 2.9974947988779532,
- "average_triangularity": -0.3772870380738901,
- "edge_iota_over_nfp": 0.2102267102800003,
- "vacuum_well": -0.7692596885396071
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.4090894053777488,
- "p1_score": 0.0,
- "max_elongation": 4.152094100549045,
- "aspect_ratio_out": 2.921555348457181,
- "average_triangularity": -0.47193262838245775,
- "edge_iota_over_nfp": 0.17727317838667536,
- "vacuum_well": -0.9367956816738985
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.5042302640839985,
- "p1_score": 0.0,
- "max_elongation": 5.535803883650764,
- "aspect_ratio_out": 2.8456158980364137,
- "average_triangularity": -0.5427662288172865,
- "edge_iota_over_nfp": 0.14873092077480043,
- "vacuum_well": -1.1406175925392996
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385222362,
- "p1_score": 0.0,
- "max_elongation": 4.121228842521552,
- "aspect_ratio_out": 2.997494798877953,
- "average_triangularity": -0.3772870380738882,
- "edge_iota_over_nfp": 0.3164284175757522,
- "vacuum_well": -0.8497809691028027
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.12086983493481807,
- "p1_score": 0.0,
- "max_elongation": 5.454537868016638,
- "aspect_ratio_out": 2.921555348457181,
- "average_triangularity": -0.4719326283824573,
- "edge_iota_over_nfp": 0.26373904951955457,
- "vacuum_well": -1.022090883293349
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.28449084880096587,
- "p1_score": 0.0,
- "max_elongation": 7.3486347080873395,
- "aspect_ratio_out": 2.845615898036414,
- "average_triangularity": -0.5427662288172859,
- "edge_iota_over_nfp": 0.21465274535971024,
- "vacuum_well": -1.2227198660107412
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385222062,
- "p1_score": 0.0,
- "max_elongation": 5.506917831126787,
- "aspect_ratio_out": 2.9974947988779492,
- "average_triangularity": -0.3772870380738897,
- "edge_iota_over_nfp": 0.4111851229739778,
- "vacuum_well": -0.8487582268420396
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.3756331947507288,
- "p1_score": 0.0,
- "max_elongation": 3.415282869637771,
- "aspect_ratio_out": 2.9873706820500727,
- "average_triangularity": -0.38276544931263445,
- "edge_iota_over_nfp": 0.18731004157478134,
- "vacuum_well": -0.7354702188161674
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.4698235036344714,
- "p1_score": 0.0,
- "max_elongation": 4.397190759490295,
- "aspect_ratio_out": 2.907634687818849,
- "average_triangularity": -0.47640349211453187,
- "edge_iota_over_nfp": 0.15905294890965857,
185
- "vacuum_well": -0.8791543482410266
186
- },
187
- {
188
- "aspect_ratio": 3.2,
189
- "elongation": 1.5,
190
- "rotational_transform": 1.2,
191
- "triangularity_scale": 0.7,
192
- "crashed": false,
193
- "failure_reason": "",
194
- "feasible": false,
195
- "p1_feasibility": 0.5428078225003115,
196
- "p1_score": 0.0,
197
- "max_elongation": 5.745675404632478,
198
- "aspect_ratio_out": 2.827898693587622,
199
- "average_triangularity": -0.5463927121513807,
200
- "edge_iota_over_nfp": 0.13715765324990656,
201
- "vacuum_well": -1.0487526494457182
202
- },
203
- {
204
- "aspect_ratio": 3.2,
205
- "elongation": 1.5,
206
- "rotational_transform": 1.55,
207
- "triangularity_scale": 0.4,
208
- "crashed": false,
209
- "failure_reason": "",
210
- "feasible": false,
211
- "p1_feasibility": 0.23446910137473032,
212
- "p1_score": 0.0,
213
- "max_elongation": 4.409780538549243,
214
- "aspect_ratio_out": 2.9873706820500683,
215
- "average_triangularity": -0.38276544931263484,
216
- "edge_iota_over_nfp": 0.2846591548163929,
217
- "vacuum_well": -0.7976954392526402
218
- },
219
- {
220
- "aspect_ratio": 3.2,
221
- "elongation": 1.5,
222
- "rotational_transform": 1.55,
223
- "triangularity_scale": 0.55,
224
- "crashed": false,
225
- "failure_reason": "",
226
- "feasible": false,
227
- "p1_feasibility": 0.19785195828765914,
228
- "p1_score": 0.0,
229
- "max_elongation": 5.717017037692497,
230
- "aspect_ratio_out": 2.907634687818846,
231
- "average_triangularity": -0.47640349211453215,
232
- "edge_iota_over_nfp": 0.24064441251370225,
233
- "vacuum_well": -0.9370717167513004
234
- },
235
- {
236
- "aspect_ratio": 3.2,
237
- "elongation": 1.5,
238
- "rotational_transform": 1.55,
239
- "triangularity_scale": 0.7,
240
- "crashed": false,
241
- "failure_reason": "",
242
- "feasible": false,
243
- "p1_feasibility": 0.31745507935684175,
244
- "p1_score": 0.0,
245
- "max_elongation": 7.524424461976794,
246
- "aspect_ratio_out": 2.827898693587622,
247
- "average_triangularity": -0.5463927121513805,
248
- "edge_iota_over_nfp": 0.20476347619294746,
249
- "vacuum_well": -1.0979141176158662
250
- },
251
- {
252
- "aspect_ratio": 3.2,
253
- "elongation": 1.5,
254
- "rotational_transform": 1.9,
255
- "triangularity_scale": 0.4,
256
- "crashed": false,
257
- "failure_reason": "",
258
- "feasible": false,
259
- "p1_feasibility": 0.2344691013747291,
260
- "p1_score": 0.0,
261
- "max_elongation": 5.828745051952865,
262
- "aspect_ratio_out": 2.98737068205007,
263
- "average_triangularity": -0.38276544931263545,
264
- "edge_iota_over_nfp": 0.37697102463283866,
265
- "vacuum_well": -0.7940349587136619
266
- },
267
- {
268
- "aspect_ratio": 3.2,
269
- "elongation": 1.5,
270
- "rotational_transform": 1.9,
271
- "triangularity_scale": 0.55,
272
- "crashed": true,
273
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
274
- "feasible": false,
275
- "p1_feasibility": 1000000.0,
276
- "p1_score": 0.0,
277
- "max_elongation": 10.0,
278
- "aspect_ratio_out": 0.0,
279
- "average_triangularity": 0.0,
280
- "edge_iota_over_nfp": 0.0,
281
- "vacuum_well": 0.0
282
- },
283
- {
284
- "aspect_ratio": 3.2,
285
- "elongation": 1.5,
286
- "rotational_transform": 1.9,
287
- "triangularity_scale": 0.7,
288
- "crashed": true,
289
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
290
- "feasible": false,
291
- "p1_feasibility": 1000000.0,
292
- "p1_score": 0.0,
293
- "max_elongation": 10.0,
294
- "aspect_ratio_out": 0.0,
295
- "average_triangularity": 0.0,
296
- "edge_iota_over_nfp": 0.0,
297
- "vacuum_well": 0.0
298
- },
299
- {
300
- "aspect_ratio": 3.2,
301
- "elongation": 1.8,
302
- "rotational_transform": 1.2,
303
- "triangularity_scale": 0.4,
304
- "crashed": false,
305
- "failure_reason": "",
306
- "feasible": false,
307
- "p1_feasibility": 0.41572876893682525,
308
- "p1_score": 0.0,
309
- "max_elongation": 3.643871627883724,
310
- "aspect_ratio_out": 2.9727492396200192,
311
- "average_triangularity": -0.3900787346843175,
312
- "edge_iota_over_nfp": 0.1752813693189524,
313
- "vacuum_well": -0.7062026867261749
314
- },
315
- {
316
- "aspect_ratio": 3.2,
317
- "elongation": 1.8,
318
- "rotational_transform": 1.2,
319
- "triangularity_scale": 0.55,
320
- "crashed": false,
321
- "failure_reason": "",
322
- "feasible": false,
323
- "p1_feasibility": 0.511263141576597,
324
- "p1_score": 0.0,
325
- "max_elongation": 4.6308076495645185,
326
- "aspect_ratio_out": 2.8875302044775286,
327
- "average_triangularity": -0.4823961090018443,
328
- "edge_iota_over_nfp": 0.1466210575270209,
329
- "vacuum_well": -0.8495335820014714
330
- },
331
- {
332
- "aspect_ratio": 3.2,
333
- "elongation": 1.8,
334
- "rotational_transform": 1.2,
335
- "triangularity_scale": 0.7,
336
- "crashed": false,
337
- "failure_reason": "",
338
- "feasible": false,
339
- "p1_feasibility": 0.575134593927855,
340
- "p1_score": 0.0,
341
- "max_elongation": 5.983792904658967,
342
- "aspect_ratio_out": 2.802311169335037,
343
- "average_triangularity": -0.5512332332730122,
344
- "edge_iota_over_nfp": 0.12745962182164347,
345
- "vacuum_well": -1.0099648537211192
346
- },
347
- {
348
- "aspect_ratio": 3.2,
349
- "elongation": 1.8,
350
- "rotational_transform": 1.55,
351
- "triangularity_scale": 0.4,
352
- "crashed": false,
353
- "failure_reason": "",
354
- "feasible": false,
355
- "p1_feasibility": 0.2198425306313636,
356
- "p1_score": 0.0,
357
- "max_elongation": 4.658846779436034,
358
- "aspect_ratio_out": 2.9727492396200206,
359
- "average_triangularity": -0.3900787346843182,
360
- "edge_iota_over_nfp": 0.260614756973633,
361
- "vacuum_well": -0.7619073642576293
362
- },
363
- {
364
- "aspect_ratio": 3.2,
365
- "elongation": 1.8,
366
- "rotational_transform": 1.55,
367
- "triangularity_scale": 0.55,
368
- "crashed": false,
369
- "failure_reason": "",
370
- "feasible": false,
371
- "p1_feasibility": 0.2548347634174995,
372
- "p1_score": 0.0,
373
- "max_elongation": 5.96315599581871,
374
- "aspect_ratio_out": 2.8875302044775286,
375
- "average_triangularity": -0.48239610900184327,
376
- "edge_iota_over_nfp": 0.22354957097475014,
377
- "vacuum_well": -0.8874237224309117
378
- },
379
- {
380
- "aspect_ratio": 3.2,
381
- "elongation": 1.8,
382
- "rotational_transform": 1.55,
383
- "triangularity_scale": 0.7,
384
- "crashed": false,
385
- "failure_reason": "",
386
- "feasible": false,
387
- "p1_feasibility": 0.32546519894675746,
388
- "p1_score": 0.0,
389
- "max_elongation": 7.752053893932858,
390
- "aspect_ratio_out": 2.8023111693350367,
391
- "average_triangularity": -0.5512332332730115,
392
- "edge_iota_over_nfp": 0.20236044031597275,
393
- "vacuum_well": -1.025387025277758
394
- },
395
- {
396
- "aspect_ratio": 3.2,
397
- "elongation": 1.8,
398
- "rotational_transform": 1.9,
399
- "triangularity_scale": 0.4,
400
- "crashed": false,
401
- "failure_reason": "",
402
- "feasible": false,
403
- "p1_feasibility": 0.21984253063136272,
404
- "p1_score": 0.0,
405
- "max_elongation": 6.09849446743415,
406
- "aspect_ratio_out": 2.9727492396200192,
407
- "average_triangularity": -0.39007873468431864,
408
- "edge_iota_over_nfp": 0.34816514339419125,
409
- "vacuum_well": -0.7642647530236962
410
- },
411
- {
412
- "aspect_ratio": 3.2,
413
- "elongation": 1.8,
414
- "rotational_transform": 1.9,
415
- "triangularity_scale": 0.55,
416
- "crashed": true,
417
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
418
- "feasible": false,
419
- "p1_feasibility": 1000000.0,
420
- "p1_score": 0.0,
421
- "max_elongation": 10.0,
422
- "aspect_ratio_out": 0.0,
423
- "average_triangularity": 0.0,
424
- "edge_iota_over_nfp": 0.0,
425
- "vacuum_well": 0.0
426
- },
427
- {
428
- "aspect_ratio": 3.2,
429
- "elongation": 1.8,
430
- "rotational_transform": 1.9,
431
- "triangularity_scale": 0.7,
432
- "crashed": true,
433
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
434
- "feasible": false,
435
- "p1_feasibility": 1000000.0,
436
- "p1_score": 0.0,
437
- "max_elongation": 10.0,
438
- "aspect_ratio_out": 0.0,
439
- "average_triangularity": 0.0,
440
- "edge_iota_over_nfp": 0.0,
441
- "vacuum_well": 0.0
442
- },
443
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2454259238522214,
- "p1_score": 0.0,
- "max_elongation": 3.3941794666037,
- "aspect_ratio_out": 3.297494798877951,
- "average_triangularity": -0.3772870380738893,
- "edge_iota_over_nfp": 0.25020340071179015,
- "vacuum_well": -0.664703585980881
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.29748701762857843,
- "p1_score": 0.0,
- "max_elongation": 4.453965370200011,
- "aspect_ratio_out": 3.221555348457185,
- "average_triangularity": -0.47193262838245753,
- "edge_iota_over_nfp": 0.21075389471142647,
- "vacuum_well": -0.7999402027558562
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.41176221731566515,
- "p1_score": 0.0,
- "max_elongation": 5.935449908790214,
- "aspect_ratio_out": 3.1456158980364113,
- "average_triangularity": -0.5427662288172873,
- "edge_iota_over_nfp": 0.17647133480530044,
- "vacuum_well": -0.9605409707647019
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2454259238522163,
- "p1_score": 0.0,
- "max_elongation": 4.576580280082074,
- "aspect_ratio_out": 3.2974947988779504,
- "average_triangularity": -0.37728703807389186,
- "edge_iota_over_nfp": 0.36414107926306327,
- "vacuum_well": -0.7128435653443117
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.05613474323508394,
- "p1_score": 0.0,
- "max_elongation": 6.047857576381886,
- "aspect_ratio_out": 3.2215553484571795,
- "average_triangularity": -0.47193262838245803,
- "edge_iota_over_nfp": 0.3054435838875495,
- "vacuum_well": -0.8521401674735174
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.1656236239017676,
- "p1_score": 0.0,
- "max_elongation": 8.134786960902295,
- "aspect_ratio_out": 3.1456158980364153,
- "average_triangularity": -0.5427662288172861,
- "edge_iota_over_nfp": 0.2503129128294697,
- "vacuum_well": -1.0097109038318997
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2454259238522143,
- "p1_score": 0.0,
- "max_elongation": 6.330394186717037,
- "aspect_ratio_out": 3.297494798877949,
- "average_triangularity": -0.37728703807389286,
- "edge_iota_over_nfp": 0.46201189807837234,
- "vacuum_well": -0.6837429974221736
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2552162867390878,
- "p1_score": 0.0,
- "max_elongation": 3.645500992042136,
- "aspect_ratio_out": 3.287370682050072,
- "average_triangularity": -0.3827654493126362,
- "edge_iota_over_nfp": 0.22343511397827365,
- "vacuum_well": -0.6309996936433193
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.3652585910461875,
- "p1_score": 0.0,
- "max_elongation": 4.683962785437232,
- "aspect_ratio_out": 3.2076346878188455,
- "average_triangularity": -0.47640349211453326,
- "edge_iota_over_nfp": 0.19042242268614373,
- "vacuum_well": -0.7435543314082094
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.45145912595142856,
- "p1_score": 0.0,
- "max_elongation": 6.1087362987510785,
- "aspect_ratio_out": 3.1278986935876247,
- "average_triangularity": -0.5463927121513825,
- "edge_iota_over_nfp": 0.16456226221457143,
- "vacuum_well": -0.8743755327281046
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.23446910137472687,
- "p1_score": 0.0,
- "max_elongation": 4.869439306707605,
- "aspect_ratio_out": 3.287370682050072,
- "average_triangularity": -0.38276544931263656,
- "edge_iota_over_nfp": 0.3311741122924938,
- "vacuum_well": -0.6687744150602308
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.05509465022700962,
- "p1_score": 0.0,
- "max_elongation": 6.294387064409056,
- "aspect_ratio_out": 3.2076346878188478,
- "average_triangularity": -0.47640349211453337,
- "edge_iota_over_nfp": 0.2834716049318971,
- "vacuum_well": -0.7817459617243165
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.18795354892189217,
- "p1_score": 0.0,
- "max_elongation": 8.259847119894143,
- "aspect_ratio_out": 3.127898693587626,
- "average_triangularity": -0.5463927121513834,
- "edge_iota_over_nfp": 0.24361393532343234,
- "vacuum_well": -0.9101645505111563
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.23446910137472798,
- "p1_score": 0.0,
- "max_elongation": 6.666881773484368,
- "aspect_ratio_out": 3.2873706820500743,
- "average_triangularity": -0.382765449312636,
- "edge_iota_over_nfp": 0.4272117184895036,
- "vacuum_well": -0.6343193410902149
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.31238226470825553,
- "p1_score": 0.0,
- "max_elongation": 3.8692530187501073,
- "aspect_ratio_out": 3.272749239620019,
- "average_triangularity": -0.390078734684318,
- "edge_iota_over_nfp": 0.20628532058752333,
- "vacuum_well": -0.6043041164364112
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.4126167583963611,
- "p1_score": 0.0,
- "max_elongation": 4.903513329259712,
- "aspect_ratio_out": 3.1875302044775253,
- "average_triangularity": -0.482396109001844,
- "edge_iota_over_nfp": 0.17621497248109166,
- "vacuum_well": -0.7113384768069663
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.4733786617465044,
- "p1_score": 0.0,
- "max_elongation": 6.318251319708043,
- "aspect_ratio_out": 3.1023111693350365,
- "average_triangularity": -0.551233233273011,
- "edge_iota_over_nfp": 0.15798640147604867,
- "vacuum_well": -0.8271583951674183
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.21984253063136583,
- "p1_score": 0.0,
- "max_elongation": 5.118240837608963,
- "aspect_ratio_out": 3.272749239620021,
- "average_triangularity": -0.3900787346843171,
- "edge_iota_over_nfp": 0.30467031557444413,
- "vacuum_well": -0.6428347405255894
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.1135333246675809,
- "p1_score": 0.0,
- "max_elongation": 6.52496520982626,
- "aspect_ratio_out": 3.1875302044775293,
- "average_triangularity": -0.4823961090018447,
- "edge_iota_over_nfp": 0.2659400025997257,
- "vacuum_well": -0.7422184942989336
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.19066188850017865,
- "p1_score": 0.0,
- "max_elongation": 8.453464204422819,
- "aspect_ratio_out": 3.102311169335036,
- "average_triangularity": -0.5512332332730117,
- "edge_iota_over_nfp": 0.2428014334499464,
- "vacuum_well": -0.8527496798204878
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.21984253063136494,
- "p1_score": 0.0,
- "max_elongation": 6.9463560702338905,
- "aspect_ratio_out": 3.272749239620025,
- "average_triangularity": -0.39007873468431753,
- "edge_iota_over_nfp": 0.3976618109725794,
- "vacuum_well": -0.6148108774395119
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385222006,
- "p1_score": 0.0,
- "max_elongation": 3.655092009945489,
- "aspect_ratio_out": 3.597494798877952,
- "average_triangularity": -0.37728703807388997,
- "edge_iota_over_nfp": 0.2893199762541339,
- "vacuum_well": -0.5782263807621896
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.18656815973467256,
- "p1_score": 0.0,
- "max_elongation": 4.791820749012815,
- "aspect_ratio_out": 3.5215553484571793,
- "average_triangularity": -0.47193262838245903,
- "edge_iota_over_nfp": 0.24402955207959823,
- "vacuum_well": -0.6901925400158998
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.31951499196875227,
- "p1_score": 0.0,
- "max_elongation": 6.382367050051349,
- "aspect_ratio_out": 3.4456158980364155,
- "average_triangularity": -0.5427662288172861,
- "edge_iota_over_nfp": 0.20414550240937432,
- "vacuum_well": -0.8206695782389528
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385221906,
- "p1_score": 0.0,
- "max_elongation": 5.094101658298537,
- "aspect_ratio_out": 3.5974947988779453,
- "average_triangularity": -0.37728703807389047,
- "edge_iota_over_nfp": 0.40945734796781447,
- "vacuum_well": -0.5981804595302792
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.05613474323508627,
- "p1_score": 0.0,
- "max_elongation": 6.721805873813851,
- "aspect_ratio_out": 3.5215553484571824,
- "average_triangularity": -0.47193262838245686,
- "edge_iota_over_nfp": 0.34411106640849015,
- "vacuum_well": -0.7119763601489589
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.055638380310600866,
- "p1_score": 0.0,
- "max_elongation": 9.030344748230837,
- "aspect_ratio_out": 3.445615898036414,
- "average_triangularity": -0.5427662288172874,
- "edge_iota_over_nfp": 0.28330848590681973,
- "vacuum_well": -0.8388909971694075
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385222373,
- "p1_score": 0.0,
- "max_elongation": 7.330392814766231,
- "aspect_ratio_out": 3.5974947988779458,
- "average_triangularity": -0.37728703807388814,
- "edge_iota_over_nfp": 0.5044776421055188,
- "vacuum_well": -0.5662857208360687
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.23446910137472843,
- "p1_score": 0.0,
- "max_elongation": 3.9082135297904603,
- "aspect_ratio_out": 3.5873706820500715,
- "average_triangularity": -0.3827654493126358,
- "edge_iota_over_nfp": 0.2601408639197091,
- "vacuum_well": -0.546989318570786
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.25770258074325547,
- "p1_score": 0.0,
- "max_elongation": 5.011193704084697,
- "aspect_ratio_out": 3.507634687818846,
- "average_triangularity": -0.47640349211453364,
- "edge_iota_over_nfp": 0.22268922577702335,
- "vacuum_well": -0.6378508849467094
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.3563685202429026,
- "p1_score": 0.0,
- "max_elongation": 6.523908903139856,
- "aspect_ratio_out": 3.427898693587626,
- "average_triangularity": -0.5463927121513822,
- "edge_iota_over_nfp": 0.1930894439271292,
- "vacuum_well": -0.7422835451739921
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.5,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.23446910137472743,
1076
- "p1_score": 0.0,
1077
- "max_elongation": 5.4000662498579395,
1078
- "aspect_ratio_out": 3.58737068205007,
1079
- "average_triangularity": -0.3827654493126363,
1080
- "edge_iota_over_nfp": 0.37576442477917044,
1081
- "vacuum_well": -0.5593363795034076
1082
- },
1083
- {
1084
- "aspect_ratio": 3.8,
1085
- "elongation": 1.5,
1086
- "rotational_transform": 1.55,
1087
- "triangularity_scale": 0.55,
1088
- "crashed": false,
1089
- "failure_reason": "",
1090
- "feasible": false,
1091
- "p1_feasibility": 0.047193015770934044,
1092
- "p1_score": 0.0,
1093
- "max_elongation": 6.95917543501653,
1094
- "aspect_ratio_out": 3.5076346878188462,
1095
- "average_triangularity": -0.476403492114533,
1096
- "edge_iota_over_nfp": 0.32390835631237563,
1097
- "vacuum_well": -0.6531086645842118
1098
- },
1099
- {
1100
- "aspect_ratio": 3.8,
1101
- "elongation": 1.5,
1102
- "rotational_transform": 1.55,
1103
- "triangularity_scale": 0.7,
1104
- "crashed": false,
1105
- "failure_reason": "",
1106
- "feasible": false,
1107
- "p1_feasibility": 0.06608766640438413,
1108
- "p1_score": 0.0,
1109
- "max_elongation": 9.110492371135232,
1110
- "aspect_ratio_out": 3.4278986935876268,
1111
- "average_triangularity": -0.5463927121513822,
1112
- "edge_iota_over_nfp": 0.28017370007868475,
1113
- "vacuum_well": -0.7564369584107291
1114
- },
1115
- {
1116
- "aspect_ratio": 3.8,
1117
- "elongation": 1.5,
1118
- "rotational_transform": 1.9,
1119
- "triangularity_scale": 0.4,
1120
- "crashed": false,
1121
- "failure_reason": "",
1122
- "feasible": false,
1123
- "p1_feasibility": 0.23446910137472732,
1124
- "p1_score": 0.0,
1125
- "max_elongation": 7.677673329183981,
1126
- "aspect_ratio_out": 3.5873706820500697,
1127
- "average_triangularity": -0.38276544931263634,
1128
- "edge_iota_over_nfp": 0.4707294226314962,
1129
- "vacuum_well": -0.5146202191204641
1130
- },
1131
- {
1132
- "aspect_ratio": 3.8,
1133
- "elongation": 1.5,
1134
- "rotational_transform": 1.9,
1135
- "triangularity_scale": 0.55,
1136
- "crashed": true,
1137
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
1138
- "feasible": false,
1139
- "p1_feasibility": 1000000.0,
1140
- "p1_score": 0.0,
1141
- "max_elongation": 10.0,
1142
- "aspect_ratio_out": 0.0,
1143
- "average_triangularity": 0.0,
1144
- "edge_iota_over_nfp": 0.0,
1145
- "vacuum_well": 0.0
1146
- },
1147
- {
1148
- "aspect_ratio": 3.8,
1149
- "elongation": 1.5,
1150
- "rotational_transform": 1.9,
1151
- "triangularity_scale": 0.7,
1152
- "crashed": true,
1153
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
1154
- "feasible": false,
1155
- "p1_feasibility": 1000000.0,
1156
- "p1_score": 0.0,
1157
- "max_elongation": 10.0,
1158
- "aspect_ratio_out": 0.0,
1159
- "average_triangularity": 0.0,
1160
- "edge_iota_over_nfp": 0.0,
1161
- "vacuum_well": 0.0
1162
- },
1163
- {
1164
- "aspect_ratio": 3.8,
1165
- "elongation": 1.8,
1166
- "rotational_transform": 1.2,
1167
- "triangularity_scale": 0.4,
1168
- "crashed": false,
1169
- "failure_reason": "",
1170
- "feasible": false,
1171
- "p1_feasibility": 0.21984253063136272,
1172
- "p1_score": 0.0,
1173
- "max_elongation": 4.1304272049856765,
1174
- "aspect_ratio_out": 3.572749239620019,
1175
- "average_triangularity": -0.39007873468431864,
1176
- "edge_iota_over_nfp": 0.23918988633893049,
1177
- "vacuum_well": -0.5244590200054563
1178
- },
1179
- {
1180
- "aspect_ratio": 3.8,
1181
- "elongation": 1.8,
1182
- "rotational_transform": 1.2,
1183
- "triangularity_scale": 0.55,
1184
- "crashed": false,
1185
- "failure_reason": "",
1186
- "feasible": false,
1187
- "p1_feasibility": 0.30941253114225103,
1188
- "p1_score": 0.0,
1189
- "max_elongation": 5.219891723892629,
1190
- "aspect_ratio_out": 3.4875302044775336,
1191
- "average_triangularity": -0.48239610900184327,
1192
- "edge_iota_over_nfp": 0.20717624065732468,
1193
- "vacuum_well": -0.6074137302875826
1194
- },
1195
- {
1196
- "aspect_ratio": 3.8,
1197
- "elongation": 1.8,
1198
- "rotational_transform": 1.2,
1199
- "triangularity_scale": 0.7,
1200
- "crashed": false,
1201
- "failure_reason": "",
1202
- "feasible": false,
1203
- "p1_feasibility": 0.3721181780375554,
1204
- "p1_score": 0.0,
1205
- "max_elongation": 6.709958387787289,
1206
- "aspect_ratio_out": 3.4023111693350363,
1207
- "average_triangularity": -0.551233233273013,
1208
- "edge_iota_over_nfp": 0.18836454658873336,
1209
- "vacuum_well": -0.6969799270812338
1210
- },
1211
- {
1212
- "aspect_ratio": 3.8,
1213
- "elongation": 1.8,
1214
- "rotational_transform": 1.55,
1215
- "triangularity_scale": 0.4,
1216
- "crashed": false,
1217
- "failure_reason": "",
1218
- "feasible": false,
1219
- "p1_feasibility": 0.21984253063136006,
1220
- "p1_score": 0.0,
1221
- "max_elongation": 5.655882151980431,
1222
- "aspect_ratio_out": 3.5727492396200193,
1223
- "average_triangularity": -0.39007873468431997,
1224
- "edge_iota_over_nfp": 0.3476959386568659,
1225
- "vacuum_well": -0.5401610577187229
1226
- },
1227
- {
1228
- "aspect_ratio": 3.8,
1229
- "elongation": 1.8,
1230
- "rotational_transform": 1.55,
1231
- "triangularity_scale": 0.55,
1232
- "crashed": false,
1233
- "failure_reason": "",
1234
- "feasible": false,
1235
- "p1_feasibility": 0.03520778199631258,
1236
- "p1_score": 0.0,
1237
- "max_elongation": 7.180648727079949,
1238
- "aspect_ratio_out": 3.4875302044775283,
1239
- "average_triangularity": -0.4823961090018437,
1240
- "edge_iota_over_nfp": 0.30698149307999983,
1241
- "vacuum_well": -0.6211518335261255
1242
- },
1243
- {
1244
- "aspect_ratio": 3.8,
1245
- "elongation": 1.8,
1246
- "rotational_transform": 1.55,
1247
- "triangularity_scale": 0.7,
1248
- "crashed": false,
1249
- "failure_reason": "",
1250
- "feasible": false,
1251
- "p1_feasibility": 0.06277313159962161,
1252
- "p1_score": 0.0,
1253
- "max_elongation": 9.276836759242448,
1254
- "aspect_ratio_out": 3.402311169335038,
1255
- "average_triangularity": -0.5512332332730103,
1256
- "edge_iota_over_nfp": 0.2811680605201135,
1257
- "vacuum_well": -0.7118470668886681
1258
- },
1259
- {
1260
- "aspect_ratio": 3.8,
1261
- "elongation": 1.8,
1262
- "rotational_transform": 1.9,
1263
- "triangularity_scale": 0.4,
1264
- "crashed": false,
1265
- "failure_reason": "",
1266
- "feasible": false,
1267
- "p1_feasibility": 0.2198425306313675,
1268
- "p1_score": 0.0,
1269
- "max_elongation": 7.969505345435814,
1270
- "aspect_ratio_out": 3.572749239620025,
1271
- "average_triangularity": -0.39007873468431625,
1272
- "edge_iota_over_nfp": 0.44118859445417574,
1273
- "vacuum_well": -0.4933392101193987
1274
- },
1275
- {
1276
- "aspect_ratio": 3.8,
1277
- "elongation": 1.8,
1278
- "rotational_transform": 1.9,
1279
- "triangularity_scale": 0.55,
1280
- "crashed": true,
1281
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
1282
- "feasible": false,
1283
- "p1_feasibility": 1000000.0,
1284
- "p1_score": 0.0,
1285
- "max_elongation": 10.0,
1286
- "aspect_ratio_out": 0.0,
1287
- "average_triangularity": 0.0,
1288
- "edge_iota_over_nfp": 0.0,
1289
- "vacuum_well": 0.0
1290
- },
1291
- {
1292
- "aspect_ratio": 3.8,
1293
- "elongation": 1.8,
1294
- "rotational_transform": 1.9,
1295
- "triangularity_scale": 0.7,
1296
- "crashed": true,
1297
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
1298
- "feasible": false,
1299
- "p1_feasibility": 1000000.0,
1300
- "p1_score": 0.0,
1301
- "max_elongation": 10.0,
1302
- "aspect_ratio_out": 0.0,
1303
- "average_triangularity": 0.0,
1304
- "edge_iota_over_nfp": 0.0,
1305
- "vacuum_well": 0.0
1306
- }
1307
- ]
1308
- }
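
The deleted fixture above is a grid sweep over the four knobs (`aspect_ratio`, `elongation`, `rotational_transform`, `triangularity_scale`), where crashed VMEC runs are recorded with sentinel values (`p1_feasibility` of `1000000.0`, zeroed metrics). A minimal stdlib sketch of how such sweep records can be summarized; `summarize_sweep` and the inline sample records are illustrative, not repository code:

```python
import json

# Minimal sketch (not repository code): summarize sweep records that use
# the fields shown in the deleted fixture above.
def summarize_sweep(records):
    """Report crash rate and the best (lowest) `p1_feasibility` among
    runs that completed, plus the knob settings that produced it."""
    completed = [r for r in records if not r["crashed"]]
    best = min(completed, key=lambda r: r["p1_feasibility"]) if completed else None
    knobs = ("aspect_ratio", "elongation", "rotational_transform", "triangularity_scale")
    return {
        "n_records": len(records),
        "crash_rate": sum(r["crashed"] for r in records) / len(records) if records else 0.0,
        "best_p1_feasibility": best["p1_feasibility"] if best else None,
        "best_knobs": {k: best[k] for k in knobs} if best else None,
    }

# Two inline sample records mirroring the fixture's shape.
records = [
    {"aspect_ratio": 3.8, "elongation": 1.5, "rotational_transform": 1.55,
     "triangularity_scale": 0.55, "crashed": False, "p1_feasibility": 0.0472},
    {"aspect_ratio": 3.8, "elongation": 1.5, "rotational_transform": 1.9,
     "triangularity_scale": 0.55, "crashed": True, "p1_feasibility": 1000000.0},
]
print(json.dumps(summarize_sweep(records), indent=2))
```

The crashed-run sentinels make the `crashed` flag, not the feasibility value, the reliable filter; that is why the sketch filters on `crashed` before taking the minimum.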
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -26,8 +26,8 @@ Completed:
  - `P1` is locked as the single benchmark task
  - the repaired 4-knob low-dimensional runtime is live in code
  - the official `constellaration` verifier path is wired
- - low-fidelity `run` and high-fidelity `submit` are separated clearly
- - terminal scoring and reporting are fidelity-consistent
+ - the live environment is now unified onto one low-fidelity reward and verifier surface
+ - `submit` remains an explicit terminal action on that same live contract
  - explicit VMEC failure semantics are implemented
  - the Northflank smoke workflow is committed
  - the Northflank smoke test passed on the team H100
@@ -45,15 +45,15 @@ Still open:
  - decision on whether reset-seed pool should change from paired checks
  - HF Space deployment evidence
  - public Colab mirror or notebook submission link, if the submission surface still requires it
- - before/after trained-policy evidence on the current low-fidelity-only workflow
+ - before/after trained-policy evidence on the current unified low-fidelity workflow
  - demo and README polish after the artifacts are real
 
  Current caution:
 
  - do not present repaired-family ranges, deltas, or budget choices as settled defaults until the measured sweep is recorded
  - do not narrate low-fidelity rollout metrics as final submission truth
- - the standard notebook and `training/llm_rollout.py` `monitor` / `evaluate` paths now stay on low-fidelity `run` only and ignore `submit` by default
- - reserve VMEC-backed `submit` for replay/debug work, paired fixture checks, submit-side traces, and final evidence
+ - the standard notebook and `training/llm_rollout.py` paths should stay on the same live low-fidelity contract as the environment, including explicit `submit`
+ - reserve higher-fidelity validation for paired fixture checks, offline validation scripts, and final evidence
 
  ## 3. Locked Decisions
 
@@ -113,7 +113,7 @@ Compute surfaces:
  - Northflank is the main compute workspace for verifier-heavy work
  - HF Space is the hosted environment surface
  - the public notebook artifact should show trained-policy behavior against the live environment and can be mirrored to Colab if the submission form still requires it
- - trained-policy work should still iterate on low-fidelity `run`; use high-fidelity `submit` only for sparse checkpoint evaluation and final evidence
+ - trained-policy work should iterate on the same live low-fidelity environment contract that will be demoed publicly
 
  Evidence order:
 
@@ -135,21 +135,20 @@ The environment contract must stay narrow and legible:
 
  - one repaired low-dimensional boundary family derived from a rotating-ellipse seed
  - discrete `run | submit | restore_best` interaction
- - low-fidelity verifier for normal steps
- - high-fidelity verifier for `submit`
+ - one low-fidelity verifier surface for all live environment actions
  - readable observation surface with explicit fidelity labeling
- - `Reward V1` kept verifier-native and repair-first, with official normalized violation telemetry
+ - `Reward V2` keeps the verifier-native `Reward V1` core and adds small best-so-far / anti-stagnation shaping for the low-fi repair loop
 
  The live technical details belong in [`P1_ENV_CONTRACT_V1.md`](P1_ENV_CONTRACT_V1.md), not here.
 
  ## 8. Execution Order
 
  - [x] Run a tiny low-fidelity PPO smoke pass and stop after a few trajectories once it reveals either readable behavior or one clear failure mode.
- - [x] Pair the tracked low-fidelity fixtures with high-fidelity submit checks immediately after the PPO smoke pass.
+ - [x] Pair the tracked low-fidelity fixtures with higher-fidelity validation checks immediately after the PPO smoke pass.
  - [ ] Decide whether the reset pool should change based on the measured sweep plus those paired checks.
  - [x] Run at least one submit-side manual trace, then expand to 5 to 10 episodes and record the first real confusion point, exploit, or reward pathology.
- - [ ] Save one fixed-seed untrained baseline with the low-fidelity-only `training/llm_rollout.py evaluate` workflow.
- - [ ] Run one short H100 GRPO pass with the repository notebook on that same low-fidelity-only workflow.
+ - [ ] Save one fixed-seed untrained baseline with the unified live `training/llm_rollout.py evaluate` workflow.
+ - [ ] Run one short H100 GRPO pass with the repository notebook on that same unified low-fidelity workflow.
  - [ ] Re-run the same seeds after training and save one before/after artifact.
  - [ ] Adjust reward or penalties only if playtesting exposes a concrete problem.
  - [x] Refresh the heuristic baseline using the repaired-family evidence.
@@ -203,7 +202,7 @@ Gate 9: trained-policy evidence is real
 
  - one fixed-seed untrained baseline exists
  - one short low-fidelity training pass exists on the same workflow
- - the repo can show a before/after comparison on the same seeds without relying on `submit`
+ - the repo can show a before/after comparison on the same seeds using the live environment contract, including `submit`
 
  ## 10. Fallback Rules
 
@@ -211,8 +210,8 @@ If training evidence is weak:
 
  - keep claims conservative about policy quality
  - still ship a trained-policy demonstration and document its limitations plainly
- - do not skip the paired high-fidelity checks or submit-side manual trace
- - do not swap back to submit-included reward traces and present them as the current GRPO path
+ - do not skip the paired higher-fidelity validation artifacts
+ - do not split the notebook back onto a different submit contract than the live environment
 
  If HF Space deployment is delayed:
 
@@ -239,7 +238,7 @@ If the repaired family is too easy:
  - [x] Check in tracked fixtures.
  - [x] Record the first manual playtest log.
  - [x] Run a tiny low-fidelity PPO smoke pass and save a few trajectories.
- - [x] Pair the tracked fixtures with high-fidelity submit checks.
+ - [x] Pair the tracked fixtures with higher-fidelity validation checks.
  - [x] Record one submit-side manual trace.
  - [x] Refresh the heuristic baseline from that playtest evidence.
  - [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
  - [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
docs/P1_ENV_CONTRACT_V1.md CHANGED
@@ -34,7 +34,8 @@ Official verifier owns:
34
  - boundary in, metrics out
35
  - official `P1` feasibility semantics
36
  - objective direction and score ordering
37
- - low-fidelity and high-fidelity evaluation modes
 
38
  - explicit failure results when VMEC or forward-model evaluation fails
39
 
40
  Environment owns:
@@ -105,10 +106,9 @@ Required fields:
105
  - `failure_reason`
106
  - `step_number`
107
  - `budget_remaining`
 
108
  - `best_low_fidelity_score`
109
  - `best_low_fidelity_feasibility`
110
- - `best_high_fidelity_score`
111
- - `best_high_fidelity_feasibility`
112
  - `target_spec`
113
  - `diagnostics_text`
114
  - `reward_breakdown`
@@ -118,14 +118,14 @@ Required fields:
118
 
119
  Interpretation rules:
120
 
121
- - low-fidelity `run` metrics must be labeled as low-fidelity
122
- - high-fidelity `submit` metrics must be labeled as high-fidelity
123
- - low-fidelity and high-fidelity best-state reporting must stay separate
124
  - the observation must be understandable without hidden state
125
  - normalized constraint-violation telemetry must follow the official `P1` constraint scales
126
  - the dominant active constraint must be visible so a human can explain repair-phase rewards
127
  - reward telemetry must expose which bonuses, penalties, and shaping terms contributed to the scalar reward
128
- - action telemetry must expose parameter values before and after the action, including clamped and no-op moves
 
129
 
130
  ## 6. Episode Flow
131
 
@@ -133,7 +133,7 @@ Interpretation rules:
133
  2. Evaluate the initial state with low fidelity and return the first observation.
134
  3. On `run`, perturb one controllable parameter and re-evaluate with low fidelity.
135
  4. On `restore_best`, revert to the best known low-fidelity state, re-evaluate, and consume budget.
136
- 5. On `submit`, end the episode and run the high-fidelity submit evaluation.
137
  6. End the episode on `submit` or budget exhaustion.
138
 
139
  Failure semantics:
@@ -155,8 +155,8 @@ At termination, the environment must provide:
155
 
156
  Terminal reporting rules:
157
 
158
- - keep submit-time reporting fidelity-consistent
159
- - do not compare high-fidelity submit results against low-fidelity baseline state as if they were the same truth surface
160
 
161
  ## 8. Verifier Contract
162
 
@@ -178,34 +178,47 @@ Do not treat parameterization-specific logic as verifier truth.
178
 
179
  VMEC preset mapping:
180
 
181
- - `run` steps use the `low_fidelity` VMEC preset (~0.6s, tolerant convergence)
182
- - `submit` uses the `from_boundary_resolution` VMEC preset (~4s, adaptive convergence matching boundary Fourier resolution)
183
  - the `high_fidelity` VMEC preset (minimum 10 modes, strict convergence) is not used because it does not converge on the current `mpol=3, ntor=3` boundaries
184
 
185
  Training and evaluation rule:
186
 
187
- - use low-fidelity `run` as the RL inner-loop surface
188
- - the standard repository notebook and `training/llm_rollout.py` `monitor` / `evaluate` workflows stay on low-fidelity `run` only and ignore `submit` by default
189
- - keep higher-fidelity `submit` for terminal truth, explicit replay/debug work, paired fixture checks, and submit-side manual traces
190
- - do not move VMEC-backed submit evaluation into every training step unless the contract is deliberately redefined
191
 
192
- ## 9. Reward V1
193
 
194
- `Reward V1` replaces `Reward V0` because the old infeasible shaping only used `Ξ” official_feasibility`.
195
- That was too coarse once the transferred P1 findings made the main pathology clear: official
196
- feasibility is a max over normalized constraint violations, so useful repair steps on
197
- non-dominant constraints could be nearly invisible to the reward.
 
 
 
 
 
 
 
 
198
 
199
  Target behavior:
200
 
201
  - infeasible to feasible crossing gets a clear positive bonus
202
  - feasible to infeasible regression gets a clear penalty
203
  - when both states are infeasible, reduced official feasibility violation should still help
 
 
204
  - when both states are infeasible, reduced normalized triangularity violation should help the most
205
  - when both states are infeasible, reduced normalized aspect-ratio and edge-iota violations should also help
206
  - when both states are feasible, lower `max_elongation` should help
 
207
  - larger `run` actions should pay a larger step cost than smaller `run` actions
208
  - `restore_best` should keep a flat non-submit step cost
 
 
209
  - `submit` should be better than passive exhaustion when the design is genuinely improved
210
  - recovery after a failed evaluation may receive a modest bounded bonus
211
 
 
34
  - boundary in, metrics out
35
  - official `P1` feasibility semantics
36
  - objective direction and score ordering
37
+ - low-fidelity live evaluation mode
38
+ - optional higher-fidelity offline validation mode
39
  - explicit failure results when VMEC or forward-model evaluation fails
40
 
41
  Environment owns:
 
106
  - `failure_reason`
107
  - `step_number`
108
  - `budget_remaining`
109
+ - `no_progress_steps`
110
  - `best_low_fidelity_score`
111
  - `best_low_fidelity_feasibility`
 
 
112
  - `target_spec`
113
  - `diagnostics_text`
114
  - `reward_breakdown`
 
118
 
119
  Interpretation rules:
120
 
121
+ - live environment metrics must be labeled as low-fidelity
122
+ - best-state reporting should reflect the single live reward surface
 
123
  - the observation must be understandable without hidden state
124
  - normalized constraint-violation telemetry must follow the official `P1` constraint scales
125
  - the dominant active constraint must be visible so a human can explain repair-phase rewards
126
  - reward telemetry must expose which bonuses, penalties, and shaping terms contributed to the scalar reward
127
+ - action telemetry must expose parameter values before and after the action, including clamped, no-op, and repeat-state moves
128
+ - anti-stagnation state that can change reward must be visible in structured observation fields, not only free text
129
 
130
  ## 6. Episode Flow
131
 
 
133
  2. Evaluate the initial state with low fidelity and return the first observation.
134
  3. On `run`, perturb one controllable parameter and re-evaluate with low fidelity.
135
  4. On `restore_best`, revert to the best known low-fidelity state, re-evaluate, and consume budget.
136
+ 5. On `submit`, re-evaluate the current state with low fidelity, consume budget, and end the episode.
137
  6. End the episode on `submit` or budget exhaustion.
138
 
139
  Failure semantics:
 
155
 
156
  Terminal reporting rules:
157
 
158
+ - keep submit-time reporting on the same live low-fidelity truth surface as the rest of the episode
159
+ - keep any higher-fidelity validation artifacts explicitly outside the live environment observation contract
160
 
161
  ## 8. Verifier Contract
162
 
 
178
 
179
  VMEC preset mapping:
180
 
181
+ - `run`, `restore_best`, and `submit` use the `low_fidelity` VMEC preset (~0.6s, tolerant convergence)
182
+ - higher-fidelity validation uses the `from_boundary_resolution` VMEC preset (~4s, adaptive convergence matching boundary Fourier resolution) outside the live environment loop
183
  - the `high_fidelity` VMEC preset (minimum 10 modes, strict convergence) is not used because it does not converge on the current `mpol=3, ntor=3` boundaries
184
 
185
  Training and evaluation rule:
186
 
187
+ - use the live low-fidelity environment contract, including explicit `submit`, as the RL surface
188
+ - the standard repository notebook and `training/llm_rollout.py` workflows should stay aligned to that same action and reward contract
189
+ - keep higher-fidelity validation in offline scripts, paired fixture checks, and final evidence artifacts
190
+ - do not reintroduce a separate high-fidelity submit path into the live environment unless the contract is deliberately redefined
191
 
+ ## 9. Reward V2

+ `Reward V2` keeps the verifier-native structure from `Reward V1` and adds a small amount of
+ trajectory-aware shaping. `Reward V1` fixed the main coarse-signal pathology from `Reward V0`:
+ pure `Δ official_feasibility` was too coarse because official feasibility is a max over
+ normalized constraint violations, so useful repair steps on non-dominant constraints could be
+ nearly invisible to the reward.
+
+ The remaining `Reward V1` pathology was not verifier mismatch. It was short-horizon shaping:
+
+ - the agent got no extra signal for setting a new best infeasible point
+ - near-feasible progress below `0.02` had no milestone signal unless it crossed the full feasible boundary
+ - feasible improvements only saw step-to-step objective deltas, not "new best feasible score" progress
+ - repeated local loops or three-step stagnation had no explicit penalty beyond normal step cost

  Target behavior:

  - infeasible to feasible crossing gets a clear positive bonus
  - feasible to infeasible regression gets a clear penalty
  - when both states are infeasible, reduced official feasibility violation should still help
+ - on low-fidelity `run` steps, setting a new best infeasible feasibility should help
+ - entering the near-feasible corridor around `p1_feasibility <= 0.02` should get a small bounded bonus
  - when both states are infeasible, reduced normalized triangularity violation should help the most
  - when both states are infeasible, reduced normalized aspect-ratio and edge-iota violations should also help
  - when both states are feasible, lower `max_elongation` should help
+ - on low-fidelity `run` steps, beating the previous best feasible score should help
  - larger `run` actions should pay a larger step cost than smaller `run` actions
  - `restore_best` should keep a flat non-submit step cost
+ - repeated local revisits without improvement should pay a small penalty
+ - three non-improving steps in a row should pay a small stagnation penalty
  - `submit` should be better than passive exhaustion when the design is genuinely improved
  - recovery after a failed evaluation may receive a modest bounded bonus
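The new milestone terms can be sketched in isolation. This is a minimal sketch using the constant values introduced by this commit (`BEST_FEASIBILITY_BONUS_WEIGHT = 1.5`, `NEAR_FEASIBILITY_THRESHOLD = 0.02`, etc.); the exact gating inside `_compute_reward_breakdown` may differ, and the helper function itself is hypothetical.

```python
# Constant values taken from this commit's server/environment.py diff.
BEST_FEASIBILITY_BONUS_WEIGHT = 1.5
BEST_SCORE_BONUS_WEIGHT = 0.75
NEAR_FEASIBILITY_THRESHOLD = 0.02
NEAR_FEASIBILITY_BONUS = 1.0
NO_PROGRESS_STEP_THRESHOLD = 3
NO_PROGRESS_PENALTY = -0.2
REPEAT_STATE_PENALTY = -0.15


def milestone_terms(
    *,
    feasibility: float,
    best_feasibility_before: float,
    score: float,
    best_score_before: float,
    feasible: bool,
    no_progress_steps: int,
    repeat_state: bool,
) -> dict[str, float]:
    """Illustrative Reward V2 milestone shaping for one low-fidelity run step."""
    terms = {
        "best_feasibility_bonus": 0.0,
        "near_feasible_bonus": 0.0,
        "best_score_bonus": 0.0,
        "no_progress_penalty": 0.0,
        "repeat_state_penalty": 0.0,
    }
    if not feasible and feasibility < best_feasibility_before:
        # New best infeasible point: reward proportional to the improvement.
        terms["best_feasibility_bonus"] = (
            best_feasibility_before - feasibility
        ) * BEST_FEASIBILITY_BONUS_WEIGHT
    if not feasible and feasibility <= NEAR_FEASIBILITY_THRESHOLD < best_feasibility_before:
        # First entry into the near-feasible corridor gets a bounded one-off bonus.
        terms["near_feasible_bonus"] = NEAR_FEASIBILITY_BONUS
    if feasible and score > best_score_before:
        # Beating the previous best feasible score, not just the previous step.
        terms["best_score_bonus"] = (score - best_score_before) * BEST_SCORE_BONUS_WEIGHT
    if no_progress_steps >= NO_PROGRESS_STEP_THRESHOLD:
        terms["no_progress_penalty"] = NO_PROGRESS_PENALTY
    if repeat_state:
        terms["repeat_state_penalty"] = REPEAT_STATE_PENALTY
    return terms
```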
 
docs/p1_transfer/p1_reward_transfer.md ADDED
@@ -0,0 +1,40 @@
+ # P1 reward transfer for fusion-design-lab
+
+ ## Core conclusion
+ Use `scadena` as the main repair prior and `CreativeEngineer` as the top-score exploitation prior.
+
+ ## Reward schedule
+ 1. Infeasible phase: reduce official feasibility violation first.
+ 2. Within infeasible candidates, prioritize average triangularity repair first.
+ 3. After triangularity starts to clear, prefer aspect-ratio and edge-iota cleanup.
+ 4. Crossing the official feasibility threshold should receive a large bonus.
+ 5. Once feasible, optimize the official P1 score, which in practice tracks lower max elongation.
+
+ ## Practical priors
+ - `scadena_seed.json`: best repair anchor. Raw near-P1 HF seeds crossed official feasibility by moving toward this family.
+ - `creative_seed.json` and `creative_best.json`: best endpoint family. Use these for exploitation after feasibility is stable.
+ - `samet_seed.json`: exact-feasible distinct family. Useful for diversity and non-scadena exploration.
+ - `egodos_seed.json`: best near-feasible non-scadena source.
+ - `egodos_sparse_rgroup_best.json`: proof that a sparse low-order `r_cos` move can create a new feasible non-scadena design.
+
+ ## Operators that worked
+ - Coordinated move toward the `scadena` manifold for repairing raw near-P1 seeds.
+ - Sparse low-order `r_cos` move from `egodos` toward `Samet`.
+ - Small local `Samet` continuation on low-order modes.
+
+ ## Operators that did not work well
+ - Whole-boundary interpolation between distant families.
+ - Random full-space coefficient noise.
+ - Sparse top-k replacement toward `scadena` without coordinated movement.
+
+ ## Suggested usage in fusion-design-lab
+ - During repair-heavy exploration, add reward telemetry for:
+   - official feasibility
+   - average triangularity
+   - aspect ratio
+   - edge rotational transform over field periods
+   - max elongation
+ - Bias mutation proposals toward:
+   - `scadena` direction for feasibility repair
+   - `CreativeEngineer` neighborhood for high-score exploitation
+   - `Samet` and `egodos` seeds for diversity maintenance
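The proposal bias above can be realized with a simple weighted family picker. This is a sketch under stated assumptions: the family names come from the seed files listed in this document, but the phase names, weights, and the helper itself are illustrative, not repository code.

```python
import random

# Illustrative proposal bias (hypothetical weights): repair-heavy phases lean on
# the scadena corridor, exploitation phases lean on the CreativeEngineer neighborhood,
# and both keep some mass on the distinct samet/egodos families for diversity.
PROPOSAL_BIAS = {
    "repair": [("scadena", 0.6), ("samet", 0.2), ("egodos", 0.2)],
    "exploit": [("creative", 0.6), ("scadena", 0.2), ("samet", 0.2)],
}


def pick_family(phase: str, rng: random.Random) -> str:
    """Sample a seed family for the current search phase."""
    families, weights = zip(*PROPOSAL_BIAS[phase])
    return rng.choices(families, weights=weights, k=1)[0]
```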
docs/p1_transfer/p1_seed_selection.md ADDED
@@ -0,0 +1,31 @@
+ # P1 curated seed selection
+
+ ## Included seeds
+ - `creative_best.json`: best overall official P1 design found. Use as the score ceiling reference.
+ - `creative_seed.json`: original CreativeEngineer leaderboard anchor. Use as the exploitation parent.
+ - `scadena_seed.json`: strongest repair anchor found. Use when a candidate is close but still infeasible.
+ - `scadena_repaired_best.json`: repaired raw-HF survivor showing that the scadena corridor generalizes.
+ - `samet_seed.json`: exact-feasible distinct family. Use to prevent collapse into only the creative/scadena basin.
+ - `egodos_seed.json`: near-feasible non-scadena target. Use as a repair-source seed.
+ - `egodos_sparse_rgroup_best.json`: best new non-scadena feasible design found from a sparse grouped repair.
+
+ ## Family roles
+ - `creative`: best objective region.
+ - `scadena`: best feasibility-repair corridor.
+ - `samet`: stable distinct feasible basin.
+ - `egodos`: useful near-feasible source for non-scadena exploration.
+
+ ## Search pattern extracted
+ - `CreativeEngineer` is the better endpoint family.
+ - `scadena` is the better repair corridor.
+ - `Samet` supports local feasible continuation.
+ - `egodos` can be repaired into feasibility with a very small sparse low-order `r_cos` move toward `Samet`.
+
+ ## Recommended initialization mix
+ - 40% from `scadena_seed.json` and `scadena_repaired_best.json`
+ - 30% from `creative_seed.json` and `creative_best.json`
+ - 20% from `samet_seed.json`
+ - 10% from `egodos_seed.json` and `egodos_sparse_rgroup_best.json`
+
+ ## Why this is minimal
+ This pack gives one strong exploitation family, one strong repair family, and one genuinely distinct non-scadena frontier, without dragging over the entire search archive.
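The recommended initialization mix can be sketched as a two-stage weighted sampler. The seed filenames and group weights are the ones listed above; the sampler itself is an illustrative helper, not repository code, and it assumes uniform choice within a group.

```python
import random

# Groups and weights from the recommended initialization mix above.
SEED_MIX = [
    (["scadena_seed.json", "scadena_repaired_best.json"], 0.40),
    (["creative_seed.json", "creative_best.json"], 0.30),
    (["samet_seed.json"], 0.20),
    (["egodos_seed.json", "egodos_sparse_rgroup_best.json"], 0.10),
]


def sample_seed(rng: random.Random) -> str:
    """Draw one starting seed: pick a family by weight, then a file uniformly."""
    groups, weights = zip(*SEED_MIX)
    group = rng.choices(groups, weights=weights, k=1)[0]
    return rng.choice(group)
```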
fusion_lab/llm_agent.py CHANGED
@@ -2,7 +2,7 @@ from __future__ import annotations
 
 import json
 from dataclasses import asdict, dataclass
-from typing import Final, Sequence
+from typing import Final, Literal, Sequence, TypedDict
 
 from fusion_lab.models import (
     DirectionName,
@@ -22,6 +22,12 @@ RUN_PARAMETERS: Final[tuple[ParameterName, ...]] = (
 RUN_DIRECTIONS: Final[tuple[DirectionName, ...]] = ("increase", "decrease")
 RUN_MAGNITUDES: Final[tuple[MagnitudeName, ...]] = ("small", "medium", "large")
 
+
+class PromptMessage(TypedDict):
+    role: Literal["system", "user"]
+    content: str
+
+
 SYSTEM_PROMPT: Final[str] = """You are an expert stellarator designer.
 
 Goal:
@@ -39,8 +45,9 @@ Action rules:
 - each item must be either:
   - {"intent":"run","parameter":"<parameter>","direction":"increase|decrease","magnitude":"small|medium|large"}
   - {"intent":"restore_best"}
+  - {"intent":"submit"}
 - keep the plan short and within the remaining budget
-- do not output "submit"
+- use "submit" once when you want to stop and lock in the current design
 
 Constraint directions:
 - aspect_ratio <= 4.0
@@ -154,17 +161,24 @@ def format_observation(observation: StellaratorObservation) -> str:
         f"- evaluation_fidelity: {observation.evaluation_fidelity}\n"
         f"- evaluation_failed: {observation.evaluation_failed}\n"
         f"- budget_remaining: {observation.budget_remaining}\n"
+        f"- no_progress_steps: {observation.no_progress_steps}\n"
         f"- best_low_fidelity_score: {observation.best_low_fidelity_score:.4f}\n"
         f"- best_low_fidelity_feasibility: {observation.best_low_fidelity_feasibility:.6f}\n"
         f"- diagnostics: {observation.diagnostics_text}\n"
     )
 
 
+def build_messages(observation: StellaratorObservation) -> tuple[PromptMessage, PromptMessage]:
+    return (
+        {"role": "system", "content": SYSTEM_PROMPT},
+        {"role": "user", "content": format_observation(observation)},
+    )
+
+
 def build_prompt(observation: StellaratorObservation) -> str:
+    system_message, user_message = build_messages(observation)
     return (
-        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
-        f"<|im_start|>user\n{format_observation(observation)}<|im_end|>\n"
-        "<|im_start|>assistant\n"
+        f"System:\n{system_message['content']}\n\nUser:\n{user_message['content']}\n\nAssistant:\n"
     )
 
 
@@ -202,7 +216,7 @@ def _parse_action_item(item: object) -> StellaratorAction | None:
     )
 
 
-def parse_action_plan(text: str, *, allow_submit: bool = False) -> list[StellaratorAction]:
+def parse_action_plan(text: str, *, allow_submit: bool = True) -> list[StellaratorAction]:
     raw_plan = extract_json_plan(text)
     if raw_plan is None:
         return []
@@ -231,7 +245,7 @@ def run_episode_with_actions(
     *,
     seed_idx: int,
     auto_submit: bool = False,
-    allow_submit: bool = False,
+    allow_submit: bool = True,
 ) -> LLMEpisodeTrace:
     environment = StellaratorEnvironment()
     observation = environment.reset(seed=seed_idx)
@@ -266,6 +280,16 @@ def run_episode_with_actions(
     done = False
     step_index = 0
     rollout_actions = [action for action in actions if allow_submit or action.intent != "submit"]
+    if len(rollout_actions) > BUDGET:
+        submit_index = next(
+            (idx for idx, action in enumerate(rollout_actions) if action.intent == "submit"),
+            None,
+        )
+        if submit_index is not None and submit_index >= BUDGET:
+            # Keep terminal submit within the budget if the model over-runs plan length.
+            rollout_actions = rollout_actions[: BUDGET - 1] + [rollout_actions[submit_index]]
+        else:
+            rollout_actions = rollout_actions[:BUDGET]
     for step_index, action in enumerate(rollout_actions[:BUDGET], start=1):
         if _step_and_record(action, step_index):
             done = True
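The over-budget trimming rule in `run_episode_with_actions` can be checked in isolation. This standalone sketch mirrors the logic from the diff over plain intent strings; `trim_plan` is a hypothetical helper for illustration, and `BUDGET = 6` matches the environment's episode budget.

```python
BUDGET = 6


def trim_plan(intents: list[str], budget: int = BUDGET) -> list[str]:
    """Trim an over-long plan while preserving a trailing terminal submit."""
    if len(intents) <= budget:
        return intents
    submit_index = next(
        (idx for idx, intent in enumerate(intents) if intent == "submit"),
        None,
    )
    if submit_index is not None and submit_index >= budget:
        # Drop the overflow but keep submit as the final in-budget action.
        return intents[: budget - 1] + [intents[submit_index]]
    return intents[:budget]
```

The interesting case is a plan that schedules `submit` past the budget: the trim keeps the first `budget - 1` actions and moves the submit into the last slot instead of silently dropping it.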
fusion_lab/models.py CHANGED
@@ -57,11 +57,16 @@ class RewardBreakdown(BaseModel):
     feasibility_crossing_bonus: float = 0.0
     feasibility_regression_penalty: float = 0.0
     feasibility_delta_reward: float = 0.0
+    best_feasibility_bonus: float = 0.0
+    near_feasible_bonus: float = 0.0
     aspect_ratio_repair_reward: float = 0.0
     triangularity_repair_reward: float = 0.0
     iota_repair_reward: float = 0.0
     objective_delta_reward: float = 0.0
+    best_score_bonus: float = 0.0
     step_cost: float = 0.0
+    no_progress_penalty: float = 0.0
+    repeat_state_penalty: float = 0.0
     recovery_bonus: float = 0.0
     terminal_improvement_bonus: float = 0.0
     terminal_budget_bonus: float = 0.0
@@ -81,6 +86,7 @@ class ActionMonitor(BaseModel):
     params_after: LowDimBoundaryParams = Field(default_factory=default_low_dim_boundary_params)
     clamped: bool = False
     no_op: bool = False
+    repeat_state: bool = False
     used_best_params: bool = False
 
 
@@ -115,10 +121,9 @@ class StellaratorObservation(Observation):
     failure_reason: str = ""
     step_number: int = 0
     budget_remaining: int = 6
+    no_progress_steps: int = 0
     best_low_fidelity_score: float = 0.0
     best_low_fidelity_feasibility: float = float("inf")
-    best_high_fidelity_score: float | None = None
-    best_high_fidelity_feasibility: float | None = None
     constraints_satisfied: bool = True
     target_spec: str = ""
     reward_breakdown: RewardBreakdown = Field(default_factory=default_reward_breakdown)
@@ -132,14 +137,13 @@ class StellaratorState(State):
     current_params: LowDimBoundaryParams = Field(default_factory=default_low_dim_boundary_params)
     best_params: LowDimBoundaryParams = Field(default_factory=default_low_dim_boundary_params)
     initial_low_fidelity_score: float = 0.0
-    initial_high_fidelity_score: float | None = None
     best_low_fidelity_score: float = 0.0
     best_low_fidelity_feasibility: float = float("inf")
-    best_high_fidelity_score: float | None = None
-    best_high_fidelity_feasibility: float | None = None
     budget_total: int = 6
     budget_remaining: int = 6
     episode_done: bool = False
     constraints_satisfied: bool = True
     total_reward: float = 0.0
+    no_progress_steps: int = 0
+    visited_state_keys: list[str] = Field(default_factory=list)
     history: list[str] = Field(default_factory=list)
server/app.py CHANGED
@@ -66,7 +66,7 @@ def landing_page() -> str:
     <h2>Constraints</h2>
     <div class="constraint"><span class="name">aspect_ratio</span><span class="bound">&le; 4.0</span></div>
     <div class="constraint"><span class="name">average_triangularity</span><span class="bound">&le; &minus;0.5</span></div>
-    <div class="constraint"><span class="name">edge_iota_over_nfp</span><span class="bound">&ge; 0.3</span></div>
+    <div class="constraint"><span class="name">abs(edge_iota_over_nfp)</span><span class="bound">&ge; 0.3</span></div>
     </div>
 
     <div class="card">
@@ -98,7 +98,7 @@ def task_summary() -> dict[str, object]:
         "constraints": {
             "aspect_ratio_max": ASPECT_RATIO_MAX,
             "average_triangularity_max": AVERAGE_TRIANGULARITY_MAX,
-            "edge_iota_over_nfp_min": EDGE_IOTA_OVER_NFP_MIN,
+            "abs_edge_iota_over_nfp_min": EDGE_IOTA_OVER_NFP_MIN,
         },
         "n_field_periods": N_FIELD_PERIODS,
         "budget": BUDGET,
@@ -113,7 +113,7 @@ def task_summary() -> dict[str, object]:
         "magnitudes": ["small", "medium", "large"],
         "evaluation_modes": {
             "run": "low-fidelity constellaration evaluation",
-            "submit": "high-fidelity constellaration evaluation",
+            "submit": "low-fidelity constellaration terminal evaluation",
         },
     }
server/data/README.md DELETED
@@ -1,7 +0,0 @@
-Baseline VMEC inputs and related static assets belong here.
-
-Do not commit generated solver outputs or large transient artifacts.
-
-## Status
-
-- [ ] tracked `P1` fixture assets added under `server/data/p1/`
server/data/p1/bad_low_iota.json CHANGED
@@ -38,5 +38,5 @@
     "evaluation_fidelity": "high",
     "failure_reason": ""
   },
-  "paired_high_fidelity_timestamp_utc": "2026-03-08T07:07:19.629771+00:00"
+  "paired_high_fidelity_timestamp_utc": "2026-03-08T15:20:53.640050+00:00"
 }
server/data/p1/boundary_default_reset.json CHANGED
@@ -38,5 +38,5 @@
     "evaluation_fidelity": "high",
     "failure_reason": ""
   },
-  "paired_high_fidelity_timestamp_utc": "2026-03-08T07:07:24.745385+00:00"
+  "paired_high_fidelity_timestamp_utc": "2026-03-08T15:20:58.843405+00:00"
 }
server/data/p1/lowfi_feasible_local.json CHANGED
@@ -38,5 +38,5 @@
     "evaluation_fidelity": "high",
     "failure_reason": ""
   },
-  "paired_high_fidelity_timestamp_utc": "2026-03-08T07:07:29.939083+00:00"
+  "paired_high_fidelity_timestamp_utc": "2026-03-08T15:21:04.110710+00:00"
 }
server/environment.py CHANGED
@@ -45,8 +45,8 @@ TARGET_SPEC: Final[str] = (
45
  "Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
46
  "from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
47
  "triangularity <= -0.5, abs(edge rotational transform / n_field_periods) >= 0.3. "
48
- "Run actions use low-fidelity verification. Submit uses high-fidelity verification. "
49
- "Budget: 6 evaluations."
50
  )
51
 
52
  FAILURE_PENALTY: Final[float] = -2.0
@@ -54,6 +54,13 @@ FEASIBILITY_DELTA_WEIGHT: Final[float] = 2.0
54
  TRIANGULARITY_REPAIR_WEIGHT: Final[float] = 2.0
55
  ASPECT_RATIO_REPAIR_WEIGHT: Final[float] = 1.0
56
  IOTA_REPAIR_WEIGHT: Final[float] = 1.0
 
 
 
 
 
 
 
57
  STEP_COST_BY_MAGNITUDE: Final[dict[MagnitudeName, float]] = {
58
  "small": -0.05,
59
  "medium": -0.1,
@@ -94,6 +101,7 @@ class StellaratorEnvironment(
94
  constraints_satisfied=metrics.constraints_satisfied,
95
  total_reward=0.0,
96
  )
 
97
  self._last_metrics = metrics
98
  self._last_successful_metrics = None if metrics.evaluation_failed else metrics
99
  return self._build_observation(
@@ -148,17 +156,24 @@ class StellaratorEnvironment(
148
  direction=action.direction,
149
  magnitude=action.magnitude,
150
  )
 
151
  action_monitor = self._build_action_monitor(
152
  action=action,
153
  params_before=params_before,
154
  params_after=params,
155
  clamped=clamped,
156
  no_op=no_op,
 
157
  )
158
  metrics = self._evaluate_params(params, fidelity="low")
159
  self._state.current_params = params
160
  self._state.constraints_satisfied = metrics.constraints_satisfied
161
- self._update_best(params, metrics)
 
 
 
 
 
162
 
163
  done = self._state.budget_remaining <= 0
164
  reward_breakdown = self._compute_reward_breakdown(
@@ -166,6 +181,11 @@ class StellaratorEnvironment(
166
  action.intent,
167
  done,
168
  magnitude=action.magnitude,
 
 
 
 
 
169
  )
170
  reward = reward_breakdown.total
171
  summary = self._summary_run(action, metrics, action_monitor)
@@ -186,23 +206,22 @@ class StellaratorEnvironment(
186
  )
187
 
188
  def _handle_submit(self) -> StellaratorObservation:
 
189
  action = StellaratorAction(intent="submit")
190
  action_monitor = self._build_action_monitor(
191
  action=action,
192
  params_before=self._state.current_params,
193
  params_after=self._state.current_params,
194
  )
195
- metrics = self._evaluate_params(self._state.current_params, fidelity="high")
196
- initial_submit_score = self._initial_high_fidelity_score()
197
- best_submit_metrics = self._refresh_best_high_fidelity_metrics(metrics)
198
  reward_breakdown = self._compute_reward_breakdown(
199
  metrics,
200
  "submit",
201
  done=True,
202
- initial_reference_score=initial_submit_score,
203
  )
204
  reward = reward_breakdown.total
205
- summary = self._summary_submit(metrics, best_submit_metrics)
206
  self._state.history.append(summary)
207
  self._state.total_reward = round(self._state.total_reward + reward, 4)
208
  self._state.episode_done = True
@@ -223,19 +242,36 @@ class StellaratorEnvironment(
223
  self._state.budget_remaining -= 1
224
  params_before = self._state.current_params
225
  self._state.current_params = self._state.best_params
 
226
  action = StellaratorAction(intent="restore_best")
227
  action_monitor = self._build_action_monitor(
228
  action=action,
229
  params_before=params_before,
230
  params_after=self._state.current_params,
231
  no_op=params_before == self._state.current_params,
 
232
  used_best_params=True,
233
  )
234
  metrics = self._evaluate_params(self._state.current_params, fidelity="low")
235
  self._state.constraints_satisfied = metrics.constraints_satisfied
 
 
 
 
 
 
236
 
237
  done = self._state.budget_remaining <= 0
238
- reward_breakdown = self._compute_reward_breakdown(metrics, "restore_best", done)
 
 
 
 
 
 
 
 
 
239
  reward = reward_breakdown.total
240
  summary = self._summary_restore(metrics, action_monitor)
241
  self._state.history.append(summary)
@@ -283,9 +319,25 @@ class StellaratorEnvironment(
283
  done: bool,
284
  magnitude: MagnitudeName | None = None,
285
  initial_reference_score: float | None = None,
 
 
 
 
 
 
286
  ) -> RewardBreakdown:
287
  recovered_from_failure = self._recovered_from_failed_evaluation(metrics)
288
- previous_metrics = self._reference_metrics(metrics)
 
 
 
 
 
 
 
 
 
 
289
  breakdown = RewardBreakdown(
290
  intent=intent,
291
  evaluation_failed=metrics.evaluation_failed,
@@ -296,10 +348,17 @@ class StellaratorEnvironment(
296
  reference_max_elongation=previous_metrics.max_elongation,
297
  initial_reference_score=initial_reference_score,
298
  )
 
 
 
 
 
 
 
 
 
299
  if metrics.evaluation_failed:
300
  breakdown.failure_penalty = FAILURE_PENALTY
301
- if intent != "submit":
302
- breakdown.step_cost = self._step_cost(intent=intent, magnitude=magnitude)
303
  if intent == "submit":
304
  breakdown.failure_submit_penalty = -1.0
305
  elif done:
@@ -312,14 +371,40 @@ class StellaratorEnvironment(
312
  if previous_metrics.constraints_satisfied and not metrics.constraints_satisfied:
313
  breakdown.feasibility_regression_penalty = -3.0
314
 
 
 
 
 
 
 
315
  if metrics.constraints_satisfied and previous_metrics.constraints_satisfied:
316
  breakdown.objective_delta_reward = (
317
  previous_metrics.max_elongation - metrics.max_elongation
318
  ) * 10.0
 
 
 
 
 
 
 
 
319
  else:
320
  breakdown.feasibility_delta_reward = (
321
  previous_metrics.p1_feasibility - metrics.p1_feasibility
322
  ) * FEASIBILITY_DELTA_WEIGHT
 
 
 
 
 
 
 
 
 
 
 
 
323
  breakdown.triangularity_repair_reward = (
324
  previous_metrics.triangularity_violation - metrics.triangularity_violation
325
  ) * TRIANGULARITY_REPAIR_WEIGHT
@@ -330,9 +415,6 @@ class StellaratorEnvironment(
330
  previous_metrics.iota_violation - metrics.iota_violation
331
  ) * IOTA_REPAIR_WEIGHT
332
 
333
- if intent != "submit":
334
- breakdown.step_cost = self._step_cost(intent=intent, magnitude=magnitude)
335
-
336
  if recovered_from_failure:
337
  breakdown.recovery_bonus = 1.0
338
 
@@ -375,8 +457,6 @@ class StellaratorEnvironment(
375
  )
376
  best_low_fidelity_score = self._state.best_low_fidelity_score
377
  best_low_fidelity_feasibility = self._state.best_low_fidelity_feasibility
378
- best_high_fidelity_score = self._state.best_high_fidelity_score
379
- best_high_fidelity_feasibility = self._state.best_high_fidelity_feasibility
380
  trajectory_summary = self._trajectory_summary()
381
  text_lines = [
382
  action_summary,
@@ -402,14 +482,7 @@ class StellaratorEnvironment(
402
  f"dominant_constraint={metrics.dominant_constraint}",
403
  f"best_low_fidelity_score={best_low_fidelity_score:.6f}",
404
  f"best_low_fidelity_feasibility={best_low_fidelity_feasibility:.6f}",
405
- (
406
- "best_high_fidelity_score="
407
- f"{self._format_optional_metric(best_high_fidelity_score)}"
408
- ),
409
- (
410
- "best_high_fidelity_feasibility="
411
- f"{self._format_optional_metric(best_high_fidelity_feasibility)}"
412
- ),
413
  f"vacuum_well={metrics.vacuum_well:.4f}",
414
  f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}",
415
  f"step={self._state.step_count} | budget={self._state.budget_remaining}/{self._state.budget_total}",
@@ -417,6 +490,7 @@ class StellaratorEnvironment(
417
  f"reward_terms={self._reward_terms_text(reward_breakdown)}",
418
  f"action_clamped={action_monitor.clamped}",
419
  f"action_no_op={action_monitor.no_op}",
 
420
  f"episode_total_reward={self._state.total_reward:+.4f}",
421
  ]
422
  )
@@ -439,10 +513,9 @@ class StellaratorEnvironment(
439
  failure_reason=metrics.failure_reason,
440
  step_number=self._state.step_count,
441
  budget_remaining=self._state.budget_remaining,
 
442
  best_low_fidelity_score=best_low_fidelity_score,
443
  best_low_fidelity_feasibility=best_low_fidelity_feasibility,
444
- best_high_fidelity_score=best_high_fidelity_score,
445
- best_high_fidelity_feasibility=best_high_fidelity_feasibility,
446
  constraints_satisfied=metrics.constraints_satisfied,
447
  target_spec=TARGET_SPEC,
448
  reward=reward,
@@ -499,14 +572,13 @@ class StellaratorEnvironment(
499
  def _summary_submit(
500
  self,
501
  metrics: EvaluationMetrics,
502
- best_submit_metrics: EvaluationMetrics,
503
  ) -> str:
504
  if metrics.evaluation_failed:
505
- return f"Submit failed during high-fidelity evaluation: {metrics.failure_reason}"
506
  return (
507
- f"Submitted current_high_fidelity_score={metrics.p1_score:.6f}, "
508
- f"best_high_fidelity_score={best_submit_metrics.p1_score:.6f}, "
509
- f"best_high_fidelity_feasibility={best_submit_metrics.p1_feasibility:.6f}, "
510
  f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}."
511
  )
512
 
@@ -573,36 +645,45 @@ class StellaratorEnvironment(
573
  return self._last_successful_metrics
574
  return fallback
575
 
576
- def _recovered_from_failed_evaluation(self, metrics: EvaluationMetrics) -> bool:
577
  return (
578
- not metrics.evaluation_failed
579
- and self._last_metrics is not None
580
- and self._last_metrics.evaluation_failed
581
  )
582
 
583
- def _initial_high_fidelity_score(self) -> float:
584
- if self._state.initial_high_fidelity_score is not None:
585
- return self._state.initial_high_fidelity_score
586
- metrics = self._evaluate_params(self._state.initial_params, fidelity="high")
587
- self._state.initial_high_fidelity_score = metrics.p1_score
588
- return metrics.p1_score
589
-
590
- def _refresh_best_high_fidelity_metrics(
591
  self,
592
- current_submit_metrics: EvaluationMetrics,
593
- ) -> EvaluationMetrics:
594
- best_metrics = current_submit_metrics
595
- if self._state.best_params != self._state.current_params:
596
- best_metrics = self._evaluate_params(self._state.best_params, fidelity="high")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
597
 
598
- self._state.best_high_fidelity_score = best_metrics.p1_score
599
- self._state.best_high_fidelity_feasibility = best_metrics.p1_feasibility
600
- return best_metrics
 
601
 
602
- def _format_optional_metric(self, value: float | None) -> str:
603
- if value is None:
604
- return "n/a"
605
- return f"{value:.6f}"
 
 
606
 
607
  def _build_action_monitor(
608
  self,
@@ -612,6 +693,7 @@ class StellaratorEnvironment(
612
  params_after: LowDimBoundaryParams,
613
  clamped: bool = False,
614
  no_op: bool = False,
 
615
  used_best_params: bool = False,
616
  ) -> ActionMonitor:
617
  return ActionMonitor(
@@ -623,6 +705,7 @@ class StellaratorEnvironment(
623
  params_after=params_after,
624
  clamped=clamped,
625
  no_op=no_op,
 
626
  used_best_params=used_best_params,
627
  )
628
 
@@ -644,6 +727,24 @@ class StellaratorEnvironment(
644
  return "The requested move was clipped to stay inside the allowed parameter range. "
645
  return ""
646
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
647
  def _step_cost(self, *, intent: ActionIntent, magnitude: MagnitudeName | None) -> float:
648
  if intent == "restore_best":
649
  return RESTORE_STEP_COST
@@ -660,11 +761,16 @@ class StellaratorEnvironment(
660
  + breakdown.feasibility_crossing_bonus
661
  + breakdown.feasibility_regression_penalty
662
  + breakdown.feasibility_delta_reward
 
 
663
  + breakdown.aspect_ratio_repair_reward
664
  + breakdown.triangularity_repair_reward
665
  + breakdown.iota_repair_reward
666
  + breakdown.objective_delta_reward
 
667
  + breakdown.step_cost
 
 
668
  + breakdown.recovery_bonus
669
  + breakdown.terminal_improvement_bonus
670
  + breakdown.terminal_budget_bonus
@@ -681,11 +787,16 @@ class StellaratorEnvironment(
681
  ("feasibility_crossing_bonus", breakdown.feasibility_crossing_bonus),
682
  ("feasibility_regression_penalty", breakdown.feasibility_regression_penalty),
683
  ("feasibility_delta_reward", breakdown.feasibility_delta_reward),
 
 
684
  ("aspect_ratio_repair_reward", breakdown.aspect_ratio_repair_reward),
685
  ("triangularity_repair_reward", breakdown.triangularity_repair_reward),
686
  ("iota_repair_reward", breakdown.iota_repair_reward),
687
  ("objective_delta_reward", breakdown.objective_delta_reward),
 
688
  ("step_cost", breakdown.step_cost),
 
 
689
  ("recovery_bonus", breakdown.recovery_bonus),
690
  ("terminal_improvement_bonus", breakdown.terminal_improvement_bonus),
691
  ("terminal_budget_bonus", breakdown.terminal_budget_bonus),
@@ -705,15 +816,61 @@ class StellaratorEnvironment(
705
  if metrics.evaluation_failed:
706
  return
707
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
708
  current = (
709
  (1, metrics.p1_score) if metrics.constraints_satisfied else (0, -metrics.p1_feasibility)
710
  )
711
  best = (
712
- (1, self._state.best_low_fidelity_score)
713
- if self._state.best_low_fidelity_feasibility <= FEASIBILITY_TOLERANCE
714
- else (0, -self._state.best_low_fidelity_feasibility)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
715
  )
716
- if current > best:
717
- self._state.best_params = params
718
- self._state.best_low_fidelity_score = metrics.p1_score
719
- self._state.best_low_fidelity_feasibility = metrics.p1_feasibility
 
      "Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
      "from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
      "triangularity <= -0.5, abs(edge rotational transform / n_field_periods) >= 0.3. "
+     "All actions use low-fidelity verification. Submit ends the episode with an explicit "
+     "terminal evaluation and reward bonus. Budget: 6 evaluations including submit."
  )

  FAILURE_PENALTY: Final[float] = -2.0
 
  TRIANGULARITY_REPAIR_WEIGHT: Final[float] = 2.0
  ASPECT_RATIO_REPAIR_WEIGHT: Final[float] = 1.0
  IOTA_REPAIR_WEIGHT: Final[float] = 1.0
+ BEST_FEASIBILITY_BONUS_WEIGHT: Final[float] = 1.5
+ BEST_SCORE_BONUS_WEIGHT: Final[float] = 0.75
+ NEAR_FEASIBILITY_THRESHOLD: Final[float] = 0.02
+ NEAR_FEASIBILITY_BONUS: Final[float] = 1.0
+ NO_PROGRESS_STEP_THRESHOLD: Final[int] = 3
+ NO_PROGRESS_PENALTY: Final[float] = -0.2
+ REPEAT_STATE_PENALTY: Final[float] = -0.15
  STEP_COST_BY_MAGNITUDE: Final[dict[MagnitudeName, float]] = {
      "small": -0.05,
      "medium": -0.1,
 
          constraints_satisfied=metrics.constraints_satisfied,
          total_reward=0.0,
      )
+     self._state.visited_state_keys = [self._state_key(params)]
      self._last_metrics = metrics
      self._last_successful_metrics = None if metrics.evaluation_failed else metrics
      return self._build_observation(
 
          direction=action.direction,
          magnitude=action.magnitude,
      )
+     repeat_state = self._is_repeat_state(params)
      action_monitor = self._build_action_monitor(
          action=action,
          params_before=params_before,
          params_after=params,
          clamped=clamped,
          no_op=no_op,
+         repeat_state=repeat_state,
      )
      metrics = self._evaluate_params(params, fidelity="low")
      self._state.current_params = params
      self._state.constraints_satisfied = metrics.constraints_satisfied
+     (
+         best_low_fidelity_feasibility_before,
+         best_low_fidelity_score_before,
+         step_improved,
+         no_progress_steps,
+     ) = self._advance_low_fidelity_progress(params, metrics)

      done = self._state.budget_remaining <= 0
      reward_breakdown = self._compute_reward_breakdown(

          action.intent,
          done,
          magnitude=action.magnitude,
+         best_low_fidelity_feasibility_before=best_low_fidelity_feasibility_before,
+         best_low_fidelity_score_before=best_low_fidelity_score_before,
+         step_improved=step_improved,
+         no_progress_steps=no_progress_steps,
+         repeat_state=repeat_state,
      )
      reward = reward_breakdown.total
      summary = self._summary_run(action, metrics, action_monitor)
 
      )

  def _handle_submit(self) -> StellaratorObservation:
+     self._state.budget_remaining -= 1
      action = StellaratorAction(intent="submit")
      action_monitor = self._build_action_monitor(
          action=action,
          params_before=self._state.current_params,
          params_after=self._state.current_params,
      )
+     metrics = self._evaluate_params(self._state.current_params, fidelity="low")
+     self._state.constraints_satisfied = metrics.constraints_satisfied
      reward_breakdown = self._compute_reward_breakdown(
          metrics,
          "submit",
          done=True,

      )
      reward = reward_breakdown.total
+     summary = self._summary_submit(metrics)
      self._state.history.append(summary)
      self._state.total_reward = round(self._state.total_reward + reward, 4)
      self._state.episode_done = True
 
      self._state.budget_remaining -= 1
      params_before = self._state.current_params
      self._state.current_params = self._state.best_params
+     repeat_state = self._is_repeat_state(self._state.current_params)
      action = StellaratorAction(intent="restore_best")
      action_monitor = self._build_action_monitor(
          action=action,
          params_before=params_before,
          params_after=self._state.current_params,
          no_op=params_before == self._state.current_params,
+         repeat_state=repeat_state,
          used_best_params=True,
      )
      metrics = self._evaluate_params(self._state.current_params, fidelity="low")
      self._state.constraints_satisfied = metrics.constraints_satisfied
+     (
+         best_low_fidelity_feasibility_before,
+         best_low_fidelity_score_before,
+         step_improved,
+         no_progress_steps,
+     ) = self._advance_low_fidelity_progress(self._state.current_params, metrics)

      done = self._state.budget_remaining <= 0
+     reward_breakdown = self._compute_reward_breakdown(
+         metrics,
+         "restore_best",
+         done,
+         best_low_fidelity_feasibility_before=best_low_fidelity_feasibility_before,
+         best_low_fidelity_score_before=best_low_fidelity_score_before,
+         step_improved=step_improved,
+         no_progress_steps=no_progress_steps,
+         repeat_state=repeat_state,
+     )
      reward = reward_breakdown.total
      summary = self._summary_restore(metrics, action_monitor)
      self._state.history.append(summary)
 
      done: bool,
      magnitude: MagnitudeName | None = None,
      initial_reference_score: float | None = None,
+     reference_metrics: EvaluationMetrics | None = None,
+     best_low_fidelity_feasibility_before: float | None = None,
+     best_low_fidelity_score_before: float | None = None,
+     step_improved: bool = False,
+     no_progress_steps: int = 0,
+     repeat_state: bool = False,
  ) -> RewardBreakdown:
      recovered_from_failure = self._recovered_from_failed_evaluation(metrics)
+     previous_metrics = reference_metrics or self._reference_metrics(metrics)
+     best_low_fidelity_feasibility_before = (
+         self._state.best_low_fidelity_feasibility
+         if best_low_fidelity_feasibility_before is None
+         else best_low_fidelity_feasibility_before
+     )
+     best_low_fidelity_score_before = (
+         self._state.best_low_fidelity_score
+         if best_low_fidelity_score_before is None
+         else best_low_fidelity_score_before
+     )
      breakdown = RewardBreakdown(
          intent=intent,
          evaluation_failed=metrics.evaluation_failed,

          reference_max_elongation=previous_metrics.max_elongation,
          initial_reference_score=initial_reference_score,
      )
+     self._apply_step_penalties(
+         breakdown,
+         intent=intent,
+         magnitude=magnitude,
+         no_progress_steps=no_progress_steps,
+         repeat_state=repeat_state,
+         step_improved=step_improved,
+     )
+
      if metrics.evaluation_failed:
          breakdown.failure_penalty = FAILURE_PENALTY
          if intent == "submit":
              breakdown.failure_submit_penalty = -1.0
      elif done:
 
      if previous_metrics.constraints_satisfied and not metrics.constraints_satisfied:
          breakdown.feasibility_regression_penalty = -3.0

+     if (
+         previous_metrics.p1_feasibility > NEAR_FEASIBILITY_THRESHOLD
+         and metrics.p1_feasibility <= NEAR_FEASIBILITY_THRESHOLD
+     ):
+         breakdown.near_feasible_bonus = NEAR_FEASIBILITY_BONUS
+
      if metrics.constraints_satisfied and previous_metrics.constraints_satisfied:
          breakdown.objective_delta_reward = (
              previous_metrics.max_elongation - metrics.max_elongation
          ) * 10.0
+         if intent != "submit" and best_low_fidelity_feasibility_before <= FEASIBILITY_TOLERANCE:
+             breakdown.best_score_bonus = (
+                 max(
+                     0.0,
+                     metrics.p1_score - best_low_fidelity_score_before,
+                 )
+                 * BEST_SCORE_BONUS_WEIGHT
+             )
      else:
          breakdown.feasibility_delta_reward = (
              previous_metrics.p1_feasibility - metrics.p1_feasibility
          ) * FEASIBILITY_DELTA_WEIGHT
+         if (
+             intent != "submit"
+             and not metrics.constraints_satisfied
+             and best_low_fidelity_feasibility_before > FEASIBILITY_TOLERANCE
+         ):
+             breakdown.best_feasibility_bonus = (
+                 max(
+                     0.0,
+                     best_low_fidelity_feasibility_before - metrics.p1_feasibility,
+                 )
+                 * BEST_FEASIBILITY_BONUS_WEIGHT
+             )
      breakdown.triangularity_repair_reward = (
          previous_metrics.triangularity_violation - metrics.triangularity_violation
      ) * TRIANGULARITY_REPAIR_WEIGHT

          previous_metrics.iota_violation - metrics.iota_violation
      ) * IOTA_REPAIR_WEIGHT

      if recovered_from_failure:
          breakdown.recovery_bonus = 1.0
 
 
      )
      best_low_fidelity_score = self._state.best_low_fidelity_score
      best_low_fidelity_feasibility = self._state.best_low_fidelity_feasibility
      trajectory_summary = self._trajectory_summary()
      text_lines = [
          action_summary,

          f"dominant_constraint={metrics.dominant_constraint}",
          f"best_low_fidelity_score={best_low_fidelity_score:.6f}",
          f"best_low_fidelity_feasibility={best_low_fidelity_feasibility:.6f}",
+         f"no_progress_steps={self._state.no_progress_steps}",
          f"vacuum_well={metrics.vacuum_well:.4f}",
          f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}",
          f"step={self._state.step_count} | budget={self._state.budget_remaining}/{self._state.budget_total}",

          f"reward_terms={self._reward_terms_text(reward_breakdown)}",
          f"action_clamped={action_monitor.clamped}",
          f"action_no_op={action_monitor.no_op}",
+         f"action_repeat_state={action_monitor.repeat_state}",
          f"episode_total_reward={self._state.total_reward:+.4f}",
      ]
  )
 
          failure_reason=metrics.failure_reason,
          step_number=self._state.step_count,
          budget_remaining=self._state.budget_remaining,
+         no_progress_steps=self._state.no_progress_steps,
          best_low_fidelity_score=best_low_fidelity_score,
          best_low_fidelity_feasibility=best_low_fidelity_feasibility,
          constraints_satisfied=metrics.constraints_satisfied,
          target_spec=TARGET_SPEC,
          reward=reward,
 
  def _summary_submit(
      self,
      metrics: EvaluationMetrics,
  ) -> str:
      if metrics.evaluation_failed:
+         return f"Submit failed during low-fidelity evaluation: {metrics.failure_reason}"
      return (
+         f"Submitted current_score={metrics.p1_score:.6f}, "
+         f"best_score={self._state.best_low_fidelity_score:.6f}, "
+         f"best_feasibility={self._state.best_low_fidelity_feasibility:.6f}, "
          f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}."
      )
 
645
  return self._last_successful_metrics
646
  return fallback
647
 
648
+ def _best_low_fidelity_snapshot(self) -> tuple[float, float]:
649
  return (
650
+ self._state.best_low_fidelity_feasibility,
651
+ self._state.best_low_fidelity_score,
 
652
  )
653
 
654
+ def _advance_low_fidelity_progress(
 
 
 
 
 
 
 
655
  self,
656
+ params: LowDimBoundaryParams,
657
+ metrics: EvaluationMetrics,
658
+ ) -> tuple[float, float, bool, int]:
659
+ best_low_fidelity_feasibility_before, best_low_fidelity_score_before = (
660
+ self._best_low_fidelity_snapshot()
661
+ )
662
+ step_improved = self._is_better_than_reference(
663
+ metrics,
664
+ self._previous_step_metrics(metrics),
665
+ )
666
+ self._update_best(params, metrics)
667
+ no_progress_steps = self._advance_no_progress(step_improved=step_improved)
668
+ self._record_visited_state(params)
669
+ return (
670
+ best_low_fidelity_feasibility_before,
671
+ best_low_fidelity_score_before,
672
+ step_improved,
673
+ no_progress_steps,
674
+ )
675
 
676
+ def _previous_step_metrics(self, fallback: EvaluationMetrics) -> EvaluationMetrics:
677
+ if self._last_metrics is not None:
678
+ return self._last_metrics
679
+ return fallback
680
 
681
+ def _recovered_from_failed_evaluation(self, metrics: EvaluationMetrics) -> bool:
682
+ return (
683
+ not metrics.evaluation_failed
684
+ and self._last_metrics is not None
685
+ and self._last_metrics.evaluation_failed
686
+ )
687
 
  def _build_action_monitor(
      self,

      params_after: LowDimBoundaryParams,
      clamped: bool = False,
      no_op: bool = False,
+     repeat_state: bool = False,
      used_best_params: bool = False,
  ) -> ActionMonitor:
      return ActionMonitor(

          params_after=params_after,
          clamped=clamped,
          no_op=no_op,
+         repeat_state=repeat_state,
          used_best_params=used_best_params,
      )
 
          return "The requested move was clipped to stay inside the allowed parameter range. "
      return ""

+ def _apply_step_penalties(
+     self,
+     breakdown: RewardBreakdown,
+     *,
+     intent: ActionIntent,
+     magnitude: MagnitudeName | None,
+     no_progress_steps: int,
+     repeat_state: bool,
+     step_improved: bool,
+ ) -> None:
+     if intent == "submit":
+         return
+     breakdown.step_cost = self._step_cost(intent=intent, magnitude=magnitude)
+     if intent == "run" and no_progress_steps >= NO_PROGRESS_STEP_THRESHOLD:
+         breakdown.no_progress_penalty = NO_PROGRESS_PENALTY
+     if intent == "run" and repeat_state and not step_improved:
+         breakdown.repeat_state_penalty = REPEAT_STATE_PENALTY
+
  def _step_cost(self, *, intent: ActionIntent, magnitude: MagnitudeName | None) -> float:
      if intent == "restore_best":
          return RESTORE_STEP_COST
 
      + breakdown.feasibility_crossing_bonus
      + breakdown.feasibility_regression_penalty
      + breakdown.feasibility_delta_reward
+     + breakdown.best_feasibility_bonus
+     + breakdown.near_feasible_bonus
      + breakdown.aspect_ratio_repair_reward
      + breakdown.triangularity_repair_reward
      + breakdown.iota_repair_reward
      + breakdown.objective_delta_reward
+     + breakdown.best_score_bonus
      + breakdown.step_cost
+     + breakdown.no_progress_penalty
+     + breakdown.repeat_state_penalty
      + breakdown.recovery_bonus
      + breakdown.terminal_improvement_bonus
      + breakdown.terminal_budget_bonus

      ("feasibility_crossing_bonus", breakdown.feasibility_crossing_bonus),
      ("feasibility_regression_penalty", breakdown.feasibility_regression_penalty),
      ("feasibility_delta_reward", breakdown.feasibility_delta_reward),
+     ("best_feasibility_bonus", breakdown.best_feasibility_bonus),
+     ("near_feasible_bonus", breakdown.near_feasible_bonus),
      ("aspect_ratio_repair_reward", breakdown.aspect_ratio_repair_reward),
      ("triangularity_repair_reward", breakdown.triangularity_repair_reward),
      ("iota_repair_reward", breakdown.iota_repair_reward),
      ("objective_delta_reward", breakdown.objective_delta_reward),
+     ("best_score_bonus", breakdown.best_score_bonus),
      ("step_cost", breakdown.step_cost),
+     ("no_progress_penalty", breakdown.no_progress_penalty),
+     ("repeat_state_penalty", breakdown.repeat_state_penalty),
      ("recovery_bonus", breakdown.recovery_bonus),
      ("terminal_improvement_bonus", breakdown.terminal_improvement_bonus),
      ("terminal_budget_bonus", breakdown.terminal_budget_bonus),
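The Reward V2 bookkeeping above keeps every term as a named field, sums those fields for the scalar reward, and logs the same fields as `(name, value)` pairs, so the breakdown always reconciles with the total. A minimal standalone sketch of that pattern, using only a subset of the field names from the diff:

```python
from dataclasses import dataclass, fields


@dataclass
class RewardBreakdown:
    # Subset of the Reward V2 terms shown in the diff above; every term
    # defaults to 0.0 so the total is always well defined.
    feasibility_delta_reward: float = 0.0
    best_feasibility_bonus: float = 0.0
    near_feasible_bonus: float = 0.0
    objective_delta_reward: float = 0.0
    best_score_bonus: float = 0.0
    step_cost: float = 0.0
    no_progress_penalty: float = 0.0
    repeat_state_penalty: float = 0.0

    @property
    def total(self) -> float:
        # Plain sum over all dataclass fields: the logged breakdown and
        # the scalar reward can never drift apart.
        return sum(getattr(self, f.name) for f in fields(self))


b = RewardBreakdown(feasibility_delta_reward=0.5, step_cost=-0.1, no_progress_penalty=-0.2)
assert abs(b.total - 0.2) < 1e-9
```

The design choice this illustrates: adding a new term means adding one field, and both the total and the per-term log pick it up automatically.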
 
      if metrics.evaluation_failed:
          return

+     if self._is_better_than_best(
+         metrics,
+         best_low_fidelity_feasibility=self._state.best_low_fidelity_feasibility,
+         best_low_fidelity_score=self._state.best_low_fidelity_score,
+     ):
+         self._state.best_params = params
+         self._state.best_low_fidelity_score = metrics.p1_score
+         self._state.best_low_fidelity_feasibility = metrics.p1_feasibility
+
+ def _is_better_than_best(
+     self,
+     metrics: EvaluationMetrics,
+     *,
+     best_low_fidelity_feasibility: float,
+     best_low_fidelity_score: float,
+ ) -> bool:
      current = (
          (1, metrics.p1_score) if metrics.constraints_satisfied else (0, -metrics.p1_feasibility)
      )
      best = (
+         (1, best_low_fidelity_score)
+         if best_low_fidelity_feasibility <= FEASIBILITY_TOLERANCE
+         else (0, -best_low_fidelity_feasibility)
+     )
+     return current > best
+
+ def _is_better_than_reference(
+     self,
+     metrics: EvaluationMetrics,
+     reference_metrics: EvaluationMetrics,
+ ) -> bool:
+     return self._metrics_rank(metrics) > self._metrics_rank(reference_metrics)
+
+ def _metrics_rank(self, metrics: EvaluationMetrics) -> tuple[int, float]:
+     if metrics.evaluation_failed:
+         return (-1, float("-inf"))
+     if metrics.constraints_satisfied:
+         return (1, metrics.p1_score)
+     return (0, -metrics.p1_feasibility)
+
+ def _advance_no_progress(self, *, step_improved: bool) -> int:
+     if step_improved:
+         self._state.no_progress_steps = 0
+     else:
+         self._state.no_progress_steps += 1
+     return self._state.no_progress_steps
+
+ def _is_repeat_state(self, params: LowDimBoundaryParams) -> bool:
+     return self._state_key(params) in self._state.visited_state_keys
+
+ def _record_visited_state(self, params: LowDimBoundaryParams) -> None:
+     self._state.visited_state_keys.append(self._state_key(params))
+
+ def _state_key(self, params: LowDimBoundaryParams) -> str:
+     return (
+         f"{params.aspect_ratio:.6f}|{params.elongation:.6f}|"
+         f"{params.rotational_transform:.6f}|{params.triangularity_scale:.6f}"
      )
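The best-design update in the diff above ranks candidates feasible-first via plain tuple comparison. A standalone sketch of that ordering, mirroring `_metrics_rank` (the free function and its flat arguments are illustrative; the repo helper takes an `EvaluationMetrics` object):

```python
# Feasible-first ranking, sketched after _metrics_rank in the diff above.
# Failed evaluations rank below everything; feasible designs rank above
# any infeasible one; ties break on score (feasible) or on -violation
# (infeasible, smaller violation wins).
def metrics_rank(evaluation_failed: bool, constraints_satisfied: bool,
                 p1_score: float, p1_feasibility: float) -> tuple[int, float]:
    if evaluation_failed:
        return (-1, float("-inf"))
    if constraints_satisfied:
        return (1, p1_score)
    return (0, -p1_feasibility)


feasible = metrics_rank(False, True, p1_score=0.42, p1_feasibility=0.0)
near_miss = metrics_rank(False, False, p1_score=0.9, p1_feasibility=0.01)
far_off = metrics_rank(False, False, p1_score=0.9, p1_feasibility=0.30)

# Lexicographic tuple comparison does all the work.
assert feasible > near_miss > far_off
```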
 
 
 
 
training/README.md CHANGED
@@ -4,9 +4,9 @@ This repository treats notebooks and trained-policy runs as supporting evidence

 Training policy:

- - train on the low-fidelity `run` surface for the normal RL inner loop
- - keep the standard `training/llm_rollout.py` monitor/evaluate workflow on low-fidelity `run` only
- - use high-fidelity `submit` only for explicit replay/debug work, paired fixture checks, submit-side traces, and final evidence
+ - train on the live low-fidelity environment surface, including explicit `submit`
+ - keep the standard `training/llm_rollout.py` monitor/evaluate workflow on the same live contract as the notebook
+ - keep high-fidelity validation in offline tooling such as `baselines/high_fidelity_validation.py`

 ## Status

@@ -50,8 +50,8 @@ Use that module as the source of truth for:
 - local rollout replay
 - rollout telemetry structure used by the monitor command

- For `monitor` and `evaluate`, the rollout stays on low-fidelity `run` steps only and ignores `submit`.
- Use `replay` when you explicitly want to exercise the full environment path including terminal `submit`.
+ For `prompt`, `monitor`, `evaluate`, and the notebook, the shared helper contract now includes the live `submit` action.
+ Use offline validation scripts when you explicitly want high-fidelity checks outside the environment loop.

 For `evaluate`, the completion command reads the prompt from `stdin` and writes a raw completion to `stdout`.
 The current seed is exposed as the `FUSION_LAB_SEED` environment variable so the same command can be used
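The `evaluate` contract described above (prompt on `stdin`, raw completion on `stdout`, seed in `FUSION_LAB_SEED`) can be satisfied by a trivial stub command. A hedged sketch; the action-plan JSON shape here is an assumption modeled on the notebook's `RUN_ACTION_SPECS` entries, not the canonical format expected by `parse_action_plan`:

```python
import json
import os
import sys


def build_completion(prompt: str, seed: str) -> str:
    # A real completion command would condition on the prompt (and could
    # vary by seed); this stub always emits the same fixed plan.
    plan = [
        {"intent": "run", "parameter": "triangularity_scale",
         "direction": "decrease", "magnitude": "medium"},
        {"intent": "submit"},
    ]
    return json.dumps(plan)


def main() -> None:
    # Wire-up matching the documented contract: prompt arrives on stdin,
    # the raw completion goes to stdout, and the harness exposes the
    # current seed via the FUSION_LAB_SEED environment variable.
    prompt = sys.stdin.read()
    seed = os.environ.get("FUSION_LAB_SEED", "0")
    sys.stdout.write(build_completion(prompt, seed))
```

Such a stub is useful as a smoke test of the evaluate plumbing before pointing the harness at an actual model.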
training/llm_rollout.py CHANGED
@@ -10,10 +10,11 @@ from pathlib import Path
 from typing import Final

 from fusion_lab.llm_agent import (
+    build_messages,
     build_prompt,
+    LLMEpisodeTrace,
     parse_action_plan,
     run_episode_with_actions,
-    LLMEpisodeTrace,
 )
 from fusion_lab.models import StellaratorAction
 from server.environment import StellaratorEnvironment
@@ -130,6 +131,7 @@ def prompt_payload(seed: int) -> dict[str, object]:
     return {
         "created_at_utc": datetime.now(UTC).isoformat(),
         "seed": seed,
+        "messages": list(build_messages(observation)),
         "prompt": build_prompt(observation),
         "target_spec": observation.target_spec,
         "budget_remaining": observation.budget_remaining,
@@ -138,7 +140,7 @@


 def parse_actions(
-    args: argparse.Namespace, *, allow_submit: bool = False
+    args: argparse.Namespace, *, allow_submit: bool = True
 ) -> tuple[str, list[StellaratorAction]]:
     if args.action_plan_file is not None:
         text = args.action_plan_file.read_text()
@@ -253,30 +255,29 @@ def _pearson_correlation(xs: list[float], ys: list[float]) -> float | None:

 def summarize_traces(traces: list[LLMEpisodeTrace]) -> dict[str, object]:
     feasible_count = sum(1 for trace in traces if trace.constraints_satisfied)
-    high_fidelity_traces = [trace for trace in traces if trace.final_evaluation_fidelity == "high"]
-    high_fidelity_count = len(high_fidelity_traces)
+    submitted_count = sum(
+        1
+        for trace in traces
+        if trace.steps and trace.steps[-1].reward_breakdown.get("intent") == "submit"
+    )
     failed_count = sum(1 for trace in traces if trace.evaluation_failed)
     total_rewards = [trace.total_reward for trace in traces]
     final_scores = [trace.final_score for trace in traces]
     final_feasibilities = [trace.final_feasibility for trace in traces]
-    high_fidelity_scores = [trace.final_score for trace in high_fidelity_traces]
-    high_fidelity_feasibilities = [trace.final_feasibility for trace in high_fidelity_traces]
     feasible_flags = [1.0 if trace.constraints_satisfied else 0.0 for trace in traces]
     episode_count = len(traces)

     return {
         "episode_count": episode_count,
         "feasible_episode_count": feasible_count,
-        "high_fidelity_episode_count": high_fidelity_count,
+        "submitted_episode_count": submitted_count,
         "evaluation_failed_episode_count": failed_count,
         "feasible_rate": _round_metric(feasible_count / episode_count),
-        "high_fidelity_rate": _round_metric(high_fidelity_count / episode_count),
+        "submitted_rate": _round_metric(submitted_count / episode_count),
         "evaluation_failed_rate": _round_metric(failed_count / episode_count),
         "mean_total_reward": _round_metric(_mean(total_rewards)),
         "mean_final_score": _round_metric(_mean(final_scores)),
         "mean_final_feasibility": _round_metric(_mean(final_feasibilities)),
-        "mean_high_fidelity_score": _round_metric(_mean(high_fidelity_scores)),
-        "mean_high_fidelity_feasibility": _round_metric(_mean(high_fidelity_feasibilities)),
         "reward_final_score_correlation": _round_metric(
             _pearson_correlation(total_rewards, final_scores)
         ),
@@ -347,6 +348,7 @@ def evaluate_payload(
     evaluations.append(
         {
             "seed": seed,
+            "messages": list(build_messages(observation)),
             "prompt": prompt,
             "completion": completion,
             "parsed_action_count": len(actions),
@@ -370,10 +372,10 @@ def write_monitor_summary(payload: dict[str, object]) -> None:
     print(
         "episodes="
         f"{summary['episode_count']} feasible={summary['feasible_episode_count']} "
-        f"high_fidelity={summary['high_fidelity_episode_count']} "
+        f"submitted={summary['submitted_episode_count']} "
         f"failed={summary['evaluation_failed_episode_count']} "
         f"mean_total_reward={_format_metric(summary['mean_total_reward'], signed=True)} "
-        f"mean_high_fidelity_score={_format_metric(summary['mean_high_fidelity_score'], signed=True)} "
+        f"mean_final_score={_format_metric(summary['mean_final_score'], signed=True)} "
         f"reward_score_corr={summary['reward_final_score_correlation']}"
     )
     for episode in payload["episodes"]:
training/notebooks/README.md CHANGED
@@ -26,10 +26,10 @@ Operational defaults:

 - use the same Python dependency set as the repo runtime
 - keep heavy verifier and training work on Northflank
- - keep low-fidelity `run` as the training inner loop; do not put high-fidelity `submit` in every RL step
- - use high-fidelity `submit` only for sparse checkpoint evaluation, paired fixture checks, manual traces, and final evidence
+ - keep the live notebook and environment on one low-fidelity reward surface, including explicit `submit`
+ - keep high-fidelity validation in offline scripts, paired fixture checks, and final evidence artifacts
 - keep the repository GRPO notebook aligned to the shared helper contract in `fusion_lab/llm_agent.py`
- - the standard notebook reward/eval path is low-fidelity-only and ignores `submit` by default
+ - the standard notebook reward/eval path uses the same action contract as the environment, including `submit`
 - keep the public submission notebook focused on connecting to the deployed HF Space and exporting visible traces
 - prefer a public HF Space for the hackathon; if private, document the token setup directly in the notebook
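The shared action contract referenced above covers 26 discrete actions: 4 parameters x 2 directions x 3 magnitudes (24 run actions), plus `restore_best` and `submit`. A sketch of that enumeration; the parameter names match the boundary fields used in `_state_key`, while the direction and magnitude name lists are assumptions here, with the canonical values living in `fusion_lab.llm_agent` (`RUN_PARAMETERS`, `RUN_DIRECTIONS`, `RUN_MAGNITUDES`):

```python
# Enumerate the 26-action discrete space described in the repo README.
# Name lists below are illustrative assumptions; the canonical constants
# live in fusion_lab.llm_agent.
PARAMETERS = ["aspect_ratio", "elongation", "rotational_transform", "triangularity_scale"]
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]

actions = [
    {"intent": "run", "parameter": p, "direction": d, "magnitude": m}
    for p in PARAMETERS
    for d in DIRECTIONS
    for m in MAGNITUDES
]
# Two intent-only actions complete the contract.
actions.append({"intent": "restore_best"})
actions.append({"intent": "submit"})

assert len(actions) == 26  # 4 * 2 * 3 + 2
```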
 
training/notebooks/fusion_design_lab_training.ipynb CHANGED
@@ -4,22 +4,7 @@
4
  "cell_type": "markdown",
5
  "id": "7fb27b941602401d91542211134fc71a",
6
  "metadata": {},
7
- "source": [
8
- "# Fusion Design Lab β€” GRPO Training\n",
9
- "\n",
10
- "Train an LLM to optimize stellarator fusion reactor designs using **GRPO** (Group Relative Policy Optimization) with **Unsloth** and **TRL**.\n",
11
- "\n",
12
- "The agent interacts with a constrained optimization environment where it adjusts 4 geometric knobs of a stellarator boundary, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:\n",
13
- "- `aspect_ratio ≀ 4.0`\n",
14
- "- `average_triangularity ≀ -0.5`\n",
15
- "- `abs(edge_iota_over_nfp) β‰₯ 0.3`\n",
16
- "\n",
17
- "Each episode has **6 evaluations** budgeted. The agent produces a plan of actions and the environment scores it via the `constellaration` physics verifier.\n",
18
- "\n",
19
- "**Environment deployed at**: https://creativeengineer-fusion-design-lab.hf.space\n",
20
- "\n",
21
- "**Runtime**: Select GPU (T4 or better) via `Runtime > Change runtime type`."
22
- ]
23
  },
24
  {
25
  "cell_type": "markdown",
@@ -35,7 +20,74 @@
35
  "id": "9a63283cbaf04dbcab1f6479b197f3a8",
36
  "metadata": {},
37
  "outputs": [],
38
- "source": "%%capture\n# Build deps for constellaration (booz-xform compiles from source)\n!apt-get update -qq && apt-get install -y -qq cmake ninja-build g++ gfortran libnetcdf-dev libnetcdff-dev > /dev/null\n\n!pip install trl peft bitsandbytes datasets matplotlib accelerate\n!pip install \"transformers>=4.51\" \"huggingface-hub<1.0\""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
39
  },
40
  {
41
  "cell_type": "markdown",
@@ -49,13 +101,61 @@
49
  "id": "72eea5119410473aa328ad9291626812",
50
  "metadata": {},
51
  "outputs": [],
52
- "source": "import importlib\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\nfrom peft import LoraConfig, get_peft_model\n\nMODEL_NAME = \"Qwen/Qwen3.5-4B\"\nMAX_SEQ_LENGTH = 2048\n\nbnb_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_quant_type=\"nf4\",\n bnb_4bit_use_double_quant=True,\n bnb_4bit_compute_dtype=torch.bfloat16,\n)\n\nattn_impl = \"flash_attention_2\" if importlib.util.find_spec(\"flash_attn\") else \"sdpa\"\n\nmodel = AutoModelForCausalLM.from_pretrained(\n MODEL_NAME,\n quantization_config=bnb_config,\n torch_dtype=torch.bfloat16,\n device_map=\"auto\",\n attn_implementation=attn_impl,\n)\n\ntokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\nif tokenizer.pad_token is None:\n tokenizer.pad_token = tokenizer.eos_token\n\nlora_config = LoraConfig(\n r=32,\n lora_alpha=32,\n target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n lora_dropout=0.0,\n task_type=\"CAUSAL_LM\",\n)\nmodel = get_peft_model(model, lora_config)\nmodel.gradient_checkpointing_enable()\nmodel.print_trainable_parameters()\nprint(f\"Model loaded: {MODEL_NAME} (attn: {attn_impl})\")"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
53
  },
54
  {
55
  "cell_type": "markdown",
56
  "id": "8edb47106e1a46a883d545849b8ab81b",
57
  "metadata": {},
58
- "source": "## 3. Setup Stellarator Environment\n\nInstall the environment package directly from the HF Space repository so training runs locally (no network latency per step). The package also includes the typed `FusionLabClient` and Pydantic models for remote OpenEnv sessions."
59
  },
60
  {
61
  "cell_type": "code",
@@ -65,9 +165,55 @@
65
  "outputs": [],
66
  "source": [
67
  "%%capture\n",
68
- "# Install the fusion-design-lab environment (includes constellaration physics engine)\n",
69
- "# This takes ~3 minutes due to booz-xform compilation\n",
70
- "!pip install \"fusion-design-lab @ git+https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab\""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
71
  ]
72
  },
73
  {
@@ -77,20 +223,24 @@
77
  "metadata": {},
78
  "outputs": [],
79
  "source": [
 
80
  "import json\n",
 
 
81
  "from typing import Final\n",
82
  "\n",
83
  "from fusion_lab.llm_agent import (\n",
84
  " RUN_DIRECTIONS,\n",
85
  " RUN_MAGNITUDES,\n",
86
  " RUN_PARAMETERS,\n",
87
- " build_prompt,\n",
88
  " parse_action_plan,\n",
89
  " run_episode_with_actions,\n",
90
  ")\n",
91
  "from fusion_lab.models import StellaratorAction\n",
92
  "from server.contract import RESET_SEEDS\n",
93
  "from server.environment import BUDGET, StellaratorEnvironment\n",
 
94
  "\n",
95
  "RUN_ACTION_SPECS: Final[list[dict[str, str]]] = [\n",
96
  " {\"intent\": \"run\", \"parameter\": p, \"direction\": d, \"magnitude\": m}\n",
@@ -108,7 +258,40 @@
108
  "print(\n",
109
  " f\"Environment ready. Initial score: {obs.p1_score:.4f}, feasibility: {obs.p1_feasibility:.4f}\"\n",
110
  ")\n",
111
- "print(f\"Budget: {obs.budget_remaining}, Constraints satisfied: {obs.constraints_satisfied}\")"
112
  ]
113
  },
114
  {
@@ -131,7 +314,7 @@
131
  "# Shared helper smoke test\n",
132
  "env = StellaratorEnvironment()\n",
133
  "obs = env.reset(seed=0)\n",
134
- "prompt = build_prompt(obs)\n",
135
  "print(prompt[:500])\n",
136
  "print(\"...\")\n",
137
  "\n",
@@ -171,7 +354,7 @@
171
  "prompts = []\n",
172
  "for seed_idx in range(len(RESET_SEEDS)):\n",
173
  " obs = StellaratorEnvironment().reset(seed=seed_idx)\n",
174
- " prompt = build_prompt(obs)\n",
175
  " # Repeat each seed to create a larger training set\n",
176
  " for _ in range(50):\n",
177
  " prompts.append({\"prompt\": prompt, \"seed_idx\": seed_idx})\n",
@@ -185,13 +368,7 @@
185
  "cell_type": "markdown",
186
  "id": "504fb2a444614c0babb325280ed9130a",
187
  "metadata": {},
188
- "source": [
189
- "## 6. Reward Function\n",
190
- "\n",
191
- "The environment reward executes each generated action plan in the stellarator environment and returns the cumulative low-fidelity Reward V1 from the live environment. The environment's built-in reward decomposes feasibility (+3/-3 crossing bonuses, official feasibility progress, weighted triangularity/aspect/iota repair terms), objective (max elongation improvement), step costs, and failure penalties β€” see `server/environment.py:_compute_reward_breakdown(...)`.\n",
192
- "\n",
193
- "For the current training workflow, the notebook ignores `submit` and does not auto-submit. GRPO therefore optimizes the low-fidelity `run` path only. The live observation telemetry still exposes `reward_breakdown` and `action_monitor` for debugging reward behavior.\n"
194
- ]
195
  },
196
  {
197
  "cell_type": "code",
@@ -200,35 +377,30 @@
200
  "metadata": {},
201
  "outputs": [],
202
  "source": [
203
- "import traceback\n",
204
- "\n",
205
- "\n",
206
  "def environment_reward_fn(\n",
207
  " completions: list[str], seed_idx: list[int] | None = None, **kwargs\n",
208
  ") -> list[float]:\n",
209
  " \"\"\"Execute each action plan in the environment and return cumulative reward.\n",
210
  "\n",
211
  " This is the sole GRPO training signal in the notebook. It uses the live\n",
212
- " low-fidelity environment reward path and ignores submit so the trainer\n",
213
- " optimizes only the `run` surface. Empty or unparseable outputs still\n",
214
- " receive a trainer-side fallback penalty of -3.0.\n",
215
  " \"\"\"\n",
216
  " rewards = []\n",
217
  " seeds = seed_idx if seed_idx is not None else [0] * len(completions)\n",
218
  " for i, completion in enumerate(completions):\n",
219
- " try:\n",
220
- " actions = parse_action_plan(completion)\n",
221
- " if len(actions) == 0:\n",
222
- " rewards.append(-3.0)\n",
223
- " continue\n",
224
- " trace = run_episode_with_actions(\n",
225
- " actions,\n",
226
- " seed_idx=int(seeds[i]) % len(RESET_SEEDS),\n",
227
- " )\n",
228
- " rewards.append(trace.total_reward)\n",
229
- " except Exception:\n",
230
- " traceback.print_exc()\n",
231
  " rewards.append(-3.0)\n",
232
  " return rewards\n",
233
  "\n",
234
  "\n",
@@ -249,20 +421,133 @@
249
  " },\n",
250
  " ]\n",
251
  ")\n",
252
- "print(f\"Environment reward (low-fi only): {environment_reward_fn([test_plan], seed_idx=[0])}\")\n",
253
  "\n",
254
- "# Test short plan with no explicit submit\n",
255
- "test_short = json.dumps(\n",
256
  " [\n",
257
  " {\n",
258
  " \"intent\": \"run\",\n",
259
  " \"parameter\": \"triangularity_scale\",\n",
260
  " \"direction\": \"increase\",\n",
261
- " \"magnitude\": \"medium\",\n",
262
  " },\n",
 
263
  " ]\n",
264
  ")\n",
265
- "print(f\"Environment reward (short plan): {environment_reward_fn([test_short], seed_idx=[0])}\")"
266
  ]
267
  },
268
  {
@@ -281,7 +566,140 @@
281
  "id": "8a65eabff63a45729fe45fb5ade58bdc",
282
  "metadata": {},
283
  "outputs": [],
284
- "source": "from trl import GRPOConfig, GRPOTrainer\n\nMAX_PROMPT_LENGTH = 768\nMAX_COMPLETION_LENGTH = MAX_SEQ_LENGTH - MAX_PROMPT_LENGTH\n\ntraining_args = GRPOConfig(\n output_dir=\"./grpo_fusion_output\",\n learning_rate=5e-5,\n num_generations=8,\n max_completion_length=MAX_COMPLETION_LENGTH,\n max_prompt_length=MAX_PROMPT_LENGTH,\n per_device_train_batch_size=8,\n gradient_accumulation_steps=1,\n max_steps=60,\n temperature=1.0,\n logging_steps=1,\n save_steps=20,\n bf16=True,\n report_to=\"none\",\n seed=42,\n)\n\ntrainer = GRPOTrainer(\n model=model,\n processing_class=tokenizer,\n reward_funcs=[environment_reward_fn],\n args=training_args,\n train_dataset=dataset,\n)\n\nprint(\"Starting GRPO training...\")\ntrain_result = trainer.train()\nprint(f\"Training complete. Total steps: {train_result.global_step}\")"
285
  },
286
  {
287
  "cell_type": "markdown",
@@ -290,7 +708,7 @@
290
  "source": [
291
  "## 8. Training Results\n",
292
  "\n",
293
- "Visualize reward improvement over training steps."
294
  ]
295
  },
296
  {
@@ -300,8 +718,6 @@
300
  "metadata": {},
301
  "outputs": [],
302
  "source": [
303
- "import matplotlib.pyplot as plt\n",
304
- "\n",
305
  "log_history = trainer.state.log_history\n",
306
  "steps = [entry[\"step\"] for entry in log_history if \"loss\" in entry]\n",
307
  "losses = [entry[\"loss\"] for entry in log_history if \"loss\" in entry]\n",
@@ -346,11 +762,7 @@
346
  "cell_type": "markdown",
347
  "id": "8309879909854d7188b41380fd92a7c3",
348
  "metadata": {},
349
- "source": [
350
- "## 9. Evaluate Trained Policy\n",
351
- "\n",
352
- "Generate action plans from the trained model and compare against random baselines."
353
- ]
354
  },
355
  {
356
  "cell_type": "code",
@@ -358,13 +770,13 @@
358
  "id": "3ed186c9a28b402fb0bc4494df01f08d",
359
  "metadata": {},
360
  "outputs": [],
361
- "source": "import random\n\nmodel.eval()\n\n\ndef reward_term_summary(step_or_obs: object) -> str:\n breakdown_obj = getattr(step_or_obs, \"reward_breakdown\")\n breakdown = (\n breakdown_obj.model_dump() if hasattr(breakdown_obj, \"model_dump\") else breakdown_obj\n )\n terms = []\n for key, value in breakdown.items():\n if key in {\n \"intent\",\n \"total\",\n \"evaluation_failed\",\n \"recovered_from_failure\",\n \"reference_constraints_satisfied\",\n \"reference_score\",\n \"reference_feasibility\",\n \"reference_max_elongation\",\n \"initial_reference_score\",\n \"terminal_score_ratio\",\n }:\n continue\n if isinstance(value, (int, float)) and float(value) != 0.0:\n terms.append(f\"{key}={float(value):+.3f}\")\n return \", \".join(terms) if terms else \"none\"\n\n\ndef run_episode_with_model(seed_idx: int) -> tuple[float, list[str]]:\n \"\"\"Run one episode using the trained model.\"\"\"\n env = StellaratorEnvironment()\n obs = env.reset(seed=seed_idx)\n prompt = build_prompt(obs)\n inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n with torch.no_grad():\n outputs = model.generate(\n **inputs,\n max_new_tokens=MAX_COMPLETION_LENGTH,\n temperature=0.7,\n do_sample=True,\n )\n completion = tokenizer.decode(\n outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n )\n actions = parse_action_plan(completion)\n episode = run_episode_with_actions(actions, seed_idx=seed_idx)\n trace = [\n (\n f\"{step.action_label} → reward={step.reward:.3f} \"\n f\"score={step.p1_score:.4f} feasible={step.constraints_satisfied} \"\n f\"terms={reward_term_summary(step)}\"\n )\n for step in episode.steps\n ]\n return episode.total_reward, trace\n\n\ndef run_random_episode(seed_idx: int) -> float:\n \"\"\"Run one episode with random actions for comparison.\"\"\"\n actions = [StellaratorAction(**random.choice(RUN_ACTION_SPECS)) for _ in range(BUDGET)]\n return run_episode_with_actions(actions, seed_idx=seed_idx).total_reward\n\n\n# Evaluate\nprint(\"=\" * 60)\nprint(\"TRAINED MODEL EPISODES\")\nprint(\"=\" * 60)\ntrained_rewards = []\nfor seed in range(len(RESET_SEEDS)):\n reward, trace = run_episode_with_model(seed)\n trained_rewards.append(reward)\n print(f\"\\nSeed {seed} — Total reward: {reward:.3f}\")\n for line in trace:\n print(f\" {line}\")\n\nprint(f\"\\nMean trained reward: {sum(trained_rewards) / len(trained_rewards):.3f}\")\n\nprint(\"\\n\" + \"=\" * 60)\nprint(\"RANDOM BASELINE (10 episodes per seed)\")\nprint(\"=\" * 60)\nrandom_rewards = []\nfor seed in range(len(RESET_SEEDS)):\n seed_rewards = [run_random_episode(seed) for _ in range(10)]\n random_rewards.extend(seed_rewards)\n print(\n f\"Seed {seed} — Mean: {sum(seed_rewards) / len(seed_rewards):.3f}, Best: {max(seed_rewards):.3f}\"\n )\n\nprint(f\"\\nMean random reward: {sum(random_rewards) / len(random_rewards):.3f}\")\nprint(f\"Mean trained reward: {sum(trained_rewards) / len(trained_rewards):.3f}\")"
362
  },
363
  {
364
  "cell_type": "markdown",
365
  "id": "cb1e1581032b452c9409d6c6813c49d1",
366
  "metadata": {},
367
- "source": "## 10. Connect to Deployed HF Space\n\nDemonstrate connecting to the live environment on Hugging Face Spaces through the typed OpenEnv client and running the trained model against it."
368
  },
369
  {
370
  "cell_type": "code",
@@ -372,7 +784,119 @@
372
  "id": "379cbbc1e968416e875cc15c1202d7eb",
373
  "metadata": {},
374
  "outputs": [],
375
- "source": "import requests\n\nfrom fusion_lab.client import FusionLabClient\n\nHF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n\n# Check health\nhealth = requests.get(f\"{HF_SPACE_URL}/health\").json()\nprint(f\"HF Space status: {health['status']}\")\n\n# Get task description\ntask = requests.get(f\"{HF_SPACE_URL}/task\").json()\nprint(f\"\\nTask: {task['description']}\")\nprint(f\"Constraints: {task['constraints']}\")\nprint(f\"Budget: {task['budget']}\")\n\nwith FusionLabClient(base_url=HF_SPACE_URL) as env:\n reset_result = env.reset(seed=42)\n remote_obs = reset_result.observation\n print(f\"\\nRemote reset — max_elongation: {remote_obs.max_elongation:.4f}\")\n print(f\" aspect_ratio: {remote_obs.aspect_ratio:.4f}\")\n print(f\" constraints_satisfied: {remote_obs.constraints_satisfied}\")\n print(f\" budget_remaining: {remote_obs.budget_remaining}\")\n\n # Generate an action plan from the trained model\n prompt = build_prompt(remote_obs)\n inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n with torch.no_grad():\n outputs = model.generate(\n **inputs, max_new_tokens=MAX_COMPLETION_LENGTH, temperature=0.7, do_sample=True\n )\n completion = tokenizer.decode(\n outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n )\n actions = parse_action_plan(completion)\n\n print(f\"\\nTrained model generated {len(actions)} actions for remote env:\")\n for i, action in enumerate(actions[:BUDGET], start=1):\n if action.intent == \"submit\":\n continue\n result = env.step(action)\n step_obs = result.observation\n reward = float(result.reward) if result.reward is not None else 0.0\n print(\n f\" Step {i}: {action.intent} {action.parameter or ''} \"\n f\"{action.direction or ''} {action.magnitude or ''} \"\n f\"→ reward={reward:.3f}, score={step_obs.p1_score:.4f}, terms={reward_term_summary(step_obs)}\"\n )\n if result.done:\n print(f\" Episode done. Final score: {step_obs.p1_score:.4f}\")\n break\nprint(\"\\nEnvironment is live and accessible for training and evaluation.\")"
376
  }
377
  ],
378
  "metadata": {
 
4
  "cell_type": "markdown",
5
  "id": "7fb27b941602401d91542211134fc71a",
6
  "metadata": {},
7
+ "source": "# Fusion Design Lab — GRPO Training\n\nTrain an LLM to optimize stellarator fusion reactor designs using **GRPO** (Group Relative Policy Optimization) with **HF TRL**.\n\nThe agent interacts with a constrained optimization environment where it adjusts 4 geometric knobs of a stellarator boundary, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:\n- `aspect_ratio ≤ 4.0`\n- `average_triangularity ≤ -0.5`\n- `abs(edge_iota_over_nfp) ≥ 0.3`\n\nEach episode has a budget of **6 evaluations**. The notebook now trains on the same live **low-fidelity environment contract** used by the repo runtime: `run`, `restore_best`, and explicit terminal `submit` all stay on the same verifier surface. Higher-fidelity checks live outside the notebook reward loop.\n\n**Environment deployed at**: https://creativeengineer-fusion-design-lab.hf.space\n\n**Runtime**: Select GPU via `Runtime > Change runtime type`. The notebook automatically uses `fp16` on T4/V100-class GPUs and `bf16` on Ampere-or-newer GPUs."
8
  },
9
  {
10
  "cell_type": "markdown",
 
20
  "id": "9a63283cbaf04dbcab1f6479b197f3a8",
21
  "metadata": {},
22
  "outputs": [],
23
+ "source": [
24
+ "%%capture\n",
25
+ "import importlib.util\n",
26
+ "import os\n",
27
+ "import shutil\n",
28
+ "import subprocess\n",
29
+ "import sys\n",
30
+ "\n",
31
+ "\n",
32
+ "def run_checked(command: list[str]) -> None:\n",
33
+ " subprocess.run(command, check=True)\n",
34
+ "\n",
35
+ "\n",
36
+ "def maybe_install_build_deps() -> None:\n",
37
+ " if sys.platform != \"linux\":\n",
38
+ " return\n",
39
+ " apt_get = shutil.which(\"apt-get\")\n",
40
+ " if apt_get is None or os.geteuid() != 0:\n",
41
+ " return\n",
42
+ " run_checked([apt_get, \"update\", \"-qq\"])\n",
43
+ " run_checked(\n",
44
+ " [\n",
45
+ " apt_get,\n",
46
+ " \"install\",\n",
47
+ " \"-y\",\n",
48
+ " \"-qq\",\n",
49
+ " \"cmake\",\n",
50
+ " \"ninja-build\",\n",
51
+ " \"g++\",\n",
52
+ " \"gfortran\",\n",
53
+ " \"libnetcdf-dev\",\n",
54
+ " \"libnetcdff-dev\",\n",
55
+ " ]\n",
56
+ " )\n",
57
+ "\n",
58
+ "\n",
59
+ "def ensure_pip() -> None:\n",
60
+ " if importlib.util.find_spec(\"pip\") is None:\n",
61
+ " run_checked([sys.executable, \"-m\", \"ensurepip\", \"--upgrade\"])\n",
62
+ "\n",
63
+ "\n",
64
+ "def install_python_deps() -> None:\n",
65
+ " ensure_pip()\n",
66
+ " run_checked(\n",
67
+ " [\n",
68
+ " sys.executable,\n",
69
+ " \"-m\",\n",
70
+ " \"pip\",\n",
71
+ " \"install\",\n",
72
+ " \"trl==0.29.0\",\n",
73
+ " \"peft>=0.15.0,<1.0\",\n",
74
+ " \"bitsandbytes>=0.45.0,<1.0\",\n",
75
+ " \"datasets>=3.0.0,<4.0\",\n",
76
+ " \"matplotlib>=3.9.0,<4.0\",\n",
77
+ " \"accelerate>=1.3.0,<2.0\",\n",
78
+ " ]\n",
79
+ " )\n",
80
+ "\n",
81
+ "\n",
82
+ "maybe_install_build_deps()\n",
83
+ "install_python_deps()\n",
84
+ "\n",
85
+ "if importlib.util.find_spec(\"torch\") is None:\n",
86
+ " raise RuntimeError(\n",
87
+ " \"PyTorch is not installed in this kernel. Use a CUDA-enabled Colab runtime \"\n",
88
+ " \"or a Northflank PyTorch GPU notebook image before running this notebook.\"\n",
89
+ " )"
90
+ ]
91
  },
92
  {
93
  "cell_type": "markdown",
 
101
  "id": "72eea5119410473aa328ad9291626812",
102
  "metadata": {},
103
  "outputs": [],
104
+ "source": [
105
+ "import importlib\n",
106
+ "import torch\n",
107
+ "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n",
108
+ "from peft import LoraConfig, get_peft_model\n",
109
+ "\n",
110
+ "MODEL_NAME = \"Qwen/Qwen3-4B\"\n",
111
+ "MAX_SEQ_LENGTH = 2048\n",
112
+ "\n",
113
+ "if not torch.cuda.is_available():\n",
114
+ " raise RuntimeError(\"This notebook requires a CUDA GPU runtime.\")\n",
115
+ "gpu_major, _ = torch.cuda.get_device_capability()\n",
116
+ "use_bf16 = gpu_major >= 8\n",
117
+ "compute_dtype = torch.bfloat16 if use_bf16 else torch.float16\n",
118
+ "\n",
119
+ "bnb_config = BitsAndBytesConfig(\n",
120
+ " load_in_4bit=True,\n",
121
+ " bnb_4bit_quant_type=\"nf4\",\n",
122
+ " bnb_4bit_use_double_quant=True,\n",
123
+ " bnb_4bit_compute_dtype=compute_dtype,\n",
124
+ ")\n",
125
+ "\n",
126
+ "attn_impl = \"flash_attention_2\" if importlib.util.find_spec(\"flash_attn\") else \"sdpa\"\n",
127
+ "\n",
128
+ "model = AutoModelForCausalLM.from_pretrained(\n",
129
+ " MODEL_NAME,\n",
130
+ " quantization_config=bnb_config,\n",
131
+ " torch_dtype=compute_dtype,\n",
132
+ " device_map=\"auto\",\n",
133
+ " attn_implementation=attn_impl,\n",
134
+ ")\n",
135
+ "\n",
136
+ "tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\n",
137
+ "if tokenizer.pad_token is None:\n",
138
+ " tokenizer.pad_token = tokenizer.eos_token\n",
139
+ "\n",
140
+ "lora_config = LoraConfig(\n",
141
+ " r=32,\n",
142
+ " lora_alpha=32,\n",
143
+ " target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n",
144
+ " lora_dropout=0.0,\n",
145
+ " task_type=\"CAUSAL_LM\",\n",
146
+ ")\n",
147
+ "model = get_peft_model(model, lora_config)\n",
148
+ "model.gradient_checkpointing_enable()\n",
149
+ "model.print_trainable_parameters()\n",
150
+ "dtype_name = \"bf16\" if use_bf16 else \"fp16\"\n",
151
+ "print(f\"Model loaded: {MODEL_NAME} (attn: {attn_impl}, dtype: {dtype_name})\")"
152
+ ]
153
  },
154
  {
155
  "cell_type": "markdown",
156
  "id": "8edb47106e1a46a883d545849b8ab81b",
157
  "metadata": {},
158
+ "source": "## 3. Setup Stellarator Environment\n\nInstall the environment from the checked-out Fusion Design Lab repository when it is available in the runtime. If the notebook is running in a fresh Colab session, clone the public repo first and then install it in editable mode. This keeps the notebook bound to the same `server/environment.py` Reward V2 code and `server/physics.py` verifier code that ship with the notebook, instead of a potentially stale deployment copy."
159
  },
160
  {
161
  "cell_type": "code",
 
165
  "outputs": [],
166
  "source": [
167
  "%%capture\n",
168
+ "from pathlib import Path\n",
169
+ "from typing import Final\n",
170
+ "\n",
171
+ "REPO_URL = \"https://github.com/jungdaesuh/fusion-design-lab.git\"\n",
172
+ "EXPECTED_REPO_FILES: Final[tuple[str, ...]] = (\n",
173
+ " \"pyproject.toml\",\n",
174
+ " \"server/environment.py\",\n",
175
+ " \"server/physics.py\",\n",
176
+ " \"fusion_lab/models.py\",\n",
177
+ " \"fusion_lab/llm_agent.py\",\n",
178
+ " \"training/notebooks/fusion_design_lab_training.ipynb\",\n",
179
+ ")\n",
180
+ "\n",
181
+ "\n",
182
+ "def _is_valid_repo_root(candidate: Path) -> bool:\n",
183
+ " return candidate.is_dir() and all((candidate / item).exists() for item in EXPECTED_REPO_FILES)\n",
184
+ "\n",
185
+ "\n",
186
+ "def resolve_repo_root() -> Path:\n",
187
+ " candidates = [\n",
188
+ " Path.cwd(),\n",
189
+ " Path.cwd().parent,\n",
190
+ " Path(\"/content/fusion-design-lab\"),\n",
191
+ " Path(\"/home/jovyan/fusion-design-lab\"),\n",
192
+ " Path.home() / \"fusion-design-lab\",\n",
193
+ " ]\n",
194
+ " for candidate in candidates:\n",
195
+ " if _is_valid_repo_root(candidate):\n",
196
+ " return candidate.resolve()\n",
197
+ "\n",
198
+ " target = (\n",
199
+ " Path(\"/content/fusion-design-lab\")\n",
200
+ " if \"google.colab\" in sys.modules\n",
201
+ " else Path.home() / \"fusion-design-lab\"\n",
202
+ " )\n",
203
+ " if not target.exists():\n",
204
+ " subprocess.run([\"git\", \"clone\", REPO_URL, str(target)], check=True)\n",
205
+ " if not _is_valid_repo_root(target):\n",
206
+ " raise RuntimeError(\n",
207
+ " \"Could not locate a complete fusion-design-lab repository at {target}.\".format(\n",
208
+ " target=target\n",
209
+ " )\n",
210
+ " )\n",
211
+ " return target.resolve()\n",
212
+ "\n",
213
+ "\n",
214
+ "REPO_ROOT = resolve_repo_root()\n",
215
+ "os.chdir(REPO_ROOT)\n",
216
+ "subprocess.run([sys.executable, \"-m\", \"pip\", \"install\", \"-e\", str(REPO_ROOT)], check=True)"
217
  ]
218
  },
219
  {
 
223
  "metadata": {},
224
  "outputs": [],
225
  "source": [
226
+ "import inspect\n",
227
  "import json\n",
228
+ "import os\n",
229
+ "from pathlib import Path\n",
230
  "from typing import Final\n",
231
  "\n",
232
  "from fusion_lab.llm_agent import (\n",
233
  " RUN_DIRECTIONS,\n",
234
  " RUN_MAGNITUDES,\n",
235
  " RUN_PARAMETERS,\n",
236
+ " build_messages,\n",
237
  " parse_action_plan,\n",
238
  " run_episode_with_actions,\n",
239
  ")\n",
240
  "from fusion_lab.models import StellaratorAction\n",
241
  "from server.contract import RESET_SEEDS\n",
242
  "from server.environment import BUDGET, StellaratorEnvironment\n",
243
+ "from server.physics import evaluate_boundary\n",
244
  "\n",
245
  "RUN_ACTION_SPECS: Final[list[dict[str, str]]] = [\n",
246
  " {\"intent\": \"run\", \"parameter\": p, \"direction\": d, \"magnitude\": m}\n",
 
258
  "print(\n",
259
  " f\"Environment ready. Initial score: {obs.p1_score:.4f}, feasibility: {obs.p1_feasibility:.4f}\"\n",
260
  ")\n",
261
+ "print(f\"Budget: {obs.budget_remaining}, Constraints satisfied: {obs.constraints_satisfied}\")\n",
262
+ "\n",
263
+ "\n",
264
+ "def _assert_expected_source(source: str | Path, *, expected: Path, label: str) -> Path:\n",
265
+ " source_path = Path(source or \"\").resolve()\n",
266
+ " if source_path.name != expected.name:\n",
267
+ " raise RuntimeError(f\"Expected {label} to come from {expected}, got {source_path}\")\n",
268
+ " if source_path != expected.resolve():\n",
269
+ " raise RuntimeError(\n",
270
+ " f\"Expected {label} to come from {expected}, got {source_path}. This indicates an environment or module path mismatch.\"\n",
271
+ " )\n",
272
+ " return source_path\n",
273
+ "\n",
274
+ "\n",
275
+ "reward_source = _assert_expected_source(\n",
276
+ " inspect.getsourcefile(StellaratorEnvironment._compute_reward_breakdown),\n",
277
+ " expected=REPO_ROOT / \"server\" / \"environment.py\",\n",
278
+ " label=\"Reward V2\",\n",
279
+ ")\n",
280
+ "verifier_source = _assert_expected_source(\n",
281
+ " inspect.getsourcefile(evaluate_boundary),\n",
282
+ " expected=REPO_ROOT / \"server\" / \"physics.py\",\n",
283
+ " label=\"Verifier\",\n",
284
+ ")\n",
285
+ "print(f\"Reward source bound to: {reward_source}\")\n",
286
+ "print(f\"Verifier source bound to: {verifier_source}\")\n",
287
+ "\n",
288
+ "\n",
289
+ "def render_generation_prompt(observation):\n",
290
+ " return tokenizer.apply_chat_template(\n",
291
+ " list(build_messages(observation)),\n",
292
+ " tokenize=False,\n",
293
+ " add_generation_prompt=True,\n",
294
+ " )"
295
  ]
296
  },
297
  {
 
314
  "# Shared helper smoke test\n",
315
  "env = StellaratorEnvironment()\n",
316
  "obs = env.reset(seed=0)\n",
317
+ "prompt = render_generation_prompt(obs)\n",
318
  "print(prompt[:500])\n",
319
  "print(\"...\")\n",
320
  "\n",
 
354
  "prompts = []\n",
355
  "for seed_idx in range(len(RESET_SEEDS)):\n",
356
  " obs = StellaratorEnvironment().reset(seed=seed_idx)\n",
357
+ " prompt = render_generation_prompt(obs)\n",
358
  " # Repeat each seed to create a larger training set\n",
359
  " for _ in range(50):\n",
360
  " prompts.append({\"prompt\": prompt, \"seed_idx\": seed_idx})\n",
 
368
  "cell_type": "markdown",
369
  "id": "504fb2a444614c0babb325280ed9130a",
370
  "metadata": {},
371
+ "source": "## 6. Reward Function\n\nThe GRPO training signal comes from **Reward V2**, the environment's built-in reward computed per step in `server/environment.py:_compute_reward_breakdown(...)`. Each generated action plan is rolled out in the local environment, and the cumulative reward across all steps becomes the single scalar GRPO optimizes.\n\nThe notebook now uses the same live action contract as the environment itself: plans may include explicit `submit`, and `submit` stays on the same low-fidelity verifier surface as the rest of the episode. Empty or unparseable outputs receive a trainer-side fallback penalty of **−3.0**. Right below, the notebook runs a `submit` smoke test so Colab and Northflank confirm the live terminal submit path is wired correctly.\n\n---\n\n### Reward V2 Breakdown\n\nEvery step's reward is the sum of the applicable terms below.\n\n#### 1. Step Costs (every non-submit step)\n\n| Term | Value | Condition |\n|------|-------|-----------|\n| `step_cost` | −0.05 / −0.10 / −0.20 | `run` small / medium / large magnitude |\n| `step_cost` | −0.10 | `restore_best` |\n| `no_progress_penalty` | −0.20 | `no_progress_steps ≥ 3` (consecutive non-improving steps) |\n| `repeat_state_penalty` | −0.15 | Revisiting a previously seen parameter state without improvement |\n| `invalid_action_penalty` | −1.0 | `run` action missing parameter, direction, or magnitude |\n\n#### 2. Evaluation Failure\n\n| Term | Value | Condition |\n|------|-------|-----------|\n| `failure_penalty` | −2.0 | Physics evaluation failed |\n| `failure_submit_penalty` | −1.0 | Failed evaluation on `submit` (additional) |\n| `failure_budget_penalty` | −0.5 | Failed evaluation on last budget step (additional) |\n| `recovery_bonus` | +1.0 | Recovering from a previously failed evaluation |\n\nIf evaluation fails, **only** failure terms and step costs apply — the feasibility/objective terms below are skipped.\n\n#### 3. Feasibility Path (constraints NOT all satisfied)\n\nWhen current or previous state has violated constraints:\n\n| Term | Formula / Value | Purpose |\n|------|-----------------|---------|\n| `feasibility_crossing_bonus` | +3.0 | Crossing from infeasible → feasible |\n| `feasibility_regression_penalty` | −3.0 | Crossing from feasible → infeasible |\n| `near_feasible_bonus` | +1.0 | Feasibility dropping below 0.02 threshold |\n| `feasibility_delta_reward` | `(prev_feasibility − curr_feasibility) × 2.0` | Progress toward satisfying constraints |\n| `best_feasibility_bonus` | `max(0, best_feas_before − curr_feas) × 1.5` | New-best feasibility while still infeasible |\n| `triangularity_repair_reward` | `(prev_tri_violation − curr_tri_violation) × 2.0` | Reducing triangularity constraint gap |\n| `aspect_ratio_repair_reward` | `(prev_ar_violation − curr_ar_violation) × 1.0` | Reducing aspect ratio constraint gap |\n| `iota_repair_reward` | `(prev_iota_violation − curr_iota_violation) × 1.0` | Reducing iota constraint gap |\n\n#### 4. Objective Path (both prev and curr constraints satisfied)\n\nWhen the design is feasible and stays feasible:\n\n| Term | Formula / Value | Purpose |\n|------|-----------------|---------|\n| `objective_delta_reward` | `(prev_max_elongation − curr_max_elongation) × 10.0` | Lowering max elongation (the optimization target) |\n| `best_score_bonus` | `max(0, curr_score − best_score_before) × 0.75` | New-best P1 score while feasible |\n\n#### 5. Terminal Bonus (on `submit` or final budget step)\n\n| Term | Formula / Value | Condition |\n|------|-----------------|-----------|\n| `terminal_improvement_bonus` | `5.0 × ratio` (submit) / `2.0 × ratio` (last step) | Feasible and score > initial score |\n| `terminal_budget_bonus` | `budget_remaining / budget_total` | Submit only, with improvement |\n| `terminal_no_improvement_penalty` | −1.0 (submit) / −0.5 (last step) | No improvement over initial |\n\nWhere `ratio = (curr_score − base_score) / max(1.0 − base_score, 1e-6)`.\n\n---\n\n**Constants** (from `server/environment.py`): `FAILURE_PENALTY=−2.0`, `FEASIBILITY_DELTA_WEIGHT=2.0`, `TRIANGULARITY_REPAIR_WEIGHT=2.0`, `ASPECT_RATIO_REPAIR_WEIGHT=1.0`, `IOTA_REPAIR_WEIGHT=1.0`, `BEST_FEASIBILITY_BONUS_WEIGHT=1.5`, `BEST_SCORE_BONUS_WEIGHT=0.75`, `NEAR_FEASIBILITY_THRESHOLD=0.02`, `NEAR_FEASIBILITY_BONUS=1.0`, `NO_PROGRESS_STEP_THRESHOLD=3`, `NO_PROGRESS_PENALTY=−0.2`, `REPEAT_STATE_PENALTY=−0.15`, `RESTORE_STEP_COST=−0.1`, `STEP_COST_BY_MAGNITUDE={small: −0.05, medium: −0.10, large: −0.20}`."
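The terminal-bonus arithmetic above is easy to mis-read, so here is a minimal sketch that reproduces the `ratio` formula and the submit-time scaling. The helper name `terminal_improvement_bonus_sketch` is hypothetical and not code from the repo; only the constants (5.0 for submit, 2.0 for the last budget step) and the `ratio` formula are taken from the breakdown above.

```python
def terminal_improvement_bonus_sketch(
    curr_score: float, base_score: float, *, is_submit: bool
) -> float:
    """Sketch of the terminal improvement bonus from the Reward V2 breakdown.

    ratio = (curr_score - base_score) / max(1.0 - base_score, 1e-6), scaled by
    5.0 on submit or 2.0 on the final budget step.
    """
    ratio = (curr_score - base_score) / max(1.0 - base_score, 1e-6)
    scale = 5.0 if is_submit else 2.0
    return scale * ratio


# Score improves from 0.2 to 0.6 and the plan submits:
# ratio = 0.4 / 0.8 = 0.5, so the bonus is 5.0 * 0.5 = 2.5
print(terminal_improvement_bonus_sketch(0.6, 0.2, is_submit=True))
```

Note how the denominator normalizes by the remaining headroom `1.0 − base_score`, so the same absolute score gain earns a larger bonus when the episode started closer to the maximum score.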
372
  },
373
  {
374
  "cell_type": "code",
 
377
  "metadata": {},
378
  "outputs": [],
379
  "source": [
380
  "def environment_reward_fn(\n",
381
  " completions: list[str], seed_idx: list[int] | None = None, **kwargs\n",
382
  ") -> list[float]:\n",
383
  " \"\"\"Execute each action plan in the environment and return cumulative reward.\n",
384
  "\n",
385
  " This is the sole GRPO training signal in the notebook. It uses the live\n",
386
+ " low-fidelity environment reward path and allows explicit submit so the\n",
387
+ " trainer stays aligned to the same action contract as the environment.\n",
388
+ " Empty or unparseable outputs still receive a trainer-side fallback\n",
389
+ " penalty of -3.0. Environment/runtime bugs should raise directly so they\n",
390
+ " are not misclassified as bad model outputs.\n",
391
  " \"\"\"\n",
392
  " rewards = []\n",
393
  " seeds = seed_idx if seed_idx is not None else [0] * len(completions)\n",
394
  " for i, completion in enumerate(completions):\n",
395
+ " actions = parse_action_plan(completion)\n",
396
+ " if len(actions) == 0:\n",
397
  " rewards.append(-3.0)\n",
398
+ " continue\n",
399
+ " trace = run_episode_with_actions(\n",
400
+ " actions,\n",
401
+ " seed_idx=int(seeds[i]) % len(RESET_SEEDS),\n",
402
+ " )\n",
403
+ " rewards.append(trace.total_reward)\n",
404
  " return rewards\n",
405
  "\n",
406
  "\n",
 
421
  " },\n",
422
  " ]\n",
423
  ")\n",
424
+ "print(f\"Environment reward: {environment_reward_fn([test_plan], seed_idx=[0])}\")\n",
425
  "\n",
426
+ "# Test terminal submit path on the same live verifier surface\n",
427
+ "submit_smoke_plan = json.dumps(\n",
428
  " [\n",
429
  " {\n",
430
  " \"intent\": \"run\",\n",
431
  " \"parameter\": \"triangularity_scale\",\n",
432
  " \"direction\": \"increase\",\n",
433
+ " \"magnitude\": \"small\",\n",
434
  " },\n",
435
+ " {\"intent\": \"submit\"},\n",
436
  " ]\n",
437
  ")\n",
438
+ "submit_smoke_trace = run_episode_with_actions(\n",
439
+ " parse_action_plan(submit_smoke_plan),\n",
440
+ " seed_idx=0,\n",
441
+ ")\n",
442
+ "final_step = submit_smoke_trace.steps[-1]\n",
443
+ "if final_step.reward_breakdown.get(\"intent\") != \"submit\":\n",
444
+ " raise RuntimeError(\"Submit smoke did not end on the terminal submit reward path.\")\n",
445
+ "if final_step.evaluation_fidelity != \"low\":\n",
446
+ " raise RuntimeError(\n",
447
+ " f\"Expected unified low-fidelity submit path, got {final_step.evaluation_fidelity!r}.\"\n",
448
+ " )\n",
449
+ "print(\"Submit smoke confirmed live terminal path:\")\n",
450
+ "print(\n",
451
+ " f\" final action={final_step.action_label}, fidelity={final_step.evaluation_fidelity}, \"\n",
452
+ " f\"reward={final_step.reward:+.3f}\"\n",
453
+ ")\n",
454
+ "print(f\" submit reward terms={final_step.reward_breakdown}\")"
455
+ ]
456
+ },
457
+ {
458
+ "cell_type": "markdown",
459
+ "id": "hprgv01ibkq",
460
+ "metadata": {},
461
+ "source": "## 6b. Untrained Model Baseline\n\nEvaluate the base model **before any GRPO training** on all 3 seeds using **greedy decoding** (`do_sample=False`). Greedy decoding is deterministic — the same model + prompt always produces the same output — so the before/after comparison is fully reproducible across reruns."
462
+ },
463
+ {
464
+ "cell_type": "code",
465
+ "execution_count": null,
466
+ "id": "77dt4zyn6it",
467
+ "metadata": {},
468
+ "outputs": [],
469
+ "source": [
470
+ "MAX_PROMPT_LENGTH = 768\n",
471
+ "MAX_COMPLETION_LENGTH = MAX_SEQ_LENGTH - MAX_PROMPT_LENGTH\n",
472
+ "\n",
473
+ "N_RANDOM_ROLLOUTS = 10\n",
474
+ "\n",
475
+ "\n",
476
+ "def reward_term_summary(step_or_obs: object) -> str:\n",
477
+ " \"\"\"Format non-zero reward terms for display.\"\"\"\n",
478
+ " breakdown_obj = getattr(step_or_obs, \"reward_breakdown\")\n",
479
+ " breakdown = (\n",
480
+ " breakdown_obj.model_dump() if hasattr(breakdown_obj, \"model_dump\") else breakdown_obj\n",
481
+ " )\n",
482
+ " terms = []\n",
483
+ " for key, value in breakdown.items():\n",
484
+ " if key in {\n",
485
+ " \"intent\",\n",
486
+ " \"total\",\n",
487
+ " \"evaluation_failed\",\n",
488
+ " \"recovered_from_failure\",\n",
489
+ " \"reference_constraints_satisfied\",\n",
490
+ " \"reference_score\",\n",
491
+ " \"reference_feasibility\",\n",
492
+ " \"reference_max_elongation\",\n",
493
+ " \"initial_reference_score\",\n",
494
+ " \"terminal_score_ratio\",\n",
495
+ " }:\n",
496
+ " continue\n",
497
+ " if isinstance(value, (int, float)) and float(value) != 0.0:\n",
498
+ " terms.append(f\"{key}={float(value):+.3f}\")\n",
499
+ " return \", \".join(terms) if terms else \"none\"\n",
500
+ "\n",
501
+ "\n",
502
+ "def run_episode_with_model(seed_idx: int) -> tuple[float, list[str], bool]:\n",
503
+ " \"\"\"Run one episode using the current model state (greedy decoding).\n",
504
+ "\n",
505
+ " Greedy decoding (do_sample=False) makes the output fully deterministic\n",
506
+ " for a given model state and seed, so a single rollout per seed is\n",
507
+ " sufficient for reproducible evaluation.\n",
508
+ " \"\"\"\n",
509
+ " env = StellaratorEnvironment()\n",
510
+ " obs = env.reset(seed=seed_idx)\n",
511
+ " prompt = render_generation_prompt(obs)\n",
512
+ " inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
513
+ " with torch.no_grad():\n",
514
+ " outputs = model.generate(\n",
515
+ " **inputs,\n",
516
+ " max_new_tokens=MAX_COMPLETION_LENGTH,\n",
517
+ " do_sample=False,\n",
518
+ " )\n",
519
+ " completion = tokenizer.decode(\n",
520
+ " outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n",
521
+ " )\n",
522
+ " actions = parse_action_plan(completion)\n",
523
+ " if len(actions) == 0:\n",
524
+ " return -3.0, [\"(no valid actions parsed)\"], False\n",
525
+ " episode = run_episode_with_actions(actions, seed_idx=seed_idx)\n",
526
+ " trace = [\n",
527
+ " (\n",
528
+ " f\"{step.action_label} β†’ reward={step.reward:.3f} \"\n",
529
+ " f\"score={step.p1_score:.4f} feasible={step.constraints_satisfied}\"\n",
530
+ " )\n",
531
+ " for step in episode.steps\n",
532
+ " ]\n",
533
+ " return episode.total_reward, trace, episode.constraints_satisfied\n",
534
+ "\n",
535
+ "\n",
536
+ "model.eval()\n",
537
+ "print(\"=\" * 60)\n",
538
+ "print(\"UNTRAINED MODEL BASELINE (before GRPO) β€” greedy, deterministic\")\n",
539
+ "print(\"=\" * 60)\n",
540
+ "untrained_rewards = []\n",
541
+ "for seed in range(len(RESET_SEEDS)):\n",
542
+ " reward, trace, feasible = run_episode_with_model(seed)\n",
543
+ " untrained_rewards.append(reward)\n",
544
+ " print(f\"\\nSeed {seed} β€” Total reward: {reward:.3f}, Feasible: {feasible}\")\n",
545
+ " for line in trace:\n",
546
+ " print(f\" {line}\")\n",
547
+ "\n",
548
+ "untrained_mean = sum(untrained_rewards) / len(untrained_rewards)\n",
549
+ "print(f\"\\nUntrained mean reward: {untrained_mean:.3f}\")\n",
550
+ "print(\"Snapshot saved. Will compare against trained model after GRPO.\")"
551
  ]
552
  },
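The `reward_term_summary` helper added above strips bookkeeping keys from the reward breakdown and keeps only non-zero numeric terms. A minimal standalone sketch of that filtering logic, with an abbreviated skip set and a made-up breakdown dict (the key names `shaping` and `penalty` are illustrative, not the environment's actual schema):

```python
# Sketch of the non-zero-term filter; SKIP_KEYS abbreviates the notebook's full set.
SKIP_KEYS = {"intent", "total", "evaluation_failed"}

def summarize_terms(breakdown: dict) -> str:
    """Format non-zero numeric reward terms, skipping bookkeeping keys."""
    terms = [
        f"{key}={float(value):+.3f}"
        for key, value in breakdown.items()
        if key not in SKIP_KEYS
        and isinstance(value, (int, float))
        and float(value) != 0.0
    ]
    return ", ".join(terms) if terms else "none"

print(summarize_terms({"intent": "run", "total": 0.42, "shaping": 0.25, "penalty": 0.0}))
# shaping=+0.250  ("total" is a skipped key, "penalty" is zero)
print(summarize_terms({"intent": "run"}))  # none
```

Insertion order of the dict determines term order, so the output is stable for a given breakdown.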
553
  {
 
566
  "id": "8a65eabff63a45729fe45fb5ade58bdc",
567
  "metadata": {},
568
  "outputs": [],
569
+ "source": [
570
+ "import matplotlib.pyplot as plt\n",
571
+ "\n",
572
+ "from IPython.display import clear_output, display\n",
573
+ "from transformers import TrainerCallback\n",
574
+ "from trl import GRPOConfig, GRPOTrainer\n",
575
+ "\n",
576
+ "\n",
577
+ "def extract_logged_reward(logs: dict[str, object]) -> float | None:\n",
578
+ " reward_value = logs.get(\"reward\")\n",
579
+ " if reward_value is None:\n",
580
+ " reward_value = logs.get(\"rewards/environment_reward_fn\")\n",
581
+ " if isinstance(reward_value, (int, float)):\n",
582
+ " return float(reward_value)\n",
583
+ " return None\n",
584
+ "\n",
585
+ "\n",
586
+ "class LiveTrainingMonitorCallback(TrainerCallback):\n",
587
+ " def __init__(self, max_steps: int) -> None:\n",
588
+ " self.max_steps = max_steps\n",
589
+ " self.loss_steps: list[int] = []\n",
590
+ " self.losses: list[float] = []\n",
591
+ " self.reward_steps: list[int] = []\n",
592
+ " self.rewards: list[float] = []\n",
593
+ "\n",
594
+ " def _render(self, step: int) -> None:\n",
595
+ " clear_output(wait=True)\n",
596
+ " latest_loss = self.losses[-1] if self.losses else None\n",
597
+ " latest_reward = self.rewards[-1] if self.rewards else None\n",
598
+ " best_reward = max(self.rewards) if self.rewards else None\n",
599
+ " latest_loss_text = f\"{latest_loss:.4f}\" if latest_loss is not None else \"n/a\"\n",
600
+ " latest_reward_text = f\"{latest_reward:+.4f}\" if latest_reward is not None else \"n/a\"\n",
601
+ " best_reward_text = f\"{best_reward:+.4f}\" if best_reward is not None else \"n/a\"\n",
602
+ "\n",
603
+ " print(\"GRPO live monitor\")\n",
604
+ " print(f\"step: {step}/{self.max_steps}\")\n",
605
+ " print(f\"latest loss: {latest_loss_text}\")\n",
606
+ " print(f\"latest reward: {latest_reward_text}\")\n",
607
+ " print(f\"best reward so far: {best_reward_text}\")\n",
608
+ "\n",
609
+ " fig, axes = plt.subplots(1, 2, figsize=(14, 4))\n",
610
+ " if self.losses:\n",
611
+ " axes[0].plot(self.loss_steps, self.losses, color=\"#0b6efd\", linewidth=2)\n",
612
+ " axes[0].scatter(self.loss_steps[-1], self.losses[-1], color=\"#0b6efd\", s=40)\n",
613
+ " else:\n",
614
+ " axes[0].text(0.5, 0.5, \"waiting for loss logs\", ha=\"center\", va=\"center\")\n",
615
+ " axes[0].set_xlabel(\"Step\")\n",
616
+ " axes[0].set_ylabel(\"Loss\")\n",
617
+ " axes[0].set_title(\"Training Loss\")\n",
618
+ " axes[0].grid(True, alpha=0.3)\n",
619
+ "\n",
620
+ " if self.rewards:\n",
621
+ " axes[1].plot(\n",
622
+ " self.reward_steps,\n",
623
+ " self.rewards,\n",
624
+ " color=\"#198754\",\n",
625
+ " linewidth=2,\n",
626
+ " marker=\"o\",\n",
627
+ " markersize=3,\n",
628
+ " )\n",
629
+ " axes[1].scatter(self.reward_steps[-1], self.rewards[-1], color=\"#198754\", s=40)\n",
630
+ " else:\n",
631
+ " axes[1].text(0.5, 0.5, \"waiting for reward logs\", ha=\"center\", va=\"center\")\n",
632
+ " axes[1].axhline(0.0, color=\"0.7\", linewidth=1, linestyle=\"--\")\n",
633
+ " axes[1].set_xlabel(\"Step\")\n",
634
+ " axes[1].set_ylabel(\"Mean Reward\")\n",
635
+ " axes[1].set_title(\"Environment Reward\")\n",
636
+ " axes[1].grid(True, alpha=0.3)\n",
637
+ "\n",
638
+ " fig.suptitle(\"Fusion Design Lab β€” Live GRPO Monitor\", fontsize=14, fontweight=\"bold\")\n",
639
+ " fig.tight_layout()\n",
640
+ " display(fig)\n",
641
+ " plt.close(fig)\n",
642
+ "\n",
643
+ " def on_log(self, args, state, control, logs=None, **kwargs):\n",
644
+ " if not state.is_world_process_zero or not logs:\n",
645
+ " return\n",
646
+ "\n",
647
+ " step = int(state.global_step)\n",
648
+ " loss_value = logs.get(\"loss\")\n",
649
+ " if isinstance(loss_value, (int, float)):\n",
650
+ " if self.loss_steps and self.loss_steps[-1] == step:\n",
651
+ " self.losses[-1] = float(loss_value)\n",
652
+ " else:\n",
653
+ " self.loss_steps.append(step)\n",
654
+ " self.losses.append(float(loss_value))\n",
655
+ "\n",
656
+ " reward_value = extract_logged_reward(logs)\n",
657
+ " if reward_value is not None:\n",
658
+ " if self.reward_steps and self.reward_steps[-1] == step:\n",
659
+ " self.rewards[-1] = reward_value\n",
660
+ " else:\n",
661
+ " self.reward_steps.append(step)\n",
662
+ " self.rewards.append(reward_value)\n",
663
+ "\n",
664
+ " self._render(step)\n",
665
+ "\n",
666
+ " def on_train_end(self, args, state, control, **kwargs):\n",
667
+ " if state.is_world_process_zero:\n",
668
+ " self._render(int(state.global_step))\n",
669
+ "\n",
670
+ "\n",
671
+ "training_args = GRPOConfig(\n",
672
+ " output_dir=\"./grpo_fusion_output\",\n",
673
+ " learning_rate=5e-5,\n",
674
+ " num_generations=8,\n",
675
+ " max_completion_length=MAX_COMPLETION_LENGTH,\n",
676
+ " per_device_train_batch_size=8,\n",
677
+ " gradient_accumulation_steps=1,\n",
678
+ " max_steps=60,\n",
679
+ " temperature=1.0,\n",
680
+ " logging_steps=1,\n",
681
+ " save_steps=20,\n",
682
+ " bf16=use_bf16,\n",
683
+ " fp16=not use_bf16,\n",
684
+ " report_to=\"none\",\n",
685
+ " seed=42,\n",
686
+ ")\n",
687
+ "\n",
688
+ "live_training_callback = LiveTrainingMonitorCallback(max_steps=training_args.max_steps)\n",
689
+ "\n",
690
+ "trainer = GRPOTrainer(\n",
691
+ " model=model,\n",
692
+ " processing_class=tokenizer,\n",
693
+ " reward_funcs=[environment_reward_fn],\n",
694
+ " args=training_args,\n",
695
+ " train_dataset=dataset,\n",
696
+ " callbacks=[live_training_callback],\n",
697
+ ")\n",
698
+ "\n",
699
+ "print(\"Starting GRPO training...\")\n",
700
+ "train_result = trainer.train()\n",
701
+ "print(f\"Training complete. Total steps: {train_result.global_step}\")"
702
+ ]
703
  },
704
  {
705
  "cell_type": "markdown",
 
708
  "source": [
709
  "## 8. Training Results\n",
710
  "\n",
711
+ "The training cell above renders a live dashboard while GRPO runs. This section saves a clean post-training summary figure."
712
  ]
713
  },
714
  {
 
718
  "metadata": {},
719
  "outputs": [],
720
  "source": [
 
 
721
  "log_history = trainer.state.log_history\n",
722
  "steps = [entry[\"step\"] for entry in log_history if \"loss\" in entry]\n",
723
  "losses = [entry[\"loss\"] for entry in log_history if \"loss\" in entry]\n",
 
762
  "cell_type": "markdown",
763
  "id": "8309879909854d7188b41380fd92a7c3",
764
  "metadata": {},
765
+ "source": "## 9. Evaluate Trained Policy\n\nCompare the GRPO-trained model against the **untrained baseline** (captured in Section 6b) and random action selection on the same **live low-fidelity environment contract** used during GRPO, including explicit terminal `submit`.\n\n- **Model evaluations** use deterministic greedy decoding (`do_sample=False`), so results are fully reproducible across reruns. One rollout per seed suffices.\n- **Random baseline** remains stochastic, so it averages 10 rollouts per seed for a stable estimate, then explicitly submits its final candidate to stay on the same terminal contract."
 
 
 
 
766
  },
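The random baseline described here builds `BUDGET - 1` random run actions and appends a terminal submit so it stays on the same contract as the model rollouts. A sketch of just the plan-shape logic, with plain dicts standing in for `StellaratorAction` and placeholder values for `BUDGET` and `RUN_ACTION_SPECS`:

```python
import random

BUDGET = 10  # placeholder; the notebook uses the environment's real budget
RUN_ACTION_SPECS = [  # illustrative subset of the run-action combinations
    {"intent": "run", "parameter": "triangularity_scale",
     "direction": "increase", "magnitude": "small"},
    {"intent": "run", "parameter": "triangularity_scale",
     "direction": "decrease", "magnitude": "medium"},
]

def build_random_plan(budget: int) -> list[dict]:
    """Random run actions for all but the last step, then a terminal submit."""
    plan = [dict(random.choice(RUN_ACTION_SPECS)) for _ in range(max(budget - 1, 0))]
    plan.append({"intent": "submit"})
    return plan

plan = build_random_plan(BUDGET)
print(len(plan), plan[-1]["intent"])  # 10 submit
```

The `max(budget - 1, 0)` guard mirrors the notebook's expression: even with a degenerate budget of 0 or 1, the plan still ends on a single terminal submit.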
767
  {
768
  "cell_type": "code",
 
770
  "id": "3ed186c9a28b402fb0bc4494df01f08d",
771
  "metadata": {},
772
  "outputs": [],
773
+ "source": "import random\n\nmodel.eval()\n\n# --- Trained model (greedy = deterministic, 1 rollout per seed) ---\nprint(\"=\" * 60)\nprint(\"TRAINED MODEL (after GRPO) β€” greedy, deterministic\")\nprint(\"=\" * 60)\ntrained_rewards = []\nfor seed in range(len(RESET_SEEDS)):\n reward, trace, feasible = run_episode_with_model(seed)\n trained_rewards.append(reward)\n print(f\"\\nSeed {seed} β€” Total reward: {reward:.3f}, Feasible: {feasible}\")\n for line in trace:\n print(f\" {line}\")\n\ntrained_mean = sum(trained_rewards) / len(trained_rewards)\n\n# --- Random baseline (stochastic, averaged over N_RANDOM_ROLLOUTS per seed) ---\nprint(\"\\n\" + \"=\" * 60)\nprint(f\"RANDOM BASELINE ({N_RANDOM_ROLLOUTS} episodes per seed)\")\nprint(\"=\" * 60)\nrandom_rewards = []\nfor seed in range(len(RESET_SEEDS)):\n seed_rewards = []\n for _ in range(N_RANDOM_ROLLOUTS):\n random_plan = [\n StellaratorAction(**random.choice(RUN_ACTION_SPECS)) for _ in range(max(BUDGET - 1, 0))\n ]\n random_plan.append(StellaratorAction(intent=\"submit\"))\n seed_rewards.append(\n run_episode_with_actions(\n random_plan,\n seed_idx=seed,\n ).total_reward\n )\n random_rewards.extend(seed_rewards)\n print(\n f\"Seed {seed} β€” Mean: {sum(seed_rewards) / len(seed_rewards):.3f}, \"\n f\"Best: {max(seed_rewards):.3f}\"\n )\n\nrandom_mean = sum(random_rewards) / len(random_rewards)\n\n# --- Before/After comparison ---\nprint(\"\\n\" + \"=\" * 60)\nprint(\"BEFORE / AFTER COMPARISON\")\nprint(\"=\" * 60)\nprint(f\" Model evals: greedy (deterministic), 1 rollout Γ— {len(RESET_SEEDS)} seeds\")\nprint(f\" Random baseline: {N_RANDOM_ROLLOUTS} rollouts Γ— {len(RESET_SEEDS)} seeds (averaged)\")\nprint()\nprint(f\"{'Agent':<25} {'Mean Reward':>12}\")\nprint(\"-\" * 39)\nprint(f\"{'Random':<25} {random_mean:>+12.3f}\")\nprint(f\"{'Untrained Qwen 3-4B':<25} {untrained_mean:>+12.3f}\")\nprint(f\"{'GRPO-trained (60 steps)':<25} {trained_mean:>+12.3f}\")\nprint()\nimprovement = trained_mean - 
untrained_mean\nprint(f\"GRPO improvement over untrained: {improvement:+.3f}\")\nprint(f\"GRPO improvement over random: {trained_mean - random_mean:+.3f}\")"
774
  },
775
  {
776
  "cell_type": "markdown",
777
  "id": "cb1e1581032b452c9409d6c6813c49d1",
778
  "metadata": {},
779
+ "source": "## 10. Connect to Deployed HF Space (Optional)\n\nDemonstrate connecting to the live environment on Hugging Face Spaces through the typed OpenEnv client and running the trained model against it. This section is optional and will skip cleanly if the deployment is unavailable.\n\n**Contract compatibility check:** Before running any episodes, the cell verifies that the remote `/task` endpoint returns the same budget, constraints, parameters, directions, and magnitudes as the local source code. If any field mismatches, the demo is skipped with a diagnostic message β€” this prevents silent reward or behavior divergence between the notebook and a stale deployment."
780
  },
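The contract check described here compares field by field and collects every mismatch before deciding whether to skip, rather than bailing on the first difference. The comparison pattern on its own, with a simplified stand-in for the real `/task` payload:

```python
def find_contract_mismatches(local: dict, remote: dict) -> list[str]:
    """Compare each locally-expected field against the remote payload."""
    mismatches = []
    for key, local_val in local.items():
        remote_val = remote.get(key)  # a missing remote key shows up as None
        if remote_val != local_val:
            mismatches.append(f"{key}: local={local_val!r} remote={remote_val!r}")
    return mismatches

local = {"budget": 10, "directions": ["increase", "decrease"]}
stale = {"budget": 8, "directions": ["increase", "decrease"]}
print(find_contract_mismatches(local, dict(local)))  # []
print(find_contract_mismatches(local, stale))        # ['budget: local=10 remote=8']
```

Collecting all mismatches at once gives a complete diagnostic in one pass, which is more useful for "redeploy the Space" guidance than failing on the first field.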
781
  {
782
  "cell_type": "code",
 
784
  "id": "379cbbc1e968416e875cc15c1202d7eb",
785
  "metadata": {},
786
  "outputs": [],
787
+ "source": [
788
+ "import requests\n",
789
+ "\n",
790
+ "from fusion_lab.client import FusionLabClient\n",
791
+ "from server.physics import (\n",
792
+ " ASPECT_RATIO_MAX,\n",
793
+ " AVERAGE_TRIANGULARITY_MAX,\n",
794
+ " EDGE_IOTA_OVER_NFP_MIN,\n",
795
+ ")\n",
796
+ "\n",
797
+ "HF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n",
798
+ "REQUEST_TIMEOUT_SECONDS = 10\n",
799
+ "\n",
800
+ "try:\n",
801
+ " health_response = requests.get(f\"{HF_SPACE_URL}/health\", timeout=REQUEST_TIMEOUT_SECONDS)\n",
802
+ "except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as exc:\n",
803
+ " health_response = None\n",
804
+ " print(f\"Skipping remote demo β€” network error reaching HF Space: {exc}\")\n",
805
+ "\n",
806
+ "if health_response is not None and health_response.status_code != 200:\n",
807
+ " print(\n",
808
+ " \"Skipping remote demo because the HF Space is unavailable: \"\n",
809
+ " f\"/health returned {health_response.status_code}.\"\n",
810
+ " )\n",
811
+ " health_response = None\n",
812
+ "\n",
813
+ "if health_response is not None:\n",
814
+ " health = health_response.json()\n",
815
+ " print(f\"HF Space status: {health['status']}\")\n",
816
+ "\n",
817
+ " try:\n",
818
+ " task_response = requests.get(f\"{HF_SPACE_URL}/task\", timeout=REQUEST_TIMEOUT_SECONDS)\n",
819
+ " except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as exc:\n",
820
+ " task_response = None\n",
821
+ " print(f\"Skipping remote demo β€” network error reaching /task: {exc}\")\n",
822
+ "\n",
823
+ " if task_response is not None and task_response.status_code != 200:\n",
824
+ " print(\n",
825
+ " \"Skipping remote demo because the HF Space task endpoint is unavailable: \"\n",
826
+ " f\"/task returned {task_response.status_code}.\"\n",
827
+ " )\n",
828
+ " task_response = None\n",
829
+ "\n",
830
+ " # ── Contract compatibility check ──────────────────────────────────────\n",
831
+ " if task_response is not None:\n",
832
+ " task = task_response.json()\n",
833
+ " expected_contract = {\n",
834
+ " \"budget\": BUDGET,\n",
835
+ " \"constraints\": {\n",
836
+ " \"aspect_ratio_max\": ASPECT_RATIO_MAX,\n",
837
+ " \"average_triangularity_max\": AVERAGE_TRIANGULARITY_MAX,\n",
838
+ " \"abs_edge_iota_over_nfp_min\": EDGE_IOTA_OVER_NFP_MIN,\n",
839
+ " },\n",
840
+ " \"parameters\": list(RUN_PARAMETERS),\n",
841
+ " \"directions\": list(RUN_DIRECTIONS),\n",
842
+ " \"magnitudes\": list(RUN_MAGNITUDES),\n",
843
+ " }\n",
844
+ " mismatches: list[str] = []\n",
845
+ " for key in expected_contract:\n",
846
+ " remote_val = task.get(key)\n",
847
+ " local_val = expected_contract[key]\n",
848
+ " if remote_val != local_val:\n",
849
+ " mismatches.append(f\" {key}: local={local_val!r} remote={remote_val!r}\")\n",
850
+ "\n",
851
+ " if mismatches:\n",
852
+ " print(\"Skipping remote demo β€” contract mismatch between local code and HF Space:\")\n",
853
+ " for m in mismatches:\n",
854
+ " print(m)\n",
855
+ " print(\"Redeploy the Space with the current code to fix this.\")\n",
856
+ " task_response = None\n",
857
+ " else:\n",
858
+ " print(\"Contract check passed β€” remote matches local code.\")\n",
859
+ " print(f\"\\nTask: {task['description']}\")\n",
860
+ " print(f\"Constraints: {task['constraints']}\")\n",
861
+ " print(f\"Budget: {task['budget']}\")\n",
862
+ "\n",
863
+ " # ── Run trained model against remote environment ──────────────────────\n",
864
+ " if task_response is not None:\n",
865
+ " with FusionLabClient(base_url=HF_SPACE_URL) as env:\n",
866
+ " reset_result = env.reset(seed=42)\n",
867
+ " remote_obs = reset_result.observation\n",
868
+ " print(f\"\\nRemote reset β€” max_elongation: {remote_obs.max_elongation:.4f}\")\n",
869
+ " print(f\" aspect_ratio: {remote_obs.aspect_ratio:.4f}\")\n",
870
+ " print(f\" constraints_satisfied: {remote_obs.constraints_satisfied}\")\n",
871
+ " print(f\" budget_remaining: {remote_obs.budget_remaining}\")\n",
872
+ "\n",
873
+ " prompt = render_generation_prompt(remote_obs)\n",
874
+ " inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
875
+ " with torch.no_grad():\n",
876
+ " outputs = model.generate(\n",
877
+ " **inputs, max_new_tokens=MAX_COMPLETION_LENGTH, do_sample=False\n",
878
+ " )\n",
879
+ " completion = tokenizer.decode(\n",
880
+ " outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n",
881
+ " )\n",
882
+ " actions = parse_action_plan(completion)\n",
883
+ "\n",
884
+ " print(f\"\\nTrained model generated {len(actions)} actions for remote env:\")\n",
885
+ " for i, action in enumerate(actions[:BUDGET], start=1):\n",
886
+ " result = env.step(action)\n",
887
+ " step_obs = result.observation\n",
888
+ " reward = float(result.reward) if result.reward is not None else 0.0\n",
889
+ " print(\n",
890
+ " f\" Step {i}: {action.intent} {action.parameter or ''} \"\n",
891
+ " f\"{action.direction or ''} {action.magnitude or ''} \"\n",
892
+ " f\"β†’ reward={reward:.3f}, score={step_obs.p1_score:.4f}, \"\n",
893
+ " f\"terms={reward_term_summary(step_obs)}\"\n",
894
+ " )\n",
895
+ " if result.done:\n",
896
+ " print(f\" Episode done. Final score: {step_obs.p1_score:.4f}\")\n",
897
+ " break\n",
898
+ " print(\"\\nEnvironment is live and accessible for training and evaluation.\")"
899
+ ]
900
  }
901
  ],
902
  "metadata": {