CreativeEngineer Claude Opus 4.6 committed on
Commit
cdc237b
·
1 Parent(s): 1ddf57a

feat: reward verifier alignment, notebook hardening, model name fix

Squashed 18 commits for HF Space deployment:
- fix: correct model name from Qwen3.5-4B to Qwen3-4B
- fix: harden notebook dependency bootstrap
- fix: align notebook prompts with chat templates
- fix: clean Dockerfile PYTHONPATH for HF build
- fix: stabilize environment submit flow and training references
- fix: add contract compatibility check to remote HF Space demo
- fix: write full Reward V2 breakdown and harden notebook
- fix: make notebook evaluation reproducible
- fix: tighten reward v2 bookkeeping
- feat: add untrained model baseline and before/after comparison
- chore: remove auto-generated matplotlib PNG artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

server/Dockerfile → Dockerfile RENAMED
@@ -18,7 +18,7 @@ RUN pip install --no-cache-dir \
 
 COPY . /app/env
 
-ENV PYTHONPATH="/app/env:$PYTHONPATH"
+ENV PYTHONPATH="/app/env"
 ENV ENABLE_WEB_INTERFACE=true
 
 EXPOSE 8000
README.md CHANGED
@@ -1,9 +1,16 @@
+---
+title: Fusion Design Lab
+sdk: docker
+app_port: 8000
+short_description: OpenEnv stellarator design optimization environment
+---
+
 # Fusion Design Lab
 
 Fusion Design Lab is an environment-first [OpenEnv](https://openenv.dev) hackathon project for the `P1` stellarator benchmark.
 
 **Live Environment**: [HF Space](https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab)
-**Training Notebook**: [Repository Notebook (GRPO + Unsloth)](training/notebooks/fusion_design_lab_training.ipynb)
+**Training Notebook**: [Repository Notebook (GRPO + HF TRL)](training/notebooks/fusion_design_lab_training.ipynb)
 
 ## What It Does
 
@@ -15,7 +22,7 @@ An RL environment where agents optimize stellarator fusion reactor designs by ad
 | `average_triangularity` | ≤ -0.5 |
 | `abs(edge_iota_over_nfp)` | ≥ 0.3 |
 
-The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the physics verifier: low-fidelity (~0.6s) for the RL inner loop, high-fidelity (~4s) for terminal submit. The live environment still exposes **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + restore_best + submit), but the standard GRPO notebook and `training/llm_rollout.py` `monitor` / `evaluate` workflows stay on the low-fidelity `run` surface and ignore `submit` by default.
+The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the live low-fidelity physics verifier (~0.6s) for every in-environment evaluation. The live environment still exposes **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + restore_best + submit), and `submit` remains an explicit terminal action on that same reward surface rather than a separate high-fidelity mode.
 
 ## Architecture
 
@@ -23,7 +30,7 @@ The environment uses [`constellaration`](https://pypi.org/project/constellaratio
 - **Physics engine** (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
 - **Models** (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
 - **Client** (`fusion_lab/client.py`): Typed OpenEnv client for remote interaction
-- **Training** (`training/`): GRPO notebook (Unsloth + TRL) and PPO smoke test
+- **Training** (`training/`): GRPO notebook (HF TRL) and PPO smoke test
 
 ## Current Status
 
@@ -33,7 +40,8 @@ The environment uses [`constellaration`](https://pypi.org/project/constellaratio
 - GRPO training notebook is checked into the repo and aligned with the shared `fusion_lab/llm_agent.py` contract
 - LLM rollout tooling can now generate fresh model completions per seed and save fixed-seed reward/outcome summaries
 - Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
-- Before/after trained-policy evidence on the current low-fidelity-only workflow is still open
+- The live low-fidelity reward is now `Reward V2`: verifier-native repair shaping plus bounded best-so-far / anti-stagnation terms
+- Before/after trained-policy evidence on the current unified low-fidelity workflow is still open
 
 ## Execution Status
 
@@ -52,11 +60,10 @@ The environment uses [`constellaration`](https://pypi.org/project/constellaratio
 - [x] Split boundary construction from boundary evaluation in `server/physics.py`
 - [x] Update the action contract from 3 knobs to the repaired low-dimensional family
 - [x] Add explicit VMEC failure semantics to the environment contract
-- [x] Label low-fi `run` truth vs high-fi `submit` truth in observations and task docs
-- [x] Separate high-fidelity submit scoring/reporting from low-fidelity rollout score state
+- [x] Collapse the live environment to one low-fidelity truth surface while keeping explicit `submit`
 - [x] Add tracked `P1` fixtures under `server/data/p1/`
 - [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
-- [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
+- [x] Complete paired high-fidelity validation artifacts outside the live environment path
 - [x] Refresh the heuristic baseline for the real verifier path
 - [x] Deploy the real environment to HF Space
 - [x] Add the public training notebook under `training/notebooks`
@@ -64,16 +71,13 @@ The environment uses [`constellaration`](https://pypi.org/project/constellaratio
 ## Known Gaps
 
 - Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
-- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity fixture checks.
+- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity validation checks.
 - The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
-- `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `from_boundary_resolution`; do not present step-time metrics as final submission metrics.
-- The standard LLM training and evaluation workflow is now low-fidelity-only: the repo notebook and `training/llm_rollout.py` `monitor` / `evaluate` ignore `submit` by default. Reserve `submit` for explicit replay/debug work, paired fixture checks, submit-side traces, and final evidence.
+- The live environment now uses one low-fidelity verifier surface for `run`, `restore_best`, and `submit`. Keep high-fidelity checks in `baselines/high_fidelity_validation.py` and other offline validation artifacts rather than mixing them back into the environment reward loop.
 - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
-- Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
-- Observation best-state reporting is now split explicitly between low-fidelity rollout state and high-fidelity submit state; baseline traces and demo copy should use those explicit fields rather than infer a mixed best-state story.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
-- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible high-fidelity finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
-- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired high-fidelity evidence.
+- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible submitted finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
+- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired offline high-fidelity evidence.
 - The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
 
 Current mode:
@@ -134,11 +138,11 @@ uv sync --extra notebooks
 ## Immediate Next Steps
 
 - [x] Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
-- [x] Pair the tracked low-fidelity fixtures with high-fidelity submit spot checks immediately after the PPO smoke run.
-- [x] Run at least one submit-side manual trace before any broader training push, then record the first real reward pathology, if any.
+- [x] Pair the tracked low-fidelity fixtures with high-fidelity validation spot checks immediately after the PPO smoke run.
+- [x] Run at least one explicit-submit manual trace before any broader training push, then record the first real reward pathology, if any.
 - [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
 - [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
-- [ ] Run one short H100 GRPO pass with the repository notebook on the same low-fidelity-only workflow.
+- [ ] Run one short H100 GRPO pass with the repository notebook on the same unified low-fidelity workflow.
 - [ ] Re-run the same seeds after training and save one before/after artifact.
 - [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
 - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
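The README's 26-action layout can be sanity-checked with a short enumeration. This is a sketch: the knob names below are placeholders, not the environment's real parameter identifiers, which live in the `P1` environment contract.

```python
from itertools import product

# Placeholder knob names; the real 4-parameter family lives in the env contract.
PARAMS = ["knob_0", "knob_1", "knob_2", "knob_3"]
DIRECTIONS = ["decrease", "increase"]
MAGNITUDES = ["small", "medium", "large"]

# 4 parameters x 2 directions x 3 magnitudes = 24 tweak actions,
# plus the two control actions restore_best and submit = 26 total.
ACTIONS = [f"{p}:{d}:{m}" for p, d, m in product(PARAMS, DIRECTIONS, MAGNITUDES)]
ACTIONS += ["restore_best", "submit"]

print(len(ACTIONS))  # 26
```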
TODO.md CHANGED
@@ -61,7 +61,7 @@ flowchart TD
 P --> F["Tiny PPO Smoke"]
 F --> E["Fixture Checks"]
 E --> G["Submit-side Manual Playtest"]
-G --> H["Reward V1"]
+G --> H["Reward V2"]
 H --> I["Baselines"]
 I --> J["HF Space Deploy"]
 J --> K["Colab Notebook"]
@@ -112,7 +112,7 @@ flowchart TD
 - [x] Replace the synthetic physics path with `constellaration` wiring
   Files:
   [server/physics.py](server/physics.py),
-  [server/Dockerfile](server/Dockerfile),
+  [Dockerfile](Dockerfile),
   [pyproject.toml](pyproject.toml)
 
 - [x] Update the API/task surface to match `P1`
@@ -220,6 +220,13 @@ flowchart TD
   [AGENTS.md](AGENTS.md),
   [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
 
+- [x] Update reward from `V1` to `V2` after the verifier-native shaping exposed short-horizon gaps
+  Goal:
+  add bounded new-best, near-feasible, and anti-stagnation terms without breaking the verifier-native reward story
+  Related:
+  [AGENTS.md](AGENTS.md),
+  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
 - [x] Write down why `Reward V0` did not survive unchanged
   Goal:
   document the concrete pathology: pure `Δ official_feasibility` hid useful non-dominant repairs because official feasibility is a max over normalized constraint violations
@@ -294,6 +301,6 @@ flowchart TD
 - [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
 - [x] Do not compare high-fidelity submit scores against low-fidelity best/initial score state in the final story
 - [ ] Do not describe the current baseline reset state as feasible or near-feasible
-- [x] Do not force a `Reward V1` story if `Reward V0` survives manual playtesting
+- [x] Do not force a new reward-version story until the previous reward version shows a real pathology
   Note:
-  completed by recording the concrete `Reward V0` pathology and only then moving to `Reward V1`
+  completed by recording the concrete `Reward V0` pathology before `Reward V1`, then recording the concrete short-horizon `Reward V1` gaps before `Reward V2`
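The `V1` to `V2` task above names the new terms: bounded new-best, near-feasible, and anti-stagnation shaping on top of verifier-native repair shaping. A minimal illustrative sketch of that shape, with entirely made-up weights and argument names (the real breakdown lives in the environment's reward code):

```python
def reward_v2_sketch(feasibility_delta: float,
                     new_best: bool,
                     near_feasible: bool,
                     steps_since_improvement: int) -> float:
    """Illustrative Reward V2 shape: verifier-native repair shaping plus
    bounded shaping terms. All weights here are invented for the sketch."""
    r = feasibility_delta      # verifier-native repair shaping
    if new_best:
        r += 0.1               # bounded new-best bonus
    if near_feasible:
        r += 0.05              # bounded near-feasible bonus
    if steps_since_improvement > 5:
        r -= 0.02              # bounded anti-stagnation penalty
    return r

print(reward_v2_sketch(0.2, True, False, 0))   # a repair step that sets a new best
print(reward_v2_sketch(0.0, False, False, 8))  # a stagnating step
```

The point of the bounds is in the task text itself: each shaping term is capped so it cannot outgrow the verifier-native repair signal.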
assets/p1_seeds/creative_best.json ADDED
@@ -0,0 +1,351 @@
+{
+  "r_cos": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      1.0,
+      0.3178024006853376,
+      -0.00494453968429039,
+      -9.008828074894216e-05,
+      0.00034826523984284985,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.0284150340556629e-05,
+      0.0,
+      0.00022783626301464338,
+      -0.002504825752146429,
+      -0.0019791928302356574,
+      -0.05986009847084577,
+      0.2378930884212573,
+      0.07041809177925817,
+      -0.03405649158367229,
+      -0.001255290887707734,
+      0.00015598389817458503,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      1.2017696110684103e-08,
+      0.0,
+      1.0284150340556629e-05,
+      0.0,
+      0.00017649409443341567,
+      -0.0008786797258682598,
+      0.00871051319329453,
+      -0.006108510773329939,
+      0.012799177446456245,
+      0.02540372085366101,
+      0.0061202246568943935,
+      0.005782073039163714,
+      -0.00032573388857629895,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      4.6090364178986136e-05,
+      -0.00018625377697293322,
+      0.0018402574023466017,
+      -0.0028694583032823347,
+      0.003249729685005616,
+      -0.0032546505923570497,
+      0.0028927525886110798,
+      -0.005727300326564687,
+      0.0009349924265612791,
+      -0.00029069423934959806,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      -0.00013796254756594706,
+      -7.113785778163368e-05,
+      -3.70039013071251e-05,
+      -9.26230222850333e-05,
+      -7.55348144625171e-05,
+      -5.890789481852012e-05,
+      -0.00016787611941031008,
+      -0.00013512182402120827,
+      -0.0007577573777222754,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      0.0,
+      7.374458268637783e-06,
+      0.0,
+      7.374458268637783e-06,
+      7.374458268637783e-06,
+      0.0,
+      7.374458268637783e-06,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      1.6454640544890612e-06,
+      1.6454640544890612e-06,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ]
+  ],
+  "z_sin": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.3682437645980358,
+      -0.010313325093545838,
+      0.0009509826733591118,
+      8.731728723274532e-05,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.00021820594976462235,
+      0.00257055829045463,
+      -0.0127795602890544,
+      -0.05705253192342194,
+      0.25012256718258646,
+      0.012207198333313168,
+      0.0340313223723876,
+      0.0003576776007283744,
+      -0.0002845557347781907,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0006581156987357039,
+      0.0001351500860080824,
+      0.021441581178606544,
+      -0.009487259768838647,
+      0.023875626799357026,
+      0.018329471230646432,
+      0.03202330538363405,
+      -0.002402806268419791,
+      -0.00021251611687155453,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      7.374458268637783e-06,
+      0.00042403289373939767,
+      -0.0009267683930298048,
+      0.002547132301658598,
+      0.0018252150534381778,
+      0.0025447442599817994,
+      -0.0006139539204201418,
+      0.0040519500435168615,
+      -0.002119370745245054,
+      0.0006644491009208615,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0003314760556756545,
+      0.0003278655337955934,
+      0.00013156289898347628,
+      -2.5890394467765182e-05,
+      0.0005822073549505646,
+      3.0278685234777375e-05,
+      -0.0001386996202989576,
+      0.0005453186709654603,
+      0.00024046539821892854,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ]
+  ],
+  "n_field_periods": 3,
+  "n_periodicity": 1,
+  "is_stellarator_symmetric": true
+}
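The seed payload above is a plain Fourier-boundary JSON: `r_cos` / `z_sin` mode matrices plus a few scalars. A small helper to summarize such a payload; the inline `demo` dict is a tiny stand-in with the same top-level keys, not the real 9 x 17 seed:

```python
def boundary_summary(boundary: dict) -> dict:
    """Report the mode-matrix shapes and scalars of a seed boundary payload."""
    r_cos, z_sin = boundary["r_cos"], boundary["z_sin"]
    return {
        "r_cos_shape": (len(r_cos), len(r_cos[0])),
        "z_sin_shape": (len(z_sin), len(z_sin[0])),
        "n_field_periods": boundary["n_field_periods"],
        "symmetric": boundary["is_stellarator_symmetric"],
    }

# Tiny stand-in with the same top-level keys as creative_best.json.
demo = {
    "r_cos": [[1.0, 0.3178], [0.0, 0.2379]],
    "z_sin": [[0.0, -0.3682], [0.0, 0.2501]],
    "n_field_periods": 3,
    "is_stellarator_symmetric": True,
}
print(boundary_summary(demo))
```

On the real tracked files, the same helper would report the matrix truncation each seed family uses, which is what distinguishes the `creative_*` and `egodos_*` payloads below.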
assets/p1_seeds/creative_seed.json ADDED
@@ -0,0 +1,351 @@
+{
+  "r_cos": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      1.0,
+      0.3178024006853376,
+      -0.00494453968429039,
+      -9.008828074894216e-05,
+      0.00034826523984284985,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.0284150340556629e-05,
+      0.0,
+      0.00022783626301464338,
+      -0.002504825752146429,
+      -0.0019791928302356574,
+      -0.05986009847084577,
+      0.2378930884212573,
+      0.07041809177925817,
+      -0.03405649158367229,
+      -0.001255290887707734,
+      0.00015598389817458503,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      5.120176961106841e-07,
+      0.0,
+      1.0284150340556629e-05,
+      0.0,
+      0.00017649409443341567,
+      -0.0008786797258682598,
+      0.00871051319329453,
+      -0.006108510773329939,
+      0.012799177446456245,
+      0.02540372085366101,
+      0.0061202246568943935,
+      0.005782073039163714,
+      -0.00032573388857629895,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      4.6090364178986136e-05,
+      -0.00018625377697293322,
+      0.0018402574023466017,
+      -0.0028694583032823347,
+      0.003249729685005616,
+      -0.0032546505923570497,
+      0.0028927525886110798,
+      -0.005727300326564687,
+      0.0009349924265612791,
+      -0.00029069423934959806,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      -0.00013796254756594706,
+      -7.113785778163368e-05,
+      -3.70039013071251e-05,
+      -9.26230222850333e-05,
+      -7.55348144625171e-05,
+      -5.890789481852012e-05,
+      -0.00016787611941031008,
+      -0.00013512182402120827,
+      -0.0007577573777222754,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      0.0,
+      7.374458268637783e-06,
+      0.0,
+      7.374458268637783e-06,
+      7.374458268637783e-06,
+      0.0,
+      7.374458268637783e-06,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      1.6454640544890612e-06,
+      1.6454640544890612e-06,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ]
+  ],
+  "z_sin": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.3682437645980358,
+      -0.010313325093545838,
+      0.0009509826733591118,
+      8.731728723274532e-05,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.00021820594976462235,
+      0.00257055829045463,
+      -0.0127795602890544,
+      -0.05705253192342194,
+      0.25012256718258646,
+      0.012207198333313168,
+      0.0340313223723876,
+      0.0003576776007283744,
+      -0.0002845557347781907,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      0.0,
+      0.0006581156987357039,
+      0.0001351500860080824,
+      0.021441581178606544,
+      -0.009487259768838647,
+      0.023875626799357026,
+      0.018329471230646432,
+      0.03202330538363405,
+      -0.002402806268419791,
+      -0.00021251611687155453,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      1.6454640544890612e-06,
+      7.374458268637783e-06,
+      0.00042403289373939767,
+      -0.0009267683930298048,
+      0.002547132301658598,
+      0.0018252150534381778,
+      0.0025447442599817994,
+      -0.0006139539204201418,
+      0.0040519500435168615,
+      -0.002119370745245054,
+      0.0006644491009208615,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0003314760556756545,
+      0.0003278655337955934,
+      0.00013156289898347628,
+      -2.5890394467765182e-05,
+      0.0005822073549505646,
+      3.0278685234777375e-05,
+      -0.0001386996202989576,
+      0.0005453186709654603,
+      0.00024046539821892854,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ],
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0
+    ]
+  ],
+  "n_field_periods": 3,
+  "n_periodicity": 1,
+  "is_stellarator_symmetric": true
+}
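The tracked seeds above are reset states, not feasible designs. The two P1 thresholds quoted in the README (`average_triangularity` at most -0.5, `abs(edge_iota_over_nfp)` at least 0.3) reduce to a small predicate; this sketch checks only those two listed constraints, not the full `constellaration` verifier:

```python
def p1_thresholds_met(average_triangularity: float,
                      edge_iota_over_nfp: float) -> bool:
    """True when the two README-listed P1 thresholds hold.
    Not the full constellaration feasibility check."""
    return average_triangularity <= -0.5 and abs(edge_iota_over_nfp) >= 0.3

print(p1_thresholds_met(-0.6, 0.35))      # a design meeting both thresholds
print(p1_thresholds_met(0.004975, 0.35))  # the old 3-knob sweep's triangularity
```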
assets/p1_seeds/egodos_seed.json ADDED
@@ -0,0 +1,120 @@
+{
+  "r_cos": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.9889786243438721,
+      0.29489704966545105,
+      -0.029949292540550232,
+      0.007402452640235424,
+      -0.0021754007320851088
+    ],
+    [
+      3.0306517146527767e-05,
+      -0.017258530482649803,
+      0.11244918406009674,
+      -0.00027375295758247375,
+      0.4030463397502899,
+      0.05457010865211487,
+      0.0050158146768808365,
+      -0.009017078205943108,
+      0.00023299557506106794
+    ],
+    [
+      -0.0035085747949779034,
+      -0.007740889210253954,
+      -0.019238369539380074,
+      -0.004338215570896864,
+      -0.01707017421722412,
+      -0.01595107652246952,
+      -0.008797697722911835,
+      -0.0027677465695887804,
+      -0.0003153726283926517
+    ],
+    [
+      0.0012443774612620473,
+      0.0018073361134156585,
+      -0.007023670244961977,
+      0.000234402425121516,
+      0.0017306806985288858,
+      0.003982230089604855,
+      -0.002272964920848608,
+      0.0021430065389722586,
+      -0.0004695240349974483
+    ],
+    [
+      0.0004951803712174296,
+      0.00010301961447112262,
+      0.0006218982161954045,
+      -3.61714992322959e-05,
+      0.000459781993413344,
+      -0.0011883215047419071,
+      0.0015523011097684503,
+      0.001801402191631496,
+      0.0007655859808437526
+    ]
+  ],
+  "z_sin": [
+    [
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      0.0,
+      -0.2114829421043396,
+      -0.04368766397237778,
+      0.011270688846707344,
+      -0.0033141719177365303
+    ],
+    [
+      -0.0012609786354005337,
+      -0.008882338181138039,
+      0.04093347489833832,
+      0.06044723838567734,
+      -0.40492475032806396,
+      -0.04713256657123566,
+      -0.00028245686553418636,
+      0.00870777852833271,
+      0.001710485783405602
+    ],
+    [
+      -0.0005412710597738624,
+      -0.009137776680290699,
+      -0.013082815334200859,
+      -0.01053211372345686,
+      -0.01276348065584898,
+      0.017182836309075356,
+      -0.012362353503704071,
+      -0.001533929375000298,
+      -0.0028038574382662773
+    ],
+    [
+      -0.0011634031543508172,
+      0.007165416143834591,
+      -0.014393662102520466,
+      0.0011076449882239103,
+      -0.006598849315196276,
+      0.006964890286326408,
+      -0.008261557668447495,
+      0.0032563884742558002,
+      -0.0006506771314889193
+    ],
+    [
+      -0.0008520428673364222,
+      -0.00014924361312296242,
+      -0.001169409602880478,
+      0.002478198613971472,
+      0.0025256099179387093,
+      -0.001493512187153101,
+      -0.0013979775831103325,
+      0.0012794585200026631,
+      -0.0007043574005365372
+    ]
+  ],
+  "r_sin": null,
+  "z_cos": null,
+  "n_field_periods": 3,
+  "is_stellarator_symmetric": true
+}
assets/p1_seeds/egodos_sparse_rgroup_best.json ADDED
@@ -0,0 +1,120 @@
+ {
+ "r_cos": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.9889786243438721,
+ 0.29489704966545105,
+ -0.024982345699897754,
+ 0.007402452640235424,
+ -0.0021754007320851088
+ ],
+ [
+ 0.000030306517146527767,
+ -0.017258530482649803,
+ 0.11244918406009674,
+ -0.00027375295758247375,
+ 0.3946223671947347,
+ 0.06473962641387607,
+ 0.0050158146768808365,
+ -0.009017078205943108,
+ 0.00023299557506106794
+ ],
+ [
+ -0.0035085747949779034,
+ -0.007740889210253954,
+ -0.019238369539380074,
+ -0.004338215570896864,
+ -0.01707017421722412,
+ -0.01595107652246952,
+ -0.008797697722911835,
+ -0.0027677465695887804,
+ -0.0003153726283926517
+ ],
+ [
+ 0.0012443774612620473,
+ 0.0018073361134156585,
+ -0.007023670244961977,
+ 0.000234402425121516,
+ 0.0017306806985288858,
+ 0.003982230089604855,
+ -0.002272964920848608,
+ 0.0021430065389722586,
+ -0.0004695240349974483
+ ],
+ [
+ 0.0004951803712174296,
+ 0.00010301961447112262,
+ 0.0006218982161954045,
+ -0.0000361714992322959,
+ 0.000459781993413344,
+ -0.0011883215047419071,
+ 0.0015523011097684503,
+ 0.001801402191631496,
+ 0.0007655859808437526
+ ]
+ ],
+ "z_sin": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ -0.2114829421043396,
+ -0.04368766397237778,
+ 0.011270688846707344,
+ -0.0033141719177365303
+ ],
+ [
+ -0.0012609786354005337,
+ -0.008882338181138039,
+ 0.04093347489833832,
+ 0.06044723838567734,
+ -0.40492475032806396,
+ -0.04713256657123566,
+ -0.00028245686553418636,
+ 0.00870777852833271,
+ 0.001710485783405602
+ ],
+ [
+ -0.0005412710597738624,
+ -0.009137776680290699,
+ -0.013082815334200859,
+ -0.01053211372345686,
+ -0.01276348065584898,
+ 0.017182836309075356,
+ -0.012362353503704071,
+ -0.001533929375000298,
+ -0.0028038574382662773
+ ],
+ [
+ -0.0011634031543508172,
+ 0.007165416143834591,
+ -0.014393662102520466,
+ 0.0011076449882239103,
+ -0.006598849315196276,
+ 0.006964890286326408,
+ -0.008261557668447495,
+ 0.0032563884742558002,
+ -0.0006506771314889193
+ ],
+ [
+ -0.0008520428673364222,
+ -0.00014924361312296242,
+ -0.001169409602880478,
+ 0.002478198613971472,
+ 0.0025256099179387093,
+ -0.001493512187153101,
+ -0.0013979775831103325,
+ 0.0012794585200026631,
+ -0.0007043574005365372
+ ]
+ ],
+ "r_sin": null,
+ "z_cos": null,
+ "n_field_periods": 3,
+ "is_stellarator_symmetric": true
+ }
assets/p1_seeds/manifest.json ADDED
@@ -0,0 +1,81 @@
+ {
+ "bundle": "p1_seed_transfer_2026-03-08",
+ "source_repo": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs",
+ "target_repo": "/Users/suhjungdae/code/fusion-design-lab",
+ "selection_principle": "small high-value P1 seed pack for reward design and policy initialization",
+ "entries": [
+ {
+ "file": "creative_best.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_official_high_fidelity_inputs/p04_best_r_cos_2_0_down.json",
+ "family": "creative",
+ "role": "best_endpoint",
+ "origin": "CreativeEngineer micro-perturbation winner family",
+ "score": 0.9701411603598098,
+ "feasibility": 0.009487821019544596,
+ "objective": 1.268729556761712
+ },
+ {
+ "file": "creative_seed.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_official_high_fidelity_inputs/p02_best_submission_seed.json",
+ "family": "creative",
+ "role": "exploitation_anchor",
+ "origin": "CreativeEngineer leaderboard seed",
+ "score": 0.9701409584443864,
+ "feasibility": 0.00949088322352376,
+ "objective": 1.2687313740005226
+ },
+ {
+ "file": "scadena_seed.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_official_high_fidelity_inputs/p01_scadena_seed.json",
+ "family": "scadena",
+ "role": "repair_anchor",
+ "origin": "scadena-pf leaderboard seed",
+ "score": 0.9694573991433482,
+ "feasibility": 0.0001722869358491049,
+ "objective": 1.2748834077098663
+ },
+ {
+ "file": "scadena_repaired_best.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_hf_repaired_scadena_cluster_20260308/s03_DAMXY_t099.json",
+ "family": "scadena",
+ "role": "repaired_diverse_feasible",
+ "origin": "raw HF near-P1 seed repaired along scadena corridor",
+ "score": 0.9696318182039995,
+ "feasibility": 0.008515139661142479,
+ "objective": 1.2733136361640056,
+ "blend_t": 0.99
+ },
+ {
+ "file": "samet_seed.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_non_scadena_seed_pack_20260308/02_samet_exact_feasible_distinct_SametKokoslocke_2026-02-20T16-31-36.961612.json",
+ "family": "samet",
+ "role": "diverse_feasible_anchor",
+ "origin": "SametKokoslocke exact-feasible distinct family",
+ "score": 0.7797358473578075,
+ "feasibility": 0.0,
+ "objective": 2.982377373779732
+ },
+ {
+ "file": "egodos_seed.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_non_scadena_seed_pack_20260308/03_egodos_near_feasible_repair_target_egodos_2026-02-15T19-23-28.679506.json",
+ "family": "egodos",
+ "role": "near_feasible_target",
+ "origin": "best non-scadena near-feasible repair source",
+ "score": 0.0,
+ "feasibility": 0.012140230868772806,
+ "objective": 2.1191483320378808
+ },
+ {
+ "file": "egodos_sparse_rgroup_best.json",
+ "source_path": "/Users/suhjungdae/code/software/proxima_fusion/ai-sci-feasible-designs/artifacts/p1_non_scadena_frontier_20260308/egodos_rgroup_to_samet_t0p07475.json",
+ "family": "egodos_samet_bridge",
+ "role": "non_scadena_best_new_frontier",
+ "origin": "small sparse low-order r_cos move from egodos toward Samet",
+ "score": 0.8646431004091908,
+ "feasibility": 0.009999602734502733,
+ "objective": 2.2182120963172833,
+ "operator": "sparse_r_group_to_samet",
+ "threshold_t": 0.07475
+ }
+ ]
+ }
assets/p1_seeds/samet_seed.json ADDED
@@ -0,0 +1,82 @@
+ {
+ "r_cos": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 1.0,
+ 0.2998406753373,
+ 0.03649815683272709,
+ -0.0023934004574476322
+ ],
+ [
+ 0.004923194200362755,
+ 0.0019538314697309803,
+ -0.030393426440835376,
+ 0.2903510547261398,
+ 0.19061716901012418,
+ 0.010663078794311078,
+ 0.0004915361883997499
+ ],
+ [
+ 0.004461085403616445,
+ 0.0018480423209113024,
+ 0.008322853001526121,
+ -0.0016888734032345434,
+ 0.029738966870065234,
+ -0.017367616857085766,
+ 0.005096201721111707
+ ],
+ [
+ 0.003660255000690111,
+ -0.0008161620102724097,
+ 0.002659988210077753,
+ -0.0011090243735975415,
+ 0.0014976808264200786,
+ 0.0006788200984889833,
+ 0.003049742178214439
+ ]
+ ],
+ "z_sin": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ -0.18809278786125713,
+ -0.004276023017828923,
+ 0.004322589614437422
+ ],
+ [
+ 0.0034290834645214264,
+ -0.0009718600955131964,
+ -0.04584371642254841,
+ 0.33895425254520006,
+ -0.11542379428754948,
+ -0.006266355167467825,
+ 0.004540925438226553
+ ],
+ [
+ 0.002276104811351958,
+ 0.007696920953052154,
+ -0.00560829420301698,
+ 0.01756845689719225,
+ 0.019251214823981313,
+ 0.02607207531935209,
+ 0.0012839605524015184
+ ],
+ [
+ 0.0005920371380011517,
+ 0.003256903574999701,
+ 0.0007021997737304861,
+ 0.0034139505822126832,
+ 0.0017613753357548154,
+ -0.0013703743967947743,
+ -0.0017751147642294824
+ ]
+ ],
+ "r_sin": null,
+ "z_cos": null,
+ "n_field_periods": 3,
+ "is_stellarator_symmetric": true
+ }
assets/p1_seeds/scadena_repaired_best.json ADDED
@@ -0,0 +1,120 @@
+ {
+ "r_cos": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.9995558823937495,
+ 0.3145330963488296,
+ -0.004852928784117629,
+ -0.00047853952359897824,
+ 0.00014028553286944676
+ ],
+ [
+ 0.000021060845809522364,
+ -0.002388888945772252,
+ -0.0010920699762301598,
+ -0.05996056988781852,
+ 0.23806626206705117,
+ 0.07010624141600151,
+ -0.033894859150088635,
+ -0.0012720520309685694,
+ -0.000050072995382135404
+ ],
+ [
+ -0.00002976790108589305,
+ -0.0010127528261545883,
+ 0.008465486468268894,
+ -0.006310050622106053,
+ 0.0128452270223199,
+ 0.02530181645810451,
+ 0.006091523234460731,
+ 0.00565240857271131,
+ -0.00032247654969053596
+ ],
+ [
+ -0.0003888882937781785,
+ 0.0018218548283231357,
+ -0.002840763720249511,
+ 0.0032172323881555598,
+ -0.0033687428407382773,
+ 0.002863825062724969,
+ -0.005816666077603838,
+ 0.0009256425022956663,
+ -0.0002877872969561021
+ ],
+ [
+ -0.00013658292209028756,
+ -0.00007042647920381733,
+ -0.00006935339102604978,
+ -0.0001244163207941789,
+ -0.00007477946631789192,
+ -0.00005831881587033492,
+ -0.00016619735821620698,
+ -0.00016649013451299212,
+ -0.0007501798039450527
+ ]
+ ],
+ "z_sin": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ -0.36736678272076023,
+ -0.009828810099230176,
+ 0.0004982425415133734,
+ 0.00008644411436041787
+ ],
+ [
+ -0.00021602389026697613,
+ 0.002847751999147193,
+ -0.01208084403849006,
+ -0.05624644212913916,
+ 0.24528675315533016,
+ 0.01182189736817721,
+ 0.0336478912876403,
+ 0.0005027013352311548,
+ -0.00028171017743040865
+ ],
+ [
+ 0.0006188150130163509,
+ 0.000015920855717569947,
+ 0.020426156577576828,
+ -0.009570454955266509,
+ 0.023644042506073312,
+ 0.017426508469963166,
+ 0.031709641217604306,
+ -0.0023390750650408433,
+ -0.00024311048443483493
+ ],
+ [
+ 0.0003870730360700077,
+ -0.0009175007090995068,
+ 0.002521660978642012,
+ 0.001806962902903796,
+ 0.0025192968173819814,
+ -0.0006078143812159401,
+ 0.004011430543081693,
+ -0.0020981770377926034,
+ 0.0006578046099116529
+ ],
+ [
+ 0.00032816129511889795,
+ 0.00032458687845763744,
+ 0.00013024726999364151,
+ -0.00002563149052308753,
+ 0.0005763852814010589,
+ 0.0000299758983824296,
+ -0.00013731262409596802,
+ 0.0005398654842558058,
+ 0.00023806074423673925
+ ]
+ ],
+ "r_sin": null,
+ "z_cos": null,
+ "n_field_periods": 3,
+ "is_stellarator_symmetric": true
+ }
assets/p1_seeds/scadena_seed.json ADDED
@@ -0,0 +1,120 @@
+ {
+ "r_cos": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 1.0,
+ 0.3178024006853376,
+ -0.00494453968429039,
+ -0.0010158379922691344,
+ 0.00014170255845398662
+ ],
+ [
+ 2.1273581625780166e-05,
+ -0.002504825752146429,
+ -0.0019791928302356574,
+ -0.05986009847084577,
+ 0.2378930884212573,
+ 0.07041809177925817,
+ -0.03405649158367229,
+ -0.0012552908877077342,
+ -5.0578783214278185e-05
+ ],
+ [
+ -3.0068586955447527e-05,
+ -0.0008786797258682598,
+ 0.00871051319329453,
+ -0.006108510773329939,
+ 0.012799177446456245,
+ 0.02540372085366101,
+ 0.006120224656894394,
+ 0.005782073039163714,
+ -0.00032573388857629895
+ ],
+ [
+ -0.0003928164583617964,
+ 0.0018402574023466017,
+ -0.0028694583032823347,
+ 0.003249729685005616,
+ -0.00340277054620028,
+ 0.0028927525886110798,
+ -0.0058754202804079175,
+ 0.0009349924265612791,
+ -0.00029069423934959806
+ ],
+ [
+ -0.00013796254756594703,
+ -7.113785778163367e-05,
+ -7.005393032934321e-05,
+ -0.00012567305130725142,
+ -7.55348144625171e-05,
+ -5.890789481852012e-05,
+ -0.00016787611941031008,
+ -0.0001681718530434264,
+ -0.0007577573777222754
+ ]
+ ],
+ "z_sin": [
+ [
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ 0.0,
+ -0.3682437645980358,
+ -0.010313325093545838,
+ 0.0008028627195158811,
+ 8.731728723274532e-05
+ ],
+ [
+ -0.00021820594976462235,
+ 0.00257055829045463,
+ -0.0127795602890544,
+ -0.05705253192342194,
+ 0.25012256718258646,
+ 0.012207198333313168,
+ 0.0340313223723876,
+ 0.0003576776007283744,
+ -0.0002845557347781906
+ ],
+ [
+ 0.0006250656697134858,
+ 0.0001351500860080824,
+ 0.02077775360036836,
+ -0.009487259768838647,
+ 0.023875626799357026,
+ 0.017665643652408247,
+ 0.03202330538363405,
+ -0.002402806268419791,
+ -0.00024556614589377266
+ ],
+ [
+ 0.0003909828647171795,
+ -0.0009267683930298048,
+ 0.002547132301658598,
+ 0.0018252150534381778,
+ 0.0025447442599817994,
+ -0.0006139539204201416,
+ 0.0040519500435168615,
+ -0.002119370745245054,
+ 0.0006644491009208615
+ ],
+ [
+ 0.0003314760556756545,
+ 0.0003278655337955934,
+ 0.00013156289898347628,
+ -2.5890394467765182e-05,
+ 0.0005822073549505646,
+ 3.0278685234777375e-05,
+ -0.0001386996202989576,
+ 0.0005453186709654603,
+ 0.00024046539821892854
+ ]
+ ],
+ "r_sin": null,
+ "z_cos": null,
+ "n_field_periods": 3,
+ "is_stellarator_symmetric": true
+ }
baselines/README.md CHANGED
@@ -33,7 +33,7 @@ This keeps the baseline on the real verifier path instead of relying on the olde
 - heuristic mean reward: `+5.2825`
 - random mean final `P1` score: `0.000000`
 - heuristic mean final `P1` score: `0.291951`
-- feasible high-fidelity finals: `0/5` random vs `5/5` heuristic
+- feasible submitted finals: `0/5` random vs `5/5` heuristic
 - heuristic wins: `5/5`
 
 The first baseline milestone is:
baselines/compare.py CHANGED
@@ -21,13 +21,13 @@ def main(n_episodes: int = 20) -> None:
 
     for i in range(n_episodes):
         rr, rt = random_episode(env, seed=i)
-        _require_submit_fidelity(rt[-1], baseline_name="random")
+        _require_successful_submit(rt[-1], baseline_name="random")
         random_rewards.append(rr)
         random_final_scores.append(rt[-1]["score"])
         random_feasible.append(1 if rt[-1]["constraints_satisfied"] else 0)
 
         hr, ht = heuristic_episode(env, seed=i)
-        _require_submit_fidelity(ht[-1], baseline_name="heuristic")
+        _require_successful_submit(ht[-1], baseline_name="heuristic")
        heuristic_rewards.append(hr)
         heuristic_final_scores.append(ht[-1]["score"])
         heuristic_feasible.append(1 if ht[-1]["constraints_satisfied"] else 0)
@@ -51,12 +51,14 @@
     print(f"Heuristic wins: {wins}/{n_episodes} episodes ({100 * wins / n_episodes:.0f}%)")
 
 
-def _require_submit_fidelity(final_step: dict[str, object], *, baseline_name: str) -> None:
-    fidelity = final_step["evaluation_fidelity"]
-    if fidelity != "high":
+def _require_successful_submit(final_step: dict[str, object], *, baseline_name: str) -> None:
+    action = final_step.get("action")
+    if action != "submit":
         raise ValueError(
-            f"{baseline_name} baseline ended on {fidelity!r} instead of high-fidelity submit."
+            f"{baseline_name} baseline ended on {action!r} instead of an explicit submit."
         )
+    if bool(final_step.get("evaluation_failed")):
+        raise ValueError(f"{baseline_name} baseline submit ended in evaluation failure.")
 
 
 if __name__ == "__main__":
baselines/fixture_high_fidelity_pairs.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "timestamp_utc": "2026-03-08T12:05:24.982605+00:00",
+  "timestamp_utc": "2026-03-08T15:21:04.110925+00:00",
   "n_field_periods": 3,
   "fixture_count": 3,
   "pass_count": 3,
baselines/heuristic_agent.py CHANGED
@@ -50,7 +50,7 @@ def heuristic_episode(
             "average_triangularity": obs.average_triangularity,
             "edge_iota_over_nfp": obs.edge_iota_over_nfp,
             "reward": obs.reward,
-            "failure": obs.evaluation_failed,
+            "evaluation_failed": obs.evaluation_failed,
         }
     )
 
baselines/high_fidelity_validation.py CHANGED
@@ -368,7 +368,7 @@ def _run_submit_trace(
 ) -> dict[str, Any]:
     env = StellaratorEnvironment()
     obs = env.reset(seed=seed)
-    initial_state = env.state
+    reset_params = env.state.current_params.model_dump()
     actions = _parse_submit_sequence(action_sequence)
 
     trace: list[dict[str, Any]] = [
@@ -387,7 +387,7 @@
             "budget_remaining": obs.budget_remaining,
             "evaluation_fidelity": obs.evaluation_fidelity,
             "done": obs.done,
-            "params": initial_state.current_params.model_dump(),
+            "params": reset_params,
         }
     ]
 
@@ -442,8 +442,6 @@
         "steps": trace,
         "final_best_low_fidelity_score": obs.best_low_fidelity_score,
         "final_best_low_fidelity_feasibility": obs.best_low_fidelity_feasibility,
-        "final_best_high_fidelity_score": obs.best_high_fidelity_score,
-        "final_best_high_fidelity_feasibility": obs.best_high_fidelity_feasibility,
         "final_diagnostics_text": obs.diagnostics_text,
     }
     _write_json(payload, trace_output)
baselines/replay_playtest.py CHANGED
@@ -180,7 +180,7 @@ EPISODE_5 = (
         _run("rotational_transform", "increase", "medium"),  # rt 1.5→1.6 (setup)
         _run("triangularity_scale", "increase", "medium"),  # tri 0.55→0.60 → cross feasibility
         _run("elongation", "decrease", "small"),  # feasible-side objective move
-        _submit(),  # explicit high-fidelity submit from feasible state
+        _submit(),  # explicit terminal submit from feasible state
     ],
 )
 
baselines/submit_side_trace.json CHANGED
@@ -1,14 +1,14 @@
 {
   "trace_label": "submit_side_manual",
   "trace_profile": "run:rotational_transform:increase:medium,run:triangularity_scale:increase:medium,run:elongation:decrease:small,submit",
-  "timestamp_utc": "2026-03-08T07:07:43.478814+00:00",
+  "timestamp_utc": "2026-03-08T15:15:56.795168+00:00",
   "n_field_periods": 3,
   "seed": 0,
-  "total_reward": 5.3296,
-  "final_score": 0.29605869964467535,
+  "total_reward": 6.1653,
+  "final_score": 0.2957311862720885,
   "final_feasibility": 0.0008652388718514148,
   "final_constraints_satisfied": true,
-  "final_evaluation_fidelity": "high",
+  "final_evaluation_fidelity": "low",
   "final_evaluation_failed": false,
   "steps": [
     {
@@ -37,7 +37,7 @@
       "step": 1,
       "intent": "run",
       "action": "rotational_transform increase medium",
-      "reward": -0.1,
+      "reward": -0.0688,
       "score": 0.0,
       "feasibility": 0.05065283822502309,
       "constraints_satisfied": false,
@@ -53,7 +53,7 @@
       "step": 2,
       "intent": "run",
       "action": "triangularity_scale increase medium",
-      "reward": 3.1533,
+      "reward": 4.1026,
       "score": 0.29165951078326,
       "feasibility": 0.0,
       "constraints_satisfied": true,
@@ -69,7 +69,7 @@
       "step": 3,
       "intent": "run",
       "action": "elongation decrease small",
-      "reward": 0.2665,
+      "reward": 0.3195,
       "score": 0.2957311862720885,
       "feasibility": 0.0008652388718514148,
       "constraints_satisfied": true,
@@ -85,22 +85,20 @@
       "step": 4,
       "intent": "submit",
       "action": "submit",
-      "reward": 2.0098,
-      "score": 0.29605869964467535,
+      "reward": 1.812,
+      "score": 0.2957311862720885,
       "feasibility": 0.0008652388718514148,
       "constraints_satisfied": true,
       "feasibility_delta": 0.0,
-      "score_delta": 0.00032751337258685176,
-      "max_elongation": 7.335471703197922,
+      "score_delta": 0.0,
+      "max_elongation": 7.338419323551204,
       "p1_feasibility": 0.0008652388718514148,
-      "budget_remaining": 3,
-      "evaluation_fidelity": "high",
+      "budget_remaining": 2,
+      "evaluation_fidelity": "low",
       "done": true
     }
   ],
   "final_best_low_fidelity_score": 0.2957311862720885,
   "final_best_low_fidelity_feasibility": 0.0008652388718514148,
-  "final_best_high_fidelity_score": 0.29605869964467535,
-  "final_best_high_fidelity_feasibility": 0.0008652388718514148,
-  "final_diagnostics_text": "Submitted current_high_fidelity_score=0.296059, best_high_fidelity_score=0.296059, best_high_fidelity_feasibility=0.000865, constraints=SATISFIED.\n\nevaluation_fidelity=high\nevaluation_status=OK\nmax_elongation=7.3355\naspect_ratio=3.2897 (<= 4.0)\naverage_triangularity=-0.4996 (<= -0.5)\nedge_iota_over_nfp=0.3030 (>= 0.3)\nfeasibility=0.000865\nbest_low_fidelity_score=0.295731\nbest_low_fidelity_feasibility=0.000865\nbest_high_fidelity_score=0.296059\nbest_high_fidelity_feasibility=0.000865\nvacuum_well=-0.8079\nconstraints=SATISFIED\nstep=4 | budget=3/6"
+  "final_diagnostics_text": "Submitted current_score=0.295731, best_score=0.295731, best_feasibility=0.000865, constraints=SATISFIED.\n\nevaluation_fidelity=low\nevaluation_status=OK\nmax_elongation=7.3384\naspect_ratio=3.2897 (<= 4.0)\naverage_triangularity=-0.4996 (<= -0.5)\nedge_iota_over_nfp=0.3030 (abs(.) >= 0.3)\nfeasibility=0.000865\naspect_ratio_violation=0.000000\ntriangularity_violation=0.000865\niota_violation=0.000000\ndominant_constraint=average_triangularity\nbest_low_fidelity_score=0.295731\nbest_low_fidelity_feasibility=0.000865\nno_progress_steps=0\nvacuum_well=-0.8067\nconstraints=SATISFIED\nstep=4 | budget=2/6\nreward_total=+1.8120\nreward_terms=terminal_improvement_bonus=+1.4787, terminal_budget_bonus=+0.3333\naction_clamped=False\naction_no_op=False\naction_repeat_state=False\nepisode_total_reward=+6.1653"
 }
baselines/sweep_results/measured_sweep_20260308T045600Z.json DELETED
@@ -1,1308 +0,0 @@
- {
- "analysis": {
- "total": 81,
- "evaluated": 63,
- "crashed": 18,
- "feasible": 0,
- "crash_rate": 0.2222222222222222,
- "feasibility_rate": 0.0
- },
- "results": [
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2992442990666656,
- "p1_score": 0.0,
- "max_elongation": 3.16130576121601,
- "aspect_ratio_out": 2.9974947988779532,
- "average_triangularity": -0.3772870380738901,
- "edge_iota_over_nfp": 0.2102267102800003,
- "vacuum_well": -0.7692596885396071
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.4090894053777488,
- "p1_score": 0.0,
- "max_elongation": 4.152094100549045,
- "aspect_ratio_out": 2.921555348457181,
- "average_triangularity": -0.47193262838245775,
- "edge_iota_over_nfp": 0.17727317838667536,
- "vacuum_well": -0.9367956816738985
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.5042302640839985,
- "p1_score": 0.0,
- "max_elongation": 5.535803883650764,
- "aspect_ratio_out": 2.8456158980364137,
- "average_triangularity": -0.5427662288172865,
- "edge_iota_over_nfp": 0.14873092077480043,
- "vacuum_well": -1.1406175925392996
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385222362,
- "p1_score": 0.0,
- "max_elongation": 4.121228842521552,
- "aspect_ratio_out": 2.997494798877953,
- "average_triangularity": -0.3772870380738882,
- "edge_iota_over_nfp": 0.3164284175757522,
- "vacuum_well": -0.8497809691028027
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.12086983493481807,
- "p1_score": 0.0,
- "max_elongation": 5.454537868016638,
- "aspect_ratio_out": 2.921555348457181,
- "average_triangularity": -0.4719326283824573,
- "edge_iota_over_nfp": 0.26373904951955457,
- "vacuum_well": -1.022090883293349
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.28449084880096587,
- "p1_score": 0.0,
- "max_elongation": 7.3486347080873395,
- "aspect_ratio_out": 2.845615898036414,
- "average_triangularity": -0.5427662288172859,
- "edge_iota_over_nfp": 0.21465274535971024,
- "vacuum_well": -1.2227198660107412
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385222062,
- "p1_score": 0.0,
- "max_elongation": 5.506917831126787,
- "aspect_ratio_out": 2.9974947988779492,
- "average_triangularity": -0.3772870380738897,
- "edge_iota_over_nfp": 0.4111851229739778,
- "vacuum_well": -0.8487582268420396
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.3756331947507288,
- "p1_score": 0.0,
- "max_elongation": 3.415282869637771,
- "aspect_ratio_out": 2.9873706820500727,
- "average_triangularity": -0.38276544931263445,
- "edge_iota_over_nfp": 0.18731004157478134,
- "vacuum_well": -0.7354702188161674
- },
- {
- "aspect_ratio": 3.2,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.4698235036344714,
- "p1_score": 0.0,
- "max_elongation": 4.397190759490295,
- "aspect_ratio_out": 2.907634687818849,
- "average_triangularity": -0.47640349211453187,
- "edge_iota_over_nfp": 0.15905294890965857,
185
- "vacuum_well": -0.8791543482410266
186
- },
187
- {
188
- "aspect_ratio": 3.2,
189
- "elongation": 1.5,
190
- "rotational_transform": 1.2,
191
- "triangularity_scale": 0.7,
192
- "crashed": false,
193
- "failure_reason": "",
194
- "feasible": false,
195
- "p1_feasibility": 0.5428078225003115,
196
- "p1_score": 0.0,
197
- "max_elongation": 5.745675404632478,
198
- "aspect_ratio_out": 2.827898693587622,
199
- "average_triangularity": -0.5463927121513807,
200
- "edge_iota_over_nfp": 0.13715765324990656,
201
- "vacuum_well": -1.0487526494457182
202
- },
203
- {
204
- "aspect_ratio": 3.2,
205
- "elongation": 1.5,
206
- "rotational_transform": 1.55,
207
- "triangularity_scale": 0.4,
208
- "crashed": false,
209
- "failure_reason": "",
210
- "feasible": false,
211
- "p1_feasibility": 0.23446910137473032,
212
- "p1_score": 0.0,
213
- "max_elongation": 4.409780538549243,
214
- "aspect_ratio_out": 2.9873706820500683,
215
- "average_triangularity": -0.38276544931263484,
216
- "edge_iota_over_nfp": 0.2846591548163929,
217
- "vacuum_well": -0.7976954392526402
218
- },
219
- {
220
- "aspect_ratio": 3.2,
221
- "elongation": 1.5,
222
- "rotational_transform": 1.55,
223
- "triangularity_scale": 0.55,
224
- "crashed": false,
225
- "failure_reason": "",
226
- "feasible": false,
227
- "p1_feasibility": 0.19785195828765914,
228
- "p1_score": 0.0,
229
- "max_elongation": 5.717017037692497,
230
- "aspect_ratio_out": 2.907634687818846,
231
- "average_triangularity": -0.47640349211453215,
232
- "edge_iota_over_nfp": 0.24064441251370225,
233
- "vacuum_well": -0.9370717167513004
234
- },
235
- {
236
- "aspect_ratio": 3.2,
237
- "elongation": 1.5,
238
- "rotational_transform": 1.55,
239
- "triangularity_scale": 0.7,
240
- "crashed": false,
241
- "failure_reason": "",
242
- "feasible": false,
243
- "p1_feasibility": 0.31745507935684175,
244
- "p1_score": 0.0,
245
- "max_elongation": 7.524424461976794,
246
- "aspect_ratio_out": 2.827898693587622,
247
- "average_triangularity": -0.5463927121513805,
248
- "edge_iota_over_nfp": 0.20476347619294746,
249
- "vacuum_well": -1.0979141176158662
250
- },
251
- {
252
- "aspect_ratio": 3.2,
253
- "elongation": 1.5,
254
- "rotational_transform": 1.9,
255
- "triangularity_scale": 0.4,
256
- "crashed": false,
257
- "failure_reason": "",
258
- "feasible": false,
259
- "p1_feasibility": 0.2344691013747291,
260
- "p1_score": 0.0,
261
- "max_elongation": 5.828745051952865,
262
- "aspect_ratio_out": 2.98737068205007,
263
- "average_triangularity": -0.38276544931263545,
264
- "edge_iota_over_nfp": 0.37697102463283866,
265
- "vacuum_well": -0.7940349587136619
266
- },
267
- {
268
- "aspect_ratio": 3.2,
269
- "elongation": 1.5,
270
- "rotational_transform": 1.9,
271
- "triangularity_scale": 0.55,
272
- "crashed": true,
273
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
274
- "feasible": false,
275
- "p1_feasibility": 1000000.0,
276
- "p1_score": 0.0,
277
- "max_elongation": 10.0,
278
- "aspect_ratio_out": 0.0,
279
- "average_triangularity": 0.0,
280
- "edge_iota_over_nfp": 0.0,
281
- "vacuum_well": 0.0
282
- },
283
- {
284
- "aspect_ratio": 3.2,
285
- "elongation": 1.5,
286
- "rotational_transform": 1.9,
287
- "triangularity_scale": 0.7,
288
- "crashed": true,
289
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
290
- "feasible": false,
291
- "p1_feasibility": 1000000.0,
292
- "p1_score": 0.0,
293
- "max_elongation": 10.0,
294
- "aspect_ratio_out": 0.0,
295
- "average_triangularity": 0.0,
296
- "edge_iota_over_nfp": 0.0,
297
- "vacuum_well": 0.0
298
- },
299
- {
300
- "aspect_ratio": 3.2,
301
- "elongation": 1.8,
302
- "rotational_transform": 1.2,
303
- "triangularity_scale": 0.4,
304
- "crashed": false,
305
- "failure_reason": "",
306
- "feasible": false,
307
- "p1_feasibility": 0.41572876893682525,
308
- "p1_score": 0.0,
309
- "max_elongation": 3.643871627883724,
310
- "aspect_ratio_out": 2.9727492396200192,
311
- "average_triangularity": -0.3900787346843175,
312
- "edge_iota_over_nfp": 0.1752813693189524,
313
- "vacuum_well": -0.7062026867261749
314
- },
315
- {
316
- "aspect_ratio": 3.2,
317
- "elongation": 1.8,
318
- "rotational_transform": 1.2,
319
- "triangularity_scale": 0.55,
320
- "crashed": false,
321
- "failure_reason": "",
322
- "feasible": false,
323
- "p1_feasibility": 0.511263141576597,
324
- "p1_score": 0.0,
325
- "max_elongation": 4.6308076495645185,
326
- "aspect_ratio_out": 2.8875302044775286,
327
- "average_triangularity": -0.4823961090018443,
328
- "edge_iota_over_nfp": 0.1466210575270209,
329
- "vacuum_well": -0.8495335820014714
330
- },
331
- {
332
- "aspect_ratio": 3.2,
333
- "elongation": 1.8,
334
- "rotational_transform": 1.2,
335
- "triangularity_scale": 0.7,
336
- "crashed": false,
337
- "failure_reason": "",
338
- "feasible": false,
339
- "p1_feasibility": 0.575134593927855,
340
- "p1_score": 0.0,
341
- "max_elongation": 5.983792904658967,
342
- "aspect_ratio_out": 2.802311169335037,
343
- "average_triangularity": -0.5512332332730122,
344
- "edge_iota_over_nfp": 0.12745962182164347,
345
- "vacuum_well": -1.0099648537211192
346
- },
347
- {
348
- "aspect_ratio": 3.2,
349
- "elongation": 1.8,
350
- "rotational_transform": 1.55,
351
- "triangularity_scale": 0.4,
352
- "crashed": false,
353
- "failure_reason": "",
354
- "feasible": false,
355
- "p1_feasibility": 0.2198425306313636,
356
- "p1_score": 0.0,
357
- "max_elongation": 4.658846779436034,
358
- "aspect_ratio_out": 2.9727492396200206,
359
- "average_triangularity": -0.3900787346843182,
360
- "edge_iota_over_nfp": 0.260614756973633,
361
- "vacuum_well": -0.7619073642576293
362
- },
363
- {
364
- "aspect_ratio": 3.2,
365
- "elongation": 1.8,
366
- "rotational_transform": 1.55,
367
- "triangularity_scale": 0.55,
368
- "crashed": false,
369
- "failure_reason": "",
370
- "feasible": false,
371
- "p1_feasibility": 0.2548347634174995,
372
- "p1_score": 0.0,
373
- "max_elongation": 5.96315599581871,
374
- "aspect_ratio_out": 2.8875302044775286,
375
- "average_triangularity": -0.48239610900184327,
376
- "edge_iota_over_nfp": 0.22354957097475014,
377
- "vacuum_well": -0.8874237224309117
378
- },
379
- {
380
- "aspect_ratio": 3.2,
381
- "elongation": 1.8,
382
- "rotational_transform": 1.55,
383
- "triangularity_scale": 0.7,
384
- "crashed": false,
385
- "failure_reason": "",
386
- "feasible": false,
387
- "p1_feasibility": 0.32546519894675746,
388
- "p1_score": 0.0,
389
- "max_elongation": 7.752053893932858,
390
- "aspect_ratio_out": 2.8023111693350367,
391
- "average_triangularity": -0.5512332332730115,
392
- "edge_iota_over_nfp": 0.20236044031597275,
393
- "vacuum_well": -1.025387025277758
394
- },
395
- {
396
- "aspect_ratio": 3.2,
397
- "elongation": 1.8,
398
- "rotational_transform": 1.9,
399
- "triangularity_scale": 0.4,
400
- "crashed": false,
401
- "failure_reason": "",
402
- "feasible": false,
403
- "p1_feasibility": 0.21984253063136272,
404
- "p1_score": 0.0,
405
- "max_elongation": 6.09849446743415,
406
- "aspect_ratio_out": 2.9727492396200192,
407
- "average_triangularity": -0.39007873468431864,
408
- "edge_iota_over_nfp": 0.34816514339419125,
409
- "vacuum_well": -0.7642647530236962
410
- },
411
- {
412
- "aspect_ratio": 3.2,
413
- "elongation": 1.8,
414
- "rotational_transform": 1.9,
415
- "triangularity_scale": 0.55,
416
- "crashed": true,
417
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
418
- "feasible": false,
419
- "p1_feasibility": 1000000.0,
420
- "p1_score": 0.0,
421
- "max_elongation": 10.0,
422
- "aspect_ratio_out": 0.0,
423
- "average_triangularity": 0.0,
424
- "edge_iota_over_nfp": 0.0,
425
- "vacuum_well": 0.0
426
- },
427
- {
428
- "aspect_ratio": 3.2,
429
- "elongation": 1.8,
430
- "rotational_transform": 1.9,
431
- "triangularity_scale": 0.7,
432
- "crashed": true,
433
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
434
- "feasible": false,
435
- "p1_feasibility": 1000000.0,
436
- "p1_score": 0.0,
437
- "max_elongation": 10.0,
438
- "aspect_ratio_out": 0.0,
439
- "average_triangularity": 0.0,
440
- "edge_iota_over_nfp": 0.0,
441
- "vacuum_well": 0.0
442
- },
443
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2454259238522214,
- "p1_score": 0.0,
- "max_elongation": 3.3941794666037,
- "aspect_ratio_out": 3.297494798877951,
- "average_triangularity": -0.3772870380738893,
- "edge_iota_over_nfp": 0.25020340071179015,
- "vacuum_well": -0.664703585980881
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.29748701762857843,
- "p1_score": 0.0,
- "max_elongation": 4.453965370200011,
- "aspect_ratio_out": 3.221555348457185,
- "average_triangularity": -0.47193262838245753,
- "edge_iota_over_nfp": 0.21075389471142647,
- "vacuum_well": -0.7999402027558562
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.41176221731566515,
- "p1_score": 0.0,
- "max_elongation": 5.935449908790214,
- "aspect_ratio_out": 3.1456158980364113,
- "average_triangularity": -0.5427662288172873,
- "edge_iota_over_nfp": 0.17647133480530044,
- "vacuum_well": -0.9605409707647019
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2454259238522163,
- "p1_score": 0.0,
- "max_elongation": 4.576580280082074,
- "aspect_ratio_out": 3.2974947988779504,
- "average_triangularity": -0.37728703807389186,
- "edge_iota_over_nfp": 0.36414107926306327,
- "vacuum_well": -0.7128435653443117
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.05613474323508394,
- "p1_score": 0.0,
- "max_elongation": 6.047857576381886,
- "aspect_ratio_out": 3.2215553484571795,
- "average_triangularity": -0.47193262838245803,
- "edge_iota_over_nfp": 0.3054435838875495,
- "vacuum_well": -0.8521401674735174
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.1656236239017676,
- "p1_score": 0.0,
- "max_elongation": 8.134786960902295,
- "aspect_ratio_out": 3.1456158980364153,
- "average_triangularity": -0.5427662288172861,
- "edge_iota_over_nfp": 0.2503129128294697,
- "vacuum_well": -1.0097109038318997
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2454259238522143,
- "p1_score": 0.0,
- "max_elongation": 6.330394186717037,
- "aspect_ratio_out": 3.297494798877949,
- "average_triangularity": -0.37728703807389286,
- "edge_iota_over_nfp": 0.46201189807837234,
- "vacuum_well": -0.6837429974221736
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.2552162867390878,
- "p1_score": 0.0,
- "max_elongation": 3.645500992042136,
- "aspect_ratio_out": 3.287370682050072,
- "average_triangularity": -0.3827654493126362,
- "edge_iota_over_nfp": 0.22343511397827365,
- "vacuum_well": -0.6309996936433193
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.3652585910461875,
- "p1_score": 0.0,
- "max_elongation": 4.683962785437232,
- "aspect_ratio_out": 3.2076346878188455,
- "average_triangularity": -0.47640349211453326,
- "edge_iota_over_nfp": 0.19042242268614373,
- "vacuum_well": -0.7435543314082094
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.45145912595142856,
- "p1_score": 0.0,
- "max_elongation": 6.1087362987510785,
- "aspect_ratio_out": 3.1278986935876247,
- "average_triangularity": -0.5463927121513825,
- "edge_iota_over_nfp": 0.16456226221457143,
- "vacuum_well": -0.8743755327281046
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.23446910137472687,
- "p1_score": 0.0,
- "max_elongation": 4.869439306707605,
- "aspect_ratio_out": 3.287370682050072,
- "average_triangularity": -0.38276544931263656,
- "edge_iota_over_nfp": 0.3311741122924938,
- "vacuum_well": -0.6687744150602308
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.05509465022700962,
- "p1_score": 0.0,
- "max_elongation": 6.294387064409056,
- "aspect_ratio_out": 3.2076346878188478,
- "average_triangularity": -0.47640349211453337,
- "edge_iota_over_nfp": 0.2834716049318971,
- "vacuum_well": -0.7817459617243165
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.18795354892189217,
- "p1_score": 0.0,
- "max_elongation": 8.259847119894143,
- "aspect_ratio_out": 3.127898693587626,
- "average_triangularity": -0.5463927121513834,
- "edge_iota_over_nfp": 0.24361393532343234,
- "vacuum_well": -0.9101645505111563
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.23446910137472798,
- "p1_score": 0.0,
- "max_elongation": 6.666881773484368,
- "aspect_ratio_out": 3.2873706820500743,
- "average_triangularity": -0.382765449312636,
- "edge_iota_over_nfp": 0.4272117184895036,
- "vacuum_well": -0.6343193410902149
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.5,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.31238226470825553,
- "p1_score": 0.0,
- "max_elongation": 3.8692530187501073,
- "aspect_ratio_out": 3.272749239620019,
- "average_triangularity": -0.390078734684318,
- "edge_iota_over_nfp": 0.20628532058752333,
- "vacuum_well": -0.6043041164364112
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.4126167583963611,
- "p1_score": 0.0,
- "max_elongation": 4.903513329259712,
- "aspect_ratio_out": 3.1875302044775253,
- "average_triangularity": -0.482396109001844,
- "edge_iota_over_nfp": 0.17621497248109166,
- "vacuum_well": -0.7113384768069663
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.4733786617465044,
- "p1_score": 0.0,
- "max_elongation": 6.318251319708043,
- "aspect_ratio_out": 3.1023111693350365,
- "average_triangularity": -0.551233233273011,
- "edge_iota_over_nfp": 0.15798640147604867,
- "vacuum_well": -0.8271583951674183
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.21984253063136583,
- "p1_score": 0.0,
- "max_elongation": 5.118240837608963,
- "aspect_ratio_out": 3.272749239620021,
- "average_triangularity": -0.3900787346843171,
- "edge_iota_over_nfp": 0.30467031557444413,
- "vacuum_well": -0.6428347405255894
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.1135333246675809,
- "p1_score": 0.0,
- "max_elongation": 6.52496520982626,
- "aspect_ratio_out": 3.1875302044775293,
- "average_triangularity": -0.4823961090018447,
- "edge_iota_over_nfp": 0.2659400025997257,
- "vacuum_well": -0.7422184942989336
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.19066188850017865,
- "p1_score": 0.0,
- "max_elongation": 8.453464204422819,
- "aspect_ratio_out": 3.102311169335036,
- "average_triangularity": -0.5512332332730117,
- "edge_iota_over_nfp": 0.2428014334499464,
- "vacuum_well": -0.8527496798204878
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.21984253063136494,
- "p1_score": 0.0,
- "max_elongation": 6.9463560702338905,
- "aspect_ratio_out": 3.272749239620025,
- "average_triangularity": -0.39007873468431753,
- "edge_iota_over_nfp": 0.3976618109725794,
- "vacuum_well": -0.6148108774395119
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.5,
- "elongation": 1.8,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385222006,
- "p1_score": 0.0,
- "max_elongation": 3.655092009945489,
- "aspect_ratio_out": 3.597494798877952,
- "average_triangularity": -0.37728703807388997,
- "edge_iota_over_nfp": 0.2893199762541339,
- "vacuum_well": -0.5782263807621896
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.18656815973467256,
- "p1_score": 0.0,
- "max_elongation": 4.791820749012815,
- "aspect_ratio_out": 3.5215553484571793,
- "average_triangularity": -0.47193262838245903,
- "edge_iota_over_nfp": 0.24402955207959823,
- "vacuum_well": -0.6901925400158998
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.31951499196875227,
- "p1_score": 0.0,
- "max_elongation": 6.382367050051349,
- "aspect_ratio_out": 3.4456158980364155,
- "average_triangularity": -0.5427662288172861,
- "edge_iota_over_nfp": 0.20414550240937432,
- "vacuum_well": -0.8206695782389528
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385221906,
- "p1_score": 0.0,
- "max_elongation": 5.094101658298537,
- "aspect_ratio_out": 3.5974947988779453,
- "average_triangularity": -0.37728703807389047,
- "edge_iota_over_nfp": 0.40945734796781447,
- "vacuum_well": -0.5981804595302792
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.05613474323508627,
- "p1_score": 0.0,
- "max_elongation": 6.721805873813851,
- "aspect_ratio_out": 3.5215553484571824,
- "average_triangularity": -0.47193262838245686,
- "edge_iota_over_nfp": 0.34411106640849015,
- "vacuum_well": -0.7119763601489589
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.055638380310600866,
- "p1_score": 0.0,
- "max_elongation": 9.030344748230837,
- "aspect_ratio_out": 3.445615898036414,
- "average_triangularity": -0.5427662288172874,
- "edge_iota_over_nfp": 0.28330848590681973,
- "vacuum_well": -0.8388909971694075
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.24542592385222373,
- "p1_score": 0.0,
- "max_elongation": 7.330392814766231,
- "aspect_ratio_out": 3.5974947988779458,
- "average_triangularity": -0.37728703807388814,
- "edge_iota_over_nfp": 0.5044776421055188,
- "vacuum_well": -0.5662857208360687
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.55,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.2,
- "rotational_transform": 1.9,
- "triangularity_scale": 0.7,
- "crashed": true,
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
- "feasible": false,
- "p1_feasibility": 1000000.0,
- "p1_score": 0.0,
- "max_elongation": 10.0,
- "aspect_ratio_out": 0.0,
- "average_triangularity": 0.0,
- "edge_iota_over_nfp": 0.0,
- "vacuum_well": 0.0
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.23446910137472843,
- "p1_score": 0.0,
- "max_elongation": 3.9082135297904603,
- "aspect_ratio_out": 3.5873706820500715,
- "average_triangularity": -0.3827654493126358,
- "edge_iota_over_nfp": 0.2601408639197091,
- "vacuum_well": -0.546989318570786
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.55,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.25770258074325547,
- "p1_score": 0.0,
- "max_elongation": 5.011193704084697,
- "aspect_ratio_out": 3.507634687818846,
- "average_triangularity": -0.47640349211453364,
- "edge_iota_over_nfp": 0.22268922577702335,
- "vacuum_well": -0.6378508849467094
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.5,
- "rotational_transform": 1.2,
- "triangularity_scale": 0.7,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.3563685202429026,
- "p1_score": 0.0,
- "max_elongation": 6.523908903139856,
- "aspect_ratio_out": 3.427898693587626,
- "average_triangularity": -0.5463927121513822,
- "edge_iota_over_nfp": 0.1930894439271292,
- "vacuum_well": -0.7422835451739921
- },
- {
- "aspect_ratio": 3.8,
- "elongation": 1.5,
- "rotational_transform": 1.55,
- "triangularity_scale": 0.4,
- "crashed": false,
- "failure_reason": "",
- "feasible": false,
- "p1_feasibility": 0.23446910137472743,
1076
- "p1_score": 0.0,
1077
- "max_elongation": 5.4000662498579395,
1078
- "aspect_ratio_out": 3.58737068205007,
1079
- "average_triangularity": -0.3827654493126363,
1080
- "edge_iota_over_nfp": 0.37576442477917044,
1081
- "vacuum_well": -0.5593363795034076
1082
- },
1083
- {
1084
- "aspect_ratio": 3.8,
1085
- "elongation": 1.5,
1086
- "rotational_transform": 1.55,
1087
- "triangularity_scale": 0.55,
1088
- "crashed": false,
1089
- "failure_reason": "",
1090
- "feasible": false,
1091
- "p1_feasibility": 0.047193015770934044,
1092
- "p1_score": 0.0,
1093
- "max_elongation": 6.95917543501653,
1094
- "aspect_ratio_out": 3.5076346878188462,
1095
- "average_triangularity": -0.476403492114533,
1096
- "edge_iota_over_nfp": 0.32390835631237563,
1097
- "vacuum_well": -0.6531086645842118
1098
- },
1099
- {
1100
- "aspect_ratio": 3.8,
1101
- "elongation": 1.5,
1102
- "rotational_transform": 1.55,
1103
- "triangularity_scale": 0.7,
1104
- "crashed": false,
1105
- "failure_reason": "",
1106
- "feasible": false,
1107
- "p1_feasibility": 0.06608766640438413,
1108
- "p1_score": 0.0,
1109
- "max_elongation": 9.110492371135232,
1110
- "aspect_ratio_out": 3.4278986935876268,
1111
- "average_triangularity": -0.5463927121513822,
1112
- "edge_iota_over_nfp": 0.28017370007868475,
1113
- "vacuum_well": -0.7564369584107291
1114
- },
1115
- {
1116
- "aspect_ratio": 3.8,
1117
- "elongation": 1.5,
1118
- "rotational_transform": 1.9,
1119
- "triangularity_scale": 0.4,
1120
- "crashed": false,
1121
- "failure_reason": "",
1122
- "feasible": false,
1123
- "p1_feasibility": 0.23446910137472732,
1124
- "p1_score": 0.0,
1125
- "max_elongation": 7.677673329183981,
1126
- "aspect_ratio_out": 3.5873706820500697,
1127
- "average_triangularity": -0.38276544931263634,
1128
- "edge_iota_over_nfp": 0.4707294226314962,
1129
- "vacuum_well": -0.5146202191204641
1130
- },
1131
- {
1132
- "aspect_ratio": 3.8,
1133
- "elongation": 1.5,
1134
- "rotational_transform": 1.9,
1135
- "triangularity_scale": 0.55,
1136
- "crashed": true,
1137
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
1138
- "feasible": false,
1139
- "p1_feasibility": 1000000.0,
1140
- "p1_score": 0.0,
1141
- "max_elongation": 10.0,
1142
- "aspect_ratio_out": 0.0,
1143
- "average_triangularity": 0.0,
1144
- "edge_iota_over_nfp": 0.0,
1145
- "vacuum_well": 0.0
1146
- },
1147
- {
1148
- "aspect_ratio": 3.8,
1149
- "elongation": 1.5,
1150
- "rotational_transform": 1.9,
1151
- "triangularity_scale": 0.7,
1152
- "crashed": true,
1153
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
1154
- "feasible": false,
1155
- "p1_feasibility": 1000000.0,
1156
- "p1_score": 0.0,
1157
- "max_elongation": 10.0,
1158
- "aspect_ratio_out": 0.0,
1159
- "average_triangularity": 0.0,
1160
- "edge_iota_over_nfp": 0.0,
1161
- "vacuum_well": 0.0
1162
- },
1163
- {
1164
- "aspect_ratio": 3.8,
1165
- "elongation": 1.8,
1166
- "rotational_transform": 1.2,
1167
- "triangularity_scale": 0.4,
1168
- "crashed": false,
1169
- "failure_reason": "",
1170
- "feasible": false,
1171
- "p1_feasibility": 0.21984253063136272,
1172
- "p1_score": 0.0,
1173
- "max_elongation": 4.1304272049856765,
1174
- "aspect_ratio_out": 3.572749239620019,
1175
- "average_triangularity": -0.39007873468431864,
1176
- "edge_iota_over_nfp": 0.23918988633893049,
1177
- "vacuum_well": -0.5244590200054563
1178
- },
1179
- {
1180
- "aspect_ratio": 3.8,
1181
- "elongation": 1.8,
1182
- "rotational_transform": 1.2,
1183
- "triangularity_scale": 0.55,
1184
- "crashed": false,
1185
- "failure_reason": "",
1186
- "feasible": false,
1187
- "p1_feasibility": 0.30941253114225103,
1188
- "p1_score": 0.0,
1189
- "max_elongation": 5.219891723892629,
1190
- "aspect_ratio_out": 3.4875302044775336,
1191
- "average_triangularity": -0.48239610900184327,
1192
- "edge_iota_over_nfp": 0.20717624065732468,
1193
- "vacuum_well": -0.6074137302875826
1194
- },
1195
- {
1196
- "aspect_ratio": 3.8,
1197
- "elongation": 1.8,
1198
- "rotational_transform": 1.2,
1199
- "triangularity_scale": 0.7,
1200
- "crashed": false,
1201
- "failure_reason": "",
1202
- "feasible": false,
1203
- "p1_feasibility": 0.3721181780375554,
1204
- "p1_score": 0.0,
1205
- "max_elongation": 6.709958387787289,
1206
- "aspect_ratio_out": 3.4023111693350363,
1207
- "average_triangularity": -0.551233233273013,
1208
- "edge_iota_over_nfp": 0.18836454658873336,
1209
- "vacuum_well": -0.6969799270812338
1210
- },
1211
- {
1212
- "aspect_ratio": 3.8,
1213
- "elongation": 1.8,
1214
- "rotational_transform": 1.55,
1215
- "triangularity_scale": 0.4,
1216
- "crashed": false,
1217
- "failure_reason": "",
1218
- "feasible": false,
1219
- "p1_feasibility": 0.21984253063136006,
1220
- "p1_score": 0.0,
1221
- "max_elongation": 5.655882151980431,
1222
- "aspect_ratio_out": 3.5727492396200193,
1223
- "average_triangularity": -0.39007873468431997,
1224
- "edge_iota_over_nfp": 0.3476959386568659,
1225
- "vacuum_well": -0.5401610577187229
1226
- },
1227
- {
1228
- "aspect_ratio": 3.8,
1229
- "elongation": 1.8,
1230
- "rotational_transform": 1.55,
1231
- "triangularity_scale": 0.55,
1232
- "crashed": false,
1233
- "failure_reason": "",
1234
- "feasible": false,
1235
- "p1_feasibility": 0.03520778199631258,
1236
- "p1_score": 0.0,
1237
- "max_elongation": 7.180648727079949,
1238
- "aspect_ratio_out": 3.4875302044775283,
1239
- "average_triangularity": -0.4823961090018437,
1240
- "edge_iota_over_nfp": 0.30698149307999983,
1241
- "vacuum_well": -0.6211518335261255
1242
- },
1243
- {
1244
- "aspect_ratio": 3.8,
1245
- "elongation": 1.8,
1246
- "rotational_transform": 1.55,
1247
- "triangularity_scale": 0.7,
1248
- "crashed": false,
1249
- "failure_reason": "",
1250
- "feasible": false,
1251
- "p1_feasibility": 0.06277313159962161,
1252
- "p1_score": 0.0,
1253
- "max_elongation": 9.276836759242448,
1254
- "aspect_ratio_out": 3.402311169335038,
1255
- "average_triangularity": -0.5512332332730103,
1256
- "edge_iota_over_nfp": 0.2811680605201135,
1257
- "vacuum_well": -0.7118470668886681
1258
- },
1259
- {
1260
- "aspect_ratio": 3.8,
1261
- "elongation": 1.8,
1262
- "rotational_transform": 1.9,
1263
- "triangularity_scale": 0.4,
1264
- "crashed": false,
1265
- "failure_reason": "",
1266
- "feasible": false,
1267
- "p1_feasibility": 0.2198425306313675,
1268
- "p1_score": 0.0,
1269
- "max_elongation": 7.969505345435814,
1270
- "aspect_ratio_out": 3.572749239620025,
1271
- "average_triangularity": -0.39007873468431625,
1272
- "edge_iota_over_nfp": 0.44118859445417574,
1273
- "vacuum_well": -0.4933392101193987
1274
- },
1275
- {
1276
- "aspect_ratio": 3.8,
1277
- "elongation": 1.8,
1278
- "rotational_transform": 1.9,
1279
- "triangularity_scale": 0.55,
1280
- "crashed": true,
1281
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
1282
- "feasible": false,
1283
- "p1_feasibility": 1000000.0,
1284
- "p1_score": 0.0,
1285
- "max_elongation": 10.0,
1286
- "aspect_ratio_out": 0.0,
1287
- "average_triangularity": 0.0,
1288
- "edge_iota_over_nfp": 0.0,
1289
- "vacuum_well": 0.0
1290
- },
1291
- {
1292
- "aspect_ratio": 3.8,
1293
- "elongation": 1.8,
1294
- "rotational_transform": 1.9,
1295
- "triangularity_scale": 0.7,
1296
- "crashed": true,
1297
- "failure_reason": "Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.Thread 0:\n\tFATAL ERROR in thread=0. The solver failed during the first iterations. This may happen if the initial boundary is poorly shaped or if it isn't spectrally condensed enough.",
1298
- "feasible": false,
1299
- "p1_feasibility": 1000000.0,
1300
- "p1_score": 0.0,
1301
- "max_elongation": 10.0,
1302
- "aspect_ratio_out": 0.0,
1303
- "average_triangularity": 0.0,
1304
- "edge_iota_over_nfp": 0.0,
1305
- "vacuum_well": 0.0
1306
- }
1307
- ]
1308
- }
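
The deleted fixture above is a grid sweep over the four knobs (`aspect_ratio`, `elongation`, `rotational_transform`, `triangularity_scale`), where crashed VMEC runs are recorded with sentinel values (`p1_feasibility` of `1000000.0`, zeroed metrics). A minimal stdlib sketch of how such sweep records can be summarized; `summarize_sweep` and the inline sample records are illustrative, not repository code:

```python
import json

# Minimal sketch (not repository code): summarize sweep records that use
# the fields shown in the deleted fixture above.
def summarize_sweep(records):
    """Report crash rate and the best (lowest) `p1_feasibility` among
    runs that completed, plus the knob settings that produced it."""
    completed = [r for r in records if not r["crashed"]]
    best = min(completed, key=lambda r: r["p1_feasibility"]) if completed else None
    knobs = ("aspect_ratio", "elongation", "rotational_transform", "triangularity_scale")
    return {
        "n_records": len(records),
        "crash_rate": sum(r["crashed"] for r in records) / len(records) if records else 0.0,
        "best_p1_feasibility": best["p1_feasibility"] if best else None,
        "best_knobs": {k: best[k] for k in knobs} if best else None,
    }

# Two inline sample records mirroring the fixture's shape.
records = [
    {"aspect_ratio": 3.8, "elongation": 1.5, "rotational_transform": 1.55,
     "triangularity_scale": 0.55, "crashed": False, "p1_feasibility": 0.0472},
    {"aspect_ratio": 3.8, "elongation": 1.5, "rotational_transform": 1.9,
     "triangularity_scale": 0.55, "crashed": True, "p1_feasibility": 1000000.0},
]
print(json.dumps(summarize_sweep(records), indent=2))
```

The crashed-run sentinels make the `crashed` flag, not the feasibility value, the reliable filter; that is why the sketch filters on `crashed` before taking the minimum.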
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -26,8 +26,8 @@ Completed:
  - `P1` is locked as the single benchmark task
  - the repaired 4-knob low-dimensional runtime is live in code
  - the official `constellaration` verifier path is wired
- - low-fidelity `run` and high-fidelity `submit` are separated clearly
- - terminal scoring and reporting are fidelity-consistent
+ - the live environment is now unified onto one low-fidelity reward and verifier surface
+ - `submit` remains an explicit terminal action on that same live contract
  - explicit VMEC failure semantics are implemented
  - the Northflank smoke workflow is committed
  - the Northflank smoke test passed on the team H100
@@ -45,15 +45,15 @@ Still open:
  - decision on whether reset-seed pool should change from paired checks
  - HF Space deployment evidence
  - public Colab mirror or notebook submission link, if the submission surface still requires it
- - before/after trained-policy evidence on the current low-fidelity-only workflow
+ - before/after trained-policy evidence on the current unified low-fidelity workflow
  - demo and README polish after the artifacts are real
 
  Current caution:
 
  - do not present repaired-family ranges, deltas, or budget choices as settled defaults until the measured sweep is recorded
  - do not narrate low-fidelity rollout metrics as final submission truth
- - the standard notebook and `training/llm_rollout.py` `monitor` / `evaluate` paths now stay on low-fidelity `run` only and ignore `submit` by default
- - reserve VMEC-backed `submit` for replay/debug work, paired fixture checks, submit-side traces, and final evidence
+ - the standard notebook and `training/llm_rollout.py` paths should stay on the same live low-fidelity contract as the environment, including explicit `submit`
+ - reserve higher-fidelity validation for paired fixture checks, offline validation scripts, and final evidence
 
  ## 3. Locked Decisions
 
@@ -113,7 +113,7 @@ Compute surfaces:
  - Northflank is the main compute workspace for verifier-heavy work
  - HF Space is the hosted environment surface
  - the public notebook artifact should show trained-policy behavior against the live environment and can be mirrored to Colab if the submission form still requires it
- - trained-policy work should still iterate on low-fidelity `run`; use high-fidelity `submit` only for sparse checkpoint evaluation and final evidence
+ - trained-policy work should iterate on the same live low-fidelity environment contract that will be demoed publicly
 
  Evidence order:
 
@@ -135,21 +135,20 @@ The environment contract must stay narrow and legible:
 
  - one repaired low-dimensional boundary family derived from a rotating-ellipse seed
  - discrete `run | submit | restore_best` interaction
- - low-fidelity verifier for normal steps
- - high-fidelity verifier for `submit`
+ - one low-fidelity verifier surface for all live environment actions
  - readable observation surface with explicit fidelity labeling
- - `Reward V1` kept verifier-native and repair-first, with official normalized violation telemetry
+ - `Reward V2` keeps the verifier-native `Reward V1` core and adds small best-so-far / anti-stagnation shaping for the low-fi repair loop
 
  The live technical details belong in [`P1_ENV_CONTRACT_V1.md`](P1_ENV_CONTRACT_V1.md), not here.
 
  ## 8. Execution Order
 
  - [x] Run a tiny low-fidelity PPO smoke pass and stop after a few trajectories once it reveals either readable behavior or one clear failure mode.
- - [x] Pair the tracked low-fidelity fixtures with high-fidelity submit checks immediately after the PPO smoke pass.
+ - [x] Pair the tracked low-fidelity fixtures with higher-fidelity validation checks immediately after the PPO smoke pass.
  - [ ] Decide whether the reset pool should change based on the measured sweep plus those paired checks.
  - [x] Run at least one submit-side manual trace, then expand to 5 to 10 episodes and record the first real confusion point, exploit, or reward pathology.
- - [ ] Save one fixed-seed untrained baseline with the low-fidelity-only `training/llm_rollout.py evaluate` workflow.
- - [ ] Run one short H100 GRPO pass with the repository notebook on that same low-fidelity-only workflow.
+ - [ ] Save one fixed-seed untrained baseline with the unified live `training/llm_rollout.py evaluate` workflow.
+ - [ ] Run one short H100 GRPO pass with the repository notebook on that same unified low-fidelity workflow.
  - [ ] Re-run the same seeds after training and save one before/after artifact.
  - [ ] Adjust reward or penalties only if playtesting exposes a concrete problem.
  - [x] Refresh the heuristic baseline using the repaired-family evidence.
@@ -203,7 +202,7 @@ Gate 9: trained-policy evidence is real
 
  - one fixed-seed untrained baseline exists
  - one short low-fidelity training pass exists on the same workflow
- - the repo can show a before/after comparison on the same seeds without relying on `submit`
+ - the repo can show a before/after comparison on the same seeds using the live environment contract, including `submit`
 
  ## 10. Fallback Rules
 
@@ -211,8 +210,8 @@ If training evidence is weak:
 
  - keep claims conservative about policy quality
  - still ship a trained-policy demonstration and document its limitations plainly
- - do not skip the paired high-fidelity checks or submit-side manual trace
- - do not swap back to submit-included reward traces and present them as the current GRPO path
+ - do not skip the paired higher-fidelity validation artifacts
+ - do not split the notebook back onto a different submit contract than the live environment
 
  If HF Space deployment is delayed:
 
@@ -239,7 +238,7 @@ If the repaired family is too easy:
  - [x] Check in tracked fixtures.
  - [x] Record the first manual playtest log.
  - [x] Run a tiny low-fidelity PPO smoke pass and save a few trajectories.
- - [x] Pair the tracked fixtures with high-fidelity submit checks.
+ - [x] Pair the tracked fixtures with higher-fidelity validation checks.
  - [x] Record one submit-side manual trace.
  - [x] Refresh the heuristic baseline from that playtest evidence.
  - [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
  - [ ] Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
docs/P1_ENV_CONTRACT_V1.md CHANGED
@@ -34,7 +34,8 @@ Official verifier owns:
34
  - boundary in, metrics out
35
  - official `P1` feasibility semantics
36
  - objective direction and score ordering
37
- - low-fidelity and high-fidelity evaluation modes
 
38
  - explicit failure results when VMEC or forward-model evaluation fails
39
 
40
  Environment owns:
@@ -105,10 +106,9 @@ Required fields:
105
  - `failure_reason`
106
  - `step_number`
107
  - `budget_remaining`
 
108
  - `best_low_fidelity_score`
109
  - `best_low_fidelity_feasibility`
110
- - `best_high_fidelity_score`
111
- - `best_high_fidelity_feasibility`
112
  - `target_spec`
113
  - `diagnostics_text`
114
  - `reward_breakdown`
@@ -118,14 +118,14 @@ Required fields:
118
 
119
  Interpretation rules:
120
 
121
- - low-fidelity `run` metrics must be labeled as low-fidelity
122
- - high-fidelity `submit` metrics must be labeled as high-fidelity
123
- - low-fidelity and high-fidelity best-state reporting must stay separate
124
  - the observation must be understandable without hidden state
125
  - normalized constraint-violation telemetry must follow the official `P1` constraint scales
126
  - the dominant active constraint must be visible so a human can explain repair-phase rewards
127
  - reward telemetry must expose which bonuses, penalties, and shaping terms contributed to the scalar reward
128
- - action telemetry must expose parameter values before and after the action, including clamped and no-op moves
 
129
 
130
  ## 6. Episode Flow
131
 
@@ -133,7 +133,7 @@ Interpretation rules:
133
  2. Evaluate the initial state with low fidelity and return the first observation.
134
  3. On `run`, perturb one controllable parameter and re-evaluate with low fidelity.
135
  4. On `restore_best`, revert to the best known low-fidelity state, re-evaluate, and consume budget.
136
- 5. On `submit`, end the episode and run the high-fidelity submit evaluation.
137
  6. End the episode on `submit` or budget exhaustion.
138
 
139
  Failure semantics:
@@ -155,8 +155,8 @@ At termination, the environment must provide:
155
 
156
  Terminal reporting rules:
157
 
158
- - keep submit-time reporting fidelity-consistent
159
- - do not compare high-fidelity submit results against low-fidelity baseline state as if they were the same truth surface
160
 
161
  ## 8. Verifier Contract
162
 
@@ -178,34 +178,47 @@ Do not treat parameterization-specific logic as verifier truth.
178
 
179
  VMEC preset mapping:
180
 
181
- - `run` steps use the `low_fidelity` VMEC preset (~0.6s, tolerant convergence)
182
- - `submit` uses the `from_boundary_resolution` VMEC preset (~4s, adaptive convergence matching boundary Fourier resolution)
183
  - the `high_fidelity` VMEC preset (minimum 10 modes, strict convergence) is not used because it does not converge on the current `mpol=3, ntor=3` boundaries
184
 
185
  Training and evaluation rule:
186
 
187
- - use low-fidelity `run` as the RL inner-loop surface
188
- - the standard repository notebook and `training/llm_rollout.py` `monitor` / `evaluate` workflows stay on low-fidelity `run` only and ignore `submit` by default
189
- - keep higher-fidelity `submit` for terminal truth, explicit replay/debug work, paired fixture checks, and submit-side manual traces
190
- - do not move VMEC-backed submit evaluation into every training step unless the contract is deliberately redefined
191
 
192
- ## 9. Reward V1
193
 
194
- `Reward V1` replaces `Reward V0` because the old infeasible shaping only used `Ξ” official_feasibility`.
195
- That was too coarse once the transferred P1 findings made the main pathology clear: official
196
- feasibility is a max over normalized constraint violations, so useful repair steps on
197
- non-dominant constraints could be nearly invisible to the reward.
 
 
 
 
 
 
 
 
198
 
199
  Target behavior:
200
 
201
  - infeasible to feasible crossing gets a clear positive bonus
202
  - feasible to infeasible regression gets a clear penalty
203
  - when both states are infeasible, reduced official feasibility violation should still help
 
 
204
  - when both states are infeasible, reduced normalized triangularity violation should help the most
205
  - when both states are infeasible, reduced normalized aspect-ratio and edge-iota violations should also help
206
  - when both states are feasible, lower `max_elongation` should help
 
207
  - larger `run` actions should pay a larger step cost than smaller `run` actions
208
  - `restore_best` should keep a flat non-submit step cost
 
 
209
  - `submit` should be better than passive exhaustion when the design is genuinely improved
210
  - recovery after a failed evaluation may receive a modest bounded bonus
211
 
 
34
  - boundary in, metrics out
35
  - official `P1` feasibility semantics
36
  - objective direction and score ordering
37
+ - low-fidelity live evaluation mode
38
+ - optional higher-fidelity offline validation mode
39
  - explicit failure results when VMEC or forward-model evaluation fails
40
 
41
  Environment owns:
 
106
  - `failure_reason`
107
  - `step_number`
108
  - `budget_remaining`
109
+ - `no_progress_steps`
110
  - `best_low_fidelity_score`
111
  - `best_low_fidelity_feasibility`
 
 
112
  - `target_spec`
113
  - `diagnostics_text`
114
  - `reward_breakdown`
 
118
 
119
  Interpretation rules:
120
 
121
+ - live environment metrics must be labeled as low-fidelity
122
+ - best-state reporting should reflect the single live reward surface
 
123
  - the observation must be understandable without hidden state
124
  - normalized constraint-violation telemetry must follow the official `P1` constraint scales
125
  - the dominant active constraint must be visible so a human can explain repair-phase rewards
126
  - reward telemetry must expose which bonuses, penalties, and shaping terms contributed to the scalar reward
127
+ - action telemetry must expose parameter values before and after the action, including clamped, no-op, and repeat-state moves
128
+ - anti-stagnation state that can change reward must be visible in structured observation fields, not only free text
129
 
130
  ## 6. Episode Flow
131
 
 
133
  2. Evaluate the initial state with low fidelity and return the first observation.
134
  3. On `run`, perturb one controllable parameter and re-evaluate with low fidelity.
135
  4. On `restore_best`, revert to the best known low-fidelity state, re-evaluate, and consume budget.
136
+ 5. On `submit`, re-evaluate the current state with low fidelity, consume budget, and end the episode.
137
  6. End the episode on `submit` or budget exhaustion.
138
 
139
  Failure semantics:
 
155
 
156
  Terminal reporting rules:
157
 
158
+ - keep submit-time reporting on the same live low-fidelity truth surface as the rest of the episode
159
+ - keep any higher-fidelity validation artifacts explicitly outside the live environment observation contract
160
 
161
  ## 8. Verifier Contract
162
 
 
178
 
179
  VMEC preset mapping:
180
 
181
+ - `run`, `restore_best`, and `submit` use the `low_fidelity` VMEC preset (~0.6s, tolerant convergence)
182
+ - higher-fidelity validation uses the `from_boundary_resolution` VMEC preset (~4s, adaptive convergence matching boundary Fourier resolution) outside the live environment loop
183
  - the `high_fidelity` VMEC preset (minimum 10 modes, strict convergence) is not used because it does not converge on the current `mpol=3, ntor=3` boundaries
184
 
185
  Training and evaluation rule:
186
 
187
+ - use the live low-fidelity environment contract, including explicit `submit`, as the RL surface
188
+ - the standard repository notebook and `training/llm_rollout.py` workflows should stay aligned to that same action and reward contract
189
+ - keep higher-fidelity validation in offline scripts, paired fixture checks, and final evidence artifacts
190
+ - do not reintroduce a separate high-fidelity submit path into the live environment unless the contract is deliberately redefined
191
 
+ ## 9. Reward V2

+ `Reward V2` keeps the verifier-native structure from `Reward V1` and adds a small amount of
+ trajectory-aware shaping. `Reward V1` fixed the main coarse-signal pathology from `Reward V0`:
+ pure `Δ official_feasibility` was too coarse because official feasibility is a max over
+ normalized constraint violations, so useful repair steps on non-dominant constraints could be
+ nearly invisible to the reward.
+
+ The remaining `Reward V1` pathology was not verifier mismatch. It was short-horizon shaping:
+
+ - the agent got no extra signal for setting a new best infeasible point
+ - near-feasible progress below `0.02` had no milestone signal unless it crossed the full feasible boundary
+ - feasible improvements only saw step-to-step objective deltas, not "new best feasible score" progress
+ - repeated local loops or three-step stagnation had no explicit penalty beyond normal step cost

  Target behavior:

  - infeasible to feasible crossing gets a clear positive bonus
  - feasible to infeasible regression gets a clear penalty
  - when both states are infeasible, reduced official feasibility violation should still help
+ - on low-fidelity `run` steps, setting a new best infeasible feasibility should help
+ - entering the near-feasible corridor around `p1_feasibility <= 0.02` should get a small bounded bonus
  - when both states are infeasible, reduced normalized triangularity violation should help the most
  - when both states are infeasible, reduced normalized aspect-ratio and edge-iota violations should also help
  - when both states are feasible, lower `max_elongation` should help
+ - on low-fidelity `run` steps, beating the previous best feasible score should help
  - larger `run` actions should pay a larger step cost than smaller `run` actions
  - `restore_best` should keep a flat non-submit step cost
+ - repeated local revisits without improvement should pay a small penalty
+ - three non-improving steps in a row should pay a small stagnation penalty
  - `submit` should be better than passive exhaustion when the design is genuinely improved
  - recovery after a failed evaluation may receive a modest bounded bonus
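The new milestone terms can be sketched in isolation. This is a minimal sketch using the constant values introduced by this commit (`BEST_FEASIBILITY_BONUS_WEIGHT = 1.5`, `NEAR_FEASIBILITY_THRESHOLD = 0.02`, etc.); the exact gating inside `_compute_reward_breakdown` may differ, and the helper function itself is hypothetical.

```python
# Constant values taken from this commit's server/environment.py diff.
BEST_FEASIBILITY_BONUS_WEIGHT = 1.5
BEST_SCORE_BONUS_WEIGHT = 0.75
NEAR_FEASIBILITY_THRESHOLD = 0.02
NEAR_FEASIBILITY_BONUS = 1.0
NO_PROGRESS_STEP_THRESHOLD = 3
NO_PROGRESS_PENALTY = -0.2
REPEAT_STATE_PENALTY = -0.15


def milestone_terms(
    *,
    feasibility: float,
    best_feasibility_before: float,
    score: float,
    best_score_before: float,
    feasible: bool,
    no_progress_steps: int,
    repeat_state: bool,
) -> dict[str, float]:
    """Illustrative Reward V2 milestone shaping for one low-fidelity run step."""
    terms = {
        "best_feasibility_bonus": 0.0,
        "near_feasible_bonus": 0.0,
        "best_score_bonus": 0.0,
        "no_progress_penalty": 0.0,
        "repeat_state_penalty": 0.0,
    }
    if not feasible and feasibility < best_feasibility_before:
        # New best infeasible point: reward proportional to the improvement.
        terms["best_feasibility_bonus"] = (
            best_feasibility_before - feasibility
        ) * BEST_FEASIBILITY_BONUS_WEIGHT
    if not feasible and feasibility <= NEAR_FEASIBILITY_THRESHOLD < best_feasibility_before:
        # First entry into the near-feasible corridor gets a bounded one-off bonus.
        terms["near_feasible_bonus"] = NEAR_FEASIBILITY_BONUS
    if feasible and score > best_score_before:
        # Beating the previous best feasible score, not just the previous step.
        terms["best_score_bonus"] = (score - best_score_before) * BEST_SCORE_BONUS_WEIGHT
    if no_progress_steps >= NO_PROGRESS_STEP_THRESHOLD:
        terms["no_progress_penalty"] = NO_PROGRESS_PENALTY
    if repeat_state:
        terms["repeat_state_penalty"] = REPEAT_STATE_PENALTY
    return terms
```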
 
docs/p1_transfer/p1_reward_transfer.md ADDED
@@ -0,0 +1,40 @@
+ # P1 reward transfer for fusion-design-lab
+
+ ## Core conclusion
+ Use `scadena` as the main repair prior and `CreativeEngineer` as the top-score exploitation prior.
+
+ ## Reward schedule
+ 1. Infeasible phase: reduce official feasibility violation first.
+ 2. Within infeasible candidates, prioritize average triangularity repair first.
+ 3. After triangularity starts to clear, prefer aspect-ratio and edge-iota cleanup.
+ 4. Crossing the official feasibility threshold should receive a large bonus.
+ 5. Once feasible, optimize the official P1 score, which in practice tracks lower max elongation.
+
+ ## Practical priors
+ - `scadena_seed.json`: best repair anchor. Raw near-P1 HF seeds crossed official feasibility by moving toward this family.
+ - `creative_seed.json` and `creative_best.json`: best endpoint family. Use these for exploitation after feasibility is stable.
+ - `samet_seed.json`: exact-feasible distinct family. Useful for diversity and non-scadena exploration.
+ - `egodos_seed.json`: best near-feasible non-scadena source.
+ - `egodos_sparse_rgroup_best.json`: proof that a sparse low-order `r_cos` move can create a new feasible non-scadena design.
+
+ ## Operators that worked
+ - Coordinated move toward the `scadena` manifold for repairing raw near-P1 seeds.
+ - Sparse low-order `r_cos` move from `egodos` toward `Samet`.
+ - Small local `Samet` continuation on low-order modes.
+
+ ## Operators that did not work well
+ - Whole-boundary interpolation between distant families.
+ - Random full-space coefficient noise.
+ - Sparse top-k replacement toward `scadena` without coordinated movement.
+
+ ## Suggested usage in fusion-design-lab
+ - During repair-heavy exploration, add reward telemetry for:
+   - official feasibility
+   - average triangularity
+   - aspect ratio
+   - edge rotational transform over field periods
+   - max elongation
+ - Bias mutation proposals toward:
+   - `scadena` direction for feasibility repair
+   - `CreativeEngineer` neighborhood for high-score exploitation
+   - `Samet` and `egodos` seeds for diversity maintenance
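The proposal bias above can be realized with a simple weighted family picker. This is a sketch under stated assumptions: the family names come from the seed files listed in this document, but the phase names, weights, and the helper itself are illustrative, not repository code.

```python
import random

# Illustrative proposal bias (hypothetical weights): repair-heavy phases lean on
# the scadena corridor, exploitation phases lean on the CreativeEngineer neighborhood,
# and both keep some mass on the distinct samet/egodos families for diversity.
PROPOSAL_BIAS = {
    "repair": [("scadena", 0.6), ("samet", 0.2), ("egodos", 0.2)],
    "exploit": [("creative", 0.6), ("scadena", 0.2), ("samet", 0.2)],
}


def pick_family(phase: str, rng: random.Random) -> str:
    """Sample a seed family for the current search phase."""
    families, weights = zip(*PROPOSAL_BIAS[phase])
    return rng.choices(families, weights=weights, k=1)[0]
```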
docs/p1_transfer/p1_seed_selection.md ADDED
@@ -0,0 +1,31 @@
+ # P1 curated seed selection
+
+ ## Included seeds
+ - `creative_best.json`: best overall official P1 design found. Use as the score ceiling reference.
+ - `creative_seed.json`: original CreativeEngineer leaderboard anchor. Use as the exploitation parent.
+ - `scadena_seed.json`: strongest repair anchor found. Use when a candidate is close but still infeasible.
+ - `scadena_repaired_best.json`: repaired raw-HF survivor showing that the scadena corridor generalizes.
+ - `samet_seed.json`: exact-feasible distinct family. Use to prevent collapse into only the creative/scadena basin.
+ - `egodos_seed.json`: near-feasible non-scadena target. Use as a repair-source seed.
+ - `egodos_sparse_rgroup_best.json`: best new non-scadena feasible design found from a sparse grouped repair.
+
+ ## Family roles
+ - `creative`: best objective region.
+ - `scadena`: best feasibility-repair corridor.
+ - `samet`: stable distinct feasible basin.
+ - `egodos`: useful near-feasible source for non-scadena exploration.
+
+ ## Search pattern extracted
+ - `CreativeEngineer` is the better endpoint family.
+ - `scadena` is the better repair corridor.
+ - `Samet` supports local feasible continuation.
+ - `egodos` can be repaired into feasibility with a very small sparse low-order `r_cos` move toward `Samet`.
+
+ ## Recommended initialization mix
+ - 40% from `scadena_seed.json` and `scadena_repaired_best.json`
+ - 30% from `creative_seed.json` and `creative_best.json`
+ - 20% from `samet_seed.json`
+ - 10% from `egodos_seed.json` and `egodos_sparse_rgroup_best.json`
+
+ ## Why this is minimal
+ This pack gives one strong exploitation family, one strong repair family, and one genuinely distinct non-scadena frontier, without dragging over the entire search archive.
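The recommended initialization mix can be sketched as a two-stage weighted sampler. The seed filenames and group weights are the ones listed above; the sampler itself is an illustrative helper, not repository code, and it assumes uniform choice within a group.

```python
import random

# Groups and weights from the recommended initialization mix above.
SEED_MIX = [
    (["scadena_seed.json", "scadena_repaired_best.json"], 0.40),
    (["creative_seed.json", "creative_best.json"], 0.30),
    (["samet_seed.json"], 0.20),
    (["egodos_seed.json", "egodos_sparse_rgroup_best.json"], 0.10),
]


def sample_seed(rng: random.Random) -> str:
    """Draw one starting seed: pick a family by weight, then a file uniformly."""
    groups, weights = zip(*SEED_MIX)
    group = rng.choices(groups, weights=weights, k=1)[0]
    return rng.choice(group)
```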
fusion_lab/llm_agent.py CHANGED
@@ -2,7 +2,7 @@ from __future__ import annotations
 
 import json
 from dataclasses import asdict, dataclass
-from typing import Final, Sequence
+from typing import Final, Literal, Sequence, TypedDict
 
 from fusion_lab.models import (
     DirectionName,
@@ -22,6 +22,12 @@ RUN_PARAMETERS: Final[tuple[ParameterName, ...]] = (
 RUN_DIRECTIONS: Final[tuple[DirectionName, ...]] = ("increase", "decrease")
 RUN_MAGNITUDES: Final[tuple[MagnitudeName, ...]] = ("small", "medium", "large")
 
+
+class PromptMessage(TypedDict):
+    role: Literal["system", "user"]
+    content: str
+
+
 SYSTEM_PROMPT: Final[str] = """You are an expert stellarator designer.
 
 Goal:
@@ -39,8 +45,9 @@ Action rules:
 - each item must be either:
   - {"intent":"run","parameter":"<parameter>","direction":"increase|decrease","magnitude":"small|medium|large"}
   - {"intent":"restore_best"}
+  - {"intent":"submit"}
 - keep the plan short and within the remaining budget
-- do not output "submit"
+- use "submit" once when you want to stop and lock in the current design
 
 Constraint directions:
 - aspect_ratio <= 4.0
@@ -154,17 +161,24 @@ def format_observation(observation: StellaratorObservation) -> str:
         f"- evaluation_fidelity: {observation.evaluation_fidelity}\n"
         f"- evaluation_failed: {observation.evaluation_failed}\n"
         f"- budget_remaining: {observation.budget_remaining}\n"
+        f"- no_progress_steps: {observation.no_progress_steps}\n"
         f"- best_low_fidelity_score: {observation.best_low_fidelity_score:.4f}\n"
         f"- best_low_fidelity_feasibility: {observation.best_low_fidelity_feasibility:.6f}\n"
         f"- diagnostics: {observation.diagnostics_text}\n"
     )
 
 
+def build_messages(observation: StellaratorObservation) -> tuple[PromptMessage, PromptMessage]:
+    return (
+        {"role": "system", "content": SYSTEM_PROMPT},
+        {"role": "user", "content": format_observation(observation)},
+    )
+
+
 def build_prompt(observation: StellaratorObservation) -> str:
+    system_message, user_message = build_messages(observation)
     return (
-        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
-        f"<|im_start|>user\n{format_observation(observation)}<|im_end|>\n"
-        "<|im_start|>assistant\n"
+        f"System:\n{system_message['content']}\n\nUser:\n{user_message['content']}\n\nAssistant:\n"
     )
 
 
@@ -202,7 +216,7 @@ def _parse_action_item(item: object) -> StellaratorAction | None:
     )
 
 
-def parse_action_plan(text: str, *, allow_submit: bool = False) -> list[StellaratorAction]:
+def parse_action_plan(text: str, *, allow_submit: bool = True) -> list[StellaratorAction]:
     raw_plan = extract_json_plan(text)
     if raw_plan is None:
         return []
@@ -231,7 +245,7 @@ def run_episode_with_actions(
     *,
     seed_idx: int,
     auto_submit: bool = False,
-    allow_submit: bool = False,
+    allow_submit: bool = True,
 ) -> LLMEpisodeTrace:
     environment = StellaratorEnvironment()
     observation = environment.reset(seed=seed_idx)
@@ -266,6 +280,16 @@ def run_episode_with_actions(
     done = False
     step_index = 0
     rollout_actions = [action for action in actions if allow_submit or action.intent != "submit"]
+    if len(rollout_actions) > BUDGET:
+        submit_index = next(
+            (idx for idx, action in enumerate(rollout_actions) if action.intent == "submit"),
+            None,
+        )
+        if submit_index is not None and submit_index >= BUDGET:
+            # Keep terminal submit within the budget if the model over-runs plan length.
+            rollout_actions = rollout_actions[: BUDGET - 1] + [rollout_actions[submit_index]]
+        else:
+            rollout_actions = rollout_actions[:BUDGET]
     for step_index, action in enumerate(rollout_actions[:BUDGET], start=1):
         if _step_and_record(action, step_index):
             done = True
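The over-budget trimming rule in `run_episode_with_actions` can be checked in isolation. This standalone sketch mirrors the logic from the diff over plain intent strings; `trim_plan` is a hypothetical helper for illustration, and `BUDGET = 6` matches the environment's episode budget.

```python
BUDGET = 6


def trim_plan(intents: list[str], budget: int = BUDGET) -> list[str]:
    """Trim an over-long plan while preserving a trailing terminal submit."""
    if len(intents) <= budget:
        return intents
    submit_index = next(
        (idx for idx, intent in enumerate(intents) if intent == "submit"),
        None,
    )
    if submit_index is not None and submit_index >= budget:
        # Drop the overflow but keep submit as the final in-budget action.
        return intents[: budget - 1] + [intents[submit_index]]
    return intents[:budget]
```

The interesting case is a plan that schedules `submit` past the budget: the trim keeps the first `budget - 1` actions and moves the submit into the last slot instead of silently dropping it.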
fusion_lab/models.py CHANGED
@@ -57,11 +57,16 @@ class RewardBreakdown(BaseModel):
     feasibility_crossing_bonus: float = 0.0
     feasibility_regression_penalty: float = 0.0
     feasibility_delta_reward: float = 0.0
+    best_feasibility_bonus: float = 0.0
+    near_feasible_bonus: float = 0.0
     aspect_ratio_repair_reward: float = 0.0
     triangularity_repair_reward: float = 0.0
     iota_repair_reward: float = 0.0
     objective_delta_reward: float = 0.0
+    best_score_bonus: float = 0.0
     step_cost: float = 0.0
+    no_progress_penalty: float = 0.0
+    repeat_state_penalty: float = 0.0
     recovery_bonus: float = 0.0
     terminal_improvement_bonus: float = 0.0
     terminal_budget_bonus: float = 0.0
@@ -81,6 +86,7 @@ class ActionMonitor(BaseModel):
     params_after: LowDimBoundaryParams = Field(default_factory=default_low_dim_boundary_params)
     clamped: bool = False
     no_op: bool = False
+    repeat_state: bool = False
     used_best_params: bool = False
 
 
@@ -115,10 +121,9 @@ class StellaratorObservation(Observation):
     failure_reason: str = ""
     step_number: int = 0
     budget_remaining: int = 6
+    no_progress_steps: int = 0
     best_low_fidelity_score: float = 0.0
     best_low_fidelity_feasibility: float = float("inf")
-    best_high_fidelity_score: float | None = None
-    best_high_fidelity_feasibility: float | None = None
     constraints_satisfied: bool = True
     target_spec: str = ""
     reward_breakdown: RewardBreakdown = Field(default_factory=default_reward_breakdown)
@@ -132,14 +137,13 @@ class StellaratorState(State):
     current_params: LowDimBoundaryParams = Field(default_factory=default_low_dim_boundary_params)
     best_params: LowDimBoundaryParams = Field(default_factory=default_low_dim_boundary_params)
     initial_low_fidelity_score: float = 0.0
-    initial_high_fidelity_score: float | None = None
     best_low_fidelity_score: float = 0.0
     best_low_fidelity_feasibility: float = float("inf")
-    best_high_fidelity_score: float | None = None
-    best_high_fidelity_feasibility: float | None = None
     budget_total: int = 6
     budget_remaining: int = 6
     episode_done: bool = False
     constraints_satisfied: bool = True
     total_reward: float = 0.0
+    no_progress_steps: int = 0
+    visited_state_keys: list[str] = Field(default_factory=list)
     history: list[str] = Field(default_factory=list)
server/app.py CHANGED
@@ -66,7 +66,7 @@ def landing_page() -> str:
     <h2>Constraints</h2>
     <div class="constraint"><span class="name">aspect_ratio</span><span class="bound">&le; 4.0</span></div>
     <div class="constraint"><span class="name">average_triangularity</span><span class="bound">&le; &minus;0.5</span></div>
-    <div class="constraint"><span class="name">edge_iota_over_nfp</span><span class="bound">&ge; 0.3</span></div>
+    <div class="constraint"><span class="name">abs(edge_iota_over_nfp)</span><span class="bound">&ge; 0.3</span></div>
     </div>
 
     <div class="card">
@@ -98,7 +98,7 @@ def task_summary() -> dict[str, object]:
         "constraints": {
             "aspect_ratio_max": ASPECT_RATIO_MAX,
             "average_triangularity_max": AVERAGE_TRIANGULARITY_MAX,
-            "edge_iota_over_nfp_min": EDGE_IOTA_OVER_NFP_MIN,
+            "abs_edge_iota_over_nfp_min": EDGE_IOTA_OVER_NFP_MIN,
         },
         "n_field_periods": N_FIELD_PERIODS,
         "budget": BUDGET,
@@ -113,7 +113,7 @@ def task_summary() -> dict[str, object]:
         "magnitudes": ["small", "medium", "large"],
         "evaluation_modes": {
             "run": "low-fidelity constellaration evaluation",
-            "submit": "high-fidelity constellaration evaluation",
+            "submit": "low-fidelity constellaration terminal evaluation",
         },
     }
server/data/README.md DELETED
@@ -1,7 +0,0 @@
-Baseline VMEC inputs and related static assets belong here.
-
-Do not commit generated solver outputs or large transient artifacts.
-
-## Status
-
-- [ ] tracked `P1` fixture assets added under `server/data/p1/`
server/data/p1/bad_low_iota.json CHANGED
@@ -38,5 +38,5 @@
     "evaluation_fidelity": "high",
     "failure_reason": ""
   },
-  "paired_high_fidelity_timestamp_utc": "2026-03-08T07:07:19.629771+00:00"
+  "paired_high_fidelity_timestamp_utc": "2026-03-08T15:20:53.640050+00:00"
 }
server/data/p1/boundary_default_reset.json CHANGED
@@ -38,5 +38,5 @@
     "evaluation_fidelity": "high",
     "failure_reason": ""
   },
-  "paired_high_fidelity_timestamp_utc": "2026-03-08T07:07:24.745385+00:00"
+  "paired_high_fidelity_timestamp_utc": "2026-03-08T15:20:58.843405+00:00"
 }
server/data/p1/lowfi_feasible_local.json CHANGED
@@ -38,5 +38,5 @@
     "evaluation_fidelity": "high",
     "failure_reason": ""
   },
-  "paired_high_fidelity_timestamp_utc": "2026-03-08T07:07:29.939083+00:00"
+  "paired_high_fidelity_timestamp_utc": "2026-03-08T15:21:04.110710+00:00"
 }
server/environment.py CHANGED
@@ -45,8 +45,8 @@ TARGET_SPEC: Final[str] = (
45
  "Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
46
  "from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
47
  "triangularity <= -0.5, abs(edge rotational transform / n_field_periods) >= 0.3. "
48
- "Run actions use low-fidelity verification. Submit uses high-fidelity verification. "
49
- "Budget: 6 evaluations."
50
  )
51
 
52
  FAILURE_PENALTY: Final[float] = -2.0
@@ -54,6 +54,13 @@ FEASIBILITY_DELTA_WEIGHT: Final[float] = 2.0
54
  TRIANGULARITY_REPAIR_WEIGHT: Final[float] = 2.0
55
  ASPECT_RATIO_REPAIR_WEIGHT: Final[float] = 1.0
56
  IOTA_REPAIR_WEIGHT: Final[float] = 1.0
 
 
 
 
 
 
 
57
  STEP_COST_BY_MAGNITUDE: Final[dict[MagnitudeName, float]] = {
58
  "small": -0.05,
59
  "medium": -0.1,
@@ -94,6 +101,7 @@ class StellaratorEnvironment(
94
  constraints_satisfied=metrics.constraints_satisfied,
95
  total_reward=0.0,
96
  )
 
97
  self._last_metrics = metrics
98
  self._last_successful_metrics = None if metrics.evaluation_failed else metrics
99
  return self._build_observation(
@@ -148,17 +156,24 @@ class StellaratorEnvironment(
148
  direction=action.direction,
149
  magnitude=action.magnitude,
150
  )
 
151
  action_monitor = self._build_action_monitor(
152
  action=action,
153
  params_before=params_before,
154
  params_after=params,
155
  clamped=clamped,
156
  no_op=no_op,
 
157
  )
158
  metrics = self._evaluate_params(params, fidelity="low")
159
  self._state.current_params = params
160
  self._state.constraints_satisfied = metrics.constraints_satisfied
161
- self._update_best(params, metrics)
 
 
 
 
 
162
 
163
  done = self._state.budget_remaining <= 0
164
  reward_breakdown = self._compute_reward_breakdown(
@@ -166,6 +181,11 @@ class StellaratorEnvironment(
166
  action.intent,
167
  done,
168
  magnitude=action.magnitude,
 
 
 
 
 
169
  )
170
  reward = reward_breakdown.total
171
  summary = self._summary_run(action, metrics, action_monitor)
@@ -186,23 +206,22 @@ class StellaratorEnvironment(
186
  )
187
 
188
  def _handle_submit(self) -> StellaratorObservation:
 
189
  action = StellaratorAction(intent="submit")
190
  action_monitor = self._build_action_monitor(
191
  action=action,
192
  params_before=self._state.current_params,
193
  params_after=self._state.current_params,
194
  )
195
- metrics = self._evaluate_params(self._state.current_params, fidelity="high")
196
- initial_submit_score = self._initial_high_fidelity_score()
197
- best_submit_metrics = self._refresh_best_high_fidelity_metrics(metrics)
198
  reward_breakdown = self._compute_reward_breakdown(
199
  metrics,
200
  "submit",
201
  done=True,
202
- initial_reference_score=initial_submit_score,
203
  )
204
  reward = reward_breakdown.total
205
- summary = self._summary_submit(metrics, best_submit_metrics)
206
  self._state.history.append(summary)
207
  self._state.total_reward = round(self._state.total_reward + reward, 4)
208
  self._state.episode_done = True
@@ -223,19 +242,36 @@ class StellaratorEnvironment(
223
  self._state.budget_remaining -= 1
224
  params_before = self._state.current_params
225
  self._state.current_params = self._state.best_params
 
226
  action = StellaratorAction(intent="restore_best")
227
  action_monitor = self._build_action_monitor(
228
  action=action,
229
  params_before=params_before,
230
  params_after=self._state.current_params,
231
  no_op=params_before == self._state.current_params,
 
232
  used_best_params=True,
233
  )
234
  metrics = self._evaluate_params(self._state.current_params, fidelity="low")
235
  self._state.constraints_satisfied = metrics.constraints_satisfied
 
 
 
 
 
 
236
 
237
  done = self._state.budget_remaining <= 0
238
- reward_breakdown = self._compute_reward_breakdown(metrics, "restore_best", done)
 
 
 
 
 
 
 
 
 
239
  reward = reward_breakdown.total
240
  summary = self._summary_restore(metrics, action_monitor)
241
  self._state.history.append(summary)
@@ -283,9 +319,25 @@ class StellaratorEnvironment(
283
  done: bool,
284
  magnitude: MagnitudeName | None = None,
285
  initial_reference_score: float | None = None,
 
 
 
 
 
 
286
  ) -> RewardBreakdown:
287
  recovered_from_failure = self._recovered_from_failed_evaluation(metrics)
288
- previous_metrics = self._reference_metrics(metrics)
 
 
 
 
 
 
 
 
 
 
289
  breakdown = RewardBreakdown(
290
  intent=intent,
291
  evaluation_failed=metrics.evaluation_failed,
@@ -296,10 +348,17 @@ class StellaratorEnvironment(
296
  reference_max_elongation=previous_metrics.max_elongation,
297
  initial_reference_score=initial_reference_score,
298
  )
 
 
 
 
 
 
 
 
 
299
  if metrics.evaluation_failed:
300
  breakdown.failure_penalty = FAILURE_PENALTY
301
- if intent != "submit":
302
- breakdown.step_cost = self._step_cost(intent=intent, magnitude=magnitude)
303
  if intent == "submit":
304
  breakdown.failure_submit_penalty = -1.0
305
  elif done:
@@ -312,14 +371,40 @@ class StellaratorEnvironment(
312
  if previous_metrics.constraints_satisfied and not metrics.constraints_satisfied:
313
  breakdown.feasibility_regression_penalty = -3.0
314
 
 
 
 
 
 
 
315
  if metrics.constraints_satisfied and previous_metrics.constraints_satisfied:
316
  breakdown.objective_delta_reward = (
317
  previous_metrics.max_elongation - metrics.max_elongation
318
  ) * 10.0
 
 
 
 
 
 
 
 
319
  else:
320
  breakdown.feasibility_delta_reward = (
321
  previous_metrics.p1_feasibility - metrics.p1_feasibility
322
  ) * FEASIBILITY_DELTA_WEIGHT
 
 
 
 
 
 
 
 
 
 
 
 
323
  breakdown.triangularity_repair_reward = (
324
  previous_metrics.triangularity_violation - metrics.triangularity_violation
325
  ) * TRIANGULARITY_REPAIR_WEIGHT
@@ -330,9 +415,6 @@ class StellaratorEnvironment(
330
  previous_metrics.iota_violation - metrics.iota_violation
331
  ) * IOTA_REPAIR_WEIGHT
332
 
333
- if intent != "submit":
334
- breakdown.step_cost = self._step_cost(intent=intent, magnitude=magnitude)
335
-
336
  if recovered_from_failure:
337
  breakdown.recovery_bonus = 1.0
338
 
@@ -375,8 +457,6 @@ class StellaratorEnvironment(
375
  )
376
  best_low_fidelity_score = self._state.best_low_fidelity_score
377
  best_low_fidelity_feasibility = self._state.best_low_fidelity_feasibility
378
- best_high_fidelity_score = self._state.best_high_fidelity_score
379
- best_high_fidelity_feasibility = self._state.best_high_fidelity_feasibility
380
  trajectory_summary = self._trajectory_summary()
381
  text_lines = [
382
  action_summary,
@@ -402,14 +482,7 @@ class StellaratorEnvironment(
402
  f"dominant_constraint={metrics.dominant_constraint}",
403
  f"best_low_fidelity_score={best_low_fidelity_score:.6f}",
404
  f"best_low_fidelity_feasibility={best_low_fidelity_feasibility:.6f}",
405
- (
406
- "best_high_fidelity_score="
407
- f"{self._format_optional_metric(best_high_fidelity_score)}"
408
- ),
409
- (
410
- "best_high_fidelity_feasibility="
411
- f"{self._format_optional_metric(best_high_fidelity_feasibility)}"
412
- ),
413
  f"vacuum_well={metrics.vacuum_well:.4f}",
414
  f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}",
415
  f"step={self._state.step_count} | budget={self._state.budget_remaining}/{self._state.budget_total}",
@@ -417,6 +490,7 @@ class StellaratorEnvironment(
417
  f"reward_terms={self._reward_terms_text(reward_breakdown)}",
418
  f"action_clamped={action_monitor.clamped}",
419
  f"action_no_op={action_monitor.no_op}",
 
420
  f"episode_total_reward={self._state.total_reward:+.4f}",
421
  ]
422
  )
@@ -439,10 +513,9 @@ class StellaratorEnvironment(
439
  failure_reason=metrics.failure_reason,
440
  step_number=self._state.step_count,
441
  budget_remaining=self._state.budget_remaining,
 
442
  best_low_fidelity_score=best_low_fidelity_score,
443
  best_low_fidelity_feasibility=best_low_fidelity_feasibility,
444
- best_high_fidelity_score=best_high_fidelity_score,
445
- best_high_fidelity_feasibility=best_high_fidelity_feasibility,
446
  constraints_satisfied=metrics.constraints_satisfied,
447
  target_spec=TARGET_SPEC,
448
  reward=reward,
@@ -499,14 +572,13 @@ class StellaratorEnvironment(
499
  def _summary_submit(
500
  self,
501
  metrics: EvaluationMetrics,
502
- best_submit_metrics: EvaluationMetrics,
503
  ) -> str:
504
  if metrics.evaluation_failed:
505
- return f"Submit failed during high-fidelity evaluation: {metrics.failure_reason}"
506
  return (
507
- f"Submitted current_high_fidelity_score={metrics.p1_score:.6f}, "
508
- f"best_high_fidelity_score={best_submit_metrics.p1_score:.6f}, "
509
- f"best_high_fidelity_feasibility={best_submit_metrics.p1_feasibility:.6f}, "
510
  f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}."
511
  )
512
 
@@ -573,36 +645,45 @@ class StellaratorEnvironment(
573
  return self._last_successful_metrics
574
  return fallback
575
 
576
- def _recovered_from_failed_evaluation(self, metrics: EvaluationMetrics) -> bool:
577
  return (
578
- not metrics.evaluation_failed
579
- and self._last_metrics is not None
580
- and self._last_metrics.evaluation_failed
581
  )
582
 
583
- def _initial_high_fidelity_score(self) -> float:
584
- if self._state.initial_high_fidelity_score is not None:
585
- return self._state.initial_high_fidelity_score
586
- metrics = self._evaluate_params(self._state.initial_params, fidelity="high")
587
- self._state.initial_high_fidelity_score = metrics.p1_score
588
- return metrics.p1_score
589
-
590
- def _refresh_best_high_fidelity_metrics(
591
  self,
592
- current_submit_metrics: EvaluationMetrics,
593
- ) -> EvaluationMetrics:
594
- best_metrics = current_submit_metrics
595
- if self._state.best_params != self._state.current_params:
596
- best_metrics = self._evaluate_params(self._state.best_params, fidelity="high")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
597
 
598
- self._state.best_high_fidelity_score = best_metrics.p1_score
599
- self._state.best_high_fidelity_feasibility = best_metrics.p1_feasibility
600
- return best_metrics
 
601
 
602
- def _format_optional_metric(self, value: float | None) -> str:
603
- if value is None:
604
- return "n/a"
605
- return f"{value:.6f}"
 
 
606
 
607
  def _build_action_monitor(
608
  self,
@@ -612,6 +693,7 @@ class StellaratorEnvironment(
612
  params_after: LowDimBoundaryParams,
613
  clamped: bool = False,
614
  no_op: bool = False,
 
615
  used_best_params: bool = False,
616
  ) -> ActionMonitor:
617
  return ActionMonitor(
@@ -623,6 +705,7 @@ class StellaratorEnvironment(
623
  params_after=params_after,
624
  clamped=clamped,
625
  no_op=no_op,
 
626
  used_best_params=used_best_params,
627
  )
628
 
@@ -644,6 +727,24 @@ class StellaratorEnvironment(
644
  return "The requested move was clipped to stay inside the allowed parameter range. "
645
  return ""
646
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
647
  def _step_cost(self, *, intent: ActionIntent, magnitude: MagnitudeName | None) -> float:
648
  if intent == "restore_best":
649
  return RESTORE_STEP_COST
@@ -660,11 +761,16 @@ class StellaratorEnvironment(
660
  + breakdown.feasibility_crossing_bonus
661
  + breakdown.feasibility_regression_penalty
662
  + breakdown.feasibility_delta_reward
 
 
663
  + breakdown.aspect_ratio_repair_reward
664
  + breakdown.triangularity_repair_reward
665
  + breakdown.iota_repair_reward
666
  + breakdown.objective_delta_reward
 
667
  + breakdown.step_cost
 
 
668
  + breakdown.recovery_bonus
669
  + breakdown.terminal_improvement_bonus
670
  + breakdown.terminal_budget_bonus
@@ -681,11 +787,16 @@ class StellaratorEnvironment(
681
  ("feasibility_crossing_bonus", breakdown.feasibility_crossing_bonus),
682
  ("feasibility_regression_penalty", breakdown.feasibility_regression_penalty),
683
  ("feasibility_delta_reward", breakdown.feasibility_delta_reward),
 
 
684
  ("aspect_ratio_repair_reward", breakdown.aspect_ratio_repair_reward),
685
  ("triangularity_repair_reward", breakdown.triangularity_repair_reward),
686
  ("iota_repair_reward", breakdown.iota_repair_reward),
687
  ("objective_delta_reward", breakdown.objective_delta_reward),
 
688
  ("step_cost", breakdown.step_cost),
 
 
689
  ("recovery_bonus", breakdown.recovery_bonus),
690
  ("terminal_improvement_bonus", breakdown.terminal_improvement_bonus),
691
  ("terminal_budget_bonus", breakdown.terminal_budget_bonus),
@@ -705,15 +816,61 @@ class StellaratorEnvironment(
705
  if metrics.evaluation_failed:
706
  return
707
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
708
  current = (
709
  (1, metrics.p1_score) if metrics.constraints_satisfied else (0, -metrics.p1_feasibility)
710
  )
711
  best = (
712
- (1, self._state.best_low_fidelity_score)
713
- if self._state.best_low_fidelity_feasibility <= FEASIBILITY_TOLERANCE
714
- else (0, -self._state.best_low_fidelity_feasibility)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
715
  )
716
- if current > best:
717
- self._state.best_params = params
718
- self._state.best_low_fidelity_score = metrics.p1_score
719
- self._state.best_low_fidelity_feasibility = metrics.p1_feasibility
 
      "Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
      "from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
      "triangularity <= -0.5, abs(edge rotational transform / n_field_periods) >= 0.3. "
+     "All actions use low-fidelity verification. Submit ends the episode with an explicit "
+     "terminal evaluation and reward bonus. Budget: 6 evaluations including submit."
  )

  FAILURE_PENALTY: Final[float] = -2.0
 
  TRIANGULARITY_REPAIR_WEIGHT: Final[float] = 2.0
  ASPECT_RATIO_REPAIR_WEIGHT: Final[float] = 1.0
  IOTA_REPAIR_WEIGHT: Final[float] = 1.0
+ BEST_FEASIBILITY_BONUS_WEIGHT: Final[float] = 1.5
+ BEST_SCORE_BONUS_WEIGHT: Final[float] = 0.75
+ NEAR_FEASIBILITY_THRESHOLD: Final[float] = 0.02
+ NEAR_FEASIBILITY_BONUS: Final[float] = 1.0
+ NO_PROGRESS_STEP_THRESHOLD: Final[int] = 3
+ NO_PROGRESS_PENALTY: Final[float] = -0.2
+ REPEAT_STATE_PENALTY: Final[float] = -0.15
  STEP_COST_BY_MAGNITUDE: Final[dict[MagnitudeName, float]] = {
      "small": -0.05,
      "medium": -0.1,
 
          constraints_satisfied=metrics.constraints_satisfied,
          total_reward=0.0,
      )
+     self._state.visited_state_keys = [self._state_key(params)]
      self._last_metrics = metrics
      self._last_successful_metrics = None if metrics.evaluation_failed else metrics
      return self._build_observation(
 
          direction=action.direction,
          magnitude=action.magnitude,
      )
+     repeat_state = self._is_repeat_state(params)
      action_monitor = self._build_action_monitor(
          action=action,
          params_before=params_before,
          params_after=params,
          clamped=clamped,
          no_op=no_op,
+         repeat_state=repeat_state,
      )
      metrics = self._evaluate_params(params, fidelity="low")
      self._state.current_params = params
      self._state.constraints_satisfied = metrics.constraints_satisfied
+     (
+         best_low_fidelity_feasibility_before,
+         best_low_fidelity_score_before,
+         step_improved,
+         no_progress_steps,
+     ) = self._advance_low_fidelity_progress(params, metrics)

      done = self._state.budget_remaining <= 0
      reward_breakdown = self._compute_reward_breakdown(

          action.intent,
          done,
          magnitude=action.magnitude,
+         best_low_fidelity_feasibility_before=best_low_fidelity_feasibility_before,
+         best_low_fidelity_score_before=best_low_fidelity_score_before,
+         step_improved=step_improved,
+         no_progress_steps=no_progress_steps,
+         repeat_state=repeat_state,
      )
      reward = reward_breakdown.total
      summary = self._summary_run(action, metrics, action_monitor)
 
      )

  def _handle_submit(self) -> StellaratorObservation:
+     self._state.budget_remaining -= 1
      action = StellaratorAction(intent="submit")
      action_monitor = self._build_action_monitor(
          action=action,
          params_before=self._state.current_params,
          params_after=self._state.current_params,
      )
+     metrics = self._evaluate_params(self._state.current_params, fidelity="low")
+     self._state.constraints_satisfied = metrics.constraints_satisfied
      reward_breakdown = self._compute_reward_breakdown(
          metrics,
          "submit",
          done=True,

      )
      reward = reward_breakdown.total
+     summary = self._summary_submit(metrics)
      self._state.history.append(summary)
      self._state.total_reward = round(self._state.total_reward + reward, 4)
      self._state.episode_done = True
 
      self._state.budget_remaining -= 1
      params_before = self._state.current_params
      self._state.current_params = self._state.best_params
+     repeat_state = self._is_repeat_state(self._state.current_params)
      action = StellaratorAction(intent="restore_best")
      action_monitor = self._build_action_monitor(
          action=action,
          params_before=params_before,
          params_after=self._state.current_params,
          no_op=params_before == self._state.current_params,
+         repeat_state=repeat_state,
          used_best_params=True,
      )
      metrics = self._evaluate_params(self._state.current_params, fidelity="low")
      self._state.constraints_satisfied = metrics.constraints_satisfied
+     (
+         best_low_fidelity_feasibility_before,
+         best_low_fidelity_score_before,
+         step_improved,
+         no_progress_steps,
+     ) = self._advance_low_fidelity_progress(self._state.current_params, metrics)

      done = self._state.budget_remaining <= 0
+     reward_breakdown = self._compute_reward_breakdown(
+         metrics,
+         "restore_best",
+         done,
+         best_low_fidelity_feasibility_before=best_low_fidelity_feasibility_before,
+         best_low_fidelity_score_before=best_low_fidelity_score_before,
+         step_improved=step_improved,
+         no_progress_steps=no_progress_steps,
+         repeat_state=repeat_state,
+     )
      reward = reward_breakdown.total
      summary = self._summary_restore(metrics, action_monitor)
      self._state.history.append(summary)
 
      done: bool,
      magnitude: MagnitudeName | None = None,
      initial_reference_score: float | None = None,
+     reference_metrics: EvaluationMetrics | None = None,
+     best_low_fidelity_feasibility_before: float | None = None,
+     best_low_fidelity_score_before: float | None = None,
+     step_improved: bool = False,
+     no_progress_steps: int = 0,
+     repeat_state: bool = False,
  ) -> RewardBreakdown:
      recovered_from_failure = self._recovered_from_failed_evaluation(metrics)
+     previous_metrics = reference_metrics or self._reference_metrics(metrics)
+     best_low_fidelity_feasibility_before = (
+         self._state.best_low_fidelity_feasibility
+         if best_low_fidelity_feasibility_before is None
+         else best_low_fidelity_feasibility_before
+     )
+     best_low_fidelity_score_before = (
+         self._state.best_low_fidelity_score
+         if best_low_fidelity_score_before is None
+         else best_low_fidelity_score_before
+     )
      breakdown = RewardBreakdown(
          intent=intent,
          evaluation_failed=metrics.evaluation_failed,

          reference_max_elongation=previous_metrics.max_elongation,
          initial_reference_score=initial_reference_score,
      )
+     self._apply_step_penalties(
+         breakdown,
+         intent=intent,
+         magnitude=magnitude,
+         no_progress_steps=no_progress_steps,
+         repeat_state=repeat_state,
+         step_improved=step_improved,
+     )
+
      if metrics.evaluation_failed:
          breakdown.failure_penalty = FAILURE_PENALTY
          if intent == "submit":
              breakdown.failure_submit_penalty = -1.0
      elif done:
 
      if previous_metrics.constraints_satisfied and not metrics.constraints_satisfied:
          breakdown.feasibility_regression_penalty = -3.0

+     if (
+         previous_metrics.p1_feasibility > NEAR_FEASIBILITY_THRESHOLD
+         and metrics.p1_feasibility <= NEAR_FEASIBILITY_THRESHOLD
+     ):
+         breakdown.near_feasible_bonus = NEAR_FEASIBILITY_BONUS
+
      if metrics.constraints_satisfied and previous_metrics.constraints_satisfied:
          breakdown.objective_delta_reward = (
              previous_metrics.max_elongation - metrics.max_elongation
          ) * 10.0
+         if intent != "submit" and best_low_fidelity_feasibility_before <= FEASIBILITY_TOLERANCE:
+             breakdown.best_score_bonus = (
+                 max(
+                     0.0,
+                     metrics.p1_score - best_low_fidelity_score_before,
+                 )
+                 * BEST_SCORE_BONUS_WEIGHT
+             )
      else:
          breakdown.feasibility_delta_reward = (
              previous_metrics.p1_feasibility - metrics.p1_feasibility
          ) * FEASIBILITY_DELTA_WEIGHT
+         if (
+             intent != "submit"
+             and not metrics.constraints_satisfied
+             and best_low_fidelity_feasibility_before > FEASIBILITY_TOLERANCE
+         ):
+             breakdown.best_feasibility_bonus = (
+                 max(
+                     0.0,
+                     best_low_fidelity_feasibility_before - metrics.p1_feasibility,
+                 )
+                 * BEST_FEASIBILITY_BONUS_WEIGHT
+             )
      breakdown.triangularity_repair_reward = (
          previous_metrics.triangularity_violation - metrics.triangularity_violation
      ) * TRIANGULARITY_REPAIR_WEIGHT

          previous_metrics.iota_violation - metrics.iota_violation
      ) * IOTA_REPAIR_WEIGHT

      if recovered_from_failure:
          breakdown.recovery_bonus = 1.0
 
 
      )
      best_low_fidelity_score = self._state.best_low_fidelity_score
      best_low_fidelity_feasibility = self._state.best_low_fidelity_feasibility
      trajectory_summary = self._trajectory_summary()
      text_lines = [
          action_summary,

          f"dominant_constraint={metrics.dominant_constraint}",
          f"best_low_fidelity_score={best_low_fidelity_score:.6f}",
          f"best_low_fidelity_feasibility={best_low_fidelity_feasibility:.6f}",
+         f"no_progress_steps={self._state.no_progress_steps}",
          f"vacuum_well={metrics.vacuum_well:.4f}",
          f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}",
          f"step={self._state.step_count} | budget={self._state.budget_remaining}/{self._state.budget_total}",

          f"reward_terms={self._reward_terms_text(reward_breakdown)}",
          f"action_clamped={action_monitor.clamped}",
          f"action_no_op={action_monitor.no_op}",
+         f"action_repeat_state={action_monitor.repeat_state}",
          f"episode_total_reward={self._state.total_reward:+.4f}",
      ]
  )
 
          failure_reason=metrics.failure_reason,
          step_number=self._state.step_count,
          budget_remaining=self._state.budget_remaining,
+         no_progress_steps=self._state.no_progress_steps,
          best_low_fidelity_score=best_low_fidelity_score,
          best_low_fidelity_feasibility=best_low_fidelity_feasibility,
          constraints_satisfied=metrics.constraints_satisfied,
          target_spec=TARGET_SPEC,
          reward=reward,
 
  def _summary_submit(
      self,
      metrics: EvaluationMetrics,
  ) -> str:
      if metrics.evaluation_failed:
+         return f"Submit failed during low-fidelity evaluation: {metrics.failure_reason}"
      return (
+         f"Submitted current_score={metrics.p1_score:.6f}, "
+         f"best_score={self._state.best_low_fidelity_score:.6f}, "
+         f"best_feasibility={self._state.best_low_fidelity_feasibility:.6f}, "
          f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}."
      )
 
645
  return self._last_successful_metrics
646
  return fallback
647
 
648
+ def _best_low_fidelity_snapshot(self) -> tuple[float, float]:
649
  return (
650
+ self._state.best_low_fidelity_feasibility,
651
+ self._state.best_low_fidelity_score,
 
652
  )
653
 
654
+ def _advance_low_fidelity_progress(
 
 
 
 
 
 
 
655
  self,
656
+ params: LowDimBoundaryParams,
657
+ metrics: EvaluationMetrics,
658
+ ) -> tuple[float, float, bool, int]:
659
+ best_low_fidelity_feasibility_before, best_low_fidelity_score_before = (
660
+ self._best_low_fidelity_snapshot()
661
+ )
662
+ step_improved = self._is_better_than_reference(
663
+ metrics,
664
+ self._previous_step_metrics(metrics),
665
+ )
666
+ self._update_best(params, metrics)
667
+ no_progress_steps = self._advance_no_progress(step_improved=step_improved)
668
+ self._record_visited_state(params)
669
+ return (
670
+ best_low_fidelity_feasibility_before,
671
+ best_low_fidelity_score_before,
672
+ step_improved,
673
+ no_progress_steps,
674
+ )
675
 
676
+ def _previous_step_metrics(self, fallback: EvaluationMetrics) -> EvaluationMetrics:
677
+ if self._last_metrics is not None:
678
+ return self._last_metrics
679
+ return fallback
680
 
681
+ def _recovered_from_failed_evaluation(self, metrics: EvaluationMetrics) -> bool:
682
+ return (
683
+ not metrics.evaluation_failed
684
+ and self._last_metrics is not None
685
+ and self._last_metrics.evaluation_failed
686
+ )
687
 
  def _build_action_monitor(
      self,

      params_after: LowDimBoundaryParams,
      clamped: bool = False,
      no_op: bool = False,
+     repeat_state: bool = False,
      used_best_params: bool = False,
  ) -> ActionMonitor:
      return ActionMonitor(

          params_after=params_after,
          clamped=clamped,
          no_op=no_op,
+         repeat_state=repeat_state,
          used_best_params=used_best_params,
      )
 
          return "The requested move was clipped to stay inside the allowed parameter range. "
      return ""

+ def _apply_step_penalties(
+     self,
+     breakdown: RewardBreakdown,
+     *,
+     intent: ActionIntent,
+     magnitude: MagnitudeName | None,
+     no_progress_steps: int,
+     repeat_state: bool,
+     step_improved: bool,
+ ) -> None:
+     if intent == "submit":
+         return
+     breakdown.step_cost = self._step_cost(intent=intent, magnitude=magnitude)
+     if intent == "run" and no_progress_steps >= NO_PROGRESS_STEP_THRESHOLD:
+         breakdown.no_progress_penalty = NO_PROGRESS_PENALTY
+     if intent == "run" and repeat_state and not step_improved:
+         breakdown.repeat_state_penalty = REPEAT_STATE_PENALTY
+
  def _step_cost(self, *, intent: ActionIntent, magnitude: MagnitudeName | None) -> float:
      if intent == "restore_best":
          return RESTORE_STEP_COST
 
      + breakdown.feasibility_crossing_bonus
      + breakdown.feasibility_regression_penalty
      + breakdown.feasibility_delta_reward
+     + breakdown.best_feasibility_bonus
+     + breakdown.near_feasible_bonus
      + breakdown.aspect_ratio_repair_reward
      + breakdown.triangularity_repair_reward
      + breakdown.iota_repair_reward
      + breakdown.objective_delta_reward
+     + breakdown.best_score_bonus
      + breakdown.step_cost
+     + breakdown.no_progress_penalty
+     + breakdown.repeat_state_penalty
      + breakdown.recovery_bonus
      + breakdown.terminal_improvement_bonus
      + breakdown.terminal_budget_bonus

      ("feasibility_crossing_bonus", breakdown.feasibility_crossing_bonus),
      ("feasibility_regression_penalty", breakdown.feasibility_regression_penalty),
      ("feasibility_delta_reward", breakdown.feasibility_delta_reward),
+     ("best_feasibility_bonus", breakdown.best_feasibility_bonus),
+     ("near_feasible_bonus", breakdown.near_feasible_bonus),
      ("aspect_ratio_repair_reward", breakdown.aspect_ratio_repair_reward),
      ("triangularity_repair_reward", breakdown.triangularity_repair_reward),
      ("iota_repair_reward", breakdown.iota_repair_reward),
      ("objective_delta_reward", breakdown.objective_delta_reward),
+     ("best_score_bonus", breakdown.best_score_bonus),
      ("step_cost", breakdown.step_cost),
+     ("no_progress_penalty", breakdown.no_progress_penalty),
+     ("repeat_state_penalty", breakdown.repeat_state_penalty),
      ("recovery_bonus", breakdown.recovery_bonus),
      ("terminal_improvement_bonus", breakdown.terminal_improvement_bonus),
      ("terminal_budget_bonus", breakdown.terminal_budget_bonus),
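The Reward V2 bookkeeping above keeps every term as a named field, sums those fields for the scalar reward, and logs the same fields as `(name, value)` pairs, so the breakdown always reconciles with the total. A minimal standalone sketch of that pattern, using only a subset of the field names from the diff:

```python
from dataclasses import dataclass, fields


@dataclass
class RewardBreakdown:
    # Subset of the Reward V2 terms shown in the diff above; every term
    # defaults to 0.0 so the total is always well defined.
    feasibility_delta_reward: float = 0.0
    best_feasibility_bonus: float = 0.0
    near_feasible_bonus: float = 0.0
    objective_delta_reward: float = 0.0
    best_score_bonus: float = 0.0
    step_cost: float = 0.0
    no_progress_penalty: float = 0.0
    repeat_state_penalty: float = 0.0

    @property
    def total(self) -> float:
        # Plain sum over all dataclass fields: the logged breakdown and
        # the scalar reward can never drift apart.
        return sum(getattr(self, f.name) for f in fields(self))


b = RewardBreakdown(feasibility_delta_reward=0.5, step_cost=-0.1, no_progress_penalty=-0.2)
assert abs(b.total - 0.2) < 1e-9
```

The design choice this illustrates: adding a new term means adding one field, and both the total and the per-term log pick it up automatically.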
 
      if metrics.evaluation_failed:
          return

+     if self._is_better_than_best(
+         metrics,
+         best_low_fidelity_feasibility=self._state.best_low_fidelity_feasibility,
+         best_low_fidelity_score=self._state.best_low_fidelity_score,
+     ):
+         self._state.best_params = params
+         self._state.best_low_fidelity_score = metrics.p1_score
+         self._state.best_low_fidelity_feasibility = metrics.p1_feasibility
+
+ def _is_better_than_best(
+     self,
+     metrics: EvaluationMetrics,
+     *,
+     best_low_fidelity_feasibility: float,
+     best_low_fidelity_score: float,
+ ) -> bool:
      current = (
          (1, metrics.p1_score) if metrics.constraints_satisfied else (0, -metrics.p1_feasibility)
      )
      best = (
+         (1, best_low_fidelity_score)
+         if best_low_fidelity_feasibility <= FEASIBILITY_TOLERANCE
+         else (0, -best_low_fidelity_feasibility)
+     )
+     return current > best
+
+ def _is_better_than_reference(
+     self,
+     metrics: EvaluationMetrics,
+     reference_metrics: EvaluationMetrics,
+ ) -> bool:
+     return self._metrics_rank(metrics) > self._metrics_rank(reference_metrics)
+
+ def _metrics_rank(self, metrics: EvaluationMetrics) -> tuple[int, float]:
+     if metrics.evaluation_failed:
+         return (-1, float("-inf"))
+     if metrics.constraints_satisfied:
+         return (1, metrics.p1_score)
+     return (0, -metrics.p1_feasibility)
+
+ def _advance_no_progress(self, *, step_improved: bool) -> int:
+     if step_improved:
+         self._state.no_progress_steps = 0
+     else:
+         self._state.no_progress_steps += 1
+     return self._state.no_progress_steps
+
+ def _is_repeat_state(self, params: LowDimBoundaryParams) -> bool:
+     return self._state_key(params) in self._state.visited_state_keys
+
+ def _record_visited_state(self, params: LowDimBoundaryParams) -> None:
+     self._state.visited_state_keys.append(self._state_key(params))
+
+ def _state_key(self, params: LowDimBoundaryParams) -> str:
+     return (
+         f"{params.aspect_ratio:.6f}|{params.elongation:.6f}|"
+         f"{params.rotational_transform:.6f}|{params.triangularity_scale:.6f}"
      )
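The best-design update in the diff above ranks candidates feasible-first via plain tuple comparison. A standalone sketch of that ordering, mirroring `_metrics_rank` (the free function and its flat arguments are illustrative; the repo helper takes an `EvaluationMetrics` object):

```python
# Feasible-first ranking, sketched after _metrics_rank in the diff above.
# Failed evaluations rank below everything; feasible designs rank above
# any infeasible one; ties break on score (feasible) or on -violation
# (infeasible, smaller violation wins).
def metrics_rank(evaluation_failed: bool, constraints_satisfied: bool,
                 p1_score: float, p1_feasibility: float) -> tuple[int, float]:
    if evaluation_failed:
        return (-1, float("-inf"))
    if constraints_satisfied:
        return (1, p1_score)
    return (0, -p1_feasibility)


feasible = metrics_rank(False, True, p1_score=0.42, p1_feasibility=0.0)
near_miss = metrics_rank(False, False, p1_score=0.9, p1_feasibility=0.01)
far_off = metrics_rank(False, False, p1_score=0.9, p1_feasibility=0.30)

# Lexicographic tuple comparison does all the work.
assert feasible > near_miss > far_off
```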
 
 
 
 
training/README.md CHANGED
@@ -4,9 +4,9 @@ This repository treats notebooks and trained-policy runs as supporting evidence

 Training policy:

- - train on the low-fidelity `run` surface for the normal RL inner loop
- - keep the standard `training/llm_rollout.py` monitor/evaluate workflow on low-fidelity `run` only
- - use high-fidelity `submit` only for explicit replay/debug work, paired fixture checks, submit-side traces, and final evidence
+ - train on the live low-fidelity environment surface, including explicit `submit`
+ - keep the standard `training/llm_rollout.py` monitor/evaluate workflow on the same live contract as the notebook
+ - keep high-fidelity validation in offline tooling such as `baselines/high_fidelity_validation.py`

 ## Status

@@ -50,8 +50,8 @@ Use that module as the source of truth for:
 - local rollout replay
 - rollout telemetry structure used by the monitor command

- For `monitor` and `evaluate`, the rollout stays on low-fidelity `run` steps only and ignores `submit`.
- Use `replay` when you explicitly want to exercise the full environment path including terminal `submit`.
+ For `prompt`, `monitor`, `evaluate`, and the notebook, the shared helper contract now includes the live `submit` action.
+ Use offline validation scripts when you explicitly want high-fidelity checks outside the environment loop.

 For `evaluate`, the completion command reads the prompt from `stdin` and writes a raw completion to `stdout`.
 The current seed is exposed as the `FUSION_LAB_SEED` environment variable so the same command can be used
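The `evaluate` contract described above (prompt on `stdin`, raw completion on `stdout`, seed in `FUSION_LAB_SEED`) can be satisfied by a trivial stub command. A hedged sketch; the action-plan JSON shape here is an assumption modeled on the notebook's `RUN_ACTION_SPECS` entries, not the canonical format expected by `parse_action_plan`:

```python
import json
import os
import sys


def build_completion(prompt: str, seed: str) -> str:
    # A real completion command would condition on the prompt (and could
    # vary by seed); this stub always emits the same fixed plan.
    plan = [
        {"intent": "run", "parameter": "triangularity_scale",
         "direction": "decrease", "magnitude": "medium"},
        {"intent": "submit"},
    ]
    return json.dumps(plan)


def main() -> None:
    # Wire-up matching the documented contract: prompt arrives on stdin,
    # the raw completion goes to stdout, and the harness exposes the
    # current seed via the FUSION_LAB_SEED environment variable.
    prompt = sys.stdin.read()
    seed = os.environ.get("FUSION_LAB_SEED", "0")
    sys.stdout.write(build_completion(prompt, seed))
```

Such a stub is useful as a smoke test of the evaluate plumbing before pointing the harness at an actual model.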
training/llm_rollout.py CHANGED
@@ -10,10 +10,11 @@ from pathlib import Path
 from typing import Final

 from fusion_lab.llm_agent import (
+    build_messages,
     build_prompt,
+    LLMEpisodeTrace,
     parse_action_plan,
     run_episode_with_actions,
-    LLMEpisodeTrace,
 )
 from fusion_lab.models import StellaratorAction
 from server.environment import StellaratorEnvironment
@@ -130,6 +131,7 @@ def prompt_payload(seed: int) -> dict[str, object]:
     return {
         "created_at_utc": datetime.now(UTC).isoformat(),
         "seed": seed,
+        "messages": list(build_messages(observation)),
         "prompt": build_prompt(observation),
         "target_spec": observation.target_spec,
         "budget_remaining": observation.budget_remaining,
@@ -138,7 +140,7 @@


 def parse_actions(
-    args: argparse.Namespace, *, allow_submit: bool = False
+    args: argparse.Namespace, *, allow_submit: bool = True
 ) -> tuple[str, list[StellaratorAction]]:
     if args.action_plan_file is not None:
         text = args.action_plan_file.read_text()
@@ -253,30 +255,29 @@ def _pearson_correlation(xs: list[float], ys: list[float]) -> float | None:

 def summarize_traces(traces: list[LLMEpisodeTrace]) -> dict[str, object]:
     feasible_count = sum(1 for trace in traces if trace.constraints_satisfied)
-    high_fidelity_traces = [trace for trace in traces if trace.final_evaluation_fidelity == "high"]
-    high_fidelity_count = len(high_fidelity_traces)
+    submitted_count = sum(
+        1
+        for trace in traces
+        if trace.steps and trace.steps[-1].reward_breakdown.get("intent") == "submit"
+    )
     failed_count = sum(1 for trace in traces if trace.evaluation_failed)
     total_rewards = [trace.total_reward for trace in traces]
     final_scores = [trace.final_score for trace in traces]
     final_feasibilities = [trace.final_feasibility for trace in traces]
-    high_fidelity_scores = [trace.final_score for trace in high_fidelity_traces]
-    high_fidelity_feasibilities = [trace.final_feasibility for trace in high_fidelity_traces]
     feasible_flags = [1.0 if trace.constraints_satisfied else 0.0 for trace in traces]
     episode_count = len(traces)

     return {
         "episode_count": episode_count,
         "feasible_episode_count": feasible_count,
-        "high_fidelity_episode_count": high_fidelity_count,
+        "submitted_episode_count": submitted_count,
         "evaluation_failed_episode_count": failed_count,
         "feasible_rate": _round_metric(feasible_count / episode_count),
-        "high_fidelity_rate": _round_metric(high_fidelity_count / episode_count),
+        "submitted_rate": _round_metric(submitted_count / episode_count),
         "evaluation_failed_rate": _round_metric(failed_count / episode_count),
         "mean_total_reward": _round_metric(_mean(total_rewards)),
         "mean_final_score": _round_metric(_mean(final_scores)),
         "mean_final_feasibility": _round_metric(_mean(final_feasibilities)),
-        "mean_high_fidelity_score": _round_metric(_mean(high_fidelity_scores)),
-        "mean_high_fidelity_feasibility": _round_metric(_mean(high_fidelity_feasibilities)),
         "reward_final_score_correlation": _round_metric(
             _pearson_correlation(total_rewards, final_scores)
         ),
@@ -347,6 +348,7 @@ def evaluate_payload(
     evaluations.append(
         {
             "seed": seed,
+            "messages": list(build_messages(observation)),
             "prompt": prompt,
             "completion": completion,
             "parsed_action_count": len(actions),
@@ -370,10 +372,10 @@ def write_monitor_summary(payload: dict[str, object]) -> None:
     print(
         "episodes="
         f"{summary['episode_count']} feasible={summary['feasible_episode_count']} "
-        f"high_fidelity={summary['high_fidelity_episode_count']} "
+        f"submitted={summary['submitted_episode_count']} "
         f"failed={summary['evaluation_failed_episode_count']} "
         f"mean_total_reward={_format_metric(summary['mean_total_reward'], signed=True)} "
-        f"mean_high_fidelity_score={_format_metric(summary['mean_high_fidelity_score'], signed=True)} "
+        f"mean_final_score={_format_metric(summary['mean_final_score'], signed=True)} "
         f"reward_score_corr={summary['reward_final_score_correlation']}"
     )
     for episode in payload["episodes"]:
training/notebooks/README.md CHANGED
@@ -26,10 +26,10 @@ Operational defaults:

 - use the same Python dependency set as the repo runtime
 - keep heavy verifier and training work on Northflank
- - keep low-fidelity `run` as the training inner loop; do not put high-fidelity `submit` in every RL step
- - use high-fidelity `submit` only for sparse checkpoint evaluation, paired fixture checks, manual traces, and final evidence
+ - keep the live notebook and environment on one low-fidelity reward surface, including explicit `submit`
+ - keep high-fidelity validation in offline scripts, paired fixture checks, and final evidence artifacts
 - keep the repository GRPO notebook aligned to the shared helper contract in `fusion_lab/llm_agent.py`
- - the standard notebook reward/eval path is low-fidelity-only and ignores `submit` by default
+ - the standard notebook reward/eval path uses the same action contract as the environment, including `submit`
 - keep the public submission notebook focused on connecting to the deployed HF Space and exporting visible traces
 - prefer a public HF Space for the hackathon; if private, document the token setup directly in the notebook
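The shared action contract referenced above covers 26 discrete actions: 4 parameters x 2 directions x 3 magnitudes (24 run actions), plus `restore_best` and `submit`. A sketch of that enumeration; the parameter names match the boundary fields used in `_state_key`, while the direction and magnitude name lists are assumptions here, with the canonical values living in `fusion_lab.llm_agent` (`RUN_PARAMETERS`, `RUN_DIRECTIONS`, `RUN_MAGNITUDES`):

```python
# Enumerate the 26-action discrete space described in the repo README.
# Name lists below are illustrative assumptions; the canonical constants
# live in fusion_lab.llm_agent.
PARAMETERS = ["aspect_ratio", "elongation", "rotational_transform", "triangularity_scale"]
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]

actions = [
    {"intent": "run", "parameter": p, "direction": d, "magnitude": m}
    for p in PARAMETERS
    for d in DIRECTIONS
    for m in MAGNITUDES
]
# Two intent-only actions complete the contract.
actions.append({"intent": "restore_best"})
actions.append({"intent": "submit"})

assert len(actions) == 26  # 4 * 2 * 3 + 2
```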
 
training/notebooks/fusion_design_lab_training.ipynb CHANGED
@@ -4,22 +4,7 @@
4
  "cell_type": "markdown",
5
  "id": "7fb27b941602401d91542211134fc71a",
6
  "metadata": {},
7
- "source": [
8
- "# Fusion Design Lab β€” GRPO Training\n",
9
- "\n",
10
- "Train an LLM to optimize stellarator fusion reactor designs using **GRPO** (Group Relative Policy Optimization) with **Unsloth** and **TRL**.\n",
11
- "\n",
12
- "The agent interacts with a constrained optimization environment where it adjusts 4 geometric knobs of a stellarator boundary, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:\n",
13
- "- `aspect_ratio ≀ 4.0`\n",
14
- "- `average_triangularity ≀ -0.5`\n",
15
- "- `abs(edge_iota_over_nfp) β‰₯ 0.3`\n",
16
- "\n",
17
- "Each episode has **6 evaluations** budgeted. The agent produces a plan of actions and the environment scores it via the `constellaration` physics verifier.\n",
18
- "\n",
19
- "**Environment deployed at**: https://creativeengineer-fusion-design-lab.hf.space\n",
20
- "\n",
21
- "**Runtime**: Select GPU (T4 or better) via `Runtime > Change runtime type`."
22
- ]
23
  },
24
  {
25
  "cell_type": "markdown",
@@ -35,7 +20,74 @@
35
  "id": "9a63283cbaf04dbcab1f6479b197f3a8",
36
  "metadata": {},
37
  "outputs": [],
38
- "source": "%%capture\n# Build deps for constellaration (booz-xform compiles from source)\n!apt-get update -qq && apt-get install -y -qq cmake ninja-build g++ gfortran libnetcdf-dev libnetcdff-dev > /dev/null\n\n!pip install trl peft bitsandbytes datasets matplotlib accelerate\n!pip install \"transformers>=4.51\" \"huggingface-hub<1.0\""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
39
  },
40
  {
41
  "cell_type": "markdown",
@@ -49,13 +101,61 @@
49
  "id": "72eea5119410473aa328ad9291626812",
50
  "metadata": {},
51
  "outputs": [],
52
- "source": "import importlib\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\nfrom peft import LoraConfig, get_peft_model\n\nMODEL_NAME = \"Qwen/Qwen3.5-4B\"\nMAX_SEQ_LENGTH = 2048\n\nbnb_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_quant_type=\"nf4\",\n bnb_4bit_use_double_quant=True,\n bnb_4bit_compute_dtype=torch.bfloat16,\n)\n\nattn_impl = \"flash_attention_2\" if importlib.util.find_spec(\"flash_attn\") else \"sdpa\"\n\nmodel = AutoModelForCausalLM.from_pretrained(\n MODEL_NAME,\n quantization_config=bnb_config,\n torch_dtype=torch.bfloat16,\n device_map=\"auto\",\n attn_implementation=attn_impl,\n)\n\ntokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\nif tokenizer.pad_token is None:\n tokenizer.pad_token = tokenizer.eos_token\n\nlora_config = LoraConfig(\n r=32,\n lora_alpha=32,\n target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n lora_dropout=0.0,\n task_type=\"CAUSAL_LM\",\n)\nmodel = get_peft_model(model, lora_config)\nmodel.gradient_checkpointing_enable()\nmodel.print_trainable_parameters()\nprint(f\"Model loaded: {MODEL_NAME} (attn: {attn_impl})\")"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
53
  },
54
  {
55
  "cell_type": "markdown",
56
  "id": "8edb47106e1a46a883d545849b8ab81b",
57
  "metadata": {},
58
- "source": "## 3. Setup Stellarator Environment\n\nInstall the environment package directly from the HF Space repository so training runs locally (no network latency per step). The package also includes the typed `FusionLabClient` and Pydantic models for remote OpenEnv sessions."
59
  },
60
  {
61
  "cell_type": "code",
@@ -65,9 +165,55 @@
65
  "outputs": [],
66
  "source": [
67
  "%%capture\n",
68
- "# Install the fusion-design-lab environment (includes constellaration physics engine)\n",
69
- "# This takes ~3 minutes due to booz-xform compilation\n",
70
- "!pip install \"fusion-design-lab @ git+https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab\""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
71
  ]
72
  },
73
  {
@@ -77,20 +223,24 @@
77
  "metadata": {},
78
  "outputs": [],
79
  "source": [
 
80
  "import json\n",
 
 
81
  "from typing import Final\n",
82
  "\n",
83
  "from fusion_lab.llm_agent import (\n",
84
  " RUN_DIRECTIONS,\n",
85
  " RUN_MAGNITUDES,\n",
86
  " RUN_PARAMETERS,\n",
87
- " build_prompt,\n",
88
  " parse_action_plan,\n",
89
  " run_episode_with_actions,\n",
90
  ")\n",
91
  "from fusion_lab.models import StellaratorAction\n",
92
  "from server.contract import RESET_SEEDS\n",
93
  "from server.environment import BUDGET, StellaratorEnvironment\n",
 
94
  "\n",
95
  "RUN_ACTION_SPECS: Final[list[dict[str, str]]] = [\n",
96
  " {\"intent\": \"run\", \"parameter\": p, \"direction\": d, \"magnitude\": m}\n",
@@ -108,7 +258,40 @@
108
  "print(\n",
109
  " f\"Environment ready. Initial score: {obs.p1_score:.4f}, feasibility: {obs.p1_feasibility:.4f}\"\n",
110
  ")\n",
111
- "print(f\"Budget: {obs.budget_remaining}, Constraints satisfied: {obs.constraints_satisfied}\")"
112
  ]
113
  },
114
  {
@@ -131,7 +314,7 @@
131
  "# Shared helper smoke test\n",
132
  "env = StellaratorEnvironment()\n",
133
  "obs = env.reset(seed=0)\n",
134
- "prompt = build_prompt(obs)\n",
135
  "print(prompt[:500])\n",
136
  "print(\"...\")\n",
137
  "\n",
@@ -171,7 +354,7 @@
171
  "prompts = []\n",
172
  "for seed_idx in range(len(RESET_SEEDS)):\n",
173
  " obs = StellaratorEnvironment().reset(seed=seed_idx)\n",
174
- " prompt = build_prompt(obs)\n",
175
  " # Repeat each seed to create a larger training set\n",
176
  " for _ in range(50):\n",
177
  " prompts.append({\"prompt\": prompt, \"seed_idx\": seed_idx})\n",
@@ -185,13 +368,7 @@
185
  "cell_type": "markdown",
186
  "id": "504fb2a444614c0babb325280ed9130a",
187
  "metadata": {},
188
- "source": [
189
- "## 6. Reward Function\n",
190
- "\n",
191
- "The environment reward executes each generated action plan in the stellarator environment and returns the cumulative low-fidelity Reward V1 from the live environment. The environment's built-in reward decomposes feasibility (+3/-3 crossing bonuses, official feasibility progress, weighted triangularity/aspect/iota repair terms), objective (max elongation improvement), step costs, and failure penalties β€” see `server/environment.py:_compute_reward_breakdown(...)`.\n",
192
- "\n",
193
- "For the current training workflow, the notebook ignores `submit` and does not auto-submit. GRPO therefore optimizes the low-fidelity `run` path only. The live observation telemetry still exposes `reward_breakdown` and `action_monitor` for debugging reward behavior.\n"
194
- ]
195
  },
196
  {
197
  "cell_type": "code",
@@ -200,35 +377,30 @@
200
  "metadata": {},
201
  "outputs": [],
202
  "source": [
203
- "import traceback\n",
204
- "\n",
205
- "\n",
206
  "def environment_reward_fn(\n",
207
  " completions: list[str], seed_idx: list[int] | None = None, **kwargs\n",
208
  ") -> list[float]:\n",
209
  " \"\"\"Execute each action plan in the environment and return cumulative reward.\n",
210
  "\n",
211
  " This is the sole GRPO training signal in the notebook. It uses the live\n",
212
- " low-fidelity environment reward path and ignores submit so the trainer\n",
213
- " optimizes only the `run` surface. Empty or unparseable outputs still\n",
214
- " receive a trainer-side fallback penalty of -3.0.\n",
215
  " \"\"\"\n",
216
  " rewards = []\n",
217
  " seeds = seed_idx if seed_idx is not None else [0] * len(completions)\n",
218
  " for i, completion in enumerate(completions):\n",
219
- " try:\n",
220
- " actions = parse_action_plan(completion)\n",
221
- " if len(actions) == 0:\n",
222
- " rewards.append(-3.0)\n",
223
- " continue\n",
224
- " trace = run_episode_with_actions(\n",
225
- " actions,\n",
226
- " seed_idx=int(seeds[i]) % len(RESET_SEEDS),\n",
227
- " )\n",
228
- " rewards.append(trace.total_reward)\n",
229
- " except Exception:\n",
230
- " traceback.print_exc()\n",
231
  " rewards.append(-3.0)\n",
232
  " return rewards\n",
233
  "\n",
234
  "\n",
@@ -249,20 +421,133 @@
249
  " },\n",
250
  " ]\n",
251
  ")\n",
252
- "print(f\"Environment reward (low-fi only): {environment_reward_fn([test_plan], seed_idx=[0])}\")\n",
253
  "\n",
254
- "# Test short plan with no explicit submit\n",
255
- "test_short = json.dumps(\n",
256
  " [\n",
257
  " {\n",
258
  " \"intent\": \"run\",\n",
259
  " \"parameter\": \"triangularity_scale\",\n",
260
  " \"direction\": \"increase\",\n",
261
- " \"magnitude\": \"medium\",\n",
262
  " },\n",
 
263
  " ]\n",
264
  ")\n",
265
- "print(f\"Environment reward (short plan): {environment_reward_fn([test_short], seed_idx=[0])}\")"
266
  ]
267
  },
268
  {
@@ -281,7 +566,140 @@
281
  "id": "8a65eabff63a45729fe45fb5ade58bdc",
282
  "metadata": {},
283
  "outputs": [],
284
- "source": "from trl import GRPOConfig, GRPOTrainer\n\nMAX_PROMPT_LENGTH = 768\nMAX_COMPLETION_LENGTH = MAX_SEQ_LENGTH - MAX_PROMPT_LENGTH\n\ntraining_args = GRPOConfig(\n output_dir=\"./grpo_fusion_output\",\n learning_rate=5e-5,\n num_generations=8,\n max_completion_length=MAX_COMPLETION_LENGTH,\n max_prompt_length=MAX_PROMPT_LENGTH,\n per_device_train_batch_size=8,\n gradient_accumulation_steps=1,\n max_steps=60,\n temperature=1.0,\n logging_steps=1,\n save_steps=20,\n bf16=True,\n report_to=\"none\",\n seed=42,\n)\n\ntrainer = GRPOTrainer(\n model=model,\n processing_class=tokenizer,\n reward_funcs=[environment_reward_fn],\n args=training_args,\n train_dataset=dataset,\n)\n\nprint(\"Starting GRPO training...\")\ntrain_result = trainer.train()\nprint(f\"Training complete. Total steps: {train_result.global_step}\")"
285
  },
286
  {
287
  "cell_type": "markdown",
@@ -290,7 +708,7 @@
290
  "source": [
291
  "## 8. Training Results\n",
292
  "\n",
293
- "Visualize reward improvement over training steps."
294
  ]
295
  },
296
  {
@@ -300,8 +718,6 @@
300
  "metadata": {},
301
  "outputs": [],
302
  "source": [
303
- "import matplotlib.pyplot as plt\n",
304
- "\n",
305
  "log_history = trainer.state.log_history\n",
306
  "steps = [entry[\"step\"] for entry in log_history if \"loss\" in entry]\n",
307
  "losses = [entry[\"loss\"] for entry in log_history if \"loss\" in entry]\n",
@@ -346,11 +762,7 @@
346
  "cell_type": "markdown",
347
  "id": "8309879909854d7188b41380fd92a7c3",
348
  "metadata": {},
349
- "source": [
350
- "## 9. Evaluate Trained Policy\n",
351
- "\n",
352
- "Generate action plans from the trained model and compare against random baselines."
353
- ]
354
  },
355
  {
356
  "cell_type": "code",
@@ -358,13 +770,13 @@
358
  "id": "3ed186c9a28b402fb0bc4494df01f08d",
359
  "metadata": {},
360
  "outputs": [],
361
- "source": "import random\n\nmodel.eval()\n\n\ndef reward_term_summary(step_or_obs: object) -> str:\n breakdown_obj = getattr(step_or_obs, \"reward_breakdown\")\n breakdown = (\n breakdown_obj.model_dump() if hasattr(breakdown_obj, \"model_dump\") else breakdown_obj\n )\n terms = []\n for key, value in breakdown.items():\n if key in {\n \"intent\",\n \"total\",\n \"evaluation_failed\",\n \"recovered_from_failure\",\n \"reference_constraints_satisfied\",\n \"reference_score\",\n \"reference_feasibility\",\n \"reference_max_elongation\",\n \"initial_reference_score\",\n \"terminal_score_ratio\",\n }:\n continue\n if isinstance(value, (int, float)) and float(value) != 0.0:\n terms.append(f\"{key}={float(value):+.3f}\")\n return \", \".join(terms) if terms else \"none\"\n\n\ndef run_episode_with_model(seed_idx: int) -> tuple[float, list[str]]:\n \"\"\"Run one episode using the trained model.\"\"\"\n env = StellaratorEnvironment()\n obs = env.reset(seed=seed_idx)\n prompt = build_prompt(obs)\n inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n with torch.no_grad():\n outputs = model.generate(\n **inputs,\n max_new_tokens=MAX_COMPLETION_LENGTH,\n temperature=0.7,\n do_sample=True,\n )\n completion = tokenizer.decode(\n outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n )\n actions = parse_action_plan(completion)\n episode = run_episode_with_actions(actions, seed_idx=seed_idx)\n trace = [\n (\n f\"{step.action_label} → reward={step.reward:.3f} \"\n f\"score={step.p1_score:.4f} feasible={step.constraints_satisfied} \"\n f\"terms={reward_term_summary(step)}\"\n )\n for step in episode.steps\n ]\n return episode.total_reward, trace\n\n\ndef run_random_episode(seed_idx: int) -> float:\n \"\"\"Run one episode with random actions for comparison.\"\"\"\n actions = [StellaratorAction(**random.choice(RUN_ACTION_SPECS)) for _ in range(BUDGET)]\n return run_episode_with_actions(actions, seed_idx=seed_idx).total_reward\n\n\n# Evaluate\nprint(\"=\" * 60)\nprint(\"TRAINED MODEL EPISODES\")\nprint(\"=\" * 60)\ntrained_rewards = []\nfor seed in range(len(RESET_SEEDS)):\n reward, trace = run_episode_with_model(seed)\n trained_rewards.append(reward)\n print(f\"\\nSeed {seed} — Total reward: {reward:.3f}\")\n for line in trace:\n print(f\" {line}\")\n\nprint(f\"\\nMean trained reward: {sum(trained_rewards) / len(trained_rewards):.3f}\")\n\nprint(\"\\n\" + \"=\" * 60)\nprint(\"RANDOM BASELINE (10 episodes per seed)\")\nprint(\"=\" * 60)\nrandom_rewards = []\nfor seed in range(len(RESET_SEEDS)):\n seed_rewards = [run_random_episode(seed) for _ in range(10)]\n random_rewards.extend(seed_rewards)\n print(\n f\"Seed {seed} — Mean: {sum(seed_rewards) / len(seed_rewards):.3f}, Best: {max(seed_rewards):.3f}\"\n )\n\nprint(f\"\\nMean random reward: {sum(random_rewards) / len(random_rewards):.3f}\")\nprint(f\"Mean trained reward: {sum(trained_rewards) / len(trained_rewards):.3f}\")"
362
  },
363
  {
364
  "cell_type": "markdown",
365
  "id": "cb1e1581032b452c9409d6c6813c49d1",
366
  "metadata": {},
367
- "source": "## 10. Connect to Deployed HF Space\n\nDemonstrate connecting to the live environment on Hugging Face Spaces through the typed OpenEnv client and running the trained model against it."
368
  },
369
  {
370
  "cell_type": "code",
@@ -372,7 +784,119 @@
372
  "id": "379cbbc1e968416e875cc15c1202d7eb",
373
  "metadata": {},
374
  "outputs": [],
375
- "source": "import requests\n\nfrom fusion_lab.client import FusionLabClient\n\nHF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n\n# Check health\nhealth = requests.get(f\"{HF_SPACE_URL}/health\").json()\nprint(f\"HF Space status: {health['status']}\")\n\n# Get task description\ntask = requests.get(f\"{HF_SPACE_URL}/task\").json()\nprint(f\"\\nTask: {task['description']}\")\nprint(f\"Constraints: {task['constraints']}\")\nprint(f\"Budget: {task['budget']}\")\n\nwith FusionLabClient(base_url=HF_SPACE_URL) as env:\n reset_result = env.reset(seed=42)\n remote_obs = reset_result.observation\n print(f\"\\nRemote reset — max_elongation: {remote_obs.max_elongation:.4f}\")\n print(f\" aspect_ratio: {remote_obs.aspect_ratio:.4f}\")\n print(f\" constraints_satisfied: {remote_obs.constraints_satisfied}\")\n print(f\" budget_remaining: {remote_obs.budget_remaining}\")\n\n # Generate an action plan from the trained model\n prompt = build_prompt(remote_obs)\n inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n with torch.no_grad():\n outputs = model.generate(\n **inputs, max_new_tokens=MAX_COMPLETION_LENGTH, temperature=0.7, do_sample=True\n )\n completion = tokenizer.decode(\n outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n )\n actions = parse_action_plan(completion)\n\n print(f\"\\nTrained model generated {len(actions)} actions for remote env:\")\n for i, action in enumerate(actions[:BUDGET], start=1):\n if action.intent == \"submit\":\n continue\n result = env.step(action)\n step_obs = result.observation\n reward = float(result.reward) if result.reward is not None else 0.0\n print(\n f\" Step {i}: {action.intent} {action.parameter or ''} \"\n f\"{action.direction or ''} {action.magnitude or ''} \"\n f\"→ reward={reward:.3f}, score={step_obs.p1_score:.4f}, terms={reward_term_summary(step_obs)}\"\n )\n if result.done:\n print(f\" Episode done. Final score: {step_obs.p1_score:.4f}\")\n break\nprint(\"\\nEnvironment is live and accessible for training and evaluation.\")"
376
  }
377
  ],
378
  "metadata": {
 
4
  "cell_type": "markdown",
5
  "id": "7fb27b941602401d91542211134fc71a",
6
  "metadata": {},
7
+ "source": "# Fusion Design Lab — GRPO Training\n\nTrain an LLM to optimize stellarator fusion reactor designs using **GRPO** (Group Relative Policy Optimization) with **HF TRL**.\n\nThe agent interacts with a constrained optimization environment where it adjusts 4 geometric knobs of a stellarator boundary, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:\n- `aspect_ratio ≤ 4.0`\n- `average_triangularity ≤ -0.5`\n- `abs(edge_iota_over_nfp) ≥ 0.3`\n\nEach episode has a budget of **6 evaluations**. The notebook now trains on the same live **low-fidelity environment contract** used by the repo runtime: `run`, `restore_best`, and explicit terminal `submit` all stay on the same verifier surface. Higher-fidelity checks live outside the notebook reward loop.\n\n**Environment deployed at**: https://creativeengineer-fusion-design-lab.hf.space\n\n**Runtime**: Select GPU via `Runtime > Change runtime type`. The notebook automatically uses `fp16` on T4/V100-class GPUs and `bf16` on Ampere-or-newer GPUs."
8
  },
9
  {
10
  "cell_type": "markdown",
 
20
  "id": "9a63283cbaf04dbcab1f6479b197f3a8",
21
  "metadata": {},
22
  "outputs": [],
23
+ "source": [
24
+ "%%capture\n",
25
+ "import importlib.util\n",
26
+ "import os\n",
27
+ "import shutil\n",
28
+ "import subprocess\n",
29
+ "import sys\n",
30
+ "\n",
31
+ "\n",
32
+ "def run_checked(command: list[str]) -> None:\n",
33
+ " subprocess.run(command, check=True)\n",
34
+ "\n",
35
+ "\n",
36
+ "def maybe_install_build_deps() -> None:\n",
37
+ " if sys.platform != \"linux\":\n",
38
+ " return\n",
39
+ " apt_get = shutil.which(\"apt-get\")\n",
40
+ " if apt_get is None or os.geteuid() != 0:\n",
41
+ " return\n",
42
+ " run_checked([apt_get, \"update\", \"-qq\"])\n",
43
+ " run_checked(\n",
44
+ " [\n",
45
+ " apt_get,\n",
46
+ " \"install\",\n",
47
+ " \"-y\",\n",
48
+ " \"-qq\",\n",
49
+ " \"cmake\",\n",
50
+ " \"ninja-build\",\n",
51
+ " \"g++\",\n",
52
+ " \"gfortran\",\n",
53
+ " \"libnetcdf-dev\",\n",
54
+ " \"libnetcdff-dev\",\n",
55
+ " ]\n",
56
+ " )\n",
57
+ "\n",
58
+ "\n",
59
+ "def ensure_pip() -> None:\n",
60
+ " if importlib.util.find_spec(\"pip\") is None:\n",
61
+ " run_checked([sys.executable, \"-m\", \"ensurepip\", \"--upgrade\"])\n",
62
+ "\n",
63
+ "\n",
64
+ "def install_python_deps() -> None:\n",
65
+ " ensure_pip()\n",
66
+ " run_checked(\n",
67
+ " [\n",
68
+ " sys.executable,\n",
69
+ " \"-m\",\n",
70
+ " \"pip\",\n",
71
+ " \"install\",\n",
72
+ " \"trl==0.29.0\",\n",
73
+ " \"peft>=0.15.0,<1.0\",\n",
74
+ " \"bitsandbytes>=0.45.0,<1.0\",\n",
75
+ " \"datasets>=3.0.0,<4.0\",\n",
76
+ " \"matplotlib>=3.9.0,<4.0\",\n",
77
+ " \"accelerate>=1.3.0,<2.0\",\n",
78
+ " ]\n",
79
+ " )\n",
80
+ "\n",
81
+ "\n",
82
+ "maybe_install_build_deps()\n",
83
+ "install_python_deps()\n",
84
+ "\n",
85
+ "if importlib.util.find_spec(\"torch\") is None:\n",
86
+ " raise RuntimeError(\n",
87
+ " \"PyTorch is not installed in this kernel. Use a CUDA-enabled Colab runtime \"\n",
88
+ " \"or a Northflank PyTorch GPU notebook image before running this notebook.\"\n",
89
+ " )"
90
+ ]
91
  },
92
  {
93
  "cell_type": "markdown",
 
101
  "id": "72eea5119410473aa328ad9291626812",
102
  "metadata": {},
103
  "outputs": [],
104
+ "source": [
105
+ "import importlib\n",
106
+ "import torch\n",
107
+ "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n",
108
+ "from peft import LoraConfig, get_peft_model\n",
109
+ "\n",
110
+ "MODEL_NAME = \"Qwen/Qwen3-4B\"\n",
111
+ "MAX_SEQ_LENGTH = 2048\n",
112
+ "\n",
113
+ "if not torch.cuda.is_available():\n",
114
+ " raise RuntimeError(\"This notebook requires a CUDA GPU runtime.\")\n",
115
+ "gpu_major, _ = torch.cuda.get_device_capability()\n",
116
+ "use_bf16 = gpu_major >= 8\n",
117
+ "compute_dtype = torch.bfloat16 if use_bf16 else torch.float16\n",
118
+ "\n",
119
+ "bnb_config = BitsAndBytesConfig(\n",
120
+ " load_in_4bit=True,\n",
121
+ " bnb_4bit_quant_type=\"nf4\",\n",
122
+ " bnb_4bit_use_double_quant=True,\n",
123
+ " bnb_4bit_compute_dtype=compute_dtype,\n",
124
+ ")\n",
125
+ "\n",
126
+ "attn_impl = \"flash_attention_2\" if importlib.util.find_spec(\"flash_attn\") else \"sdpa\"\n",
127
+ "\n",
128
+ "model = AutoModelForCausalLM.from_pretrained(\n",
129
+ " MODEL_NAME,\n",
130
+ " quantization_config=bnb_config,\n",
131
+ " torch_dtype=compute_dtype,\n",
132
+ " device_map=\"auto\",\n",
133
+ " attn_implementation=attn_impl,\n",
134
+ ")\n",
135
+ "\n",
136
+ "tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\n",
137
+ "if tokenizer.pad_token is None:\n",
138
+ " tokenizer.pad_token = tokenizer.eos_token\n",
139
+ "\n",
140
+ "lora_config = LoraConfig(\n",
141
+ " r=32,\n",
142
+ " lora_alpha=32,\n",
143
+ " target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n",
144
+ " lora_dropout=0.0,\n",
145
+ " task_type=\"CAUSAL_LM\",\n",
146
+ ")\n",
147
+ "model = get_peft_model(model, lora_config)\n",
148
+ "model.gradient_checkpointing_enable()\n",
149
+ "model.print_trainable_parameters()\n",
150
+ "dtype_name = \"bf16\" if use_bf16 else \"fp16\"\n",
151
+ "print(f\"Model loaded: {MODEL_NAME} (attn: {attn_impl}, dtype: {dtype_name})\")"
152
+ ]
153
  },
154
  {
155
  "cell_type": "markdown",
156
  "id": "8edb47106e1a46a883d545849b8ab81b",
157
  "metadata": {},
158
+ "source": "## 3. Setup Stellarator Environment\n\nInstall the environment from the checked-out Fusion Design Lab repository when it is available in the runtime. If the notebook is running in a fresh Colab session, clone the public repo first and then install it in editable mode. This keeps the notebook bound to the same `server/environment.py` Reward V2 code and `server/physics.py` verifier code that ship with the notebook, instead of a potentially stale deployment copy."
159
  },
160
  {
161
  "cell_type": "code",
 
165
  "outputs": [],
166
  "source": [
167
  "%%capture\n",
168
+ "from pathlib import Path\n",
169
+ "from typing import Final\n",
170
+ "\n",
171
+ "REPO_URL = \"https://github.com/jungdaesuh/fusion-design-lab.git\"\n",
172
+ "EXPECTED_REPO_FILES: Final[tuple[str, ...]] = (\n",
173
+ " \"pyproject.toml\",\n",
174
+ " \"server/environment.py\",\n",
175
+ " \"server/physics.py\",\n",
176
+ " \"fusion_lab/models.py\",\n",
177
+ " \"fusion_lab/llm_agent.py\",\n",
178
+ " \"training/notebooks/fusion_design_lab_training.ipynb\",\n",
179
+ ")\n",
180
+ "\n",
181
+ "\n",
182
+ "def _is_valid_repo_root(candidate: Path) -> bool:\n",
183
+ " return candidate.is_dir() and all((candidate / item).exists() for item in EXPECTED_REPO_FILES)\n",
184
+ "\n",
185
+ "\n",
186
+ "def resolve_repo_root() -> Path:\n",
187
+ " candidates = [\n",
188
+ " Path.cwd(),\n",
189
+ " Path.cwd().parent,\n",
190
+ " Path(\"/content/fusion-design-lab\"),\n",
191
+ " Path(\"/home/jovyan/fusion-design-lab\"),\n",
192
+ " Path.home() / \"fusion-design-lab\",\n",
193
+ " ]\n",
194
+ " for candidate in candidates:\n",
195
+ " if _is_valid_repo_root(candidate):\n",
196
+ " return candidate.resolve()\n",
197
+ "\n",
198
+ " target = (\n",
199
+ " Path(\"/content/fusion-design-lab\")\n",
200
+ " if \"google.colab\" in sys.modules\n",
201
+ " else Path.home() / \"fusion-design-lab\"\n",
202
+ " )\n",
203
+ " if not target.exists():\n",
204
+ " subprocess.run([\"git\", \"clone\", REPO_URL, str(target)], check=True)\n",
205
+ " if not _is_valid_repo_root(target):\n",
206
+ " raise RuntimeError(\n",
207
+ " \"Could not locate a complete fusion-design-lab repository at {target}.\".format(\n",
208
+ " target=target\n",
209
+ " )\n",
210
+ " )\n",
211
+ " return target.resolve()\n",
212
+ "\n",
213
+ "\n",
214
+ "REPO_ROOT = resolve_repo_root()\n",
215
+ "os.chdir(REPO_ROOT)\n",
216
+ "subprocess.run([sys.executable, \"-m\", \"pip\", \"install\", \"-e\", str(REPO_ROOT)], check=True)"
217
  ]
218
  },
219
  {
 
223
  "metadata": {},
224
  "outputs": [],
225
  "source": [
226
+ "import inspect\n",
227
  "import json\n",
228
+ "import os\n",
229
+ "from pathlib import Path\n",
230
  "from typing import Final\n",
231
  "\n",
232
  "from fusion_lab.llm_agent import (\n",
233
  " RUN_DIRECTIONS,\n",
234
  " RUN_MAGNITUDES,\n",
235
  " RUN_PARAMETERS,\n",
236
+ " build_messages,\n",
237
  " parse_action_plan,\n",
238
  " run_episode_with_actions,\n",
239
  ")\n",
240
  "from fusion_lab.models import StellaratorAction\n",
241
  "from server.contract import RESET_SEEDS\n",
242
  "from server.environment import BUDGET, StellaratorEnvironment\n",
243
+ "from server.physics import evaluate_boundary\n",
244
  "\n",
245
  "RUN_ACTION_SPECS: Final[list[dict[str, str]]] = [\n",
246
  " {\"intent\": \"run\", \"parameter\": p, \"direction\": d, \"magnitude\": m}\n",
 
258
  "print(\n",
259
  " f\"Environment ready. Initial score: {obs.p1_score:.4f}, feasibility: {obs.p1_feasibility:.4f}\"\n",
260
  ")\n",
261
+ "print(f\"Budget: {obs.budget_remaining}, Constraints satisfied: {obs.constraints_satisfied}\")\n",
262
+ "\n",
263
+ "\n",
264
+ "def _assert_expected_source(source: str | Path, *, expected: Path, label: str) -> Path:\n",
265
+ " source_path = Path(source or \"\").resolve()\n",
266
+ " if source_path.name != expected.name:\n",
267
+ " raise RuntimeError(f\"Expected {label} to come from {expected}, got {source_path}\")\n",
268
+ " if source_path != expected.resolve():\n",
269
+ " raise RuntimeError(\n",
270
+ " f\"Expected {label} to come from {expected}, got {source_path}. This indicates an environment or module path mismatch.\"\n",
271
+ " )\n",
272
+ " return source_path\n",
273
+ "\n",
274
+ "\n",
275
+ "reward_source = _assert_expected_source(\n",
276
+ " inspect.getsourcefile(StellaratorEnvironment._compute_reward_breakdown),\n",
277
+ " expected=REPO_ROOT / \"server\" / \"environment.py\",\n",
278
+ " label=\"Reward V2\",\n",
279
+ ")\n",
280
+ "verifier_source = _assert_expected_source(\n",
281
+ " inspect.getsourcefile(evaluate_boundary),\n",
282
+ " expected=REPO_ROOT / \"server\" / \"physics.py\",\n",
283
+ " label=\"Verifier\",\n",
284
+ ")\n",
285
+ "print(f\"Reward source bound to: {reward_source}\")\n",
286
+ "print(f\"Verifier source bound to: {verifier_source}\")\n",
287
+ "\n",
288
+ "\n",
289
+ "def render_generation_prompt(observation):\n",
290
+ " return tokenizer.apply_chat_template(\n",
291
+ " list(build_messages(observation)),\n",
292
+ " tokenize=False,\n",
293
+ " add_generation_prompt=True,\n",
294
+ " )"
295
  ]
296
  },
297
  {
 
314
  "# Shared helper smoke test\n",
315
  "env = StellaratorEnvironment()\n",
316
  "obs = env.reset(seed=0)\n",
317
+ "prompt = render_generation_prompt(obs)\n",
318
  "print(prompt[:500])\n",
319
  "print(\"...\")\n",
320
  "\n",
 
354
  "prompts = []\n",
355
  "for seed_idx in range(len(RESET_SEEDS)):\n",
356
  " obs = StellaratorEnvironment().reset(seed=seed_idx)\n",
357
+ " prompt = render_generation_prompt(obs)\n",
358
  " # Repeat each seed to create a larger training set\n",
359
  " for _ in range(50):\n",
360
  " prompts.append({\"prompt\": prompt, \"seed_idx\": seed_idx})\n",
 
368
  "cell_type": "markdown",
369
  "id": "504fb2a444614c0babb325280ed9130a",
370
  "metadata": {},
371
+ "source": "## 6. Reward Function\n\nThe GRPO training signal comes from **Reward V2**, the environment's built-in reward computed per step in `server/environment.py:_compute_reward_breakdown(...)`. Each generated action plan is rolled out in the local environment, and the cumulative reward across all steps becomes the single scalar GRPO optimizes.\n\nThe notebook now uses the same live action contract as the environment itself: plans may include explicit `submit`, and `submit` stays on the same low-fidelity verifier surface as the rest of the episode. Empty or unparseable outputs receive a trainer-side fallback penalty of **−3.0**. Right below, the notebook runs a `submit` smoke test so Colab and Northflank confirm the live terminal submit path is wired correctly.\n\n---\n\n### Reward V2 Breakdown\n\nEvery step's reward is the sum of the applicable terms below.\n\n#### 1. Step Costs (every non-submit step)\n\n| Term | Value | Condition |\n|------|-------|-----------|\n| `step_cost` | −0.05 / −0.10 / −0.20 | `run` small / medium / large magnitude |\n| `step_cost` | −0.10 | `restore_best` |\n| `no_progress_penalty` | −0.20 | `no_progress_steps ≥ 3` (consecutive non-improving steps) |\n| `repeat_state_penalty` | −0.15 | Revisiting a previously seen parameter state without improvement |\n| `invalid_action_penalty` | −1.0 | `run` action missing parameter, direction, or magnitude |\n\n#### 2. Evaluation Failure\n\n| Term | Value | Condition |\n|------|-------|-----------|\n| `failure_penalty` | −2.0 | Physics evaluation failed |\n| `failure_submit_penalty` | −1.0 | Failed evaluation on `submit` (additional) |\n| `failure_budget_penalty` | −0.5 | Failed evaluation on last budget step (additional) |\n| `recovery_bonus` | +1.0 | Recovering from a previously failed evaluation |\n\nIf evaluation fails, **only** failure terms and step costs apply — the feasibility/objective terms below are skipped.\n\n#### 3. Feasibility Path (constraints NOT all satisfied)\n\nWhen current or previous state has violated constraints:\n\n| Term | Formula / Value | Purpose |\n|------|-----------------|---------|\n| `feasibility_crossing_bonus` | +3.0 | Crossing from infeasible → feasible |\n| `feasibility_regression_penalty` | −3.0 | Crossing from feasible → infeasible |\n| `near_feasible_bonus` | +1.0 | Feasibility dropping below 0.02 threshold |\n| `feasibility_delta_reward` | `(prev_feasibility − curr_feasibility) × 2.0` | Progress toward satisfying constraints |\n| `best_feasibility_bonus` | `max(0, best_feas_before − curr_feas) × 1.5` | New-best feasibility while still infeasible |\n| `triangularity_repair_reward` | `(prev_tri_violation − curr_tri_violation) × 2.0` | Reducing triangularity constraint gap |\n| `aspect_ratio_repair_reward` | `(prev_ar_violation − curr_ar_violation) × 1.0` | Reducing aspect ratio constraint gap |\n| `iota_repair_reward` | `(prev_iota_violation − curr_iota_violation) × 1.0` | Reducing iota constraint gap |\n\n#### 4. Objective Path (both prev and curr constraints satisfied)\n\nWhen the design is feasible and stays feasible:\n\n| Term | Formula / Value | Purpose |\n|------|-----------------|---------|\n| `objective_delta_reward` | `(prev_max_elongation − curr_max_elongation) × 10.0` | Lowering max elongation (the optimization target) |\n| `best_score_bonus` | `max(0, curr_score − best_score_before) × 0.75` | New-best P1 score while feasible |\n\n#### 5. Terminal Bonus (on `submit` or final budget step)\n\n| Term | Formula / Value | Condition |\n|------|-----------------|-----------|\n| `terminal_improvement_bonus` | `5.0 × ratio` (submit) / `2.0 × ratio` (last step) | Feasible and score > initial score |\n| `terminal_budget_bonus` | `budget_remaining / budget_total` | Submit only, with improvement |\n| `terminal_no_improvement_penalty` | −1.0 (submit) / −0.5 (last step) | No improvement over initial |\n\nWhere `ratio = (curr_score − base_score) / max(1.0 − base_score, 1e-6)`.\n\n---\n\n**Constants** (from `server/environment.py`): `FAILURE_PENALTY=−2.0`, `FEASIBILITY_DELTA_WEIGHT=2.0`, `TRIANGULARITY_REPAIR_WEIGHT=2.0`, `ASPECT_RATIO_REPAIR_WEIGHT=1.0`, `IOTA_REPAIR_WEIGHT=1.0`, `BEST_FEASIBILITY_BONUS_WEIGHT=1.5`, `BEST_SCORE_BONUS_WEIGHT=0.75`, `NEAR_FEASIBILITY_THRESHOLD=0.02`, `NEAR_FEASIBILITY_BONUS=1.0`, `NO_PROGRESS_STEP_THRESHOLD=3`, `NO_PROGRESS_PENALTY=−0.2`, `REPEAT_STATE_PENALTY=−0.15`, `RESTORE_STEP_COST=−0.1`, `STEP_COST_BY_MAGNITUDE={small: −0.05, medium: −0.10, large: −0.20}`."
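The terminal-bonus arithmetic above is easy to mis-read, so here is a minimal sketch that reproduces the `ratio` formula and the submit-time scaling. The helper name `terminal_improvement_bonus_sketch` is hypothetical and not code from the repo; only the constants (5.0 for submit, 2.0 for the last budget step) and the `ratio` formula are taken from the breakdown above.

```python
def terminal_improvement_bonus_sketch(
    curr_score: float, base_score: float, *, is_submit: bool
) -> float:
    """Sketch of the terminal improvement bonus from the Reward V2 breakdown.

    ratio = (curr_score - base_score) / max(1.0 - base_score, 1e-6), scaled by
    5.0 on submit or 2.0 on the final budget step.
    """
    ratio = (curr_score - base_score) / max(1.0 - base_score, 1e-6)
    scale = 5.0 if is_submit else 2.0
    return scale * ratio


# Score improves from 0.2 to 0.6 and the plan submits:
# ratio = 0.4 / 0.8 = 0.5, so the bonus is 5.0 * 0.5 = 2.5
print(terminal_improvement_bonus_sketch(0.6, 0.2, is_submit=True))
```

Note how the denominator normalizes by the remaining headroom `1.0 − base_score`, so the same absolute score gain earns a larger bonus when the episode started closer to the maximum score.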
372
  },
373
  {
374
  "cell_type": "code",
 
377
  "metadata": {},
378
  "outputs": [],
379
  "source": [
380
  "def environment_reward_fn(\n",
381
  " completions: list[str], seed_idx: list[int] | None = None, **kwargs\n",
382
  ") -> list[float]:\n",
383
  " \"\"\"Execute each action plan in the environment and return cumulative reward.\n",
384
  "\n",
385
  " This is the sole GRPO training signal in the notebook. It uses the live\n",
386
+ " low-fidelity environment reward path and allows explicit submit so the\n",
387
+ " trainer stays aligned to the same action contract as the environment.\n",
388
+ " Empty or unparseable outputs still receive a trainer-side fallback\n",
389
+ " penalty of -3.0. Environment/runtime bugs should raise directly so they\n",
390
+ " are not misclassified as bad model outputs.\n",
391
  " \"\"\"\n",
392
  " rewards = []\n",
393
  " seeds = seed_idx if seed_idx is not None else [0] * len(completions)\n",
394
  " for i, completion in enumerate(completions):\n",
395
+ " actions = parse_action_plan(completion)\n",
396
+ " if len(actions) == 0:\n",
397
  " rewards.append(-3.0)\n",
398
+ " continue\n",
399
+ " trace = run_episode_with_actions(\n",
400
+ " actions,\n",
401
+ " seed_idx=int(seeds[i]) % len(RESET_SEEDS),\n",
402
+ " )\n",
403
+ " rewards.append(trace.total_reward)\n",
404
  " return rewards\n",
405
  "\n",
406
  "\n",
 
421
  " },\n",
422
  " ]\n",
423
  ")\n",
424
+ "print(f\"Environment reward: {environment_reward_fn([test_plan], seed_idx=[0])}\")\n",
425
  "\n",
426
+ "# Test terminal submit path on the same live verifier surface\n",
427
+ "submit_smoke_plan = json.dumps(\n",
428
  " [\n",
429
  " {\n",
430
  " \"intent\": \"run\",\n",
431
  " \"parameter\": \"triangularity_scale\",\n",
432
  " \"direction\": \"increase\",\n",
433
+ " \"magnitude\": \"small\",\n",
434
  " },\n",
435
+ " {\"intent\": \"submit\"},\n",
436
  " ]\n",
437
  ")\n",
438
+ "submit_smoke_trace = run_episode_with_actions(\n",
439
+ " parse_action_plan(submit_smoke_plan),\n",
440
+ " seed_idx=0,\n",
441
+ ")\n",
442
+ "final_step = submit_smoke_trace.steps[-1]\n",
443
+ "if final_step.reward_breakdown.get(\"intent\") != \"submit\":\n",
444
+ " raise RuntimeError(\"Submit smoke did not end on the terminal submit reward path.\")\n",
445
+ "if final_step.evaluation_fidelity != \"low\":\n",
446
+ " raise RuntimeError(\n",
447
+ " f\"Expected unified low-fidelity submit path, got {final_step.evaluation_fidelity!r}.\"\n",
448
+ " )\n",
449
+ "print(\"Submit smoke confirmed live terminal path:\")\n",
450
+ "print(\n",
451
+ " f\" final action={final_step.action_label}, fidelity={final_step.evaluation_fidelity}, \"\n",
452
+ " f\"reward={final_step.reward:+.3f}\"\n",
453
+ ")\n",
454
+ "print(f\" submit reward terms={final_step.reward_breakdown}\")"
455
+ ]
456
+ },
457
+ {
458
+ "cell_type": "markdown",
459
+ "id": "hprgv01ibkq",
460
+ "metadata": {},
461
+ "source": "## 6b. Untrained Model Baseline\n\nEvaluate the base model **before any GRPO training** on all 3 seeds using **greedy decoding** (`do_sample=False`). Greedy decoding is deterministic — the same model + prompt always produces the same output — so the before/after comparison is fully reproducible across reruns."
462
+ },
463
+ {
464
+ "cell_type": "code",
465
+ "execution_count": null,
466
+ "id": "77dt4zyn6it",
467
+ "metadata": {},
468
+ "outputs": [],
469
+ "source": [
470
+ "MAX_PROMPT_LENGTH = 768\n",
471
+ "MAX_COMPLETION_LENGTH = MAX_SEQ_LENGTH - MAX_PROMPT_LENGTH\n",
472
+ "\n",
473
+ "N_RANDOM_ROLLOUTS = 10\n",
474
+ "\n",
475
+ "\n",
476
+ "def reward_term_summary(step_or_obs: object) -> str:\n",
477
+ " \"\"\"Format non-zero reward terms for display.\"\"\"\n",
478
+ " breakdown_obj = getattr(step_or_obs, \"reward_breakdown\")\n",
479
+ " breakdown = (\n",
480
+ " breakdown_obj.model_dump() if hasattr(breakdown_obj, \"model_dump\") else breakdown_obj\n",
481
+ " )\n",
482
+ " terms = []\n",
483
+ " for key, value in breakdown.items():\n",
484
+ " if key in {\n",
485
+ " \"intent\",\n",
486
+ " \"total\",\n",
487
+ " \"evaluation_failed\",\n",
488
+ " \"recovered_from_failure\",\n",
489
+ " \"reference_constraints_satisfied\",\n",
490
+ " \"reference_score\",\n",
491
+ " \"reference_feasibility\",\n",
492
+ " \"reference_max_elongation\",\n",
493
+ " \"initial_reference_score\",\n",
494
+ " \"terminal_score_ratio\",\n",
495
+ " }:\n",
496
+ " continue\n",
497
+ " if isinstance(value, (int, float)) and float(value) != 0.0:\n",
498
+ " terms.append(f\"{key}={float(value):+.3f}\")\n",
499
+ " return \", \".join(terms) if terms else \"none\"\n",
500
+ "\n",
501
+ "\n",
502
+ "def run_episode_with_model(seed_idx: int) -> tuple[float, list[str], bool]:\n",
503
+ " \"\"\"Run one episode using the current model state (greedy decoding).\n",
504
+ "\n",
505
+ " Greedy decoding (do_sample=False) makes the output fully deterministic\n",
506
+ " for a given model state and seed, so a single rollout per seed is\n",
507
+ " sufficient for reproducible evaluation.\n",
508
+ " \"\"\"\n",
509
+ " env = StellaratorEnvironment()\n",
510
+ " obs = env.reset(seed=seed_idx)\n",
511
+ " prompt = render_generation_prompt(obs)\n",
512
+ " inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
513
+ " with torch.no_grad():\n",
514
+ " outputs = model.generate(\n",
515
+ " **inputs,\n",
516
+ " max_new_tokens=MAX_COMPLETION_LENGTH,\n",
517
+ " do_sample=False,\n",
518
+ " )\n",
519
+ " completion = tokenizer.decode(\n",
520
+ " outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n",
521
+ " )\n",
522
+ " actions = parse_action_plan(completion)\n",
523
+ " if len(actions) == 0:\n",
524
+ " return -3.0, [\"(no valid actions parsed)\"], False\n",
525
+ " episode = run_episode_with_actions(actions, seed_idx=seed_idx)\n",
526
+ " trace = [\n",
527
+ " (\n",
528
+ " f\"{step.action_label} β†’ reward={step.reward:.3f} \"\n",
529
+ " f\"score={step.p1_score:.4f} feasible={step.constraints_satisfied}\"\n",
530
+ " )\n",
531
+ " for step in episode.steps\n",
532
+ " ]\n",
533
+ " return episode.total_reward, trace, episode.constraints_satisfied\n",
534
+ "\n",
535
+ "\n",
536
+ "model.eval()\n",
537
+ "print(\"=\" * 60)\n",
538
+ "print(\"UNTRAINED MODEL BASELINE (before GRPO) β€” greedy, deterministic\")\n",
539
+ "print(\"=\" * 60)\n",
540
+ "untrained_rewards = []\n",
541
+ "for seed in range(len(RESET_SEEDS)):\n",
542
+ " reward, trace, feasible = run_episode_with_model(seed)\n",
543
+ " untrained_rewards.append(reward)\n",
544
+ " print(f\"\\nSeed {seed} β€” Total reward: {reward:.3f}, Feasible: {feasible}\")\n",
545
+ " for line in trace:\n",
546
+ " print(f\" {line}\")\n",
547
+ "\n",
548
+ "untrained_mean = sum(untrained_rewards) / len(untrained_rewards)\n",
549
+ "print(f\"\\nUntrained mean reward: {untrained_mean:.3f}\")\n",
550
+ "print(\"Snapshot saved. Will compare against trained model after GRPO.\")"
551
  ]
552
  },
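The `reward_term_summary` helper added above strips bookkeeping keys from the reward breakdown and keeps only non-zero numeric terms. A minimal standalone sketch of that filtering logic, with an abbreviated skip set and a made-up breakdown dict (the key names `shaping` and `penalty` are illustrative, not the environment's actual schema):

```python
# Sketch of the non-zero-term filter; SKIP_KEYS abbreviates the notebook's full set.
SKIP_KEYS = {"intent", "total", "evaluation_failed"}

def summarize_terms(breakdown: dict) -> str:
    """Format non-zero numeric reward terms, skipping bookkeeping keys."""
    terms = [
        f"{key}={float(value):+.3f}"
        for key, value in breakdown.items()
        if key not in SKIP_KEYS
        and isinstance(value, (int, float))
        and float(value) != 0.0
    ]
    return ", ".join(terms) if terms else "none"

print(summarize_terms({"intent": "run", "total": 0.42, "shaping": 0.25, "penalty": 0.0}))
# shaping=+0.250  ("total" is a skipped key, "penalty" is zero)
print(summarize_terms({"intent": "run"}))  # none
```

Insertion order of the dict determines term order, so the output is stable for a given breakdown.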
553
  {
 
566
  "id": "8a65eabff63a45729fe45fb5ade58bdc",
567
  "metadata": {},
568
  "outputs": [],
569
+ "source": [
570
+ "import matplotlib.pyplot as plt\n",
571
+ "\n",
572
+ "from IPython.display import clear_output, display\n",
573
+ "from transformers import TrainerCallback\n",
574
+ "from trl import GRPOConfig, GRPOTrainer\n",
575
+ "\n",
576
+ "\n",
577
+ "def extract_logged_reward(logs: dict[str, object]) -> float | None:\n",
578
+ " reward_value = logs.get(\"reward\")\n",
579
+ " if reward_value is None:\n",
580
+ " reward_value = logs.get(\"rewards/environment_reward_fn\")\n",
581
+ " if isinstance(reward_value, (int, float)):\n",
582
+ " return float(reward_value)\n",
583
+ " return None\n",
584
+ "\n",
585
+ "\n",
586
+ "class LiveTrainingMonitorCallback(TrainerCallback):\n",
587
+ " def __init__(self, max_steps: int) -> None:\n",
588
+ " self.max_steps = max_steps\n",
589
+ " self.loss_steps: list[int] = []\n",
590
+ " self.losses: list[float] = []\n",
591
+ " self.reward_steps: list[int] = []\n",
592
+ " self.rewards: list[float] = []\n",
593
+ "\n",
594
+ " def _render(self, step: int) -> None:\n",
595
+ " clear_output(wait=True)\n",
596
+ " latest_loss = self.losses[-1] if self.losses else None\n",
597
+ " latest_reward = self.rewards[-1] if self.rewards else None\n",
598
+ " best_reward = max(self.rewards) if self.rewards else None\n",
599
+ " latest_loss_text = f\"{latest_loss:.4f}\" if latest_loss is not None else \"n/a\"\n",
600
+ " latest_reward_text = f\"{latest_reward:+.4f}\" if latest_reward is not None else \"n/a\"\n",
601
+ " best_reward_text = f\"{best_reward:+.4f}\" if best_reward is not None else \"n/a\"\n",
602
+ "\n",
603
+ " print(\"GRPO live monitor\")\n",
604
+ " print(f\"step: {step}/{self.max_steps}\")\n",
605
+ " print(f\"latest loss: {latest_loss_text}\")\n",
606
+ " print(f\"latest reward: {latest_reward_text}\")\n",
607
+ " print(f\"best reward so far: {best_reward_text}\")\n",
608
+ "\n",
609
+ " fig, axes = plt.subplots(1, 2, figsize=(14, 4))\n",
610
+ " if self.losses:\n",
611
+ " axes[0].plot(self.loss_steps, self.losses, color=\"#0b6efd\", linewidth=2)\n",
612
+ " axes[0].scatter(self.loss_steps[-1], self.losses[-1], color=\"#0b6efd\", s=40)\n",
613
+ " else:\n",
614
+ " axes[0].text(0.5, 0.5, \"waiting for loss logs\", ha=\"center\", va=\"center\")\n",
615
+ " axes[0].set_xlabel(\"Step\")\n",
616
+ " axes[0].set_ylabel(\"Loss\")\n",
617
+ " axes[0].set_title(\"Training Loss\")\n",
618
+ " axes[0].grid(True, alpha=0.3)\n",
619
+ "\n",
620
+ " if self.rewards:\n",
621
+ " axes[1].plot(\n",
622
+ " self.reward_steps,\n",
623
+ " self.rewards,\n",
624
+ " color=\"#198754\",\n",
625
+ " linewidth=2,\n",
626
+ " marker=\"o\",\n",
627
+ " markersize=3,\n",
628
+ " )\n",
629
+ " axes[1].scatter(self.reward_steps[-1], self.rewards[-1], color=\"#198754\", s=40)\n",
630
+ " else:\n",
631
+ " axes[1].text(0.5, 0.5, \"waiting for reward logs\", ha=\"center\", va=\"center\")\n",
632
+ " axes[1].axhline(0.0, color=\"0.7\", linewidth=1, linestyle=\"--\")\n",
633
+ " axes[1].set_xlabel(\"Step\")\n",
634
+ " axes[1].set_ylabel(\"Mean Reward\")\n",
635
+ " axes[1].set_title(\"Environment Reward\")\n",
636
+ " axes[1].grid(True, alpha=0.3)\n",
637
+ "\n",
638
+ " fig.suptitle(\"Fusion Design Lab β€” Live GRPO Monitor\", fontsize=14, fontweight=\"bold\")\n",
639
+ " fig.tight_layout()\n",
640
+ " display(fig)\n",
641
+ " plt.close(fig)\n",
642
+ "\n",
643
+ " def on_log(self, args, state, control, logs=None, **kwargs):\n",
644
+ " if not state.is_world_process_zero or not logs:\n",
645
+ " return\n",
646
+ "\n",
647
+ " step = int(state.global_step)\n",
648
+ " loss_value = logs.get(\"loss\")\n",
649
+ " if isinstance(loss_value, (int, float)):\n",
650
+ " if self.loss_steps and self.loss_steps[-1] == step:\n",
651
+ " self.losses[-1] = float(loss_value)\n",
652
+ " else:\n",
653
+ " self.loss_steps.append(step)\n",
654
+ " self.losses.append(float(loss_value))\n",
655
+ "\n",
656
+ " reward_value = extract_logged_reward(logs)\n",
657
+ " if reward_value is not None:\n",
658
+ " if self.reward_steps and self.reward_steps[-1] == step:\n",
659
+ " self.rewards[-1] = reward_value\n",
660
+ " else:\n",
661
+ " self.reward_steps.append(step)\n",
662
+ " self.rewards.append(reward_value)\n",
663
+ "\n",
664
+ " self._render(step)\n",
665
+ "\n",
666
+ " def on_train_end(self, args, state, control, **kwargs):\n",
667
+ " if state.is_world_process_zero:\n",
668
+ " self._render(int(state.global_step))\n",
669
+ "\n",
670
+ "\n",
671
+ "training_args = GRPOConfig(\n",
672
+ " output_dir=\"./grpo_fusion_output\",\n",
673
+ " learning_rate=5e-5,\n",
674
+ " num_generations=8,\n",
675
+ " max_completion_length=MAX_COMPLETION_LENGTH,\n",
676
+ " per_device_train_batch_size=8,\n",
677
+ " gradient_accumulation_steps=1,\n",
678
+ " max_steps=60,\n",
679
+ " temperature=1.0,\n",
680
+ " logging_steps=1,\n",
681
+ " save_steps=20,\n",
682
+ " bf16=use_bf16,\n",
683
+ " fp16=not use_bf16,\n",
684
+ " report_to=\"none\",\n",
685
+ " seed=42,\n",
686
+ ")\n",
687
+ "\n",
688
+ "live_training_callback = LiveTrainingMonitorCallback(max_steps=training_args.max_steps)\n",
689
+ "\n",
690
+ "trainer = GRPOTrainer(\n",
691
+ " model=model,\n",
692
+ " processing_class=tokenizer,\n",
693
+ " reward_funcs=[environment_reward_fn],\n",
694
+ " args=training_args,\n",
695
+ " train_dataset=dataset,\n",
696
+ " callbacks=[live_training_callback],\n",
697
+ ")\n",
698
+ "\n",
699
+ "print(\"Starting GRPO training...\")\n",
700
+ "train_result = trainer.train()\n",
701
+ "print(f\"Training complete. Total steps: {train_result.global_step}\")"
702
+ ]
703
  },
704
  {
705
  "cell_type": "markdown",
 
708
  "source": [
709
  "## 8. Training Results\n",
710
  "\n",
711
+ "The training cell above renders a live dashboard while GRPO runs. This section saves a clean post-training summary figure."
712
  ]
713
  },
714
  {
 
718
  "metadata": {},
719
  "outputs": [],
720
  "source": [
 
 
721
  "log_history = trainer.state.log_history\n",
722
  "steps = [entry[\"step\"] for entry in log_history if \"loss\" in entry]\n",
723
  "losses = [entry[\"loss\"] for entry in log_history if \"loss\" in entry]\n",
 
762
  "cell_type": "markdown",
763
  "id": "8309879909854d7188b41380fd92a7c3",
764
  "metadata": {},
765
+ "source": "## 9. Evaluate Trained Policy\n\nCompare the GRPO-trained model against the **untrained baseline** (captured in Section 6b) and random action selection on the same **live low-fidelity environment contract** used during GRPO, including explicit terminal `submit`.\n\n- **Model evaluations** use deterministic greedy decoding (`do_sample=False`), so results are fully reproducible across reruns. One rollout per seed suffices.\n- **Random baseline** remains stochastic, so it averages 10 rollouts per seed for a stable estimate, then explicitly submits its final candidate to stay on the same terminal contract."
 
 
 
 
766
  },
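The random baseline described here builds `BUDGET - 1` random run actions and appends a terminal submit so it stays on the same contract as the model rollouts. A sketch of just the plan-shape logic, with plain dicts standing in for `StellaratorAction` and placeholder values for `BUDGET` and `RUN_ACTION_SPECS`:

```python
import random

BUDGET = 10  # placeholder; the notebook uses the environment's real budget
RUN_ACTION_SPECS = [  # illustrative subset of the run-action combinations
    {"intent": "run", "parameter": "triangularity_scale",
     "direction": "increase", "magnitude": "small"},
    {"intent": "run", "parameter": "triangularity_scale",
     "direction": "decrease", "magnitude": "medium"},
]

def build_random_plan(budget: int) -> list[dict]:
    """Random run actions for all but the last step, then a terminal submit."""
    plan = [dict(random.choice(RUN_ACTION_SPECS)) for _ in range(max(budget - 1, 0))]
    plan.append({"intent": "submit"})
    return plan

plan = build_random_plan(BUDGET)
print(len(plan), plan[-1]["intent"])  # 10 submit
```

The `max(budget - 1, 0)` guard mirrors the notebook's expression: even with a degenerate budget of 0 or 1, the plan still ends on a single terminal submit.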
767
  {
768
  "cell_type": "code",
 
770
  "id": "3ed186c9a28b402fb0bc4494df01f08d",
771
  "metadata": {},
772
  "outputs": [],
773
+ "source": "import random\n\nmodel.eval()\n\n# --- Trained model (greedy = deterministic, 1 rollout per seed) ---\nprint(\"=\" * 60)\nprint(\"TRAINED MODEL (after GRPO) β€” greedy, deterministic\")\nprint(\"=\" * 60)\ntrained_rewards = []\nfor seed in range(len(RESET_SEEDS)):\n reward, trace, feasible = run_episode_with_model(seed)\n trained_rewards.append(reward)\n print(f\"\\nSeed {seed} β€” Total reward: {reward:.3f}, Feasible: {feasible}\")\n for line in trace:\n print(f\" {line}\")\n\ntrained_mean = sum(trained_rewards) / len(trained_rewards)\n\n# --- Random baseline (stochastic, averaged over N_RANDOM_ROLLOUTS per seed) ---\nprint(\"\\n\" + \"=\" * 60)\nprint(f\"RANDOM BASELINE ({N_RANDOM_ROLLOUTS} episodes per seed)\")\nprint(\"=\" * 60)\nrandom_rewards = []\nfor seed in range(len(RESET_SEEDS)):\n seed_rewards = []\n for _ in range(N_RANDOM_ROLLOUTS):\n random_plan = [\n StellaratorAction(**random.choice(RUN_ACTION_SPECS)) for _ in range(max(BUDGET - 1, 0))\n ]\n random_plan.append(StellaratorAction(intent=\"submit\"))\n seed_rewards.append(\n run_episode_with_actions(\n random_plan,\n seed_idx=seed,\n ).total_reward\n )\n random_rewards.extend(seed_rewards)\n print(\n f\"Seed {seed} β€” Mean: {sum(seed_rewards) / len(seed_rewards):.3f}, \"\n f\"Best: {max(seed_rewards):.3f}\"\n )\n\nrandom_mean = sum(random_rewards) / len(random_rewards)\n\n# --- Before/After comparison ---\nprint(\"\\n\" + \"=\" * 60)\nprint(\"BEFORE / AFTER COMPARISON\")\nprint(\"=\" * 60)\nprint(f\" Model evals: greedy (deterministic), 1 rollout Γ— {len(RESET_SEEDS)} seeds\")\nprint(f\" Random baseline: {N_RANDOM_ROLLOUTS} rollouts Γ— {len(RESET_SEEDS)} seeds (averaged)\")\nprint()\nprint(f\"{'Agent':<25} {'Mean Reward':>12}\")\nprint(\"-\" * 39)\nprint(f\"{'Random':<25} {random_mean:>+12.3f}\")\nprint(f\"{'Untrained Qwen 3-4B':<25} {untrained_mean:>+12.3f}\")\nprint(f\"{'GRPO-trained (60 steps)':<25} {trained_mean:>+12.3f}\")\nprint()\nimprovement = trained_mean - 
untrained_mean\nprint(f\"GRPO improvement over untrained: {improvement:+.3f}\")\nprint(f\"GRPO improvement over random: {trained_mean - random_mean:+.3f}\")"
774
  },
775
  {
776
  "cell_type": "markdown",
777
  "id": "cb1e1581032b452c9409d6c6813c49d1",
778
  "metadata": {},
779
+ "source": "## 10. Connect to Deployed HF Space (Optional)\n\nDemonstrate connecting to the live environment on Hugging Face Spaces through the typed OpenEnv client and running the trained model against it. This section is optional and will skip cleanly if the deployment is unavailable.\n\n**Contract compatibility check:** Before running any episodes, the cell verifies that the remote `/task` endpoint returns the same budget, constraints, parameters, directions, and magnitudes as the local source code. If any field mismatches, the demo is skipped with a diagnostic message β€” this prevents silent reward or behavior divergence between the notebook and a stale deployment."
780
  },
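The contract check described here compares field by field and collects every mismatch before deciding whether to skip, rather than bailing on the first difference. The comparison pattern on its own, with a simplified stand-in for the real `/task` payload:

```python
def find_contract_mismatches(local: dict, remote: dict) -> list[str]:
    """Compare each locally-expected field against the remote payload."""
    mismatches = []
    for key, local_val in local.items():
        remote_val = remote.get(key)  # a missing remote key shows up as None
        if remote_val != local_val:
            mismatches.append(f"{key}: local={local_val!r} remote={remote_val!r}")
    return mismatches

local = {"budget": 10, "directions": ["increase", "decrease"]}
stale = {"budget": 8, "directions": ["increase", "decrease"]}
print(find_contract_mismatches(local, dict(local)))  # []
print(find_contract_mismatches(local, stale))        # ['budget: local=10 remote=8']
```

Collecting all mismatches at once gives a complete diagnostic in one pass, which is more useful for "redeploy the Space" guidance than failing on the first field.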
781
  {
782
  "cell_type": "code",
 
784
  "id": "379cbbc1e968416e875cc15c1202d7eb",
785
  "metadata": {},
786
  "outputs": [],
787
+ "source": [
788
+ "import requests\n",
789
+ "\n",
790
+ "from fusion_lab.client import FusionLabClient\n",
791
+ "from server.physics import (\n",
792
+ " ASPECT_RATIO_MAX,\n",
793
+ " AVERAGE_TRIANGULARITY_MAX,\n",
794
+ " EDGE_IOTA_OVER_NFP_MIN,\n",
795
+ ")\n",
796
+ "\n",
797
+ "HF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n",
798
+ "REQUEST_TIMEOUT_SECONDS = 10\n",
799
+ "\n",
800
+ "try:\n",
801
+ " health_response = requests.get(f\"{HF_SPACE_URL}/health\", timeout=REQUEST_TIMEOUT_SECONDS)\n",
802
+ "except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as exc:\n",
803
+ " health_response = None\n",
804
+ " print(f\"Skipping remote demo β€” network error reaching HF Space: {exc}\")\n",
805
+ "\n",
806
+ "if health_response is not None and health_response.status_code != 200:\n",
807
+ " print(\n",
808
+ " \"Skipping remote demo because the HF Space is unavailable: \"\n",
809
+ " f\"/health returned {health_response.status_code}.\"\n",
810
+ " )\n",
811
+ " health_response = None\n",
812
+ "\n",
813
+ "if health_response is not None:\n",
814
+ " health = health_response.json()\n",
815
+ " print(f\"HF Space status: {health['status']}\")\n",
816
+ "\n",
817
+ " try:\n",
818
+ " task_response = requests.get(f\"{HF_SPACE_URL}/task\", timeout=REQUEST_TIMEOUT_SECONDS)\n",
819
+ " except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as exc:\n",
820
+ " task_response = None\n",
821
+ " print(f\"Skipping remote demo β€” network error reaching /task: {exc}\")\n",
822
+ "\n",
823
+ " if task_response is not None and task_response.status_code != 200:\n",
824
+ " print(\n",
825
+ " \"Skipping remote demo because the HF Space task endpoint is unavailable: \"\n",
826
+ " f\"/task returned {task_response.status_code}.\"\n",
827
+ " )\n",
828
+ " task_response = None\n",
829
+ "\n",
830
+ " # ── Contract compatibility check ──────────────────────────────────────\n",
831
+ " if task_response is not None:\n",
832
+ " task = task_response.json()\n",
833
+ " expected_contract = {\n",
834
+ " \"budget\": BUDGET,\n",
835
+ " \"constraints\": {\n",
836
+ " \"aspect_ratio_max\": ASPECT_RATIO_MAX,\n",
837
+ " \"average_triangularity_max\": AVERAGE_TRIANGULARITY_MAX,\n",
838
+ " \"abs_edge_iota_over_nfp_min\": EDGE_IOTA_OVER_NFP_MIN,\n",
839
+ " },\n",
840
+ " \"parameters\": list(RUN_PARAMETERS),\n",
841
+ " \"directions\": list(RUN_DIRECTIONS),\n",
842
+ " \"magnitudes\": list(RUN_MAGNITUDES),\n",
843
+ " }\n",
844
+ " mismatches: list[str] = []\n",
845
+ " for key in expected_contract:\n",
846
+ " remote_val = task.get(key)\n",
847
+ " local_val = expected_contract[key]\n",
848
+ " if remote_val != local_val:\n",
849
+ " mismatches.append(f\" {key}: local={local_val!r} remote={remote_val!r}\")\n",
850
+ "\n",
851
+ " if mismatches:\n",
852
+ " print(\"Skipping remote demo β€” contract mismatch between local code and HF Space:\")\n",
853
+ " for m in mismatches:\n",
854
+ " print(m)\n",
855
+ " print(\"Redeploy the Space with the current code to fix this.\")\n",
856
+ " task_response = None\n",
857
+ " else:\n",
858
+ " print(\"Contract check passed β€” remote matches local code.\")\n",
859
+ " print(f\"\\nTask: {task['description']}\")\n",
860
+ " print(f\"Constraints: {task['constraints']}\")\n",
861
+ " print(f\"Budget: {task['budget']}\")\n",
862
+ "\n",
863
+ " # ── Run trained model against remote environment ──────────────────────\n",
864
+ " if task_response is not None:\n",
865
+ " with FusionLabClient(base_url=HF_SPACE_URL) as env:\n",
866
+ " reset_result = env.reset(seed=42)\n",
867
+ " remote_obs = reset_result.observation\n",
868
+ " print(f\"\\nRemote reset β€” max_elongation: {remote_obs.max_elongation:.4f}\")\n",
869
+ " print(f\" aspect_ratio: {remote_obs.aspect_ratio:.4f}\")\n",
870
+ " print(f\" constraints_satisfied: {remote_obs.constraints_satisfied}\")\n",
871
+ " print(f\" budget_remaining: {remote_obs.budget_remaining}\")\n",
872
+ "\n",
873
+ " prompt = render_generation_prompt(remote_obs)\n",
874
+ " inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
875
+ " with torch.no_grad():\n",
876
+ " outputs = model.generate(\n",
877
+ " **inputs, max_new_tokens=MAX_COMPLETION_LENGTH, do_sample=False\n",
878
+ " )\n",
879
+ " completion = tokenizer.decode(\n",
880
+ " outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n",
881
+ " )\n",
882
+ " actions = parse_action_plan(completion)\n",
883
+ "\n",
884
+ " print(f\"\\nTrained model generated {len(actions)} actions for remote env:\")\n",
885
+ " for i, action in enumerate(actions[:BUDGET], start=1):\n",
886
+ " result = env.step(action)\n",
887
+ " step_obs = result.observation\n",
888
+ " reward = float(result.reward) if result.reward is not None else 0.0\n",
889
+ " print(\n",
890
+ " f\" Step {i}: {action.intent} {action.parameter or ''} \"\n",
891
+ " f\"{action.direction or ''} {action.magnitude or ''} \"\n",
892
+ " f\"β†’ reward={reward:.3f}, score={step_obs.p1_score:.4f}, \"\n",
893
+ " f\"terms={reward_term_summary(step_obs)}\"\n",
894
+ " )\n",
895
+ " if result.done:\n",
896
+ " print(f\" Episode done. Final score: {step_obs.p1_score:.4f}\")\n",
897
+ " break\n",
898
+ " print(\"\\nEnvironment is live and accessible for training and evaluation.\")"
899
+ ]
900
  }
901
  ],
902
  "metadata": {