CreativeEngineer committed on
Commit
acb992c
·
1 Parent(s): 3f7be89

docs: fix p1 parameterization blocker fallout

README.md CHANGED
@@ -22,7 +22,8 @@ Implementation status:
22
  - docs are aligned to fresh `P1` wiring in this repo
23
  - shared models, baselines, and server/client entry points now reflect the locked `P1` contract
24
  - the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
25
- - the remaining runtime work is fixture coverage, manual playtesting, heuristic refresh, and deployment evidence
 
26
 
27
  ## Execution Status
28
 
@@ -36,6 +37,10 @@ Implementation status:
36
  - [x] Replace the synthetic evaluator with `constellaration`
37
  - [x] Add a runnable Northflank smoke workflow and note
38
  - [x] Pass the Northflank smoke test on the H100 workspace
 
 
 
 
39
  - [ ] Add tracked `P1` fixtures under `server/data/p1/`
40
  - [ ] Run manual playtesting and record the first reward pathology
41
  - [ ] Refresh the heuristic baseline for the real verifier path
@@ -43,15 +48,16 @@ Implementation status:
43
 
44
  ## Known Gaps
45
 
46
- - `BASELINE_PARAMS` is not a near-feasible anchor on the real verifier path. The current low-fidelity measurement is roughly `p1_feasibility=1.01`, `average_triangularity=+0.005`, and `edge_iota_over_nfp=0.059`, so fixture discovery has to happen before meaningful manual playtesting.
 
47
  - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
48
  - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
49
- - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after manual playtesting.
50
 
51
  Current mode:
52
 
53
  - strategic task choice is already locked
54
- - the next work is fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
55
  - new planning text should only appear when a real blocker forces a decision change
56
 
57
  ## Planned Repository Layout
@@ -104,12 +110,15 @@ uv sync --extra notebooks
104
 
105
  ## Immediate Next Steps
106
 
107
- 1. Add tracked `P1` fixtures under `server/data/p1`.
108
- 2. Run manual playtest episodes and record the first real reward pathology, if any.
109
- 3. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
110
- 4. Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
111
- 5. Deploy the environment to HF Space.
112
- 6. Add the Colab notebook under `training/notebooks`.
 
 
 
113
 
114
  These are implementation steps, not another planning phase.
115
 
@@ -127,6 +136,10 @@ Disallowed:
127
 
128
  - porting the old planner, governor, or experiment harness into this repo
129
 
 
 
 
 
130
  ## Hackathon Working Note
131
 
132
  This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.
 
22
  - docs are aligned to fresh `P1` wiring in this repo
23
  - shared models, baselines, and server/client entry points now reflect the locked `P1` contract
24
  - the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
25
+ - the current 3-knob parameterization has been verified as blocked on P1 triangularity under the real verifier path
26
+ - the next runtime work is parameterization repair, then fixtures, manual playtesting, heuristic refresh, and deployment evidence
27
 
28
  ## Execution Status
29
 
 
37
  - [x] Replace the synthetic evaluator with `constellaration`
38
  - [x] Add a runnable Northflank smoke workflow and note
39
  - [x] Pass the Northflank smoke test on the H100 workspace
40
+ - [x] Verify the current 3-knob family against the real low-fidelity verifier
41
+ - [ ] Add a custom low-dimensional boundary builder with an explicit triangularity control knob
42
+ - [ ] Split boundary construction from boundary evaluation in `server/physics.py`
43
+ - [ ] Update the action contract from 3 knobs to the repaired low-dimensional family
44
  - [ ] Add tracked `P1` fixtures under `server/data/p1/`
45
  - [ ] Run manual playtesting and record the first reward pathology
46
  - [ ] Refresh the heuristic baseline for the real verifier path
 
48
 
49
  ## Known Gaps
50
 
51
+ - The current 3-knob family is structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That means reward tuning is secondary until the parameterization is repaired.
52
+ - `BASELINE_PARAMS` is not a near-feasible anchor on the real verifier path. The current low-fidelity measurement is roughly `p1_feasibility=1.01`, `average_triangularity=+0.005`, and `edge_iota_over_nfp=0.059`, so fixture discovery has to happen after parameterization repair, not before.
53
  - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
54
  - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
55
+ - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
56
 
57
  Current mode:
58
 
59
  - strategic task choice is already locked
60
+ - the next work is parameterization repair, then fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
61
  - new planning text should only appear when a real blocker forces a decision change
62
 
63
  ## Planned Repository Layout
 
110
 
111
  ## Immediate Next Steps
112
 
113
+ 1. Repair the low-dimensional boundary parameterization so it can actually move P1 triangularity.
114
+ 2. Split boundary construction from boundary evaluation in `server/physics.py`.
115
+ 3. Update the environment contract to the repaired low-dimensional family and label low-fi vs high-fi truth clearly in observations.
116
+ 4. Add tracked `P1` fixtures under `server/data/p1`.
117
+ 5. Run manual playtest episodes and record the first real reward pathology, if any.
118
+ 6. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
119
+ 7. Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
120
+ 8. Deploy the environment to HF Space.
121
+ 9. Add the Colab notebook under `training/notebooks`.
122
 
123
  These are implementation steps, not another planning phase.
124
 
 
136
 
137
  - porting the old planner, governor, or experiment harness into this repo
138
 
139
+ ## Technical Spec
140
+
141
+ The focused technical plan for the repaired `P1` environment lives in [docs/P1_ENV_CONTRACT_V1.md](docs/P1_ENV_CONTRACT_V1.md).
142
+
143
  ## Hackathon Working Note
144
 
145
  This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -7,13 +7,15 @@
7
  ## 0. Current Branch Status
8
 
9
  - [x] `P1` task family is locked
10
- - [x] rotating-ellipse `P1` contract is implemented in code
11
  - [x] real `constellaration` verifier wiring is in place
12
  - [x] low-fidelity `run` plus high-fidelity `submit` split is documented
13
  - [x] post-terminal `step()` guard is in place
14
  - [x] baseline comparison has been rerun on the real verifier path
15
  - [x] Northflank smoke workflow and note are committed
16
  - [x] Northflank smoke test has passed on the team H100
 
 
17
  - [ ] tracked `P1` fixtures are added
18
  - [ ] manual playtest evidence is recorded
19
  - [ ] heuristic baseline is refreshed for the real verifier path
@@ -21,7 +23,7 @@
21
 
22
  Current caution:
23
 
24
- - the default baseline params are not currently a near-feasible playtest anchor on the real verifier path, so fixture discovery is a real prerequisite for meaningful manual playtesting
25
 
26
  ## 1. Submission Thesis
27
 
@@ -117,7 +119,7 @@ But the evidence order is:
117
  We intentionally narrow the scope to one environment family:
118
 
119
  - `P1` geometrical benchmark
120
- - rotating-ellipse, low-dimensional design space
121
  - official `constellaration` verifier
122
  - low-fidelity evaluation for ordinary interaction
123
  - optional high-fidelity verification for final checks or `submit`
@@ -173,7 +175,7 @@ Allowed reuse:
173
 
174
  Implementation handoff:
175
 
176
- - the remaining work is now fixture coverage, manual playtesting, heuristic refresh, smoke validation, and deployment
177
  - do not treat supporting decision notes as a new planning backlog
178
 
179
  ## 8.1 Compute Surfaces
@@ -212,6 +214,12 @@ Auth stance:
212
 
213
  The environment contract must be frozen before meaningful evaluation.
214
 
215
  ### Observation
216
 
217
  The observation should expose:
@@ -231,7 +239,9 @@ The observation must be interpretable by a human without additional hidden state
231
 
232
  ### Action Space
233
 
234
- The action space stays intentionally small and discrete:
 
 
235
 
236
  - `run`
237
  - `submit`
@@ -243,10 +253,11 @@ For `run`, the controllable fields are:
243
  - `aspect_ratio`
244
  - `elongation`
245
  - `rotational_transform`
 
246
  - direction: increase or decrease
247
  - magnitude: small, medium, large
248
 
249
- This is not trying to expose the full Fourier-boundary space. The goal is a legible environment, not maximal realism.
250
 
251
  ### Episode Flow
252
 
@@ -282,6 +293,18 @@ The environment must preserve:
282
 
283
  The environment may add reward shaping, but it must not redefine what `P1` means.
284
 
285
  ## 11. Reward V0
286
 
287
  The reward in this document is not the final reward. It is `Reward V0`.
@@ -302,6 +325,12 @@ The initial scoring idea should be feasibility-first:
302
  - simple enough to debug from trajectories
303
  - aligned with official `P1` semantics
304
 
305
  ### Reward V0 Failure Modes To Test
306
 
307
  We should expect at least some of these:
@@ -344,7 +373,7 @@ This is calibration, not training.
344
  These are still hypotheses until manually or empirically checked:
345
 
346
  - six steps are enough to create non-trivial decision pressure
347
- - the rotating-ellipse action space is expressive enough for a meaningful `P1` task
348
  - `restore_best` is useful without becoming an exploit
349
  - heuristic should beat random on mean episode reward
350
  - low-fidelity interaction is predictive enough for useful policy learning
 
7
  ## 0. Current Branch Status
8
 
9
  - [x] `P1` task family is locked
10
+ - [x] 3-knob rotating-ellipse `P1` contract is implemented in code
11
  - [x] real `constellaration` verifier wiring is in place
12
  - [x] low-fidelity `run` plus high-fidelity `submit` split is documented
13
  - [x] post-terminal `step()` guard is in place
14
  - [x] baseline comparison has been rerun on the real verifier path
15
  - [x] Northflank smoke workflow and note are committed
16
  - [x] Northflank smoke test has passed on the team H100
17
+ - [x] current 3-knob family has been checked against the real low-fidelity verifier
18
+ - [ ] parameterization repair is implemented so triangularity is controllable
19
  - [ ] tracked `P1` fixtures are added
20
  - [ ] manual playtest evidence is recorded
21
  - [ ] heuristic baseline is refreshed for the real verifier path
 
23
 
24
  Current caution:
25
 
26
+ - the current 3-knob family is structurally blocked on the official triangularity constraint under the real verifier path, so parameterization repair is now the first blocker before fixture discovery or manual playtesting
27
 
28
  ## 1. Submission Thesis
29
 
 
119
  We intentionally narrow the scope to one environment family:
120
 
121
  - `P1` geometrical benchmark
122
+ - repaired low-dimensional boundary family derived from rotating-ellipse seeds
123
  - official `constellaration` verifier
124
  - low-fidelity evaluation for ordinary interaction
125
  - optional high-fidelity verification for final checks or `submit`
 
175
 
176
  Implementation handoff:
177
 
178
+ - the remaining work is now parameterization repair, then fixture coverage, manual playtesting, heuristic refresh, smoke validation, and deployment
179
  - do not treat supporting decision notes as a new planning backlog
180
 
181
  ## 8.1 Compute Surfaces
 
214
 
215
  The environment contract must be frozen before meaningful evaluation.
216
 
217
+ Current verified blocker:
218
+
219
+ - the current upstream 3-knob `generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` family does not expose triangularity control
220
+ - on the real low-fidelity verifier path, sampled points stayed at roughly `average_triangularity=+0.004975` and `p1_feasibility=1.00995`
221
+ - so the next contract revision must repair parameterization before reward iteration becomes meaningful
222
+
223
  ### Observation
224
 
225
  The observation should expose:
 
239
 
240
  ### Action Space
241
 
242
+ The action space stays intentionally small and discrete, but the current 3-knob version is no longer enough. The next contract revision should keep low-dimensional actions while adding an explicit control that can move triangularity.
243
+
244
+ Near-term target:
245
 
246
  - `run`
247
  - `submit`
 
253
  - `aspect_ratio`
254
  - `elongation`
255
  - `rotational_transform`
256
+ - `triangularity_scale` or equivalent low-dimensional triangularity control
257
  - direction: increase or decrease
258
  - magnitude: small, medium, large
259
 
260
+ This is not trying to expose the full Fourier-boundary space. The goal is a legible environment, not maximal realism. The verifier should stay official; the custom logic belongs in the low-dimensional boundary builder, not in reward semantics.
261
 
262
  ### Episode Flow
263
 
 
293
 
294
  The environment may add reward shaping, but it must not redefine what `P1` means.
295
 
296
+ Implementation split:
297
+
298
+ - boundary builder or parameterization adapter:
299
+ - custom low-dimensional family construction
300
+ - rotating-ellipse seed creation
301
+ - triangularity control injection, if used
302
+ - official verifier:
303
+ - boundary in
304
+ - `GeometricalProblem` semantics out
305
+
306
+ The verifier should be boundary-based. Parameterization-specific logic should not be treated as verifier truth.
307
+
308
  ## 11. Reward V0
309
 
310
  The reward in this document is not the final reward. It is `Reward V0`.
 
325
  - simple enough to debug from trajectories
326
  - aligned with official `P1` semantics
327
 
328
+ Current execution note:
329
+
330
+ - do not tune reward further until the repaired low-dimensional family can actually approach P1 feasibility
331
+ - once parameterization is repaired, keep `Reward V0` scalar and feasibility-first
332
+ - clearly distinguish low-fidelity step-time metrics from high-fidelity submit-time truth in the observation contract and docs
333
+
334
  ### Reward V0 Failure Modes To Test
335
 
336
  We should expect at least some of these:
 
373
  These are still hypotheses until manually or empirically checked:
374
 
375
  - six steps are enough to create non-trivial decision pressure
376
+ - the repaired low-dimensional action family is expressive enough for a meaningful `P1` task
377
  - `restore_best` is useful without becoming an exploit
378
  - heuristic should beat random on mean episode reward
379
  - low-fidelity interaction is predictive enough for useful policy learning
docs/P1_ENV_CONTRACT_V1.md ADDED
@@ -0,0 +1,221 @@
1
+ # P1 Environment Contract V1
2
+
3
+ **Status:** Technical implementation plan
4
+ **Role:** Supporting spec for the `P1` environment contract
5
+ **SSOT relationship:** This file refines [FUSION_DESIGN_LAB_PLAN_V2.md](FUSION_DESIGN_LAB_PLAN_V2.md). If this file conflicts with the planning SSOT, update both in the same task.
6
+
7
+ ## Purpose
8
+
9
+ This file captures the technical contract that should drive the next code changes in:
10
+
11
+ - [server/physics.py](../server/physics.py)
12
+ - [fusion_lab/models.py](../fusion_lab/models.py)
13
+ - [server/environment.py](../server/environment.py)
14
+ - [server/app.py](../server/app.py)
15
+
16
+ The central change is now explicit:
17
+
18
+ - the current upstream 3-knob rotating-ellipse family is blocked on P1 triangularity under the real verifier path
19
+ - the next environment contract must repair parameterization before more reward iteration or heuristic work
20
+
21
+ ## Verified Blocker
22
+
23
+ Current verified facts:
24
+
25
+ - upstream `generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` has no triangularity control
26
+ - the current 3-knob environment directly exposes only:
27
+ - `aspect_ratio`
28
+ - `elongation`
29
+ - `rotational_transform`
30
+ - real low-fidelity samples on the current verifier path kept:
31
+ - `average_triangularity` at roughly `+0.004975`
32
+ - `p1_feasibility` at roughly `1.00995`
33
+ - feasible count at `0`
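
The two sampled numbers are consistent with each other. A minimal sketch, assuming `p1_feasibility` is the largest constraint violation normalized by the magnitude of its bound: the helper `normalized_violation` and this normalization are assumptions for illustration, not the `constellaration` API. The bounds come from the observation schema in `docs/PIVOT_P1_ROTATING_ELLIPSE.md`, and the `edge_iota_over_nfp` value is the baseline measurement quoted in the README. Aspect ratio is left out because no measured value is reported.

```python
def normalized_violation(value: float, bound: float, upper: bool) -> float:
    """Violation of `value <= bound` (upper=True) or `value >= bound` (upper=False),
    scaled by |bound|. Assumed normalization, not the official implementation."""
    excess = value - bound if upper else bound - value
    return max(0.0, excess) / abs(bound)


# Reported low-fidelity measurements checked against the documented P1 bounds.
violations = {
    # constraint: average_triangularity <= -0.5
    "average_triangularity": normalized_violation(0.004975, -0.5, upper=True),
    # constraint: edge_iota_over_nfp >= 0.3
    "edge_iota_over_nfp": normalized_violation(0.059, 0.3, upper=False),
}

p1_feasibility = max(violations.values())
blocking_constraint = max(violations, key=violations.get)
```

Under this assumption the triangularity term alone would account for the reported `1.00995`, which is consistent with the conclusion that moving the other knobs cannot reach feasibility.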
34
+
35
+ Conclusion:
36
+
37
+ - the current 3-knob family is not a meaningful playtest or baseline environment for `P1`
38
+ - reward work is secondary until the boundary family can actually approach the official triangularity constraint
39
+
40
+ ## Design Split
41
+
42
+ Keep three layers separate:
43
+
44
+ 1. **Boundary builder**
45
+ - low-dimensional parameterization
46
+ - rotating-ellipse seed generation
47
+ - optional triangularity control injection
48
+ 2. **Official verifier**
49
+ - boundary in
50
+ - metrics out
51
+ - feasibility, objective, and score semantics from `GeometricalProblem`
52
+ 3. **Environment**
53
+ - reset pool
54
+ - discrete actions
55
+ - episode flow
56
+ - reward shaping
57
+
58
+ ## Verifier Plan
59
+
60
+ `server/physics.py` should expose a boundary-based verifier surface.
61
+
62
+ Target functions:
63
+
64
+ - `build_initial_boundary(...) -> SurfaceRZFourier`
65
+ - `apply_low_dim_perturbation(...) -> SurfaceRZFourier`
66
+ - `evaluate_boundary(boundary, fidelity) -> EvaluationMetrics`
67
+
68
+ The verifier layer should own:
69
+
70
+ - low-fidelity step-time evaluation
71
+ - high-fidelity submit-time evaluation
72
+ - official `P1` feasibility semantics
73
+ - official `P1` objective direction
74
+ - score ordering
75
+
76
+ The verifier layer should not own:
77
+
78
+ - episode budget
79
+ - action semantics
80
+ - reward shaping
81
+ - “best so far” state
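
The ownership split above can be sketched as follows. This is an illustration under stated assumptions: `Boundary` is a stand-in for `SurfaceRZFourier`, the `backend` callable stands in for the real `constellaration` evaluation (whose API is not reproduced here), and the extra `backend` argument is added only so the sketch runs without the physics stack. Requires Python 3.9+.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Fidelity = Literal["low", "high"]
FEASIBILITY_TOLERANCE = 0.01  # tolerance quoted in the pivot note


@dataclass(frozen=True)
class Boundary:
    """Stand-in for SurfaceRZFourier; the real type comes from constellaration."""
    coefficients: tuple[float, ...]


@dataclass(frozen=True)
class EvaluationMetrics:
    max_elongation: float
    p1_feasibility: float
    fidelity: Fidelity  # records which truth surface produced the numbers

    @property
    def constraints_satisfied(self) -> bool:
        return self.p1_feasibility <= FEASIBILITY_TOLERANCE


# Hypothetical backend signature: boundary and fidelity in, raw metric dict out.
Backend = Callable[[Boundary, Fidelity], dict[str, float]]


def evaluate_boundary(
    boundary: Boundary, fidelity: Fidelity, backend: Backend
) -> EvaluationMetrics:
    """Boundary in, metrics out. No budget, action, or reward logic lives here."""
    raw = backend(boundary, fidelity)
    return EvaluationMetrics(
        max_elongation=raw["max_elongation"],
        p1_feasibility=raw["p1_feasibility"],
        fidelity=fidelity,
    )
```

Because the function only sees a boundary, the same call would serve both the current 3-knob family and any repaired family without changes to verifier code.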
82
+
83
+ ## Low-Dimensional Boundary Plan
84
+
85
+ Stay low-dimensional, not Fourier-first.
86
+
87
+ Target controllable knobs:
88
+
89
+ - `aspect_ratio`
90
+ - `elongation`
91
+ - `rotational_transform`
92
+ - `triangularity_scale`
93
+
94
+ Important naming rule:
95
+
96
+ - once triangularity is injected explicitly, stop describing the family as plain upstream “rotating ellipse”
97
+ - it becomes a custom low-dimensional boundary family derived from a rotating-ellipse seed
98
+
99
+ ## Action Contract
100
+
101
+ Keep the discrete interaction style:
102
+
103
+ - `intent`: `run | submit | restore_best`
104
+ - `direction`: `increase | decrease`
105
+ - `magnitude`: `small | medium | large`
106
+
107
+ For `run`, the controllable parameter should be one of:
108
+
109
+ - `aspect_ratio`
110
+ - `elongation`
111
+ - `rotational_transform`
112
+ - `triangularity_scale`
113
+
114
+ This keeps the environment human-playable and aligned with the historical low-dimensional P1 path.
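
A minimal sketch of this contract as a validated value type. The class name `Action` and the use of a plain dataclass are assumptions for illustration; the actual schema in `fusion_lab/models.py` may be structured differently.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

Intent = Literal["run", "submit", "restore_best"]
Parameter = Literal[
    "aspect_ratio", "elongation", "rotational_transform", "triangularity_scale"
]
Direction = Literal["increase", "decrease"]
Magnitude = Literal["small", "medium", "large"]


@dataclass(frozen=True)
class Action:
    intent: Intent
    parameter: Optional[Parameter] = None
    direction: Optional[Direction] = None
    magnitude: Optional[Magnitude] = None

    def __post_init__(self) -> None:
        if self.intent not in get_args(Intent):
            raise ValueError(f"unknown intent: {self.intent!r}")
        if self.intent != "run":
            return  # submit and restore_best carry no parameter fields
        # A run action must name a knob, a direction, and a magnitude.
        for value, allowed in (
            (self.parameter, Parameter),
            (self.direction, Direction),
            (self.magnitude, Magnitude),
        ):
            if value not in get_args(allowed):
                raise ValueError(f"invalid run field: {value!r}")
```

For example, `Action("run", "triangularity_scale", "decrease", "small")` is accepted, while a `run` action with a missing or misspelled parameter raises `ValueError`.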
115
+
116
+ ## Observation Contract
117
+
118
+ The observation should stay metric-centered and human-readable.
119
+
120
+ Keep:
121
+
122
+ - `max_elongation`
123
+ - `aspect_ratio`
124
+ - `average_triangularity`
125
+ - `edge_iota_over_nfp`
126
+ - `p1_feasibility`
127
+ - `p1_score`
128
+ - `budget_remaining`
129
+ - `best_score`
130
+ - `best_feasibility`
131
+ - `diagnostics_text`
132
+
133
+ Add clarity about fidelity:
134
+
135
+ - low-fidelity step-time metrics should be labeled as such
136
+ - high-fidelity submit-time metrics should be labeled as such
137
+ - do not expose them as if they are the same truth surface
138
+
139
+ This can be done either by:
140
+
141
+ - separate observation fields, or
142
+ - explicit fidelity labels in `diagnostics_text`
143
+
144
+ The minimum requirement is that a reader can tell whether a metric came from low-fi `run` or high-fi `submit`.
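
A minimal sketch of the second option, labeling fidelity inside `diagnostics_text`. The helper name `format_diagnostics`, the label strings, and the plain `dict` input are hypothetical and not taken from the repo. Requires Python 3.9+.

```python
FIDELITY_LABELS = {
    "low": "low-fidelity (run)",
    "high": "high-fidelity (submit)",
}


def format_diagnostics(metrics: dict[str, float], fidelity: str) -> str:
    """Prefix the metric summary with the truth surface it came from."""
    label = FIDELITY_LABELS[fidelity]  # raises KeyError for an unknown fidelity
    parts = [f"[{label}]"]
    for name in sorted(metrics):  # stable ordering keeps traces easy to diff
        parts.append(f"{name}={metrics[name]:.4f}")
    return " ".join(parts)
```

Failing loudly on an unknown fidelity is a deliberate choice here: an unlabeled metric is exactly the ambiguity this section is trying to rule out.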
145
+
146
+ ## Reward V0
147
+
148
+ Keep reward mostly scalar and verifier-driven.
149
+
150
+ Target structure:
151
+
152
+ - infeasible to feasible crossing:
153
+ - clear positive bonus
154
+ - feasible to infeasible regression:
155
+ - clear negative penalty
156
+ - both infeasible:
157
+ - reward reduction in official feasibility scalar
158
+ - both feasible:
159
+ - reward lower `max_elongation`
160
+ - non-submit step:
161
+ - small step cost
162
+ - explicit `submit`:
163
+ - better than passive budget exhaustion when the design is improved
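
The target structure can be sketched as below. Every constant is a placeholder assumption to be tuned by playtest, not a value from the repo; only the `0.01` feasibility tolerance comes from the pivot note. The function names are hypothetical.

```python
FEASIBILITY_TOLERANCE = 0.01  # from the pivot note
CROSSING_BONUS = 1.0          # assumption: infeasible -> feasible
REGRESSION_PENALTY = -1.0     # assumption: feasible -> infeasible
STEP_COST = 0.05              # assumption: charged on every non-submit step
SUBMIT_BONUS = 0.5            # assumption: explicit submit of an improved design
EXHAUSTION_BONUS = 0.25       # assumption: smaller than SUBMIT_BONUS by design


def step_reward(
    prev_feasibility: float,
    curr_feasibility: float,
    prev_elongation: float,
    curr_elongation: float,
) -> float:
    """Reward for one non-submit step, driven only by verifier scalars."""
    was_feasible = prev_feasibility <= FEASIBILITY_TOLERANCE
    is_feasible = curr_feasibility <= FEASIBILITY_TOLERANCE
    if not was_feasible and is_feasible:
        shaped = CROSSING_BONUS
    elif was_feasible and not is_feasible:
        shaped = REGRESSION_PENALTY
    elif not is_feasible:
        # Both infeasible: reward any reduction in the feasibility scalar.
        shaped = prev_feasibility - curr_feasibility
    else:
        # Both feasible: reward lower max_elongation.
        shaped = prev_elongation - curr_elongation
    return shaped - STEP_COST


def terminal_reward(improved: bool, submitted: bool) -> float:
    """Explicit submit beats passive budget exhaustion for an improved design."""
    if not improved:
        return 0.0
    return SUBMIT_BONUS if submitted else EXHAUSTION_BONUS
```

Nothing in the sketch refers to a specific constraint or Fourier mode, so it stays within the "Do not add" limits that follow.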
164
+
165
+ Do not add:
166
+
167
+ - reward terms tied to specific Fourier modes
168
+ - bonuses for matching a known winner
169
+ - hand-coded constraint tricks to hide a blocked action family
170
+
171
+ ## Reset Strategy
172
+
173
+ Start with frozen exact seeds, not jitter.
174
+
175
+ Reset pool policy:
176
+
177
+ - `n_field_periods = 3`
178
+ - small frozen seed set
179
+ - each seed must be:
180
+ - reproducible
181
+ - near enough to the feasible boundary that 6 steps is worth testing
182
+ - not already solved
183
+
184
+ Add bounded jitter only if memorization becomes a real problem.
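
A minimal sketch of a frozen reset pool. The names `ResetSeed`, `RESET_POOL`, and `pick_seed` are hypothetical, and the seed values are placeholders only: the first three numbers reuse the default baseline parameters from the pivot note, and the `triangularity_scale` values are invented for illustration. None of these are verified near-feasible fixtures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResetSeed:
    aspect_ratio: float
    elongation: float
    rotational_transform: float
    triangularity_scale: float
    n_field_periods: int = 3  # fixed by the reset pool policy


# Placeholder entries; real seeds must come from verifier probing.
RESET_POOL = (
    ResetSeed(3.5, 1.5, 0.4, 0.0),
    ResetSeed(3.5, 1.5, 0.4, 0.5),
)


def pick_seed(episode_index: int) -> ResetSeed:
    """Deterministic, jitter-free selection so every episode is reproducible."""
    return RESET_POOL[episode_index % len(RESET_POOL)]
```

Cycling by episode index rather than sampling randomly keeps traces comparable across runs; bounded jitter could later be layered on top of `pick_seed` without changing the pool itself.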
185
+
186
+ ## Manual Playtest Gate
187
+
188
+ Do not move to heuristic redesign or reward tuning until this gate is passed.
189
+
190
+ Manual playtest questions:
191
+
192
+ - can a human tell which constraint is currently blocking progress?
193
+ - can a human choose a plausible next action?
194
+ - can a human reach or approach feasibility within the budget?
195
+ - does `submit` feel meaningfully different from passive exhaustion?
196
+
197
+ If the answer is no, fix:
198
+
199
+ - the boundary family
200
+ - the step magnitudes
201
+ - the seed pool
202
+
203
+ before tuning reward further.
204
+
205
+ ## Implementation Order
206
+
207
+ 1. Repair the low-dimensional boundary builder in [server/physics.py](../server/physics.py).
208
+ 2. Split boundary construction from official boundary evaluation in [server/physics.py](../server/physics.py).
209
+ 3. Update the action and state schema in [fusion_lab/models.py](../fusion_lab/models.py).
210
+ 4. Update the episode loop and observation labeling in [server/environment.py](../server/environment.py).
211
+ 5. Update the task summary in [server/app.py](../server/app.py).
212
+ 6. Freeze 1-2 repaired low-dimensional fixtures.
213
+ 7. Run manual playtesting.
214
+ 8. Refresh the heuristic baseline only after that evidence exists.
215
+
216
+ ## Out of Scope
217
+
218
+ - full Fourier-mode action space as the primary environment
219
+ - porting the old `ai-sci-feasible-designs` harness
220
+ - making reward more complex before the repaired low-dimensional family exists
221
+ - building a full benchmark split protocol before the environment is even playable
docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED
@@ -9,15 +9,17 @@ Use this file as rationale for the pivot, not as a fresh planning queue. Once th
9
  ## Current Branch Status
10
 
11
  - [x] pivot accepted
12
- - [x] rotating-ellipse `P1` contract is implemented
13
  - [x] `constellaration` verifier path is wired
 
 
14
  - [ ] tracked fixtures are added
15
  - [ ] manual playtest evidence is recorded
16
  - [ ] heuristic baseline is refreshed for the real verifier path
17
 
18
  Current caution:
19
 
20
- - the default rotating-ellipse baseline params are currently useful as an infeasible reference, not as a near-feasible anchor, so the fixture set still needs a better boundary-region map
21
 
22
  ## Decision
23
 
@@ -66,7 +68,7 @@ Feasibility tolerance: normalized constraint violations <= 1% (0.01).
66
 
67
  ### Parameter Space
68
 
69
- The rotating-ellipse generator takes 3 continuous parameters + 1 discrete:
70
 
71
  | Parameter | Role | Typical range |
72
  |---|---|---|
@@ -77,9 +79,17 @@ The rotating-ellipse generator takes 3 continuous parameters + 1 discrete:
77
 
78
  These map to `constellaration.initial_guess.generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` which returns a `SurfaceRZFourier` boundary in ~4ms.
79
 
80
  ### Action Space
81
 
82
- Discrete perturbations on the 3 rotating-ellipse parameters:
83
 
84
  ```
85
  intent: "run" | "submit" | "restore_best"
@@ -88,6 +98,10 @@ direction: "increase" | "decrease"
88
  magnitude: "small" | "medium" | "large"
89
  ```
90
 
 
 
 
 
91
  Magnitude deltas (to be tuned by playtest):
92
 
93
  | Parameter | small | medium | large |
@@ -101,7 +115,7 @@ Magnitude deltas (to be tuned by playtest):
101
  1. Reset: generate initial boundary from baseline rotating-ellipse parameters (+ optional seed perturbation). Run low-fi forward_model. Return initial observation.
102
  2. Agent chooses action.
103
  3. If `run`: modify parameter, regenerate boundary, run low-fi forward_model (~0.6s). Return diagnostics + reward.
104
- 4. If `restore_best`: revert to best-known parameters. No VMEC cost, but costs a budget step.
105
  5. If `submit`: end episode. Optionally run high-fi for final score.
106
  6. Episode ends on `submit` or budget exhaustion.
107
 
@@ -117,8 +131,8 @@ max_elongation: float # P1 objective (minimize)
117
  aspect_ratio: float # constraint: <= 4.0
118
  average_triangularity: float # constraint: <= -0.5
119
  edge_iota_over_nfp: float # constraint: >= 0.3
120
- p1_score: float # official P1 score (0 if infeasible)
121
- p1_feasibility: float # max normalized constraint violation
122
  constraints_satisfied: bool # feasibility <= 0.01
123
  vacuum_well: float # stability indicator
124
  step_number: int
@@ -127,6 +141,10 @@ best_score: float
127
  target_spec: str
128
  ```
129
 
 
 
 
 
130
  ### Reward V0
131
 
132
  Feasibility-first, then objective improvement:
@@ -152,12 +170,18 @@ submit penalty (if infeasible or no improvement):
152
 
153
  This puts feasibility first. An agent that achieves feasibility then minimizes elongation gets rewarded. An agent that never reaches feasibility gets penalized.
154
 
155
  ### State
156
 
157
  ```
158
  step_count: int
159
- current_params: {aspect_ratio, elongation, rotational_transform}
160
- best_params: {aspect_ratio, elongation, rotational_transform}
161
  initial_score: float
162
  best_score: float
163
  best_feasibility: float
@@ -206,7 +230,7 @@ Update `fusion_lab/models.py` for new schemas.
206
 
207
  Status: open.
208
 
209
- Validate hypothesis: "6 actions is enough."
210
  - Play 5-10 episodes manually
211
  - Log: can a human reach feasibility? Improve elongation?
212
  - Tune magnitude deltas if needed
@@ -242,10 +266,9 @@ If full high-fidelity `constellaration` deployment fails (Docker build, HF Space
242
 
243
  Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
244
 
245
- 1. **Repairable baseline anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — intentionally infeasible at reset but close enough to support short repair/improvement episodes
246
- 1. **Current default baseline reference:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — currently deeply infeasible on the real verifier path; keep as a negative or repair reference only
247
  2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
248
- 3. **Near-boundary anchor:** still needs to be found from real verifier probing before manual playtesting
249
 
250
  These are for verifier/reward sanity, not a prerequisite seed-mining project.
251
 
@@ -255,6 +278,7 @@ These are for verifier/reward sanity, not a prerequisite seed-mining project.
255
  - Do not make the task "agent writes arbitrary optimization scripts."
256
  - Do not stream the full HF dataset at runtime.
257
  - Do not mix rotating-ellipse and Fourier-repair action spaces.
 
258
  - Do not use high-fidelity eval for interactive steps (24s is too slow).
259
  - Do not narrate "6 actions is enough" as validated until manually playtested.
260
  - Do not claim full P1 boundary space coverage. The env uses a low-dim subfamily.
 
9
  ## Current Branch Status
10
 
11
  - [x] pivot accepted
12
+ - [x] 3-knob rotating-ellipse `P1` contract is implemented
13
  - [x] `constellaration` verifier path is wired
14
+ - [x] current 3-knob family is verified as blocked on P1 triangularity
15
+ - [ ] repaired low-dimensional family with explicit triangularity control is implemented
16
  - [ ] tracked fixtures are added
17
  - [ ] manual playtest evidence is recorded
18
  - [ ] heuristic baseline is refreshed for the real verifier path
19
 
20
  Current caution:
21
 
22
+ - the current upstream rotating-ellipse family is useful as a seed generator, but not sufficient as the full environment action family because it does not move triangularity under the real verifier path
23
 
24
  ## Decision
25
 
 
68
 
69
  ### Parameter Space
70
 
71
+ The upstream rotating-ellipse generator takes 3 continuous parameters + 1 discrete:
72
 
73
  | Parameter | Role | Typical range |
74
  |---|---|---|
 
79
 
80
  These map to `constellaration.initial_guess.generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` which returns a `SurfaceRZFourier` boundary in ~4ms.
81
 
82
+ Verified blocker:
83
+
84
+ - on the real low-fidelity verifier path, sampled 3-knob points kept `average_triangularity` at roughly `+0.004975`
85
+ - sampled `p1_feasibility` stayed at roughly `1.00995`
86
+ - no sampled point was feasible
87
+
88
+ So the hackathon environment now needs a custom low-dimensional boundary family on top of the rotating-ellipse seed, with an explicit triangularity control knob or equivalent mechanism.
89
+
90
  ### Action Space
91
 
92
+ Original 3-knob action space:
93
 
94
  ```
95
  intent: "run" | "submit" | "restore_best"
 
98
  magnitude: "small" | "medium" | "large"
99
  ```
100
 
101
+ This is no longer sufficient on its own. The next contract revision should keep the same discrete structure while adding:
102
+
103
+ - `triangularity_scale` or equivalent low-dimensional control
104
+
105
  Magnitude deltas (to be tuned by playtest):
106
 
107
  | Parameter | small | medium | large |
 
  1. Reset: generate initial boundary from baseline rotating-ellipse parameters (+ optional seed perturbation). Run low-fi forward_model. Return initial observation.
  2. Agent chooses action.
  3. If `run`: modify parameter, regenerate boundary, run low-fi forward_model (~0.6s). Return diagnostics + reward.
+ 4. If `restore_best`: revert to best-known parameters, re-evaluate low-fidelity metrics, and charge a budget step.
  5. If `submit`: end episode. Optionally run high-fi for final score.
  6. Episode ends on `submit` or budget exhaustion.
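
The episode flow above can be sketched as a small state machine. This is a minimal illustration, not the repo's `server/environment.py`: the `Episode` class, the evaluator signature, the budget value, and the rule that "best" means lowest feasibility are all hypothetical, and the toy evaluator stands in for the low-fidelity verifier call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
# Hypothetical evaluator: params -> (score, feasibility); lower feasibility is better.
Evaluator = Callable[[Params], Tuple[float, float]]


@dataclass
class Episode:
    params: Params
    evaluate_low_fi: Evaluator
    budget: int = 6  # placeholder value, not the repo's budget
    step_count: int = 0
    done: bool = False
    best_params: Params = field(default_factory=dict)
    best_feasibility: float = float("inf")

    def __post_init__(self) -> None:
        # Reset: evaluate the initial design once and record it as best-known.
        self._record(self.evaluate_low_fi(self.params))

    def _record(self, result: Tuple[float, float]) -> Tuple[float, float]:
        _, feasibility = result
        if feasibility < self.best_feasibility:
            self.best_feasibility = feasibility
            self.best_params = dict(self.params)
        return result

    def step(self, intent: str, parameter: str = "", delta: float = 0.0):
        if self.done:
            raise RuntimeError("episode already finished")
        if intent == "submit":
            # Ends the episode; a high-fidelity re-evaluation would go here.
            self.done = True
            return self.evaluate_low_fi(self.params)
        if intent == "run":
            self.params[parameter] += delta
        elif intent == "restore_best":
            self.params = dict(self.best_params)
        else:
            raise ValueError(f"unknown intent: {intent}")
        self.step_count += 1  # both run and restore_best charge a budget step
        result = self._record(self.evaluate_low_fi(self.params))
        self.done = self.step_count >= self.budget
        return result


def toy_evaluator(params: Params) -> Tuple[float, float]:
    # Toy stand-in: feasibility is the distance of elongation from 1.0.
    return 0.0, abs(params["elongation"] - 1.0)


ep = Episode(params={"elongation": 1.5}, evaluate_low_fi=toy_evaluator, budget=3)
ep.step("run", "elongation", 0.5)  # moves away from the best-known point
ep.step("restore_best")            # reverts and still costs a step
```

After these two calls the parameters are back at the initial design and two of the three budget steps are spent, which is the cost asymmetry that makes `restore_best` a deliberate choice rather than a free undo.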
 
 
  aspect_ratio: float # constraint: <= 4.0
  average_triangularity: float # constraint: <= -0.5
  edge_iota_over_nfp: float # constraint: >= 0.3
+ p1_score: float # current step-time score
+ p1_feasibility: float # current step-time max normalized constraint violation
  constraints_satisfied: bool # feasibility <= 0.01
  vacuum_well: float # stability indicator
  step_number: int
 
  target_spec: str
  ```

+ Follow-up requirement from the verified blocker:
+
+ - as long as `submit` stays high-fidelity, the observation or diagnostics text should make the low-fi vs high-fi distinction explicit
+
  ### Reward V0

  Feasibility-first, then objective improvement:
 
  This puts feasibility first. An agent that achieves feasibility then minimizes elongation gets rewarded. An agent that never reaches feasibility gets penalized.

+ Execution note after the verified blocker:
+
+ - keep reward mostly scalar and verifier-driven
+ - repair parameterization before further reward tuning
+ - do not add mode- or constraint-specific reward hacks to compensate for a blocked action family
+
  ### State

  ```
  step_count: int
+ current_params: {aspect_ratio, elongation, rotational_transform, triangularity_scale}
+ best_params: {aspect_ratio, elongation, rotational_transform, triangularity_scale}
  initial_score: float
  best_score: float
  best_feasibility: float
 
  Status: open.

+ Validate the hypothesis "6 actions is enough" only after parameterization repair.
  - Play 5-10 episodes manually
  - Log: can a human reach feasibility? Improve elongation?
  - Tune magnitude deltas if needed
 
  Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:

+ 1. **Current default baseline reference:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4; currently deeply infeasible on the real verifier path, so keep it as a negative reference only until parameterization repair lands
  2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2; expected to violate constraints
+ 3. **Near-boundary anchor:** still needs to be found after parameterization repair and real verifier probing before manual playtesting

  These are for verifier/reward sanity, not a prerequisite seed-mining project.
 
 
  - Do not make the task "agent writes arbitrary optimization scripts."
  - Do not stream the full HF dataset at runtime.
  - Do not mix rotating-ellipse and Fourier-repair action spaces.
+ - Do not pretend the upstream 3-knob family is enough for P1 after the verified triangularity blocker.
  - Do not use high-fidelity eval for interactive steps (24s is too slow).
  - Do not narrate "6 actions is enough" as validated until manually playtested.
  - Do not claim full P1 boundary space coverage. The env uses a low-dim subfamily.
training/notebooks/NORTHFLANK_SMOKE_NOTE.md CHANGED
@@ -13,12 +13,12 @@ Prove all four required conditions in the Northflank Jupyter workspace:

  ## Repo Entry Point

- Use [northflank_smoke.py](/Users/suhjungdae/code/fusion-design-lab/training/notebooks/northflank_smoke.py).
+ Use [northflank_smoke.py](northflank_smoke.py).

  It uses the repo SSOT values from:

- - [server/environment.py](/Users/suhjungdae/code/fusion-design-lab/server/environment.py)
- - [server/physics.py](/Users/suhjungdae/code/fusion-design-lab/server/physics.py)
+ - [server/environment.py](../../server/environment.py)
+ - [server/physics.py](../../server/physics.py)

  ## Northflank Run