CreativeEngineer committed
Commit 2d47f4f
1 Parent(s): acb992c

docs: refine p1 parameterization plan

AGENTS.md CHANGED
@@ -26,6 +26,8 @@ Use these docs as the planning SSOT:
 
 `docs/PIVOT_P1_ROTATING_ELLIPSE.md` is a supporting decision record, not a planning SSOT. If it disagrees with the three docs above, the three SSOT docs win.
 
+`docs/P1_ENV_CONTRACT_V1.md` is a supporting technical spec for the current implementation phase. It should refine the SSOT docs, not silently diverge from them.
+
 If code and docs disagree, either:
 
 1. update code to match the docs, or
README.md CHANGED
@@ -41,6 +41,8 @@ Implementation status:
 - [ ] Add a custom low-dimensional boundary builder with an explicit triangularity control knob
 - [ ] Split boundary construction from boundary evaluation in `server/physics.py`
 - [ ] Update the action contract from 3 knobs to the repaired low-dimensional family
+- [ ] Add explicit VMEC failure semantics to the environment contract
+- [ ] Label low-fi `run` truth vs high-fi `submit` truth in observations and task docs
 - [ ] Add tracked `P1` fixtures under `server/data/p1/`
 - [ ] Run manual playtesting and record the first reward pathology
 - [ ] Refresh the heuristic baseline for the real verifier path
@@ -50,7 +52,9 @@ Implementation status:
 
 - The current 3-knob family is structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That means reward tuning is secondary until the parameterization is repaired.
 - `BASELINE_PARAMS` is not a near-feasible anchor on the real verifier path. The current low-fidelity measurement is roughly `p1_feasibility=1.01`, `average_triangularity=+0.005`, and `edge_iota_over_nfp=0.059`, so fixture discovery has to happen after parameterization repair, not before.
+- The repaired low-dimensional family still needs measured ranges and deltas. Do not narrate guessed `rotational_transform` bounds, `triangularity_scale` deltas, or a larger budget as validated facts until they are measured on the repaired environment.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
+- The environment still needs explicit VMEC failure semantics. Failed evaluations should cost budget, produce a visible failure observation, and apply a documented penalty; they should not be silently swallowed.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
 
@@ -112,13 +116,15 @@ uv sync --extra notebooks
 
 1. Repair the low-dimensional boundary parameterization so it can actually move P1 triangularity.
 2. Split boundary construction from boundary evaluation in `server/physics.py`.
-3. Update the environment contract to the repaired low-dimensional family and label low-fi vs high-fi truth clearly in observations.
-4. Add tracked `P1` fixtures under `server/data/p1`.
-5. Run manual playtest episodes and record the first real reward pathology, if any.
-6. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
-7. Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
-8. Deploy the environment to HF Space.
-9. Add the Colab notebook under `training/notebooks`.
+3. Add explicit VMEC failure semantics to the environment loop.
+4. Update the environment contract to the repaired low-dimensional family and label low-fi vs high-fi truth clearly in observations.
+5. Run a small measured sweep on the repaired family to choose useful ranges, deltas, and reset seeds.
+6. Add tracked `P1` fixtures under `server/data/p1`.
+7. Run manual playtest episodes and record the first real reward pathology, if any.
+8. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
+9. Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
+10. Deploy the environment to HF Space.
+11. Add the Colab notebook under `training/notebooks`.
 
 These are implementation steps, not another planning phase.
 
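The low-fi `run` vs high-fi `submit` distinction above can be enforced structurally instead of by prose alone. A minimal sketch of that idea, assuming hypothetical names (`MetricFidelity`, `LabeledMetrics` are illustrations, not the repo's actual models):

```python
from dataclasses import dataclass
from enum import Enum


class MetricFidelity(str, Enum):
    """Provenance tag attached to every metric bundle the environment reports."""
    LOW_FI_RUN = "low_fi_run"          # cheap step-time estimate from `run`
    HIGH_FI_SUBMIT = "high_fi_submit"  # re-evaluated truth from `submit`


@dataclass(frozen=True)
class LabeledMetrics:
    """Metrics that cannot be read without their fidelity label."""
    fidelity: MetricFidelity
    p1_feasibility: float
    average_triangularity: float

    def is_final(self) -> bool:
        # Only high-fidelity submit-time metrics may be presented as final.
        return self.fidelity is MetricFidelity.HIGH_FI_SUBMIT
```

With a tag like this, "do not present step-time metrics as final submission metrics" becomes a check rather than a convention.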
TODO.md CHANGED
@@ -7,6 +7,7 @@ Use this file for day-of build progress. Use the linked docs for rationale, sequ
 - [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
 - [Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md)
 - [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
+- [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
 - [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
 - [Repo Guardrails](AGENTS.md)
 
@@ -26,6 +27,12 @@ Priority source:
 - [x] repo docs call out the low-fi/high-fi `constellaration` split honestly
 - [x] post-terminal guard in `step()`
 - [x] `constellaration` verifier wiring
+- [x] verify the current 3-knob family against the real low-fidelity verifier
+- [ ] repair the low-dimensional parameterization so triangularity is controllable
+- [ ] split boundary building from boundary evaluation
+- [ ] update the action schema from 3 knobs to the repaired low-dimensional family
+- [ ] add explicit VMEC failure semantics
+- [ ] label low-fi vs high-fi truth in the observation/task surface
 - [ ] tracked `P1` fixtures
 - [ ] manual playtest log
 - [x] settle the non-submit terminal reward policy
@@ -39,7 +46,8 @@ flowchart TD
 A["Northflank Smoke Test"] --> E["Fixture Checks"]
 B["P1 Contract Lock"] --> D["P1 Models + Environment"]
 C["constellaration Physics Wiring"] --> D
-D --> E["Fixture Checks"]
+D --> P["Parameterization Repair"]
+P --> E["Fixture Checks"]
 E --> F["Manual Playtest"]
 F --> G["Reward V1"]
 G --> H["Baselines"]
@@ -57,12 +65,19 @@ flowchart TD
 [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
 [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
 
-- [ ] Pass the Northflank smoke test
+- [x] Pass the Northflank smoke test
 Related:
 [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
 [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md),
 [training/notebooks/README.md](training/notebooks/README.md)
 
+- [x] Verify that the current 3-knob family can or cannot approach P1 feasibility
+Goal:
+decide whether parameterization repair is a blocker before more reward work
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md),
+[P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
+
 ## Fresh Wiring
 
 - [x] Rewrite the shared models to the locked `P1` contract
@@ -93,14 +108,58 @@ flowchart TD
 [server/app.py](server/app.py),
 [README.md](README.md)
 
+- [ ] Repair the low-dimensional boundary family
+Goal:
+add an explicit triangularity control knob or equivalent low-dimensional control so the environment can actually approach P1 feasibility
+Files:
+[server/physics.py](server/physics.py),
+[fusion_lab/models.py](fusion_lab/models.py),
+[server/environment.py](server/environment.py),
+[server/app.py](server/app.py)
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
+- [ ] Split boundary construction from boundary evaluation
+Goal:
+make the verifier boundary-based and keep parameterization-specific logic in the environment adapter layer
+Files:
+[server/physics.py](server/physics.py)
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
+- [ ] Add explicit VMEC failure semantics
+Goal:
+failed evaluations must cost budget, return a visible failure observation, and apply a documented penalty without silent fallback
+Files:
+[server/physics.py](server/physics.py),
+[server/environment.py](server/environment.py)
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
+- [ ] Label low-fi vs high-fi truth in the observation/task surface
+Goal:
+make it obvious whether a metric came from a low-fidelity `run` step or a high-fidelity `submit`
+Files:
+[fusion_lab/models.py](fusion_lab/models.py),
+[server/environment.py](server/environment.py),
+[server/app.py](server/app.py)
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
 ## Validation and Reward
 
+- [ ] Run a small measured sweep on the repaired low-dimensional family
+Goal:
+choose useful parameter ranges, step deltas, and reset seeds from the repaired action family instead of guessing them from prose
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
 - [ ] Add 1-2 tracked `P1` fixtures
 Files:
 [server/data/p1/README.md](server/data/p1/README.md),
 [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
 Note:
-the default baseline params are not near-feasible on the real verifier path, so they are not enough for the fixture set by themselves
+add fixtures only after the parameterization repair produces a meaningful near-boundary region
 
 - [ ] Run fixture sanity checks
 Goal:
@@ -188,6 +247,8 @@ flowchart TD
 ## Guardrails
 
 - [ ] Do not reopen `P1 + rotating-ellipse` strategy without a real blocker
+- [ ] Do not pretend the current 3-knob family is sufficient for P1 after the verified triangularity blocker
+- [ ] Do not guess repaired-family ranges, deltas, or budget changes without measurement
 - [ ] Do not port the old `ai-sci-feasible-designs` harness
 - [ ] Do not let notebook or demo work outrun environment evidence
 - [ ] Do not add training-first complexity before manual playtesting
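The VMEC failure-semantics task above names three requirements: failures cost budget, stay visible, and carry a documented penalty. A minimal sketch of a step handler satisfying them, assuming hypothetical names and an illustrative penalty value (nothing here is the repo's actual API):

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Documented failure penalty; the value is an assumption for illustration.
FAILURE_PENALTY = -0.1


@dataclass
class StepResult:
    metrics: Optional[dict]      # None when the evaluation failed
    evaluation_failed: bool      # visible failure flag in the observation
    reward: float
    budget_remaining: int


def run_step(evaluate: Callable[[dict], dict],
             params: dict,
             budget_remaining: int) -> StepResult:
    """One `run` action: a failed evaluation still consumes budget, is
    surfaced in the observation, and is penalized rather than being
    silently replaced with a fake success path."""
    budget_remaining -= 1  # failure or not, the attempt costs budget
    try:
        metrics = evaluate(params)
    except RuntimeError:  # stand-in for a VMEC / forward-model failure
        return StepResult(None, True, FAILURE_PENALTY, budget_remaining)
    return StepResult(metrics, False, 0.0, budget_remaining)
```

The key design point is that the failure branch returns through the same `StepResult` type as success, so the agent always sees the budget decrement and the failure flag.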
baselines/README.md CHANGED
@@ -6,6 +6,9 @@ Random and heuristic baselines will live here.
 - [x] heuristic baseline exists
 - [x] baseline comparison script exists
 - [x] baseline comparison rerun completed on the real verifier path
+- [x] verified that the current 3-knob family is blocked on P1 triangularity under the real verifier path
+- [ ] repair the low-dimensional parameterization before further heuristic work
+- [ ] wait for measured repaired-family ranges and reset seeds before retuning the heuristic
 - [ ] heuristic refreshed after the real-verifier rerun
 - [ ] near-boundary fixture-backed baseline start chosen for manual playtesting
 - [ ] presentation-ready comparison trace exported
docs/FUSION_DELIVERABLES_MAP.md CHANGED
@@ -13,6 +13,10 @@ Use this map to sequence execution, not to reopen already-locked task choices.
 - [x] baseline comparison has been rerun on the real verifier path
 - [x] Northflank smoke workflow and note are committed
 - [x] Northflank smoke test has passed on the team H100
+- [x] current 3-knob family has been verified as blocked on P1 triangularity
+- [ ] repaired low-dimensional boundary builder is implemented
+- [ ] explicit VMEC failure semantics are implemented
+- [ ] low-fi `run` truth vs high-fi `submit` truth is labeled clearly
 - [ ] tracked fixtures are checked in
 - [ ] manual playtest evidence exists
 - [ ] heuristic baseline has been refreshed for the real verifier path
@@ -53,6 +57,8 @@ flowchart TD
 
 B0 --> F["Observation + action schema frozen"]
 B3 --> G["Fresh P1 verifier loop proven"]
+G --> G1["Parameterization can actually reach P1 feasibility"]
+G --> G2["VMEC failures are explicit and penalized"]
 B2 --> H["Exploit observed -> penalty added"]
 B4 --> I0["Deterministic action schema"]
 D2 --> I["Human can act coherently in env"]
@@ -104,10 +110,13 @@ flowchart LR
 
 Northflank compute bring-up and smoke validation are complete.
 
-1. Add tracked fixtures and run fixture sanity checks.
-2. Manual-playtest the environment and record the first real pathology, if any.
-3. Refresh the heuristic baseline from that evidence.
-4. Make one stable OpenEnv `P1` task work remotely with clear, reproducible rules.
-5. Use the notebook to show traces and comparisons; include training only if it adds signal.
-6. Record the demo around environment clarity, verifier fidelity, reward shaping, and one stable trajectory.
-7. Polish the repo only after the artifacts are real.
+1. Repair the low-dimensional parameterization so triangularity is controllable under the official verifier.
+2. Add explicit VMEC failure semantics and clear low-fi vs high-fi observation labeling.
+3. Run a small measured sweep before locking ranges, deltas, reset seeds, or budget changes.
+4. Add tracked fixtures and run fixture sanity checks.
+5. Manual-playtest the environment and record the first real pathology, if any.
+6. Refresh the heuristic baseline from that evidence.
+7. Make one stable OpenEnv `P1` task work remotely with clear, reproducible rules.
+8. Use the notebook to show traces and comparisons; include training only if it adds signal.
+9. Record the demo around environment clarity, verifier fidelity, reward shaping, and one stable trajectory.
+10. Polish the repo only after the artifacts are real.
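The "small measured sweep" step above can be sketched concretely: grid-sample the repaired family, evaluate each point, and summarize so ranges and deltas come from data rather than prose. All names are hypothetical, and the sketch assumes feasibility is a residual where values at or below zero count as feasible:

```python
import itertools
import statistics
from typing import Callable


def measured_sweep(evaluate: Callable[[dict], dict],
                   grid: dict) -> tuple:
    """Grid-sweep candidate parameters and summarize feasibility.

    `grid` maps parameter names to candidate values; `evaluate` is a
    stand-in for the low-fidelity verifier call.
    """
    names = list(grid)
    rows = []
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        rows.append((params, evaluate(params)))
    feas = [metrics["p1_feasibility"] for _, metrics in rows]
    summary = {
        "n_samples": len(rows),
        "n_feasible": sum(f <= 0.0 for f in feas),  # assumed threshold
        "feasibility_min": min(feas),
        "feasibility_median": statistics.median(feas),
    }
    return rows, summary
```

A summary like this is what should back the choice of ranges, deltas, and reset seeds, not narrated guesses.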
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -16,6 +16,8 @@
 - [x] Northflank smoke test has passed on the team H100
 - [x] current 3-knob family has been checked against the real low-fidelity verifier
 - [ ] parameterization repair is implemented so triangularity is controllable
+- [ ] explicit VMEC failure semantics are implemented
+- [ ] low-fi `run` truth vs high-fi `submit` truth is labeled clearly in the environment surface
 - [ ] tracked `P1` fixtures are added
 - [ ] manual playtest evidence is recorded
 - [ ] heuristic baseline is refreshed for the real verifier path
@@ -269,6 +271,13 @@ This is not trying to expose the full Fourier-boundary space. The goal is a legi
 - `submit`
 - exhausted budget
 
+Failure semantics must also be explicit:
+
+- if VMEC or the forward model fails, the run still consumes budget
+- the observation must expose that the step failed
+- the reward must apply a documented penalty
+- the environment must not silently replace the failed result with a fake success path
+
 ### Terminal Contract
 
 The episode should end cleanly and deterministically.
@@ -305,6 +314,10 @@ Implementation split:
 
 The verifier should be boundary-based. Parameterization-specific logic should not be treated as verifier truth.
 
+Current execution rule:
+
+- do not narrate guessed repaired-family ranges, deltas, or a larger budget as settled defaults until they are measured on the repaired family
+
 ## 11. Reward V0
 
 The reward in this document is not the final reward. It is `Reward V0`.
@@ -330,6 +343,7 @@ Current execution note:
 - do not tune reward further until the repaired low-dimensional family can actually approach P1 feasibility
 - once parameterization is repaired, keep `Reward V0` scalar and feasibility-first
 - clearly distinguish low-fidelity step-time metrics from high-fidelity submit-time truth in the observation contract and docs
+- do not use reward complexity to compensate for missing action expressivity or missing VMEC failure semantics
 
 ### Reward V0 Failure Modes To Test
 
@@ -377,6 +391,8 @@ These are still hypotheses until manually or empirically checked:
 - `restore_best` is useful without becoming an exploit
 - heuristic should beat random on mean episode reward
 - low-fidelity interaction is predictive enough for useful policy learning
+- useful repaired-family parameter ranges and deltas
+- whether the current budget should stay at `6` or change after playtesting
 
 These should not be narrated as facts in the final demo until validated.
 
@@ -517,6 +533,8 @@ The repo should make the environment easy to understand:
 - observation schema frozen
 - action schema frozen
 - terminal conditions frozen
+- explicit VMEC failure semantics defined
+- low-fi vs high-fi metric labeling defined
 
 ### Gate 2: Verifier Wiring Pass
 
@@ -575,21 +593,23 @@ Deliverables:
 
 ### Phase 1
 
-Wire the official verifier and run fixture checks.
+Repair the low-dimensional parameterization, wire the verifier split cleanly, and run a small measured sweep before fixture checks.
 
 Deliverables:
 
-- one good fixture
-- near-boundary fixtures
-- bad fixtures
-- confidence that reward/verifier ordering is sane
+- repaired low-dimensional boundary builder
+- boundary-based verifier split
+- explicit VMEC failure semantics
+- measured parameter ranges, deltas, and candidate reset seeds
 
 ### Phase 2
 
-Manual-playtest the environment.
+Freeze initial fixtures and manual-playtest the environment.
 
 Deliverables:
 
+- one good or near-boundary fixture
+- bad fixtures
 - 5 to 10 episode logs
 - notes on leverage, ambiguity, and pathologies
 
@@ -688,7 +708,7 @@ Instead:
 - simplify the initial states
 - tighten the action set
 - reduce magnitude choices
-- keep the environment more learnable within the fixed budget
+- keep the environment more learnable before changing the budget
 
 ### If the task is too easy
 
@@ -696,6 +716,7 @@ Do not add more domains.
 
 Instead:
 
+- first verify that parameterization repair and reset seeds did not make the task trivial
 - adjust budget
 - adjust magnitudes
 - adjust reward to discourage trivial submission
@@ -728,10 +749,11 @@ That last line is intentionally conservative. It is strong enough without claimi
 
 ## 21. Immediate Next Actions
 
-1. Freeze the `P1` environment contract in code and docs.
-2. Implement fresh verifier wiring in this repo.
-3. Run fixture checks before heavy training work.
-4. Run manual playtests before heavy training work.
-5. Mark the current reward as `V0`.
-6. Log the first real pathology and reward revision.
-7. Do not let notebook or video work outrun the environment evidence.
+1. Repair the low-dimensional boundary parameterization so triangularity is controllable.
+2. Split boundary construction from official boundary evaluation.
+3. Add explicit VMEC failure semantics and clear low-fi vs high-fi labeling.
+4. Run a small measured sweep before locking ranges, deltas, or budget changes.
+5. Freeze fixtures and run manual playtests before heavy training work.
+6. Mark the current reward as `V0`.
+7. Log the first real pathology and reward revision.
+8. Do not let notebook or video work outrun the environment evidence.
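The rule above that the verifier should be boundary-based, with parameterization-specific logic kept in the adapter layer, can be sketched as two functions with a single composition point. Everything here is illustrative: the knob names (`ellipticity`, `triangularity_scale`) and the toy boundary mapping are assumptions, not the repo's actual physics:

```python
def build_boundary(knobs: dict) -> dict:
    """Adapter layer: map low-dimensional knobs to a boundary description.
    Parameterization-specific logic lives here, never in the verifier."""
    return {
        "r_major": 1.0,  # fixed for this toy family
        "ellipticity": knobs["ellipticity"],
        # Toy coupling: triangularity grows with both knobs.
        "triangularity": knobs["triangularity_scale"] * knobs["ellipticity"],
    }


def evaluate_boundary(boundary: dict) -> dict:
    """Verifier side: consumes only the boundary, never the knobs."""
    return {"average_triangularity": boundary["triangularity"]}


def evaluate_knobs(knobs: dict) -> dict:
    # The only composition point: build the boundary, then evaluate it.
    return evaluate_boundary(build_boundary(knobs))
```

Because `evaluate_boundary` never sees the knobs, swapping in a repaired parameterization only changes `build_boundary`, and verifier truth stays boundary-based.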
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED
@@ -9,19 +9,23 @@ Do not expand scope beyond one stable task. Training is supporting evidence, not
9
  ## Current Branch Status
10
 
11
  - [x] `P1` task is locked
12
- - [x] rotating-ellipse `P1` contract is implemented in the working tree
13
  - [x] baselines and API surface have been moved to the `P1` contract
14
  - [x] add a post-terminal guard in `step()`
15
  - [x] replace the synthetic evaluator with `constellaration`
16
  - [x] re-run baselines on the real verifier path
17
  - [x] commit the Northflank smoke workflow and note
18
  - [x] pass the Northflank smoke test on the team H100
 
 
 
 
19
  - [ ] add tracked fixtures and manual playtest evidence
20
  - [ ] refresh the heuristic baseline after the real-verifier rerun
21
 
22
  Current caution:
23
 
24
- - do not assume the default baseline params are a near-feasible playtest start; on the current real verifier path they are still deeply infeasible, so fixture discovery comes first
25
 
26
  ## Plan V2 Inheritance
27
 
@@ -73,9 +77,10 @@ Artifacts:
73
  - reward shaping
74
  - manual playtesting
75
  5. Mark open assumptions explicitly:
76
- - whether the rotating-ellipse action set is expressive enough
77
  - whether the fixed step budget is enough
78
  - whether `restore_best` is useful without becoming an exploit
 
79
 
80
  Exit condition: a human can read the spec and understand how to act in the environment.
81
 
@@ -90,34 +95,41 @@ Transition rule:
90
 
91
  ## Hour 2-4: Verify Wiring, Then Manual Playtest
92
 
93
- 1. Run fixture checks:
 
 
 
 
94
  - known-good or near-winning design
95
  - near-boundary designs
96
  - clearly bad designs
97
- - do not rely on the default baseline params as the only starting point
98
- 2. Confirm:
99
  - verifier outputs are sane
100
  - reward ordering is sane
101
  - objective direction is correct
102
- 3. Manually play 5 to 10 episodes.
103
- 4. Log for each step:
104
  - observation
105
  - chosen action
106
  - expected effect
107
  - returned reward
108
  - confusion or exploit if observed
109
- 5. Identify at least one bad incentive or exploit.
110
- 6. Patch reward or penalty logic immediately.
111
- 7. Write the reward shaping story:
112
  - initial reward V0
113
  - bad behavior
114
  - refinement to reward V1
115
  - improved behavior
116
- 8. If no real pathology appears, record that `Reward V0` survived playtesting and move on.
117
 
118
  Exit condition: you can explain why the environment now rewards the intended behavior.
119
 
120
  Artifacts:
 
 
 
121
  - fixture check note
122
  - manual playtest log
123
  - reward shaping note
@@ -205,16 +217,17 @@ Artifacts:
205
  ## Artifact Order
206
 
207
  1. Environment spec
208
- 2. Fixture check note
209
- 3. Manual playtest log
210
- 4. Reward revision note
211
- 5. Stable task run
212
- 6. Random baseline
213
- 7. Heuristic baseline
214
- 8. Northflank traces or training evidence
215
- 9. Colab training or eval evidence
216
- 10. Demo recording
217
- 11. Repo polish
 
218
 
219
  ## Non-Negotiables
220
 
@@ -223,6 +236,7 @@ Artifacts:
223
  - Do not optimize training before manual playtesting.
224
  - Do not rely on reward curves alone; keep trajectory evidence.
225
  - Do not narrate hypotheses as facts before they are checked.
 
226
  - Do not polish the repo or video before the environment and baselines are real.
227
  - Treat judge comments as pressure toward clarity and reproducibility, not broader unsupported claims.
228
  - Do not force a training-centric story if the strongest evidence is environment quality plus baselines.
 
## Current Branch Status

- [x] `P1` task is locked
+ - [x] 3-knob rotating-ellipse `P1` contract is implemented in the working tree
- [x] baselines and API surface have been moved to the `P1` contract
- [x] add a post-terminal guard in `step()`
- [x] replace the synthetic evaluator with `constellaration`
- [x] re-run baselines on the real verifier path
- [x] commit the Northflank smoke workflow and note
- [x] pass the Northflank smoke test on the team H100
+ - [x] verify that the current 3-knob family is blocked on P1 triangularity under the real verifier path
+ - [ ] repair the low-dimensional parameterization
+ - [ ] add explicit VMEC failure semantics
+ - [ ] label low-fi `run` truth vs high-fi `submit` truth in the task surface
- [ ] add tracked fixtures and manual playtest evidence
- [ ] refresh the heuristic baseline after the real-verifier rerun

Current caution:

+ - do not assume the current 3-knob family is a viable playtest start; parameterization repair comes before fixture discovery, manual playtesting, and heuristic refresh
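The "label low-fi `run` truth vs high-fi `submit` truth" item above could be made concrete with a small tagged-metrics type, so the two fidelities can never be confused downstream. This is a sketch only, under assumptions: `ScoredMetrics`, `label_metrics`, and the fidelity labels are hypothetical names, not the actual schema in `fusion_lab/models.py`.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical labels; the repo may choose different names.
Fidelity = Literal["low_fi_run", "high_fi_submit"]

@dataclass(frozen=True)
class ScoredMetrics:
    fidelity: Fidelity              # which truth produced these numbers
    p1_feasibility: float
    average_triangularity: float

def label_metrics(raw: dict, *, high_fidelity: bool) -> ScoredMetrics:
    """Tag every metrics payload with its fidelity so low-fi `run` numbers
    can never be silently presented as high-fi `submit` truth."""
    return ScoredMetrics(
        fidelity="high_fi_submit" if high_fidelity else "low_fi_run",
        p1_feasibility=raw["p1_feasibility"],
        average_triangularity=raw["average_triangularity"],
    )
```

Making the label part of the value, rather than a side note in docs, is what keeps the task surface honest when both paths report the same metric names.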
  ## Plan V2 Inheritance

- reward shaping
- manual playtesting
5. Mark open assumptions explicitly:
+ - whether the repaired low-dimensional action set is expressive enough
- whether the fixed step budget is enough
- whether `restore_best` is useful without becoming an exploit
+ - whether repaired-family ranges and deltas need adjustment after measurement

Exit condition: a human can read the spec and understand how to act in the environment.
 

## Hour 2-4: Verify Wiring, Then Manual Playtest

+ 1. Repair the low-dimensional parameterization so triangularity is controllable.
+ 2. Add explicit VMEC failure semantics and visible failure observations.
+ 3. Label low-fi `run` truth vs high-fi `submit` truth clearly.
+ 4. Run a small measured sweep on the repaired family before freezing defaults.
+ 5. Run fixture checks:
- known-good or near-winning design
- near-boundary designs
- clearly bad designs
+ - do not rely on the current default baseline params as the only starting point
+ 6. Confirm:
- verifier outputs are sane
- reward ordering is sane
- objective direction is correct
+ 7. Manually play 5 to 10 episodes.
+ 8. Log for each step:
- observation
- chosen action
- expected effect
- returned reward
- confusion or exploit if observed
+ 9. Identify at least one bad incentive or exploit.
+ 10. Patch reward or penalty logic immediately.
+ 11. Write the reward shaping story:
- initial reward V0
- bad behavior
- refinement to reward V1
- improved behavior
+ 12. If no real pathology appears, record that `Reward V0` survived playtesting and move on.

Exit condition: you can explain why the environment now rewards the intended behavior.

Artifacts:
+ - repaired low-dimensional boundary plan
+ - explicit failure semantics note
+ - measured range and delta note
- fixture check note
- manual playtest log
- reward shaping note
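Step 2 above ("add explicit VMEC failure semantics and visible failure observations") could be sketched as a wrapper that converts evaluator crashes and silent NaNs into a structured, finite-scored result. All names here (`EvalResult`, `evaluate_boundary`, the `-1.0` floor) are illustrative assumptions, not the repo's actual contract in `server/environment.py`.

```python
from dataclasses import dataclass
from typing import Callable, Optional

FAILURE_SCORE = -1.0  # explicit finite floor instead of NaN or an unhandled crash

@dataclass
class EvalResult:
    ok: bool                          # did the forward model produce a usable score?
    score: float                      # objective value, or the penalty floor on failure
    failure_reason: Optional[str] = None

def evaluate_boundary(params: dict,
                      run_forward_model: Callable[[dict], float]) -> EvalResult:
    """Make VMEC/forward-model failure a first-class observation, not a crash."""
    try:
        score = run_forward_model(params)
    except Exception as exc:          # non-convergence, bad geometry, solver errors
        return EvalResult(ok=False, score=FAILURE_SCORE,
                          failure_reason=type(exc).__name__)
    if score != score:                # NaN guard: a silent NaN is also a failure
        return EvalResult(ok=False, score=FAILURE_SCORE, failure_reason="nan_score")
    return EvalResult(ok=True, score=score)
```

The agent then sees `ok` and `failure_reason` in its observation, so a crashed evaluation is a legible event rather than a stalled episode.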
 
## Artifact Order

1. Environment spec
+ 2. Repaired parameterization note
+ 3. Fixture check note
+ 4. Manual playtest log
+ 5. Reward revision note
+ 6. Stable task run
+ 7. Random baseline
+ 8. Heuristic baseline
+ 9. Northflank traces or training evidence
+ 10. Colab training or eval evidence
+ 11. Demo recording
+ 12. Repo polish

## Non-Negotiables

- Do not optimize training before manual playtesting.
- Do not rely on reward curves alone; keep trajectory evidence.
- Do not narrate hypotheses as facts before they are checked.
+ - Do not guess repaired-family ranges, deltas, or budget changes without a measured sweep.
- Do not polish the repo or video before the environment and baselines are real.
- Treat judge comments as pressure toward clarity and reproducibility, not broader unsupported claims.
- Do not force a training-centric story if the strongest evidence is environment quality plus baselines.
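The fixture step in Hour 2-4 ("reward ordering is sane") can be pinned down with one assertion across the three fixture classes. `reward_fn` and the fixture labels are hypothetical placeholders for whatever the tracked `P1` fixtures end up being.

```python
def check_reward_ordering(reward_fn, fixtures):
    """Assert the intended ordering over fixtures:
    known-good > near-boundary > clearly-bad.
    `fixtures` maps a label to boundary params; `reward_fn` scores them."""
    scores = {name: reward_fn(params) for name, params in fixtures.items()}
    assert scores["known_good"] > scores["near_boundary"] > scores["clearly_bad"], scores
    return scores
```

Run against real fixtures, a failed assertion is direct evidence that the reward or objective direction is wired backwards, before any training happens.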
docs/P1_ENV_CONTRACT_V1.md CHANGED
@@ -72,6 +72,7 @@ The verifier layer should own:
- official `P1` feasibility semantics
- official `P1` objective direction
- score ordering

The verifier layer should not own:

@@ -91,6 +92,12 @@ Target controllable knobs:
- `rotational_transform`
- `triangularity_scale`

Important naming rule:

- once triangularity is injected explicitly, stop describing the family as plain upstream “rotating ellipse”

@@ -168,6 +175,8 @@ Do not add:
- bonuses for matching a known winner
- hand-coded constraint tricks to hide a blocked action family

## Reset Strategy

Start with frozen exact seeds, not jitter.

@@ -209,9 +218,11 @@ before tuning reward further
3. Update the action and state schema in [fusion_lab/models.py](../fusion_lab/models.py).
4. Update the episode loop and observation labeling in [server/environment.py](../server/environment.py).
5. Update the task summary in [server/app.py](../server/app.py).
- 6. Freeze 1-2 repaired low-dimensional fixtures.
- 7. Run manual playtesting.
- 8. Refresh the heuristic baseline only after that evidence exists.

## Out of Scope
- official `P1` feasibility semantics
- official `P1` objective direction
- score ordering
+ - explicit failure results when VMEC or forward-model evaluation fails

The verifier layer should not own:

- `rotational_transform`
- `triangularity_scale`

+ Current measurement rule:
+
+ - do not lock exact repaired-family ranges or deltas from prose alone
+ - measure them on the repaired boundary family before presenting them as defaults
+ - especially treat `rotational_transform` bounds, `triangularity_scale` deltas, and budget changes as open until measured
+
Important naming rule:

- once triangularity is injected explicitly, stop describing the family as plain upstream “rotating ellipse”
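To make an explicit `triangularity_scale` knob concrete: a Miller-style phase shift is one standard way to inject triangularity into an elliptical cross-section. This is a 2-D cross-section sketch only, under assumptions; the function names are illustrative, and the repo's actual boundary family (and `constellaration`'s Fourier-mode convention) may parameterize triangularity differently.

```python
import numpy as np

def shaped_cross_section(theta, r0=1.0, a=0.3, kappa=1.4, triangularity_scale=0.0):
    """Elliptical cross-section with an explicit triangularity knob: the
    sin(theta) phase shift skews the outline into a D-shape."""
    r = r0 + a * np.cos(theta + triangularity_scale * np.sin(theta))
    z = kappa * a * np.sin(theta)
    return r, z

def shape_triangularity(r, z):
    """Geometric triangularity: horizontal offset of the top of the section
    from the midpoint, normalized by the minor radius."""
    r_mid = 0.5 * (r.max() + r.min())
    a_eff = 0.5 * (r.max() - r.min())
    return (r_mid - r[np.argmax(z)]) / a_eff
```

With `triangularity_scale=0` the measured shape triangularity stays at zero, and it rises as the knob opens; that direct response is exactly the controllability the blocked 3-knob family lacked.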
 
- bonuses for matching a known winner
- hand-coded constraint tricks to hide a blocked action family

+ Do not use reward complexity to compensate for missing action expressivity or missing crash semantics.
+
## Reset Strategy

Start with frozen exact seeds, not jitter.

3. Update the action and state schema in [fusion_lab/models.py](../fusion_lab/models.py).
4. Update the episode loop and observation labeling in [server/environment.py](../server/environment.py).
5. Update the task summary in [server/app.py](../server/app.py).
+ 6. Add explicit VMEC failure semantics in [server/environment.py](../server/environment.py).
+ 7. Run a small measured sweep to choose ranges, deltas, and reset seeds.
+ 8. Freeze 1-2 repaired low-dimensional fixtures.
+ 9. Run manual playtesting.
+ 10. Refresh the heuristic baseline only after that evidence exists.

## Out of Scope
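The "run a small measured sweep to choose ranges, deltas, and reset seeds" step above can be as small as an exhaustive grid plus a per-knob responsiveness check. `sweep`, `responsive`, and the metric names are illustrative assumptions; in practice `evaluate` would be the low-fi verifier path.

```python
import itertools
import statistics

def sweep(evaluate, grids):
    """Evaluate every knob combination and keep the raw records, so ranges
    and deltas are frozen from measurement rather than prose."""
    records = []
    for combo in itertools.product(*grids.values()):
        params = dict(zip(grids.keys(), combo))
        records.append({"params": params, **evaluate(params)})
    return records

def responsive(records, knob, metric):
    """How far the mean metric moves across a knob's candidate values.
    A knob whose whole range barely moves the metric is not worth freezing."""
    by_value = {}
    for rec in records:
        by_value.setdefault(rec["params"][knob], []).append(rec[metric])
    means = [statistics.fmean(vals) for vals in by_value.values()]
    return max(means) - min(means)
```

A sweep where `average_triangularity` stays pinned near `+0.005` across the whole grid is precisely the kind of measurement that showed the current 3-knob family is blocked.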