Commit 2d47f4f
Parent(s): acb992c

docs: refine p1 parameterization plan

- AGENTS.md +2 -0
- README.md +13 -7
- TODO.md +64 -3
- baselines/README.md +3 -0
- docs/FUSION_DELIVERABLES_MAP.md +16 -7
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +36 -22
- docs/FUSION_NEXT_12_HOURS_CHECKLIST.md +36 -22
- docs/P1_ENV_CONTRACT_V1.md +14 -3
AGENTS.md
CHANGED

@@ -26,6 +26,8 @@ Use these docs as the planning SSOT:
 
 `docs/PIVOT_P1_ROTATING_ELLIPSE.md` is a supporting decision record, not a planning SSOT. If it disagrees with the three docs above, the three SSOT docs win.
 
+`docs/P1_ENV_CONTRACT_V1.md` is a supporting technical spec for the current implementation phase. It should refine the SSOT docs, not silently diverge from them.
+
 If code and docs disagree, either:
 
 1. update code to match the docs, or
README.md
CHANGED

@@ -41,6 +41,8 @@ Implementation status:
 - [ ] Add a custom low-dimensional boundary builder with an explicit triangularity control knob
 - [ ] Split boundary construction from boundary evaluation in `server/physics.py`
 - [ ] Update the action contract from 3 knobs to the repaired low-dimensional family
+- [ ] Add explicit VMEC failure semantics to the environment contract
+- [ ] Label low-fi `run` truth vs high-fi `submit` truth in observations and task docs
 - [ ] Add tracked `P1` fixtures under `server/data/p1/`
 - [ ] Run manual playtesting and record the first reward pathology
 - [ ] Refresh the heuristic baseline for the real verifier path

@@ -50,7 +52,9 @@ Implementation status:
 
 - The current 3-knob family is structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That means reward tuning is secondary until the parameterization is repaired.
 - `BASELINE_PARAMS` is not a near-feasible anchor on the real verifier path. The current low-fidelity measurement is roughly `p1_feasibility=1.01`, `average_triangularity=+0.005`, and `edge_iota_over_nfp=0.059`, so fixture discovery has to happen after parameterization repair, not before.
+- The repaired low-dimensional family still needs measured ranges and deltas. Do not narrate guessed `rotational_transform` bounds, `triangularity_scale` deltas, or a larger budget as validated facts until they are measured on the repaired environment.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
+- The environment still needs explicit VMEC failure semantics. Failed evaluations should cost budget, produce a visible failure observation, and apply a documented penalty; they should not be silently swallowed.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
 

@@ -112,13 +116,15 @@ uv sync --extra notebooks
 
 1. Repair the low-dimensional boundary parameterization so it can actually move P1 triangularity.
 2. Split boundary construction from boundary evaluation in `server/physics.py`.
-3.
-4.
-5. Run
-6.
-7.
-8.
-9.
+3. Add explicit VMEC failure semantics to the environment loop.
+4. Update the environment contract to the repaired low-dimensional family and label low-fi vs high-fi truth clearly in observations.
+5. Run a small measured sweep on the repaired family to choose useful ranges, deltas, and reset seeds.
+6. Add tracked `P1` fixtures under `server/data/p1`.
+7. Run manual playtest episodes and record the first real reward pathology, if any.
+8. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
+9. Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
+10. Deploy the environment to HF Space.
+11. Add the Colab notebook under `training/notebooks`.
 
 These are implementation steps, not another planning phase.
 
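The VMEC failure semantics called for above (a failed evaluation still costs budget, surfaces a visible failure observation, and applies a documented penalty) can be sketched as follows. This is a minimal illustration, not the actual `server/environment.py` API: `FAILURE_PENALTY`, `StepResult`, and the field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical penalty value; the real number belongs in the documented
# contract, not here.
FAILURE_PENALTY = -0.1

@dataclass
class StepResult:
    metrics: Optional[dict]  # None when the forward model failed
    eval_failed: bool        # failure is visible, never silently swallowed
    reward: float
    budget_remaining: int

def step_with_failure_semantics(
    budget_remaining: int, evaluate: Callable[[], dict]
) -> StepResult:
    """One evaluation step: failures still cost budget and are surfaced."""
    budget_remaining -= 1  # a failed VMEC run consumes budget too
    try:
        metrics = evaluate()
    except RuntimeError:
        # Expose the failure and apply the documented penalty instead of
        # substituting a fake success path.
        return StepResult(None, True, FAILURE_PENALTY, budget_remaining)
    return StepResult(metrics, False, 0.0, budget_remaining)
```

The key property is that the success and failure branches are symmetric on budget accounting and differ only in the visible failure flag and penalty.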
TODO.md
CHANGED

@@ -7,6 +7,7 @@ Use this file for day-of build progress. Use the linked docs for rationale, sequ
 - [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
 - [Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md)
 - [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
+- [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
 - [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
 - [Repo Guardrails](AGENTS.md)
 

@@ -26,6 +27,12 @@ Priority source:
 - [x] repo docs call out the low-fi/high-fi `constellaration` split honestly
 - [x] post-terminal guard in `step()`
 - [x] `constellaration` verifier wiring
+- [x] verify the current 3-knob family against the real low-fidelity verifier
+- [ ] repair the low-dimensional parameterization so triangularity is controllable
+- [ ] split boundary building from boundary evaluation
+- [ ] update the action schema from 3 knobs to the repaired low-dimensional family
+- [ ] add explicit VMEC failure semantics
+- [ ] label low-fi vs high-fi truth in the observation/task surface
 - [ ] tracked `P1` fixtures
 - [ ] manual playtest log
 - [x] settle the non-submit terminal reward policy

@@ -39,7 +46,8 @@ flowchart TD
 A["Northflank Smoke Test"] --> E["Fixture Checks"]
 B["P1 Contract Lock"] --> D["P1 Models + Environment"]
 C["constellaration Physics Wiring"] --> D
-D -->
+D --> P["Parameterization Repair"]
+P --> E["Fixture Checks"]
 E --> F["Manual Playtest"]
 F --> G["Reward V1"]
 G --> H["Baselines"]

@@ -57,12 +65,19 @@ flowchart TD
 [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
 [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
 
-- [
+- [x] Pass the Northflank smoke test
 Related:
 [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
 [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md),
 [training/notebooks/README.md](training/notebooks/README.md)
 
+- [x] Verify that the current 3-knob family can or cannot approach P1 feasibility
+Goal:
+decide whether parameterization repair is a blocker before more reward work
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md),
+[P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
+
 ## Fresh Wiring
 
 - [x] Rewrite the shared models to the locked `P1` contract

@@ -93,14 +108,58 @@ flowchart TD
 [server/app.py](server/app.py),
 [README.md](README.md)
 
+- [ ] Repair the low-dimensional boundary family
+Goal:
+add an explicit triangularity control knob or equivalent low-dimensional control so the environment can actually approach P1 feasibility
+Files:
+[server/physics.py](server/physics.py),
+[fusion_lab/models.py](fusion_lab/models.py),
+[server/environment.py](server/environment.py),
+[server/app.py](server/app.py)
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
+- [ ] Split boundary construction from boundary evaluation
+Goal:
+make the verifier boundary-based and keep parameterization-specific logic in the environment adapter layer
+Files:
+[server/physics.py](server/physics.py)
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
+- [ ] Add explicit VMEC failure semantics
+Goal:
+failed evaluations must cost budget, return a visible failure observation, and apply a documented penalty without silent fallback
+Files:
+[server/physics.py](server/physics.py),
+[server/environment.py](server/environment.py)
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
+- [ ] Label low-fi vs high-fi truth in the observation/task surface
+Goal:
+make it obvious whether a metric came from a low-fidelity `run` step or a high-fidelity `submit`
+Files:
+[fusion_lab/models.py](fusion_lab/models.py),
+[server/environment.py](server/environment.py),
+[server/app.py](server/app.py)
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
 ## Validation and Reward
 
+- [ ] Run a small measured sweep on the repaired low-dimensional family
+Goal:
+choose useful parameter ranges, step deltas, and reset seeds from the repaired action family instead of guessing them from prose
+Related:
+[P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
 - [ ] Add 1-2 tracked `P1` fixtures
 Files:
 [server/data/p1/README.md](server/data/p1/README.md),
 [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
 Note:
-
+add fixtures only after the parameterization repair produces a meaningful near-boundary region
 
 - [ ] Run fixture sanity checks
 Goal:

@@ -188,6 +247,8 @@ flowchart TD
 ## Guardrails
 
 - [ ] Do not reopen `P1 + rotating-ellipse` strategy without a real blocker
+- [ ] Do not pretend the current 3-knob family is sufficient for P1 after the verified triangularity blocker
+- [ ] Do not guess repaired-family ranges, deltas, or budget changes without measurement
 - [ ] Do not port the old `ai-sci-feasible-designs` harness
 - [ ] Do not let notebook or demo work outrun environment evidence
 - [ ] Do not add training-first complexity before manual playtesting
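The construction/evaluation split tracked above can be sketched as two functions with a boundary-only interface between them. This is a toy illustration under stated assumptions, not the real `server/physics.py` API: the function names, the knob names, and the stand-in metric are all hypothetical, and a real builder would emit Fourier boundary coefficients.

```python
def build_boundary(params: dict) -> dict:
    """Parameterization-specific: map low-dimensional knobs to a boundary.

    Toy mapping for illustration only; a real builder would produce
    full Fourier boundary coefficient arrays.
    """
    return {
        "r_cos": [1.0, params.get("ellipticity", 0.0)],
        "z_sin": [0.0, params.get("triangularity", 0.0)],
    }

def evaluate_boundary(boundary: dict) -> dict:
    """Verifier-facing: consumes only the boundary, never the raw knobs.

    Stand-in metric so the split is demonstrable without the real verifier.
    """
    return {"average_triangularity": boundary["z_sin"][1]}

def run_step(params: dict) -> dict:
    # The environment composes the two; verifier truth stays boundary-based.
    return evaluate_boundary(build_boundary(params))
```

The point of the split is that `evaluate_boundary` never sees the action knobs, so swapping in a repaired parameterization changes only `build_boundary`.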
baselines/README.md
CHANGED

@@ -6,6 +6,9 @@ Random and heuristic baselines will live here.
 - [x] heuristic baseline exists
 - [x] baseline comparison script exists
 - [x] baseline comparison rerun completed on the real verifier path
+- [x] verified that the current 3-knob family is blocked on P1 triangularity under the real verifier path
+- [ ] repair the low-dimensional parameterization before further heuristic work
+- [ ] wait for measured repaired-family ranges and reset seeds before retuning the heuristic
 - [ ] heuristic refreshed after the real-verifier rerun
 - [ ] near-boundary fixture-backed baseline start chosen for manual playtesting
 - [ ] presentation-ready comparison trace exported
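The seeded baseline comparison referenced above (5 seeded episodes, random vs heuristic on mean best score) can be sketched like this. The episode logic is a placeholder callable, and the function names are illustrative, not the actual comparison script's API.

```python
import random
from typing import Callable, Iterable

def mean_best_score(
    agent: Callable[[random.Random], float], seeds: Iterable[int]
) -> float:
    """Mean best episode score for one agent over fixed seeds."""
    scores = [agent(random.Random(seed)) for seed in seeds]
    return sum(scores) / len(scores)

def compare_baselines(
    random_agent: Callable[[random.Random], float],
    heuristic_agent: Callable[[random.Random], float],
    seeds: Iterable[int] = range(5),
) -> dict:
    """Run both agents on the same seeds so the comparison is reproducible."""
    seeds = list(seeds)
    return {
        "random": mean_best_score(random_agent, seeds),
        "heuristic": mean_best_score(heuristic_agent, seeds),
    }
```

Sharing the seed list between agents is what makes a "heuristic underperformed random" claim attributable to the policies rather than to sampling noise.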
docs/FUSION_DELIVERABLES_MAP.md
CHANGED

@@ -13,6 +13,10 @@ Use this map to sequence execution, not to reopen already-locked task choices.
 - [x] baseline comparison has been rerun on the real verifier path
 - [x] Northflank smoke workflow and note are committed
 - [x] Northflank smoke test has passed on the team H100
+- [x] current 3-knob family has been verified as blocked on P1 triangularity
+- [ ] repaired low-dimensional boundary builder is implemented
+- [ ] explicit VMEC failure semantics are implemented
+- [ ] low-fi `run` truth vs high-fi `submit` truth is labeled clearly
 - [ ] tracked fixtures are checked in
 - [ ] manual playtest evidence exists
 - [ ] heuristic baseline has been refreshed for the real verifier path

@@ -53,6 +57,8 @@ flowchart TD
 
 B0 --> F["Observation + action schema frozen"]
 B3 --> G["Fresh P1 verifier loop proven"]
+G --> G1["Parameterization can actually reach P1 feasibility"]
+G --> G2["VMEC failures are explicit and penalized"]
 B2 --> H["Exploit observed -> penalty added"]
 B4 --> I0["Deterministic action schema"]
 D2 --> I["Human can act coherently in env"]

@@ -104,10 +110,13 @@ flowchart LR
 
 Northflank compute bring-up and smoke validation are complete.
 
-1.
-2.
-3.
-4.
-5.
-6.
-7.
+1. Repair the low-dimensional parameterization so triangularity is controllable under the official verifier.
+2. Add explicit VMEC failure semantics and clear low-fi vs high-fi observation labeling.
+3. Run a small measured sweep before locking ranges, deltas, reset seeds, or budget changes.
+4. Add tracked fixtures and run fixture sanity checks.
+5. Manual-playtest the environment and record the first real pathology, if any.
+6. Refresh the heuristic baseline from that evidence.
+7. Make one stable OpenEnv `P1` task work remotely with clear, reproducible rules.
+8. Use the notebook to show traces and comparisons; include training only if it adds signal.
+9. Record the demo around environment clarity, verifier fidelity, reward shaping, and one stable trajectory.
+10. Polish the repo only after the artifacts are real.
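The "small measured sweep" step above can be sketched as a grid evaluation that records metrics per knob setting, so ranges and deltas come from measurement rather than prose. The `evaluate` callable stands in for the low-fidelity verifier call, and the grid/metric names are hypothetical.

```python
import itertools
from typing import Callable

def measured_sweep(evaluate: Callable[[dict], dict], grids: dict) -> list:
    """Evaluate every grid point; return (params, metrics) records.

    `grids` maps knob name -> list of candidate values; `evaluate`
    is a stand-in for the low-fidelity verifier call.
    """
    names = list(grids)
    records = []
    for values in itertools.product(*(grids[n] for n in names)):
        params = dict(zip(names, values))
        records.append((params, evaluate(params)))
    return records

def feasible_fraction(records, key="p1_feasibility", threshold=1.0):
    """Fraction of sampled points at or below the feasibility threshold."""
    return sum(1 for _, m in records if m[key] <= threshold) / len(records)
```

A sweep like this is also what backs claims such as "zero feasible samples": the fraction is computed from records, not asserted.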
docs/FUSION_DESIGN_LAB_PLAN_V2.md
CHANGED

@@ -16,6 +16,8 @@
 - [x] Northflank smoke test has passed on the team H100
 - [x] current 3-knob family has been checked against the real low-fidelity verifier
 - [ ] parameterization repair is implemented so triangularity is controllable
+- [ ] explicit VMEC failure semantics are implemented
+- [ ] low-fi `run` truth vs high-fi `submit` truth is labeled clearly in the environment surface
 - [ ] tracked `P1` fixtures are added
 - [ ] manual playtest evidence is recorded
 - [ ] heuristic baseline is refreshed for the real verifier path

@@ -269,6 +271,13 @@ This is not trying to expose the full Fourier-boundary space. The goal is a legi
 - `submit`
 - exhausted budget
 
+Failure semantics must also be explicit:
+
+- if VMEC or the forward model fails, the run still consumes budget
+- the observation must expose that the step failed
+- the reward must apply a documented penalty
+- the environment must not silently replace the failed result with a fake success path
+
 ### Terminal Contract
 
 The episode should end cleanly and deterministically.

@@ -305,6 +314,10 @@ Implementation split:
 
 The verifier should be boundary-based. Parameterization-specific logic should not be treated as verifier truth.
 
+Current execution rule:
+
+- do not narrate guessed repaired-family ranges, deltas, or a larger budget as settled defaults until they are measured on the repaired family
+
 ## 11. Reward V0
 
 The reward in this document is not the final reward. It is `Reward V0`.

@@ -330,6 +343,7 @@ Current execution note:
 - do not tune reward further until the repaired low-dimensional family can actually approach P1 feasibility
 - once parameterization is repaired, keep `Reward V0` scalar and feasibility-first
 - clearly distinguish low-fidelity step-time metrics from high-fidelity submit-time truth in the observation contract and docs
+- do not use reward complexity to compensate for missing action expressivity or missing VMEC failure semantics
 
 ### Reward V0 Failure Modes To Test
 

@@ -377,6 +391,8 @@ These are still hypotheses until manually or empirically checked:
 - `restore_best` is useful without becoming an exploit
 - heuristic should beat random on mean episode reward
 - low-fidelity interaction is predictive enough for useful policy learning
+- useful repaired-family parameter ranges and deltas
+- whether the current budget should stay at `6` or change after playtesting
 
 These should not be narrated as facts in the final demo until validated.
 

@@ -517,6 +533,8 @@ The repo should make the environment easy to understand:
 - observation schema frozen
 - action schema frozen
 - terminal conditions frozen
+- explicit VMEC failure semantics defined
+- low-fi vs high-fi metric labeling defined
 
 ### Gate 2: Verifier Wiring Pass
 

@@ -575,21 +593,23 @@ Deliverables:
 
 ### Phase 1
 
-
+Repair the low-dimensional parameterization, wire the verifier split cleanly, and run a small measured sweep before fixture checks.
 
 Deliverables:
 
--
--
--
--
+- repaired low-dimensional boundary builder
+- boundary-based verifier split
+- explicit VMEC failure semantics
+- measured parameter ranges, deltas, and candidate reset seeds
 
 ### Phase 2
 
-
+Freeze initial fixtures and manual-playtest the environment.
 
 Deliverables:
 
+- one good or near-boundary fixture
+- bad fixtures
 - 5 to 10 episode logs
 - notes on leverage, ambiguity, and pathologies
 

@@ -688,7 +708,7 @@ Instead:
 - simplify the initial states
 - tighten the action set
 - reduce magnitude choices
-- keep the environment more learnable
+- keep the environment more learnable before changing the budget
 
 ### If the task is too easy
 

@@ -696,6 +716,7 @@ Do not add more domains.
 
 Instead:
 
+- first verify that parameterization repair and reset seeds did not make the task trivial
 - adjust budget
 - adjust magnitudes
 - adjust reward to discourage trivial submission

@@ -728,10 +749,11 @@ That last line is intentionally conservative. It is strong enough without claimi
 
 ## 21. Immediate Next Actions
 
-1.
-2.
-3.
-4. Run
-5.
-6.
-7.
+1. Repair the low-dimensional boundary parameterization so triangularity is controllable.
+2. Split boundary construction from official boundary evaluation.
+3. Add explicit VMEC failure semantics and clear low-fi vs high-fi labeling.
+4. Run a small measured sweep before locking ranges, deltas, or budget changes.
+5. Freeze fixtures and run manual playtests before heavy training work.
+6. Mark the current reward as `V0`.
+7. Log the first real pathology and reward revision.
+8. Do not let notebook or video work outrun the environment evidence.
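The low-fi vs high-fi labeling requirement in the plan above can be sketched as a tagged metrics wrapper, so step-time `run` numbers can never be presented as submit-time truth. The class and constructor names are hypothetical, not the actual `fusion_lab/models.py` schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledMetrics:
    values: dict
    fidelity: str  # "low" for step-time `run`, "high" for `submit`
    final: bool    # only high-fidelity submit metrics are final truth

def label_run_metrics(values: dict) -> LabeledMetrics:
    """Step-time metrics: low fidelity, never presented as final."""
    return LabeledMetrics(values, fidelity="low", final=False)

def label_submit_metrics(values: dict) -> LabeledMetrics:
    """Submit-time metrics: high fidelity, the only final truth."""
    return LabeledMetrics(values, fidelity="high", final=True)
```

Because the wrapper is frozen, downstream code cannot quietly promote a low-fidelity measurement to final status.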
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md
CHANGED
|
@@ -9,19 +9,23 @@ Do not expand scope beyond one stable task. Training is supporting evidence, not
|
|
| 9 |
## Current Branch Status
|
| 10 |
|
| 11 |
- [x] `P1` task is locked
|
| 12 |
-
- [x] rotating-ellipse `P1` contract is implemented in the working tree
|
| 13 |
- [x] baselines and API surface have been moved to the `P1` contract
|
| 14 |
- [x] add a post-terminal guard in `step()`
|
| 15 |
- [x] replace the synthetic evaluator with `constellaration`
|
| 16 |
- [x] re-run baselines on the real verifier path
|
| 17 |
- [x] commit the Northflank smoke workflow and note
|
| 18 |
- [x] pass the Northflank smoke test on the team H100
|
|
|
|
|
|
|
|
|
|
|
|
|
| 19 |
- [ ] add tracked fixtures and manual playtest evidence
|
| 20 |
- [ ] refresh the heuristic baseline after the real-verifier rerun
|
| 21 |
|
| 22 |
Current caution:
|
| 23 |
|
| 24 |
-
- do not assume the
|
| 25 |
|
| 26 |
## Plan V2 Inheritance
|
| 27 |
|
|
@@ -73,9 +77,10 @@ Artifacts:
|
|
| 73 |
- reward shaping
|
| 74 |
- manual playtesting
|
| 75 |
5. Mark open assumptions explicitly:
|
| 76 |
-
- whether the
|
| 77 |
- whether the fixed step budget is enough
|
| 78 |
- whether `restore_best` is useful without becoming an exploit
|
|
|
|
| 79 |
|
| 80 |
Exit condition: a human can read the spec and understand how to act in the environment.
|
| 81 |
|
|
@@ -90,34 +95,41 @@ Transition rule:
|
|
| 90 |
|
| 91 |
## Hour 2-4: Verify Wiring, Then Manual Playtest
|
| 92 |
|
| 93 |
-
1.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 94 |
- known-good or near-winning design
|
| 95 |
- near-boundary designs
|
| 96 |
- clearly bad designs
|
| 97 |
-
- do not rely on the default baseline params as the only starting point
|
| 98 |
-
|
| 99 |
- verifier outputs are sane
|
| 100 |
- reward ordering is sane
|
| 101 |
- objective direction is correct
|
| 102 |
-
|
| 103 |
-
|
| 104 |
- observation
|
| 105 |
- chosen action
|
| 106 |
- expected effect
|
| 107 |
- returned reward
|
| 108 |
- confusion or exploit if observed
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
- initial reward V0
|
| 113 |
- bad behavior
|
| 114 |
- refinement to reward V1
|
| 115 |
- improved behavior
|
| 116 |
-
|
| 117 |
|
| 118 |
Exit condition: you can explain why the environment now rewards the intended behavior.
|
| 119 |
|
| 120 |
Artifacts:
|
|
|
|
|
|
|
|
|
|
| 121 |
- fixture check note
|
| 122 |
- manual playtest log
|
| 123 |
- reward shaping note
|
|
@@ -205,16 +217,17 @@ Artifacts:

## Artifact Order

1. Environment spec

## Non-Negotiables

@@ -223,6 +236,7 @@ Artifacts:

- Do not optimize training before manual playtesting.
- Do not rely on reward curves alone; keep trajectory evidence.
- Do not narrate hypotheses as facts before they are checked.
- Do not polish the repo or video before the environment and baselines are real.
- Treat judge comments as pressure toward clarity and reproducibility, not broader unsupported claims.
- Do not force a training-centric story if the strongest evidence is environment quality plus baselines.

## Current Branch Status

- [x] `P1` task is locked
- [x] 3-knob rotating-ellipse `P1` contract is implemented in the working tree
- [x] baselines and API surface have been moved to the `P1` contract
- [x] add a post-terminal guard in `step()`
- [x] replace the synthetic evaluator with `constellaration`
- [x] re-run baselines on the real verifier path
- [x] commit the Northflank smoke workflow and note
- [x] pass the Northflank smoke test on the team H100
- [x] verify that the current 3-knob family is blocked on P1 triangularity under the real verifier path
- [ ] repair the low-dimensional parameterization
- [ ] add explicit VMEC failure semantics
- [ ] label low-fi `run` truth vs high-fi `submit` truth in the task surface
- [ ] add tracked fixtures and manual playtest evidence
- [ ] refresh the heuristic baseline after the real-verifier rerun

Current caution:

- do not assume the current 3-knob family is a viable playtest start; parameterization repair comes before fixture discovery, manual playtesting, and heuristic refresh

## Plan V2 Inheritance

   - reward shaping
   - manual playtesting
5. Mark open assumptions explicitly:
   - whether the repaired low-dimensional action set is expressive enough
   - whether the fixed step budget is enough
   - whether `restore_best` is useful without becoming an exploit
   - whether repaired-family ranges and deltas need adjustment after measurement

Exit condition: a human can read the spec and understand how to act in the environment.

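The checklist item about labeling low-fi `run` truth vs high-fi `submit` truth can be sketched as a tagging rule on the task surface, so the two evaluation paths are never conflated in observations. This is an illustrative sketch only: the `label_metrics` helper, the `fidelity` field, and the tag strings are assumptions, not the repo's actual API.

```python
# Hypothetical observation tagging; field names are assumptions.
def label_metrics(metrics, fidelity):
    """Tag every surfaced metric with the evaluation path that
    produced it, so low-fi `run` truth is never mistaken for
    high-fi `submit` truth."""
    assert fidelity in {"low_fi_run", "high_fi_submit"}
    return {"fidelity": fidelity, "metrics": metrics}

# Toy payloads standing in for real verifier outputs.
run_obs = label_metrics({"p1_feasibility": 1.01}, "low_fi_run")
submit_obs = label_metrics({"p1_feasibility": 0.98}, "high_fi_submit")
print(run_obs["fidelity"], submit_obs["fidelity"])  # prints "low_fi_run high_fi_submit"
```

Keeping the tag in every payload (rather than in docs alone) means downstream baselines and logs carry the fidelity label for free.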
## Hour 2-4: Verify Wiring, Then Manual Playtest

1. Repair the low-dimensional parameterization so triangularity is controllable.
2. Add explicit VMEC failure semantics and visible failure observations.
3. Label low-fi `run` truth vs high-fi `submit` truth clearly.
4. Run a small measured sweep on the repaired family before freezing defaults.
5. Run fixture checks:
   - known-good or near-winning design
   - near-boundary designs
   - clearly bad designs
   - do not rely on the current default baseline params as the only starting point
6. Confirm:
   - verifier outputs are sane
   - reward ordering is sane
   - objective direction is correct
7. Manually play 5 to 10 episodes.
8. Log for each step:
   - observation
   - chosen action
   - expected effect
   - returned reward
   - confusion or exploit if observed
9. Identify at least one bad incentive or exploit.
10. Patch reward or penalty logic immediately.
11. Write the reward shaping story:
    - initial reward V0
    - bad behavior
    - refinement to reward V1
    - improved behavior
12. If no real pathology appears, record that `Reward V0` survived playtesting and move on.

Exit condition: you can explain why the environment now rewards the intended behavior.

Artifacts:

- repaired low-dimensional boundary plan
- explicit failure semantics note
- measured range and delta note
- fixture check note
- manual playtest log
- reward shaping note

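The fixture and confirmation steps above can be sketched as a small sanity harness. Everything here is illustrative: the fixture labels, the toy scores, and the `check_reward_ordering` helper are assumptions standing in for real verifier outputs, not the repo's actual code.

```python
# Hypothetical sanity harness for the fixture checks above.
# Labels and scores are stand-ins, not real verifier data.

def check_reward_ordering(scored_fixtures):
    """Assert that every known-good design outscores every
    near-boundary design, which in turn outscores every bad one."""
    good = min(s for label, s in scored_fixtures if label == "good")
    near = max(s for label, s in scored_fixtures if label == "near_boundary")
    bad = max(s for label, s in scored_fixtures if label == "bad")
    assert good > near > bad, f"reward ordering broken: {good=} {near=} {bad=}"
    return True

# Toy scores: one known-good, two near-boundary, one clearly bad design.
fixtures = [
    ("good", 0.92),
    ("near_boundary", 0.55),
    ("near_boundary", 0.48),
    ("bad", 0.10),
]
print(check_reward_ordering(fixtures))  # prints "True" when ordering is sane
```

Running a check like this against tracked fixtures before playtesting catches inverted objectives and scoring bugs cheaply.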
## Artifact Order

1. Environment spec
2. Repaired parameterization note
3. Fixture check note
4. Manual playtest log
5. Reward revision note
6. Stable task run
7. Random baseline
8. Heuristic baseline
9. Northflank traces or training evidence
10. Colab training or eval evidence
11. Demo recording
12. Repo polish

## Non-Negotiables

- Do not optimize training before manual playtesting.
- Do not rely on reward curves alone; keep trajectory evidence.
- Do not narrate hypotheses as facts before they are checked.
- Do not guess repaired-family ranges, deltas, or budget changes without a measured sweep.
- Do not polish the repo or video before the environment and baselines are real.
- Treat judge comments as pressure toward clarity and reproducibility, not broader unsupported claims.
- Do not force a training-centric story if the strongest evidence is environment quality plus baselines.

docs/P1_ENV_CONTRACT_V1.md
CHANGED

@@ -72,6 +72,7 @@ The verifier layer should own:

- official `P1` feasibility semantics
- official `P1` objective direction
- score ordering

The verifier layer should not own:

@@ -91,6 +92,12 @@ Target controllable knobs:

- `rotational_transform`
- `triangularity_scale`

Important naming rule:

- once triangularity is injected explicitly, stop describing the family as plain upstream “rotating ellipse”

@@ -168,6 +175,8 @@ Do not add:

- bonuses for matching a known winner
- hand-coded constraint tricks to hide a blocked action family

## Reset Strategy

Start with frozen exact seeds, not jitter.

@@ -209,9 +218,11 @@ before tuning reward further

3. Update the action and state schema in [fusion_lab/models.py](../fusion_lab/models.py).
4. Update the episode loop and observation labeling in [server/environment.py](../server/environment.py).
5. Update the task summary in [server/app.py](../server/app.py).

## Out of Scope

- official `P1` feasibility semantics
- official `P1` objective direction
- score ordering
- explicit failure results when VMEC or forward-model evaluation fails

The verifier layer should not own:

- `rotational_transform`
- `triangularity_scale`

Current measurement rule:

- do not lock exact repaired-family ranges or deltas from prose alone
- measure them on the repaired boundary family before presenting them as defaults
- especially treat `rotational_transform` bounds, `triangularity_scale` deltas, and budget changes as open until measured

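The measurement rule above can be made concrete with a small sweep harness: grid the candidate knobs, record feasibility per point, and only then propose ranges as defaults. The grids, the feasibility rule, and the `evaluate_boundary` stand-in below are assumptions for illustration; the real sweep would call the low-fidelity verifier path on the repaired boundary family.

```python
import itertools

# Hypothetical stand-in for the low-fidelity verifier; the real code
# would evaluate the repaired boundary family instead of this toy rule.
def evaluate_boundary(rotational_transform, triangularity_scale):
    feasible = abs(triangularity_scale) > 0.1  # toy feasibility rule
    return {"feasible": feasible, "triangularity": 0.5 * triangularity_scale}

# Candidate ranges are open per the rule above, so sweep a coarse
# grid and let the measurements, not prose, pick the defaults.
rt_grid = [0.1, 0.2, 0.3]
tri_grid = [-0.4, -0.2, 0.0, 0.2, 0.4]

results = [
    (rt, tri, evaluate_boundary(rt, tri))
    for rt, tri in itertools.product(rt_grid, tri_grid)
]
feasible = [(rt, tri) for rt, tri, r in results if r["feasible"]]
print(f"{len(feasible)}/{len(results)} grid points feasible")  # prints "12/15 grid points feasible"
```

Persisting the full `results` table (not just the chosen bounds) keeps the range decision auditable later.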
Important naming rule:

- once triangularity is injected explicitly, stop describing the family as plain upstream “rotating ellipse”

- bonuses for matching a known winner
- hand-coded constraint tricks to hide a blocked action family

Do not use reward complexity to compensate for missing action expressivity or missing crash semantics.

## Reset Strategy

Start with frozen exact seeds, not jitter.

3. Update the action and state schema in [fusion_lab/models.py](../fusion_lab/models.py).
4. Update the episode loop and observation labeling in [server/environment.py](../server/environment.py).
5. Update the task summary in [server/app.py](../server/app.py).
6. Add explicit VMEC failure semantics in [server/environment.py](../server/environment.py).
7. Run a small measured sweep to choose ranges, deltas, and reset seeds.
8. Freeze 1-2 repaired low-dimensional fixtures.
9. Run manual playtesting.
10. Refresh the heuristic baseline only after that evidence exists.

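Explicit VMEC failure semantics could look roughly like the following: the solver crash is caught at the environment boundary and surfaced as a visible failure observation with a penalty, rather than an opaque exception or a silent default score. The `vmec_failed` status string, the penalty value, and the `run_vmec` stand-in are assumptions for illustration, not the environment's actual code.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: dict
    reward: float
    terminated: bool

FAILURE_PENALTY = -1.0  # assumed value; tune after playtesting

def run_vmec(boundary):
    # Stand-in for the real forward model, which can fail to
    # converge on ill-posed boundaries instead of returning metrics.
    if boundary.get("ill_posed"):
        raise RuntimeError("VMEC did not converge")
    return {"feasibility": 0.9}

def step_with_failure_semantics(boundary):
    """Surface solver crashes as an explicit, labeled observation
    so the agent can see and learn from failed evaluations."""
    try:
        metrics = run_vmec(boundary)
    except RuntimeError as exc:
        return StepResult(
            observation={"status": "vmec_failed", "detail": str(exc)},
            reward=FAILURE_PENALTY,
            terminated=False,
        )
    return StepResult(
        observation={"status": "ok", **metrics},
        reward=metrics["feasibility"],
        terminated=False,
    )

print(step_with_failure_semantics({"ill_posed": True}).observation["status"])  # prints "vmec_failed"
```

Keeping the failure on the normal `StepResult` path means the post-terminal guard and episode loop need no special cases for solver crashes.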
## Out of Scope