# Fusion Design Lab TODO

This is the execution tracker for the hackathon repo.

Use this file for day-of build progress. Use the linked docs for rationale, contract truth, and submission framing:

- [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
- [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
- [P1 Parameterization Deep-Dive](docs/P1_PARAMETERIZATION_DEEPDIVE.md)
- [Repo Guardrails](AGENTS.md)

Archived legacy references:

- [P1 Pivot Record](docs/archive/PIVOT_P1_ROTATING_ELLIPSE.md)
- [Deliverables Map](docs/archive/FUSION_DELIVERABLES_MAP.md)
- [Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)

Priority source:

- [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md) is the planning SSOT
- [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md) is the technical contract SSOT
- [P1 Parameterization Deep-Dive](docs/P1_PARAMETERIZATION_DEEPDIVE.md) is the evidence and rationale record
- this file should track execution progress only

## Current State

- [x] `P1` strategy is locked
- [x] shared models reflect the repaired low-dimensional `P1` contract
- [x] environment loop reflects the repaired low-dimensional `P1` contract
- [x] API/task surface reflects `P1`
- [x] baselines reflect the `P1` contract
- [x] repo docs call out the low-fi/high-fi `constellaration` split honestly
- [x] post-terminal guard in `step()`
- [x] `constellaration` verifier wiring
- [x] verify the current 3-knob family against the real low-fidelity verifier
- [x] repair the low-dimensional parameterization so triangularity is controllable
- [x] split boundary building from boundary evaluation
- [x] update the action schema from 3 knobs to the repaired low-dimensional family
- [x] add explicit VMEC failure semantics
- [x] label low-fi vs high-fi truth in the observation/task surface
- [x] separate high-fi submit scoring/reporting from low-fi rollout score state
- [x] tracked `P1` fixtures
- [x] manual playtest log
- [x] settle the non-submit terminal reward policy
- [x] baseline comparison has been re-run on the `constellaration` branch state
- [x] tiny low-fi PPO smoke run exists
  Note:
  `training/ppo_smoke.py` now runs a diagnostic-only low-fidelity PPO smoke pass and the first artifact is summarized in `docs/P1_PPO_SMOKE_NOTE.md`
- [x] refresh the heuristic baseline for the real verifier path
  Note:
  the refreshed heuristic now uses the measured `rotational_transform -> triangularity_scale -> elongation -> submit` path; a fresh `uv run python baselines/compare.py 5` rerun finished at `5/5` feasible high-fidelity finals and `5/5` wins over random

## Execution Graph

```mermaid
flowchart TD
    A["Northflank Smoke Test"] --> E["Fixture Checks"]
    B["P1 Contract Lock"] --> D["P1 Models + Environment"]
    C["constellaration Physics Wiring"] --> D
    D --> P["Parameterization Repair"]
    P --> F["Tiny PPO Smoke"]
    F --> E["Fixture Checks"]
    E --> G["Submit-side Manual Playtest"]
    G --> H["Reward V2"]
    H --> I["Baselines"]
    I --> J["HF Space Deploy"]
    J --> K["Colab Notebook"]
    K --> L["Demo + README"]
```

## Hour 0-2

- [x] Lock the exact `P1` environment contract
  Goal:
  freeze observation schema, action schema, episode loop, terminal conditions, and the live reward contract
  Related:
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
  [Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)

- [x] Pass the Northflank smoke test
  Related:
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
  [Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md),
  [training/notebooks/README.md](training/notebooks/README.md)

- [x] Verify whether the original 3-knob family can approach P1 feasibility
  Goal:
  resolve the historical gating question about whether parameterization repair was required before more reward work
  Related:
  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md),
  [P1 Pivot Record](docs/archive/PIVOT_P1_ROTATING_ELLIPSE.md)

## Fresh Wiring

- [x] Rewrite the shared models to the locked `P1` contract
  Files:
  [fusion_lab/models.py](fusion_lab/models.py),
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)

- [x] Rewrite the environment loop to the locked `P1` contract
  Files:
  [server/environment.py](server/environment.py),
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
  [P1 Pivot Record](docs/archive/PIVOT_P1_ROTATING_ELLIPSE.md)

- [x] Add a post-terminal guard to the environment loop
  Files:
  [server/environment.py](server/environment.py)
  Goal:
  reject or no-op any `step()` call after terminal state so budget and step count do not drift past episode end

- [x] Replace the synthetic physics path with `constellaration` wiring
  Files:
  [server/physics.py](server/physics.py),
  [Dockerfile](Dockerfile),
  [pyproject.toml](pyproject.toml)

- [x] Update the API/task surface to match `P1`
  Files:
  [server/app.py](server/app.py),
  [README.md](README.md)

- [x] Repair the low-dimensional boundary family
  Goal:
  add an explicit triangularity control knob or equivalent low-dimensional control so the environment can actually approach P1 feasibility
  Files:
  [server/physics.py](server/physics.py),
  [fusion_lab/models.py](fusion_lab/models.py),
  [server/environment.py](server/environment.py),
  [server/app.py](server/app.py)
  Related:
  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)

- [x] Split boundary construction from boundary evaluation
  Goal:
  make the verifier boundary-based and keep parameterization-specific logic in the environment adapter layer
  Files:
  [server/physics.py](server/physics.py)
  Related:
  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)

- [x] Add explicit VMEC failure semantics
  Goal:
  failed evaluations must cost budget, return a visible failure observation, and apply a documented penalty without silent fallback
  Files:
  [server/physics.py](server/physics.py),
  [server/environment.py](server/environment.py)
  Related:
  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)

- [x] Label low-fi vs high-fi truth in the observation/task surface
  Goal:
  make it obvious whether a metric came from a low-fidelity `run` step or a high-fidelity `submit`
  Files:
  [fusion_lab/models.py](fusion_lab/models.py),
  [server/environment.py](server/environment.py),
  [server/app.py](server/app.py)
  Related:
  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)

- [x] Separate high-fi submit scoring/reporting from low-fi rollout score state
  Completed:
  submit-time reward now uses a high-fidelity initial reference, and submit summaries / displayed best score use high-fidelity state instead of low-fidelity rollout state
  Files:
  [server/environment.py](server/environment.py),
  [fusion_lab/models.py](fusion_lab/models.py)
  Related:
  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
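The post-terminal guard, VMEC failure semantics, and low-fi/high-fi labeling items above all touch the same `step()` path. The following is a minimal sketch of how they can coexist, assuming hypothetical names (`ToyEpisode`, `StepResult`, `EvaluationFailed`, `FAILURE_PENALTY`) and a stand-in `evaluate` callable. It is not the code in `server/environment.py`, and it omits the real reward shaping.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical penalty value; the real one is documented in the P1 contract.
FAILURE_PENALTY = -1.0


class EvaluationFailed(Exception):
    """Hypothetical stand-in for a VMEC evaluation failure."""


@dataclass
class StepResult:
    reward: float
    done: bool
    failed: bool
    fidelity: Optional[str]  # "low" for run steps, "high" for submit, None for no-ops
    score: Optional[float]
    budget_remaining: int
    step_count: int


@dataclass
class ToyEpisode:
    evaluate: Callable[[str], float]  # hypothetical verifier: action -> score
    budget: int = 6
    step_count: int = 0
    done: bool = False

    def step(self, action: str) -> StepResult:
        # Post-terminal guard: no-op, so budget and step count cannot drift.
        if self.done:
            return StepResult(reward=0.0, done=True, failed=False, fidelity=None,
                              score=None, budget_remaining=self.budget,
                              step_count=self.step_count)

        self.step_count += 1
        self.budget -= 1  # every evaluation costs budget, including failed ones
        fidelity = "high" if action == "submit" else "low"
        try:
            score = self.evaluate(action)
        except EvaluationFailed:
            # Visible failure: explicit flag and penalty, no silent fallback score.
            # In this sketch a failed submit still ends the episode.
            self.done = action == "submit" or self.budget <= 0
            return StepResult(reward=FAILURE_PENALTY, done=self.done, failed=True,
                              fidelity=fidelity, score=None,
                              budget_remaining=self.budget,
                              step_count=self.step_count)

        self.done = action == "submit" or self.budget <= 0
        # Illustration only: the raw score stands in for the shaped reward.
        return StepResult(reward=score, done=self.done, failed=False,
                          fidelity=fidelity, score=score,
                          budget_remaining=self.budget,
                          step_count=self.step_count)
```

Calling `step()` again after `submit` returns the same budget and step count, which is the drift the guard is meant to prevent; a failed evaluation still consumes budget and is flagged rather than replaced with a fallback score.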

## Validation and Reward

- [x] Run a small measured sweep on the repaired low-dimensional family
  Goal:
  choose useful parameter ranges, step deltas, and reset seeds from the repaired action family instead of guessing them from prose
  Related:
  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)

- [x] Clarify or split fidelity-dependent best-state observation fields
  Goal:
  replace ambiguous mixed best-state reporting with explicit low-fidelity and high-fidelity best-state fields before fixture evidence or baseline comparisons
  Related:
  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)

- [x] Add 1-2 tracked `P1` fixtures
  Files:
  [server/data/p1/README.md](server/data/p1/README.md),
  [P1 Pivot Record](docs/archive/PIVOT_P1_ROTATING_ELLIPSE.md)
  Note:
  paired high-fidelity submit checks are now written into each tracked fixture and summarized in `baselines/fixture_high_fidelity_pairs.json`

- [x] Run fixture sanity checks
  Goal:
  confirm paired low-fi/high-fi verifier outputs, objective direction, and reward ordering
  Related:
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
  [Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)

- [x] Run a tiny low-fi PPO smoke pass
  Goal:
  fail quickly on learnability, reward exploits, and action-space problems before investing in longer training
  Note:
  treat this as a smoke test, not as proof that the terminal `submit` contract is already validated
  stop after a few readable trajectories or one clear failure mode
  paired high-fidelity fixture checks must happen immediately after this smoke pass
  Status:
  first smoke artifact exists; rerun this step only if a follow-up reward or observation change needs re-checking
  high-fidelity VMEC-backed `submit` should stay out of the normal RL inner loop

- [ ] Manual-playtest 5-10 episodes
  Goal:
  start with one submit-side trace, then expand the initial low-fidelity playtest note into 5-10 episodes and surface at least one pathology or ambiguity
  Related:
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
  [Deliverables Map](docs/archive/FUSION_DELIVERABLES_MAP.md)

- [x] Update reward from `V0` to `V1` after playtesting exposed a real repair-path pathology
  Goal:
  keep a short exploit -> fix -> behavior improvement story
  Related:
  [AGENTS.md](AGENTS.md),
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)

- [x] Update reward from `V1` to `V2` after the verifier-native shaping exposed short-horizon gaps
  Goal:
  add bounded new-best, near-feasible, and anti-stagnation terms without breaking the verifier-native reward story
  Related:
  [AGENTS.md](AGENTS.md),
  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)

- [x] Write down why `Reward V0` did not survive unchanged
  Goal:
  document the concrete pathology: pure `Δ official_feasibility` hid useful non-dominant repairs because official feasibility is a max over normalized constraint violations
  Related:
  [README.md](README.md),
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)

- [x] Decide the non-submit terminal reward policy
  Goal:
  budget exhaustion now yields a smaller end-of-episode reward than `submit`, so non-submitting agents still get terminal feedback without outranking explicit submit behavior
  Files:
  [server/environment.py](server/environment.py),
  [README.md](README.md)
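The `Reward V0` pathology recorded above can be reproduced with a few numbers. A minimal sketch, assuming illustrative constraint names and normalized violation values (not measured `P1` data): because official feasibility is the max over normalized constraint violations, repairing a non-dominant constraint leaves the max, and therefore `Δ official_feasibility`, unchanged.

```python
from typing import Mapping


def official_feasibility(violations: Mapping[str, float]) -> float:
    # Worst (largest) normalized violation; 0.0 would mean every constraint is met.
    return max(violations.values())


def reward_v0(before: Mapping[str, float], after: Mapping[str, float]) -> float:
    # Reward V0 as described above: pure improvement in official feasibility.
    return official_feasibility(before) - official_feasibility(after)


# Illustrative values only; triangularity is the dominant violation.
before = {"aspect_ratio": 0.10, "average_triangularity": 0.40, "edge_rotational_transform": 0.25}
# Non-dominant repair: rotational transform improves by 0.20, the max does not move.
after = {"aspect_ratio": 0.10, "average_triangularity": 0.40, "edge_rotational_transform": 0.05}

print(reward_v0(before, after))  # 0.0
```

Only a change to the dominant constraint moves the signal, so an agent rewarded by V0 gets no credit for useful intermediate repairs.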

## Baselines

- [x] Implement the random baseline
  Files:
  [baselines/random_agent.py](baselines/random_agent.py),
  [baselines/compare.py](baselines/compare.py)

- [x] Implement the heuristic baseline
  Files:
  [baselines/heuristic_agent.py](baselines/heuristic_agent.py),
  [baselines/compare.py](baselines/compare.py)

- [x] Run the baseline comparison on the current `constellaration` branch state
  Files:
  [baselines/compare.py](baselines/compare.py)

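- [x] Refresh the heuristic baseline after the `constellaration` rerun
  Completed:
  the refreshed heuristic follows the measured `rotational_transform -> triangularity_scale -> elongation -> submit` path; a fresh `uv run python baselines/compare.py 5` rerun finished at `5/5` feasible high-fidelity finals and `5/5` wins over random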

- [ ] Save one comparison trace that is presentation-ready
  Goal:
  show at least one stable trajectory and one heuristic-vs-random comparison

## Submission Surfaces

- [ ] Deploy the environment to HF Space
  Related:
  [Deliverables Map](docs/archive/FUSION_DELIVERABLES_MAP.md),
  [README.md](README.md)

- [ ] Create the thin public Colab notebook
  Files:
  [training/notebooks/README.md](training/notebooks/README.md)

- [ ] Record the 1-minute demo
  Goal:
  explain `P1`, show one trajectory, show reward iteration, show baseline evidence

- [ ] Finalize the public README
  Files:
  [README.md](README.md)

- [ ] Only treat training evidence as submission-ready if low-fidelity gains survive sparse high-fidelity evaluation
  Related:
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
  [Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)

## Guardrails

- [ ] Do not reopen `P1 + rotating-ellipse` strategy without a real blocker
- [ ] Do not pretend the original 3-knob family is sufficient for P1 after the verified triangularity blocker
- [ ] Do not guess repaired-family ranges, deltas, or budget changes without measurement
- [ ] Do not port the old `ai-sci-feasible-designs` harness
- [ ] Do not let notebook or demo work outrun environment evidence
- [ ] Do not let tiny low-fi smoke training replace paired high-fidelity checks or submit-side manual playtesting
- [ ] Do not move high-fidelity VMEC-backed `submit` into the normal RL inner loop
- [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
- [x] Do not compare high-fidelity submit scores against low-fidelity best/initial score state in the final story
- [ ] Do not describe the current baseline reset state as feasible or near-feasible
- [x] Do not force a new reward-version story until the previous reward version shows a real pathology
  Note:
  completed by recording the concrete `Reward V0` pathology before `Reward V1`, then recording the concrete short-horizon `Reward V1` gaps before `Reward V2`