CreativeEngineer committed on
Commit
918007b
·
1 Parent(s): 1ff8c37

docs: correct baseline fixture status

README.md CHANGED
@@ -42,7 +42,7 @@ Implementation status:
42
 
43
  ## Known Gaps
44
 
45
- - `BASELINE_PARAMS` is intentionally repairable but currently infeasible at reset; do not describe it as a feasible anchor.
46
  - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
47
  - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
48
  - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after manual playtesting.
 
42
 
43
  ## Known Gaps
44
 
45
+ - `BASELINE_PARAMS` is not a near-feasible anchor on the real verifier path. The current low-fidelity measurement is roughly `p1_feasibility=1.01`, `average_triangularity=+0.005`, and `edge_iota_over_nfp=0.059`, so fixture discovery has to happen before meaningful manual playtesting.
46
  - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
47
  - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
48
  - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after manual playtesting.
TODO.md CHANGED
@@ -99,6 +99,8 @@ flowchart TD
99
  Files:
100
  [server/data/p1/README.md](server/data/p1/README.md),
101
  [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
 
 
102
 
103
  - [ ] Run fixture sanity checks
104
  Goal:
@@ -190,5 +192,5 @@ flowchart TD
190
  - [ ] Do not let notebook or demo work outrun environment evidence
191
  - [ ] Do not add training-first complexity before manual playtesting
192
  - [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
193
- - [ ] Do not describe the current baseline reset state as already feasible
194
  - [ ] Do not force a `Reward V1` story if `Reward V0` survives manual playtesting
 
99
  Files:
100
  [server/data/p1/README.md](server/data/p1/README.md),
101
  [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
102
+ Note:
103
+ the default baseline params are not near-feasible on the real verifier path, so they are not enough for the fixture set by themselves
104
 
105
  - [ ] Run fixture sanity checks
106
  Goal:
 
192
  - [ ] Do not let notebook or demo work outrun environment evidence
193
  - [ ] Do not add training-first complexity before manual playtesting
194
  - [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
195
+ - [ ] Do not describe the current baseline reset state as feasible or near-feasible
196
  - [ ] Do not force a `Reward V1` story if `Reward V0` survives manual playtesting
baselines/README.md CHANGED
@@ -7,6 +7,7 @@ Random and heuristic baselines will live here.
7
  - [x] baseline comparison script exists
8
  - [x] baseline comparison rerun completed on the real verifier path
9
  - [ ] heuristic refreshed after the real-verifier rerun
 
10
  - [ ] presentation-ready comparison trace exported
11
 
12
  The first baseline milestone is:
 
7
  - [x] baseline comparison script exists
8
  - [x] baseline comparison rerun completed on the real verifier path
9
  - [ ] heuristic refreshed after the real-verifier rerun
10
+ - [ ] near-boundary fixture-backed baseline start chosen for manual playtesting
11
  - [ ] presentation-ready comparison trace exported
12
 
13
  The first baseline milestone is:
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -17,6 +17,10 @@
17
  - [ ] heuristic baseline is refreshed for the real verifier path
18
  - [ ] HF Space deployment evidence is recorded
19
 
 
 
 
 
20
  ## 1. Submission Thesis
21
 
22
  We are not primarily submitting "a trained model for fusion."
@@ -323,6 +327,8 @@ Use:
323
  - a few near-boundary designs
324
  - a few clearly infeasible designs
325
 
 
 
326
  Purpose:
327
 
328
  - verify the verifier is wired correctly
 
17
  - [ ] heuristic baseline is refreshed for the real verifier path
18
  - [ ] HF Space deployment evidence is recorded
19
 
20
+ Current caution:
21
+
22
+ - the default baseline params are not currently a near-feasible playtest anchor on the real verifier path, so fixture discovery is a real prerequisite for meaningful manual playtesting
23
+
24
  ## 1. Submission Thesis
25
 
26
  We are not primarily submitting "a trained model for fusion."
 
327
  - a few near-boundary designs
328
  - a few clearly infeasible designs
329
 
330
+ Do not assume the default baseline params are enough for this set. They are currently useful as an infeasible reference, not as a near-feasible anchor.
331
+
332
  Purpose:
333
 
334
  - verify the verifier is wired correctly
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED
@@ -17,6 +17,10 @@ Do not expand scope beyond one stable task. Training is supporting evidence, not
17
  - [ ] add tracked fixtures and manual playtest evidence
18
  - [ ] refresh the heuristic baseline after the real-verifier rerun
19
 
 
 
 
 
20
  ## Plan V2 Inheritance
21
 
22
  Carry these rules through the whole checklist:
@@ -87,6 +91,7 @@ Transition rule:
87
  - known-good or near-winning design
88
  - near-boundary designs
89
  - clearly bad designs
 
90
  2. Confirm:
91
  - verifier outputs are sane
92
  - reward ordering is sane
 
17
  - [ ] add tracked fixtures and manual playtest evidence
18
  - [ ] refresh the heuristic baseline after the real-verifier rerun
19
 
20
+ Current caution:
21
+
22
+ - do not assume the default baseline params are a near-feasible playtest start; on the current real verifier path they are still deeply infeasible, so fixture discovery comes first
23
+
24
  ## Plan V2 Inheritance
25
 
26
  Carry these rules through the whole checklist:
 
91
  - known-good or near-winning design
92
  - near-boundary designs
93
  - clearly bad designs
94
+ - do not rely on the default baseline params as the only starting point
95
  2. Confirm:
96
  - verifier outputs are sane
97
  - reward ordering is sane
docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED
@@ -15,6 +15,10 @@ Use this file as rationale for the pivot, not as a fresh planning queue. Once th
15
  - [ ] manual playtest evidence is recorded
16
  - [ ] heuristic baseline is refreshed for the real verifier path
17
 
 
 
 
 
18
  ## Decision
19
 
20
  Pivot the OpenEnv environment to use the official ConStellaration P1 benchmark with real VMEC physics, scoped to the rotating-ellipse low-dimensional parameter space.
@@ -239,8 +243,9 @@ If full high-fidelity `constellaration` deployment fails (Docker build, HF Space
239
  Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
240
 
241
  1. **Repairable baseline anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — intentionally infeasible at reset but close enough to support short repair/improvement episodes
 
242
  2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
243
- 3. **Baseline comparison:** add only if manual playtesting shows a second start state is useful
244
 
245
  These are for verifier/reward sanity, not a prerequisite seed-mining project.
246
 
 
15
  - [ ] manual playtest evidence is recorded
16
  - [ ] heuristic baseline is refreshed for the real verifier path
17
 
18
+ Current caution:
19
+
20
+ - the default rotating-ellipse baseline params are currently useful as an infeasible reference, not as a near-feasible anchor, so the fixture set still needs a better boundary-region map
21
+
22
  ## Decision
23
 
24
  Pivot the OpenEnv environment to use the official ConStellaration P1 benchmark with real VMEC physics, scoped to the rotating-ellipse low-dimensional parameter space.
 
243
  Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
244
 
245
  1. **Repairable baseline anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — intentionally infeasible at reset but close enough to support short repair/improvement episodes
246
+ 1. **Current default baseline reference:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — currently deeply infeasible on the real verifier path; keep as a negative or repair reference only
247
  2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
248
+ 3. **Near-boundary anchor:** still needs to be found from real verifier probing before manual playtesting
249
 
250
  These are for verifier/reward sanity, not a prerequisite seed-mining project.
251
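
The notes in this diff say that a near-boundary anchor "still needs to be found from real verifier probing". A minimal sketch of such a probe, under stated assumptions, might look like the following. `evaluate` is a hypothetical stand-in for whatever low-fidelity verifier call the environment exposes, and the two cut-offs are illustrative values that do not come from the repository. Only the three parameter names (`aspect_ratio`, `elongation`, `rotational_transform`) and the three fixture categories come from the docs above.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# (aspect_ratio, elongation, rotational_transform), as named in the pivot doc.
Params = Tuple[float, float, float]

# Hypothetical cut-offs for bucketing a feasibility value; tune against the real verifier.
FEASIBLE_MAX = 0.01
NEAR_BOUNDARY_MAX = 0.10


def classify(feasibility: float) -> str:
    """Map a feasibility value (lower is better) to a fixture category."""
    if feasibility <= FEASIBLE_MAX:
        return "feasible"
    if feasibility <= NEAR_BOUNDARY_MAX:
        return "near_boundary"
    return "infeasible"


def sweep(
    evaluate: Callable[[Params], float],
    aspect_ratios: List[float],
    elongations: List[float],
    rotational_transforms: List[float],
) -> Dict[str, List[Tuple[Params, float]]]:
    """Evaluate every grid point and group the results by category.

    `evaluate` is an assumed callable that returns a feasibility value for
    one parameter triple; it is not an API from the repository.
    """
    buckets: Dict[str, List[Tuple[Params, float]]] = {
        "feasible": [],
        "near_boundary": [],
        "infeasible": [],
    }
    for params in product(aspect_ratios, elongations, rotational_transforms):
        score = evaluate(params)
        buckets[classify(score)].append((params, score))
    # Sort each bucket so the most promising candidates come first.
    for rows in buckets.values():
        rows.sort(key=lambda row: row[1])
    return buckets
```

With these illustrative cut-offs, the `p1_feasibility=1.01` reading quoted in the README hunk would land in the `infeasible` bucket, which matches the diff's description of the default baseline as an infeasible reference rather than a near-feasible anchor.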