CreativeEngineer/fusion-design-lab

CreativeEngineer committed 10 days ago

Commit

918007b

1 Parent(s): 1ff8c37

docs: correct baseline fixture status

Files changed (6)

README.md +1 -1
TODO.md +3 -1
baselines/README.md +1 -0
docs/FUSION_DESIGN_LAB_PLAN_V2.md +6 -0
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md +5 -0
docs/PIVOT_P1_ROTATING_ELLIPSE.md +6 -1

README.md CHANGED

@@ -42,7 +42,7 @@ Implementation status:
 ## Known Gaps
-- `BASELINE_PARAMS` is intentionally repairable but currently infeasible at reset; do not describe it as a feasible anchor.
+- `BASELINE_PARAMS` is not a near-feasible anchor on the real verifier path. The current low-fidelity measurement is roughly `p1_feasibility=1.01`, `average_triangularity=+0.005`, and `edge_iota_over_nfp=0.059`, so fixture discovery has to happen before meaningful manual playtesting.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after manual playtesting.
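
As a rough consistency check on the measurement quoted in the new Known Gaps bullet, the sketch below recomputes a normalized constraint violation from the two reported metrics. The thresholds (`average_triangularity <= -0.5`, `edge_iota_over_nfp >= 0.3`), the normalization by threshold magnitude, and the helper name `normalized_violations` are assumptions drawn from the public P1 problem description, not from this commit. The aspect-ratio constraint is left out because no aspect-ratio measurement is quoted.

```python
# Hypothetical helper: thresholds and normalization are assumptions,
# not values taken from this repository.
TRIANGULARITY_MAX = -0.5  # assumed upper bound on average_triangularity
EDGE_IOTA_MIN = 0.3       # assumed lower bound on edge_iota_over_nfp


def normalized_violations(
    average_triangularity: float, edge_iota_over_nfp: float
) -> dict[str, float]:
    """Return per-constraint violations scaled by threshold magnitude (0.0 = satisfied)."""
    return {
        "average_triangularity": max(
            0.0, (average_triangularity - TRIANGULARITY_MAX) / abs(TRIANGULARITY_MAX)
        ),
        "edge_iota_over_nfp": max(
            0.0, (EDGE_IOTA_MIN - edge_iota_over_nfp) / abs(EDGE_IOTA_MIN)
        ),
    }


# Values quoted in the README bullet for the default baseline params.
violations = normalized_violations(average_triangularity=0.005, edge_iota_over_nfp=0.059)
worst = max(violations.values())
print(violations, worst)
```

Under these assumptions the largest violation comes out at about 1.01, which lines up with the quoted `p1_feasibility` and would suggest that triangularity, not edge iota, is the dominant gap for the default params.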

TODO.md CHANGED

@@ -99,6 +99,8 @@ flowchart TD
   Files:
   [server/data/p1/README.md](server/data/p1/README.md),
   [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
+  Note:
+  the default baseline params are not near-feasible on the real verifier path, so they are not enough for the fixture set by themselves
 - [ ] Run fixture sanity checks
   Goal:
@@ -190,5 +192,5 @@ flowchart TD
 - [ ] Do not let notebook or demo work outrun environment evidence
 - [ ] Do not add training-first complexity before manual playtesting
 - [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
-- [ ] Do not describe the current baseline reset state as already feasible
+- [ ] Do not describe the current baseline reset state as feasible or near-feasible
 - [ ] Do not force a `Reward V1` story if `Reward V0` survives manual playtesting

baselines/README.md CHANGED

@@ -7,6 +7,7 @@ Random and heuristic baselines will live here.
 - [x] baseline comparison script exists
 - [x] baseline comparison rerun completed on the real verifier path
 - [ ] heuristic refreshed after the real-verifier rerun
+- [ ] near-boundary fixture-backed baseline start chosen for manual playtesting
 - [ ] presentation-ready comparison trace exported
 The first baseline milestone is:

docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED

@@ -17,6 +17,10 @@
 - [ ] heuristic baseline is refreshed for the real verifier path
 - [ ] HF Space deployment evidence is recorded
+Current caution:
+- the default baseline params are not currently a near-feasible playtest anchor on the real verifier path, so fixture discovery is a real prerequisite for meaningful manual playtesting
 ## 1. Submission Thesis
 We are not primarily submitting "a trained model for fusion."
@@ -323,6 +327,8 @@ Use:
 - a few near-boundary designs
 - a few clearly infeasible designs
+Do not assume the default baseline params are enough for this set. They are currently useful as an infeasible reference, not as a near-feasible anchor.
 Purpose:
 - verify the verifier is wired correctly

docs/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED

@@ -17,6 +17,10 @@ Do not expand scope beyond one stable task. Training is supporting evidence, not
 - [ ] add tracked fixtures and manual playtest evidence
 - [ ] refresh the heuristic baseline after the real-verifier rerun
+Current caution:
+- do not assume the default baseline params are a near-feasible playtest start; on the current real verifier path they are still deeply infeasible, so fixture discovery comes first
 ## Plan V2 Inheritance
 Carry these rules through the whole checklist:
@@ -87,6 +91,7 @@ Transition rule:
    - known-good or near-winning design
    - near-boundary designs
    - clearly bad designs
+   - do not rely on the default baseline params as the only starting point
 2. Confirm:
    - verifier outputs are sane
    - reward ordering is sane

docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED

@@ -15,6 +15,10 @@ Use this file as rationale for the pivot, not as a fresh planning queue. Once th
 - [ ] manual playtest evidence is recorded
 - [ ] heuristic baseline is refreshed for the real verifier path
+Current caution:
+- the default rotating-ellipse baseline params are currently useful as an infeasible reference, not as a near-feasible anchor, so the fixture set still needs a better boundary-region map
 ## Decision
 Pivot the OpenEnv environment to use the official ConStellaration P1 benchmark with real VMEC physics, scoped to the rotating-ellipse low-dimensional parameter space.
@@ -239,8 +243,9 @@ If full high-fidelity `constellaration` deployment fails (Docker build, HF Space
 Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
 1. **Repairable baseline anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — intentionally infeasible at reset but close enough to support short repair/improvement episodes
+1. **Current default baseline reference:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — currently deeply infeasible on the real verifier path; keep as a negative or repair reference only
 2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
-3. **Baseline comparison:** add only if manual playtesting shows a second start state is useful
+3. **Near-boundary anchor:** still needs to be found from real verifier probing before manual playtesting
 These are for verifier/reward sanity, not a prerequisite seed-mining project.
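
The fixture roles named in this commit (feasible or near-winning, near-boundary, clearly infeasible) could be assigned mechanically once low-fidelity measurements exist. The following is a minimal sketch under stated assumptions: the `Fixture` class, the `bucket` function, and both cutoffs (`FEASIBLE_TOL`, `NEAR_BOUNDARY_MAX`) are hypothetical and do not come from the repository. Only the `p1_feasibility=1.01` value for the default baseline is taken from the README change.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; the repo does not define these values.
FEASIBLE_TOL = 0.01       # at or below this, treat the design as feasible
NEAR_BOUNDARY_MAX = 0.2   # at or below this (but above tolerance), treat as near-boundary


@dataclass(frozen=True)
class Fixture:
    name: str
    p1_feasibility: float  # low-fidelity violation measure; smaller is better


def bucket(fixture: Fixture) -> str:
    """Assign a fixture to one of the three roles used in the fixture set."""
    if fixture.p1_feasibility <= FEASIBLE_TOL:
        return "feasible"
    if fixture.p1_feasibility <= NEAR_BOUNDARY_MAX:
        return "near-boundary"
    return "clearly-infeasible"


# Measurement quoted in the README change for the default baseline params.
default_baseline = Fixture("default_baseline", p1_feasibility=1.01)
print(default_baseline.name, bucket(default_baseline))
```

With any cutoffs in this range the default baseline lands in the clearly infeasible bucket, which matches the commit's point that a separate near-boundary anchor still has to be found by probing the real verifier.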