Commit 918007b
Parent(s): 1ff8c37
docs: correct baseline fixture status
- README.md +1 -1
- TODO.md +3 -1
- baselines/README.md +1 -0
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +6 -0
- docs/FUSION_NEXT_12_HOURS_CHECKLIST.md +5 -0
- docs/PIVOT_P1_ROTATING_ELLIPSE.md +6 -1
README.md
CHANGED
@@ -42,7 +42,7 @@ Implementation status:

 ## Known Gaps

-- `BASELINE_PARAMS` is
+- `BASELINE_PARAMS` is not a near-feasible anchor on the real verifier path. The current low-fidelity measurement is roughly `p1_feasibility=1.01`, `average_triangularity=+0.005`, and `edge_iota_over_nfp=0.059`, so fixture discovery has to happen before meaningful manual playtesting.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after manual playtesting.
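The README hunk above quotes a low-fidelity measurement for `BASELINE_PARAMS` and concludes it is not near-feasible. A minimal sketch of how such a measurement could be labeled, assuming `p1_feasibility` behaves like a violation measure where smaller is better; the `LowFiMetrics` class, the `classify` helper, and both cutoffs are hypothetical and do not come from the project:

```python
from dataclasses import dataclass

# Hypothetical cutoffs chosen only for illustration; the source does not state them.
FEASIBLE_TOL = 0.01
NEAR_BOUNDARY_TOL = 0.10


@dataclass(frozen=True)
class LowFiMetrics:
    """Low-fidelity metrics named in the README hunk (container is hypothetical)."""
    p1_feasibility: float
    average_triangularity: float
    edge_iota_over_nfp: float


def classify(m: LowFiMetrics) -> str:
    """Label a design by how far its feasibility value sits from the assumed tolerance."""
    if m.p1_feasibility <= FEASIBLE_TOL:
        return "feasible"
    if m.p1_feasibility <= NEAR_BOUNDARY_TOL:
        return "near-boundary"
    return "infeasible"


# Approximate values quoted in the README hunk for BASELINE_PARAMS.
baseline = LowFiMetrics(
    p1_feasibility=1.01,
    average_triangularity=0.005,
    edge_iota_over_nfp=0.059,
)
print(classify(baseline))
```

Under these assumed cutoffs the quoted baseline lands in the infeasible bucket, which matches the status the commit records.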
TODO.md
CHANGED
@@ -99,6 +99,8 @@ flowchart TD
 Files:
 [server/data/p1/README.md](server/data/p1/README.md),
 [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
+Note:
+the default baseline params are not near-feasible on the real verifier path, so they are not enough for the fixture set by themselves

 - [ ] Run fixture sanity checks
 Goal:
@@ -190,5 +192,5 @@ flowchart TD
 - [ ] Do not let notebook or demo work outrun environment evidence
 - [ ] Do not add training-first complexity before manual playtesting
 - [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
-- [ ] Do not describe the current baseline reset state as
+- [ ] Do not describe the current baseline reset state as feasible or near-feasible
 - [ ] Do not force a `Reward V1` story if `Reward V0` survives manual playtesting
baselines/README.md
CHANGED
@@ -7,6 +7,7 @@ Random and heuristic baselines will live here.
 - [x] baseline comparison script exists
 - [x] baseline comparison rerun completed on the real verifier path
 - [ ] heuristic refreshed after the real-verifier rerun
+- [ ] near-boundary fixture-backed baseline start chosen for manual playtesting
 - [ ] presentation-ready comparison trace exported

 The first baseline milestone is:
docs/FUSION_DESIGN_LAB_PLAN_V2.md
CHANGED
@@ -17,6 +17,10 @@
 - [ ] heuristic baseline is refreshed for the real verifier path
 - [ ] HF Space deployment evidence is recorded

+Current caution:
+
+- the default baseline params are not currently a near-feasible playtest anchor on the real verifier path, so fixture discovery is a real prerequisite for meaningful manual playtesting
+
 ## 1. Submission Thesis

 We are not primarily submitting "a trained model for fusion."
@@ -323,6 +327,8 @@ Use:
 - a few near-boundary designs
 - a few clearly infeasible designs

+Do not assume the default baseline params are enough for this set. They are currently useful as an infeasible reference, not as a near-feasible anchor.
+
 Purpose:

 - verify the verifier is wired correctly
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md
CHANGED
@@ -17,6 +17,10 @@ Do not expand scope beyond one stable task. Training is supporting evidence, not
 - [ ] add tracked fixtures and manual playtest evidence
 - [ ] refresh the heuristic baseline after the real-verifier rerun

+Current caution:
+
+- do not assume the default baseline params are a near-feasible playtest start; on the current real verifier path they are still deeply infeasible, so fixture discovery comes first
+
 ## Plan V2 Inheritance

 Carry these rules through the whole checklist:
@@ -87,6 +91,7 @@ Transition rule:
 - known-good or near-winning design
 - near-boundary designs
 - clearly bad designs
+- do not rely on the default baseline params as the only starting point
 2. Confirm:
 - verifier outputs are sane
 - reward ordering is sane
docs/PIVOT_P1_ROTATING_ELLIPSE.md
CHANGED
@@ -15,6 +15,10 @@ Use this file as rationale for the pivot, not as a fresh planning queue. Once th
 - [ ] manual playtest evidence is recorded
 - [ ] heuristic baseline is refreshed for the real verifier path

+Current caution:
+
+- the default rotating-ellipse baseline params are currently useful as an infeasible reference, not as a near-feasible anchor, so the fixture set still needs a better boundary-region map
+
 ## Decision

 Pivot the OpenEnv environment to use the official ConStellaration P1 benchmark with real VMEC physics, scoped to the rotating-ellipse low-dimensional parameter space.
@@ -239,8 +243,9 @@ If full high-fidelity `constellaration` deployment fails (Docker build, HF Space
 Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:

 1. **Repairable baseline anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — intentionally infeasible at reset but close enough to support short repair/improvement episodes
+1. **Current default baseline reference:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — currently deeply infeasible on the real verifier path; keep as a negative or repair reference only
 2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
-3. **
+3. **Near-boundary anchor:** still needs to be found from real verifier probing before manual playtesting

 These are for verifier/reward sanity, not a prerequisite seed-mining project.

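The fixture list in the hunk above can be read as a small registry with one slot still empty. A minimal sketch, assuming a plain in-memory structure; the `Fixture` class, the role strings, and the `ready_for_manual_playtesting` helper are hypothetical, while the parameter values are the ones quoted in the diff:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fixture:
    """One rotating-ellipse configuration used for verifier/reward sanity checks."""
    role: str
    aspect_ratio: float
    elongation: float
    rotational_transform: float


# Values copied from the fixture list in the diff.
FIXTURES = [
    Fixture("current_default_baseline_reference", 3.5, 1.5, 0.4),
    Fixture("infeasible_reference", 5.0, 3.0, 0.2),
]

# Per the diff, this anchor still has to be found by probing the real verifier,
# so it is left unset rather than guessed.
NEAR_BOUNDARY_ANCHOR: Optional[Fixture] = None


def ready_for_manual_playtesting(anchor: Optional[Fixture]) -> bool:
    """Manual playtesting is gated on having a near-boundary anchor."""
    return anchor is not None


print(ready_for_manual_playtesting(NEAR_BOUNDARY_ANCHOR))
```

Keeping the missing anchor as an explicit `None` makes the prerequisite visible in code instead of letting the default baseline stand in for it.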