Spaces: CreativeEngineer / fusion-design-lab


CreativeEngineer committed 10 days ago

Commit

acb992c

1 Parent(s): 3f7be89

docs: fix p1 parameterization blocker fallout


Files changed (5)

README.md +23 -10
docs/FUSION_DESIGN_LAB_PLAN_V2.md +36 -7
docs/P1_ENV_CONTRACT_V1.md +221 -0
docs/PIVOT_P1_ROTATING_ELLIPSE.md +37 -13
training/notebooks/NORTHFLANK_SMOKE_NOTE.md +3 -3

README.md CHANGED

@@ -22,7 +22,8 @@ Implementation status:
 - docs are aligned to fresh `P1` wiring in this repo
 - shared models, baselines, and server/client entry points now reflect the locked `P1` contract
 - the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
-- the remaining runtime work is fixture coverage, manual playtesting, heuristic refresh, and deployment evidence
 ## Execution Status
@@ -36,6 +37,10 @@ Implementation status:
 - [x] Replace the synthetic evaluator with `constellaration`
 - [x] Add a runnable Northflank smoke workflow and note
 - [x] Pass the Northflank smoke test on the H100 workspace
 - [ ] Add tracked `P1` fixtures under `server/data/p1/`
 - [ ] Run manual playtesting and record the first reward pathology
 - [ ] Refresh the heuristic baseline for the real verifier path
@@ -43,15 +48,16 @@ Implementation status:
 ## Known Gaps
-- `BASELINE_PARAMS` is not a near-feasible anchor on the real verifier path. The current low-fidelity measurement is roughly `p1_feasibility=1.01`, `average_triangularity=+0.005`, and `edge_iota_over_nfp=0.059`, so fixture discovery has to happen before meaningful manual playtesting.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
-- The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after manual playtesting.
 Current mode:
 - strategic task choice is already locked
-- the next work is fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
 - new planning text should only appear when a real blocker forces a decision change
 ## Planned Repository Layout
@@ -104,12 +110,15 @@ uv sync --extra notebooks
 ## Immediate Next Steps
-1. Add tracked `P1` fixtures under `server/data/p1`.
-2. Run manual playtest episodes and record the first real reward pathology, if any.
-3. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
-4. Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
-5. Deploy the environment to HF Space.
-6. Add the Colab notebook under `training/notebooks`.
 These are implementation steps, not another planning phase.
@@ -127,6 +136,10 @@ Disallowed:
 - porting the old planner, governor, or experiment harness into this repo
 ## Hackathon Working Note
 This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.

 - docs are aligned to fresh `P1` wiring in this repo
 - shared models, baselines, and server/client entry points now reflect the locked `P1` contract
 - the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
+- the current 3-knob parameterization has been verified as blocked on P1 triangularity under the real verifier path
+- the next runtime work is parameterization repair, then fixtures, manual playtesting, heuristic refresh, and deployment evidence
 ## Execution Status
 - [x] Replace the synthetic evaluator with `constellaration`
 - [x] Add a runnable Northflank smoke workflow and note
 - [x] Pass the Northflank smoke test on the H100 workspace
+- [x] Verify the current 3-knob family against the real low-fidelity verifier
+- [ ] Add a custom low-dimensional boundary builder with an explicit triangularity control knob
+- [ ] Split boundary construction from boundary evaluation in `server/physics.py`
+- [ ] Update the action contract from 3 knobs to the repaired low-dimensional family
 - [ ] Add tracked `P1` fixtures under `server/data/p1/`
 - [ ] Run manual playtesting and record the first reward pathology
 - [ ] Refresh the heuristic baseline for the real verifier path
 ## Known Gaps
+- The current 3-knob family is structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That means reward tuning is secondary until the parameterization is repaired.
+- `BASELINE_PARAMS` is not a near-feasible anchor on the real verifier path. The current low-fidelity measurement is roughly `p1_feasibility=1.01`, `average_triangularity=+0.005`, and `edge_iota_over_nfp=0.059`, so fixture discovery has to happen after parameterization repair, not before.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
+- The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
 Current mode:
 - strategic task choice is already locked
+- the next work is parameterization repair, then fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
 - new planning text should only appear when a real blocker forces a decision change
 ## Planned Repository Layout
 ## Immediate Next Steps
+1. Repair the low-dimensional boundary parameterization so it can actually move P1 triangularity.
+2. Split boundary construction from boundary evaluation in `server/physics.py`.
+3. Update the environment contract to the repaired low-dimensional family and label low-fi vs high-fi truth clearly in observations.
+4. Add tracked `P1` fixtures under `server/data/p1`.
+5. Run manual playtest episodes and record the first real reward pathology, if any.
+6. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
+7. Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
+8. Deploy the environment to HF Space.
+9. Add the Colab notebook under `training/notebooks`.
 These are implementation steps, not another planning phase.
 - porting the old planner, governor, or experiment harness into this repo
+## Technical Spec
+The focused technical plan for the repaired `P1` environment lives in [docs/P1_ENV_CONTRACT_V1.md](docs/P1_ENV_CONTRACT_V1.md).
 ## Hackathon Working Note
 This repo is intentionally biased toward executable demos, manual playtesting, and clear environment behavior over building out test coverage during the hackathon.

docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED

@@ -7,13 +7,15 @@
 ## 0. Current Branch Status
 - [x] `P1` task family is locked
-- [x] rotating-ellipse `P1` contract is implemented in code
 - [x] real `constellaration` verifier wiring is in place
 - [x] low-fidelity `run` plus high-fidelity `submit` split is documented
 - [x] post-terminal `step()` guard is in place
 - [x] baseline comparison has been rerun on the real verifier path
 - [x] Northflank smoke workflow and note are committed
 - [x] Northflank smoke test has passed on the team H100
 - [ ] tracked `P1` fixtures are added
 - [ ] manual playtest evidence is recorded
 - [ ] heuristic baseline is refreshed for the real verifier path
@@ -21,7 +23,7 @@
 Current caution:
-- the default baseline params are not currently a near-feasible playtest anchor on the real verifier path, so fixture discovery is a real prerequisite for meaningful manual playtesting
 ## 1. Submission Thesis
@@ -117,7 +119,7 @@ But the evidence order is:
 We intentionally narrow the scope to one environment family:
 - `P1` geometrical benchmark
-- rotating-ellipse, low-dimensional design space
 - official `constellaration` verifier
 - low-fidelity evaluation for ordinary interaction
 - optional high-fidelity verification for final checks or `submit`
@@ -173,7 +175,7 @@ Allowed reuse:
 Implementation handoff:
-- the remaining work is now fixture coverage, manual playtesting, heuristic refresh, smoke validation, and deployment
 - do not treat supporting decision notes as a new planning backlog
 ## 8.1 Compute Surfaces
@@ -212,6 +214,12 @@ Auth stance:
 The environment contract must be frozen before meaningful evaluation.
 ### Observation
 The observation should expose:
@@ -231,7 +239,9 @@ The observation must be interpretable by a human without additional hidden state
 ### Action Space
-The action space stays intentionally small and discrete:
 - `run`
 - `submit`
@@ -243,10 +253,11 @@ For `run`, the controllable fields are:
   - `aspect_ratio`
   - `elongation`
   - `rotational_transform`
 - direction: increase or decrease
 - magnitude: small, medium, large
-This is not trying to expose the full Fourier-boundary space. The goal is a legible environment, not maximal realism.
 ### Episode Flow
@@ -282,6 +293,18 @@ The environment must preserve:
 The environment may add reward shaping, but it must not redefine what `P1` means.
 ## 11. Reward V0
 The reward in this document is not the final reward. It is `Reward V0`.
@@ -302,6 +325,12 @@ The initial scoring idea should be feasibility-first:
 - simple enough to debug from trajectories
 - aligned with official `P1` semantics
 ### Reward V0 Failure Modes To Test
 We should expect at least some of these:
@@ -344,7 +373,7 @@ This is calibration, not training.
 These are still hypotheses until manually or empirically checked:
 - six steps are enough to create non-trivial decision pressure
-- the rotating-ellipse action space is expressive enough for a meaningful `P1` task
 - `restore_best` is useful without becoming an exploit
 - heuristic should beat random on mean episode reward
 - low-fidelity interaction is predictive enough for useful policy learning

 ## 0. Current Branch Status
 - [x] `P1` task family is locked
+- [x] 3-knob rotating-ellipse `P1` contract is implemented in code
 - [x] real `constellaration` verifier wiring is in place
 - [x] low-fidelity `run` plus high-fidelity `submit` split is documented
 - [x] post-terminal `step()` guard is in place
 - [x] baseline comparison has been rerun on the real verifier path
 - [x] Northflank smoke workflow and note are committed
 - [x] Northflank smoke test has passed on the team H100
+- [x] current 3-knob family has been checked against the real low-fidelity verifier
+- [ ] parameterization repair is implemented so triangularity is controllable
 - [ ] tracked `P1` fixtures are added
 - [ ] manual playtest evidence is recorded
 - [ ] heuristic baseline is refreshed for the real verifier path
 Current caution:
+- the current 3-knob family is structurally blocked on the official triangularity constraint under the real verifier path, so parameterization repair is now the first blocker before fixture discovery or manual playtesting
 ## 1. Submission Thesis
 We intentionally narrow the scope to one environment family:
 - `P1` geometrical benchmark
+- repaired low-dimensional boundary family derived from rotating-ellipse seeds
 - official `constellaration` verifier
 - low-fidelity evaluation for ordinary interaction
 - optional high-fidelity verification for final checks or `submit`
 Implementation handoff:
+- the remaining work is now parameterization repair, then fixture coverage, manual playtesting, heuristic refresh, smoke validation, and deployment
 - do not treat supporting decision notes as a new planning backlog
 ## 8.1 Compute Surfaces
 The environment contract must be frozen before meaningful evaluation.
+Current verified blocker:
+- the current upstream 3-knob `generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` family does not expose triangularity control
+- on the real low-fidelity verifier path, sampled points stayed at roughly `average_triangularity=+0.004975` and `p1_feasibility=1.00995`
+- so the next contract revision must repair parameterization before reward iteration becomes meaningful
 ### Observation
 The observation should expose:
 ### Action Space
+The action space stays intentionally small and discrete, but the current 3-knob version is no longer enough. The next contract revision should keep low-dimensional actions while adding an explicit control that can move triangularity.
+Near-term target:
 - `run`
 - `submit`
   - `aspect_ratio`
   - `elongation`
   - `rotational_transform`
+  - `triangularity_scale` or equivalent low-dimensional triangularity control
 - direction: increase or decrease
 - magnitude: small, medium, large
+This is not trying to expose the full Fourier-boundary space. The goal is a legible environment, not maximal realism. The verifier should stay official; the custom logic belongs in the low-dimensional boundary builder, not in reward semantics.
 ### Episode Flow
 The environment may add reward shaping, but it must not redefine what `P1` means.
+Implementation split:
+- boundary builder or parameterization adapter:
+  - custom low-dimensional family construction
+  - rotating-ellipse seed creation
+  - triangularity control injection, if used
+- official verifier:
+  - boundary in
+  - `GeometricalProblem` semantics out
+The verifier should be boundary-based. Parameterization-specific logic should not be treated as verifier truth.
 ## 11. Reward V0
 The reward in this document is not the final reward. It is `Reward V0`.
 - simple enough to debug from trajectories
 - aligned with official `P1` semantics
+Current execution note:
+- do not tune reward further until the repaired low-dimensional family can actually approach P1 feasibility
+- once parameterization is repaired, keep `Reward V0` scalar and feasibility-first
+- clearly distinguish low-fidelity step-time metrics from high-fidelity submit-time truth in the observation contract and docs
 ### Reward V0 Failure Modes To Test
 We should expect at least some of these:
 These are still hypotheses until manually or empirically checked:
 - six steps are enough to create non-trivial decision pressure
+- the repaired low-dimensional action family is expressive enough for a meaningful `P1` task
 - `restore_best` is useful without becoming an exploit
 - heuristic should beat random on mean episode reward
 - low-fidelity interaction is predictive enough for useful policy learning

docs/P1_ENV_CONTRACT_V1.md ADDED

@@ -0,0 +1,221 @@

+# P1 Environment Contract V1
+**Status:** Technical implementation plan
+**Role:** Supporting spec for the `P1` environment contract
+**SSOT relationship:** This file refines [FUSION_DESIGN_LAB_PLAN_V2.md](FUSION_DESIGN_LAB_PLAN_V2.md). If this file conflicts with the planning SSOT, update both in the same task.
+## Purpose
+This file captures the technical contract that should drive the next code changes in:
+- [server/physics.py](../server/physics.py)
+- [fusion_lab/models.py](../fusion_lab/models.py)
+- [server/environment.py](../server/environment.py)
+- [server/app.py](../server/app.py)
+The central change is now explicit:
+- the current upstream 3-knob rotating-ellipse family is blocked on P1 triangularity under the real verifier path
+- the next environment contract must repair parameterization before more reward iteration or heuristic work
+## Verified Blocker
+Current verified facts:
+- upstream `generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` has no triangularity control
+- the current 3-knob environment directly exposes only:
+  - `aspect_ratio`
+  - `elongation`
+  - `rotational_transform`
+- real low-fidelity samples on the current verifier path kept:
+  - `average_triangularity` at roughly `+0.004975`
+  - `p1_feasibility` at roughly `1.00995`
+  - feasible count at `0`
+Conclusion:
+- the current 3-knob family is not a meaningful playtest or baseline environment for `P1`
+- reward work is secondary until the boundary family can actually approach the official triangularity constraint
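
The reported `p1_feasibility` of roughly `1.00995` looks like a direct consequence of the stuck triangularity. The sketch below is a minimal illustration, assuming the feasibility scalar is the maximum relative violation over the three bounds listed in the PIVOT_P1_ROTATING_ELLIPSE.md observation schema (`aspect_ratio <= 4.0`, `average_triangularity <= -0.5`, `edge_iota_over_nfp >= 0.3`). The helper names and the divide-by-|bound| normalization are assumptions for illustration, not the `constellaration` implementation, and `aspect_ratio=3.5` is a placeholder because the measured value is not reported.

```python
def normalized_violation(value: float, bound: float, sense: str) -> float:
    """Relative violation of `value <= bound` ("le") or `value >= bound` ("ge").

    Returns 0.0 when the constraint is satisfied. Dividing by |bound| is an
    assumed normalization, chosen because it reproduces the reported numbers.
    """
    if sense not in ("le", "ge"):
        raise ValueError(f"unknown sense: {sense!r}")
    raw = value - bound if sense == "le" else bound - value
    return max(0.0, raw) / abs(bound)


def p1_feasibility_estimate(
    aspect_ratio: float,
    average_triangularity: float,
    edge_iota_over_nfp: float,
) -> float:
    """Hypothetical helper: worst relative violation across the three P1 bounds."""
    return max(
        normalized_violation(aspect_ratio, 4.0, "le"),
        normalized_violation(average_triangularity, -0.5, "le"),
        normalized_violation(edge_iota_over_nfp, 0.3, "ge"),
    )


# Triangularity and edge iota are the values quoted in the docs;
# aspect_ratio is a placeholder that satisfies its bound.
sample = p1_feasibility_estimate(
    aspect_ratio=3.5,
    average_triangularity=0.004975,
    edge_iota_over_nfp=0.059,
)
print(sample)
```

Under these assumptions the triangularity term is (0.004975 + 0.5) / 0.5 = 1.00995, which dominates the edge-iota term of about 0.80. That matches the reported scalar and is consistent with the claim that no amount of reward tuning helps while triangularity cannot move.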
+## Design Split
+Keep three layers separate:
+1. **Boundary builder**
+   - low-dimensional parameterization
+   - rotating-ellipse seed generation
+   - optional triangularity control injection
+2. **Official verifier**
+   - boundary in
+   - metrics out
+   - feasibility, objective, and score semantics from `GeometricalProblem`
+3. **Environment**
+   - reset pool
+   - discrete actions
+   - episode flow
+   - reward shaping
+## Verifier Plan
+`server/physics.py` should expose a boundary-based verifier surface.
+Target functions:
+- `build_initial_boundary(...) -> SurfaceRZFourier`
+- `apply_low_dim_perturbation(...) -> SurfaceRZFourier`
+- `evaluate_boundary(boundary, fidelity) -> EvaluationMetrics`
+The verifier layer should own:
+- low-fidelity step-time evaluation
+- high-fidelity submit-time evaluation
+- official `P1` feasibility semantics
+- official `P1` objective direction
+- score ordering
+The verifier layer should not own:
+- episode budget
+- action semantics
+- reward shaping
+- “best so far” state
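
One way to picture the builder/verifier split is sketched below. It is a shape-only illustration: `Boundary` is a plain-dict stand-in for `SurfaceRZFourier`, `build_boundary` and `fake_forward_model` are hypothetical names, and the placeholder physics exists only so the sketch runs. Nothing here calls `constellaration`.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Boundary = Dict[str, float]  # stand-in for SurfaceRZFourier in this sketch


@dataclass(frozen=True)
class EvaluationMetrics:
    max_elongation: float
    average_triangularity: float
    p1_feasibility: float
    fidelity: str  # "low" or "high": records which path produced the numbers


def build_boundary(
    aspect_ratio: float,
    elongation: float,
    rotational_transform: float,
    triangularity_scale: float,
    n_field_periods: int = 3,
) -> Boundary:
    """Builder layer: the only place that knows about the low-dimensional knobs."""
    return {
        "aspect_ratio": aspect_ratio,
        "elongation": elongation,
        "rotational_transform": rotational_transform,
        "triangularity_scale": triangularity_scale,
        "n_field_periods": float(n_field_periods),
    }


def evaluate_boundary(
    boundary: Boundary,
    fidelity: str,
    forward_model: Callable[[Boundary, str], Dict[str, float]],
) -> EvaluationMetrics:
    """Verifier layer: boundary in, metrics out. No budget, action, or reward state."""
    if fidelity not in ("low", "high"):
        raise ValueError(f"unknown fidelity: {fidelity!r}")
    raw = forward_model(boundary, fidelity)
    return EvaluationMetrics(
        max_elongation=raw["max_elongation"],
        average_triangularity=raw["average_triangularity"],
        p1_feasibility=raw["p1_feasibility"],
        fidelity=fidelity,
    )


def fake_forward_model(boundary: Boundary, fidelity: str) -> Dict[str, float]:
    """Placeholder physics so the sketch runs; NOT a real evaluation."""
    return {
        "max_elongation": boundary["elongation"],
        "average_triangularity": -boundary["triangularity_scale"],
        "p1_feasibility": 0.0,
    }


metrics = evaluate_boundary(build_boundary(3.5, 1.5, 0.4, 0.2), "low", fake_forward_model)
```

Passing the forward model in as a callable keeps the environment free to swap low-fidelity and high-fidelity evaluation without the verifier layer learning anything about episodes or rewards.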
+## Low-Dimensional Boundary Plan
+Stay low-dimensional, not Fourier-first.
+Target controllable knobs:
+- `aspect_ratio`
+- `elongation`
+- `rotational_transform`
+- `triangularity_scale`
+Important naming rule:
+- once triangularity is injected explicitly, stop describing the family as plain upstream “rotating ellipse”
+- it becomes a custom low-dimensional boundary family derived from a rotating-ellipse seed
+## Action Contract
+Keep the discrete interaction style:
+- `intent`: `run | submit | restore_best`
+- `direction`: `increase | decrease`
+- `magnitude`: `small | medium | large`
+For `run`, the controllable parameter should be one of:
+- `aspect_ratio`
+- `elongation`
+- `rotational_transform`
+- `triangularity_scale`
+This keeps the environment human-playable and aligned with the historical low-dimensional P1 path.
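
The action contract above could be encoded roughly as follows. This is a sketch under assumptions: the class and function names are hypothetical, the schema in `fusion_lab/models.py` may look different, and the step sizes are invented placeholders because the real deltas are still to be tuned by playtest.

```python
from dataclasses import dataclass
from typing import Dict, Optional

INTENTS = ("run", "submit", "restore_best")
PARAMETERS = ("aspect_ratio", "elongation", "rotational_transform", "triangularity_scale")
DIRECTIONS = ("increase", "decrease")
# Hypothetical step sizes; the real deltas are to be tuned by playtest.
MAGNITUDE_DELTAS = {"small": 0.05, "medium": 0.1, "large": 0.2}


@dataclass(frozen=True)
class Action:
    intent: str
    parameter: Optional[str] = None
    direction: Optional[str] = None
    magnitude: Optional[str] = None

    def __post_init__(self) -> None:
        if self.intent not in INTENTS:
            raise ValueError(f"unknown intent: {self.intent!r}")
        # Only `run` needs the parameter/direction/magnitude triple.
        if self.intent == "run":
            if self.parameter not in PARAMETERS:
                raise ValueError(f"unknown parameter: {self.parameter!r}")
            if self.direction not in DIRECTIONS:
                raise ValueError(f"unknown direction: {self.direction!r}")
            if self.magnitude not in MAGNITUDE_DELTAS:
                raise ValueError(f"unknown magnitude: {self.magnitude!r}")


def apply_run(params: Dict[str, float], action: Action) -> Dict[str, float]:
    """Return a new parameter dict with one knob nudged; the input is not mutated."""
    if action.intent != "run":
        raise ValueError("only run actions change parameters")
    sign = 1.0 if action.direction == "increase" else -1.0
    updated = dict(params)
    updated[action.parameter] += sign * MAGNITUDE_DELTAS[action.magnitude]
    return updated


start = {
    "aspect_ratio": 3.5,
    "elongation": 1.5,
    "rotational_transform": 0.4,
    "triangularity_scale": 0.0,
}
moved = apply_run(start, Action("run", "triangularity_scale", "increase", "medium"))
```

Rejecting unknown parameters at construction time keeps a Fourier-mode action from slipping into the low-dimensional contract by accident.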
+## Observation Contract
+The observation should stay metric-centered and human-readable.
+Keep:
+- `max_elongation`
+- `aspect_ratio`
+- `average_triangularity`
+- `edge_iota_over_nfp`
+- `p1_feasibility`
+- `p1_score`
+- `budget_remaining`
+- `best_score`
+- `best_feasibility`
+- `diagnostics_text`
+Add clarity about fidelity:
+- low-fidelity step-time metrics should be labeled as such
+- high-fidelity submit-time metrics should be labeled as such
+- do not expose them as if they are the same truth surface
+This can be done either by:
+- separate observation fields, or
+- explicit fidelity labels in `diagnostics_text`
+The minimum requirement is that a reader can tell whether a metric came from low-fi `run` or high-fi `submit`.
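
For the `diagnostics_text` option, a fidelity label can be as simple as a prefix. The helper below is a hypothetical sketch; the function name, tag strings, and number formatting are assumptions rather than the environment's actual output format.

```python
from typing import Dict

# Assumed tag wording; any unambiguous label would satisfy the contract.
FIDELITY_TAGS = {"low": "low-fi run", "high": "high-fi submit"}


def format_diagnostics(metrics: Dict[str, float], fidelity: str) -> str:
    """Prefix every diagnostics line with the evaluation path that produced it."""
    tag = FIDELITY_TAGS[fidelity]  # raises KeyError for an unknown fidelity
    body = ", ".join(f"{name}={value:.4f}" for name, value in metrics.items())
    return f"[{tag}] {body}"


text = format_diagnostics({"max_elongation": 1.5}, "low")
```

Separate observation fields give agents a machine-readable signal, while the prefix approach is the cheaper change when only human readers need the distinction.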
+## Reward V0
+Keep reward mostly scalar and verifier-driven.
+Target structure:
+- infeasible to feasible crossing:
+  - clear positive bonus
+- feasible to infeasible regression:
+  - clear negative penalty
+- both infeasible:
+  - reward reduction in official feasibility scalar
+- both feasible:
+  - reward lower `max_elongation`
+- non-submit step:
+  - small step cost
+- explicit `submit`:
+  - better than passive budget exhaustion when the design is improved
+Do not add:
+- reward terms tied to specific Fourier modes
+- bonuses for matching a known winner
+- hand-coded constraint tricks to hide a blocked action family
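
The target structure above can be read as a small piecewise function. The sketch below is one possible reading, not the implemented reward: the bonus, penalty, and step-cost constants are invented placeholders, the `0.01` tolerance is taken from the `constraints_satisfied` comment in the pivot doc, and the terminal asymmetry between `submit` and budget exhaustion is left out.

```python
FEASIBILITY_TOLERANCE = 0.01  # from the pivot doc: constraints_satisfied means feasibility <= 0.01
CROSSING_BONUS = 1.0          # hypothetical value
REGRESSION_PENALTY = -1.0     # hypothetical value
STEP_COST = 0.05              # hypothetical value


def reward_v0(
    prev_feasibility: float,
    prev_elongation: float,
    cur_feasibility: float,
    cur_elongation: float,
    is_submit: bool = False,
) -> float:
    """Feasibility-first step reward; all inputs come from the official verifier."""
    was_feasible = prev_feasibility <= FEASIBILITY_TOLERANCE
    is_feasible = cur_feasibility <= FEASIBILITY_TOLERANCE

    if not was_feasible and is_feasible:
        reward = CROSSING_BONUS                       # infeasible -> feasible
    elif was_feasible and not is_feasible:
        reward = REGRESSION_PENALTY                   # feasible -> infeasible
    elif is_feasible:
        reward = prev_elongation - cur_elongation     # both feasible: lower is better
    else:
        reward = prev_feasibility - cur_feasibility   # both infeasible: reduce violation

    if not is_submit:
        reward -= STEP_COST                           # small cost on non-submit steps
    return reward
```

Every branch depends only on verifier scalars, so the function stays consistent with the "do not add" list: no Fourier-mode terms and no constraint-specific tricks.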
+## Reset Strategy
+Start with frozen exact seeds, not jitter.
+Reset pool policy:
+- `n_field_periods = 3`
+- small frozen seed set
+- each seed must be:
+  - reproducible
+  - near enough to the feasible boundary that 6 steps is worth testing
+  - not already solved
+Add bounded jitter only if memorization becomes a real problem.
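
A frozen reset pool without jitter can be a deterministic lookup. In the sketch below the seed values are invented placeholders, since the real fixtures still have to be found after parameterization repair, and `reset_params` is a hypothetical name.

```python
from typing import Dict, Tuple

N_FIELD_PERIODS = 3

# Hypothetical placeholder seeds: NOT verified fixtures and not known to be
# near the feasible boundary. Replace with tracked fixtures once they exist.
SEED_POOL: Tuple[Dict[str, float], ...] = (
    {"aspect_ratio": 3.5, "elongation": 1.5, "rotational_transform": 0.4, "triangularity_scale": 0.0},
    {"aspect_ratio": 3.6, "elongation": 1.4, "rotational_transform": 0.5, "triangularity_scale": 0.1},
)


def reset_params(episode_index: int) -> Dict[str, float]:
    """Pick a frozen seed deterministically.

    Returns a copy so an episode cannot mutate the shared pool.
    """
    return dict(SEED_POOL[episode_index % len(SEED_POOL)])


first = reset_params(0)
```

Indexing by episode number keeps resets reproducible across runs; bounded jitter could later be layered on top of the returned copy without touching the pool.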
+## Manual Playtest Gate
+Do not move to heuristic redesign or reward tuning until this gate is passed.
+Manual playtest questions:
+- can a human tell which constraint is currently blocking progress?
+- can a human choose a plausible next action?
+- can a human reach or approach feasibility within the budget?
+- does `submit` feel meaningfully different from passive exhaustion?
+If the answer to any of these is no, fix:
+- the boundary family
+- the step magnitudes
+- the seed pool
+before tuning reward further.
+## Implementation Order
+1. Repair the low-dimensional boundary builder in [server/physics.py](../server/physics.py).
+2. Split boundary construction from official boundary evaluation in [server/physics.py](../server/physics.py).
+3. Update the action and state schema in [fusion_lab/models.py](../fusion_lab/models.py).
+4. Update the episode loop and observation labeling in [server/environment.py](../server/environment.py).
+5. Update the task summary in [server/app.py](../server/app.py).
+6. Freeze 1-2 repaired low-dimensional fixtures.
+7. Run manual playtesting.
+8. Refresh the heuristic baseline only after that evidence exists.
+## Out of Scope
+- full Fourier-mode action space as the primary environment
+- porting the old `ai-sci-feasible-designs` harness
+- making reward more complex before the repaired low-dimensional family exists
+- building a full benchmark split protocol before the environment is even playable

docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED

@@ -9,15 +9,17 @@ Use this file as rationale for the pivot, not as a fresh planning queue. Once th
 ## Current Branch Status
 - [x] pivot accepted
-- [x] rotating-ellipse `P1` contract is implemented
 - [x] `constellaration` verifier path is wired
 - [ ] tracked fixtures are added
 - [ ] manual playtest evidence is recorded
 - [ ] heuristic baseline is refreshed for the real verifier path
 Current caution:
-- the default rotating-ellipse baseline params are currently useful as an infeasible reference, not as a near-feasible anchor, so the fixture set still needs a better boundary-region map
 ## Decision
@@ -66,7 +68,7 @@ Feasibility tolerance: normalized constraint violations <= 1% (0.01).
 ### Parameter Space
-The rotating-ellipse generator takes 3 continuous parameters + 1 discrete:
 | Parameter | Role | Typical range |
 |---|---|---|
@@ -77,9 +79,17 @@ The rotating-ellipse generator takes 3 continuous parameters + 1 discrete:
 These map to `constellaration.initial_guess.generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` which returns a `SurfaceRZFourier` boundary in ~4ms.
 ### Action Space
-Discrete perturbations on the 3 rotating-ellipse parameters:
 ```
 intent: "run" | "submit" | "restore_best"
@@ -88,6 +98,10 @@ direction: "increase" | "decrease"
 magnitude: "small" | "medium" | "large"
 ```
 Magnitude deltas (to be tuned by playtest):
 | Parameter | small | medium | large |
@@ -101,7 +115,7 @@ Magnitude deltas (to be tuned by playtest):
 1. Reset: generate initial boundary from baseline rotating-ellipse parameters (+ optional seed perturbation). Run low-fi forward_model. Return initial observation.
 2. Agent chooses action.
 3. If `run`: modify parameter, regenerate boundary, run low-fi forward_model (~0.6s). Return diagnostics + reward.
-4. If `restore_best`: revert to best-known parameters. No VMEC cost, but costs a budget step.
 5. If `submit`: end episode. Optionally run high-fi for final score.
 6. Episode ends on `submit` or budget exhaustion.
@@ -117,8 +131,8 @@ max_elongation: float          # P1 objective (minimize)
 aspect_ratio: float            # constraint: <= 4.0
 average_triangularity: float   # constraint: <= -0.5
 edge_iota_over_nfp: float     # constraint: >= 0.3
-p1_score: float                # official P1 score (0 if infeasible)
-p1_feasibility: float          # max normalized constraint violation
 constraints_satisfied: bool    # feasibility <= 0.01
 vacuum_well: float             # stability indicator
 step_number: int
@@ -127,6 +141,10 @@ best_score: float
 target_spec: str
 ```
 ### Reward V0
 Feasibility-first, then objective improvement:
@@ -152,12 +170,18 @@ submit penalty (if infeasible or no improvement):
 This puts feasibility first. An agent that achieves feasibility then minimizes elongation gets rewarded. An agent that never reaches feasibility gets penalized.
 ### State
 ```
 step_count: int
-current_params: {aspect_ratio, elongation, rotational_transform}
-best_params: {aspect_ratio, elongation, rotational_transform}
 initial_score: float
 best_score: float
 best_feasibility: float
@@ -206,7 +230,7 @@ Update `fusion_lab/models.py` for new schemas.
 Status: open.
-Validate hypothesis: "6 actions is enough."
 - Play 5-10 episodes manually
 - Log: can a human reach feasibility? Improve elongation?
 - Tune magnitude deltas if needed
@@ -242,10 +266,9 @@ If full high-fidelity `constellaration` deployment fails (Docker build, HF Space
 Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
-1. **Repairable baseline anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — intentionally infeasible at reset but close enough to support short repair/improvement episodes
-1. **Current default baseline reference:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — currently deeply infeasible on the real verifier path; keep as a negative or repair reference only
 2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
-3. **Near-boundary anchor:** still needs to be found from real verifier probing before manual playtesting
 These are for verifier/reward sanity, not a prerequisite seed-mining project.
@@ -255,6 +278,7 @@ These are for verifier/reward sanity, not a prerequisite seed-mining project.
 - Do not make the task "agent writes arbitrary optimization scripts."
 - Do not stream the full HF dataset at runtime.
 - Do not mix rotating-ellipse and Fourier-repair action spaces.
 - Do not use high-fidelity eval for interactive steps (24s is too slow).
 - Do not narrate "6 actions is enough" as validated until manually playtested.
 - Do not claim full P1 boundary space coverage. The env uses a low-dim subfamily.

 ## Current Branch Status
 - [x] pivot accepted
+- [x] 3-knob rotating-ellipse `P1` contract is implemented
 - [x] `constellaration` verifier path is wired
+- [x] current 3-knob family is verified as blocked on P1 triangularity
+- [ ] repaired low-dimensional family with explicit triangularity control is implemented
 - [ ] tracked fixtures are added
 - [ ] manual playtest evidence is recorded
 - [ ] heuristic baseline is refreshed for the real verifier path
 Current caution:
+- the current upstream rotating-ellipse family is useful as a seed generator, but not sufficient as the full environment action family because it does not move triangularity under the real verifier path
 ## Decision
 ### Parameter Space
+The upstream rotating-ellipse generator takes 3 continuous parameters + 1 discrete:
 | Parameter | Role | Typical range |
 |---|---|---|
 These map to `constellaration.initial_guess.generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` which returns a `SurfaceRZFourier` boundary in ~4ms.
+Verified blocker:
+- on the real low-fidelity verifier path, sampled 3-knob points kept `average_triangularity` at roughly `+0.004975`
+- sampled `p1_feasibility` stayed at roughly `1.00995`
+- no sampled point was feasible
+So the hackathon environment now needs a custom low-dimensional boundary family on top of the rotating-ellipse seed, with an explicit triangularity control knob or equivalent mechanism.
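
A blocker like this can be checked with a small sweep that measures how far a metric moves across a grid of knob values. The sketch below is hypothetical: `metric_spread` and `stuck_evaluator` are illustrative names, the grid values are invented, and the stand-in evaluator simply returns the reported constant instead of calling the real verifier.

```python
import itertools
from typing import Callable, Dict, Sequence


def metric_spread(
    evaluate: Callable[[float, float, float], Dict[str, float]],
    aspect_ratios: Sequence[float],
    elongations: Sequence[float],
    rotational_transforms: Sequence[float],
    metric: str,
) -> float:
    """Return max - min of one metric over the full Cartesian grid of the 3 knobs."""
    values = [
        evaluate(a, e, r)[metric]
        for a, e, r in itertools.product(aspect_ratios, elongations, rotational_transforms)
    ]
    return max(values) - min(values)


def stuck_evaluator(aspect_ratio: float, elongation: float, rotational_transform: float) -> Dict[str, float]:
    """Stand-in that mimics the reported behaviour; NOT a real verifier call."""
    return {"average_triangularity": 0.004975}


spread = metric_spread(
    stuck_evaluator,
    aspect_ratios=[3.0, 3.5, 4.0],
    elongations=[1.2, 1.5, 2.0],
    rotational_transforms=[0.2, 0.4, 0.6],
    metric="average_triangularity",
)
```

A spread near zero across the whole grid is the signal that no combination of the three knobs can reach the `average_triangularity <= -0.5` bound, which is what motivates the extra control.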
 ### Action Space
+Original 3-knob action space:
 ```
 intent: "run" | "submit" | "restore_best"
 magnitude: "small" | "medium" | "large"
 ```
+This is no longer sufficient on its own. The next contract revision should keep the same discrete structure while adding:
+- `triangularity_scale` or equivalent low-dimensional control
 Magnitude deltas (to be tuned by playtest):
 | Parameter | small | medium | large |
 1. Reset: generate initial boundary from baseline rotating-ellipse parameters (+ optional seed perturbation). Run low-fi forward_model. Return initial observation.
 2. Agent chooses action.
 3. If `run`: modify parameter, regenerate boundary, run low-fi forward_model (~0.6s). Return diagnostics + reward.
+4. If `restore_best`: revert to best-known parameters, re-evaluate low-fidelity metrics, and charge a budget step.
 5. If `submit`: end episode. Optionally run high-fi for final score.
 6. Episode ends on `submit` or budget exhaustion.
 aspect_ratio: float            # constraint: <= 4.0
 average_triangularity: float   # constraint: <= -0.5
 edge_iota_over_nfp: float     # constraint: >= 0.3
+p1_score: float                # current step-time score
+p1_feasibility: float          # current step-time max normalized constraint violation
 constraints_satisfied: bool    # feasibility <= 0.01
 vacuum_well: float             # stability indicator
 step_number: int
 target_spec: str
 ```
+Follow-up requirement from the verified blocker:
+- as long as `submit` stays high-fidelity, the observation or diagnostics text should make the low-fi vs high-fi distinction explicit
 ### Reward V0
 Feasibility-first, then objective improvement:
 This puts feasibility first. An agent that achieves feasibility then minimizes elongation gets rewarded. An agent that never reaches feasibility gets penalized.
+Execution note after the verified blocker:
+- keep reward mostly scalar and verifier-driven
+- repair parameterization before further reward tuning
+- do not add mode- or constraint-specific reward hacks to compensate for a blocked action family
 ### State
 ```
 step_count: int
+current_params: {aspect_ratio, elongation, rotational_transform, triangularity_scale}
+best_params: {aspect_ratio, elongation, rotational_transform, triangularity_scale}
 initial_score: float
 best_score: float
 best_feasibility: float
 Status: open.
+Validate hypothesis: "6 actions is enough" only after parameterization repair.
 - Play 5-10 episodes manually
 - Log: can a human reach feasibility? Improve elongation?
 - Tune magnitude deltas if needed
 Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
+1. **Current default baseline reference:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — currently deeply infeasible on the real verifier path; keep as a negative reference only until parameterization repair lands
 2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
+3. **Near-boundary anchor:** still needs to be found after parameterization repair and real verifier probing before manual playtesting
 These are for verifier/reward sanity, not a prerequisite seed-mining project.
 - Do not make the task "agent writes arbitrary optimization scripts."
 - Do not stream the full HF dataset at runtime.
 - Do not mix rotating-ellipse and Fourier-repair action spaces.
+- Do not pretend the upstream 3-knob family is enough for P1 after the verified triangularity blocker.
 - Do not use high-fidelity eval for interactive steps (24s is too slow).
 - Do not narrate "6 actions is enough" as validated until manually playtested.
 - Do not claim full P1 boundary space coverage. The env uses a low-dim subfamily.

training/notebooks/NORTHFLANK_SMOKE_NOTE.md CHANGED

@@ -13,12 +13,12 @@ Prove all four required conditions in the Northflank Jupyter workspace:
 ## Repo Entry Point
-Use [northflank_smoke.py](/Users/suhjungdae/code/fusion-design-lab/training/notebooks/northflank_smoke.py).
 It uses the repo SSOT values from:
-- [server/environment.py](/Users/suhjungdae/code/fusion-design-lab/server/environment.py)
-- [server/physics.py](/Users/suhjungdae/code/fusion-design-lab/server/physics.py)
 ## Northflank Run

 ## Repo Entry Point
+Use [northflank_smoke.py](northflank_smoke.py).
 It uses the repo SSOT values from:
+- [server/environment.py](../../server/environment.py)
+- [server/physics.py](../../server/physics.py)
 ## Northflank Run