Commit 5354ca9 · Parent(s): 98ffb4a

docs: lock p1 plan and hackathon runtime setup

Files changed:

- .gitignore +1 -0
- .pre-commit-config.yaml +1 -0
- AGENTS.md +5 -0
- README.md +74 -12
- docs/FUSION_DELIVERABLES_MAP.md +36 -13
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +280 -86
- docs/FUSION_NEXT_12_HOURS_CHECKLIST.md +81 -32
- docs/PIVOT_P1_ROTATING_ELLIPSE.md +238 -0
- pyproject.toml +9 -5
- server/Dockerfile +50 -0
- server/data/p1/README.md +13 -0
- training/notebooks/README.md +29 -0
- uv.lock +0 -0
.gitignore CHANGED

@@ -8,6 +8,7 @@ __pycache__/
 .ipynb_checkpoints/
 dist/
 build/
+*.egg-info/
 *.sqlite
 *.db
 reports/
.pre-commit-config.yaml CHANGED

@@ -14,3 +14,4 @@ repos:
 - id: check-yaml
 - id: check-toml
 - id: check-added-large-files
+  exclude: ^uv\.lock$
AGENTS.md CHANGED

@@ -24,6 +24,8 @@ Use these docs as the planning SSOT:
 - `docs/FUSION_DELIVERABLES_MAP.md`
 - `docs/FUSION_NEXT_12_HOURS_CHECKLIST.md`

+`docs/PIVOT_P1_ROTATING_ELLIPSE.md` is a supporting decision record, not a planning SSOT. If it disagrees with the three docs above, the three SSOT docs win.
+
 If code and docs disagree, either:

 1. update code to match the docs, or

@@ -39,6 +41,7 @@ Do not leave silent divergence.
 4. Manual-playtest before investing heavily in training.
 5. Prefer behavior traces and baselines over reward-curve-only storytelling.
 6. Keep claims conservative and evidence-backed.
+7. Once the task family is locked, shift to implementation instead of reopening strategy.

 ## Working Rules

@@ -48,6 +51,8 @@ Do not leave silent divergence.
 - Do not add new tests during the hackathon unless the user explicitly requests them.
 - Do not add complicated reward shaping until the simpler version has been tested against actual trajectories.
 - Do not optimize notebook/training work ahead of local environment stability, remote environment stability, and baseline comparisons.
+- Do not create new planning loops around decisions that are already locked in the SSOT docs unless a hard blocker appears.
+- Treat supporting decision records as rationale, not as a fresh task queue.

 ## Environment Contract Rules

README.md CHANGED

@@ -1,13 +1,13 @@
 # Fusion Design Lab

-Fusion Design Lab is an environment-first OpenEnv hackathon project for …
+Fusion Design Lab is an environment-first OpenEnv hackathon project for the `P1` stellarator benchmark.

 The repo is organized around one clear submission thesis:

-- …
-- a …
-- real …
-- explicit constraints
+- an official `P1` task with `constellaration` as the verifier of record
+- a narrow, reproducible action space
+- real verifier feedback
+- explicit constraints and feasibility semantics
 - a reward function that is iteratively improved through observed behavior

 Training is supporting evidence. The environment is the product.

@@ -18,10 +18,16 @@ This repository is the clean hackathon workspace. The detailed planning docs liv…

 Implementation status:

-- …
-- …
-- …
-- environment …
+- `P1` is locked as the benchmark task
+- docs are aligned to fresh `P1` wiring in this repo
+- shared models and server/client entry points exist
+- the runtime environment still needs to be rewired from the old toy scaffold to the real `P1` contract
+
+Current mode:
+
+- strategic task choice is already locked
+- the next work is implementation, smoke validation, and manual playtesting
+- new planning text should only appear when a real blocker forces a decision change

 ## Planned Repository Layout

@@ -32,17 +38,73 @@ fusion-design-lab/
 ├── docs/
 ├── fusion_lab/
 ├── server/
+├── server/data/p1/
 ├── training/
 ├── openenv.yaml
 ├── pyproject.toml
 └── README.md
 ```

+## Setup
+
+Base runtime:
+
+```bash
+uv sync
+```
+
+Development tooling:
+
+```bash
+uv sync --extra dev
+pre-commit install
+```
+
+Optional local notebook tooling:
+
+```bash
+uv sync --extra notebooks
+```
+
+## Runtime Assumptions
+
+- Recommended compute workspace: Northflank Jupyter Notebook with PyTorch on the team H100
+- OpenEnv deployment target: Hugging Face Spaces
+- Minimal submission notebook target: Colab
+- Verifier of record: `constellaration.problems.GeometricalProblem`
+- Environment style: fresh wiring in this repo, not a port of the old `ai-sci-feasible-designs` harness
+- Northflank containers are ephemeral, so persistent storage should be attached before relying on saved models, caches, or fixture data
+- Preferred deployment path: push this GitHub repo and let HF Space build from the repo/Docker configuration rather than copying code manually
+- Preferred Colab/HF Space connectivity: make the HF Space public for the hackathon unless privacy becomes necessary; if private, document and use an explicit access token in the notebook
+
 ## Immediate Next Steps

-1. …
-2. …
-…
+1. Set up the Northflank Jupyter Notebook with PyTorch and attach persistent storage.
+2. Pass a Northflank smoke test:
+   - import `constellaration`
+   - run one rotating-ellipse generation plus one low-fidelity verifier call
+   - write an artifact to persistent storage
+3. Rewrite `server/environment.py` to the locked `P1` contract.
+4. Rewrite `server/physics.py` to use `constellaration`-based `P1` verification.
+5. Add tracked `P1` fixtures under `server/data/p1/`.
+6. Add the Colab notebook under `training/notebooks/`.
+7. Run manual playtest episodes before heavy training work.
+
+These are implementation steps, not another planning phase.
+
+## Fixture Policy
+
+This repo may reuse selected JSON artifacts or boundaries as fixed calibration fixtures.
+
+Allowed examples:
+
+- a known-good or near-winning `P1` boundary
+- near-boundary cases
+- clearly bad cases
+
+Disallowed:
+
+- porting the old planner, governor, or experiment harness into this repo

 ## Hackathon Working Note

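The README's fixture policy above calls for good, near-boundary, and clearly bad `P1` fixtures. One cheap sanity check is a score-ordering test over those three files. The sketch below is illustrative only: the file names, the `elongation` field, and the `score_boundary` stand-in are assumptions; a real check would score each fixture with the `constellaration` `P1` verifier instead.

```python
import json
from pathlib import Path


def score_boundary(boundary: dict) -> float:
    """Stand-in scorer: a real check would run the P1 verifier here."""
    # Illustrative heuristic only: pretend the verifier rewards an
    # elongation close to a hypothetical target of 2.0.
    return 1.0 - abs(boundary["elongation"] - 2.0)


def check_fixture_ordering(fixture_dir: Path) -> bool:
    """Check that the known-good fixture outscores the boundary case,
    which in turn outscores the clearly bad case."""
    scores = {}
    for name in ("good", "boundary", "bad"):
        boundary = json.loads((fixture_dir / f"{name}.json").read_text())
        scores[name] = score_boundary(boundary)
    return scores["good"] > scores["boundary"] > scores["bad"]
```

A check like this can run in CI or in the Northflank smoke test without needing the full environment to be wired up first.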
docs/FUSION_DELIVERABLES_MAP.md CHANGED

@@ -1,6 +1,10 @@
 # Fusion Design Lab Deliverables Map

-This is the output-first map for the hackathon. It is aligned to Plan V2: …
+This is the output-first map for the hackathon. It is aligned to Plan V2: `P1` is locked, the environment is built fresh in this repo, the old harness is not ported, and training claims stay conservative. Everything branches from the four final artifacts the judges and submission flow will actually see.
+
+Northflank is the recommended compute workspace behind those artifacts. HF Space and Colab remain the actual submission surfaces.
+
+Use this map to sequence execution, not to reopen already-locked task choices.

 ## Deliverables Tree

@@ -10,8 +14,9 @@ flowchart TD
 A --> C["Colab Eval / Training Notebook"]
 A --> D["1-Minute Demo"]
 A --> E["Public Repo + README"]
+A --> N["Northflank H100 Workspace"]

-B --> B0["…"]
+B --> B0["P1 environment contract frozen"]
 B --> B1["Remote reset/step works"]
 B --> B2["Reward V0 -> V1 documented"]
 B --> B3["One stable task runs end-to-end"]

@@ -29,13 +34,22 @@ flowchart TD
 E --> E2["Setup + run instructions"]
 E --> E3["Submission links and artifacts"]

+N --> N1["Jupyter Notebook with PyTorch live"]
+N --> N2["Persistent storage attached"]
+N --> N3["Verifier + baseline runs happen here"]
+N --> N4["Northflank smoke test passes"]
+
 B0 --> F["Observation + action schema frozen"]
-B3 --> G["…"]
+B3 --> G["Fresh P1 verifier loop proven"]
 B2 --> H["Exploit observed -> penalty added"]
 B4 --> I0["Deterministic action schema"]
 D2 --> I["Human can act coherently in env"]
 C3 --> J["Random baseline"]
 C3 --> K["Heuristic baseline"]
+G --> L["Official constellaration P1 verifier wired correctly"]
+L --> M["Good / boundary / bad fixture checks pass"]
+N4 --> N3
+N3 --> G
 ```

 ## Reverse Timeline

@@ -46,6 +60,7 @@ flowchart LR
 S --> R["Repo public and readable"]
 S --> T["Training / eval evidence exported"]
 S --> H["HF Space live"]
+S --> N1["Northflank compute ready"]

 V --> V1["Recorded clean demo trajectory"]
 V --> V2["Scripted 60-second story"]

@@ -54,27 +69,35 @@ flowchart LR
 T --> T2["Baseline comparison numbers"]
 T --> T3["Colab notebook runs end-to-end"]

-H --> H1["OpenEnv environment packaged"]
+H --> H1["OpenEnv P1 environment packaged"]
 H --> H2["Remote client can reset and step"]
 H --> H3["Verifier and reward stable"]
 H --> H4["Rules are clear and reproducible"]

 H4 --> P["Environment contract locked first"]
+N1 --> N2["Jupyter with PyTorch up first"]
+N2 --> N3["Persistent storage attached"]
+N3 --> N4["Import + low-fi verifier smoke passes"]
+N4 --> M0
 P --> Q["Manual playtest completed first"]
-H3 --> …
+H3 --> M0["Local verifier loop proven first"]
 T2 --> B["Random + heuristic baselines done"]
 T3 --> X["Training included only if persuasive"]
 V1 --> Y["One stable task only"]
 V2 --> Z["Explain reward fix, not just reward gain"]
+M0 --> N["Fresh wiring, not legacy harness port"]
 ```

 ## Priority Order

-1. …
-2. …
-3. …
-4. …
-5. …
-6. …
-7. …
-8. …
+1. Bring up the Northflank H100 workspace with persistent storage.
+2. Pass the Northflank smoke test.
+3. Prove the fresh local `P1` verifier loop.
+4. Freeze the environment contract and mark the initial reward as `V0`.
+5. Run verifier/fixture checks and then manual-playtest the environment.
+6. Fix obvious reward/pathology issues.
+7. Make one stable OpenEnv `P1` task work remotely with clear, reproducible rules.
+8. Get random and heuristic baselines.
+9. Use the notebook to show traces and comparisons; include training only if it adds signal.
+10. Record the demo around environment clarity, verifier fidelity, reward shaping, and one stable trajectory.
+11. Polish the repo only after the artifacts are real.
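The deliverables map's "Northflank smoke test" node above is defined elsewhere in this commit as three checks: import the verifier package, run one low-fidelity evaluation, and write an artifact to persistent storage. A minimal sketch of that script follows. The storage path, artifact schema, and the stand-in score are all assumptions; a real run would import `constellaration` and call its low-fidelity `P1` evaluation on a rotating-ellipse candidate.

```python
import importlib.util
import json
import time
from pathlib import Path


def run_smoke_test(storage_dir: Path, package: str = "json") -> Path:
    """Write a small JSON artifact proving that the package imports and
    one evaluation completed; `package` defaults to a stdlib module here
    but would be "constellaration" on the real runtime."""
    artifact = {
        "timestamp": time.time(),
        "package_importable": importlib.util.find_spec(package) is not None,
        # Stand-in for one low-fidelity verifier call on a candidate boundary.
        "score": 0.0,
    }
    storage_dir.mkdir(parents=True, exist_ok=True)
    out = storage_dir / "smoke_test.json"
    out.write_text(json.dumps(artifact, indent=2))
    return out
```

Because Northflank containers are ephemeral, `storage_dir` should point at the attached persistent volume so the artifact survives a restart.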
docs/FUSION_DESIGN_LAB_PLAN_V2.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
| 2 |
|
| 3 |
**Hackathon:** OpenEnv Hackathon, March 7-8, 2026
|
| 4 |
**Track:** Statement 3.1 (World Modeling — Professional Tasks)
|
| 5 |
-
**Status:** Judge-aligned
|
| 6 |
|
| 7 |
## 1. Submission Thesis
|
| 8 |
|
|
@@ -10,16 +10,30 @@ We are not primarily submitting "a trained model for fusion."
|
|
| 10 |
|
| 11 |
We are submitting a clear, reproducible training environment for a constrained scientific design task:
|
| 12 |
|
| 13 |
-
-
|
| 14 |
-
- a
|
| 15 |
-
-
|
| 16 |
-
- real simulator feedback
|
| 17 |
- explicit constraints
|
| 18 |
- a reward function that is understandable and iteratively improved
|
| 19 |
|
| 20 |
Training is supporting evidence. The environment is the product.
|
| 21 |
|
| 22 |
-
## 2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 23 |
|
| 24 |
This version changes the center of gravity:
|
| 25 |
|
|
@@ -27,6 +41,7 @@ This version changes the center of gravity:
|
|
| 27 |
- `reward shaping story > polished final reward formula`
|
| 28 |
- `manual playtesting > training-first iteration`
|
| 29 |
- `clarity and reproducibility > broad unsupported transfer claims`
|
|
|
|
| 30 |
|
| 31 |
This version also separates:
|
| 32 |
|
|
@@ -34,7 +49,7 @@ This version also separates:
|
|
| 34 |
- what is a working hypothesis
|
| 35 |
- what must be validated before it becomes part of the final pitch
|
| 36 |
|
| 37 |
-
##
|
| 38 |
|
| 39 |
The judging signal now implies four priorities:
|
| 40 |
|
|
@@ -43,7 +58,7 @@ The judging signal now implies four priorities:
|
|
| 43 |
3. A human should be able to act in the environment coherently before we invest heavily in training.
|
| 44 |
4. The final story should emphasize a clear, reproducible environment, not just a reward curve.
|
| 45 |
|
| 46 |
-
##
|
| 47 |
|
| 48 |
The four visible artifacts remain:
|
| 49 |
|
|
@@ -52,6 +67,12 @@ The four visible artifacts remain:
|
|
| 52 |
3. 1-minute demo video
|
| 53 |
4. Public repo and README
|
| 54 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 55 |
But the evidence order is:
|
| 56 |
|
| 57 |
1. environment contract
|
|
@@ -62,38 +83,113 @@ But the evidence order is:
|
|
| 62 |
6. training or eval notebook evidence
|
| 63 |
7. demo and repo polish
|
| 64 |
|
| 65 |
-
##
|
| 66 |
|
| 67 |
- One stable task only.
|
| 68 |
- No broad cross-science claims unless evidence exists.
|
| 69 |
- No training-first drift.
|
| 70 |
- No dependence on reward curves alone.
|
| 71 |
- No repo/video polish before environment and baselines are real.
|
|
|
|
|
|
|
| 72 |
|
| 73 |
-
##
|
| 74 |
|
| 75 |
We intentionally narrow the scope to one environment family:
|
| 76 |
|
| 77 |
-
-
|
| 78 |
-
-
|
| 79 |
-
-
|
| 80 |
-
-
|
|
|
|
| 81 |
|
| 82 |
The task is:
|
| 83 |
|
| 84 |
-
> improve
|
| 85 |
|
| 86 |
### Constraints
|
| 87 |
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
-
|
|
|
|
|
|
|
| 91 |
|
| 92 |
### Objective
|
| 93 |
|
| 94 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 95 |
|
| 96 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 97 |
|
| 98 |
The environment contract must be frozen before meaningful evaluation.
|
| 99 |
|
|
@@ -101,17 +197,15 @@ The environment contract must be frozen before meaningful evaluation.
|
|
| 101 |
|
| 102 |
The observation should expose:
|
| 103 |
|
| 104 |
-
- current
|
| 105 |
-
-
|
| 106 |
-
-
|
| 107 |
-
-
|
| 108 |
-
-
|
| 109 |
-
-
|
| 110 |
-
-
|
| 111 |
-
- VMEC convergence status
|
| 112 |
- step number
|
| 113 |
- budget remaining
|
| 114 |
-
- target description
|
| 115 |
- concise textual summary of the last action outcome
|
| 116 |
|
| 117 |
The observation must be interpretable by a human without additional hidden state.
|
|
@@ -126,18 +220,20 @@ The action space stays intentionally small and discrete:
|
|
| 126 |
|
| 127 |
For `run`, the controllable fields are:
|
| 128 |
|
| 129 |
-
-
|
|
|
|
|
|
|
|
|
|
| 130 |
- direction: increase or decrease
|
| 131 |
- magnitude: small, medium, large
|
| 132 |
-
- restart mode: hot or cold
|
| 133 |
|
| 134 |
-
This is not trying to expose the full
|
| 135 |
|
| 136 |
### Episode Flow
|
| 137 |
|
| 138 |
-
1. Reset from
|
| 139 |
2. Agent chooses one action.
|
| 140 |
-
3.
|
| 141 |
4. Environment returns diagnostics and reward.
|
| 142 |
5. Episode ends on:
|
| 143 |
- `submit`
|
|
@@ -154,17 +250,30 @@ At termination, the environment should provide:
|
|
| 154 |
- total reward
|
| 155 |
- short human-readable summary of the trajectory
|
| 156 |
|
| 157 |
-
##
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 158 |
|
| 159 |
The reward in this document is not the final reward. It is `Reward V0`.
|
| 160 |
|
| 161 |
-
The initial scoring idea
|
| 162 |
|
| 163 |
-
-
|
| 164 |
-
-
|
| 165 |
-
-
|
| 166 |
- wasting budget should have some cost
|
| 167 |
-
- successful
|
| 168 |
|
| 169 |
### Reward V0 Design Goals
|
| 170 |
|
|
@@ -172,33 +281,52 @@ The initial scoring idea remains:
|
|
| 172 |
- sensitive to genuine progress
|
| 173 |
- hostile to obvious degenerate behavior
|
| 174 |
- simple enough to debug from trajectories
|
|
|
|
| 175 |
|
| 176 |
### Reward V0 Failure Modes To Test
|
| 177 |
|
| 178 |
We should expect at least some of these:
|
| 179 |
|
| 180 |
-
- the agent spams large perturbations
|
| 181 |
- the agent oscillates between equivalent moves
|
| 182 |
-
- the agent overuses `restore_best`
|
| 183 |
-
- the agent never submits
|
| 184 |
- the agent submits too early
|
| 185 |
-
- the agent
|
|
|
|
|
|
|
|
|
|
| 186 |
|
| 187 |
The reward is only acceptable after we test for those behaviors.
|
| 188 |
|
| 189 |
-
##
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 190 |
|
| 191 |
These are still hypotheses until manually or empirically checked:
|
| 192 |
|
| 193 |
-
-
|
| 194 |
-
-
|
| 195 |
-
- the chosen coefficients create a task that is neither trivial nor impossible
|
| 196 |
- `restore_best` is useful without becoming an exploit
|
| 197 |
- heuristic should beat random on mean episode reward
|
|
|
|
| 198 |
|
| 199 |
These should not be narrated as facts in the final demo until validated.
|
| 200 |
|
| 201 |
-
##
|
| 202 |
|
| 203 |
Before heavy training, we should act as the agent ourselves.
|
| 204 |
|
|
@@ -209,7 +337,7 @@ Run 5 to 10 episodes manually and log for each step:
|
|
| 209 |
- observation seen
|
| 210 |
- action chosen
|
| 211 |
- reason for the action
|
| 212 |
-
-
|
| 213 |
- reward returned
|
| 214 |
- whether the reward matched intuitive quality
|
| 215 |
|
|
@@ -217,7 +345,7 @@ Run 5 to 10 episodes manually and log for each step:
|
|
| 217 |
|
| 218 |
- can a human understand what to do from the observation?
|
| 219 |
- do action labels map to meaningful decisions?
|
| 220 |
-
- is
|
| 221 |
- which actions are high leverage?
|
| 222 |
- do obvious bad actions get punished?
|
| 223 |
- do obviously good actions get rewarded?
|
|
@@ -229,7 +357,7 @@ Run 5 to 10 episodes manually and log for each step:
|
|
| 229 |
- one paragraph on what a good episode looks like
|
| 230 |
- one paragraph on what broke or felt ambiguous
|
| 231 |
|
| 232 |
-
##
|
| 233 |
|
| 234 |
The reward iteration story is not a side note. It is likely part of the pitch.
|
| 235 |
|
|
@@ -242,13 +370,13 @@ We should aim to document at least one concrete sequence:
|
|
| 242 |
|
| 243 |
Examples of acceptable story structure:
|
| 244 |
|
| 245 |
-
- "The agent
|
| 246 |
-
- "The agent
|
| 247 |
-
- "The agent overused restore-best, so we changed the reward
|
| 248 |
|
| 249 |
This is stronger than saying only "reward improved after training."
|
| 250 |
|
| 251 |
-
##
|
| 252 |
|
| 253 |
### HF Space
|
| 254 |
|
|
@@ -259,6 +387,22 @@ Must prove:
|
|
| 259 |
- one stable episode runs end-to-end
|
| 260 |
- the remote behavior matches the local contract
|
| 261 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 262 |
### Colab Notebook
|
| 263 |
|
| 264 |
Primary job:
|
|
@@ -273,11 +417,18 @@ Secondary job:
|
|
| 273 |
|
| 274 |
If training is weak but the environment and eval traces are strong, the notebook still ships.
|
| 275 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 276 |
### Demo Video
|
| 277 |
|
| 278 |
The video should show:
|
| 279 |
|
| 280 |
-
1. the task
|
| 281 |
2. the environment observation and action space
|
| 282 |
3. one manual or agent trajectory
|
| 283 |
4. one reward pathology and fix
|
|
@@ -289,14 +440,21 @@ Reward curves are optional supporting visuals, not the center of the story.
|
|
| 289 |
|
| 290 |
The repo should make the environment easy to understand:
|
| 291 |
|
| 292 |
-
- what
|
| 293 |
- what the agent sees
|
| 294 |
- what the agent can do
|
| 295 |
- how reward works
|
| 296 |
- how to run one episode
|
| 297 |
- where the demo evidence lives
|
|
|
|
| 298 |
|
| 299 |
-
##
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 300 |
|
| 301 |
### Gate 1: Environment Contract Locked
|
| 302 |
|
|
@@ -305,53 +463,74 @@ The repo should make the environment easy to understand:
|
|
| 305 |
- action schema frozen
|
| 306 |
- terminal conditions frozen
|
| 307 |
|
| 308 |
-
### Gate 2:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 309 |
|
| 310 |
- human can act coherently
|
| 311 |
- at least one trajectory feels sensible
|
| 312 |
- at least one pathology identified or ruled out
|
| 313 |
|
| 314 |
-
### Gate
|
| 315 |
|
| 316 |
-
- local modify ->
|
| 317 |
- at least one end-to-end episode is stable
|
| 318 |
|
| 319 |
-
### Gate
|
| 320 |
|
| 321 |
- at least one reward revision completed
|
| 322 |
- story is documented with before/after behavior
|
| 323 |
|
| 324 |
-
### Gate
|
| 325 |
|
| 326 |
- random baseline complete
|
| 327 |
- heuristic baseline complete
|
| 328 |
- heuristic is at least competitive and preferably better than random
|
| 329 |
|
| 330 |
-
### Gate
|
| 331 |
|
| 332 |
- HF Space live
|
| 333 |
- remote client runs one clean episode
|
| 334 |
|
| 335 |
-
### Gate
|
| 336 |
|
| 337 |
- notebook runs end-to-end
|
| 338 |
- traces exported
|
| 339 |
- training evidence included only if it adds signal
|
| 340 |
|
| 341 |
-
##
|
| 342 |
|
| 343 |
### Phase 0
|
| 344 |
|
| 345 |
-
|
|
|
|
|
|
|
|
|
|
| 346 |
|
| 347 |
Deliverables:
|
| 348 |
|
| 349 |
- frozen task definition
|
| 350 |
- frozen action and observation schema
|
| 351 |
-
- proof that one
|
|
|
|
| 352 |
|
| 353 |
### Phase 1
|
| 354 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 355 |
Manual-playtest the environment.
|
| 356 |
|
| 357 |
Deliverables:
|
|
@@ -359,7 +538,7 @@ Deliverables:
|
|
| 359 |
- 5 to 10 episode logs
|
| 360 |
- notes on leverage, ambiguity, and pathologies
|
| 361 |
|
| 362 |
-
### Phase
|
| 363 |
|
| 364 |
Implement or refine Reward V0 into Reward V1 based on real behavior.
|
| 365 |
|
|
@@ -369,7 +548,7 @@ Deliverables:
|
|
| 369 |
- documented fix
|
| 370 |
- updated reward logic
|
| 371 |
|
| 372 |
-
### Phase
|
| 373 |
|
| 374 |
Stabilize one local task and run baselines.
|
| 375 |
|
|
@@ -379,7 +558,7 @@ Deliverables:
|
|
| 379 |
- random baseline
|
| 380 |
- heuristic baseline
|
| 381 |
|
| 382 |
-
### Phase
|
| 383 |
|
| 384 |
Deploy HF Space and validate remote parity.
|
| 385 |
|
|
@@ -388,18 +567,19 @@ Deliverables:
|
|
| 388 |
- live environment
|
| 389 |
- one stable remote episode
|
| 390 |
|
| 391 |
-
### Phase
|
| 392 |
|
| 393 |
Produce notebook evidence.
|
| 394 |
|
| 395 |
Deliverables:
|
| 396 |
|
| 397 |
- Colab notebook
|
|
|
|
| 398 |
- traces
|
| 399 |
- baseline comparison
|
| 400 |
- training outputs only if persuasive
|
| 401 |
|
| 402 |
-
### Phase
|
| 403 |
|
| 404 |
Record the demo and make the repo readable.
|
| 405 |
|
|
@@ -409,7 +589,7 @@ Deliverables:
|
|
| 409 |
- public README
|
| 410 |
- linked artifacts
|
| 411 |
|
| 412 |
-
##
|
| 413 |
|
| 414 |
If something goes wrong, the fallback should preserve the environment story.
|
| 415 |
|
|
@@ -420,11 +600,22 @@ Do not force a training-centric pitch.
|
|
| 420 |
Ship:
|
| 421 |
|
| 422 |
- strong environment
|
|
|
|
| 423 |
- manual playtest evidence
|
| 424 |
- reward iteration story
|
| 425 |
- baseline traces
|
| 426 |
- one stable remote demo
|
| 427 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 428 |
### If reward is unstable
|
| 429 |
|
| 430 |
Reduce ambition:
|
|
@@ -439,9 +630,10 @@ Do not broaden scope.
|
|
| 439 |
|
| 440 |
Instead:
|
| 441 |
|
| 442 |
-
- simplify the
|
| 443 |
- tighten the action set
|
| 444 |
-
-
|
|
|
|
| 445 |
|
| 446 |
### If the task is too easy
|
| 447 |
|
|
@@ -453,13 +645,13 @@ Instead:
|
|
| 453 |
- adjust magnitudes
|
| 454 |
- adjust reward to discourage trivial submission
|
| 455 |
|
| 456 |
-
##
|
| 457 |
|
| 458 |
The recommended demo structure is:
|
| 459 |
|
| 460 |
### Part 1: Problem
|
| 461 |
|
| 462 |
-
"The agent
|
| 463 |
|
| 464 |
### Part 2: Environment
|
| 465 |
|
|
@@ -475,14 +667,16 @@ The recommended demo structure is:
|
|
| 475 |
|
| 476 |
### Part 5: Why It Matters
|
| 477 |
|
| 478 |
-
"This is a clear, reproducible
|
| 479 |
|
| 480 |
That last line is intentionally conservative. It is strong enough without claiming universal scientific transfer.
|
| 481 |
|
| 482 |
-
##
|
| 483 |
|
| 484 |
-
1. Freeze the environment contract in code and docs.
|
| 485 |
-
2.
|
| 486 |
-
3.
|
| 487 |
-
4.
|
| 488 |
-
5.
|
|
|
|
|
|
|
|
|
| 2 |
|
| 3 |
**Hackathon:** OpenEnv Hackathon, March 7-8, 2026
|
| 4 |
**Track:** Statement 3.1 (World Modeling — Professional Tasks)
|
| 5 |
+
**Status:** Judge-aligned plan with `P1` locked
|
| 6 |
|
| 7 |
## 1. Submission Thesis
|
| 8 |
|
|
|
|
| 10 |
|
| 11 |
We are submitting a clear, reproducible training environment for a constrained scientific design task:
|
| 12 |
|
| 13 |
+
- official `P1` benchmark semantics
|
| 14 |
+
- a narrow, human-playable action space
|
| 15 |
+
- real verifier feedback from `constellaration`
|
|
|
|
| 16 |
- explicit constraints
|
| 17 |
- a reward function that is understandable and iteratively improved
|
| 18 |
|
| 19 |
Training is supporting evidence. The environment is the product.
|
| 20 |
|
| 21 |
+
## 2. Locked Decisions
|
| 22 |
+
|
| 23 |
+
These decisions are now fixed unless a hard blocker appears:
|
| 24 |
+
|
| 25 |
+
- benchmark task: `P1`
|
| 26 |
+
- submission framing: `Statement 3.1`
|
| 27 |
+
- verifier of record: `constellaration.problems.GeometricalProblem`
|
| 28 |
+
- implementation strategy: fresh wiring in this repo
|
| 29 |
+
- reuse policy: do not port the old `ai-sci-feasible-designs` harness; only reuse selected JSON artifacts or boundaries when useful
|
| 30 |
+
|
| 31 |
+
Execution rule after lock:
|
| 32 |
+
|
| 33 |
+
- do not reopen these decisions in new planning passes unless a real blocker appears
|
| 34 |
+
- once a decision is locked, translate it into code, fixtures, baselines, or deployment work
|
| 35 |
+
|
| 36 |
+
## 3. What Changed From V1
|
| 37 |
|
| 38 |
This version changes the center of gravity:
|
| 39 |
|
|
|
|
| 41 |
- `reward shaping story > polished final reward formula`
|
| 42 |
- `manual playtesting > training-first iteration`
|
| 43 |
- `clarity and reproducibility > broad unsupported transfer claims`
|
| 44 |
+
- `fresh, minimal environment wiring > transplanting legacy orchestration`
|
| 45 |
|
| 46 |
This version also separates:
|
| 47 |
|
|
|
|
| 49 |
- what is a working hypothesis
|
| 50 |
- what must be validated before it becomes part of the final pitch

## 4. Judge-Aligned Priorities

The judging signal now implies four priorities:

3. A human should be able to act in the environment coherently before we invest heavily in training.
4. The final story should emphasize a clear, reproducible environment, not just a reward curve.

## 5. Final Artifacts

The four visible artifacts remain:

3. 1-minute demo video
4. Public repo and README

The primary compute workspace should be Northflank:

- Northflank Jupyter Notebook with PyTorch on the team H100 for development, verifier integration, baselines, and training/debugging
- HF Space as the hosted environment surface
- Colab as the minimal required public notebook artifact

But the evidence order is:

1. environment contract
6. training or eval notebook evidence
7. demo and repo polish

## 6. Non-Negotiables

- One stable task only.
- No broad cross-science claims unless evidence exists.
- No training-first drift.
- No dependence on reward curves alone.
- No repo/video polish before environment and baselines are real.
- No harness transplant from `ai-sci-feasible-designs`.
- No new strategy churn after `P1` + rotating-ellipse is locked unless a blocker forces it.

## 7. Single Stable Task

We intentionally narrow the scope to one environment family:

- `P1` geometrical benchmark
- rotating-ellipse, low-dimensional design space
- official `constellaration` verifier
- low-fidelity evaluation for ordinary interaction
- optional high-fidelity verification for final checks or `submit`

The task is:

> improve a stellarator boundary on the `P1` benchmark under explicit constraints and limited evaluation budget

### Constraints

Use the official `P1` constraints:

- aspect ratio `<= 4.0`
- average triangularity `<= -0.5`
- edge rotational transform over field periods `>= 0.3`

### Objective

Use the official `P1` objective:

- minimize `max_elongation`

### Why This Task

- it is official rather than invented
- it is cheaper than `P2` and `P3` because `P1` skips QI
- it maps cleanly to a tool-using scientific workflow
- it is easier to explain than a broader fusion-design claim

## 8. Fresh Wiring Rule

This repo should implement a minimal environment directly for the hackathon.

That means:

- define our own environment contract
- define our own reward logic on top of the official verifier
- define our own baselines
- define our own HF Space interface

That does not mean:

- importing the old governor
- importing the old planner
- importing the old experiment harness
- recreating the old agent-as-coder stack

Allowed reuse:

- official `constellaration` library behavior
- selected JSON artifacts or seed boundaries
- problem notes as human reference

Implementation handoff:

- the remaining work is now wiring, smoke validation, manual playtesting, baselines, and deployment
- do not treat supporting decision notes as a new planning backlog

## 8.1 Compute Surfaces

Use each surface for one clear purpose:

- Northflank Jupyter Notebook with PyTorch:
  - main development and compute workspace
  - verifier sanity checks
  - manual playtesting
  - baseline runs
  - optional RL fine-tuning
- HF Space:
  - public OpenEnv environment surface
  - remote `reset` and `step` endpoint for the final demo path
- Colab:
  - minimal reproducible evaluation or training notebook required by the hackathon

Northflank-specific constraint:

- containers are ephemeral, so persistent storage must be attached before relying on saved models, caches, or fixture downloads

Deployment path:

- develop and verify in Northflank or locally
- commit and push changes to the public GitHub repo
- have HF Space build and serve from that repo path
- do not rely on manual copy-paste deployment as the default path

Auth stance:

- prefer a public HF Space for the hackathon to keep the Colab artifact simple
- if the Space must be private, the notebook must explicitly document token-based access

## 9. Environment Contract

The environment contract must be frozen before meaningful evaluation.

The observation should expose:

- current `max_elongation`
- current aspect ratio
- current average triangularity
- current edge rotational transform over field periods
- current feasibility score or normalized violation summary
- best-so-far feasible score
- best-so-far least-violating design summary
- step number
- budget remaining
- concise textual summary of the last action outcome

The observation must be interpretable by a human without additional hidden state.
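As a concrete sketch, the observation above could travel as one small frozen record. The class name and field names here are illustrative assumptions, not the frozen schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape of the P1 observation described above.
# Field names are placeholders until the contract is frozen.
@dataclass(frozen=True)
class P1Observation:
    max_elongation: float
    aspect_ratio: float
    average_triangularity: float
    edge_rotational_transform: float
    violation_summary: float            # normalized violation; 0.0 means feasible
    best_feasible_score: Optional[float]
    best_design_summary: str
    step: int
    budget_remaining: int
    last_action_outcome: str

obs = P1Observation(
    max_elongation=3.2,
    aspect_ratio=3.8,
    average_triangularity=-0.55,
    edge_rotational_transform=0.31,
    violation_summary=0.0,
    best_feasible_score=3.2,
    best_design_summary="feasible rotating ellipse",
    step=2,
    budget_remaining=4,
    last_action_outcome="decreased elongation; still feasible",
)
```

A flat, frozen record like this keeps the "no hidden state" property easy to audit: everything the agent can condition on is visible in one place.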

For `run`, the controllable fields are:

- parameter: one of
  - `aspect_ratio`
  - `elongation`
  - `rotational_transform`
- direction: increase or decrease
- magnitude: small, medium, large

This is not trying to expose the full Fourier-boundary space. The goal is a legible environment, not maximal realism.
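The `run` fields above imply a small, fully enumerable discrete action space. A hypothetical enumeration (assuming `restore_best` and `submit` as the standalone actions, which is not frozen here) looks like:

```python
from itertools import product

# Hypothetical enumeration of the discrete action space sketched above:
# 3 parameters x 2 directions x 3 magnitudes = 18 `run` actions,
# plus standalone `restore_best` and `submit`.
PARAMETERS = ("aspect_ratio", "elongation", "rotational_transform")
DIRECTIONS = ("increase", "decrease")
MAGNITUDES = ("small", "medium", "large")

run_actions = [
    {"type": "run", "parameter": p, "direction": d, "magnitude": m}
    for p, d, m in product(PARAMETERS, DIRECTIONS, MAGNITUDES)
]
all_actions = run_actions + [{"type": "restore_best"}, {"type": "submit"}]

print(len(all_actions))  # 18 run actions + 2 standalone = 20
```

Twenty-ish discrete actions is small enough for a human to playtest and for a random baseline to cover, which is exactly the legibility goal stated above.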

### Episode Flow

1. Reset from one rotating-ellipse initial state or a small frozen set of initial states.
2. Agent chooses one action.
3. Low-fidelity verifier runs for normal interaction.
4. Environment returns diagnostics and reward.
5. Episode ends on:
   - `submit`

On episode end, the environment reports:

- total reward
- short human-readable summary of the trajectory
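The episode flow can be sketched as a plain loop. `StubEnv`, the 6-step budget, and the return shape of `step` are stand-ins for illustration, not the real contract:

```python
# Toy sketch of the reset -> act -> verify -> observe loop described above.
# The budget value and env interface are assumptions, not the frozen contract.
BUDGET = 6

def run_episode(policy, env):
    obs = env.reset()
    total_reward = 0.0
    for _ in range(BUDGET):
        action = policy(obs)
        # env is assumed to run the low-fidelity verifier inside step()
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward, obs

class StubEnv:
    """Minimal stand-in so the loop runs without the real verifier."""
    def reset(self):
        return {"step": 0, "feasible": False}
    def step(self, action):
        done = action == "submit"
        return {"step": 1, "feasible": False}, 0.0, done

total, final_obs = run_episode(lambda obs: "submit", StubEnv())
```

Keeping the loop this small makes it easy to swap the stub for the real verifier-backed environment without changing the episode logic.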

## 10. Verifier Contract

The verifier of record is `constellaration.problems.GeometricalProblem`.

The environment must preserve:

- objective direction
- constraint direction
- feasibility semantics
- score ordering

The environment may add reward shaping, but it must not redefine what `P1` means.

## 11. Reward V0

The reward in this document is not the final reward. It is `Reward V0`.

The initial scoring idea should be feasibility-first:

- reducing normalized constraint violation should help
- becoming feasible should give a meaningful bonus
- once feasible, lower `max_elongation` should help
- wasting budget should have some cost
- successful submission may deserve a small bonus
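One hypothetical way to turn those bullets into code. The weights, term names, and thresholds are placeholders that the Reward V1 revision would change:

```python
# Hypothetical Reward V0 sketch of the feasibility-first bullets above.
# All weights are placeholder assumptions, not tuned values.
def reward_v0(prev_violation, violation, max_elongation, submitted, step_cost=0.05):
    reward = -step_cost                     # wasting budget has some cost
    reward += prev_violation - violation    # reducing normalized violation helps
    feasible = violation <= 0.0
    if feasible and prev_violation > 0.0:
        reward += 1.0                       # meaningful bonus for becoming feasible
    if feasible:
        reward += -0.1 * max_elongation     # once feasible, lower elongation helps
        if submitted:
            reward += 0.25                  # small bonus for successful submission
    return reward

# Becoming feasible should outrank staying infeasible with a prettier objective.
better = reward_v0(prev_violation=0.4, violation=0.0, max_elongation=3.0, submitted=False)
worse = reward_v0(prev_violation=0.4, violation=0.3, max_elongation=2.5, submitted=False)
```

Writing the terms this explicitly is what makes the later "pathology, then fix" story debuggable from trajectories.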

### Reward V0 Design Goals

- sensitive to genuine progress
- hostile to obvious degenerate behavior
- simple enough to debug from trajectories
- aligned with official `P1` semantics

### Reward V0 Failure Modes To Test

We should expect at least some of these:

- the agent oscillates between equivalent moves
- the agent submits too early
- the agent never submits
- the agent learns to improve objective before it learns feasibility
- the agent camps near one constraint while breaking another
- the agent overuses `restore_best`

The reward is only acceptable after we test for those behaviors.

## 12. Verifier and Reward Fixture Checks

Before training, we should validate environment wiring with a few fixed fixtures.

Use:

- one known-good design or near-winning design
- a few near-boundary designs
- a few clearly infeasible designs

Purpose:

- verify the verifier is wired correctly
- verify the reward ordering makes sense
- verify feasible designs outrank clearly infeasible ones

This is calibration, not training.
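A minimal sketch of the ordering check, with made-up fixture names and scores standing in for real verifier-backed rewards:

```python
# Hypothetical fixture-ordering calibration. Names and scores are illustrative;
# in the real check they would come from the environment's reward around the
# official P1 verifier.
fixtures = [
    {"name": "known_good", "feasible": True, "score": 1.2},
    {"name": "near_boundary", "feasible": False, "score": 0.1},
    {"name": "clearly_infeasible", "feasible": False, "score": -0.8},
]

def ordering_is_sane(fixtures):
    # Every feasible fixture must outrank every infeasible one.
    feasible = [f["score"] for f in fixtures if f["feasible"]]
    infeasible = [f["score"] for f in fixtures if not f["feasible"]]
    return all(fs > inf for fs in feasible for inf in infeasible)

print(ordering_is_sane(fixtures))
```

If this check fails on real fixtures, the bug is in the wiring or the shaping, and it is caught before any training budget is spent.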

## 13. What Is Hypothesis vs Validated

These are still hypotheses until manually or empirically checked:

- six steps are enough to create non-trivial decision pressure
- the rotating-ellipse action space is expressive enough for a meaningful `P1` task
- `restore_best` is useful without becoming an exploit
- the heuristic baseline should beat random on mean episode reward
- low-fidelity interaction is predictive enough for useful policy learning

These should not be narrated as facts in the final demo until validated.
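The heuristic-vs-random hypothesis can be smoke-checked with a toy episode model before the real baselines exist. Everything here is a stand-in: the episode dynamics, policies, and episode count are assumptions for illustration only:

```python
import random
import statistics

# Toy model: each step either reduces a violation value (rewarded) or not.
def run_episode(policy, rng, steps=6):
    violation, reward = 0.5, 0.0
    for _ in range(steps):
        move = policy(violation, rng)
        violation = max(0.0, violation - move)
        reward += move
    return reward

random_policy = lambda violation, rng: rng.choice([-0.1, 0.0, 0.1])
heuristic_policy = lambda violation, rng: 0.1 if violation > 0 else 0.0

rng = random.Random(0)
random_mean = statistics.mean(run_episode(random_policy, rng) for _ in range(200))
heuristic_mean = statistics.mean(run_episode(heuristic_policy, rng) for _ in range(200))
```

The point of even a toy harness like this is to fix the comparison protocol (same budget, same seeds, mean episode reward) before the real environment swaps in.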

## 14. Manual Playtest Plan

Before heavy training, we should act as the agent ourselves.

Log for each step:

- observation seen
- action chosen
- reason for the action
- verifier outcome
- reward returned
- whether the reward matched intuitive quality

Questions to answer:

- can a human understand what to do from the observation?
- do action labels map to meaningful decisions?
- is the step budget interesting or arbitrary?
- which actions are high leverage?
- do obvious bad actions get punished?
- do obviously good actions get rewarded?

Afterwards, write:

- one paragraph on what a good episode looks like
- one paragraph on what broke or felt ambiguous

## 15. Reward Iteration Story

The reward iteration story is not a side note. It is likely part of the pitch.

Examples of acceptable story structure:

- "The agent improved elongation while staying deeply infeasible, so we increased feasibility-first shaping."
- "The agent hovered near one constraint and ignored another, so we changed the violation shaping."
- "The agent overused restore-best, so we changed the reward or step logic to make stalling unprofitable."

This is stronger than saying only "reward improved after training."

## 16. Evidence Plan

### HF Space

Must prove:

- one stable episode runs end-to-end
- the remote behavior matches the local contract

HF Space is the serving surface, not the main heavy-compute workspace.

### Northflank Notebook

Must prove:

- Jupyter Notebook with PyTorch is live on the team H100
- persistent storage is attached
- verifier and baseline work runs there without local-machine dependency
- environment/debug/training work can proceed there even if local runtime is inconvenient
- one smoke check passes:
  - import `constellaration`
  - generate one rotating-ellipse boundary
  - run one low-fidelity verifier call
  - write a result artifact to persistent storage
### Colab Notebook

Primary job:

If training is weak but the environment and eval traces are strong, the notebook still ships.

Colab is a required artifact, but it is not the preferred main compute surface.

Connectivity rule:

- if HF Space is public, the notebook uses direct HTTP calls with no extra auth flow
- if HF Space is private, the notebook must state the required token path and setup explicitly

### Demo Video

The video should show:

1. the `P1` task
2. the environment observation and action space
3. one manual or agent trajectory
4. one reward pathology and fix

### Repo

The repo should make the environment easy to understand:

- what `P1` is
- what the agent sees
- what the agent can do
- how reward works
- how to run one episode
- where the demo evidence lives
- why the repo is freshly wired rather than copied from the old project

## 17. Success Gates

### Prerequisite: Northflank Compute Ready

- notebook starts on the team H100
- persistent storage mount is usable
- smoke test artifact is written successfully

### Gate 1: Environment Contract Locked

- action schema frozen
- terminal conditions frozen

### Gate 2: Verifier Wiring Pass

- official `P1` verifier returns expected outputs
- fixture ordering is sensible
- objective direction is correct

### Gate 3: Manual Playtest Pass

- human can act coherently
- at least one trajectory feels sensible
- at least one pathology identified or ruled out

### Gate 4: Stable Local Episode

- local modify -> verify -> observe loop works
- at least one end-to-end episode is stable

### Gate 5: Reward V1

- at least one reward revision completed
- story is documented with before/after behavior

### Gate 6: Baselines

- random baseline complete
- heuristic baseline complete
- heuristic is at least competitive and preferably better than random

### Gate 7: Remote Environment

- HF Space live
- remote client runs one clean episode

### Gate 8: Notebook Evidence

- notebook runs end-to-end
- traces exported
- training evidence included only if it adds signal

## 18. Timeline

### Phase 0

Run two parallel tracks:

- Track A: Northflank compute setup and smoke validation
- Track B: lock the `P1` environment contract

Deliverables:

- frozen task definition
- frozen action and observation schema
- proof that one local `P1` loop works
- Northflank smoke test pass

### Phase 1

Wire the official verifier and run fixture checks.

Deliverables:

- one good fixture
- near-boundary fixtures
- bad fixtures
- confidence that reward/verifier ordering is sane

### Phase 2

Manual-playtest the environment.

Deliverables:

- 5 to 10 episode logs
- notes on leverage, ambiguity, and pathologies

### Phase 3

Implement or refine Reward V0 into Reward V1 based on real behavior.

Deliverables:

- documented fix
- updated reward logic

### Phase 4

Stabilize one local task and run baselines.

Deliverables:

- random baseline
- heuristic baseline

### Phase 5

Deploy HF Space and validate remote parity.

Deliverables:

- live environment
- one stable remote episode

### Phase 6

Produce notebook evidence.

Deliverables:

- Colab notebook
- Northflank traces or run exports
- traces
- baseline comparison
- training outputs only if persuasive

### Phase 7

Record the demo and make the repo readable.

Deliverables:

- public README
- linked artifacts

## 19. Fallback Rules

If something goes wrong, the fallback should preserve the environment story.

Ship:

- strong environment
- verifier and fixture evidence
- manual playtest evidence
- reward iteration story
- baseline traces
- one stable remote demo

### If Northflank is delayed or unavailable

Do not block environment design on it.

Fallback:

- continue contract definition, reward design, and basic wiring locally
- use local CPU or Colab for limited verifier/debug work
- keep Northflank as the preferred compute target, but do not stall the whole plan waiting for it

### If reward is unstable

Reduce ambition:

Instead:

- simplify the initial states
- tighten the action set
- reduce magnitude choices
- keep the environment more learnable within the fixed budget

### If the task is too easy

- adjust magnitudes
- adjust reward to discourage trivial submission
## 20. Demo Story

The recommended demo structure is:

### Part 1: Problem

"The agent interacts with the official `P1` stellarator-design benchmark and must improve a design under strict geometric constraints."

### Part 2: Environment

### Part 5: Why It Matters

"This is a clear, reproducible scientific workflow environment built around a real verifier, not a shortcut task."

That last line is intentionally conservative. It is strong enough without claiming universal scientific transfer.

## 21. Immediate Next Actions

1. Freeze the `P1` environment contract in code and docs.
2. Implement fresh verifier wiring in this repo.
3. Run fixture checks before heavy training work.
4. Run manual playtests before heavy training work.
5. Mark the current reward as `V0`.
6. Log the first real pathology and reward revision.
7. Do not let notebook or video work outrun the environment evidence.

docs/FUSION_NEXT_12_HOURS_CHECKLIST.md
CHANGED
# Fusion Design Lab: Next 12 Hours Checklist

This checklist turns the updated deliverables map and Plan V2 into concrete execution order. The goal is to produce real evidence for the four submission artifacts, with `P1`, fresh wiring, and environment clarity driving the sequence.

## Core Rule

Carry these rules through the whole checklist:

- Freeze the environment contract before heavy iteration.
- Keep the repo freshly wired; do not port the old harness.
- Treat the current reward as `Reward V0`, not final reward.
- Distinguish validated facts from working hypotheses.
- Prefer behavior traces and baseline comparisons over generic reward-curve storytelling.
- If training is weak, ship the environment story anyway.
- Use Northflank as the main compute workspace; keep HF Space and Colab as the submission surfaces.
- Do not open another strategy loop unless a real blocker appears.

## Hour 0-2: Parallelize Compute Bring-Up and Contract Lock

### Track A: Northflank Compute

1. Bring up the Northflank Jupyter Notebook with PyTorch on the team H100.
2. Attach persistent storage before relying on saved models, caches, or fixture downloads.
3. Pass a concrete smoke test:
   - import `constellaration`
   - generate one rotating-ellipse boundary
   - run one low-fidelity verifier call
   - write one artifact to persistent storage

Exit condition: the notebook is not just open; the verifier path works and persistent storage is usable.

Artifacts:

- Northflank notebook live
- smoke test note
- one persisted smoke artifact

### Track B: Environment Contract

1. Write the exact `P1` environment spec.
2. Freeze one task only.
3. Define:
   - observation schema
   - reward V0 terms
   - initial penalties
4. Update the main diagram so it emphasizes:
   - `P1`
   - official verifier
   - reward shaping
   - manual playtesting
5. Mark open assumptions explicitly:
   - whether the rotating-ellipse action set is expressive enough
   - whether the fixed step budget is enough
   - whether `restore_best` is useful without becoming an exploit

Exit condition: a human can read the spec and understand how to act in the environment.

Artifacts:

- revised mermaid diagram
- short hypothesis list

Transition rule:

- once Track B exits, stop rewriting the strategy and move straight into wiring and verifier checks

## Hour 2-4: Verify Wiring, Then Manual Playtest

1. Run fixture checks:
   - known-good or near-winning design
   - near-boundary designs
   - clearly bad designs
2. Confirm:
   - verifier outputs are sane
   - reward ordering is sane
   - objective direction is correct
3. Manually play 5 to 10 episodes.
4. Log for each step:
   - observation
   - chosen action
   - expected effect
   - returned reward
   - confusion or exploit if observed
5. Identify at least one bad incentive or exploit.
6. Patch reward or penalty logic immediately.
7. Write the reward shaping story:
   - initial reward V0
   - bad behavior
   - refinement to reward V1

Exit condition: you can explain why the environment now rewards the intended behavior.

Artifacts:

- fixture check note
- manual playtest log
- reward shaping note
- reward V1 delta note

## Hour 4-6: Stabilize the Local Task

1. Prove the fresh local `P1` verifier loop.
2. Run one stable end-to-end task repeatedly.
3. Confirm the action schema is deterministic enough for reproducible episodes.
4. Save one clean local trajectory.

## Hour 6-8: Make the HF Space Real

1. Package the OpenEnv `P1` environment for remote use.
2. Use the explicit deployment path:
   - commit changes in this repo
   - push to GitHub
   - let HF Space build from the repo
3. Decide and document the access mode:
   - preferred: public HF Space for the hackathon
   - if private: token-based notebook access documented
4. Verify remote `reset` and `step`.
5. Run one clean remote episode end-to-end.
6. Confirm the remote environment preserves the same task contract as local.

Exit condition: the environment is runnable in the actual submission surface, not only locally.

1. Implement the random baseline.
2. Implement the heuristic baseline.
3. Run short comparisons on the same stable `P1` task.
4. Save:
   - comparison numbers
   - behavior traces

- multi-turn episodes
- behavior traces
- reward or behavior comparison outputs

3. Keep heavy verifier and training work on Northflank; use Colab as the thin public artifact.
4. Draft the 60-second demo script.
5. Record the demo around:
   - what `P1` is
   - how reward was refined
   - what manual playtesting revealed
   - one stable trajectory
   - baseline comparison
6. If training evidence is weak, keep the notebook eval-first and do not force a training-centric claim.
7. Make the repo public-facing and readable only after the artifacts are real.

Exit condition: all four visible artifacts exist in usable form.

Artifacts:

- Colab training or eval script
- Northflank run notes or exported traces
- demo script
- draft or final video
- updated repo README

## Artifact Order

1. Environment spec
2. Fixture check note
3. Manual playtest log
4. Reward revision note
5. Stable task run
6. Random baseline
7. Heuristic baseline
8. Northflank traces or training evidence
9. Colab training or eval evidence
10. Demo recording
11. Repo polish

## Non-Negotiables

- Do not widen scope beyond one stable task.
- Do not port the old `ai-sci-feasible-designs` harness into this repo.
- Do not optimize training before manual playtesting.
- Do not rely on reward curves alone; keep trajectory evidence.
- Do not narrate hypotheses as facts before they are checked.
- Do not polish the repo or video before the environment and baselines are real.
- Treat judge comments as pressure toward clarity and reproducibility, not broader unsupported claims.
- Do not force a training-centric story if the strongest evidence is environment quality plus baselines.
- Do not rely on Northflank container-local state without persistent storage.
- Do not block contract design work on Northflank provisioning friction.
docs/PIVOT_P1_ROTATING_ELLIPSE.md
ADDED
|
@@ -0,0 +1,238 @@
|
| 1 |
+
# Pivot: P1 Rotating-Ellipse Environment
|
| 2 |
+
|
| 3 |
+
**Date:** 2026-03-07
|
| 4 |
+
**Status:** Supporting decision record, superseded as planning SSOT by `FUSION_DESIGN_LAB_PLAN_V2.md`
|
| 5 |
+
**Supersedes:** Synthetic physics model in current `server/physics.py`
|
| 6 |
+
|
| 7 |
+
Use this file as rationale for the pivot, not as a fresh planning queue. Once the pivot is accepted, implementation should follow the SSOT plan docs.
|
| 8 |
+
|
| 9 |
+
## Decision
|
| 10 |
+
|
| 11 |
+
Pivot the OpenEnv environment to use the official ConStellaration P1 benchmark with real VMEC physics, scoped to the rotating-ellipse low-dimensional parameter space.
|
| 12 |
+
|
| 13 |
+
This borrows the strongest low-dimensional entry point from the proven winning approach documented in `raw-session.md`, not the full approach.
|
| 14 |
+
|
| 15 |
+
## What Was Validated
|
| 16 |
+
|
| 17 |
+
| Claim | Status | Source |
|
| 18 |
+
|---|---|---|
|
| 19 |
+
| P1 is the cleanest benchmark task | Verified | `problems.py:113` — minimize max_elongation, 3 constraints, no QI |
|
| 20 |
+
| P1 skips QI | Verified | `problems.py:145` — `_does_it_require_qi = False` |
|
| 21 |
+
| Low-fidelity eval is fast enough | Measured | 0.63 s per eval on a local machine; the postmortem reports ~1 s/eval |
|
| 22 |
+
| High-fidelity eval is expensive | Measured | 24s per eval; only viable for final validation |
|
| 23 |
+
| Rotating-ellipse can find P1-feasible designs | Verified | `raw-session.md`: sweeps found 3 feasible designs in ~20 min |
|
| 24 |
+
| vmecpp installs from wheels | Verified | `uv pip install vmecpp==0.4.7` resolves cleanly, no compilation |
|
| 25 |
+
| constellaration Dockerfile is not bloated | Verified | `python:3.10-slim` + `pip install constellaration` |
|
| 26 |
+
| Current seed logic is too loose for P1 | Verified | `seeds.py:42`: triangularity override 0.05 vs constraint -0.5 |
|
| 27 |
+
| Full harness should not be ported | Verified | Postmortem: prescriptive harness produced 0 feasible candidates |
|
| 28 |
+
|
| 29 |
+
## What Is Hypothesis (Not Yet Validated)
|
| 30 |
+
|
| 31 |
+
1. **6 actions is enough** to reach or improve P1 feasibility from a rotating-ellipse starting point. Must validate by manual playtest immediately.
|
| 32 |
+
2. **Discretized rotating-ellipse perturbations** create non-trivial decision pressure (not too easy, not impossible).
|
| 33 |
+
3. **Low-fidelity metrics** are close enough to high-fidelity P1 scoring that the low-fi reward signal is meaningful.
|
| 34 |
+
4. **The Docker image** builds and deploys on HF Spaces within reasonable time/size limits.
|
| 35 |
+
|
| 36 |
+
## Environment Design
|
| 37 |
+
|
| 38 |
+
### Single Task
|
| 39 |
+
|
| 40 |
+
Improve a stellarator boundary's P1 score using the rotating-ellipse parameterization under the official ConStellaration P1 constraints.
|
| 41 |
+
|
| 42 |
+
### P1 Constraints (from `GeometricalProblem`)
|
| 43 |
+
|
| 44 |
+
- aspect_ratio <= 4.0
|
| 45 |
+
- average_triangularity <= -0.5
|
| 46 |
+
- edge_rotational_transform / n_field_periods >= 0.3
|
| 47 |
+
|
| 48 |
+
### P1 Objective
|
| 49 |
+
|
| 50 |
+
Minimize `max_elongation`. Score = `1 - clip((max_elongation - 1) / 9, 0, 1)`.
|
| 51 |
+
|
| 52 |
+
Feasibility tolerance: normalized constraint violations <= 1% (0.01).
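
The scoring rule above can be sketched directly. This is a minimal re-implementation of the stated formula and tolerance, not the official `constellaration` scoring code, and the function names are illustrative:

```python
def p1_score(max_elongation: float) -> float:
    """P1 score as stated above: 1.0 at elongation 1, decaying to 0.0 at elongation >= 10."""
    clipped = min(max((max_elongation - 1.0) / 9.0, 0.0), 1.0)
    return 1.0 - clipped


def is_feasible(normalized_violations: list[float], tol: float = 0.01) -> bool:
    """Feasible when every normalized constraint violation is within the 1% tolerance."""
    return max(normalized_violations, default=0.0) <= tol
```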
|
| 53 |
+
|
| 54 |
+
### Parameter Space
|
| 55 |
+
|
| 56 |
+
The rotating-ellipse generator takes 3 continuous parameters plus 1 fixed discrete parameter:
|
| 57 |
+
|
| 58 |
+
| Parameter | Role | Typical range |
|
| 59 |
+
|---|---|---|
|
| 60 |
+
| `aspect_ratio` | Width-to-height ratio of the boundary | 2.0 - 8.0 |
|
| 61 |
+
| `elongation` | Vertical stretching of cross-section | 1.0 - 5.0 |
|
| 62 |
+
| `rotational_transform` | Magnetic field line winding | 0.1 - 1.0 |
|
| 63 |
+
| `n_field_periods` | Fixed at 3 (not an action) | 3 |
|
| 64 |
+
|
| 65 |
+
These map to `constellaration.initial_guess.generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` which returns a `SurfaceRZFourier` boundary in ~4ms.
|
| 66 |
+
|
| 67 |
+
### Action Space
|
| 68 |
+
|
| 69 |
+
Discrete perturbations on the 3 rotating-ellipse parameters:
|
| 70 |
+
|
| 71 |
+
```
|
| 72 |
+
intent: "run" | "submit" | "restore_best"
|
| 73 |
+
operator: "aspect_ratio" | "elongation" | "rotational_transform"
|
| 74 |
+
direction: "increase" | "decrease"
|
| 75 |
+
magnitude: "small" | "medium" | "large"
|
| 76 |
+
```
|
| 77 |
+
|
| 78 |
+
Magnitude deltas (to be tuned by playtest):
|
| 79 |
+
|
| 80 |
+
| Parameter | small | medium | large |
|
| 81 |
+
|---|---|---|---|
|
| 82 |
+
| aspect_ratio | 0.1 | 0.3 | 0.8 |
|
| 83 |
+
| elongation | 0.1 | 0.3 | 0.8 |
|
| 84 |
+
| rotational_transform | 0.02 | 0.05 | 0.15 |
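
The deltas above reduce to a lookup plus a clamp to the typical ranges from the parameter-space table. The range bounds used as hard clamps and the function name are assumptions for illustration, not the final contract:

```python
# Magnitude deltas from the table above; to be tuned by playtest.
DELTAS = {
    "aspect_ratio":         {"small": 0.1,  "medium": 0.3,  "large": 0.8},
    "elongation":           {"small": 0.1,  "medium": 0.3,  "large": 0.8},
    "rotational_transform": {"small": 0.02, "medium": 0.05, "large": 0.15},
}

# Typical ranges from the parameter-space table (assumed here as hard clamps).
RANGES = {
    "aspect_ratio": (2.0, 8.0),
    "elongation": (1.0, 5.0),
    "rotational_transform": (0.1, 1.0),
}


def apply_perturbation(params: dict, operator: str, direction: str, magnitude: str) -> dict:
    """Return a new parameter dict with one discrete perturbation applied."""
    delta = DELTAS[operator][magnitude]
    if direction == "decrease":
        delta = -delta
    lo, hi = RANGES[operator]
    new_params = dict(params)
    new_params[operator] = min(max(params[operator] + delta, lo), hi)
    return new_params
```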
|
| 85 |
+
|
| 86 |
+
### Episode Flow
|
| 87 |
+
|
| 88 |
+
1. Reset: generate initial boundary from baseline rotating-ellipse parameters (+ optional seed perturbation). Run low-fi forward_model. Return initial observation.
|
| 89 |
+
2. Agent chooses action.
|
| 90 |
+
3. If `run`: modify parameter, regenerate boundary, run low-fi forward_model (~0.6s). Return diagnostics + reward.
|
| 91 |
+
4. If `restore_best`: revert to best-known parameters. No VMEC cost, but costs a budget step.
|
| 92 |
+
5. If `submit`: end episode. Optionally run high-fi for final score.
|
| 93 |
+
6. Episode ends on `submit` or budget exhaustion.
|
| 94 |
+
|
| 95 |
+
### Budget
|
| 96 |
+
|
| 97 |
+
6 evaluations per episode. Every non-submit action costs 1 budget unit.
|
| 98 |
+
|
| 99 |
+
### Observation
|
| 100 |
+
|
| 101 |
+
```
|
| 102 |
+
diagnostics_text: str # human-readable summary
|
| 103 |
+
max_elongation: float # P1 objective (minimize)
|
| 104 |
+
aspect_ratio: float # constraint: <= 4.0
|
| 105 |
+
average_triangularity: float # constraint: <= -0.5
|
| 106 |
+
edge_iota_over_nfp: float # constraint: >= 0.3
|
| 107 |
+
p1_score: float # official P1 score (0 if infeasible)
|
| 108 |
+
p1_feasibility: float # max normalized constraint violation
|
| 109 |
+
constraints_satisfied: bool # feasibility <= 0.01
|
| 110 |
+
vacuum_well: float # stability indicator
|
| 111 |
+
step_number: int
|
| 112 |
+
budget_remaining: int
|
| 113 |
+
best_score: float
|
| 114 |
+
target_spec: str
|
| 115 |
+
```
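
As a sketch, the schema above maps one-to-one onto a dataclass; field names follow the spec, while the class name is illustrative:

```python
from dataclasses import dataclass


@dataclass
class P1Observation:
    """Mirror of the observation schema above."""
    diagnostics_text: str        # human-readable summary
    max_elongation: float        # P1 objective (minimize)
    aspect_ratio: float          # constraint: <= 4.0
    average_triangularity: float # constraint: <= -0.5
    edge_iota_over_nfp: float    # constraint: >= 0.3
    p1_score: float              # official P1 score (0 if infeasible)
    p1_feasibility: float        # max normalized constraint violation
    constraints_satisfied: bool  # feasibility <= 0.01
    vacuum_well: float           # stability indicator
    step_number: int
    budget_remaining: int
    best_score: float
    target_spec: str
```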
|
| 116 |
+
|
| 117 |
+
### Reward V0
|
| 118 |
+
|
| 119 |
+
Feasibility-first, then objective improvement:
|
| 120 |
+
|
| 121 |
+
```
|
| 122 |
+
if constraints newly satisfied:
|
| 123 |
+
+3.0
|
| 124 |
+
if constraints newly violated:
|
| 125 |
+
-3.0
|
| 126 |
+
|
| 127 |
+
if feasible:
|
| 128 |
+
reward += (prev_elongation - curr_elongation) * 10.0 # improvement in objective
|
| 129 |
+
else:
|
| 130 |
+
reward += (prev_feasibility - curr_feasibility) * 5.0 # progress toward feasibility
|
| 131 |
+
|
| 132 |
+
per-step cost: -0.1
|
| 133 |
+
|
| 134 |
+
submit bonus (if feasible and improved):
|
| 135 |
+
+5.0 * improvement_ratio + 1.0 * budget_efficiency
|
| 136 |
+
submit penalty (if infeasible or no improvement):
|
| 137 |
+
-1.0
|
| 138 |
+
```
|
| 139 |
+
|
| 140 |
+
This puts feasibility first. An agent that achieves feasibility then minimizes elongation gets rewarded. An agent that never reaches feasibility gets penalized.
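
A minimal sketch of the per-step part of Reward V0, assuming plain floats for the previous/current metrics; the function and argument names are illustrative, and the submit-time bonus/penalty terms are omitted:

```python
STEP_COST = -0.1


def reward_v0(prev_feasible: bool, curr_feasible: bool,
              prev_elongation: float, curr_elongation: float,
              prev_feasibility: float, curr_feasibility: float) -> float:
    """Per-step Reward V0: feasibility transitions first, then progress terms."""
    reward = STEP_COST
    if curr_feasible and not prev_feasible:
        reward += 3.0  # constraints newly satisfied
    elif prev_feasible and not curr_feasible:
        reward -= 3.0  # constraints newly violated
    if curr_feasible:
        reward += (prev_elongation - curr_elongation) * 10.0  # improvement in objective
    else:
        reward += (prev_feasibility - curr_feasibility) * 5.0  # progress toward feasibility
    return reward
```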
|
| 141 |
+
|
| 142 |
+
### State
|
| 143 |
+
|
| 144 |
+
```
|
| 145 |
+
step_count: int
|
| 146 |
+
current_params: {aspect_ratio, elongation, rotational_transform}
|
| 147 |
+
best_params: {aspect_ratio, elongation, rotational_transform}
|
| 148 |
+
initial_score: float
|
| 149 |
+
best_score: float
|
| 150 |
+
current_feasibility: float
|
| 151 |
+
best_feasibility: float
|
| 152 |
+
history: list[str]
|
| 153 |
+
```
|
| 154 |
+
|
| 155 |
+
## Two Designs That Were Considered
|
| 156 |
+
|
| 157 |
+
| Dimension | Rotating-ellipse env | Curated-seed Fourier-repair env |
|
| 158 |
+
|---|---|---|
|
| 159 |
+
| Action space | 3 parameters (AR, elongation, iota) | N Fourier modes |
|
| 160 |
+
| Starting point | Generated from parameters | Frozen from HF dataset |
|
| 161 |
+
| Interpretability | High — parameters map to physical shape | Lower — mode perturbations are abstract |
|
| 162 |
+
| Dataset dependency | None at runtime | Requires offline curation |
|
| 163 |
+
| Search space coverage | Low-dimensional subfamily | Full boundary space |
|
| 164 |
+
| Hackathon viability | High | Medium (needs pre-work) |
|
| 165 |
+
|
| 166 |
+
**Decision:** Rotating-ellipse for the hackathon. It is self-contained, human-playable, and proven as a viable entry point for P1.
|
| 167 |
+
|
| 168 |
+
**What it does NOT claim:** Full coverage of the P1 boundary design space. This is a tradeoff accepted for hackathon scope.
|
| 169 |
+
|
| 170 |
+
## Implementation Order
|
| 171 |
+
|
| 172 |
+
### Phase 1: Physics Backend (~1 hour)
|
| 173 |
+
|
| 174 |
+
Rewrite `server/physics.py` to wrap:
|
| 175 |
+
- `constellaration.initial_guess.generate_rotating_ellipse` for boundary generation
|
| 176 |
+
- `constellaration.forward_model.forward_model` with low-fi settings for evaluation
|
| 177 |
+
- `constellaration.problems.GeometricalProblem` for official P1 scoring on submit
|
| 178 |
+
|
| 179 |
+
### Phase 2: Environment Contract (~1 hour)
|
| 180 |
+
|
| 181 |
+
Update `server/environment.py`:
|
| 182 |
+
- New observation schema with P1 metrics
|
| 183 |
+
- New action schema for rotating-ellipse perturbations
|
| 184 |
+
- Reward V0 with feasibility-first logic
|
| 185 |
+
- Terminal conditions
|
| 186 |
+
|
| 187 |
+
Update `fusion_lab/models.py` for new schemas.
|
| 188 |
+
|
| 189 |
+
### Phase 3: Manual Playtest (~30 min)
|
| 190 |
+
|
| 191 |
+
Validate hypothesis: "6 actions is enough."
|
| 192 |
+
- Play 5-10 episodes manually
|
| 193 |
+
- Log: can a human reach feasibility? Improve elongation?
|
| 194 |
+
- Tune magnitude deltas if needed
|
| 195 |
+
- Document at least one pathology or adjustment
|
| 196 |
+
|
| 197 |
+
### Phase 4: Baselines (~30 min)
|
| 198 |
+
|
| 199 |
+
- Random agent
|
| 200 |
+
- Heuristic agent (greedy toward known-good parameter region)
|
| 201 |
+
- Comparison table
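
Both baselines can be as small as the sketch below. The heuristic's target region borrows the near-feasible anchor values from the Known-Good Fixtures section; the policy interfaces and thresholds are assumptions, not the final contract:

```python
import random

# All discrete perturbations: 3 operators x 2 directions x 3 magnitudes.
ACTIONS = [(op, d, m)
           for op in ("aspect_ratio", "elongation", "rotational_transform")
           for d in ("increase", "decrease")
           for m in ("small", "medium", "large")]


def random_policy(_params: dict) -> tuple:
    """Random baseline: uniform over the discrete perturbations."""
    return random.choice(ACTIONS)


# Assumed known-good region (near-feasible anchor fixture values).
TARGETS = {"aspect_ratio": 3.5, "elongation": 1.5, "rotational_transform": 0.4}


def heuristic_policy(params: dict) -> tuple:
    """Greedy baseline: push the most-off parameter toward its target."""
    op = max(TARGETS, key=lambda k: abs(params[k] - TARGETS[k]))
    direction = "increase" if params[op] < TARGETS[op] else "decrease"
    gap = abs(params[op] - TARGETS[op])
    magnitude = "large" if gap > 1.0 else "medium" if gap > 0.3 else "small"
    return (op, direction, magnitude)
```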
|
| 202 |
+
|
| 203 |
+
### Phase 5: Deploy + Evidence (~2 hours)
|
| 204 |
+
|
| 205 |
+
- Update Dockerfile/deps for constellaration
|
| 206 |
+
- `openenv validate` + `openenv push`
|
| 207 |
+
- Colab notebook connecting to live environment
|
| 208 |
+
- 1-minute demo video
|
| 209 |
+
|
| 210 |
+
This section exists to justify the pivot with an implementation path. It should not trigger another strategy pass when the same work is already covered by the SSOT plan and checklist.
|
| 211 |
+
|
| 212 |
+
## Fallback
|
| 213 |
+
|
| 214 |
+
If constellaration deployment fails (Docker build, HF Spaces issues):
|
| 215 |
+
- The current synthetic physics environment is already working and deployment-ready
|
| 216 |
+
- Fall back to shipping that with updated docs acknowledging it as a proxy model
|
| 217 |
+
- Do not spend more than 1 hour debugging deployment before falling back
|
| 218 |
+
|
| 219 |
+
## Known-Good Fixtures
|
| 220 |
+
|
| 221 |
+
Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
|
| 222 |
+
|
| 223 |
+
1. **Near-feasible anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — expected to be close to P1 boundary
|
| 224 |
+
2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
|
| 225 |
+
3. **Baseline comparison:** add only if manual playtesting shows a second start state is useful
|
| 226 |
+
|
| 227 |
+
These are for verifier/reward sanity, not a prerequisite seed-mining project.
|
| 228 |
+
|
| 229 |
+
## What Not To Do
|
| 230 |
+
|
| 231 |
+
- Do not port the full ai-sci-feasible-designs harness or governor stack.
|
| 232 |
+
- Do not make the task "agent writes arbitrary optimization scripts."
|
| 233 |
+
- Do not stream the full HF dataset at runtime.
|
| 234 |
+
- Do not mix rotating-ellipse and Fourier-repair action spaces.
|
| 235 |
+
- Do not use high-fidelity eval for interactive steps (24s is too slow).
|
| 236 |
+
- Do not narrate "6 actions is enough" as validated until manually playtested.
|
| 237 |
+
- Do not claim full P1 boundary space coverage. The env uses a low-dim subfamily.
|
| 238 |
+
- Do not reopen the task-selection debate after the pivot is already accepted unless a blocker forces it.
|
pyproject.toml
CHANGED
|
@@ -1,10 +1,11 @@
|
|
| 1 |
[project]
|
| 2 |
name = "fusion-design-lab"
|
| 3 |
version = "0.1.0"
|
| 4 |
-
description = "OpenEnv environment for
|
| 5 |
readme = "README.md"
|
| 6 |
requires-python = ">=3.11"
|
| 7 |
dependencies = [
|
|
| 8 |
"fastapi>=0.115.0",
|
| 9 |
"numpy>=2.0.0",
|
| 10 |
"openenv-core[core]>=0.2.1",
|
|
@@ -13,9 +14,9 @@ dependencies = [
|
|
| 13 |
]
|
| 14 |
|
| 15 |
[project.optional-dependencies]
|
| 16 |
-
|
| 17 |
-
"
|
| 18 |
-
"
|
| 19 |
]
|
| 20 |
dev = [
|
| 21 |
"pre-commit>=4.0.0",
|
|
@@ -23,12 +24,15 @@ dev = [
|
|
| 23 |
"ruff>=0.11.0",
|
| 24 |
]
|
| 25 |
|
|
| 26 |
[build-system]
|
| 27 |
requires = ["setuptools>=69.0"]
|
| 28 |
build-backend = "setuptools.build_meta"
|
| 29 |
|
| 30 |
[tool.setuptools]
|
| 31 |
-
packages = ["fusion_lab", "server"]
|
| 32 |
|
| 33 |
[tool.ruff]
|
| 34 |
line-length = 100
|
|
| 1 |
[project]
|
| 2 |
name = "fusion-design-lab"
|
| 3 |
version = "0.1.0"
|
| 4 |
+
description = "OpenEnv P1 environment for constrained stellarator design with constellaration"
|
| 5 |
readme = "README.md"
|
| 6 |
requires-python = ">=3.11"
|
| 7 |
dependencies = [
|
| 8 |
+
"constellaration",
|
| 9 |
"fastapi>=0.115.0",
|
| 10 |
"numpy>=2.0.0",
|
| 11 |
"openenv-core[core]>=0.2.1",
|
|
| 14 |
]
|
| 15 |
|
| 16 |
[project.optional-dependencies]
|
| 17 |
+
notebooks = [
|
| 18 |
+
"ipykernel>=6.29.0",
|
| 19 |
+
"jupyterlab>=4.3.0",
|
| 20 |
]
|
| 21 |
dev = [
|
| 22 |
"pre-commit>=4.0.0",
|
|
| 24 |
"ruff>=0.11.0",
|
| 25 |
]
|
| 26 |
|
| 27 |
+
[project.scripts]
|
| 28 |
+
server = "server.app:main"
|
| 29 |
+
|
| 30 |
[build-system]
|
| 31 |
requires = ["setuptools>=69.0"]
|
| 32 |
build-backend = "setuptools.build_meta"
|
| 33 |
|
| 34 |
[tool.setuptools]
|
| 35 |
+
packages = ["baselines", "fusion_lab", "server"]
|
| 36 |
|
| 37 |
[tool.ruff]
|
| 38 |
line-length = 100
|
server/Dockerfile
ADDED
|
@@ -0,0 +1,50 @@
|
| 1 |
+
ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
|
| 2 |
+
FROM ${BASE_IMAGE} AS builder
|
| 3 |
+
|
| 4 |
+
WORKDIR /app
|
| 5 |
+
|
| 6 |
+
RUN apt-get update && \
|
| 7 |
+
apt-get install -y --no-install-recommends git && \
|
| 8 |
+
rm -rf /var/lib/apt/lists/*
|
| 9 |
+
|
| 10 |
+
ARG BUILD_MODE=standalone
|
| 11 |
+
ARG ENV_NAME=fusion_design_lab
|
| 12 |
+
|
| 13 |
+
COPY . /app/env
|
| 14 |
+
|
| 15 |
+
WORKDIR /app/env
|
| 16 |
+
|
| 17 |
+
RUN if ! command -v uv >/dev/null 2>&1; then \
|
| 18 |
+
curl -LsSf https://astral.sh/uv/install.sh | sh && \
|
| 19 |
+
mv /root/.local/bin/uv /usr/local/bin/uv && \
|
| 20 |
+
mv /root/.local/bin/uvx /usr/local/bin/uvx; \
|
| 21 |
+
fi
|
| 22 |
+
|
| 23 |
+
RUN --mount=type=cache,target=/root/.cache/uv \
|
| 24 |
+
if [ -f uv.lock ]; then \
|
| 25 |
+
uv sync --frozen --no-install-project --no-editable; \
|
| 26 |
+
else \
|
| 27 |
+
uv sync --no-install-project --no-editable; \
|
| 28 |
+
fi
|
| 29 |
+
|
| 30 |
+
RUN --mount=type=cache,target=/root/.cache/uv \
|
| 31 |
+
if [ -f uv.lock ]; then \
|
| 32 |
+
uv sync --frozen --no-editable; \
|
| 33 |
+
else \
|
| 34 |
+
uv sync --no-editable; \
|
| 35 |
+
fi
|
| 36 |
+
|
| 37 |
+
FROM ${BASE_IMAGE}
|
| 38 |
+
|
| 39 |
+
WORKDIR /app
|
| 40 |
+
|
| 41 |
+
COPY --from=builder /app/env/.venv /app/.venv
|
| 42 |
+
COPY --from=builder /app/env /app/env
|
| 43 |
+
|
| 44 |
+
ENV PATH="/app/.venv/bin:$PATH"
|
| 45 |
+
ENV PYTHONPATH="/app/env:$PYTHONPATH"
|
| 46 |
+
|
| 47 |
+
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
| 48 |
+
CMD curl -f http://localhost:8000/health || exit 1
|
| 49 |
+
|
| 50 |
+
CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
|
server/data/p1/README.md
ADDED
|
@@ -0,0 +1,13 @@
|
| 1 |
+
# P1 Fixture Data
|
| 2 |
+
|
| 3 |
+
Store tracked `P1` fixtures here.
|
| 4 |
+
|
| 5 |
+
Intended contents:
|
| 6 |
+
|
| 7 |
+
- one known-good or near-winning boundary JSON
|
| 8 |
+
- a few near-boundary designs
|
| 9 |
+
- a few clearly infeasible designs
|
| 10 |
+
|
| 11 |
+
These fixtures are for verifier and reward sanity checks.
|
| 12 |
+
|
| 13 |
+
Do not copy the old `ai-sci-feasible-designs` harness here. Reuse only the specific JSON artifacts needed for the fresh `P1` environment.
|
training/notebooks/README.md
ADDED
|
@@ -0,0 +1,29 @@
|
| 1 |
+
# Notebooks
|
| 2 |
+
|
| 3 |
+
Use this directory for the notebooks that support the hackathon submission.
|
| 4 |
+
|
| 5 |
+
Expected contents:
|
| 6 |
+
|
| 7 |
+
- one Colab-friendly notebook that connects to the deployed HF Space
|
| 8 |
+
- one Northflank-friendly notebook path for verifier sanity checks, manual reward iteration, baselines, or training/debugging
|
| 9 |
+
|
| 10 |
+
Recommended split:
|
| 11 |
+
|
| 12 |
+
- Northflank notebook: main compute workspace on the team H100
|
| 13 |
+
- Colab notebook: thin public artifact required by the hackathon
|
| 14 |
+
|
| 15 |
+
Operational defaults:
|
| 16 |
+
|
| 17 |
+
- use the same Python dependency set as the repo runtime
|
| 18 |
+
- keep heavy verifier and training work on Northflank
|
| 19 |
+
- keep the Colab notebook focused on connecting to the deployed HF Space and exporting visible traces
|
| 20 |
+
- prefer a public HF Space for the hackathon; if private, document the token setup directly in the notebook
|
| 21 |
+
|
| 22 |
+
Northflank smoke gate:
|
| 23 |
+
|
| 24 |
+
- import `constellaration`
|
| 25 |
+
- generate one rotating-ellipse boundary
|
| 26 |
+
- run one low-fidelity verifier call
|
| 27 |
+
- write one artifact to persistent storage
|
| 28 |
+
|
| 29 |
+
The notebooks are supporting evidence for the environment, not the primary product.
|
uv.lock
ADDED
|
The diff for this file is too large to render.
See raw diff