Commit 5354ca9 by CreativeEngineer (parent: 98ffb4a)

docs: lock p1 plan and hackathon runtime setup
.gitignore CHANGED
```diff
@@ -8,6 +8,7 @@ __pycache__/
  .ipynb_checkpoints/
  dist/
  build/
+ *.egg-info/
  *.sqlite
  *.db
  reports/
```
.pre-commit-config.yaml CHANGED
```diff
@@ -14,3 +14,4 @@ repos:
  - id: check-yaml
  - id: check-toml
  - id: check-added-large-files
+ exclude: ^uv\.lock$
```
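The `exclude` value is a Python regular expression that pre-commit matches against each candidate file path, so `^uv\.lock$` skips only the root-level lockfile. A quick sketch of that matching behavior with plain `re` (outside pre-commit itself):

```python
import re

# pre-commit matches `exclude` as a Python regex against each staged
# file's repo-relative path; the anchors pin it to the root lockfile.
pattern = re.compile(r"^uv\.lock$")

paths = ["uv.lock", "vendor/uv.lock", "uv_lock", "uv.lockfile"]
excluded = [p for p in paths if pattern.search(p)]
print(excluded)  # only "uv.lock" itself is skipped
```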
AGENTS.md CHANGED
```diff
@@ -24,6 +24,8 @@ Use these docs as the planning SSOT:
  - `docs/FUSION_DELIVERABLES_MAP.md`
  - `docs/FUSION_NEXT_12_HOURS_CHECKLIST.md`
 
+ `docs/PIVOT_P1_ROTATING_ELLIPSE.md` is a supporting decision record, not a planning SSOT. If it disagrees with the three docs above, the three SSOT docs win.
+
  If code and docs disagree, either:
 
  1. update code to match the docs, or
@@ -39,6 +41,7 @@ Do not leave silent divergence.
  4. Manual-playtest before investing heavily in training.
  5. Prefer behavior traces and baselines over reward-curve-only storytelling.
  6. Keep claims conservative and evidence-backed.
+ 7. Once the task family is locked, shift to implementation instead of reopening strategy.
 
  ## Working Rules
 
@@ -48,6 +51,8 @@ Do not leave silent divergence.
  - Do not add new tests during the hackathon unless the user explicitly requests them.
  - Do not add complicated reward shaping until the simpler version has been tested against actual trajectories.
  - Do not optimize notebook/training work ahead of local environment stability, remote environment stability, and baseline comparisons.
+ - Do not create new planning loops around decisions that are already locked in the SSOT docs unless a hard blocker appears.
+ - Treat supporting decision records as rationale, not as a fresh task queue.
 
  ## Environment Contract Rules
 
```
README.md CHANGED
````diff
@@ -1,13 +1,13 @@
  # Fusion Design Lab
 
- Fusion Design Lab is an environment-first OpenEnv hackathon project for budget-constrained stellarator design.
+ Fusion Design Lab is an environment-first OpenEnv hackathon project for the `P1` stellarator benchmark.
 
  The repo is organized around one clear submission thesis:
 
- - a narrow, reproducible stellarator design task
- - a small discrete action space
- - real simulator feedback
- - explicit constraints
+ - an official `P1` task with `constellaration` as the verifier of record
+ - a narrow, reproducible action space
+ - real verifier feedback
+ - explicit constraints and feasibility semantics
  - a reward function that is iteratively improved through observed behavior
 
  Training is supporting evidence. The environment is the product.
@@ -18,10 +18,16 @@ This repository is the clean hackathon workspace. The detailed planning docs liv
 
  Implementation status:
 
- - repo scaffolded
- - shared models defined
- - server and client entry points stubbed
- - environment contract ready to be implemented next
+ - `P1` is locked as the benchmark task
+ - docs are aligned to fresh `P1` wiring in this repo
+ - shared models and server/client entry points exist
+ - the runtime environment still needs to be rewired from the old toy scaffold to the real `P1` contract
+
+ Current mode:
+
+ - strategic task choice is already locked
+ - the next work is implementation, smoke validation, and manual playtesting
+ - new planning text should only appear when a real blocker forces a decision change
 
  ## Planned Repository Layout
 
@@ -32,17 +38,73 @@ fusion-design-lab/
  ├── docs/
  ├── fusion_lab/
  ├── server/
+ ├── server/data/p1/
  ├── training/
  ├── openenv.yaml
  ├── pyproject.toml
  └── README.md
  ```
 
+ ## Setup
+
+ Base runtime:
+
+ ```bash
+ uv sync
+ ```
+
+ Development tooling:
+
+ ```bash
+ uv sync --extra dev
+ pre-commit install
+ ```
+
+ Optional local notebook tooling:
+
+ ```bash
+ uv sync --extra notebooks
+ ```
+
+ ## Runtime Assumptions
+
+ - Recommended compute workspace: Northflank Jupyter Notebook with PyTorch on the team H100
+ - OpenEnv deployment target: Hugging Face Spaces
+ - Minimal submission notebook target: Colab
+ - Verifier of record: `constellaration.problems.GeometricalProblem`
+ - Environment style: fresh wiring in this repo, not a port of the old `ai-sci-feasible-designs` harness
+ - Northflank containers are ephemeral, so persistent storage should be attached before relying on saved models, caches, or fixture data
+ - Preferred deployment path: push this GitHub repo and let HF Space build from the repo/Docker configuration rather than copying code manually
+ - Preferred Colab/HF Space connectivity: make the HF Space public for the hackathon unless privacy becomes necessary; if private, document and use an explicit access token in the notebook
+
  ## Immediate Next Steps
 
- 1. Implement the environment contract in `server/environment.py`.
- 2. Implement the VMEC-backed physics loop in `server/physics.py`.
- 3. Run manual-playtest episodes before heavy training work.
+ 1. Set up the Northflank Jupyter Notebook with PyTorch and attach persistent storage.
+ 2. Pass a Northflank smoke test:
+    - import `constellaration`
+    - run one rotating-ellipse generation plus one low-fidelity verifier call
+    - write an artifact to persistent storage
+ 3. Rewrite [server/environment.py](/Users/suhjungdae/code/fusion-design-lab/server/environment.py) to the locked `P1` contract.
+ 4. Rewrite [server/physics.py](/Users/suhjungdae/code/fusion-design-lab/server/physics.py) to use `constellaration`-based `P1` verification.
+ 5. Add tracked `P1` fixtures under [server/data/p1](/Users/suhjungdae/code/fusion-design-lab/server/data/p1).
+ 6. Add the Colab notebook under [training/notebooks](/Users/suhjungdae/code/fusion-design-lab/training/notebooks).
+ 7. Run manual playtest episodes before heavy training work.
+
+ These are implementation steps, not another planning phase.
+
+ ## Fixture Policy
+
+ This repo may reuse selected JSON artifacts or boundaries as fixed calibration fixtures.
+
+ Allowed examples:
+
+ - a known-good or near-winning `P1` boundary
+ - near-boundary cases
+ - clearly bad cases
+
+ Disallowed:
+
+ - porting the old planner, governor, or experiment harness into this repo
 
  ## Hackathon Working Note
 
````
docs/FUSION_DELIVERABLES_MAP.md CHANGED
````diff
@@ -1,6 +1,10 @@
  # Fusion Design Lab Deliverables Map
 
- This is the output-first map for the hackathon. It is aligned to Plan V2: environment-first, reward-iteration-driven, and conservative about training claims. Everything branches from the four final artifacts the judges and submission flow will actually see.
+ This is the output-first map for the hackathon. It is aligned to Plan V2: `P1` is locked, the environment is built fresh in this repo, the old harness is not ported, and training claims stay conservative. Everything branches from the four final artifacts the judges and submission flow will actually see.
+
+ Northflank is the recommended compute workspace behind those artifacts. HF Space and Colab remain the actual submission surfaces.
+
+ Use this map to sequence execution, not to reopen already-locked task choices.
 
  ## Deliverables Tree
 
@@ -10,8 +14,9 @@ flowchart TD
  A --> C["Colab Eval / Training Notebook"]
  A --> D["1-Minute Demo"]
  A --> E["Public Repo + README"]
+ A --> N["Northflank H100 Workspace"]
 
- B --> B0["Environment contract frozen"]
+ B --> B0["P1 environment contract frozen"]
  B --> B1["Remote reset/step works"]
  B --> B2["Reward V0 -> V1 documented"]
  B --> B3["One stable task runs end-to-end"]
@@ -29,13 +34,22 @@ flowchart TD
  E --> E2["Setup + run instructions"]
  E --> E3["Submission links and artifacts"]
 
+ N --> N1["Jupyter Notebook with PyTorch live"]
+ N --> N2["Persistent storage attached"]
+ N --> N3["Verifier + baseline runs happen here"]
+ N --> N4["Northflank smoke test passes"]
+
  B0 --> F["Observation + action schema frozen"]
- B3 --> G["Standalone physics loop proven"]
+ B3 --> G["Fresh P1 verifier loop proven"]
  B2 --> H["Exploit observed -> penalty added"]
  B4 --> I0["Deterministic action schema"]
  D2 --> I["Human can act coherently in env"]
  C3 --> J["Random baseline"]
  C3 --> K["Heuristic baseline"]
+ G --> L["Official constellaration P1 verifier wired correctly"]
+ L --> M["Good / boundary / bad fixture checks pass"]
+ N4 --> N3
+ N3 --> G
  ```
 
  ## Reverse Timeline
@@ -46,6 +60,7 @@ flowchart LR
  S --> R["Repo public and readable"]
  S --> T["Training / eval evidence exported"]
  S --> H["HF Space live"]
+ S --> N1["Northflank compute ready"]
 
  V --> V1["Recorded clean demo trajectory"]
  V --> V2["Scripted 60-second story"]
@@ -54,27 +69,35 @@ flowchart LR
  T --> T2["Baseline comparison numbers"]
  T --> T3["Colab notebook runs end-to-end"]
 
- H --> H1["OpenEnv environment packaged"]
+ H --> H1["OpenEnv P1 environment packaged"]
  H --> H2["Remote client can reset and step"]
  H --> H3["Verifier and reward stable"]
  H --> H4["Rules are clear and reproducible"]
 
  H4 --> P["Environment contract locked first"]
+ N1 --> N2["Jupyter with PyTorch up first"]
+ N2 --> N3["Persistent storage attached"]
+ N3 --> N4["Import + low-fi verifier smoke passes"]
+ N4 --> M0
  P --> Q["Manual playtest completed first"]
- H3 --> M["Local physics loop proven first"]
+ H3 --> M0["Local verifier loop proven first"]
  T2 --> B["Random + heuristic baselines done"]
  T3 --> X["Training included only if persuasive"]
  V1 --> Y["One stable task only"]
  V2 --> Z["Explain reward fix, not just reward gain"]
+ M0 --> N["Fresh wiring, not legacy harness port"]
  ```
 
  ## Priority Order
 
- 1. Prove the local physics loop.
- 2. Freeze the environment contract and mark the initial reward as `V0`.
- 3. Manual-playtest the environment and fix obvious reward/pathology issues.
- 4. Make one stable OpenEnv task work remotely with clear, reproducible rules.
- 5. Get random and heuristic baselines.
- 6. Use the notebook to show traces and comparisons; include training only if it adds signal.
- 7. Record the demo around environment clarity, reward shaping, and one stable trajectory.
- 8. Polish the repo only after the artifacts are real.
+ 1. Bring up the Northflank H100 workspace with persistent storage.
+ 2. Pass the Northflank smoke test.
+ 3. Prove the fresh local `P1` verifier loop.
+ 4. Freeze the environment contract and mark the initial reward as `V0`.
+ 5. Run verifier/fixture checks and then manual-playtest the environment.
+ 6. Fix obvious reward/pathology issues.
+ 7. Make one stable OpenEnv `P1` task work remotely with clear, reproducible rules.
+ 8. Get random and heuristic baselines.
+ 9. Use the notebook to show traces and comparisons; include training only if it adds signal.
+ 10. Record the demo around environment clarity, verifier fidelity, reward shaping, and one stable trajectory.
+ 11. Polish the repo only after the artifacts are real.
````
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
```diff
@@ -2,7 +2,7 @@
 
  **Hackathon:** OpenEnv Hackathon, March 7-8, 2026
  **Track:** Statement 3.1 (World Modeling — Professional Tasks)
- **Status:** Judge-aligned rewrite of the main plan
+ **Status:** Judge-aligned plan with `P1` locked
 
  ## 1. Submission Thesis
 
@@ -10,16 +10,30 @@ We are not primarily submitting "a trained model for fusion."
 
  We are submitting a clear, reproducible training environment for a constrained scientific design task:
 
- - a junior plasma-scientist-style agent
- - a small VMEC budget
- - a narrow action space
- - real simulator feedback
+ - official `P1` benchmark semantics
+ - a narrow, human-playable action space
+ - real verifier feedback from `constellaration`
  - explicit constraints
  - a reward function that is understandable and iteratively improved
 
  Training is supporting evidence. The environment is the product.
 
- ## 2. What Changed From V1
+ ## 2. Locked Decisions
+
+ These decisions are now fixed unless a hard blocker appears:
+
+ - benchmark task: `P1`
+ - submission framing: `Statement 3.1`
+ - verifier of record: `constellaration.problems.GeometricalProblem`
+ - implementation strategy: fresh wiring in this repo
+ - reuse policy: do not port the old `ai-sci-feasible-designs` harness; only reuse selected JSON artifacts or boundaries when useful
+
+ Execution rule after lock:
+
+ - do not reopen these decisions in new planning passes unless a real blocker appears
+ - once a decision is locked, translate it into code, fixtures, baselines, or deployment work
+
+ ## 3. What Changed From V1
 
  This version changes the center of gravity:
 
@@ -27,6 +41,7 @@ This version changes the center of gravity:
  - `reward shaping story > polished final reward formula`
  - `manual playtesting > training-first iteration`
  - `clarity and reproducibility > broad unsupported transfer claims`
+ - `fresh, minimal environment wiring > transplanting legacy orchestration`
 
  This version also separates:
 
@@ -34,7 +49,7 @@ This version also separates:
  - what is a working hypothesis
  - what must be validated before it becomes part of the final pitch
 
- ## 3. Judge-Aligned Priorities
+ ## 4. Judge-Aligned Priorities
 
  The judging signal now implies four priorities:
 
@@ -43,7 +58,7 @@ The judging signal now implies four priorities:
  3. A human should be able to act in the environment coherently before we invest heavily in training.
  4. The final story should emphasize a clear, reproducible environment, not just a reward curve.
 
- ## 4. Final Artifacts
+ ## 5. Final Artifacts
 
  The four visible artifacts remain:
 
@@ -52,6 +67,12 @@ The four visible artifacts remain:
  3. 1-minute demo video
  4. Public repo and README
 
+ The primary compute workspace should be Northflank:
+
+ - Northflank Jupyter Notebook with PyTorch on the team H100 for development, verifier integration, baselines, and training/debugging
+ - HF Space as the hosted environment surface
+ - Colab as the minimal required public notebook artifact
+
  But the evidence order is:
 
  1. environment contract
@@ -62,38 +83,113 @@ But the evidence order is:
  6. training or eval notebook evidence
  7. demo and repo polish
 
- ## 5. Non-Negotiables
+ ## 6. Non-Negotiables
 
  - One stable task only.
  - No broad cross-science claims unless evidence exists.
  - No training-first drift.
  - No dependence on reward curves alone.
  - No repo/video polish before environment and baselines are real.
+ - No harness transplant from `ai-sci-feasible-designs`.
+ - No new strategy churn after `P1` + rotating-ellipse is locked unless a blocker forces it.
 
- ## 6. Single Stable Task
+ ## 7. Single Stable Task
 
  We intentionally narrow the scope to one environment family:
 
- - fixed-boundary, low-resolution, 2-period quasi-helical stellarator
- - one baseline input
- - small seed perturbation for episode variety
- - budget of 6 VMEC runs per episode
+ - `P1` geometrical benchmark
+ - rotating-ellipse, low-dimensional design space
+ - official `constellaration` verifier
+ - low-fidelity evaluation for ordinary interaction
+ - optional high-fidelity verification for final checks or `submit`
 
  The task is:
 
- > improve quasi-symmetry under explicit constraints with limited simulation budget
+ > improve a stellarator boundary on the `P1` benchmark under explicit constraints and limited evaluation budget
 
  ### Constraints
 
- - aspect ratio in `[4.5, 7.0]`
- - edge iota in `[0.3, 0.6]`
- - volume `> 0.5 m^3`
+ Use the official `P1` constraints:
+
+ - aspect ratio `<= 4.0`
+ - average triangularity `<= -0.5`
+ - edge rotational transform over field periods `>= 0.3`
 
  ### Objective
 
- - minimize quasi-symmetry residual
+ Use the official `P1` objective:
+
+ - minimize `max_elongation`
+
+ ### Why This Task
+
+ - it is official rather than invented
+ - it is cheaper than `P2` and `P3` because `P1` skips QI
+ - it maps cleanly to a tool-using scientific workflow
+ - it is easier to explain than a broader fusion-design claim
+
+ ## 8. Fresh Wiring Rule
+
+ This repo should implement a minimal environment directly for the hackathon.
+
+ That means:
+
+ - define our own environment contract
+ - define our own reward logic on top of the official verifier
+ - define our own baselines
+ - define our own HF Space interface
+
+ That does not mean:
+
+ - importing the old governor
+ - importing the old planner
+ - importing the old experiment harness
+ - recreating the old agent-as-coder stack
+
+ Allowed reuse:
+
+ - official `constellaration` library behavior
+ - selected JSON artifacts or seed boundaries
+ - problem notes as human reference
+
+ Implementation handoff:
 
- ## 7. Environment Contract
+ - the remaining work is now wiring, smoke validation, manual playtesting, baselines, and deployment
+ - do not treat supporting decision notes as a new planning backlog
+
+ ## 8.1 Compute Surfaces
+
+ Use each surface for one clear purpose:
+
+ - Northflank Jupyter Notebook with PyTorch:
+   - main development and compute workspace
+   - verifier sanity checks
+   - manual playtesting
+   - baseline runs
+   - optional RL fine-tuning
+ - HF Space:
+   - public OpenEnv environment surface
+   - remote `reset` and `step` endpoint for the final demo path
+ - Colab:
+   - minimal reproducible evaluation or training notebook required by the hackathon
 
  The environment contract must be frozen before meaningful evaluation.
 
@@ -101,17 +197,15 @@ The environment contract must be frozen before meaningful evaluation.
 
  The observation should expose:
 
- - current quasi-symmetry residual
- - best residual so far
- - improvement from initial
- - aspect ratio
- - axis and edge iota
- - volume
- - magnetic well
- - VMEC convergence status
  - step number
  - budget remaining
- - target description
  - concise textual summary of the last action outcome
 
  The observation must be interpretable by a human without additional hidden state.
@@ -126,18 +220,20 @@ The action space stays intentionally small and discrete:
 
  For `run`, the controllable fields are:
 
- - operator: one of a small fixed set of coefficients
  - direction: increase or decrease
  - magnitude: small, medium, large
- - restart mode: hot or cold
 
- This is not trying to expose the full plasma design space. The goal is a legible environment, not maximal realism.
 
  ### Episode Flow
 
- 1. Reset from baseline plus optional small seed perturbation.
  2. Agent chooses one action.
- 3. Simulator or verifier runs.
  4. Environment returns diagnostics and reward.
  5. Episode ends on:
  - `submit`
@@ -154,17 +250,30 @@ At termination, the environment should provide:
  - total reward
  - short human-readable summary of the trajectory
 
- ## 8. Reward V0
 
  The reward in this document is not the final reward. It is `Reward V0`.
 
- The initial scoring idea remains:
 
- - improvement in quasi-symmetry should help
- - constraint violations should hurt
- - VMEC non-convergence should hurt
  - wasting budget should have some cost
- - successful early submission may deserve a small bonus
 
  ### Reward V0 Design Goals
 
@@ -172,33 +281,52 @@ The initial scoring idea remains:
  - sensitive to genuine progress
  - hostile to obvious degenerate behavior
  - simple enough to debug from trajectories
 
  ### Reward V0 Failure Modes To Test
 
  We should expect at least some of these:
 
- - the agent spams large perturbations
  - the agent oscillates between equivalent moves
- - the agent overuses `restore_best`
- - the agent never submits
  - the agent submits too early
- - the agent learns to preserve safety but not improve objective
 
  The reward is only acceptable after we test for those behaviors.
 
- ## 9. What Is Hypothesis vs Validated
 
  These are still hypotheses until manually or empirically checked:
 
- - `large` perturbations are risky enough to make restart choice meaningful
- - six runs are enough to create non-trivial decision pressure
- - the chosen coefficients create a task that is neither trivial nor impossible
  - `restore_best` is useful without becoming an exploit
  - heuristic should beat random on mean episode reward
 
  These should not be narrated as facts in the final demo until validated.
 
- ## 10. Manual Playtest Plan
 
  Before heavy training, we should act as the agent ourselves.
 
@@ -209,7 +337,7 @@ Run 5 to 10 episodes manually and log for each step:
  - observation seen
  - action chosen
  - reason for the action
- - simulator outcome
  - reward returned
  - whether the reward matched intuitive quality
 
@@ -217,7 +345,7 @@ Run 5 to 10 episodes manually and log for each step:
 
  - can a human understand what to do from the observation?
  - do action labels map to meaningful decisions?
- - is six-run budgeting interesting or arbitrary?
  - which actions are high leverage?
  - do obvious bad actions get punished?
  - do obviously good actions get rewarded?
@@ -229,7 +357,7 @@ Run 5 to 10 episodes manually and log for each step:
  - one paragraph on what a good episode looks like
  - one paragraph on what broke or felt ambiguous
 
- ## 11. Reward Iteration Story
 
  The reward iteration story is not a side note. It is likely part of the pitch.
 
@@ -242,13 +370,13 @@ We should aim to document at least one concrete sequence:
 
  Examples of acceptable story structure:
 
- - "The agent kept making risky large moves, so we increased the non-convergence penalty."
- - "The agent kept deferring commitment, so we adjusted terminal incentives."
- - "The agent overused restore-best, so we changed the reward/step logic to make stalling unprofitable."
 
  This is stronger than saying only "reward improved after training."
 
- ## 12. Evidence Plan
 
  ### HF Space
 
@@ -259,6 +387,22 @@ Must prove:
  - one stable episode runs end-to-end
  - the remote behavior matches the local contract
 
  ### Colab Notebook
 
  Primary job:
@@ -273,11 +417,18 @@ Secondary job:
 
  If training is weak but the environment and eval traces are strong, the notebook still ships.
 
  ### Demo Video
 
  The video should show:
 
- 1. the task
  2. the environment observation and action space
  3. one manual or agent trajectory
  4. one reward pathology and fix
@@ -289,14 +440,21 @@ Reward curves are optional supporting visuals, not the center of the story.
 
  The repo should make the environment easy to understand:
 
- - what the task is
  - what the agent sees
  - what the agent can do
  - how reward works
  - how to run one episode
  - where the demo evidence lives
 
- ## 13. Success Gates
 
  ### Gate 1: Environment Contract Locked
 
@@ -305,53 +463,74 @@ The repo should make the environment easy to understand:
  - action schema frozen
  - terminal conditions frozen
 
- ### Gate 2: Manual Playtest Pass
 
  - human can act coherently
  - at least one trajectory feels sensible
  - at least one pathology identified or ruled out
 
- ### Gate 3: Stable Local Episode
 
- - local modify -> solve -> observe loop works
  - at least one end-to-end episode is stable
 
- ### Gate 4: Reward V1
 
  - at least one reward revision completed
  - story is documented with before/after behavior
 
- ### Gate 5: Baselines
 
  - random baseline complete
  - heuristic baseline complete
  - heuristic is at least competitive and preferably better than random
 
- ### Gate 6: Remote Environment
 
  - HF Space live
  - remote client runs one clean episode
 
- ### Gate 7: Notebook Evidence
 
  - notebook runs end-to-end
  - traces exported
  - training evidence included only if it adds signal
 
- ## 14. Timeline
 
  ### Phase 0
 
- Lock the environment contract and validate the minimal toolchain needed to play the game.
 
  Deliverables:
 
  - frozen task definition
  - frozen action and observation schema
- - proof that one VMEC modify -> run -> diagnose loop works
 
  ### Phase 1
 
  Manual-playtest the environment.
 
  Deliverables:
@@ -359,7 +538,7 @@ Deliverables:
  - 5 to 10 episode logs
  - notes on leverage, ambiguity, and pathologies
 
- ### Phase 2
 
  Implement or refine Reward V0 into Reward V1 based on real behavior.
 
@@ -369,7 +548,7 @@ Deliverables:
  - documented fix
  - updated reward logic
 
- ### Phase 3
 
  Stabilize one local task and run baselines.
 
@@ -379,7 +558,7 @@ Deliverables:
  - random baseline
  - heuristic baseline
 
- ### Phase 4
 
  Deploy HF Space and validate remote parity.
 
@@ -388,18 +567,19 @@ Deliverables:
 
  - live environment
  - one stable remote episode
 
- ### Phase 5
 
  Produce notebook evidence.
 
  Deliverables:
 
  - Colab notebook
  - traces
  - baseline comparison
  - training outputs only if persuasive
 
- ### Phase 6
 
  Record the demo and make the repo readable.
 
@@ -409,7 +589,7 @@ Deliverables:
  - public README
  - linked artifacts
 
- ## 15. Fallback Rules
 
  If something goes wrong, the fallback should preserve the environment story.
 
@@ -420,11 +600,22 @@ Do not force a training-centric pitch.
  Ship:
 
  - strong environment
  - manual playtest evidence
  - reward iteration story
  - baseline traces
  - one stable remote demo
 
  ### If reward is unstable
 
  Reduce ambition:
@@ -439,9 +630,10 @@
 
  Instead:
 
- - simplify the starting configuration
  - tighten the action set
- - make the task more learnable within six runs
 
  ### If the task is too easy
 
@@ -453,13 +645,13 @@ Instead:
  - adjust magnitudes
  - adjust reward to discourage trivial submission
 
- ## 16. Demo Story
 
  The recommended demo structure is:
 
  ### Part 1: Problem
 
- "The agent gets a small VMEC budget to improve a stellarator design while staying within constraints."
 
  ### Part 2: Environment
 
@@ -475,14 +667,16 @@ The recommended demo structure is:
 
  ### Part 5: Why It Matters
 
- "This is a clear, reproducible simulation environment for budget-constrained scientific decision-making."
 
  That last line is intentionally conservative. It is strong enough without claiming universal scientific transfer.
 
- ## 17. Immediate Next Actions
 
- 1. Freeze the environment contract in code and docs.
- 2. Run manual playtests before heavy training work.
- 3. Mark the current reward as `V0`.
- 4. Log the first real pathology and reward revision.
- 5. Do not let notebook or video work outrun the environment evidence.
```
175
+
176
+ Northflank-specific constraint:
177
+
178
+ - containers are ephemeral, so persistent storage must be attached before relying on saved models, caches, or fixture downloads
179
+
180
+ Deployment path:
181
+
182
+ - develop and verify in Northflank or local
183
+ - commit and push changes to the public GitHub repo
184
+ - have HF Space build and serve from that repo path
185
+ - do not rely on manual copy-paste deployment as the default path
186
+
187
+ Auth stance:
188
+
189
+ - prefer a public HF Space for the hackathon to keep the Colab artifact simple
190
+ - if the Space must be private, the notebook must explicitly document token-based access
191
+
192
+ ## 9. Environment Contract
193
 
194
  The environment contract must be frozen before meaningful evaluation.
195
 
 
197
 
198
  The observation should expose:
199
 
200
+ - current `max_elongation`
201
+ - current aspect ratio
202
+ - current average triangularity
203
+ - current edge rotational transform over field periods
204
+ - current feasibility score or normalized violation summary
205
+ - best-so-far feasible score
206
+ - best-so-far least-violating design summary
 
207
  - step number
208
  - budget remaining
 
209
  - concise textual summary of the last action outcome
210
 
211
  The observation must be interpretable by a human without additional hidden state.
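As a concrete sketch, the observation contract above could be frozen as a single flat dataclass. The field names here are assumptions until the schema is actually locked in code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class P1Observation:
    # Hypothetical field names; the real schema is frozen when the contract locks.
    max_elongation: float         # P1 objective (lower is better)
    aspect_ratio: float           # constraint: <= 4.0
    average_triangularity: float  # constraint: <= -0.5
    edge_iota_over_nfp: float     # constraint: >= 0.3
    feasibility: float            # max normalized constraint violation
    best_score: float             # best-so-far feasible score
    step_number: int
    budget_remaining: int
    summary: str                  # concise text describing the last outcome
```

A frozen dataclass keeps the observation flat and immutable, so there is no hidden state beyond what the human or agent can read.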
 
220
 
221
  For `run`, the controllable fields are:
222
 
223
+ - parameter: one of
224
+ - `aspect_ratio`
225
+ - `elongation`
226
+ - `rotational_transform`
227
  - direction: increase or decrease
228
  - magnitude: small, medium, large
 
229
 
230
+ This action space deliberately does not expose the full Fourier-boundary space. The goal is a legible environment, not maximal realism.
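The `run` fields above reduce to a small delta lookup. The delta values below follow the magnitude table in the pivot note and are explicitly placeholders to be tuned by playtest:

```python
# Placeholder deltas from the pivot note's magnitude table; tune by playtest.
DELTAS = {
    "aspect_ratio":         {"small": 0.1,  "medium": 0.3,  "large": 0.8},
    "elongation":           {"small": 0.1,  "medium": 0.3,  "large": 0.8},
    "rotational_transform": {"small": 0.02, "medium": 0.05, "large": 0.15},
}


def apply_action(params: dict, parameter: str, direction: str, magnitude: str) -> dict:
    """Return a new parameter dict with one discrete perturbation applied."""
    sign = 1.0 if direction == "increase" else -1.0
    new = dict(params)  # do not mutate the caller's state
    new[parameter] = params[parameter] + sign * DELTAS[parameter][magnitude]
    return new
```

Keeping the action a pure function over a small parameter dict makes episodes reproducible and easy to log.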
231
 
232
  ### Episode Flow
233
 
234
+ 1. Reset from one rotating-ellipse initial state or a small frozen set of initial states.
235
  2. Agent chooses one action.
236
+ 3. Low-fidelity verifier runs for normal interaction.
237
  4. Environment returns diagnostics and reward.
238
  5. Episode ends on:
239
  - `submit`
 
250
  - total reward
251
  - short human-readable summary of the trajectory
252
 
253
+ ## 10. Verifier Contract
254
+
255
+ The verifier of record is `constellaration.problems.GeometricalProblem`.
256
+
257
+ The environment must preserve:
258
+
259
+ - objective direction
260
+ - constraint direction
261
+ - feasibility semantics
262
+ - score ordering
263
+
264
+ The environment may add reward shaping, but it must not redefine what `P1` means.
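One way to honor this contract is to wrap the official problem object and pass its outputs through untouched. The `evaluate` method name below is an assumption, not a confirmed `constellaration` API:

```python
class P1Verifier:
    """Thin wrapper around the verifier of record.

    `problem` should behave like constellaration.problems.GeometricalProblem;
    the `evaluate` call shape is an assumption, not a confirmed API.
    """

    def __init__(self, problem) -> None:
        self._problem = problem

    def check(self, boundary) -> dict:
        # Pass the official outputs through untouched so objective direction,
        # constraint direction, feasibility semantics, and score ordering all
        # stay identical to P1. Reward shaping happens in a separate layer.
        return self._problem.evaluate(boundary)
```

Injecting the problem object also makes the wrapper trivial to stub in tests without a VMEC run.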
265
+
266
+ ## 11. Reward V0
267
 
268
  The reward in this document is not the final reward. It is `Reward V0`.
269
 
270
+ The initial scoring idea should be feasibility-first:
271
 
272
+ - reducing normalized constraint violation should help
273
+ - becoming feasible should give a meaningful bonus
274
+ - once feasible, lower `max_elongation` should help
275
  - wasting budget should have some cost
276
+ - successful submission may deserve a small bonus
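A minimal `Reward V0` sketch that follows the bullets above; every coefficient is an untuned placeholder:

```python
def reward_v0(prev: dict, curr: dict, submitted: bool = False) -> float:
    """Feasibility-first Reward V0 sketch; all coefficients are untuned.

    `prev` and `curr` carry `feasible` (bool), `violation` (normalized
    constraint violation), and `max_elongation` for one step transition.
    """
    reward = -0.1  # small per-step budget cost
    if curr["feasible"] and not prev["feasible"]:
        reward += 3.0  # becoming feasible earns a meaningful bonus
    if curr["feasible"]:
        # Once feasible, lowering max_elongation is what pays.
        reward += (prev["max_elongation"] - curr["max_elongation"]) * 10.0
    else:
        # Before feasibility, reducing normalized violation is what pays.
        reward += (prev["violation"] - curr["violation"]) * 5.0
    if submitted and curr["feasible"]:
        reward += 1.0  # small successful-submission bonus
    return reward
```

Every branch here should be exercised during manual playtesting before any coefficient is trusted.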
277
 
278
  ### Reward V0 Design Goals
279
 
 
281
  - sensitive to genuine progress
282
  - hostile to obvious degenerate behavior
283
  - simple enough to debug from trajectories
284
+ - aligned with official `P1` semantics
285
 
286
  ### Reward V0 Failure Modes To Test
287
 
288
  We should expect at least some of these:
289
 
 
290
  - the agent oscillates between equivalent moves
 
 
291
  - the agent submits too early
292
+ - the agent never submits
293
+ - the agent learns to improve the objective before it learns feasibility
294
+ - the agent camps near one constraint while breaking another
295
+ - the agent overuses `restore_best`
296
 
297
  The reward is only acceptable after we test for those behaviors.
298
 
299
+ ## 12. Verifier and Reward Fixture Checks
300
+
301
+ Before training, we should validate environment wiring with a few fixed fixtures.
302
+
303
+ Use:
304
+
305
+ - one known-good design or near-winning design
306
+ - a few near-boundary designs
307
+ - a few clearly infeasible designs
308
+
309
+ Purpose:
310
+
311
+ - verify the verifier is wired correctly
312
+ - verify the reward ordering makes sense
313
+ - verify feasible designs outrank clearly infeasible ones
314
+
315
+ This is calibration, not training.
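The fixture pass can be as small as one ordering assertion. `score_fn` below is a stand-in for whatever scoring path the environment ends up exposing:

```python
def check_fixture_ordering(score_fn, good, near_boundary, infeasible) -> None:
    """Calibration check: a feasible fixture must outrank a clearly bad one.

    `score_fn` is a stand-in for the environment's scoring path; the fixture
    arguments are whatever objects that path accepts.
    """
    s_good = score_fn(good)
    s_near = score_fn(near_boundary)
    s_bad = score_fn(infeasible)
    assert s_good >= s_near >= s_bad, (
        f"fixture ordering broken: good={s_good}, near={s_near}, bad={s_bad}"
    )
```

Running this once per reward revision catches wiring regressions before any training run burns compute on them.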
316
+
317
+ ## 13. What Is Hypothesis vs Validated
318
 
319
  These are still hypotheses until manually or empirically checked:
320
 
321
+ - six steps are enough to create non-trivial decision pressure
322
+ - the rotating-ellipse action space is expressive enough for a meaningful `P1` task
 
323
  - `restore_best` is useful without becoming an exploit
324
  - heuristic should beat random on mean episode reward
325
+ - low-fidelity interaction is predictive enough for useful policy learning
326
 
327
  These should not be narrated as facts in the final demo until validated.
328
 
329
+ ## 14. Manual Playtest Plan
330
 
331
  Before heavy training, we should act as the agent ourselves.
332
 
 
337
  - observation seen
338
  - action chosen
339
  - reason for the action
340
+ - verifier outcome
341
  - reward returned
342
  - whether the reward matched intuitive quality
343
 
 
345
 
346
  - can a human understand what to do from the observation?
347
  - do action labels map to meaningful decisions?
348
+ - is the step budget interesting or arbitrary?
349
  - which actions are high leverage?
350
  - do obvious bad actions get punished?
351
  - do obviously good actions get rewarded?
 
357
  - one paragraph on what a good episode looks like
358
  - one paragraph on what broke or felt ambiguous
359
 
360
+ ## 15. Reward Iteration Story
361
 
362
  The reward iteration story is not a side note. It is likely part of the pitch.
363
 
 
370
 
371
  Examples of acceptable story structure:
372
 
373
+ - "The agent improved elongation while staying deeply infeasible, so we increased feasibility-first shaping."
374
+ - "The agent hovered near one constraint and ignored another, so we changed the violation shaping."
375
+ - "The agent overused restore-best, so we changed the reward or step logic to make stalling unprofitable."
376
 
377
  This is stronger than saying only "reward improved after training."
378
 
379
+ ## 16. Evidence Plan
380
 
381
  ### HF Space
382
 
 
387
  - one stable episode runs end-to-end
388
  - the remote behavior matches the local contract
389
 
390
+ HF Space is the serving surface, not the main heavy-compute workspace.
391
+
392
+ ### Northflank Notebook
393
+
394
+ Must prove:
395
+
396
+ - Jupyter Notebook with PyTorch is live on the team H100
397
+ - persistent storage is attached
398
+ - verifier and baseline work runs there without local-machine dependency
399
+ - environment/debug/training work can proceed there even if local runtime is inconvenient
400
+ - one smoke check passes:
401
+ - import `constellaration`
402
+ - generate one rotating-ellipse boundary
403
+ - run one low-fidelity verifier call
404
+ - write a result artifact to persistent storage
405
+
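The smoke check above can live in one short script. The `generate_rotating_ellipse` signature follows the pivot note; the forward-model call shape and the mount path are assumptions until run against the installed library:

```python
import json
import pathlib


def write_smoke_artifact(storage_dir: str, payload: dict) -> pathlib.Path:
    """Persist one smoke-test result so an ephemeral container cannot lose it."""
    out = pathlib.Path(storage_dir) / "smoke_result.json"
    out.write_text(json.dumps(payload, indent=2))
    return out


if __name__ == "__main__":
    # Assumed API shapes: generate_rotating_ellipse matches the signature
    # documented in the pivot note; the forward-model call is a placeholder.
    from constellaration import forward_model, initial_guess

    boundary = initial_guess.generate_rotating_ellipse(
        aspect_ratio=4.0,
        elongation=2.0,
        rotational_transform=0.4,
        n_field_periods=3,
    )
    metrics = forward_model.evaluate(boundary)  # assumed call shape
    # "/mnt/persistent" is a hypothetical mount path for the attached volume.
    write_smoke_artifact("/mnt/persistent", {"ok": True, "metrics": str(metrics)})
```

If the artifact file survives a container restart, the persistent-storage requirement is proven at the same time.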
406
  ### Colab Notebook
407
 
408
  Primary job:
 
417
 
418
  If training is weak but the environment and eval traces are strong, the notebook still ships.
419
 
420
+ Colab is a required artifact, but it is not the preferred main compute surface.
421
+
422
+ Connectivity rule:
423
+
424
+ - if HF Space is public, the notebook uses direct HTTP calls with no extra auth flow
425
+ - if HF Space is private, the notebook must state the required token path and setup explicitly
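For a public Space, the Colab client can stay at plain HTTP with the standard library. The `/reset` and `/step` routes below are assumptions about the eventual Space API, not confirmed endpoints:

```python
import json
import urllib.request


class RemoteP1Client:
    """Minimal HTTP client for a public HF Space; route names are assumed."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def _post(self, route: str, payload: dict) -> dict:
        req = urllib.request.Request(
            f"{self.base_url}{route}",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def reset(self) -> dict:
        return self._post("/reset", {})

    def step(self, action: dict) -> dict:
        return self._post("/step", {"action": action})
```

Using only the standard library keeps the notebook's setup cell to a single client definition with no pip installs.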
426
+
427
  ### Demo Video
428
 
429
  The video should show:
430
 
431
+ 1. the `P1` task
432
  2. the environment observation and action space
433
  3. one manual or agent trajectory
434
  4. one reward pathology and fix
 
440
 
441
  The repo should make the environment easy to understand:
442
 
443
+ - what `P1` is
444
  - what the agent sees
445
  - what the agent can do
446
  - how reward works
447
  - how to run one episode
448
  - where the demo evidence lives
449
+ - why the repo is freshly wired rather than copied from the old project
450
 
451
+ ## 17. Success Gates
452
+
453
+ ### Prerequisite: Northflank Compute Ready
454
+
455
+ - notebook starts on the team H100
456
+ - persistent storage mount is usable
457
+ - smoke test artifact is written successfully
458
 
459
  ### Gate 1: Environment Contract Locked
460
 
 
463
  - action schema frozen
464
  - terminal conditions frozen
465
 
466
+ ### Gate 2: Verifier Wiring Pass
467
+
468
+ - official `P1` verifier returns expected outputs
469
+ - fixture ordering is sensible
470
+ - objective direction is correct
471
+
472
+ ### Gate 3: Manual Playtest Pass
473
 
474
  - human can act coherently
475
  - at least one trajectory feels sensible
476
  - at least one pathology identified or ruled out
477
 
478
+ ### Gate 4: Stable Local Episode
479
 
480
+ - local modify -> verify -> observe loop works
481
  - at least one end-to-end episode is stable
482
 
483
+ ### Gate 5: Reward V1
484
 
485
  - at least one reward revision completed
486
  - story is documented with before/after behavior
487
 
488
+ ### Gate 6: Baselines
489
 
490
  - random baseline complete
491
  - heuristic baseline complete
492
  - heuristic is at least competitive and preferably better than random
493
 
494
+ ### Gate 7: Remote Environment
495
 
496
  - HF Space live
497
  - remote client runs one clean episode
498
 
499
+ ### Gate 8: Notebook Evidence
500
 
501
  - notebook runs end-to-end
502
  - traces exported
503
  - training evidence included only if it adds signal
504
 
505
+ ## 18. Timeline
506
 
507
  ### Phase 0
508
 
509
+ Run two parallel tracks:
510
+
511
+ - Track A: Northflank compute setup and smoke validation
512
+ - Track B: lock the `P1` environment contract
513
 
514
  Deliverables:
515
 
516
  - frozen task definition
517
  - frozen action and observation schema
518
+ - proof that one local `P1` loop works
519
+ - Northflank smoke test pass
520
 
521
  ### Phase 1
522
 
523
+ Wire the official verifier and run fixture checks.
524
+
525
+ Deliverables:
526
+
527
+ - one good fixture
528
+ - near-boundary fixtures
529
+ - bad fixtures
530
+ - confidence that reward/verifier ordering is sane
531
+
532
+ ### Phase 2
533
+
534
  Manual-playtest the environment.
535
 
536
  Deliverables:
 
538
  - 5 to 10 episode logs
539
  - notes on leverage, ambiguity, and pathologies
540
 
541
+ ### Phase 3
542
 
543
  Implement or refine Reward V0 into Reward V1 based on real behavior.
544
 
 
548
  - documented fix
549
  - updated reward logic
550
 
551
+ ### Phase 4
552
 
553
  Stabilize one local task and run baselines.
554
 
 
558
  - random baseline
559
  - heuristic baseline
560
 
561
+ ### Phase 5
562
 
563
  Deploy HF Space and validate remote parity.
564
 
 
567
  - live environment
568
  - one stable remote episode
569
 
570
+ ### Phase 6
571
 
572
  Produce notebook evidence.
573
 
574
  Deliverables:
575
 
576
  - Colab notebook
577
+ - Northflank traces or run exports
578
  - traces
579
  - baseline comparison
580
  - training outputs only if persuasive
581
 
582
+ ### Phase 7
583
 
584
  Record the demo and make the repo readable.
585
 
 
589
  - public README
590
  - linked artifacts
591
 
592
+ ## 19. Fallback Rules
593
 
594
  If something goes wrong, the fallback should preserve the environment story.
595
 
 
600
  Ship:
601
 
602
  - strong environment
603
+ - verifier and fixture evidence
604
  - manual playtest evidence
605
  - reward iteration story
606
  - baseline traces
607
  - one stable remote demo
608
 
609
+ ### If Northflank is delayed or unavailable
610
+
611
+ Do not block environment design on it.
612
+
613
+ Fallback:
614
+
615
+ - continue contract definition, reward design, and basic wiring locally
616
+ - use local CPU or Colab for limited verifier/debug work
617
+ - keep Northflank as the preferred compute target, but do not stall the whole plan waiting for it
618
+
619
  ### If reward is unstable
620
 
621
  Reduce ambition:
 
630
 
631
  Instead:
632
 
633
+ - simplify the initial states
634
  - tighten the action set
635
+ - reduce magnitude choices
636
+ - keep the environment more learnable within the fixed budget
637
 
638
  ### If the task is too easy
639
 
 
645
  - adjust magnitudes
646
  - adjust reward to discourage trivial submission
647
 
648
+ ## 20. Demo Story
649
 
650
  The recommended demo structure is:
651
 
652
  ### Part 1: Problem
653
 
654
+ "The agent interacts with the official `P1` stellarator-design benchmark and must improve a design under strict geometric constraints."
655
 
656
  ### Part 2: Environment
657
 
 
667
 
668
  ### Part 5: Why It Matters
669
 
670
+ "This is a clear, reproducible scientific workflow environment built around a real verifier, not a shortcut task."
671
 
672
  That last line is intentionally conservative. It is strong enough without claiming universal scientific transfer.
673
 
674
+ ## 21. Immediate Next Actions
675
 
676
+ 1. Freeze the `P1` environment contract in code and docs.
677
+ 2. Implement fresh verifier wiring in this repo.
678
+ 3. Run fixture checks before heavy training work.
679
+ 4. Run manual playtests before heavy training work.
680
+ 5. Mark the current reward as `V0`.
681
+ 6. Log the first real pathology and reward revision.
682
+ 7. Do not let notebook or video work outrun the environment evidence.
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED
@@ -1,6 +1,6 @@
1
  # Fusion Design Lab: Next 12 Hours Checklist
2
 
3
- This checklist turns the updated deliverables map and Plan V2 into concrete execution order. The goal is to produce real evidence for the four submission artifacts, with environment clarity and reproducibility driving the sequence.
4
 
5
  ## Core Rule
6
 
@@ -11,14 +11,36 @@ Do not expand scope beyond one stable task. Training is supporting evidence, not
11
  Carry these rules through the whole checklist:
12
 
13
  - Freeze the environment contract before heavy iteration.
 
14
  - Treat the current reward as `Reward V0`, not final reward.
15
  - Distinguish validated facts from working hypotheses.
16
  - Prefer behavior traces and baseline comparisons over generic reward-curve storytelling.
17
  - If training is weak, ship the environment story anyway.
 
 
18
 
19
- ## Hour 0-2: Lock the Environment Contract
20
 
21
- 1. Write the exact environment spec.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
22
  2. Freeze one task only.
23
  3. Define:
24
  - observation schema
@@ -28,13 +50,13 @@ Carry these rules through the whole checklist:
28
  - reward V0 terms
29
  - initial penalties
30
  4. Update the main diagram so it emphasizes:
31
- - environment
32
- - verifier
33
  - reward shaping
34
  - manual playtesting
35
  5. Mark open assumptions explicitly:
36
- - risky action magnitudes
37
- - whether 6 runs is enough
38
  - whether `restore_best` is useful without becoming an exploit
39
 
40
  Exit condition: a human can read the spec and understand how to act in the environment.
@@ -44,18 +66,30 @@ Artifacts:
44
  - revised mermaid diagram
45
  - short hypothesis list
46
 
47
- ## Hour 2-4: Manual Playtest and Fix Reward Pathologies
 
 
 
 
48
 
49
- 1. Manually play 5 to 10 episodes.
50
- 2. Log for each step:
 
 
 
 
 
 
 
 
51
  - observation
52
  - chosen action
53
  - expected effect
54
  - returned reward
55
  - confusion or exploit if observed
56
- 3. Identify at least one bad incentive or exploit.
57
- 4. Patch reward or penalty logic immediately.
58
- 5. Write the reward shaping story:
59
  - initial reward V0
60
  - bad behavior
61
  - refinement to reward V1
@@ -64,13 +98,14 @@ Artifacts:
64
  Exit condition: you can explain why the environment now rewards the intended behavior.
65
 
66
  Artifacts:
 
67
  - manual playtest log
68
  - reward shaping note
69
  - reward V1 delta note
70
 
71
  ## Hour 4-6: Stabilize the Local Task
72
 
73
- 1. Prove the local physics or verifier loop.
74
  2. Run one stable end-to-end task repeatedly.
75
  3. Confirm the action schema is deterministic enough for reproducible episodes.
76
  4. Save one clean local trajectory.
@@ -84,10 +119,17 @@ Artifacts:
84
 
85
  ## Hour 6-8: Make the HF Space Real
86
 
87
- 1. Package the OpenEnv environment for remote use.
88
- 2. Verify remote `reset` and `step`.
89
- 3. Run one clean remote episode end-to-end.
90
- 4. Confirm the remote environment preserves the same task contract as local.
 
 
 
 
 
 
 
91
 
92
  Exit condition: the environment is runnable in the actual submission surface, not only locally.
93
 
@@ -99,7 +141,7 @@ Artifacts:
99
 
100
  1. Implement the random baseline.
101
  2. Implement the heuristic baseline.
102
- 3. Run short comparisons on the same stable task.
103
  4. Save:
104
  - comparison numbers
105
  - behavior traces
@@ -119,20 +161,22 @@ Artifacts:
119
  - multi-turn episodes
120
  - behavior traces
121
  - reward or behavior comparison outputs
122
- 3. Draft the 60-second demo script.
123
- 4. Record the demo around:
124
- - what the environment is
 
125
  - how reward was refined
126
  - what manual playtesting revealed
127
  - one stable trajectory
128
  - baseline comparison
129
- 5. If training evidence is weak, keep the notebook eval-first and do not force a training-centric claim.
130
- 6. Make the repo public-facing and readable only after the artifacts are real.
131
 
132
  Exit condition: all four visible artifacts exist in usable form.
133
 
134
  Artifacts:
135
  - Colab training or eval script
 
136
  - demo script
137
  - draft or final video
138
  - updated repo README
@@ -141,21 +185,26 @@ Artifacts:
141
  ## Artifact Order
142
 
143
  1. Environment spec
144
- 2. Manual playtest log
145
- 3. Reward revision note
146
- 4. Stable task run
147
- 5. Random baseline
148
- 6. Heuristic baseline
149
- 7. Colab training or eval evidence
150
- 8. Demo recording
151
- 9. Repo polish
 
 
152
 
153
  ## Non-Negotiables
154
 
155
  - Do not widen scope beyond one stable task.
 
156
  - Do not optimize training before manual playtesting.
157
  - Do not rely on reward curves alone; keep trajectory evidence.
158
  - Do not narrate hypotheses as facts before they are checked.
159
  - Do not polish the repo or video before the environment and baselines are real.
160
  - Treat judge comments as pressure toward clarity and reproducibility, not broader unsupported claims.
161
  - Do not force a training-centric story if the strongest evidence is environment quality plus baselines.
 
 
 
1
  # Fusion Design Lab: Next 12 Hours Checklist
2
 
3
+ This checklist turns the updated deliverables map and Plan V2 into concrete execution order. The goal is to produce real evidence for the four submission artifacts, with `P1`, fresh wiring, and environment clarity driving the sequence.
4
 
5
  ## Core Rule
6
 
 
11
  Carry these rules through the whole checklist:
12
 
13
  - Freeze the environment contract before heavy iteration.
14
+ - Keep the repo freshly wired; do not port the old harness.
15
  - Treat the current reward as `Reward V0`, not final reward.
16
  - Distinguish validated facts from working hypotheses.
17
  - Prefer behavior traces and baseline comparisons over generic reward-curve storytelling.
18
  - If training is weak, ship the environment story anyway.
19
+ - Use Northflank as the main compute workspace; keep HF Space and Colab as the submission surfaces.
20
+ - Do not open another strategy loop unless a real blocker appears.
21
 
22
+ ## Hour 0-2: Parallelize Compute Bring-Up and Contract Lock
23
 
24
+ ### Track A: Northflank Compute
25
+
26
+ 1. Bring up the Northflank Jupyter Notebook with PyTorch on the team H100.
27
+ 2. Attach persistent storage before relying on saved models, caches, or fixture downloads.
28
+ 3. Pass a concrete smoke test:
29
+ - import `constellaration`
30
+ - generate one rotating-ellipse boundary
31
+ - run one low-fidelity verifier call
32
+ - write one artifact to persistent storage
33
+
34
+ Exit condition: the notebook is not just open; the verifier path works and persistent storage is usable.
35
+
36
+ Artifacts:
37
+ - Northflank notebook live
38
+ - smoke test note
39
+ - one persisted smoke artifact
40
+
41
+ ### Track B: Environment Contract
42
+
43
+ 1. Write the exact `P1` environment spec.
44
  2. Freeze one task only.
45
  3. Define:
46
  - observation schema
 
50
  - reward V0 terms
51
  - initial penalties
52
  4. Update the main diagram so it emphasizes:
53
+ - `P1`
54
+ - official verifier
55
  - reward shaping
56
  - manual playtesting
57
  5. Mark open assumptions explicitly:
58
+ - whether the rotating-ellipse action set is expressive enough
59
+ - whether the fixed step budget is enough
60
  - whether `restore_best` is useful without becoming an exploit
61
 
62
  Exit condition: a human can read the spec and understand how to act in the environment.
 
66
  - revised mermaid diagram
67
  - short hypothesis list
68
 
69
+ Transition rule:
70
+
71
+ - once Track B exits, stop rewriting the strategy and move straight into wiring and verifier checks
72
+
73
+ ## Hour 2-4: Verify Wiring, Then Manual Playtest
74
 
75
+ 1. Run fixture checks:
76
+ - known-good or near-winning design
77
+ - near-boundary designs
78
+ - clearly bad designs
79
+ 2. Confirm:
80
+ - verifier outputs are sane
81
+ - reward ordering is sane
82
+ - objective direction is correct
83
+ 3. Manually play 5 to 10 episodes.
84
+ 4. Log for each step:
85
  - observation
86
  - chosen action
87
  - expected effect
88
  - returned reward
89
  - confusion or exploit if observed
90
+ 5. Identify at least one bad incentive or exploit.
91
+ 6. Patch reward or penalty logic immediately.
92
+ 7. Write the reward shaping story:
93
  - initial reward V0
94
  - bad behavior
95
  - refinement to reward V1
 
98
  Exit condition: you can explain why the environment now rewards the intended behavior.
99
 
100
  Artifacts:
101
+ - fixture check note
102
  - manual playtest log
103
  - reward shaping note
104
  - reward V1 delta note
105
 
106
  ## Hour 4-6: Stabilize the Local Task
107
 
108
+ 1. Prove the fresh local `P1` verifier loop.
109
  2. Run one stable end-to-end task repeatedly.
110
  3. Confirm the action schema is deterministic enough for reproducible episodes.
111
  4. Save one clean local trajectory.
 
119
 
120
  ## Hour 6-8: Make the HF Space Real
121
 
122
+ 1. Package the OpenEnv `P1` environment for remote use.
123
+ 2. Use the explicit deployment path:
124
+ - commit changes in this repo
125
+ - push to GitHub
126
+ - let HF Space build from the repo
127
+ 3. Decide and document the access mode:
128
+ - preferred: public HF Space for the hackathon
129
+ - if private: token-based notebook access documented
130
+ 4. Verify remote `reset` and `step`.
131
+ 5. Run one clean remote episode end-to-end.
132
+ 6. Confirm the remote environment preserves the same task contract as local.
133
 
134
  Exit condition: the environment is runnable in the actual submission surface, not only locally.
135
 
 
141
 
142
  1. Implement the random baseline.
143
  2. Implement the heuristic baseline.
144
+ 3. Run short comparisons on the same stable `P1` task.
145
  4. Save:
146
  - comparison numbers
147
  - behavior traces
 
161
  - multi-turn episodes
162
  - behavior traces
163
  - reward or behavior comparison outputs
164
+ 3. Keep heavy verifier and training work on Northflank; use Colab as the thin public artifact.
165
+ 4. Draft the 60-second demo script.
166
+ 5. Record the demo around:
167
+ - what `P1` is
168
  - how reward was refined
169
  - what manual playtesting revealed
170
  - one stable trajectory
171
  - baseline comparison
172
+ 6. If training evidence is weak, keep the notebook eval-first and do not force a training-centric claim.
173
+ 7. Make the repo public-facing and readable only after the artifacts are real.
174
 
175
  Exit condition: all four visible artifacts exist in usable form.
176
 
177
  Artifacts:
178
  - Colab training or eval script
179
+ - Northflank run notes or exported traces
180
  - demo script
181
  - draft or final video
182
  - updated repo README
 
185
  ## Artifact Order
186
 
187
  1. Environment spec
188
+ 2. Fixture check note
189
+ 3. Manual playtest log
190
+ 4. Reward revision note
191
+ 5. Stable task run
192
+ 6. Random baseline
193
+ 7. Heuristic baseline
194
+ 8. Northflank traces or training evidence
195
+ 9. Colab training or eval evidence
196
+ 10. Demo recording
197
+ 11. Repo polish
198
 
199
  ## Non-Negotiables
200
 
201
  - Do not widen scope beyond one stable task.
202
+ - Do not port the old `ai-sci-feasible-designs` harness into this repo.
203
  - Do not optimize training before manual playtesting.
204
  - Do not rely on reward curves alone; keep trajectory evidence.
205
  - Do not narrate hypotheses as facts before they are checked.
206
  - Do not polish the repo or video before the environment and baselines are real.
207
  - Treat judge comments as pressure toward clarity and reproducibility, not broader unsupported claims.
208
  - Do not force a training-centric story if the strongest evidence is environment quality plus baselines.
209
+ - Do not rely on Northflank container-local state without persistent storage.
210
+ - Do not block contract design work on Northflank provisioning friction.
docs/PIVOT_P1_ROTATING_ELLIPSE.md ADDED
@@ -0,0 +1,238 @@
1
+ # Pivot: P1 Rotating-Ellipse Environment
2
+
3
+ **Date:** 2026-03-07
4
+ **Status:** Supporting decision record, superseded as planning SSOT by `FUSION_DESIGN_LAB_PLAN_V2.md`
5
+ **Supersedes:** Synthetic physics model in current `server/physics.py`
6
+
7
+ Use this file as rationale for the pivot, not as a fresh planning queue. Once the pivot is accepted, implementation should follow the SSOT plan docs.
8
+
9
+ ## Decision
10
+
11
+ Pivot the OpenEnv environment to use the official ConStellaration P1 benchmark with real VMEC physics, scoped to the rotating-ellipse low-dimensional parameter space.
12
+
13
+ This borrows the strongest low-dimensional entry point from the proven winning approach documented in `raw-session.md`, not the full approach.
14
+
15
+ ## What Was Validated
16
+
17
+ | Claim | Status | Source |
18
+ |---|---|---|
19
+ | P1 is the cleanest benchmark task | Verified | `problems.py:113` — minimize max_elongation, 3 constraints, no QI |
20
+ | P1 skips QI | Verified | `problems.py:145` — `_does_it_require_qi = False` |
21
+ | Low-fidelity eval is fast enough | Measured | 0.63s per eval on local machine; postmortem says ~1s/eval |
22
+ | High-fidelity eval is expensive | Measured | 24s per eval; only viable for final validation |
23
+ | Rotating-ellipse can find P1-feasible designs | Verified | `raw-session.md`: sweeps found 3 feasible designs in ~20 min |
24
+ | vmecpp installs from wheels | Verified | `uv pip install vmecpp==0.4.7` resolves cleanly, no compilation |
25
+ | constellaration Dockerfile is not bloated | Verified | `python:3.10-slim` + `pip install constellaration` |
26
+ | Current seed logic is too loose for P1 | Verified | `seeds.py:42`: triangularity override 0.05 vs constraint -0.5 |
27
+ | Full harness should not be ported | Verified | Postmortem: prescriptive harness produced 0 feasible candidates |
28
+
29
+ ## What Is Hypothesis (Not Yet Validated)
30
+
31
+ 1. **6 actions are enough** to reach or improve P1 feasibility from a rotating-ellipse starting point. Must validate by manual playtest immediately.
32
+ 2. **Discretized rotating-ellipse perturbations** create non-trivial decision pressure (not too easy, not impossible).
33
+ 3. **Low-fidelity metrics** are close enough to high-fidelity P1 scoring that low-fi reward signal is meaningful.
34
+ 4. **The Docker image** builds and deploys on HF Spaces within reasonable time/size limits.
35
+
36
+ ## Environment Design
37
+
38
+ ### Single Task
39
+
40
+ Improve a stellarator boundary's P1 score using the rotating-ellipse parameterization under the official ConStellaration P1 constraints.
41
+
42
+ ### P1 Constraints (from `GeometricalProblem`)
43
+
44
+ - aspect_ratio <= 4.0
45
+ - average_triangularity <= -0.5
46
+ - edge_rotational_transform / n_field_periods >= 0.3
47
+
48
+ ### P1 Objective
49
+
50
+ Minimize `max_elongation`. Score = `1 - clip((max_elongation - 1) / 9, 0, 1)`.
51
+
52
+ Feasibility tolerance: normalized constraint violations <= 1% (0.01).
53
+
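Written out as code, the scoring rule above looks like this (the clip form and the 1% feasibility tolerance are taken directly from this section):

```python
def p1_score(max_elongation: float, feasibility: float) -> float:
    """P1 scoring as stated above.

    `feasibility` is the max normalized constraint violation; anything above
    the 1% tolerance scores 0. Otherwise max_elongation maps linearly into
    [0, 1], with lower elongation scoring higher.
    """
    if feasibility > 0.01:
        return 0.0
    clipped = min(max((max_elongation - 1.0) / 9.0, 0.0), 1.0)
    return 1.0 - clipped
```

This makes the saturation points explicit: elongation at 1 scores 1.0, anything at or beyond 10 scores 0.0 even when feasible.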
54
+ ### Parameter Space
55
+
56
+ The rotating-ellipse generator takes 3 continuous parameters + 1 discrete:
57
+
58
+ | Parameter | Role | Typical range |
59
+ |---|---|---|
60
+ | `aspect_ratio` | Width-to-height ratio of the boundary | 2.0 - 8.0 |
61
+ | `elongation` | Vertical stretching of cross-section | 1.0 - 5.0 |
62
+ | `rotational_transform` | Magnetic field line winding | 0.1 - 1.0 |
63
+ | `n_field_periods` | Fixed at 3 (not an action) | 3 |
64
+
65
+ These map to `constellaration.initial_guess.generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` which returns a `SurfaceRZFourier` boundary in ~4ms.
66
+
67
+ ### Action Space
68
+
69
+ Discrete perturbations on the 3 rotating-ellipse parameters:
70
+
71
+ ```
72
+ intent: "run" | "submit" | "restore_best"
73
+ operator: "aspect_ratio" | "elongation" | "rotational_transform"
74
+ direction: "increase" | "decrease"
75
+ magnitude: "small" | "medium" | "large"
76
+ ```
77
+
78
+ Magnitude deltas (to be tuned by playtest):
79
+
80
+ | Parameter | small | medium | large |
81
+ |---|---|---|---|
82
+ | aspect_ratio | 0.1 | 0.3 | 0.8 |
83
+ | elongation | 0.1 | 0.3 | 0.8 |
84
+ | rotational_transform | 0.02 | 0.05 | 0.15 |
85
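Applying one perturbation action is a table lookup plus a signed delta. A sketch assuming the delta table above (`apply_action` is an illustrative name, not the env's actual API):

```python
DELTAS = {
    "aspect_ratio":         {"small": 0.1,  "medium": 0.3,  "large": 0.8},
    "elongation":           {"small": 0.1,  "medium": 0.3,  "large": 0.8},
    "rotational_transform": {"small": 0.02, "medium": 0.05, "large": 0.15},
}

def apply_action(params: dict, operator: str, direction: str, magnitude: str) -> dict:
    """Return a new parameter dict with one discrete perturbation applied."""
    delta = DELTAS[operator][magnitude]
    sign = 1.0 if direction == "increase" else -1.0
    new_params = dict(params)  # leave the caller's dict untouched
    new_params[operator] = params[operator] + sign * delta
    return new_params

p = {"aspect_ratio": 3.5, "elongation": 1.5, "rotational_transform": 0.4}
p2 = apply_action(p, "elongation", "decrease", "medium")  # elongation 1.5 -> 1.2
```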
+
+ ### Episode Flow
+
+ 1. Reset: generate the initial boundary from baseline rotating-ellipse parameters (+ optional seed perturbation). Run the low-fi forward model. Return the initial observation.
+ 2. Agent chooses an action.
+ 3. If `run`: modify the parameter, regenerate the boundary, run the low-fi forward model (~0.6 s). Return diagnostics + reward.
+ 4. If `restore_best`: revert to the best-known parameters. No VMEC cost, but costs a budget step.
+ 5. If `submit`: end the episode. Optionally run high-fi for the final score.
+ 6. The episode ends on `submit` or budget exhaustion.
+
+ ### Budget
+
+ 6 evaluations per episode. All non-submit actions cost 1 budget step.
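The flow and budget rules above amount to a small state machine. A minimal sketch with a stubbed evaluator (the real env calls the constellaration forward model instead; `run_episode` and `evaluate` are illustrative names):

```python
BUDGET = 6  # evaluations per episode

def run_episode(actions, evaluate):
    """Drive one episode; `evaluate` stands in for the low-fi forward model."""
    budget = BUDGET
    done = False
    for action in actions:
        if done:
            break
        if action["intent"] == "submit":
            done = True            # submit ends the episode and costs no budget
            continue
        budget -= 1                # "run" and "restore_best" each cost one step
        if action["intent"] == "run":
            evaluate(action)
        if budget == 0:
            done = True            # budget exhaustion also terminates
    return budget, done
```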
+
+ ### Observation
+
+ ```
+ diagnostics_text: str         # human-readable summary
+ max_elongation: float         # P1 objective (minimize)
+ aspect_ratio: float           # constraint: <= 4.0
+ average_triangularity: float  # constraint: <= -0.5
+ edge_iota_over_nfp: float     # constraint: >= 0.3
+ p1_score: float               # official P1 score (0 if infeasible)
+ p1_feasibility: float         # max normalized constraint violation
+ constraints_satisfied: bool   # feasibility <= 0.01
+ vacuum_well: float            # stability indicator
+ step_number: int
+ budget_remaining: int
+ best_score: float
+ target_spec: str
+ ```
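Two of the derived fields, `p1_feasibility` and `constraints_satisfied`, can be computed from the three constraint metrics. A sketch (normalizing each violation by its threshold magnitude is an assumption here; the official benchmark defines the exact normalization):

```python
TOL = 0.01  # feasibility tolerance from the P1 spec

def p1_feasibility(aspect_ratio, average_triangularity, edge_iota_over_nfp):
    """Max normalized constraint violation; 0.0 when all constraints hold."""
    violations = [
        (aspect_ratio - 4.0) / 4.0,              # aspect_ratio <= 4.0
        (average_triangularity - (-0.5)) / 0.5,  # average_triangularity <= -0.5
        (0.3 - edge_iota_over_nfp) / 0.3,        # edge iota / nfp >= 0.3
    ]
    return max(0.0, *violations)

def constraints_satisfied(aspect_ratio, average_triangularity, edge_iota_over_nfp):
    return p1_feasibility(aspect_ratio, average_triangularity, edge_iota_over_nfp) <= TOL
```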
+
+ ### Reward V0
+
+ Feasibility-first, then objective improvement:
+
+ ```
+ if constraints newly satisfied:
+     +3.0
+ if constraints newly violated:
+     -3.0
+
+ if feasible:
+     reward += (prev_elongation - curr_elongation) * 10.0   # improvement in objective
+ else:
+     reward += (prev_feasibility - curr_feasibility) * 5.0  # progress toward feasibility
+
+ per-step cost: -0.1
+
+ submit bonus (if feasible and improved):
+     +5.0 * improvement_ratio + 1.0 * budget_efficiency
+ submit penalty (if infeasible or no improvement):
+     -1.0
+ ```
+
+ This puts feasibility first. An agent that achieves feasibility then minimizes elongation gets rewarded. An agent that never reaches feasibility gets penalized.
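As executable Python, the per-step part of Reward V0 might look like this (a sketch; the submit bonus is omitted, and the tolerance and weights come straight from the pseudocode above):

```python
TOL = 0.01  # feasibility tolerance

def step_reward(prev: dict, curr: dict) -> float:
    """Per-step Reward V0; `prev`/`curr` carry 'elongation' and 'feasibility'."""
    was_feasible = prev["feasibility"] <= TOL
    is_feasible = curr["feasibility"] <= TOL
    reward = -0.1  # per-step cost
    if is_feasible and not was_feasible:
        reward += 3.0  # constraints newly satisfied
    if was_feasible and not is_feasible:
        reward -= 3.0  # constraints newly violated
    if is_feasible:
        reward += (prev["elongation"] - curr["elongation"]) * 10.0
    else:
        reward += (prev["feasibility"] - curr["feasibility"]) * 5.0
    return reward
```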
+
+ ### State
+
+ ```
+ step_count: int
+ current_params: {aspect_ratio, elongation, rotational_transform}
+ best_params: {aspect_ratio, elongation, rotational_transform}
+ initial_score: float
+ best_score: float
+ current_feasibility: float
+ best_feasibility: float
+ history: list[str]
+ ```
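This state is small enough to carry as a dataclass. A sketch including the `restore_best` revert from the episode flow (field names follow the schema above; the method name is illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class EpisodeState:
    step_count: int = 0
    current_params: dict = field(default_factory=dict)
    best_params: dict = field(default_factory=dict)
    initial_score: float = 0.0
    best_score: float = 0.0
    current_feasibility: float = 1.0
    best_feasibility: float = 1.0
    history: list = field(default_factory=list)

    def restore_best(self) -> None:
        """Revert to the best-known parameters (costs a budget step, no VMEC call)."""
        self.current_params = dict(self.best_params)
        self.current_feasibility = self.best_feasibility
        self.history.append("restore_best")
```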
+
+ ## Two Designs That Were Considered
+
+ | | Rotating-ellipse env | Curated-seed Fourier-repair env |
+ |---|---|---|
+ | Action space | 3 parameters (AR, elongation, iota) | N Fourier modes |
+ | Starting point | Generated from parameters | Frozen from HF dataset |
+ | Interpretability | High — parameters map to physical shape | Lower — mode perturbations are abstract |
+ | Dataset dependency | None at runtime | Requires offline curation |
+ | Search space coverage | Low-dimensional subfamily | Full boundary space |
+ | Hackathon viability | High | Medium (needs pre-work) |
+
+ **Decision:** Rotating-ellipse for the hackathon. It is self-contained, human-playable, and proven as a viable entry point for P1.
+
+ **What it does NOT claim:** Full coverage of the P1 boundary design space. This is a tradeoff accepted for hackathon scope.
+
+ ## Implementation Order
+
+ ### Phase 1: Physics Backend (~1 hour)
+
+ Rewrite `server/physics.py` to wrap:
+
+ - `constellaration.initial_guess.generate_rotating_ellipse` for boundary generation
+ - `constellaration.forward_model.forward_model` with low-fi settings for evaluation
+ - `constellaration.problems.GeometricalProblem` for official P1 scoring on submit
+
+ ### Phase 2: Environment Contract (~1 hour)
+
+ Update `server/environment.py`:
+
+ - New observation schema with P1 metrics
+ - New action schema for rotating-ellipse perturbations
+ - Reward V0 with feasibility-first logic
+ - Terminal conditions
+
+ Update `fusion_lab/models.py` for the new schemas.
+
+ ### Phase 3: Manual Playtest (~30 min)
+
+ Validate the hypothesis: "6 actions is enough."
+
+ - Play 5-10 episodes manually
+ - Log: can a human reach feasibility? Improve elongation?
+ - Tune magnitude deltas if needed
+ - Document at least one pathology or adjustment
+
+ ### Phase 4: Baselines (~30 min)
+
+ - Random agent
+ - Heuristic agent (greedy toward a known-good parameter region)
+ - Comparison table
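The random baseline is a few lines. A sketch assuming the action schema above (the env client itself is not shown; `random_action` is an illustrative name):

```python
import random

OPERATORS = ["aspect_ratio", "elongation", "rotational_transform"]
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]

def random_action(rng: random.Random, budget_remaining: int) -> dict:
    """Uniform random perturbation; submit on the last budgeted step."""
    if budget_remaining <= 1:
        return {"intent": "submit"}
    return {
        "intent": "run",
        "operator": rng.choice(OPERATORS),
        "direction": rng.choice(DIRECTIONS),
        "magnitude": rng.choice(MAGNITUDES),
    }
```

Seeding the `Random` instance keeps baseline runs reproducible for the comparison table.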
+
+ ### Phase 5: Deploy + Evidence (~2 hours)
+
+ - Update Dockerfile/deps for constellaration
+ - `openenv validate` + `openenv push`
+ - Colab notebook connecting to the live environment
+ - 1-minute demo video
+
+ This section exists to justify the pivot with an implementation path. It should not trigger another strategy pass when the same work is already covered by the SSOT plan and checklist.
+
+ ## Fallback
+
+ If constellaration deployment fails (Docker build, HF Spaces issues):
+
+ - The current synthetic physics environment is already working and deployment-ready
+ - Fall back to shipping that, with updated docs acknowledging it as a proxy model
+ - Do not spend more than 1 hour debugging deployment before falling back
+
+ ## Known-Good Fixtures
+
+ Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
+
+ 1. **Near-feasible anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — expected to be close to the P1 feasibility boundary
+ 2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
+ 3. **Baseline comparison:** add only if manual playtesting shows a second start state is useful
+
+ These are for verifier/reward sanity, not a prerequisite seed-mining project.
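The two fixtures can live as a small checked-in constant (parameter values copied from the list above; the dict layout and `expect_feasible` flag are illustrative):

```python
FIXTURES = {
    "near_feasible_anchor": {
        "aspect_ratio": 3.5,
        "elongation": 1.5,
        "rotational_transform": 0.4,
        "expect_feasible": True,   # expected close to the P1 feasibility boundary
    },
    "infeasible_reference": {
        "aspect_ratio": 5.0,
        "elongation": 3.0,
        "rotational_transform": 0.2,
        "expect_feasible": False,  # aspect_ratio alone violates <= 4.0
    },
}
```

A verifier sanity check then just evaluates each fixture and asserts the expected feasibility flag.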
+
+ ## What Not To Do
+
+ - Do not port the full ai-sci-feasible-designs harness or governor stack.
+ - Do not make the task "agent writes arbitrary optimization scripts."
+ - Do not stream the full HF dataset at runtime.
+ - Do not mix rotating-ellipse and Fourier-repair action spaces.
+ - Do not use high-fidelity eval for interactive steps (24 s is too slow).
+ - Do not narrate "6 actions is enough" as validated until manually playtested.
+ - Do not claim full P1 boundary space coverage. The env uses a low-dim subfamily.
+ - Do not reopen the task-selection debate after the pivot is already accepted unless a blocker forces it.
pyproject.toml CHANGED
@@ -1,10 +1,11 @@
  [project]
  name = "fusion-design-lab"
  version = "0.1.0"
- description = "OpenEnv environment for budget-constrained stellarator design"
+ description = "OpenEnv P1 environment for constrained stellarator design with constellaration"
  readme = "README.md"
  requires-python = ">=3.11"
  dependencies = [
+     "constellaration",
      "fastapi>=0.115.0",
      "numpy>=2.0.0",
      "openenv-core[core]>=0.2.1",
@@ -13,9 +14,9 @@ dependencies = [
  ]
 
  [project.optional-dependencies]
- physics = [
-     "simsopt",
-     "vmecpp",
+ notebooks = [
+     "ipykernel>=6.29.0",
+     "jupyterlab>=4.3.0",
  ]
  dev = [
      "pre-commit>=4.0.0",
@@ -23,12 +24,15 @@ dev = [
      "ruff>=0.11.0",
  ]
 
+ [project.scripts]
+ server = "server.app:main"
+
  [build-system]
  requires = ["setuptools>=69.0"]
  build-backend = "setuptools.build_meta"
 
  [tool.setuptools]
- packages = ["fusion_lab", "server"]
+ packages = ["baselines", "fusion_lab", "server"]
 
  [tool.ruff]
  line-length = 100
server/Dockerfile ADDED
@@ -0,0 +1,50 @@
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app
+
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends git && \
+     rm -rf /var/lib/apt/lists/*
+
+ ARG BUILD_MODE=standalone
+ ARG ENV_NAME=fusion_design_lab
+
+ COPY . /app/env
+
+ WORKDIR /app/env
+
+ RUN if ! command -v uv >/dev/null 2>&1; then \
+         curl -LsSf https://astral.sh/uv/install.sh | sh && \
+         mv /root/.local/bin/uv /usr/local/bin/uv && \
+         mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+     fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-install-project --no-editable; \
+     else \
+         uv sync --no-install-project --no-editable; \
+     fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-editable; \
+     else \
+         uv sync --no-editable; \
+     fi
+
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ COPY --from=builder /app/env/.venv /app/.venv
+ COPY --from=builder /app/env /app/env
+
+ ENV PATH="/app/.venv/bin:$PATH"
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
server/data/p1/README.md ADDED
@@ -0,0 +1,13 @@
+ # P1 Fixture Data
+
+ Store tracked `P1` fixtures here.
+
+ Intended contents:
+
+ - one known-good or near-winning boundary JSON
+ - a few near-boundary designs
+ - a few clearly infeasible designs
+
+ These fixtures are for verifier and reward sanity checks.
+
+ Do not copy the old `ai-sci-feasible-designs` harness here. Reuse only the specific JSON artifacts needed for the fresh `P1` environment.
training/notebooks/README.md ADDED
@@ -0,0 +1,29 @@
+ # Notebooks
+
+ Use this directory for the notebooks that support the hackathon submission.
+
+ Expected contents:
+
+ - one Colab-friendly notebook that connects to the deployed HF Space
+ - one Northflank-friendly notebook path for verifier sanity checks, manual reward iteration, baselines, or training/debugging
+
+ Recommended split:
+
+ - Northflank notebook: main compute workspace on the team H100
+ - Colab notebook: thin public artifact required by the hackathon
+
+ Operational defaults:
+
+ - use the same Python dependency set as the repo runtime
+ - keep heavy verifier and training work on Northflank
+ - keep the Colab notebook focused on connecting to the deployed HF Space and exporting visible traces
+ - prefer a public HF Space for the hackathon; if private, document the token setup directly in the notebook
+
+ Northflank smoke gate:
+
+ - import `constellaration`
+ - generate one rotating-ellipse boundary
+ - run one low-fidelity verifier call
+ - write one artifact to persistent storage
+
+ The notebooks are supporting evidence for the environment, not the primary product.
uv.lock ADDED
The diff for this file is too large to render. See raw diff