CreativeEngineer committed on
Commit 65b799e · 0 Parent(s):

chore: scaffold fusion design lab repo

.gitignore ADDED
.DS_Store
.venv/
__pycache__/
*.pyc
.pytest_cache/
.ruff_cache/
.mypy_cache/
.ipynb_checkpoints/
dist/
build/
*.sqlite
*.db
reports/
artifacts/
checkpoints/
server/data/generated/
README.md ADDED
# Fusion Design Lab

Fusion Design Lab is an environment-first OpenEnv hackathon project for budget-constrained stellarator design.

The repo is organized around one clear submission thesis:

- a narrow, reproducible stellarator design task
- a small discrete action space
- real simulator feedback
- explicit constraints
- a reward function that is iteratively improved through observed behavior

Training is supporting evidence. The environment is the product.

## Current Status

This repository is the clean hackathon workspace. The detailed planning docs live in [docs/FUSION_DESIGN_LAB_PLAN_V2.md](docs/FUSION_DESIGN_LAB_PLAN_V2.md), [docs/FUSION_DELIVERABLES_MAP.md](docs/FUSION_DELIVERABLES_MAP.md), and [docs/FUSION_NEXT_12_HOURS_CHECKLIST.md](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md).

Implementation status:

- repo scaffolded
- shared models defined
- server and client entry points stubbed
- environment contract ready to be implemented next

## Planned Repository Layout

```text
fusion-design-lab/
├── baselines/
├── demo/
├── docs/
├── fusion_lab/
├── server/
├── tests/
├── training/
├── openenv.yaml
├── pyproject.toml
└── README.md
```

## Immediate Next Steps

1. Implement the environment contract in `server/environment.py`.
2. Implement the VMEC-backed physics loop in `server/physics.py`.
3. Add one stable local episode test.
4. Run manual-playtest episodes before heavy training work.
baselines/README.md ADDED
Random and heuristic baselines will live here.

The first baseline milestone is:

- one random agent
- one simple heuristic agent
- one short comparison run on the frozen task
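As a starting point, the random agent only needs to sample uniformly from the frozen discrete action space. This is a sketch, not the final baseline; the field values mirror the Literal types in `fusion_lab/models.py`, and the function name is illustrative:

```python
import random

OPERATORS = ["tune_rc10", "tune_rc11", "tune_zs11", "tune_zs12"]
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]
RESTARTS = ["hot", "cold"]

def random_action(rng: random.Random, budget_remaining: int) -> dict:
    """Sample one action; forced to submit once the VMEC budget is gone."""
    if budget_remaining == 0:
        return {"intent": "submit"}
    intent = rng.choice(["run", "submit", "restore_best"])
    if intent != "run":
        return {"intent": intent}
    return {
        "intent": "run",
        "operator": rng.choice(OPERATORS),
        "direction": rng.choice(DIRECTIONS),
        "magnitude": rng.choice(MAGNITUDES),
        "restart": rng.choice(RESTARTS),
    }

action = random_action(random.Random(0), 6)
```

Seeding the `random.Random` instance per episode keeps baseline runs reproducible, which matters for the frozen-task comparison.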
demo/README.md ADDED
Demo assets belong here.

Expected contents:

- one stable episode capture
- short demo script
- any exported figures used in the 1-minute video
docs/FUSION_DELIVERABLES_MAP.md ADDED
# Fusion Design Lab Deliverables Map

This is the output-first map for the hackathon. It is aligned to Plan V2: environment-first, reward-iteration-driven, and conservative about training claims. Everything branches from the four final artifacts the judges and submission flow will actually see.

## Deliverables Tree

```mermaid
flowchart TD
    A["Fusion Design Lab Submission"] --> B["HF Space Environment"]
    A --> C["Colab Eval / Training Notebook"]
    A --> D["1-Minute Demo"]
    A --> E["Public Repo + README"]

    B --> B0["Environment contract frozen"]
    B --> B1["Remote reset/step works"]
    B --> B2["Reward V0 -> V1 documented"]
    B --> B3["One stable task runs end-to-end"]
    B --> B4["Clear rules + reproducible episodes"]

    C --> C1["Connects to HF Space"]
    C --> C2["Runs multi-turn episodes"]
    C --> C3["Logs behavior + reward traces"]

    D --> D1["Clear problem statement"]
    D --> D2["Manual playtest + agent trajectory"]
    D --> D3["Reward shaping story"]

    E --> E1["Readable project summary"]
    E --> E2["Setup + run instructions"]
    E --> E3["Submission links and artifacts"]

    B0 --> F["Observation + action schema frozen"]
    B3 --> G["Standalone physics loop proven"]
    B2 --> H["Exploit observed -> penalty added"]
    B4 --> I0["Deterministic action schema"]
    D2 --> I["Human can act coherently in env"]
    C3 --> J["Random baseline"]
    C3 --> K["Heuristic baseline"]
```

## Reverse Timeline

```mermaid
flowchart LR
    S["Submit by Sun 1:00 PM"] --> V["Video finalized"]
    S --> R["Repo public and readable"]
    S --> T["Training / eval evidence exported"]
    S --> H["HF Space live"]

    V --> V1["Recorded clean demo trajectory"]
    V --> V2["Scripted 60-second story"]

    T --> T1["Behavior trace image"]
    T --> T2["Baseline comparison numbers"]
    T --> T3["Colab notebook runs end-to-end"]

    H --> H1["OpenEnv environment packaged"]
    H --> H2["Remote client can reset and step"]
    H --> H3["Verifier and reward stable"]
    H --> H4["Rules are clear and reproducible"]

    H4 --> P["Environment contract locked first"]
    P --> Q["Manual playtest completed first"]
    H3 --> M["Local physics loop proven first"]
    T2 --> B["Random + heuristic baselines done"]
    T3 --> X["Training included only if persuasive"]
    V1 --> Y["One stable task only"]
    V2 --> Z["Explain reward fix, not just reward gain"]
```

## Priority Order

1. Prove the local physics loop.
2. Freeze the environment contract and mark the initial reward as `V0`.
3. Manual-playtest the environment and fix obvious reward/pathology issues.
4. Make one stable OpenEnv task work remotely with clear, reproducible rules.
5. Run random and heuristic baselines.
6. Use the notebook to show traces and comparisons; include training only if it adds signal.
7. Record the demo around environment clarity, reward shaping, and one stable trajectory.
8. Polish the repo only after the artifacts are real.
docs/FUSION_DESIGN_LAB_PLAN_V2.md ADDED
# Fusion Design Lab — Plan V2

**Hackathon:** OpenEnv Hackathon, March 7-8, 2026
**Track:** Statement 3.1 (World Modeling — Professional Tasks)
**Status:** Judge-aligned rewrite of the main plan

## 1. Submission Thesis

We are not primarily submitting "a trained model for fusion."

We are submitting a clear, reproducible training environment for a constrained scientific design task:

- a junior plasma-scientist-style agent
- a small VMEC budget
- a narrow action space
- real simulator feedback
- explicit constraints
- a reward function that is understandable and iteratively improved

Training is supporting evidence. The environment is the product.

## 2. What Changed From V1

This version changes the center of gravity:

- `environment quality > training effort`
- `reward shaping story > polished final reward formula`
- `manual playtesting > training-first iteration`
- `clarity and reproducibility > broad unsupported transfer claims`

This version also separates:

- what is already decided
- what is a working hypothesis
- what must be validated before it becomes part of the final pitch

## 3. Judge-Aligned Priorities

The judging signal now implies four priorities:

1. The environment itself must be strong.
2. The reward function must be explainable and visibly iterated.
3. A human should be able to act in the environment coherently before we invest heavily in training.
4. The final story should emphasize a clear, reproducible environment, not just a reward curve.

## 4. Final Artifacts

The four visible artifacts remain:

1. HF Space environment
2. Colab notebook for evaluation or training
3. 1-minute demo video
4. Public repo and README

But the evidence order is:

1. environment contract
2. manual playtest log
3. reward iteration note
4. stable local and remote episodes
5. random and heuristic baselines
6. training or eval notebook evidence
7. demo and repo polish

## 5. Non-Negotiables

- One stable task only.
- No broad cross-science claims unless evidence exists.
- No training-first drift.
- No dependence on reward curves alone.
- No repo/video polish before environment and baselines are real.

## 6. Single Stable Task

We intentionally narrow the scope to one environment family:

- fixed-boundary, low-resolution, 2-period quasi-helical stellarator
- one baseline input
- small seed perturbation for episode variety
- budget of 6 VMEC runs per episode

The task is:

> improve quasi-symmetry under explicit constraints with limited simulation budget

### Constraints

- aspect ratio in `[4.5, 7.0]`
- edge iota in `[0.3, 0.6]`
- volume `> 0.5 m^3`

### Objective

- minimize quasi-symmetry residual
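The constraint window above is small enough to check in a few lines. A minimal sketch, assuming the three diagnostics have already been extracted from VMEC output (the function name is illustrative):

```python
def constraints_satisfied(aspect_ratio: float, edge_iota: float, volume: float) -> bool:
    """Return True when a design sits inside the task's feasible window."""
    return (
        4.5 <= aspect_ratio <= 7.0    # aspect ratio window
        and 0.3 <= edge_iota <= 0.6   # edge rotational transform window
        and volume > 0.5              # minimum plasma volume in m^3
    )
```

Keeping the check pure and explicit makes the verifier easy to unit-test independently of the simulator.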
## 7. Environment Contract

The environment contract must be frozen before meaningful evaluation.

### Observation

The observation should expose:

- current quasi-symmetry residual
- best residual so far
- improvement from initial
- aspect ratio
- axis and edge iota
- volume
- magnetic well
- VMEC convergence status
- step number
- budget remaining
- target description
- concise textual summary of the last action outcome

The observation must be interpretable by a human without additional hidden state.

### Action Space

The action space stays intentionally small and discrete:

- `run`
- `submit`
- `restore_best`

For `run`, the controllable fields are:

- operator: one of a small fixed set of coefficients
- direction: increase or decrease
- magnitude: small, medium, large
- restart mode: hot or cold

This is not trying to expose the full plasma design space. The goal is a legible environment, not maximal realism.

### Episode Flow

1. Reset from baseline plus optional small seed perturbation.
2. Agent chooses one action.
3. Simulator or verifier runs.
4. Environment returns diagnostics and reward.
5. Episode ends on:
   - `submit`
   - exhausted budget
+ ### Terminal Contract
147
+
148
+ The episode should end cleanly and deterministically.
149
+
150
+ At termination, the environment should provide:
151
+
152
+ - final best design metrics
153
+ - whether constraints were satisfied
154
+ - total reward
155
+ - short human-readable summary of the trajectory
156
+
157
+ ## 8. Reward V0
158
+
159
+ The reward in this document is not the final reward. It is `Reward V0`.
160
+
161
+ The initial scoring idea remains:
162
+
163
+ - improvement in quasi-symmetry should help
164
+ - constraint violations should hurt
165
+ - VMEC non-convergence should hurt
166
+ - wasting budget should have some cost
167
+ - successful early submission may deserve a small bonus
168
+
169
+ ### Reward V0 Design Goals
170
+
171
+ - easy to explain
172
+ - sensitive to genuine progress
173
+ - hostile to obvious degenerate behavior
174
+ - simple enough to debug from trajectories
175
+
176
+ ### Reward V0 Failure Modes To Test
177
+
178
+ We should expect at least some of these:
179
+
180
+ - the agent spams large perturbations
181
+ - the agent oscillates between equivalent moves
182
+ - the agent overuses `restore_best`
183
+ - the agent never submits
184
+ - the agent submits too early
185
+ - the agent learns to preserve safety but not improve objective
186
+
187
+ The reward is only acceptable after we test for those behaviors.
188
+
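The V0 terms above can be written down as a deliberately simple function. The weights here are illustrative placeholders, not the repo's values; the point is that every term stays legible and debuggable from a trajectory:

```python
def reward_v0(
    improvement: float,         # drop in quasi-symmetry residual this step
    constraints_ok: bool,
    converged: bool,
    submitted_early: bool = False,
    step_cost: float = 0.02,    # small cost per VMEC run
) -> float:
    """Hedged sketch of Reward V0; all weights are placeholders to be tuned."""
    r = improvement - step_cost
    if not constraints_ok:
        r -= 0.5                # constraint violation hurts
    if not converged:
        r -= 0.3                # VMEC non-convergence hurts
    if submitted_early and constraints_ok:
        r += 0.1                # small bonus for a successful early submit
    return r
```

A function this small makes the failure-mode tests cheap: each pathology becomes a handful of assertions on `reward_v0` outputs.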
## 9. What Is Hypothesis vs Validated

These are still hypotheses until manually or empirically checked:

- `large` perturbations are risky enough to make restart choice meaningful
- six runs are enough to create non-trivial decision pressure
- the chosen coefficients create a task that is neither trivial nor impossible
- `restore_best` is useful without becoming an exploit
- the heuristic beats random on mean episode reward

These should not be narrated as facts in the final demo until validated.

## 10. Manual Playtest Plan

Before heavy training, we should act as the agent ourselves.

### Protocol

Run 5 to 10 episodes manually and log for each step:

- observation seen
- action chosen
- reason for the action
- simulator outcome
- reward returned
- whether the reward matched intuitive quality

### Questions The Playtest Must Answer

- can a human understand what to do from the observation?
- do action labels map to meaningful decisions?
- is six-run budgeting interesting or arbitrary?
- which actions are high leverage?
- do obvious bad actions get punished?
- do obviously good actions get rewarded?
- does `restore_best` help recovery or encourage stalling?

### Expected Output

- short manual playtest log
- one paragraph on what a good episode looks like
- one paragraph on what broke or felt ambiguous

## 11. Reward Iteration Story

The reward iteration story is not a side note. It is likely part of the pitch.

We should aim to document at least one concrete sequence:

1. initial reward version
2. observed bad behavior
3. reward or penalty change
4. changed behavior afterward

Examples of acceptable story structure:

- "The agent kept making risky large moves, so we increased the non-convergence penalty."
- "The agent kept deferring commitment, so we adjusted terminal incentives."
- "The agent overused restore-best, so we changed the reward/step logic to make stalling unprofitable."

This is stronger than saying only "reward improved after training."

## 12. Evidence Plan

### HF Space

Must prove:

- remote `reset` works
- remote `step` works
- one stable episode runs end-to-end
- the remote behavior matches the local contract

### Colab Notebook

Primary job:

- connect to the live environment
- run multi-turn episodes
- export traces and baseline comparisons

Secondary job:

- show training or policy improvement if the signal is credible

If training is weak but the environment and eval traces are strong, the notebook still ships.

### Demo Video

The video should show:

1. the task
2. the environment observation and action space
3. one manual or agent trajectory
4. one reward pathology and fix
5. one baseline comparison

Reward curves are optional supporting visuals, not the center of the story.

### Public Repo

The repo should make the environment easy to understand:

- what the task is
- what the agent sees
- what the agent can do
- how reward works
- how to run one episode
- where the demo evidence lives

## 13. Success Gates

### Gate 1: Environment Contract Locked

- task frozen
- observation schema frozen
- action schema frozen
- terminal conditions frozen

### Gate 2: Manual Playtest Pass

- human can act coherently
- at least one trajectory feels sensible
- at least one pathology identified or ruled out

### Gate 3: Stable Local Episode

- local modify -> solve -> observe loop works
- at least one end-to-end episode is stable

### Gate 4: Reward V1

- at least one reward revision completed
- story is documented with before/after behavior

### Gate 5: Baselines

- random baseline complete
- heuristic baseline complete
- heuristic is at least competitive and preferably better than random

### Gate 6: Remote Environment

- HF Space live
- remote client runs one clean episode

### Gate 7: Notebook Evidence

- notebook runs end-to-end
- traces exported
- training evidence included only if it adds signal

## 14. Timeline

### Phase 0

Lock the environment contract and validate the minimal toolchain needed to play the game.

Deliverables:

- frozen task definition
- frozen action and observation schema
- proof that one VMEC modify -> run -> diagnose loop works

### Phase 1

Manual-playtest the environment.

Deliverables:

- 5 to 10 episode logs
- notes on leverage, ambiguity, and pathologies

### Phase 2

Implement or refine Reward V0 into Reward V1 based on real behavior.

Deliverables:

- documented exploit
- documented fix
- updated reward logic

### Phase 3

Stabilize one local task and run baselines.

Deliverables:

- stable local trajectory
- random baseline
- heuristic baseline

### Phase 4

Deploy the HF Space and validate remote parity.

Deliverables:

- live environment
- one stable remote episode

### Phase 5

Produce notebook evidence.

Deliverables:

- Colab notebook
- traces
- baseline comparison
- training outputs only if persuasive

### Phase 6

Record the demo and make the repo readable.

Deliverables:

- 1-minute video
- public README
- linked artifacts

## 15. Fallback Rules

If something goes wrong, the fallback should preserve the environment story.

### If training signal is weak

Do not force a training-centric pitch.

Ship:

- strong environment
- manual playtest evidence
- reward iteration story
- baseline traces
- one stable remote demo

### If reward is unstable

Reduce ambition:

- keep only the terms we can explain
- remove fragile shaping
- prefer legible trajectories over complex reward composition

### If the task is too hard

Do not broaden scope.

Instead:

- simplify the starting configuration
- tighten the action set
- make the task more learnable within six runs

### If the task is too easy

Do not add more domains.

Instead:

- adjust the budget
- adjust magnitudes
- adjust reward to discourage trivial submission

## 16. Demo Story

The recommended demo structure is:

### Part 1: Problem

"The agent gets a small VMEC budget to improve a stellarator design while staying within constraints."

### Part 2: Environment

"Here is what the agent sees, what it can change, and what counts as success."

### Part 3: Reward Iteration

"Our first reward version produced a bad behavior. We changed the penalty or incentive, and the behavior improved."

### Part 4: Evidence

"Here is one stable trajectory, plus random and heuristic baselines."

### Part 5: Why It Matters

"This is a clear, reproducible simulation environment for budget-constrained scientific decision-making."

That last line is intentionally conservative. It is strong enough without claiming universal scientific transfer.

## 17. Immediate Next Actions

1. Freeze the environment contract in code and docs.
2. Run manual playtests before heavy training work.
3. Mark the current reward as `V0`.
4. Log the first real pathology and reward revision.
5. Do not let notebook or video work outrun the environment evidence.
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md ADDED
# Fusion Design Lab: Next 12 Hours Checklist

This checklist turns the updated deliverables map and Plan V2 into concrete execution order. The goal is to produce real evidence for the four submission artifacts, with environment clarity and reproducibility driving the sequence.

## Core Rule

Do not expand scope beyond one stable task. Training is supporting evidence, not the main story.

## Plan V2 Inheritance

Carry these rules through the whole checklist:

- Freeze the environment contract before heavy iteration.
- Treat the current reward as `Reward V0`, not the final reward.
- Distinguish validated facts from working hypotheses.
- Prefer behavior traces and baseline comparisons over generic reward-curve storytelling.
- If training is weak, ship the environment story anyway.

## Hour 0-2: Lock the Environment Contract

1. Write the exact environment spec.
2. Freeze one task only.
3. Define:
   - observation schema
   - action schema
   - episode loop
   - terminal conditions
   - reward V0 terms
   - initial penalties
4. Update the main diagram so it emphasizes:
   - environment
   - verifier
   - reward shaping
   - manual playtesting
5. Mark open assumptions explicitly:
   - risky action magnitudes
   - whether 6 runs is enough
   - whether `restore_best` is useful without becoming an exploit

Exit condition: a human can read the spec and understand how to act in the environment.

Artifacts:

- short environment spec
- revised mermaid diagram
- short hypothesis list

## Hour 2-4: Manual Playtest and Fix Reward Pathologies

1. Manually play 5 to 10 episodes.
2. Log for each step:
   - observation
   - chosen action
   - expected effect
   - returned reward
   - confusion or exploit, if observed
3. Identify at least one bad incentive or exploit.
4. Patch reward or penalty logic immediately.
5. Write the reward shaping story:
   - initial reward V0
   - bad behavior
   - refinement to reward V1
   - improved behavior

Exit condition: you can explain why the environment now rewards the intended behavior.

Artifacts:

- manual playtest log
- reward shaping note
- reward V1 delta note
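The per-step fields above fit naturally into a small record type. The schema below is one illustrative way to structure the log, not a fixed format:

```python
from dataclasses import dataclass

@dataclass
class PlaytestStep:
    """One row of the manual playtest log (field names are illustrative)."""
    observation: str       # what the player saw
    action: str            # what they chose
    expected_effect: str   # what they thought would happen
    reward: float          # what the environment returned
    note: str = ""         # confusion or suspected exploit, if any

log = [
    PlaytestStep(
        observation="qs_residual=0.82, budget=5, constraints ok",
        action="run tune_rc11 decrease small (hot)",
        expected_effect="small residual drop",
        reward=0.04,
    )
]
```

Keeping the log as structured records makes it trivial to dump to JSON for the reward shaping note later.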
## Hour 4-6: Stabilize the Local Task

1. Prove the local physics or verifier loop.
2. Run one stable end-to-end task repeatedly.
3. Confirm the action schema is deterministic enough for reproducible episodes.
4. Save one clean local trajectory.
5. Do not proceed to remote deployment until this gate is real.

Exit condition: the same setup yields the same type of behavior reliably enough for a demo.

Artifacts:

- stable local run
- saved trajectory

## Hour 6-8: Make the HF Space Real

1. Package the OpenEnv environment for remote use.
2. Verify remote `reset` and `step`.
3. Run one clean remote episode end-to-end.
4. Confirm the remote environment preserves the same task contract as local.

Exit condition: the environment is runnable in the actual submission surface, not only locally.

Artifacts:

- live HF Space environment
- remote episode proof

## Hour 8-10: Add Baselines

1. Implement the random baseline.
2. Implement the heuristic baseline.
3. Run short comparisons on the same stable task.
4. Save:
   - comparison numbers
   - behavior traces
   - one example where the heuristic beats random

Exit condition: there is a credible baseline anchor for the judges.

Artifacts:

- random baseline
- heuristic baseline
- comparison table or figure

## Hour 10-12: Produce the Submission Evidence

1. Wire the Colab training or eval script to the live environment.
2. Ensure it produces:
   - multi-turn episodes
   - behavior traces
   - reward or behavior comparison outputs
3. Draft the 60-second demo script.
4. Record the demo around:
   - what the environment is
   - how reward was refined
   - what manual playtesting revealed
   - one stable trajectory
   - baseline comparison
5. If training evidence is weak, keep the notebook eval-first and do not force a training-centric claim.
6. Make the repo public-facing and readable only after the artifacts are real.

Exit condition: all four visible artifacts exist in usable form.

Artifacts:

- Colab training or eval script
- demo script
- draft or final video
- updated repo README
- explicit fallback note if training is not persuasive

## Artifact Order

1. Environment spec
2. Manual playtest log
3. Reward revision note
4. Stable task run
5. Random baseline
6. Heuristic baseline
7. Colab training or eval evidence
8. Demo recording
9. Repo polish

## Non-Negotiables

- Do not widen scope beyond one stable task.
- Do not optimize training before manual playtesting.
- Do not rely on reward curves alone; keep trajectory evidence.
- Do not narrate hypotheses as facts before they are checked.
- Do not polish the repo or video before the environment and baselines are real.
- Treat judge comments as pressure toward clarity and reproducibility, not broader unsupported claims.
- Do not force a training-centric story if the strongest evidence is environment quality plus baselines.
fusion_lab/__init__.py ADDED
"""Shared client-side package for Fusion Design Lab."""
fusion_lab/client.py ADDED
from __future__ import annotations

from openenv.core.client_types import StepResult
from openenv.core.env_client import EnvClient

from fusion_lab.models import StellaratorAction, StellaratorObservation, StellaratorState


class FusionLabClient(
    EnvClient[StellaratorAction, StellaratorObservation, StellaratorState]
):
    """Thin typed client wrapper for the remote OpenEnv environment."""

    def _step_payload(self, action: StellaratorAction) -> dict[str, object]:
        return action.model_dump(exclude_none=True)

    def _parse_result(self, payload: dict[str, object]) -> StepResult[StellaratorObservation]:
        observation = StellaratorObservation(**payload)
        return StepResult(
            observation=observation,
            reward=observation.reward,
            done=observation.done,
        )

    def _parse_state(self, payload: dict[str, object]) -> StellaratorState:
        return StellaratorState(**payload)
fusion_lab/models.py ADDED
from __future__ import annotations

from typing import Literal

from pydantic import BaseModel, Field


ActionIntent = Literal["run", "submit", "restore_best"]
OperatorName = Literal["tune_rc10", "tune_rc11", "tune_zs11", "tune_zs12"]
DirectionName = Literal["increase", "decrease"]
MagnitudeName = Literal["small", "medium", "large"]
RestartMode = Literal["hot", "cold"]


class StellaratorAction(BaseModel):
    intent: ActionIntent
    operator: OperatorName | None = None
    direction: DirectionName | None = None
    magnitude: MagnitudeName | None = None
    restart: RestartMode | None = None
    reasoning: str = ""


class StellaratorObservation(BaseModel):
    diagnostics_text: str
    quasi_symmetry_residual: float
    aspect_ratio: float
    rotational_transform_axis: float
    rotational_transform_edge: float
    magnetic_well_depth: float
    volume: float
    vmec_converged: bool
    step_number: int
    budget_remaining: int
    best_qs_residual: float
    constraints_satisfied: bool
    target_spec: str
    reward: float | None = None
    done: bool = False


class StellaratorState(BaseModel):
    step_count: int = 0
    initial_qs: float = 0.0
    current_qs: float = 0.0
    prev_qs: float = 0.0
    best_qs: float = Field(default=float("inf"))
    budget_total: int = 6
    budget_remaining: int = 6
    constraints_satisfied: bool = True
    history: list[str] = Field(default_factory=list)
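A standard-library sketch of how `_step_payload` in `fusion_lab/client.py` behaves: pydantic's `model_dump(exclude_none=True)` drops the optional fields that were never set, so a bare `submit` serializes to a minimal dict. The dataclass below is a hypothetical stand-in that mirrors `StellaratorAction` without requiring pydantic:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ActionSketch:
    """Stdlib stand-in for StellaratorAction, for illustration only."""
    intent: str                        # "run" | "submit" | "restore_best"
    operator: Optional[str] = None
    direction: Optional[str] = None
    magnitude: Optional[str] = None
    restart: Optional[str] = None
    reasoning: str = ""

def step_payload(action: ActionSketch) -> dict:
    # Equivalent of model_dump(exclude_none=True): omit unset optionals.
    return {k: v for k, v in asdict(action).items() if v is not None}

print(step_payload(ActionSketch(intent="submit")))
# {'intent': 'submit', 'reasoning': ''}
```

This keeps the wire payload small and makes it obvious which fields a `run` action must carry.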
hackathan_raw_guidance.md ADDED
@@ -0,0 +1,239 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ ## **OpenEnv Hackathon Participant Guide**
+
+ Welcome to the [OpenEnv Hackathon](https://cerebralvalley.ai/e/open-env-hackathon), hacker! 👋 We’re thrilled to have you on board.
+
+ This guide is your all-in-one resource for the event, including schedule, rules, technical resources, problem statements, judging information, and more. Please read this carefully; most answers can be found here.
+
+ ## **1. Join the [PyTorch Discord Server](https://discord.gg/VBcf6VtfY6)**
+
+ - You’ll be given a Hackathon Participant role by an admin, which will give you access to the hackathon-specific channels.
+
+ - Here, you’ll be able to interact with hackers and sponsors, introduce yourselves, and form teams (maximum team size of **3**).
+
+ - If you don't receive your role within **24 hours of joining**, please ping @CV.
+
+ - Please submit your Discord username below so we can grant you the role.
+
+ [linkEmbed]
+
+ ## **2. Location**
+
+ **|** Shack15 (1 Ferry Building, Suite 201, San Francisco, CA 94111)
+
+ - **Venue Access:** Shack15 is on the 2nd floor of the Ferry Building. Take the Ferry Building elevator to the second floor and turn left; you will see the main entrance to Shack15.
+
+ - **Parking:** Parking near the Ferry Building is extremely limited. Consider parking farther out and taking Uber, Lyft, or public transportation.
+
+ [youtube]
+
+ ## **3. WiFi Information**
+
+ - **Username:** SHACK15_Members
+
+ - **Password:** M3mb3r$4L!f3
+
+ ## **4. Hackathon Schedule**
+
+ **Saturday, March 7 (Outline)**
+
+ - **9:00 AM:** Doors Open • Breakfast Served • Team Formation
+
+ - **10:00 AM – 11:30 AM:** Kick-off presentations with Meta, Hugging Face, UC Berkeley, CoreWeave, OpenPipe, Unsloth AI, Fleet AI, Mercor, Scaler AI Labs, Snorkel AI, Patronus AI, Halluminate, and Scale AI
+
+ - **11:30 AM:** Hacking Begins
+
+ - **1:00 PM:** Lunch Served
+
+ - **6:00 PM:** Dinner Served
+
+ - **10:00 PM:** Doors Close • Re-entry not permitted
+
+ **Sunday, March 8 (Outline)**
+
+ - **9:00 AM:** Doors Open • Breakfast Served
+
+ - **1:00 PM:** Hacking Stops • Submissions Due
+
+ - **1:15 PM:** First Round Judging Begins
+
+ - **2:00 PM:** Lunch Served
+
+ - **3:00 PM:** Final Round Judging Begins
+
+ - **4:00 PM:** Winners Announced and Closing
+
+ - **5:00 PM:** Doors Close
+
+ All presentation slides can be found here:
+
+ [linkEmbed]
+
+ ## **5. Hackathon and Submission Rules**
+
+ To keep things fair and aligned with our goals, all teams must follow these rules:
+
+ - **Open Source:** Please ensure your repository is public.
+
+ - **New Work Only:** All projects must be started from scratch during the hackathon, with no previous work.
+
+ - **Team Size:** Teams may have up to **3** members.
+
+ - **Banned Projects:** Projects will be disqualified if they violate legal, ethical, or platform policies, or use code, data, or assets you do not have the rights to.
+
+ - Your project **must** use OpenEnv (stable release 0.2.1) deployed on HF Spaces.
+
+ - You must show a minimal training script for your environment using Unsloth or HF TRL in Colab.
+
+ - You must upload a **one minute** demo video to YouTube talking about your submission.
+
+ ## **6. Hackathon Problem Statements**
+
+ Your project must address at least **one of the five** required problem statements.
+
+ - Some problem statements include **optional partner-sponsored sub-problem statements**, which are additional focus areas related to the main theme.
+
+ - Your project may align with **multiple partner sub-problem statements**, but you can only be **judged for a maximum of two**. Please **select up to two** when submitting.
+
+ - Projects that match these partner sub-problem statements are eligible for **extra partner prizes**, judged separately from the main track winners.
+
+ - Each partner sub-problem statement carries a prize of **$10,000 USD**.
+
+ **Statement 1: Multi-Agent Interactions**
+
+ Environments for this theme involve cooperation, competition, negotiation, and coalition formation. Learning from these environments will enable agents to model the beliefs and incentives of others in partially observable settings. This drives theory-of-mind reasoning and emergent strategic behavior.
+
+ - **Expected Outcome:** an environment that can be used to train multi-agent task handling in an LLM
+
+ - **Example Environments:** Market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.
+
+ - **Partner Sub-Themes:**
+   - **Fleet AI:** Scalable Oversight: Environments that train oversight agents to monitor, analyze, and explain the behavior of other AI agents operating in complex, multi-agent settings.
+   - **Halluminate:** Multi-Actor Environments: Build a realistic environment where an agent interacts with and manages multiple actors (agents) to discover and achieve the task.
+
+ **Statement 2: (Super) Long-Horizon Planning & Instruction Following**
+
+ You will build environments that require deep, multi-step reasoning with sparse or delayed rewards. The goal is to enable agents to decompose goals, track state over extended trajectories, and recover from early mistakes. The aim is to push beyond shallow next-token reasoning toward structured planning and durable internal representations.
+
+ - **Expected Outcome:** an environment that can capture and improve LLM behavior on challenging long-horizon tasks that require long-running sessions beyond context memory limits
+
+ - **Example Environments:** Research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, extremely complicated long-horizon instruction following (e.g., 300 instructions scattered around).
+
+ - **Partner Sub-Themes:**
+   - **Mercor:** Make an environment with capped/uncapped rewards where frontier model rewards scale with token output.
+
+   - **Scale AI:** Environments for long-horizon workflows for non-code use cases within a business setting, focusing on either Sales, Project Management, or HR & IT.
+
+ **Statement 3: World Modeling**
+
+ - **Statement 3.1: Professional Tasks:** Here you will develop environments that require real interaction with tools, APIs, or dynamic systems, where the model is expected to do real, hard work instead of exploiting shortcuts to arrive at the desired outcome. Learning from these environments will enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows. The goal is to strengthen causal reasoning and persistent world models.
+   - **Expected Outcome:** an environment capturing the nuances of a defined partially observable world and improving LLM interaction with it
+
+   - **Example Environments:** Dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers → code → experiments), economic simulations with feedback, tool-discovery benchmarks.
+
+   - **Partner Sub-Theme:**
+     - **Scaler AI Labs:** Multi-App RL Environment for Enterprise Workflows: Create RL environments that demonstrate complex workflows, business-rule nuances, etc. in a large enterprise.
+
+ - **Statement 3.2: Personalized Tasks:** Here we will develop an environment that offers real personalized task handling: imagine replying to personal messages, handling dinner plans that clash with work, or replying to tough emails. Think of any personal-assistant task.
+   - **Expected Outcome:** an environment that gives the model a realistic simulation of handling personal tasks and conflicts and managing them as delegations
+
+   - **Example Environments:** Executive assistant meeting planner, dinner and drive planning, email and message replying, etc.
+
+   - **Partner Sub-Theme:**
+     - **Patronus AI:** Consumer Workflows with Schema Drift: Multi-step consumer workflow environments where the underlying data schemas, API contracts, and T&Cs/policies/rules change.
+
+ **Statement 4: Self-Improvement**
+
+ The focus here is to create environments where agents can learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula. Rather than optimizing fixed tasks, the goal is for agents to learn to drive their own capability growth. The objective is recursive skill amplification.
+
+ - **Expected Outcome:** an environment for improving self-play of an LLM over a defined set of tasks
+
+ - **Example Environments:** Self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.
+
+ - **Partner Sub-Theme:**
+   - **Snorkel AI:** Simulated Experts-in-the-Loop: An environment that simulates interactions with real subject-matter experts, with changing requirements/preferences.
+
+ **Statement 5: Wild Card - Impress Us!**
+
+ We do not want to limit your focus if your idea doesn’t fit the boxes above; we want and WILL reward out-of-the-box tasks. Be creative, but make sure your submission meaningfully adds value to LLM training on a specific task.
+
+ More details about each theme can be found here:
+
+ [linkEmbed]
+
+ ## **7. CV Hackathon Winners**
+
+ [linkEmbed]
+
+ ## **8. OpenEnv Provided Resources**
+
+ **Please read through the entire slideshow here. This includes:**
+
+ - OpenEnv Fundamentals and Architecture
+ - Local Dev, Docker, and HF Spaces Deployment
+ - OpenEnv in Practice
+ - Training (TRL & Unsloth)
+ - How to Access Infrastructure (including the GPU Request Form)
+
+ [linkEmbed]
+
+ ## **9. Partner Provided Resources**
+
+ - **Unsloth AI Resources**
+   - <https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl-notebooks>
+ - **Mercor Resources**
+   - Dataset: <https://huggingface.co/datasets/mercor/apex-agents>
+   - Archipelago repo to run the eval: <https://github.com/Mercor-Intelligence/archipelago>
+   - APEX-Agents paper: <https://arxiv.org/abs/2601.14242>
+ - **Hugging Face Resources**
+   - **$30** in Compute and Inference Credits
+   - To claim your credits, set up a HF account here: <https://huggingface.co/join>
+   - Then, follow this link: <https://huggingface.co/openenv-community>
+   - You will be granted **$30** of compute and inference credits!
+ - **Northflank Resources**
+   - Each team gets an H100
+   - Northflank instructions:
+
+ [linkEmbed]
+
+   - Join the Northflank Discord channel for any questions
+   - Please fill out this form:
+
+ [linkEmbed]
+
+ - **Cursor Resources**
+   - **$50** in Cursor Credits, **apply below**
+
+ [linkEmbed]
+
+ ## **10. Judging & Submissions**
+
+ Judging will take place on **Sunday, March 8**. The judges will evaluate your **technical demos** in the categories below. _Show us what you have built_ to solve our problem statements. Please **do not** show us a presentation. We'll be checking to ensure your project was built **entirely during the event**; no previous work is allowed.
+
+ **|** **Teams should submit [here](https://cerebralvalley.ai/e/openenv-hackathon-sf/hackathon/submit) when they have completed hacking.** In the submission form, you will have to upload a **one minute** demo video on YouTube talking about your submission. You must also show a minimal training script for your environment using Unsloth or HF TRL in Colab.
+
+ **Please ensure your project uses OpenEnv (stable release 0.2.1) deployed on HF Spaces.**
+
+ [linkEmbed]
+
+ **Judging Criteria**
+
+ - **Environment Innovation (40%) -** Is the environment novel, creative, or challenging? Does it meaningfully test the agent’s behavior?
+ - **Storytelling (30%) -** Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
+ - **Training Script Showing Improvement in Rewards (20%) -** Does the demo provide observable evidence of training progress (reward curves, metrics, or before/after behavior)?
+ - **Reward and Training Pipeline Setup (10%) -** Is the reward logic coherent, and does the pipeline produce meaningful improvement in the agent’s inference (how it acts in the environment)?
+
+ **Judging Process**
+
+ **|** Judging proceeds in two rounds:
+
+ - Hackers will be assigned groups of judges; ~3 minutes to pitch followed by 1-2 minutes of Q&A.
+
+ - The top **six** teams in the ranking will demo on stage to a panel of judges; ~3 minutes to pitch followed by 2-3 minutes of Q&A.
+
+ ## **11. Prizes**
+
+ - **1st Place:** $15,000 USD Cash
+
+ - **2nd Place:** $9,000 USD Cash
+
+ - **3rd Place:** $6,000 USD Cash
openenv.yaml ADDED
@@ -0,0 +1,7 @@
+ spec_version: 1
+ name: fusion_design_lab
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 8000
+
pyproject.toml ADDED
@@ -0,0 +1,38 @@
+ [project]
+ name = "fusion-design-lab"
+ version = "0.1.0"
+ description = "OpenEnv environment for budget-constrained stellarator design"
+ readme = "README.md"
+ requires-python = ">=3.11"
+ dependencies = [
+     "fastapi>=0.115.0",
+     "numpy>=2.0.0",
+     "openenv-core[core]>=0.2.1",
+     "pydantic>=2.10.0",
+     "uvicorn>=0.34.0",
+ ]
+
+ [project.optional-dependencies]
+ physics = [
+     "simsopt",
+     "vmecpp",
+ ]
+ dev = [
+     "pytest>=8.3.0",
+     "ruff>=0.11.0",
+ ]
+
+ [build-system]
+ requires = ["setuptools>=69.0"]
+ build-backend = "setuptools.build_meta"
+
+ [tool.setuptools]
+ packages = ["fusion_lab", "server"]
+
+ [tool.ruff]
+ line-length = 100
+ target-version = "py311"
+
+ [tool.pytest.ini_options]
+ testpaths = ["tests"]
+
server/__init__.py ADDED
@@ -0,0 +1,2 @@
+ """Server-side package for Fusion Design Lab."""
+
server/app.py ADDED
@@ -0,0 +1,18 @@
+ from __future__ import annotations
+
+ from fastapi import FastAPI
+
+ from server.environment import TASK, environment_status
+
+ app = FastAPI(title="Fusion Design Lab")
+
+
+ @app.get("/healthz")
+ def healthcheck() -> dict[str, str]:
+     return {"status": "ok", "environment": environment_status()}
+
+
+ @app.get("/task")
+ def task_summary() -> dict[str, object]:
+     return TASK
+
server/data/README.md ADDED
@@ -0,0 +1,4 @@
+ Baseline VMEC inputs and related static assets belong here.
+
+ Do not commit generated solver outputs or large transient artifacts.
+
server/environment.py ADDED
@@ -0,0 +1,20 @@
+ from __future__ import annotations
+
+ from typing import Final
+
+ TASK: Final[dict[str, object]] = {
+     "description": "Minimize quasi-symmetry error for a 2-period quasi-helical stellarator.",
+     "constraints": {
+         "aspect_ratio": [4.5, 7.0],
+         "rotational_transform_edge": [0.3, 0.6],
+         "volume_min": 0.5,
+     },
+     "budget": 6,
+     "baseline_input": "server/data/input.QH_baseline",
+ }
+
+
+ def environment_status() -> str:
+     """Return a simple status string until the full environment is implemented."""
+     return "scaffolded"
+
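The `constraints` entry in `TASK` pairs range constraints with a single lower bound. One way the server could evaluate a candidate design against those ranges is sketched below; `constraints_satisfied` and the diagnostics dict keys are hypothetical names, not part of the committed code:

```python
# Mirror of the constraint spec from server/environment.py.
CONSTRAINTS = {
    "aspect_ratio": [4.5, 7.0],
    "rotational_transform_edge": [0.3, 0.6],
    "volume_min": 0.5,
}


def constraints_satisfied(diagnostics: dict[str, float]) -> bool:
    """Check solver diagnostics against the task's explicit constraints."""
    lo, hi = CONSTRAINTS["aspect_ratio"]
    if not lo <= diagnostics["aspect_ratio"] <= hi:
        return False
    lo, hi = CONSTRAINTS["rotational_transform_edge"]
    if not lo <= diagnostics["rotational_transform_edge"] <= hi:
        return False
    # volume_min is a one-sided bound, not a range.
    return diagnostics["volume"] >= CONSTRAINTS["volume_min"]


ok = constraints_satisfied(
    {"aspect_ratio": 5.2, "rotational_transform_edge": 0.45, "volume": 0.8}
)
bad = constraints_satisfied(
    {"aspect_ratio": 8.0, "rotational_transform_edge": 0.45, "volume": 0.8}
)
```

A boolean pass/fail like this maps directly onto the `constraints_satisfied` field of the observation model, keeping the reward logic separate from the feasibility check.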
server/physics.py ADDED
@@ -0,0 +1,21 @@
+ from __future__ import annotations
+
+
+ class PhysicsEngine:
+     """Placeholder for the VMEC-backed physics loop.
+
+     The next implementation step should make this the single place that:
+     - loads the baseline input
+     - applies discrete coefficient updates
+     - runs the solver
+     - computes diagnostics
+     - tracks best-known designs
+     """
+
+     def __init__(self) -> None:
+         self._status = "unimplemented"
+
+     @property
+     def status(self) -> str:
+         return self._status
+
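The docstring above names the engine's responsibilities; the shape of that loop can be sketched stdlib-only, with a stubbed residual function standing in for a real VMEC run. Everything here is illustrative: the class, coefficient names, and the stub diagnostic are assumptions, not the committed implementation:

```python
class PhysicsEngineSketch:
    """Stdlib-only sketch of the loop the PhysicsEngine placeholder describes."""

    def __init__(self, baseline: dict[str, float]) -> None:
        self.coeffs = dict(baseline)  # load the baseline input
        self.best: tuple[float, dict[str, float]] | None = None

    def step(self, coeff: str, delta: float) -> float:
        self.coeffs[coeff] += delta        # apply a discrete coefficient update
        residual = self._run_solver()      # run the (stubbed) solver
        if self.best is None or residual < self.best[0]:
            self.best = (residual, dict(self.coeffs))  # track best-known design
        return residual

    def _run_solver(self) -> float:
        # Stub diagnostic: distance of all coefficients from an arbitrary optimum
        # at 0.1. A real implementation would run VMEC and compute the QS residual.
        return sum(abs(v - 0.1) for v in self.coeffs.values())


engine = PhysicsEngineSketch({"rbc_1_0": 0.0, "zbs_1_0": 0.3})
r1 = engine.step("rbc_1_0", 0.1)    # residual ~0.2
r2 = engine.step("zbs_1_0", -0.1)   # residual ~0.1, becomes the new best
```

Keeping best-design tracking inside the engine means the environment layer only has to translate residual changes into rewards.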
tests/test_repo_scaffold.py ADDED
@@ -0,0 +1,9 @@
+ from server.environment import TASK, environment_status
+
+
+ def test_environment_scaffold_status() -> None:
+     assert environment_status() == "scaffolded"
+
+
+ def test_task_budget_is_fixed() -> None:
+     assert TASK["budget"] == 6
training/README.md ADDED
@@ -0,0 +1,4 @@
+ Training and evaluation notebooks belong here.
+
+ This repository treats notebooks as supporting evidence for the environment, not the primary product.
+