CreativeEngineer committed on
Commit
daba1b9
·
1 Parent(s): 61fc39b

feat: align p1 environment with repo plan

AGENTS.md CHANGED
@@ -51,6 +51,7 @@ Do not leave silent divergence.
51
  - `SSOT`: keep one canonical definition for the environment contract, reward semantics, and task wording.
52
  - `SOLID`: keep modules focused, interfaces clear, and responsibilities separated.
53
  - `Occam's Razor`: when two approaches work, prefer the one with fewer moving parts and fewer assumptions.
 
54
 
55
  ## Working Rules
56
 
@@ -62,6 +63,8 @@ Do not leave silent divergence.
62
  - Do not optimize notebook/training work ahead of local environment stability, remote environment stability, and baseline comparisons.
63
  - Do not create new planning loops around decisions that are already locked in the SSOT docs unless a hard blocker appears.
64
  - Treat supporting decision records as rationale, not as a fresh task queue.
 
 
65
 
66
  ## Environment Contract Rules
67
 
@@ -109,6 +112,12 @@ If a human cannot act coherently from the observation, fix the environment contr
109
 
110
  For scoped changes, prefer the smallest relevant checks first.
111
 
112
  Current useful commands:
113
 
114
  ```bash
 
51
  - `SSOT`: keep one canonical definition for the environment contract, reward semantics, and task wording.
52
  - `SOLID`: keep modules focused, interfaces clear, and responsibilities separated.
53
  - `Occam's Razor`: when two approaches work, prefer the one with fewer moving parts and fewer assumptions.
54
+ - `No Fallout`: keep refactors atomic. Do not leave stale schemas, stale consumers, or half-migrated task terms behind.
55
 
56
  ## Working Rules
57
 
 
63
  - Do not optimize notebook/training work ahead of local environment stability, remote environment stability, and baseline comparisons.
64
  - Do not create new planning loops around decisions that are already locked in the SSOT docs unless a hard blocker appears.
65
  - Treat supporting decision records as rationale, not as a fresh task queue.
66
+ - Do not leave fallout after contract changes. If a schema, action, reward, or task term changes, update dependent files in the same task so the repo stays coherent.
67
+ - Do not leave stale consumers behind after refactors. Task summaries, baselines, notebooks, and docs must either match the new contract or be deliberately updated.
68
 
69
  ## Environment Contract Rules
70
 
 
112
 
113
  For scoped changes, prefer the smallest relevant checks first.
114
 
115
+ ## Environment and Tooling
116
+
117
+ - This repo uses `uv` as the package and environment manager.
118
+ - Prefer `uv sync`, `uv run`, and `uv lock` for local work, Northflank, and HF Space builds.
119
+ - Do not introduce `conda`-specific setup into this repo unless a real blocker forces it and the change is documented.
120
+
121
  Current useful commands:
122
 
123
  ```bash
README.md CHANGED
@@ -14,14 +14,35 @@ Training is supporting evidence. The environment is the product.
14
 
15
  ## Current Status
16
 
17
- This repository is the clean hackathon workspace. The detailed planning docs live in [docs/FUSION_DESIGN_LAB_PLAN_V2.md](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md), [docs/FUSION_DELIVERABLES_MAP.md](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DELIVERABLES_MAP.md), and [docs/FUSION_NEXT_12_HOURS_CHECKLIST.md](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_NEXT_12_HOURS_CHECKLIST.md).
18
 
19
  Implementation status:
20
 
21
  - `P1` is locked as the benchmark task
22
  - docs are aligned to fresh `P1` wiring in this repo
23
- - shared models and server/client entry points exist
24
- - the runtime environment still needs to be rewired from the old toy scaffold to the real `P1` contract
 
25
 
26
  Current mode:
27
 
@@ -84,11 +105,10 @@ uv sync --extra notebooks
84
  - import `constellaration`
85
  - run one rotating-ellipse generation plus one low-fidelity verifier call
86
  - write an artifact to persistent storage
87
- 3. Rewrite [server/environment.py](/Users/suhjungdae/code/fusion-design-lab/server/environment.py) to the locked `P1` contract.
88
- 4. Rewrite [server/physics.py](/Users/suhjungdae/code/fusion-design-lab/server/physics.py) to use `constellaration`-based `P1` verification.
89
- 5. Add tracked `P1` fixtures under [server/data/p1](/Users/suhjungdae/code/fusion-design-lab/server/data/p1).
90
- 6. Add the Colab notebook under [training/notebooks](/Users/suhjungdae/code/fusion-design-lab/training/notebooks).
91
- 7. Run manual playtest episodes before heavy training work.
92
 
93
  These are implementation steps, not another planning phase.
94
 
 
14
 
15
  ## Current Status
16
 
17
+ This repository is the clean hackathon workspace. The detailed planning docs live in `docs/FUSION_DESIGN_LAB_PLAN_V2.md`, `docs/FUSION_DELIVERABLES_MAP.md`, and `docs/FUSION_NEXT_12_HOURS_CHECKLIST.md`.
18
 
19
  Implementation status:
20
 
21
  - `P1` is locked as the benchmark task
22
  - docs are aligned to fresh `P1` wiring in this repo
23
+ - shared models, baselines, and server/client entry points now reflect the locked `P1` contract
24
+ - the current environment uses a synthetic `P1` evaluator; the next runtime step is swapping in `constellaration` as the verifier of record
25
+
26
+ ## Execution Status
27
+
28
+ - [x] Lock the `P1` contract in code
29
+ - [x] Rewrite shared models to the rotating-ellipse `P1` schema
30
+ - [x] Rewrite the environment loop to the rotating-ellipse `P1` schema
31
+ - [x] Update the API/task surface to match `P1`
32
+ - [x] Update baseline agents to the `P1` contract
33
+ - [x] Add a post-terminal guard so `step()` is a no-op after `done=True`
34
+ - [x] Run an initial baseline comparison on the current synthetic `P1` branch state
35
+ - [ ] Replace the synthetic evaluator with `constellaration`
36
+ - [ ] Add tracked `P1` fixtures under `server/data/p1/`
37
+ - [ ] Run manual playtesting and record the first reward pathology
38
+ - [ ] Deploy the real environment to HF Space
39
+
40
+ ## Known Gaps
41
+
42
+ - The current evaluator in `server/physics.py` is a synthetic proxy for `P1`, not the official `constellaration` verifier yet.
43
+ - `BASELINE_PARAMS` is intentionally repairable but currently infeasible at reset; do not describe it as a feasible anchor.
44
+ - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
45
+ - The first local baseline run is only a synthetic-proxy sanity check; heuristic beat random on 20/20 seeded episodes, but this should be re-run after `constellaration` wiring.
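The terminal-reward asymmetry described above can be sketched as follows. This is an illustrative assumption, not the actual constants or function from `server/environment.py`; only the ordering (explicit `submit` strictly outranks budget exhaustion) mirrors the repo's policy:

```python
# Illustrative constants: the real magnitudes live in server/environment.py.
SUBMIT_BONUS = 0.5       # assumed terminal bonus for an explicit submit
EXHAUSTION_BONUS = 0.1   # assumed smaller bonus when the budget runs out


def terminal_reward(submitted: bool, best_score: float) -> float:
    """Both terminals pay out on the best design found, but explicit
    submission earns a strictly larger bonus, so agents that learn the
    task still prefer deliberate submission over timing out."""
    bonus = SUBMIT_BONUS if submitted else EXHAUSTION_BONUS
    return best_score + bonus


# The ordering that any reward tuning must preserve:
assert terminal_reward(True, 0.3) > terminal_reward(False, 0.3)
```

Whatever the concrete values end up being after `constellaration` wiring, that inequality is the invariant to keep.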
46
 
47
  Current mode:
48
 
 
105
  - import `constellaration`
106
  - run one rotating-ellipse generation plus one low-fidelity verifier call
107
  - write an artifact to persistent storage
108
+ 3. Replace the synthetic evaluator in `server/physics.py` with `constellaration`-based `P1` verification.
109
+ 4. Add tracked `P1` fixtures under `server/data/p1`.
110
+ 5. Add the Colab notebook under `training/notebooks`.
111
+ 6. Run manual playtest episodes before heavy training work.
 
112
 
113
  These are implementation steps, not another planning phase.
114
 
TODO.md CHANGED
@@ -4,18 +4,33 @@ This is the execution tracker for the hackathon repo.
4
 
5
  Use this file for day-of build progress. Use the linked docs for rationale, sequencing, and submission framing:
6
 
7
- - [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md)
8
- - [Deliverables Map](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DELIVERABLES_MAP.md)
9
- - [Next 12 Hours Checklist](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
10
- - [P1 Pivot Record](/Users/suhjungdae/code/fusion-design-lab/docs/PIVOT_P1_ROTATING_ELLIPSE.md)
11
- - [Repo Guardrails](/Users/suhjungdae/code/fusion-design-lab/AGENTS.md)
12
 
13
  Priority source:
14
 
15
- - [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md) is the planning SSOT
16
- - [Next 12 Hours Checklist](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_NEXT_12_HOURS_CHECKLIST.md) is the execution order SSOT
17
  - this file should track execution progress only
18
 
19
  ## Execution Graph
20
 
21
  ```mermaid
@@ -34,82 +49,99 @@ flowchart TD
34
 
35
  ## Hour 0-2
36
 
37
- - [ ] Lock the exact `P1` environment contract
38
  Goal:
39
  freeze observation schema, action schema, episode loop, terminal conditions, and `Reward V0`
40
  Related:
41
- [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md),
42
- [Next 12 Hours Checklist](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
43
 
44
  - [ ] Pass the Northflank smoke test
45
  Related:
46
- [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md),
47
- [Next 12 Hours Checklist](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_NEXT_12_HOURS_CHECKLIST.md),
48
- [training/notebooks/README.md](/Users/suhjungdae/code/fusion-design-lab/training/notebooks/README.md)
49
 
50
  ## Fresh Wiring
51
 
52
- - [ ] Rewrite the shared models to the locked `P1` contract
53
  Files:
54
- [fusion_lab/models.py](/Users/suhjungdae/code/fusion-design-lab/fusion_lab/models.py),
55
- [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md)
56
 
57
- - [ ] Rewrite the environment loop to the locked `P1` contract
58
  Files:
59
- [server/environment.py](/Users/suhjungdae/code/fusion-design-lab/server/environment.py),
60
- [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md),
61
- [P1 Pivot Record](/Users/suhjungdae/code/fusion-design-lab/docs/PIVOT_P1_ROTATING_ELLIPSE.md)
62
 
63
- - [ ] Replace the toy physics path with `constellaration` wiring
64
  Files:
65
- [server/physics.py](/Users/suhjungdae/code/fusion-design-lab/server/physics.py),
66
- [server/Dockerfile](/Users/suhjungdae/code/fusion-design-lab/server/Dockerfile),
67
- [pyproject.toml](/Users/suhjungdae/code/fusion-design-lab/pyproject.toml)
68
 
69
- - [ ] Update the API/task surface to match `P1`
70
  Files:
71
- [server/app.py](/Users/suhjungdae/code/fusion-design-lab/server/app.py),
72
- [README.md](/Users/suhjungdae/code/fusion-design-lab/README.md)
 
 
73
 
74
  ## Validation and Reward
75
 
76
  - [ ] Add 1-2 tracked `P1` fixtures
77
  Files:
78
- [server/data/p1/README.md](/Users/suhjungdae/code/fusion-design-lab/server/data/p1/README.md),
79
- [P1 Pivot Record](/Users/suhjungdae/code/fusion-design-lab/docs/PIVOT_P1_ROTATING_ELLIPSE.md)
80
 
81
  - [ ] Run fixture sanity checks
82
  Goal:
83
  confirm verifier outputs, objective direction, and reward ordering
84
  Related:
85
- [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md),
86
- [Next 12 Hours Checklist](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
87
 
88
  - [ ] Manual-playtest 5-10 episodes
89
  Goal:
90
  verify a human can act coherently and surface at least one pathology or ambiguity
91
  Related:
92
- [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md),
93
- [Deliverables Map](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DELIVERABLES_MAP.md)
94
 
95
  - [ ] Update reward from `V0` to `V1` if playtesting reveals a real pathology
96
  Goal:
97
  keep a short exploit -> fix -> behavior improvement story
98
  Related:
99
- [AGENTS.md](/Users/suhjungdae/code/fusion-design-lab/AGENTS.md),
100
- [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md)
 
 
101
 
102
  ## Baselines
103
 
104
- - [ ] Implement and run the random baseline
 
105
  Files:
106
- [baselines/random_agent.py](/Users/suhjungdae/code/fusion-design-lab/baselines/random_agent.py),
107
- [baselines/compare.py](/Users/suhjungdae/code/fusion-design-lab/baselines/compare.py)
108
 
109
- - [ ] Implement and run the heuristic baseline
110
  Files:
111
- [baselines/heuristic_agent.py](/Users/suhjungdae/code/fusion-design-lab/baselines/heuristic_agent.py),
112
- [baselines/compare.py](/Users/suhjungdae/code/fusion-design-lab/baselines/compare.py)
113
 
114
  - [ ] Save one comparison trace that is presentation-ready
115
  Goal:
@@ -119,12 +151,12 @@ flowchart TD
119
 
120
  - [ ] Deploy the environment to HF Space
121
  Related:
122
- [Deliverables Map](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DELIVERABLES_MAP.md),
123
- [README.md](/Users/suhjungdae/code/fusion-design-lab/README.md)
124
 
125
  - [ ] Create the thin public Colab notebook
126
  Files:
127
- [training/notebooks/README.md](/Users/suhjungdae/code/fusion-design-lab/training/notebooks/README.md)
128
 
129
  - [ ] Record the 1-minute demo
130
  Goal:
@@ -132,12 +164,12 @@ flowchart TD
132
 
133
  - [ ] Finalize the public README
134
  Files:
135
- [README.md](/Users/suhjungdae/code/fusion-design-lab/README.md)
136
 
137
  - [ ] Only add training evidence if it is actually persuasive
138
  Related:
139
- [Plan V2](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_DESIGN_LAB_PLAN_V2.md),
140
- [Next 12 Hours Checklist](/Users/suhjungdae/code/fusion-design-lab/docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
141
 
142
  ## Guardrails
143
 
@@ -145,3 +177,5 @@ flowchart TD
145
  - [ ] Do not port the old `ai-sci-feasible-designs` harness
146
  - [ ] Do not let notebook or demo work outrun environment evidence
147
  - [ ] Do not add training-first complexity before manual playtesting
 
4
 
5
  Use this file for day-of build progress. Use the linked docs for rationale, sequencing, and submission framing:
6
 
7
+ - [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
8
+ - [Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md)
9
+ - [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
10
+ - [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
11
+ - [Repo Guardrails](AGENTS.md)
12
 
13
  Priority source:
14
 
15
+ - [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md) is the planning SSOT
16
+ - [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md) is the execution order SSOT
17
  - this file should track execution progress only
18
 
19
+ ## Current State
20
+
21
+ - [x] `P1` strategy is locked
22
+ - [x] shared models reflect the rotating-ellipse `P1` contract
23
+ - [x] environment loop reflects the rotating-ellipse `P1` contract
24
+ - [x] API/task surface reflects `P1`
25
+ - [x] baselines reflect the `P1` contract
26
+ - [x] repo docs call out the synthetic evaluator honestly
27
+ - [x] post-terminal guard in `step()`
28
+ - [ ] `constellaration` verifier wiring
29
+ - [ ] tracked `P1` fixtures
30
+ - [ ] manual playtest log
31
+ - [x] settle the non-submit terminal reward policy
32
+ - [x] baseline comparison has been run once on the current synthetic `P1` branch state
33
+
34
  ## Execution Graph
35
 
36
  ```mermaid
 
49
 
50
  ## Hour 0-2
51
 
52
+ - [x] Lock the exact `P1` environment contract
53
  Goal:
54
  freeze observation schema, action schema, episode loop, terminal conditions, and `Reward V0`
55
  Related:
56
+ [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
57
+ [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
58
 
59
  - [ ] Pass the Northflank smoke test
60
  Related:
61
+ [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
62
+ [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md),
63
+ [training/notebooks/README.md](training/notebooks/README.md)
64
 
65
  ## Fresh Wiring
66
 
67
+ - [x] Rewrite the shared models to the locked `P1` contract
68
  Files:
69
+ [fusion_lab/models.py](fusion_lab/models.py),
70
+ [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
71
 
72
+ - [x] Rewrite the environment loop to the locked `P1` contract
73
  Files:
74
+ [server/environment.py](server/environment.py),
75
+ [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
76
+ [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
77
 
78
+ - [x] Add a post-terminal guard to the environment loop
79
  Files:
80
+ [server/environment.py](server/environment.py)
81
+ Goal:
82
+ reject or no-op any `step()` call after terminal state so budget and step count do not drift past episode end
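A minimal sketch of such a guard, simplified for illustration (the real `step()` in `server/environment.py` takes a `StellaratorAction` and returns a full observation; the toy class and dict below are ours):

```python
class GuardedEnv:
    """Toy environment showing only the post-terminal guard."""

    def __init__(self, budget: int = 6) -> None:
        self.done = False
        self.budget_remaining = budget
        self.step_number = 0

    def step(self, intent: str) -> dict:
        # Guard: after done=True every further step() is a no-op, so
        # budget and step count cannot drift past the episode end.
        if self.done:
            return {"done": True, "step": self.step_number, "reward": 0.0}
        self.step_number += 1
        self.budget_remaining -= 1
        if intent == "submit" or self.budget_remaining <= 0:
            self.done = True
        return {"done": self.done, "step": self.step_number, "reward": 0.0}


env = GuardedEnv(budget=3)
env.step("run")
env.step("submit")                        # terminal
frozen = (env.step_number, env.budget_remaining)
env.step("run")                           # ignored by the guard
assert (env.step_number, env.budget_remaining) == frozen
```

The guard also covers the exhaustion path: once the budget hits zero and `done` flips, later calls cannot decrement the budget below zero.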
83
 
84
+ - [ ] Replace the synthetic physics path with `constellaration` wiring
85
  Files:
86
+ [server/physics.py](server/physics.py),
87
+ [server/Dockerfile](server/Dockerfile),
88
+ [pyproject.toml](pyproject.toml)
89
+
90
+ - [x] Update the API/task surface to match `P1`
91
+ Files:
92
+ [server/app.py](server/app.py),
93
+ [README.md](README.md)
94
 
95
  ## Validation and Reward
96
 
97
  - [ ] Add 1-2 tracked `P1` fixtures
98
  Files:
99
+ [server/data/p1/README.md](server/data/p1/README.md),
100
+ [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
101
 
102
  - [ ] Run fixture sanity checks
103
  Goal:
104
  confirm verifier outputs, objective direction, and reward ordering
105
  Related:
106
+ [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
107
+ [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
108
 
109
  - [ ] Manual-playtest 5-10 episodes
110
  Goal:
111
  verify a human can act coherently and surface at least one pathology or ambiguity
112
  Related:
113
+ [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
114
+ [Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md)
115
 
116
  - [ ] Update reward from `V0` to `V1` if playtesting reveals a real pathology
117
  Goal:
118
  keep a short exploit -> fix -> behavior improvement story
119
  Related:
120
+ [AGENTS.md](AGENTS.md),
121
+ [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
122
+
123
+ - [x] Decide the non-submit terminal reward policy
124
+ Goal:
125
+ budget exhaustion now yields a smaller end-of-episode reward than `submit`, so non-submitting agents still get terminal feedback without outranking explicit submit behavior
126
+ Files:
127
+ [server/environment.py](server/environment.py),
128
+ [README.md](README.md)
129
 
130
  ## Baselines
131
 
132
+ - [x] Implement the random baseline
133
+ Files:
134
+ [baselines/random_agent.py](baselines/random_agent.py),
135
+ [baselines/compare.py](baselines/compare.py)
136
+
137
+ - [x] Implement the heuristic baseline
138
  Files:
139
+ [baselines/heuristic_agent.py](baselines/heuristic_agent.py),
140
+ [baselines/compare.py](baselines/compare.py)
141
 
142
+ - [x] Run the baseline comparison on the current `P1` branch state
143
  Files:
144
+ [baselines/compare.py](baselines/compare.py)
 
145
 
146
  - [ ] Save one comparison trace that is presentation-ready
147
  Goal:
 
151
 
152
  - [ ] Deploy the environment to HF Space
153
  Related:
154
+ [Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md),
155
+ [README.md](README.md)
156
 
157
  - [ ] Create the thin public Colab notebook
158
  Files:
159
+ [training/notebooks/README.md](training/notebooks/README.md)
160
 
161
  - [ ] Record the 1-minute demo
162
  Goal:
 
164
 
165
  - [ ] Finalize the public README
166
  Files:
167
+ [README.md](README.md)
168
 
169
  - [ ] Only add training evidence if it is actually persuasive
170
  Related:
171
+ [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
172
+ [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
173
 
174
  ## Guardrails
175
 
 
177
  - [ ] Do not port the old `ai-sci-feasible-designs` harness
178
  - [ ] Do not let notebook or demo work outrun environment evidence
179
  - [ ] Do not add training-first complexity before manual playtesting
180
+ - [ ] Do not describe the current synthetic evaluator as the official verifier integration
181
+ - [ ] Do not describe the current baseline reset state as already feasible
baselines/compare.py CHANGED
@@ -14,27 +14,27 @@ def main(n_episodes: int = 20) -> None:
14
 
15
  random_rewards: list[float] = []
16
  heuristic_rewards: list[float] = []
17
- random_best_qs: list[float] = []
18
- heuristic_best_qs: list[float] = []
19
 
20
  for i in range(n_episodes):
21
  rr, rt = random_episode(env, seed=i)
22
  random_rewards.append(rr)
23
- random_best_qs.append(rt[-1]["best_qs"])
24
 
25
  hr, ht = heuristic_episode(env, seed=i)
26
  heuristic_rewards.append(hr)
27
- heuristic_best_qs.append(ht[-1]["best_qs"])
28
 
29
  r_mean = sum(random_rewards) / len(random_rewards)
30
  h_mean = sum(heuristic_rewards) / len(heuristic_rewards)
31
- r_qs = sum(random_best_qs) / len(random_best_qs)
32
- h_qs = sum(heuristic_best_qs) / len(heuristic_best_qs)
33
 
34
  print(f"{'Metric':<25} {'Random':>12} {'Heuristic':>12}")
35
  print("-" * 51)
36
  print(f"{'Mean reward':<25} {r_mean:>+12.4f} {h_mean:>+12.4f}")
37
- print(f"{'Mean best QS residual':<25} {r_qs:>12.6f} {h_qs:>12.6f}")
38
  print(f"{'Episodes':<25} {n_episodes:>12d} {n_episodes:>12d}")
39
  print()
40
 
 
14
 
15
  random_rewards: list[float] = []
16
  heuristic_rewards: list[float] = []
17
+ random_best_scores: list[float] = []
18
+ heuristic_best_scores: list[float] = []
19
 
20
  for i in range(n_episodes):
21
  rr, rt = random_episode(env, seed=i)
22
  random_rewards.append(rr)
23
+ random_best_scores.append(rt[-1]["best_score"])
24
 
25
  hr, ht = heuristic_episode(env, seed=i)
26
  heuristic_rewards.append(hr)
27
+ heuristic_best_scores.append(ht[-1]["best_score"])
28
 
29
  r_mean = sum(random_rewards) / len(random_rewards)
30
  h_mean = sum(heuristic_rewards) / len(heuristic_rewards)
31
+ r_score = sum(random_best_scores) / len(random_best_scores)
32
+ h_score = sum(heuristic_best_scores) / len(heuristic_best_scores)
33
 
34
  print(f"{'Metric':<25} {'Random':>12} {'Heuristic':>12}")
35
  print("-" * 51)
36
  print(f"{'Mean reward':<25} {r_mean:>+12.4f} {h_mean:>+12.4f}")
37
+ print(f"{'Mean best P1 score':<25} {r_score:>12.6f} {h_score:>12.6f}")
38
  print(f"{'Episodes':<25} {n_episodes:>12d} {n_episodes:>12d}")
39
  print()
40
 
baselines/heuristic_agent.py CHANGED
@@ -1,8 +1,8 @@
1
  """Heuristic baseline agent for the stellarator design environment.
2
 
3
  Strategy: guided perturbations informed by domain knowledge.
4
- 1. Probe the most sensitive coefficient (zs12) first with a small move.
5
- 2. Apply medium perturbations in directions that typically improve QS.
6
  3. Use restore_best to recover from any worsening.
7
  4. Submit before exhausting budget.
8
  """
@@ -14,12 +14,12 @@ import sys
14
  from fusion_lab.models import StellaratorAction
15
  from server.environment import StellaratorEnvironment
16
 
17
- STRATEGY: list[tuple[str, str, str, str]] = [
18
- ("tune_zs12", "decrease", "small", "hot"),
19
- ("tune_zs12", "decrease", "medium", "hot"),
20
- ("tune_rc11", "increase", "small", "hot"),
21
- ("tune_rc10", "increase", "medium", "hot"),
22
- ("tune_zs11", "decrease", "small", "hot"),
23
  ]
24
 
25
 
@@ -28,33 +28,40 @@ def heuristic_episode(
28
  ) -> tuple[float, list[dict[str, object]]]:
29
  obs = env.reset(seed=seed)
30
  total_reward = 0.0
31
- trace: list[dict[str, object]] = [{"step": 0, "qs": obs.quasi_symmetry_residual}]
32
- prev_best = obs.best_qs_residual
 
33
 
34
- for operator, direction, magnitude, restart in STRATEGY:
35
  if obs.done or obs.budget_remaining <= 1:
36
  break
37
 
38
  action = StellaratorAction(
39
  intent="run",
40
- operator=operator,
41
  direction=direction,
42
  magnitude=magnitude,
43
- restart=restart,
44
  )
45
  obs = env.step(action)
46
  total_reward += obs.reward or 0.0
47
  trace.append(
48
  {
49
  "step": len(trace),
50
- "action": f"{operator} {direction} {magnitude}",
51
- "qs": obs.quasi_symmetry_residual,
52
- "best_qs": obs.best_qs_residual,
53
  "reward": obs.reward,
54
  }
55
  )
56
 
57
- if obs.best_qs_residual > prev_best and obs.budget_remaining > 1:
 
58
  restore = StellaratorAction(intent="restore_best")
59
  obs = env.step(restore)
60
  total_reward += obs.reward or 0.0
@@ -62,13 +69,13 @@ def heuristic_episode(
62
  {
63
  "step": len(trace),
64
  "action": "restore_best",
65
- "qs": obs.quasi_symmetry_residual,
66
- "best_qs": obs.best_qs_residual,
67
  "reward": obs.reward,
68
  }
69
  )
70
 
71
- prev_best = obs.best_qs_residual
72
 
73
  if not obs.done:
74
  submit = StellaratorAction(intent="submit")
@@ -78,8 +85,8 @@ def heuristic_episode(
78
  {
79
  "step": len(trace),
80
  "action": "submit",
81
- "qs": obs.quasi_symmetry_residual,
82
- "best_qs": obs.best_qs_residual,
83
  "reward": obs.reward,
84
  }
85
  )
@@ -97,7 +104,7 @@ def main(n_episodes: int = 20) -> None:
97
  rewards.append(total_reward)
98
  print(
99
  f"Episode {i:3d}: steps={len(trace) - 1} "
100
- f"final_qs={final['qs']:.6f} best_qs={final['best_qs']:.6f} "
101
  f"reward={total_reward:+.4f}"
102
  )
103
 
 
1
  """Heuristic baseline agent for the stellarator design environment.
2
 
3
  Strategy: guided perturbations informed by domain knowledge.
4
+ 1. Push elongation upward to improve triangularity.
5
+ 2. Nudge rotational transform upward to stay on the iota side of feasibility.
6
  3. Use restore_best to recover from any worsening.
7
  4. Submit before exhausting budget.
8
  """
 
14
  from fusion_lab.models import StellaratorAction
15
  from server.environment import StellaratorEnvironment
16
 
17
+ STRATEGY: list[tuple[str, str, str]] = [
18
+ ("elongation", "increase", "medium"),
19
+ ("elongation", "increase", "small"),
20
+ ("rotational_transform", "increase", "small"),
21
+ ("aspect_ratio", "decrease", "small"),
22
+ ("rotational_transform", "increase", "small"),
23
  ]
24
 
25
 
 
28
  ) -> tuple[float, list[dict[str, object]]]:
29
  obs = env.reset(seed=seed)
30
  total_reward = 0.0
31
+ trace: list[dict[str, object]] = [{"step": 0, "score": obs.p1_score}]
32
+ prev_best = (
33
+ int(obs.best_feasibility <= 0.01),
34
+ obs.best_score if obs.best_feasibility <= 0.01 else -obs.best_feasibility,
35
+ )
36
 
37
+ for parameter, direction, magnitude in STRATEGY:
38
  if obs.done or obs.budget_remaining <= 1:
39
  break
40
 
41
  action = StellaratorAction(
42
  intent="run",
43
+ parameter=parameter,
44
  direction=direction,
45
  magnitude=magnitude,
 
46
  )
47
  obs = env.step(action)
48
  total_reward += obs.reward or 0.0
49
  trace.append(
50
  {
51
  "step": len(trace),
52
+ "action": f"{parameter} {direction} {magnitude}",
53
+ "score": obs.p1_score,
54
+ "best_score": obs.best_score,
55
  "reward": obs.reward,
56
  }
57
  )
58
 
59
+ current_best = (
60
+ int(obs.best_feasibility <= 0.01),
61
+ obs.best_score if obs.best_feasibility <= 0.01 else -obs.best_feasibility,
62
+ )
63
+
64
+ if current_best < prev_best and obs.budget_remaining > 1:
65
  restore = StellaratorAction(intent="restore_best")
66
  obs = env.step(restore)
67
  total_reward += obs.reward or 0.0
 
69
  {
70
  "step": len(trace),
71
  "action": "restore_best",
72
+ "score": obs.p1_score,
73
+ "best_score": obs.best_score,
74
  "reward": obs.reward,
75
  }
76
  )
77
 
78
+ prev_best = current_best
79
 
80
  if not obs.done:
81
  submit = StellaratorAction(intent="submit")
 
85
  {
86
  "step": len(trace),
87
  "action": "submit",
88
+ "score": obs.p1_score,
89
+ "best_score": obs.best_score,
90
  "reward": obs.reward,
91
  }
92
  )
 
104
  rewards.append(total_reward)
105
  print(
106
  f"Episode {i:3d}: steps={len(trace) - 1} "
107
+ f"final_score={final['score']:.6f} best_score={final['best_score']:.6f} "
108
  f"reward={total_reward:+.4f}"
109
  )
110
 
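The `prev_best`/`current_best` tuples in the heuristic agent above lean on Python's lexicographic tuple comparison. A standalone sketch of the same ranking rule follows; the `0.01` tolerance mirrors the diff, while the helper name `rank_key` is our own illustration:

```python
FEASIBILITY_TOL = 0.01  # same tolerance the heuristic agent uses above


def rank_key(score: float, feasibility: float) -> tuple[int, float]:
    """Feasible designs always outrank infeasible ones (first element);
    among feasible designs a higher score wins, and among infeasible
    ones a smaller constraint violation wins (hence the negation)."""
    feasible = feasibility <= FEASIBILITY_TOL
    return (int(feasible), score if feasible else -feasibility)


# Lexicographic comparison: the feasibility flag dominates the score.
assert rank_key(0.2, 0.0) > rank_key(0.9, 0.5)   # feasible beats infeasible
assert rank_key(0.9, 0.0) > rank_key(0.2, 0.0)   # higher score among feasible
assert rank_key(0.0, 0.1) > rank_key(0.0, 0.5)   # smaller violation among infeasible
```

Because the tuples order "better" as "greater", the agent's `current_best < prev_best` check reads as "the tracked best got worse, so restore".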
baselines/random_agent.py CHANGED
@@ -8,10 +8,9 @@ import sys
8
  from fusion_lab.models import StellaratorAction
9
  from server.environment import StellaratorEnvironment
10
 
11
- OPERATORS = ["tune_rc10", "tune_rc11", "tune_zs11", "tune_zs12"]
12
  DIRECTIONS = ["increase", "decrease"]
13
  MAGNITUDES = ["small", "medium", "large"]
14
- RESTARTS = ["hot", "cold"]
15
 
16
 
17
  def random_episode(
@@ -20,7 +19,7 @@ def random_episode(
20
  rng = random.Random(seed)
21
  obs = env.reset(seed=seed)
22
  total_reward = 0.0
23
- trace: list[dict[str, object]] = [{"step": 0, "qs": obs.quasi_symmetry_residual}]
24
 
25
  while not obs.done:
26
  if obs.budget_remaining <= 0:
@@ -28,10 +27,9 @@ def random_episode(
28
  else:
29
  action = StellaratorAction(
30
  intent="run",
31
- operator=rng.choice(OPERATORS),
32
  direction=rng.choice(DIRECTIONS),
33
  magnitude=rng.choice(MAGNITUDES),
34
- restart=rng.choice(RESTARTS),
35
  )
36
  obs = env.step(action)
37
  total_reward += obs.reward or 0.0
@@ -39,8 +37,8 @@ def random_episode(
39
  {
40
  "step": len(trace),
41
  "action": action.intent,
42
- "qs": obs.quasi_symmetry_residual,
43
- "best_qs": obs.best_qs_residual,
44
  "reward": obs.reward,
45
  }
46
  )
@@ -58,7 +56,7 @@ def main(n_episodes: int = 20) -> None:
58
  rewards.append(total_reward)
59
  print(
60
  f"Episode {i:3d}: steps={len(trace) - 1} "
61
- f"final_qs={final['qs']:.6f} best_qs={final['best_qs']:.6f} "
62
  f"reward={total_reward:+.4f}"
63
  )
64
 
 
8
  from fusion_lab.models import StellaratorAction
9
  from server.environment import StellaratorEnvironment
10
 
11
+ PARAMETERS = ["aspect_ratio", "elongation", "rotational_transform"]
12
  DIRECTIONS = ["increase", "decrease"]
13
  MAGNITUDES = ["small", "medium", "large"]
 
14
 
15
 
16
  def random_episode(
 
19
  rng = random.Random(seed)
20
  obs = env.reset(seed=seed)
21
  total_reward = 0.0
22
+ trace: list[dict[str, object]] = [{"step": 0, "score": obs.p1_score}]
23
 
24
  while not obs.done:
25
  if obs.budget_remaining <= 0:
 
27
  else:
28
  action = StellaratorAction(
29
  intent="run",
30
+ parameter=rng.choice(PARAMETERS),
31
  direction=rng.choice(DIRECTIONS),
32
  magnitude=rng.choice(MAGNITUDES),
 
33
  )
34
  obs = env.step(action)
35
  total_reward += obs.reward or 0.0
 
37
  {
38
  "step": len(trace),
39
  "action": action.intent,
40
+ "score": obs.p1_score,
41
+ "best_score": obs.best_score,
42
  "reward": obs.reward,
43
  }
44
  )
 
56
  rewards.append(total_reward)
57
  print(
58
  f"Episode {i:3d}: steps={len(trace) - 1} "
59
+ f"final_score={final['score']:.6f} best_score={final['best_score']:.6f} "
60
  f"reward={total_reward:+.4f}"
61
  )
62
 
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED
@@ -6,6 +6,15 @@ This checklist turns the updated deliverables map and Plan V2 into concrete exec
6
 
7
  Do not expand scope beyond one stable task. Training is supporting evidence, not the main story.
8
 
9
  ## Plan V2 Inheritance
10
 
11
  Carry these rules through the whole checklist:
 
6
 
7
  Do not expand scope beyond one stable task. Training is supporting evidence, not the main story.
8
 
9
+ ## Current Branch Status
10
+
11
+ - [x] `P1` task is locked
12
+ - [x] rotating-ellipse `P1` contract is implemented in the working tree
13
+ - [x] baselines and API surface have been moved to the `P1` contract
14
+ - [x] add a post-terminal guard in `step()`
15
+ - [ ] replace the synthetic evaluator with `constellaration`
16
+ - [ ] add tracked fixtures and manual playtest evidence
17
+
18
  ## Plan V2 Inheritance
19
 
20
  Carry these rules through the whole checklist:
docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED
@@ -220,7 +220,7 @@ If constellaration deployment fails (Docker build, HF Spaces issues):
 
 Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
 
-1. **Near-feasible anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — expected to be close to P1 boundary
+1. **Repairable baseline anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — intentionally infeasible at reset but close enough to support short repair/improvement episodes
 2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
 3. **Baseline comparison:** add only if manual playtesting shows a second start state is useful
fusion_lab/models.py CHANGED
@@ -3,46 +3,66 @@ from __future__ import annotations
 from typing import Literal
 
 from openenv.core import Action, Observation, State
-from pydantic import Field
+from pydantic import BaseModel, Field
 
 ActionIntent = Literal["run", "submit", "restore_best"]
-OperatorName = Literal["tune_rc10", "tune_rc11", "tune_zs11", "tune_zs12"]
+ParameterName = Literal["aspect_ratio", "elongation", "rotational_transform"]
 DirectionName = Literal["increase", "decrease"]
 MagnitudeName = Literal["small", "medium", "large"]
-RestartMode = Literal["hot", "cold"]
+
+
+class RotatingEllipseParams(BaseModel):
+    aspect_ratio: float
+    elongation: float
+    rotational_transform: float
 
 
 class StellaratorAction(Action):
     intent: ActionIntent
-    operator: OperatorName | None = None
+    parameter: ParameterName | None = None
     direction: DirectionName | None = None
    magnitude: MagnitudeName | None = None
-    restart: RestartMode | None = None
     reasoning: str = ""
 
 
 class StellaratorObservation(Observation):
     diagnostics_text: str = ""
-    quasi_symmetry_residual: float = 0.0
+    max_elongation: float = 0.0
     aspect_ratio: float = 0.0
-    rotational_transform_axis: float = 0.0
-    rotational_transform_edge: float = 0.0
-    magnetic_well_depth: float = 0.0
-    volume: float = 0.0
-    vmec_converged: bool = True
+    average_triangularity: float = 0.0
+    edge_iota_over_nfp: float = 0.0
+    p1_score: float = 0.0
+    p1_feasibility: float = 0.0
+    vacuum_well: float = 0.0
     step_number: int = 0
     budget_remaining: int = 6
-    best_qs_residual: float = float("inf")
+    best_score: float = 0.0
+    best_feasibility: float = float("inf")
     constraints_satisfied: bool = True
     target_spec: str = ""
 
 
 class StellaratorState(State):
-    initial_qs: float = 0.0
-    current_qs: float = 0.0
-    prev_qs: float = 0.0
-    best_qs: float = Field(default=float("inf"))
+    current_params: RotatingEllipseParams = Field(
+        default_factory=lambda: RotatingEllipseParams(
+            aspect_ratio=3.5,
+            elongation=1.5,
+            rotational_transform=0.4,
+        )
+    )
+    best_params: RotatingEllipseParams = Field(
+        default_factory=lambda: RotatingEllipseParams(
+            aspect_ratio=3.5,
+            elongation=1.5,
+            rotational_transform=0.4,
+        )
+    )
+    initial_score: float = 0.0
+    best_score: float = 0.0
+    current_feasibility: float = float("inf")
+    best_feasibility: float = float("inf")
     budget_total: int = 6
     budget_remaining: int = 6
+    episode_done: bool = False
     constraints_satisfied: bool = True
     history: list[str] = Field(default_factory=list)
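The migrated models can be exercised without `openenv` or `pydantic`; a rough sketch of the new contract using stdlib dataclasses as stand-ins for the real base classes:

```python
from dataclasses import dataclass, replace
from typing import Literal, Optional

ActionIntent = Literal["run", "submit", "restore_best"]
ParameterName = Literal["aspect_ratio", "elongation", "rotational_transform"]

@dataclass(frozen=True)
class RotatingEllipseParams:
    # Frozen: a step produces a new params object rather than mutating in place.
    aspect_ratio: float
    elongation: float
    rotational_transform: float

@dataclass
class StellaratorAction:
    intent: ActionIntent
    parameter: Optional[ParameterName] = None
    direction: Optional[Literal["increase", "decrease"]] = None
    magnitude: Optional[Literal["small", "medium", "large"]] = None
    reasoning: str = ""

baseline = RotatingEllipseParams(aspect_ratio=3.5, elongation=1.5,
                                 rotational_transform=0.4)
action = StellaratorAction(intent="run", parameter="elongation",
                           direction="increase", magnitude="small")
# Applying a "small" elongation increase yields a fresh params object.
tweaked = replace(baseline, elongation=baseline.elongation + 0.1)
```

The key contract change is visible in the field names: the old `operator`/`restart` pair is gone, and actions now target one of the three rotating-ellipse parameters directly.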
server/app.py CHANGED
@@ -4,10 +4,11 @@ from openenv.core import create_fastapi_app
 
 from fusion_lab.models import StellaratorAction, StellaratorObservation
 from server.environment import (
-    ASPECT_RATIO_RANGE,
+    ASPECT_RATIO_MAX,
+    AVERAGE_TRIANGULARITY_MAX,
     BUDGET,
-    IOTA_EDGE_RANGE,
-    VOLUME_MIN,
+    EDGE_IOTA_OVER_NFP_MIN,
+    N_FIELD_PERIODS,
     StellaratorEnvironment,
 )
 
@@ -21,18 +22,18 @@ app = create_fastapi_app(
 @app.get("/task")
 def task_summary() -> dict[str, object]:
     return {
-        "description": "Minimize quasi-symmetry error for a 2-period quasi-helical stellarator.",
+        "description": "Optimize the P1 benchmark with a rotating-ellipse parameterization.",
         "constraints": {
-            "aspect_ratio": list(ASPECT_RATIO_RANGE),
-            "rotational_transform_edge": list(IOTA_EDGE_RANGE),
-            "volume_min": VOLUME_MIN,
+            "aspect_ratio_max": ASPECT_RATIO_MAX,
+            "average_triangularity_max": AVERAGE_TRIANGULARITY_MAX,
+            "edge_iota_over_nfp_min": EDGE_IOTA_OVER_NFP_MIN,
         },
+        "n_field_periods": N_FIELD_PERIODS,
         "budget": BUDGET,
         "actions": ["run", "submit", "restore_best"],
-        "operators": ["tune_rc10", "tune_rc11", "tune_zs11", "tune_zs12"],
+        "parameters": ["aspect_ratio", "elongation", "rotational_transform"],
         "directions": ["increase", "decrease"],
         "magnitudes": ["small", "medium", "large"],
-        "restart_modes": ["hot", "cold"],
     }
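For quick reference, the reshaped `/task` payload can be reproduced as a plain function, with the constants inlined from this diff instead of imported (no FastAPI required):

```python
# Shape of the /task payload after the P1 migration. Constant values are
# inlined from the diff (ASPECT_RATIO_MAX, AVERAGE_TRIANGULARITY_MAX,
# EDGE_IOTA_OVER_NFP_MIN, N_FIELD_PERIODS, BUDGET).
def task_summary() -> dict[str, object]:
    return {
        "description": "Optimize the P1 benchmark with a rotating-ellipse parameterization.",
        "constraints": {
            "aspect_ratio_max": 4.0,
            "average_triangularity_max": -0.5,
            "edge_iota_over_nfp_min": 0.3,
        },
        "n_field_periods": 3,
        "budget": 6,
        "actions": ["run", "submit", "restore_best"],
        "parameters": ["aspect_ratio", "elongation", "rotational_transform"],
        "directions": ["increase", "decrease"],
        "magnitudes": ["small", "medium", "large"],
    }

payload = task_summary()
```

Clients that still look for the old `operators` or `restart_modes` keys will find neither; they must switch to `parameters`.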
server/environment.py CHANGED
@@ -1,47 +1,62 @@
 from __future__ import annotations
 
+from random import Random
 from typing import Any, Final, Optional
 
 from openenv.core import Environment as BaseEnvironment
 
 from fusion_lab.models import (
+    RotatingEllipseParams,
     StellaratorAction,
     StellaratorObservation,
     StellaratorState,
 )
-from server.physics import Diagnostics, PhysicsEngine
+from server.physics import (
+    ASPECT_RATIO_MAX,
+    AVERAGE_TRIANGULARITY_MAX,
+    EDGE_IOTA_OVER_NFP_MIN,
+    FEASIBILITY_TOLERANCE,
+    EvaluationMetrics,
+    evaluate_params,
+)
 
 BUDGET: Final[int] = 6
-
-ASPECT_RATIO_RANGE: Final[tuple[float, float]] = (4.5, 7.0)
-IOTA_EDGE_RANGE: Final[tuple[float, float]] = (0.3, 0.6)
-VOLUME_MIN: Final[float] = 0.5
+N_FIELD_PERIODS: Final[int] = 3
+
+PARAMETER_RANGES: Final[dict[str, tuple[float, float]]] = {
+    "aspect_ratio": (2.0, 8.0),
+    "elongation": (1.0, 5.0),
+    "rotational_transform": (0.1, 1.0),
+}
+
+PARAMETER_DELTAS: Final[dict[str, dict[str, float]]] = {
+    "aspect_ratio": {"small": 0.1, "medium": 0.3, "large": 0.8},
+    "elongation": {"small": 0.1, "medium": 0.3, "large": 0.8},
+    "rotational_transform": {"small": 0.02, "medium": 0.05, "large": 0.15},
+}
+
+BASELINE_PARAMS: Final[RotatingEllipseParams] = RotatingEllipseParams(
+    aspect_ratio=3.5,
+    elongation=1.5,
+    rotational_transform=0.4,
+)
 
 TARGET_SPEC: Final[str] = (
-    "Minimize quasi-symmetry residual for a 2-period quasi-helical stellarator. "
-    "Constraints: aspect ratio in [4.5, 7.0], edge iota in [0.3, 0.6], volume > 0.5 m³. "
+    "Optimize the P1 benchmark using a rotating-ellipse parameterization. "
+    "Constraints: aspect ratio <= 4.0, average triangularity <= -0.5, "
+    "edge rotational transform / n_field_periods >= 0.3. "
     "Budget: 6 evaluations."
 )
 
 
-def check_constraints(diag: Diagnostics) -> bool:
-    ar_lo, ar_hi = ASPECT_RATIO_RANGE
-    iota_lo, iota_hi = IOTA_EDGE_RANGE
-    return (
-        ar_lo <= diag.aspect_ratio <= ar_hi
-        and iota_lo <= diag.iota_edge <= iota_hi
-        and diag.volume >= VOLUME_MIN
-    )
-
-
 class StellaratorEnvironment(
     BaseEnvironment[StellaratorAction, StellaratorObservation, StellaratorState]
 ):
     def __init__(self) -> None:
         super().__init__()
-        self._engine = PhysicsEngine()
         self._state = StellaratorState()
-        self._last_diag: Diagnostics | None = None
+        self._last_metrics: EvaluationMetrics | None = None
+        self._rng = Random()
 
     def reset(
         self,
@@ -49,22 +64,27 @@ class StellaratorEnvironment(
         episode_id: Optional[str] = None,
         **kwargs: Any,
     ) -> StellaratorObservation:
-        diag = self._engine.reset(seed)
-        satisfied = check_constraints(diag)
+        self._rng = Random(seed)
+        params = self._initial_params(seed)
+        metrics = evaluate_params(params)
         self._state = StellaratorState(
             episode_id=episode_id,
             step_count=0,
-            initial_qs=diag.qs_residual,
-            current_qs=diag.qs_residual,
-            prev_qs=diag.qs_residual,
-            best_qs=diag.qs_residual,
+            current_params=params,
+            best_params=params,
+            initial_score=metrics.p1_score,
+            best_score=metrics.p1_score,
+            current_feasibility=metrics.p1_feasibility,
+            best_feasibility=metrics.p1_feasibility,
             budget_total=BUDGET,
             budget_remaining=BUDGET,
-            constraints_satisfied=satisfied,
+            episode_done=False,
+            constraints_satisfied=metrics.constraints_satisfied,
         )
-        self._last_diag = diag
+        self._last_metrics = metrics
         return self._build_observation(
-            diag, satisfied, action_summary="Episode started. Baseline design loaded."
+            metrics,
+            action_summary="Episode started from the rotating-ellipse baseline.",
        )
 
     def step(
@@ -73,7 +93,15 @@ class StellaratorEnvironment(
         timeout_s: Optional[float] = None,
         **kwargs: Any,
     ) -> StellaratorObservation:
-        self._state.prev_qs = self._state.current_qs
+        if self._state.episode_done or self._state.budget_remaining <= 0:
+            metrics = self._last_metrics or evaluate_params(self._state.current_params)
+            return self._build_observation(
+                metrics,
+                action_summary=("Episode already ended. Call reset() before sending more actions."),
+                reward=0.0,
+                done=True,
+            )
+
         self._state.step_count += 1
 
         if action.intent == "submit":
@@ -91,108 +119,131 @@ class StellaratorEnvironment(
     # ------------------------------------------------------------------
 
     def _handle_run(self, action: StellaratorAction) -> StellaratorObservation:
-        if not all([action.operator, action.direction, action.magnitude]):
+        if not all([action.parameter, action.direction, action.magnitude]):
             return self._handle_invalid_run()
 
         self._state.budget_remaining -= 1
-
-        diag = self._engine.modify_and_run(
-            operator=action.operator,
+        params = self._apply_action(
+            params=self._state.current_params,
+            parameter=action.parameter,
             direction=action.direction,
             magnitude=action.magnitude,
-            restart=action.restart or "hot",
         )
-
-        satisfied = check_constraints(diag) if diag.converged else self._state.constraints_satisfied
-
-        if diag.converged:
-            self._state.current_qs = diag.qs_residual
-            if diag.qs_residual < self._state.best_qs:
-                self._state.best_qs = diag.qs_residual
-        self._state.constraints_satisfied = satisfied
+        metrics = evaluate_params(params)
+        self._state.current_params = params
+        self._state.current_feasibility = metrics.p1_feasibility
+        self._state.constraints_satisfied = metrics.constraints_satisfied
+        self._update_best(params, metrics)
 
         done = self._state.budget_remaining <= 0
-        reward = self._compute_reward(diag, action.intent, done)
-        summary = self._summary_run(action, diag)
+        reward = self._compute_reward(metrics, action.intent, done)
+        summary = self._summary_run(action, metrics)
         self._state.history.append(summary)
-        self._last_diag = diag
+        self._last_metrics = metrics
+        self._state.episode_done = done
 
         return self._build_observation(
-            diag, satisfied, action_summary=summary, reward=reward, done=done
+            metrics,
+            action_summary=summary,
+            reward=reward,
+            done=done,
         )
 
     def _handle_submit(self) -> StellaratorObservation:
-        diag = self._last_diag or self._engine.restore_best()
-        satisfied = check_constraints(diag)
-        reward = self._compute_reward(diag, "submit", done=True)
-        summary = self._summary_submit(satisfied)
+        metrics = self._last_metrics or evaluate_params(self._state.current_params)
+        reward = self._compute_reward(metrics, "submit", done=True)
+        summary = self._summary_submit(metrics)
         self._state.history.append(summary)
+        self._state.episode_done = True
 
         return self._build_observation(
-            diag, satisfied, action_summary=summary, reward=reward, done=True
+            metrics,
+            action_summary=summary,
+            reward=reward,
+            done=True,
        )
 
     def _handle_restore(self) -> StellaratorObservation:
         self._state.budget_remaining -= 1
-
-        diag = self._engine.restore_best()
-        self._state.current_qs = diag.qs_residual
-        satisfied = check_constraints(diag)
-        self._state.constraints_satisfied = satisfied
+        self._state.current_params = self._state.best_params
+        metrics = evaluate_params(self._state.current_params)
+        self._state.current_feasibility = metrics.p1_feasibility
+        self._state.constraints_satisfied = metrics.constraints_satisfied
 
         done = self._state.budget_remaining <= 0
-        reward = self._compute_reward(diag, "restore_best", done)
-        summary = f"Restored best design. QS residual: {diag.qs_residual:.6f}."
+        reward = self._compute_reward(metrics, "restore_best", done)
+        summary = (
+            "Restored the best-known design. "
+            f"Score={metrics.p1_score:.6f}, feasibility={metrics.p1_feasibility:.6f}."
+        )
         self._state.history.append(summary)
-        self._last_diag = diag
+        self._last_metrics = metrics
+        self._state.episode_done = done
 
         return self._build_observation(
-            diag, satisfied, action_summary=summary, reward=reward, done=done
+            metrics,
+            action_summary=summary,
+            reward=reward,
+            done=done,
        )
 
     def _handle_invalid_run(self) -> StellaratorObservation:
         self._state.budget_remaining -= 1
-        diag = self._last_diag or self._engine.restore_best()
-        satisfied = check_constraints(diag)
+        metrics = self._last_metrics or evaluate_params(self._state.current_params)
         done = self._state.budget_remaining <= 0
-        summary = "Invalid run action: operator, direction, and magnitude are required."
+        summary = "Invalid run action: parameter, direction, and magnitude are required."
         self._state.history.append(summary)
+        self._state.episode_done = done
         return self._build_observation(
-            diag, satisfied, action_summary=summary, reward=-1.0, done=done
+            metrics,
+            action_summary=summary,
+            reward=-1.0,
+            done=done,
        )
 
     # ------------------------------------------------------------------
     # Reward V0
     # ------------------------------------------------------------------
 
-    def _compute_reward(self, diag: Diagnostics, intent: str, done: bool) -> float:
+    def _compute_reward(
+        self,
+        metrics: EvaluationMetrics,
+        intent: str,
+        done: bool,
+    ) -> float:
+        previous_metrics = self._last_metrics or metrics
         reward = 0.0
 
-        if diag.converged and self._state.prev_qs < float("inf"):
-            improvement = self._state.prev_qs - diag.qs_residual
-            reward += improvement * 500.0
-
-        if diag.converged and not check_constraints(diag):
-            reward -= 2.0
+        if metrics.constraints_satisfied and not previous_metrics.constraints_satisfied:
+            reward += 3.0
+        if previous_metrics.constraints_satisfied and not metrics.constraints_satisfied:
+            reward -= 3.0
 
-        if not diag.converged:
-            reward -= 1.5
+        if metrics.constraints_satisfied:
+            reward += (previous_metrics.max_elongation - metrics.max_elongation) * 10.0
+        else:
+            reward += (previous_metrics.p1_feasibility - metrics.p1_feasibility) * 5.0
 
         if intent != "submit":
             reward -= 0.1
 
         if intent == "submit":
-            if self._state.best_qs < self._state.initial_qs:
-                ratio = 1.0 - (self._state.best_qs / max(self._state.initial_qs, 1e-9))
-                reward += 5.0 * ratio
-                reward += 1.0 * (self._state.budget_remaining / self._state.budget_total)
+            if metrics.constraints_satisfied and self._state.best_score > self._state.initial_score:
+                improvement_ratio = (self._state.best_score - self._state.initial_score) / max(
+                    1.0 - self._state.initial_score, 1e-6
+                )
+                budget_efficiency = self._state.budget_remaining / self._state.budget_total
+                reward += 5.0 * improvement_ratio + budget_efficiency
             else:
                 reward -= 1.0
-
-        if done and intent != "submit":
-            if self._state.best_qs < self._state.initial_qs:
-                ratio = 1.0 - (self._state.best_qs / max(self._state.initial_qs, 1e-9))
-                reward += 2.0 * ratio
+        elif done:
+            if metrics.constraints_satisfied and self._state.best_score > self._state.initial_score:
+                improvement_ratio = (self._state.best_score - self._state.initial_score) / max(
+                    1.0 - self._state.initial_score, 1e-6
+                )
+                reward += 2.0 * improvement_ratio
+            else:
+                reward -= 0.5
 
         return round(reward, 4)
 
@@ -202,8 +253,7 @@ class StellaratorEnvironment(
 
     def _build_observation(
         self,
-        diag: Diagnostics,
-        satisfied: bool,
+        metrics: EvaluationMetrics,
         action_summary: str,
         reward: float | None = None,
         done: bool = False,
@@ -211,29 +261,30 @@ class StellaratorEnvironment(
         text_lines = [
             action_summary,
             "",
-            f"QS Residual: {diag.qs_residual:.6f} | Best: {self._state.best_qs:.6f}",
-            f"Aspect Ratio: {diag.aspect_ratio:.4f} [4.5, 7.0]",
-            f"Edge Iota: {diag.iota_edge:.4f} [0.3, 0.6]",
-            f"Volume: {diag.volume:.4f} (min 0.5)",
-            f"Magnetic Well: {diag.magnetic_well_depth:.4f}",
-            f"VMEC Converged: {diag.converged}",
-            f"Constraints: {'SATISFIED' if satisfied else 'VIOLATED'}",
-            f"Step: {self._state.step_count} | Budget: {self._state.budget_remaining}/{self._state.budget_total}",
+            f"max_elongation={metrics.max_elongation:.4f} | best_score={self._state.best_score:.6f}",
+            f"aspect_ratio={metrics.aspect_ratio:.4f} (<= {ASPECT_RATIO_MAX:.1f})",
+            f"average_triangularity={metrics.average_triangularity:.4f} (<= {AVERAGE_TRIANGULARITY_MAX:.1f})",
+            f"edge_iota_over_nfp={metrics.edge_iota_over_nfp:.4f} (>= {EDGE_IOTA_OVER_NFP_MIN:.1f})",
+            f"feasibility={metrics.p1_feasibility:.6f} | best_feasibility={self._state.best_feasibility:.6f}",
+            f"vacuum_well={metrics.vacuum_well:.4f}",
+            f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}",
+            f"step={self._state.step_count} | budget={self._state.budget_remaining}/{self._state.budget_total}",
         ]
 
         return StellaratorObservation(
             diagnostics_text="\n".join(text_lines),
-            quasi_symmetry_residual=diag.qs_residual,
-            aspect_ratio=diag.aspect_ratio,
-            rotational_transform_axis=diag.iota_axis,
-            rotational_transform_edge=diag.iota_edge,
-            magnetic_well_depth=diag.magnetic_well_depth,
-            volume=diag.volume,
-            vmec_converged=diag.converged,
+            max_elongation=metrics.max_elongation,
+            aspect_ratio=metrics.aspect_ratio,
+            average_triangularity=metrics.average_triangularity,
+            edge_iota_over_nfp=metrics.edge_iota_over_nfp,
+            p1_score=metrics.p1_score,
+            p1_feasibility=metrics.p1_feasibility,
+            vacuum_well=metrics.vacuum_well,
             step_number=self._state.step_count,
             budget_remaining=self._state.budget_remaining,
-            best_qs_residual=self._state.best_qs,
-            constraints_satisfied=satisfied,
+            best_score=self._state.best_score,
+            best_feasibility=self._state.best_feasibility,
+            constraints_satisfied=metrics.constraints_satisfied,
             target_spec=TARGET_SPEC,
             reward=reward,
             done=done,
@@ -243,20 +294,85 @@ class StellaratorEnvironment(
     # Action summaries
     # ------------------------------------------------------------------
 
-    def _summary_run(self, action: StellaratorAction, diag: Diagnostics) -> str:
-        restart_note = f" ({action.restart} restart)" if action.restart else ""
-        header = f"Applied {action.operator} {action.direction} {action.magnitude}{restart_note}."
-
-        if diag.converged:
-            delta = self._state.prev_qs - diag.qs_residual
-            direction = "improved" if delta > 0 else "worsened" if delta < 0 else "unchanged"
-            return f"{header} VMEC converged. QS {direction}: {self._state.prev_qs:.6f} -> {diag.qs_residual:.6f}."
-        return f"{header} VMEC failed to converge. Change reverted."
+    def _summary_run(self, action: StellaratorAction, metrics: EvaluationMetrics) -> str:
+        assert action.parameter is not None
+        assert action.direction is not None
+        assert action.magnitude is not None
+        previous_metrics = self._last_metrics or metrics
+        if metrics.constraints_satisfied:
+            delta = previous_metrics.max_elongation - metrics.max_elongation
+            objective_summary = (
+                f"max_elongation changed by {delta:+.4f} to {metrics.max_elongation:.4f}."
+            )
+        else:
+            delta = previous_metrics.p1_feasibility - metrics.p1_feasibility
+            objective_summary = (
+                f"feasibility changed by {delta:+.6f} to {metrics.p1_feasibility:.6f}."
+            )
+        return (
+            f"Applied {action.parameter} {action.direction} {action.magnitude}. {objective_summary}"
+        )
 
-    def _summary_submit(self, satisfied: bool) -> str:
-        status = "Constraints satisfied." if satisfied else "Constraints VIOLATED."
-        improvement = self._state.initial_qs - self._state.best_qs
+    def _summary_submit(self, metrics: EvaluationMetrics) -> str:
         return (
-            f"Design submitted. Best QS residual: {self._state.best_qs:.6f} "
-            f"(improved by {improvement:.6f} from initial). {status}"
+            f"Submitted design with best_score={self._state.best_score:.6f}, "
+            f"best_feasibility={self._state.best_feasibility:.6f}, "
+            f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}."
        )
+
+    def _initial_params(self, seed: int | None) -> RotatingEllipseParams:
+        if seed is None:
+            return BASELINE_PARAMS
+        rng = Random(seed)
+        return RotatingEllipseParams(
+            aspect_ratio=self._clamp(
+                BASELINE_PARAMS.aspect_ratio + rng.uniform(-0.1, 0.1),
+                parameter="aspect_ratio",
+            ),
+            elongation=self._clamp(
+                BASELINE_PARAMS.elongation + rng.uniform(-0.1, 0.1),
+                parameter="elongation",
+            ),
+            rotational_transform=self._clamp(
+                BASELINE_PARAMS.rotational_transform + rng.uniform(-0.015, 0.015),
+                parameter="rotational_transform",
+            ),
+        )
+
+    def _apply_action(
+        self,
+        params: RotatingEllipseParams,
+        parameter: str,
+        direction: str,
+        magnitude: str,
+    ) -> RotatingEllipseParams:
+        delta = PARAMETER_DELTAS[parameter][magnitude]
+        signed_delta = delta if direction == "increase" else -delta
+
+        next_values = params.model_dump()
+        next_values[parameter] = self._clamp(
+            next_values[parameter] + signed_delta,
+            parameter=parameter,
+        )
+        return RotatingEllipseParams.model_validate(next_values)
+
+    def _clamp(self, value: float, *, parameter: str) -> float:
+        lower, upper = PARAMETER_RANGES[parameter]
+        return min(max(value, lower), upper)
+
+    def _update_best(self, params: RotatingEllipseParams, metrics: EvaluationMetrics) -> None:
+        current_rank = self._candidate_rank(metrics)
+        best_rank = (
+            (1, self._state.best_score)
+            if self._state.best_feasibility <= FEASIBILITY_TOLERANCE
+            else (0, -self._state.best_feasibility)
+        )
+        if current_rank > best_rank:
+            self._state.best_params = params
+            self._state.best_score = metrics.p1_score
+            self._state.best_feasibility = metrics.p1_feasibility
+
+    def _candidate_rank(self, metrics: EvaluationMetrics) -> tuple[int, float]:
+        if metrics.constraints_satisfied:
+            return (1, metrics.p1_score)
+        return (0, -metrics.p1_feasibility)
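The clamped update in `_apply_action` and `_clamp` is straightforward to check in isolation; a dependency-free sketch using plain dicts in place of `RotatingEllipseParams`:

```python
# Ranges and step sizes copied from the diff above.
PARAMETER_RANGES = {
    "aspect_ratio": (2.0, 8.0),
    "elongation": (1.0, 5.0),
    "rotational_transform": (0.1, 1.0),
}
PARAMETER_DELTAS = {
    "aspect_ratio": {"small": 0.1, "medium": 0.3, "large": 0.8},
    "elongation": {"small": 0.1, "medium": 0.3, "large": 0.8},
    "rotational_transform": {"small": 0.02, "medium": 0.05, "large": 0.15},
}

def clamp(value: float, parameter: str) -> float:
    lower, upper = PARAMETER_RANGES[parameter]
    return min(max(value, lower), upper)

def apply_action(params: dict, parameter: str, direction: str, magnitude: str) -> dict:
    # Signed step, then clamp into the legal range for that parameter.
    delta = PARAMETER_DELTAS[parameter][magnitude]
    signed = delta if direction == "increase" else -delta
    updated = dict(params)  # never mutate the caller's params
    updated[parameter] = clamp(updated[parameter] + signed, parameter)
    return updated

start = {"aspect_ratio": 3.5, "elongation": 1.5, "rotational_transform": 0.4}
stepped = apply_action(start, "rotational_transform", "decrease", "large")
pinned = apply_action(
    {"aspect_ratio": 7.9, "elongation": 1.5, "rotational_transform": 0.4},
    "aspect_ratio", "increase", "large",
)
```

Because every update is clamped, an agent can spam "large" steps without ever driving a parameter out of its legal range; the value simply pins at the boundary.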
server/physics.py CHANGED
@@ -1,141 +1,97 @@
1
  from __future__ import annotations
2
 
3
- import math
4
- import random
5
- from dataclasses import dataclass, field
6
  from typing import Final
7
 
8
- NFP: Final[int] = 2
9
 
10
- BASELINE_COEFFS: Final[dict[str, float]] = {
11
- "rc10": 1.0,
12
- "rc11": 0.12,
13
- "zs11": 0.12,
14
- "zs12": -0.02,
15
- }
16
-
17
- OPTIMAL_COEFFS: Final[dict[str, float]] = {
18
- "rc10": 1.02,
19
- "rc11": 0.135,
20
- "zs11": 0.115,
21
- "zs12": -0.035,
22
- }
23
-
24
- MAGNITUDE_DELTAS: Final[dict[str, float]] = {
25
- "small": 0.005,
26
- "medium": 0.02,
27
- "large": 0.05,
28
- }
29
 
30
 
31
  @dataclass(frozen=True)
32
- class Diagnostics:
33
- qs_residual: float
34
  aspect_ratio: float
35
- iota_axis: float
36
- iota_edge: float
37
- volume: float
38
- magnetic_well_depth: float
39
- converged: bool
40
-
41
-
42
- @dataclass
43
- class PhysicsEngine:
44
- coeffs: dict[str, float] = field(default_factory=lambda: dict(BASELINE_COEFFS))
45
- best_coeffs: dict[str, float] = field(default_factory=lambda: dict(BASELINE_COEFFS))
46
- best_qs: float = float("inf")
47
- _rng: random.Random = field(default_factory=random.Random)
48
-
49
- def reset(self, seed: int | None = None) -> Diagnostics:
50
- self.coeffs = dict(BASELINE_COEFFS)
51
- self._rng = random.Random(seed)
52
- if seed is not None:
53
- for key in self.coeffs:
54
- self.coeffs[key] += self._rng.gauss(0, 0.002)
55
- self.best_coeffs = dict(self.coeffs)
56
- diag = self._compute_diagnostics(converged=True)
57
- self.best_qs = diag.qs_residual
58
- return diag
59
-
60
- def modify_and_run(
61
- self,
62
- operator: str,
63
- direction: str,
64
- magnitude: str,
65
- restart: str,
66
- ) -> Diagnostics:
-        coeff_key = operator.removeprefix("tune_")
-        delta = MAGNITUDE_DELTAS[magnitude]
-        if direction == "decrease":
-            delta = -delta
-
-        prev_value = self.coeffs[coeff_key]
-        self.coeffs[coeff_key] = prev_value + delta
-
-        converged = self._simulate_convergence(magnitude, restart)
-        if not converged:
-            self.coeffs[coeff_key] = prev_value
-            return self._compute_diagnostics(converged=False)
-
-        diag = self._compute_diagnostics(converged=True)
-        if diag.qs_residual < self.best_qs:
-            self.best_qs = diag.qs_residual
-            self.best_coeffs = dict(self.coeffs)
-        return diag
-
-    def restore_best(self) -> Diagnostics:
-        self.coeffs = dict(self.best_coeffs)
-        return self._compute_diagnostics(converged=True)
-
-    def _compute_diagnostics(self, *, converged: bool) -> Diagnostics:
-        rc10 = self.coeffs["rc10"]
-        rc11 = self.coeffs["rc11"]
-        zs11 = self.coeffs["zs11"]
-        zs12 = self.coeffs["zs12"]
-
-        r_minor = math.sqrt(rc11**2 + zs11**2)
-        aspect_ratio = rc10 / max(r_minor, 1e-6)
-        volume = 2.0 * math.pi**2 * rc10 * r_minor**2
-
-        helical_excursion = abs(zs11 / max(abs(rc11), 1e-6))
-        iota_axis = 0.35 + 0.15 * helical_excursion + 0.5 * abs(zs12)
-        shear = 0.04 + 0.02 * abs(rc10 - 1.0)
-        iota_edge = iota_axis + shear
-
-        magnetic_well = 0.02 + 0.01 * (rc11 / max(abs(zs11), 1e-6) - 1.0)
-
-        qs_residual = self._compute_qs_residual() if converged else float("inf")
-
-        return Diagnostics(
-            qs_residual=round(qs_residual, 6),
-            aspect_ratio=round(aspect_ratio, 4),
-            iota_axis=round(iota_axis, 4),
-            iota_edge=round(iota_edge, 4),
-            volume=round(volume, 4),
-            magnetic_well_depth=round(magnetic_well, 4),
-            converged=converged,
-        )
-
-    def _compute_qs_residual(self) -> float:
-        d = {k: self.coeffs[k] - OPTIMAL_COEFFS[k] for k in OPTIMAL_COEFFS}
-        quadratic = (
-            2.0 * d["rc10"] ** 2
-            + 8.0 * d["rc11"] ** 2
-            + 8.0 * d["zs11"] ** 2
-            + 15.0 * d["zs12"] ** 2
-        )
-        cross = 4.0 * d["rc11"] * d["zs11"] - 3.0 * d["rc10"] * d["zs12"]
-        noise = self._rng.gauss(0, 0.0003)
-        return max(quadratic + cross + 0.002 + noise, 0.001)
-
-    def _simulate_convergence(self, magnitude: str, restart: str) -> bool:
-        fail_prob = {"small": 0.02, "medium": 0.08, "large": 0.20}[magnitude]
-        if restart == "hot":
-            fail_prob *= 0.5
-        for key, val in self.coeffs.items():
-            deviation = abs(val - BASELINE_COEFFS[key])
-            if deviation > 0.1:
-                fail_prob += 0.15
-            elif deviation > 0.05:
-                fail_prob += 0.05
-        return self._rng.random() > min(fail_prob, 0.8)
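The deleted `_compute_qs_residual` was a positive-definite quadratic form in the deviations from `OPTIMAL_COEFFS`. Working directly in deviation space (so no concrete optimal coefficient values are assumed), a noise-free sketch shows the 0.002 floor at zero deviation and how the cross terms couple `rc11`/`zs11` and `rc10`/`zs12`:

```python
# Noise-free sketch of the removed QS-residual quadratic form.
# Inputs are deviations d = coeffs - OPTIMAL_COEFFS, so no concrete
# optimal coefficient values are assumed here.
def qs_residual(d: dict[str, float]) -> float:
    quadratic = (
        2.0 * d["rc10"] ** 2
        + 8.0 * d["rc11"] ** 2
        + 8.0 * d["zs11"] ** 2
        + 15.0 * d["zs12"] ** 2
    )
    cross = 4.0 * d["rc11"] * d["zs11"] - 3.0 * d["rc10"] * d["zs12"]
    return max(quadratic + cross + 0.002, 0.001)

zero = {"rc10": 0.0, "rc11": 0.0, "zs11": 0.0, "zs12": 0.0}
print(qs_residual(zero))  # 0.002: the irreducible floor at the optimum
# Opposite-sign rc11/zs11 deviations let the cross term subtract,
# but the diagonal weights dominate, so the residual stays above the floor.
print(qs_residual(dict(zero, rc11=0.05, zs11=-0.05)))
```

The cross-term magnitudes (4 and 3) are small relative to the diagonal weights (8·8 > 2² and 2·15 > 1.5²), so the form stays positive definite and the simulated optimum is unique.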
 
 from __future__ import annotations

+from dataclasses import dataclass

 from typing import Final

+from fusion_lab.models import RotatingEllipseParams
+
+ASPECT_RATIO_MAX: Final[float] = 4.0
+AVERAGE_TRIANGULARITY_MAX: Final[float] = -0.5
+EDGE_IOTA_OVER_NFP_MIN: Final[float] = 0.3
+FEASIBILITY_TOLERANCE: Final[float] = 0.01


 @dataclass(frozen=True)
+class EvaluationMetrics:
+    max_elongation: float
     aspect_ratio: float
+    average_triangularity: float
+    edge_iota_over_nfp: float
+    p1_score: float
+    p1_feasibility: float
+    constraints_satisfied: bool
+    vacuum_well: float
+
+
+def _normalized_violation(value: float, *, limit: float, direction: str) -> float:
+    if direction == "max":
+        return max((value - limit) / max(abs(limit), 1e-6), 0.0)
+    return max((limit - value) / max(abs(limit), 1e-6), 0.0)
+
+
+def evaluate_params(params: RotatingEllipseParams) -> EvaluationMetrics:
+    aspect_ratio = round(params.aspect_ratio, 4)
+    average_triangularity = round(
+        -0.2
+        - 0.35 * (params.elongation - 1.0)
+        - 0.2 * max(0.0, 0.35 - params.rotational_transform),
+        4,
+    )
+    edge_iota_over_nfp = round(
+        params.rotational_transform
+        - 0.05 * max(0.0, params.aspect_ratio - ASPECT_RATIO_MAX)
+        + 0.03 * (params.elongation - 1.5),
+        4,
+    )
+    max_elongation = round(
+        params.elongation
+        + 0.18 * (params.aspect_ratio - 3.4) ** 2
+        + 0.8 * abs(params.rotational_transform - 0.42)
+        + 0.2,
+        4,
+    )
+    vacuum_well = round(
+        0.03
+        + 0.02 * (4.0 - min(params.aspect_ratio, 4.0))
+        + 0.015 * (params.rotational_transform - 0.3)
+        - 0.01 * abs(params.elongation - 1.7),
+        4,
+    )
+
+    aspect_ratio_violation = _normalized_violation(
+        aspect_ratio,
+        limit=ASPECT_RATIO_MAX,
+        direction="max",
+    )
+    triangularity_violation = _normalized_violation(
+        average_triangularity,
+        limit=AVERAGE_TRIANGULARITY_MAX,
+        direction="max",
+    )
+    iota_violation = _normalized_violation(
+        edge_iota_over_nfp,
+        limit=EDGE_IOTA_OVER_NFP_MIN,
+        direction="min",
+    )
+
+    p1_feasibility = round(
+        max(aspect_ratio_violation, triangularity_violation, iota_violation),
+        6,
+    )
+    constraints_satisfied = p1_feasibility <= FEASIBILITY_TOLERANCE
+    p1_score = (
+        round(1.0 - min(max((max_elongation - 1.0) / 9.0, 0.0), 1.0), 6)
+        if constraints_satisfied
+        else 0.0
+    )
+
+    return EvaluationMetrics(
+        max_elongation=max_elongation,
+        aspect_ratio=aspect_ratio,
+        average_triangularity=average_triangularity,
+        edge_iota_over_nfp=edge_iota_over_nfp,
+        p1_score=p1_score,
+        p1_feasibility=p1_feasibility,
+        constraints_satisfied=constraints_satisfied,
+        vacuum_well=vacuum_well,
+    )
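For reference, the added `evaluate_params` contract can be exercised end to end with a small stand-in for `RotatingEllipseParams` (a hypothetical three-field dataclass; the real model may carry more state). This sketch replicates the diff's metric formulas, with the `round` calls and the shared `_normalized_violation` helper inlined for brevity:

```python
from dataclasses import dataclass

# Hypothetical stand-in for fusion_lab.models.RotatingEllipseParams;
# only the three fields read by evaluate_params are modeled here.
@dataclass(frozen=True)
class Params:
    aspect_ratio: float
    elongation: float
    rotational_transform: float

ASPECT_RATIO_MAX = 4.0
AVERAGE_TRIANGULARITY_MAX = -0.5
EDGE_IOTA_OVER_NFP_MIN = 0.3
FEASIBILITY_TOLERANCE = 0.01

def evaluate(p: Params) -> tuple[float, float, bool]:
    # Surrogate metrics, as in the diff (rounding omitted for brevity).
    avg_tri = (-0.2 - 0.35 * (p.elongation - 1.0)
               - 0.2 * max(0.0, 0.35 - p.rotational_transform))
    edge_iota = (p.rotational_transform
                 - 0.05 * max(0.0, p.aspect_ratio - ASPECT_RATIO_MAX)
                 + 0.03 * (p.elongation - 1.5))
    max_elong = (p.elongation + 0.18 * (p.aspect_ratio - 3.4) ** 2
                 + 0.8 * abs(p.rotational_transform - 0.42) + 0.2)
    # The worst normalized constraint violation drives feasibility.
    feasibility = max(
        max((p.aspect_ratio - ASPECT_RATIO_MAX) / ASPECT_RATIO_MAX, 0.0),
        max((avg_tri - AVERAGE_TRIANGULARITY_MAX) / abs(AVERAGE_TRIANGULARITY_MAX), 0.0),
        max((EDGE_IOTA_OVER_NFP_MIN - edge_iota) / EDGE_IOTA_OVER_NFP_MIN, 0.0),
    )
    ok = feasibility <= FEASIBILITY_TOLERANCE
    # Score rewards low surrogate max elongation, but only when feasible.
    score = 1.0 - min(max((max_elong - 1.0) / 9.0, 0.0), 1.0) if ok else 0.0
    return score, feasibility, ok

score, feas, ok = evaluate(Params(3.4, 1.9, 0.42))    # feasible: score ≈ 0.878
score2, feas2, ok2 = evaluate(Params(4.5, 1.2, 0.42)) # infeasible: score hard-zeroes
```

A feasible design earns a score that decreases linearly with surrogate max elongation; any constraint violation above the 1% tolerance zeroes the score outright rather than shading it, matching the `p1_score` gating in the diff.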
uv.lock CHANGED
The diff for this file is too large to render.