# P1 Environment Contract V1

**Role:** Live technical contract and single source of truth (SSOT) for the current implementation phase
**Planning dependency:** [`FUSION_DESIGN_LAB_PLAN_V2.md`](./FUSION_DESIGN_LAB_PLAN_V2.md)
**Evidence dependency:** [`P1_PARAMETERIZATION_DEEPDIVE.md`](P1_PARAMETERIZATION_DEEPDIVE.md)

## 1. Scope

This document defines the live technical contract for:

- [`server/physics.py`](../server/physics.py)
- [`fusion_lab/models.py`](../fusion_lab/models.py)
- [`server/environment.py`](../server/environment.py)
- [`server/app.py`](../server/app.py)

If the observation schema, action schema, episode flow, terminal conditions, or reward semantics change, update this file in the same task.

## 2. Design Split

Keep three layers separate:

1. boundary builder
2. official verifier
3. environment

Boundary builder owns:

- the repaired low-dimensional family
- rotating-ellipse seed generation
- explicit triangularity control injection

Official verifier owns:

- boundary in, metrics out
- official `P1` feasibility semantics
- objective direction and score ordering
- low-fidelity live evaluation mode
- optional higher-fidelity offline validation mode
- explicit failure results when VMEC or forward-model evaluation fails

Environment owns:

- reset pool
- discrete actions
- episode budget
- best-state tracking
- reward shaping

## 3. Boundary Family

The historical 3-knob upstream rotating-ellipse family is not the live contract.

The live controllable knobs are:

- `aspect_ratio`
- `elongation`
- `rotational_transform`
- `triangularity_scale`

Rules:

- stay low-dimensional and human-playable
- treat the current family as rotating-ellipse-derived, not plain upstream rotating ellipse
- the coarse measured sweep is recorded, but changes to the reset seeds or the budget should still wait for paired high-fidelity fixture checks

## 4. Action Contract

`intent` is one of:

- `run`
- `submit`
- `restore_best`

For `run`, the action also includes:

- `parameter`: one of `aspect_ratio | elongation | rotational_transform | triangularity_scale`
- `direction`: `increase | decrease`
- `magnitude`: `small | medium | large`

Constraints:

- keep the discrete interaction style
- do not expose the full Fourier action space as the primary environment
- do not use action complexity to compensate for missing clarity elsewhere
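
A minimal sketch of how the discrete action schema above could be validated. The names `ActionSketch` and `validate_action` are hypothetical and are not taken from `fusion_lab/models.py`; ignoring run-only fields on non-`run` intents is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

INTENTS = ("run", "submit", "restore_best")
PARAMETERS = (
    "aspect_ratio",
    "elongation",
    "rotational_transform",
    "triangularity_scale",
)
DIRECTIONS = ("increase", "decrease")
MAGNITUDES = ("small", "medium", "large")


@dataclass(frozen=True)
class ActionSketch:  # hypothetical name, not the repository model
    intent: str
    parameter: Optional[str] = None
    direction: Optional[str] = None
    magnitude: Optional[str] = None


def validate_action(action: ActionSketch) -> None:
    """Raise ValueError when an action falls outside the discrete contract."""
    if action.intent not in INTENTS:
        raise ValueError(f"unknown intent: {action.intent!r}")
    if action.intent != "run":
        # Assumption: run-only fields are simply ignored for other intents.
        return
    if action.parameter not in PARAMETERS:
        raise ValueError(f"unknown parameter: {action.parameter!r}")
    if action.direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {action.direction!r}")
    if action.magnitude not in MAGNITUDES:
        raise ValueError(f"unknown magnitude: {action.magnitude!r}")


# Both of these pass validation without raising.
validate_action(ActionSketch("run", "elongation", "decrease", "small"))
validate_action(ActionSketch("submit"))
```

Keeping the allowed values in flat tuples mirrors the intent of the constraints: the whole action space stays small enough to read at a glance.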

## 5. Observation Contract

The observation must stay metric-centered and human-readable.

Required fields:

- `max_elongation`
- `aspect_ratio`
- `average_triangularity`
- `edge_iota_over_nfp`
- `aspect_ratio_violation`
- `triangularity_violation`
- `iota_violation`
- `dominant_constraint`
- `p1_feasibility`
- `p1_score`
- `constraints_satisfied`
- `vacuum_well`
- `evaluation_fidelity`
- `evaluation_failed`
- `failure_reason`
- `step_number`
- `budget_remaining`
- `no_progress_steps`
- `best_low_fidelity_score`
- `best_low_fidelity_feasibility`
- `target_spec`
- `diagnostics_text`
- `reward_breakdown`
- `action_monitor`
- `episode_total_reward`
- `trajectory_summary`

Interpretation rules:

- live environment metrics must be labeled as low-fidelity
- best-state reporting should reflect the single live reward surface
- the observation must be understandable without hidden state
- normalized constraint-violation telemetry must follow the official `P1` constraint scales
- the dominant active constraint must be visible so a human can explain repair-phase rewards
- reward telemetry must expose which bonuses, penalties, and shaping terms contributed to the scalar reward
- action telemetry must expose parameter values before and after the action, including clamped, no-op, and repeat-state moves
- anti-stagnation state that can change reward must be visible in structured observation fields, not only free text
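
The action-telemetry rule can be illustrated with a small sketch that records the parameter value before and after a move and flags clamped and no-op steps. The function name, the dictionary keys, and the bounds used here are all hypothetical; the real range bounds are still listed as open measurements.

```python
def apply_step(value: float, delta: float, lower: float, upper: float) -> dict:
    """Apply a signed delta, clamp to [lower, upper], and report what happened."""
    requested = value + delta
    applied = min(max(requested, lower), upper)
    return {
        "before": value,
        "requested": requested,
        "after": applied,
        "clamped": applied != requested,  # the bound cut the move short
        "no_op": applied == value,        # the state did not change at all
    }


# Hypothetical bounds of [1.0, 2.0], chosen only for illustration.
partly_clamped = apply_step(1.9, 0.2, lower=1.0, upper=2.0)
fully_blocked = apply_step(2.0, 0.2, lower=1.0, upper=2.0)
```

A move that starts on the bound is both clamped and a no-op, which is the case a human reader most needs to see when explaining a step that cost budget but changed nothing.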

## 6. Episode Flow

1. Reset from one frozen repaired-family seed or a small frozen seed set.
2. Evaluate the initial state with low fidelity and return the first observation.
3. On `run`, perturb one controllable parameter and re-evaluate with low fidelity.
4. On `restore_best`, revert to the best known low-fidelity state, re-evaluate, and consume budget.
5. On `submit`, re-evaluate the current state with low fidelity, consume budget, and end the episode.
6. End the episode on `submit` or budget exhaustion.

Failure semantics:

- failed evaluations still consume budget
- failed evaluations produce visible failure observations
- failed evaluations apply a documented penalty
- the environment must not silently convert failures into success paths
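
The flow and failure semantics above can be sketched as a small loop. `EpisodeSketch` is a hypothetical stand-in for the environment, the `evaluate` callable is a stub for the low-fidelity verifier that returns a feasibility value (lower is better) or `None` on failure, and tracking the best state by feasibility alone is a simplification. Whether the initial evaluation consumes budget is not stated above; this sketch assumes it does not.

```python
from typing import Callable, Dict, Optional

Params = Dict[str, float]


class EpisodeSketch:
    """Hypothetical episode loop, not the server/environment.py implementation."""

    def __init__(self, evaluate: Callable[[Params], Optional[float]],
                 seed: Params, budget: int = 6) -> None:
        self.evaluate = evaluate
        self.params = dict(seed)
        self.budget_remaining = budget
        self.done = False
        self.best_params = dict(seed)
        # Assumption: the initial evaluation does not consume budget.
        self.best_feasibility = evaluate(self.params)

    def step(self, intent: str, parameter: Optional[str] = None,
             delta: float = 0.0) -> dict:
        if self.done:
            raise RuntimeError("episode already ended")
        if intent == "run":
            self.params[parameter] += delta
        elif intent == "restore_best":
            self.params = dict(self.best_params)
        elif intent != "submit":
            raise ValueError(f"unknown intent: {intent!r}")

        # Every intent re-evaluates and consumes budget, even on failure.
        self.budget_remaining -= 1
        feasibility = self.evaluate(self.params)
        failed = feasibility is None
        if not failed and (self.best_feasibility is None
                           or feasibility < self.best_feasibility):
            self.best_feasibility = feasibility
            self.best_params = dict(self.params)

        self.done = intent == "submit" or self.budget_remaining <= 0
        return {
            "evaluation_failed": failed,  # failures stay visible
            "p1_feasibility": feasibility,
            "budget_remaining": self.budget_remaining,
            "done": self.done,
        }
```

The failure penalty itself is left out here because it belongs to the reward, not to the flow.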

## 7. Terminal Contract

At termination, the environment must provide:

- final best design metrics
- final feasibility status
- total reward
- a short human-readable trajectory summary
- the final reward breakdown and action telemetry for the terminal step

Terminal reporting rules:

- keep submit-time reporting on the same live low-fidelity truth surface as the rest of the episode
- keep any higher-fidelity validation artifacts explicitly outside the live environment observation contract

## 8. Verifier Contract

The verifier of record is `constellaration.problems.GeometricalProblem`.

The implementation must preserve:

- objective direction
- constraint direction
- feasibility semantics
- score ordering

The verifier should stay boundary-based:

- `build_boundary_from_params(...) -> SurfaceRZFourier`
- `evaluate_boundary(boundary, fidelity) -> EvaluationMetrics`

Do not treat parameterization-specific logic as verifier truth.

VMEC preset mapping:

- `run`, `restore_best`, and `submit` use the `low_fidelity` VMEC preset (~0.6 s, tolerant convergence)
- higher-fidelity validation uses the `from_boundary_resolution` VMEC preset (~4 s, adaptive convergence matching boundary Fourier resolution) outside the live environment loop
- the `high_fidelity` VMEC preset (minimum 10 modes, strict convergence) is not used because it does not converge on the current `mpol=3, ntor=3` boundaries
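
As a sketch, the mapping above can be written as a lookup table. The preset strings are the labels used in this document; the constant and function names are hypothetical, and how a label is turned into actual VMEC settings is not shown or assumed here.

```python
# Every live intent uses the same low-fidelity preset.
LIVE_INTENT_PRESET = {
    "run": "low_fidelity",
    "restore_best": "low_fidelity",
    "submit": "low_fidelity",
}

# Used only by offline validation, never inside the live loop.
OFFLINE_VALIDATION_PRESET = "from_boundary_resolution"


def preset_for_intent(intent: str) -> str:
    """Return the preset label for a live intent, rejecting unknown intents."""
    try:
        return LIVE_INTENT_PRESET[intent]
    except KeyError:
        raise ValueError(f"unknown intent: {intent!r}") from None
```

Because `submit` maps to the same label as `run`, there is no code path by which the live loop could select a higher-fidelity preset by accident.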

Training and evaluation rule:

- use the live low-fidelity environment contract, including explicit `submit`, as the RL surface
- the standard repository notebook and `training/llm_rollout.py` workflows should stay aligned to that same action and reward contract
- keep higher-fidelity validation in offline scripts, paired fixture checks, and final evidence artifacts
- do not reintroduce a separate high-fidelity submit path into the live environment unless the contract is deliberately redefined

## 9. Reward V2

`Reward V2` keeps the verifier-native structure from `Reward V1` and adds a small amount of
trajectory-aware shaping. `Reward V1` fixed the main coarse-signal pathology from `Reward V0`:
pure `Δ official_feasibility` was too coarse because official feasibility is a max over
normalized constraint violations, so useful repair steps on non-dominant constraints could be
nearly invisible to the reward.
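
A small worked example of the coarse-signal problem, using made-up violation values. Feasibility is taken as the max over normalized violations, as described above; the sum is used only as one illustrative per-constraint signal and is not claimed to be the form `Reward V1` uses.

```python
def official_feasibility(violations: dict) -> float:
    """Max over normalized constraint violations."""
    return max(violations.values())


# Illustrative numbers only: triangularity is the dominant constraint.
before = {"aspect_ratio": 0.10, "triangularity": 0.40, "edge_iota": 0.30}
# A repair step that improves a non-dominant constraint.
after = {**before, "edge_iota": 0.05}

# The max is unchanged, so a pure feasibility delta sees no progress.
delta_feasibility = official_feasibility(before) - official_feasibility(after)

# A per-constraint signal does register the repair.
delta_sum = sum(before.values()) - sum(after.values())
```

Here `delta_feasibility` is exactly zero while `delta_sum` is positive, which is the gap that per-constraint shaping is meant to close.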

The remaining `Reward V1` pathology was not verifier mismatch. It was short-horizon shaping:

- the agent got no extra signal for setting a new best infeasible point
- near-feasible progress below `0.02` had no milestone signal unless it crossed the full feasible boundary
- feasible improvements only saw step-to-step objective deltas, not "new best feasible score" progress
- repeated local loops or three-step stagnation had no explicit penalty beyond normal step cost

Target behavior:

- infeasible to feasible crossing gets a clear positive bonus
- feasible to infeasible regression gets a clear penalty
- when both states are infeasible, reduced official feasibility violation should still help
- on low-fidelity `run` steps, setting a new best infeasible feasibility should help
- entering the near-feasible corridor (`p1_feasibility <= 0.02`) should get a small bounded bonus
- when both states are infeasible, reduced normalized triangularity violation should help the most
- when both states are infeasible, reduced normalized aspect-ratio and edge-iota violations should also help
- when both states are feasible, lower `max_elongation` should help
- on low-fidelity `run` steps, beating the previous best feasible score should help
- larger `run` actions should pay a larger step cost than smaller `run` actions
- `restore_best` should keep a flat non-submit step cost
- repeated local revisits without improvement should pay a small penalty
- three non-improving steps in a row should pay a small stagnation penalty
- `submit` should be better than passive exhaustion when the design is genuinely improved
- recovery after a failed evaluation may receive a modest bounded bonus
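
A partial sketch of how some of these target behaviors could combine into one scalar with a visible breakdown. Every magnitude below is a hypothetical placeholder, the function and key names are invented for illustration, and the trigger `no_progress_steps >= 3` is an assumption. Per-constraint shaping, the feasible-side `max_elongation` term, new-best bonuses, revisit penalties, `submit` handling, and failure recovery are omitted.

```python
from typing import Dict, Optional

# Hypothetical placeholder magnitudes, not the tuned values.
CROSSING_BONUS = 1.0
REGRESSION_PENALTY = -1.0
NEAR_FEASIBLE_BONUS = 0.1
NEAR_FEASIBLE_THRESHOLD = 0.02  # corridor edge named in this section
RUN_STEP_COST = {"small": -0.01, "medium": -0.02, "large": -0.03}
RESTORE_STEP_COST = -0.01
STAGNATION_PENALTY = -0.05
STAGNATION_STEPS = 3


def sketch_reward(prev_feasible: bool, curr_feasible: bool,
                  prev_feasibility: float, curr_feasibility: float,
                  intent: str, magnitude: Optional[str] = None,
                  no_progress_steps: int = 0) -> Dict[str, float]:
    """Return each contributing term plus their sum under the key 'total'."""
    terms: Dict[str, float] = {}
    if not prev_feasible and curr_feasible:
        terms["feasibility_crossing"] = CROSSING_BONUS
    elif prev_feasible and not curr_feasible:
        terms["feasibility_regression"] = REGRESSION_PENALTY
    elif not prev_feasible and not curr_feasible:
        # Reduced violation helps while both states are infeasible.
        terms["violation_reduction"] = prev_feasibility - curr_feasibility
        if prev_feasibility > NEAR_FEASIBLE_THRESHOLD >= curr_feasibility:
            terms["near_feasible_bonus"] = NEAR_FEASIBLE_BONUS

    if intent == "run":
        terms["step_cost"] = RUN_STEP_COST[magnitude]  # larger moves cost more
    elif intent == "restore_best":
        terms["step_cost"] = RESTORE_STEP_COST  # flat cost

    if no_progress_steps >= STAGNATION_STEPS:
        terms["stagnation"] = STAGNATION_PENALTY

    terms["total"] = sum(terms.values())
    return terms
```

Returning the named terms alongside the total keeps the scalar explainable, in line with the `reward_breakdown` observation field.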

Rules:

- keep reward scalar and verifier-driven
- keep the infeasible shaping tied to official normalized constraint violations, not family-name priors
- do not add family-specific reward shaping from `scadena`, `CreativeEngineer`, `Samet`, or `egodos`
- do not use reward complexity to compensate for blocked parameterization, poor seeds, or unclear observations

## 10. Reset and Fixture Policy

Reset policy:

- start with exact frozen seeds
- keep `n_field_periods = 3`
- prefer a small reproducible seed set

Each seed should be:

- reproducible
- near enough to the feasible boundary to make the budget meaningful
- not already solved

Fixture policy:

- track good, boundary, and clearly bad references
- use fixtures for verifier and reward sanity checks
- do not turn fixture mining into a separate broad project

## 11. Open Measurements

These items remain open until measured on the repaired family:

- exact repaired-family range bounds
- exact `triangularity_scale` deltas
- exact `rotational_transform` bounds
- exact reset seed pool
- whether the budget should stay at 6 or change

## 12. Out of Scope

- porting the old `ai-sci-feasible-designs` harness
- broad Fourier-mode action space as the main environment
- complicated reward shaping before playtest evidence
- a wider task family than the single stellarator environment