CreativeEngineer committed on
Commit eb446cf
1 Parent(s): 5271cce

fix: preserve recovery signal after VMEC failure
docs/P1_ENV_CONTRACT_V1.md CHANGED
@@ -201,6 +201,9 @@ Target structure:
 - reward lower `max_elongation`
 - non-submit step:
   - small step cost
+- recovery after a failed evaluation:
+  - modest positive signal for returning to a valid verifier result
+  - do not compute this from the failed sentinel feasibility value
 - explicit `submit`:
   - better than passive budget exhaustion when the design is improved
 
@@ -248,6 +251,7 @@ If the answer is no, fix:
 - the boundary family
 - the step magnitudes
 - the seed pool
+- the observation semantics around low-fi vs high-fi best-state reporting
 
 before tuning reward further
 
@@ -260,9 +264,10 @@ before tuning reward further
 5. Update the task summary and public action description in [server/app.py](../server/app.py).
 6. Add explicit VMEC failure semantics in [server/environment.py](../server/environment.py).
 7. Run a small measured sweep to choose ranges, deltas, and reset seeds.
-8. Freeze 1-2 repaired low-dimensional fixtures.
-9. Run manual playtesting.
-10. Refresh the heuristic baseline only after that evidence exists.
+8. Verify that observation semantics are human-readable and that low-fi versus high-fi best-state reporting is explicit.
+9. Freeze 1-2 repaired low-dimensional fixtures.
+10. Run manual playtesting.
+11. Refresh the heuristic baseline only after that evidence exists.
 
 ## Out of Scope
 
docs/P1_PARAMETERIZATION_DEEPDIVE.md CHANGED
@@ -216,10 +216,10 @@ is approximately `rot_transform ≥ 2.0` combined with `tri_scale ≥ 0.7`.
 - `average_triangularity ≤ -0.5`
 - `edge_rotational_transform / n_field_periods ≥ 0.3`
 
-### What needs to change
+### Current implementation
 
-- `evaluate_params` currently takes `RotatingEllipseParams` and calls
-  `generate_rotating_ellipse` directly. It should be split into:
+- The old `evaluate_params` helper has been retired.
+- The runtime is now split into:
   - `build_boundary_from_params(...)` → `SurfaceRZFourier` (handles mode expansion + tri_scale injection)
   - `evaluate_boundary(boundary, fidelity)` → `EvaluationMetrics` (pure evaluation, no parameterization knowledge)
 
@@ -234,6 +234,7 @@ Feasibility transition: ±3.0 on crossing the feasible/infeasible boundary
 Dual-track step shaping:
   feasible + feasible → (prev_elongation - curr_elongation) * 10.0
   otherwise → (prev_feasibility - curr_feasibility) * 5.0
+Post-failure recovery: +1.0 on the first successful step after a failed evaluation
 Per-step cost: -0.1 for non-submit actions
 Terminal bonus (submit): 5.0 * improvement_ratio + budget_efficiency
 Terminal bonus (exhaust): 2.0 * improvement_ratio
@@ -242,10 +243,15 @@ Not improved penalty: -1.0 (submit) / -0.5 (exhaust)
 
 ### Assessment
 
-**The reward is well-designed and should stay unchanged.** It only uses two scalars
+**The reward is still simple and should stay close to unchanged.** It mostly uses two scalars
 from the verifier: `feasibility` and `objective (max_elongation)`. These are
 problem-agnostic quantities that `GeometricalProblem` provides for any problem variant.
 
+One small exception is now explicit: recovery from a failed VMEC evaluation gets a
+modest fixed bonus instead of comparing against the failure sentinel. The previous
+behavior could erase recovery signal by comparing a successful step against itself,
+while a naive sentinel comparison would explode the reward into an unbounded spike.
+
 Things the reward correctly avoids:
 - Per-constraint shaping (would overfit to P1's specific constraint structure)
 - Tolerance-exploit bonus (would overfit to the 1% evaluator quirk)
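The shaping rules above can be sketched as one standalone function. Everything here is illustrative rather than the project's actual code: the `Metrics` type, the `FAILURE_PENALTY` value, and the simplification that the recovery step receives only the fixed bonus (the real environment also compares against the last successful metrics). Terminal bonuses are omitted.

```python
from dataclasses import dataclass

FAILURE_PENALTY = -2.0  # illustrative value; the project defines its own constant


@dataclass
class Metrics:
    # Minimal stand-in for the verifier's evaluation result.
    evaluation_failed: bool
    feasible: bool
    feasibility: float
    max_elongation: float


def step_reward(prev: Metrics, curr: Metrics, *, is_submit: bool) -> float:
    """Sketch of the per-step shaping described above (terminal bonuses omitted)."""
    if curr.evaluation_failed:
        return FAILURE_PENALTY
    if prev.evaluation_failed:
        # Post-failure recovery: a fixed +1.0 bonus. Deriving a delta from the
        # failed step's sentinel feasibility would produce an unbounded spike.
        reward = 1.0
    elif prev.feasible and curr.feasible:
        # Dual-track shaping, feasible track: reward elongation reduction.
        reward = (prev.max_elongation - curr.max_elongation) * 10.0
    else:
        # Infeasible track: reward feasibility progress.
        reward = (prev.feasibility - curr.feasibility) * 5.0
    if not is_submit:
        reward -= 0.1  # per-step cost for non-submit actions
    return reward
```

Note how the recovery branch never reads `prev.feasibility` when the previous evaluation failed, which is exactly the sentinel hazard the assessment calls out.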
server/environment.py CHANGED
@@ -10,6 +10,7 @@ from fusion_lab.models import (
     StellaratorObservation,
     StellaratorState,
 )
+from server.contract import N_FIELD_PERIODS, RESET_SEEDS
 from server.physics import (
     ASPECT_RATIO_MAX,
     AVERAGE_TRIANGULARITY_MAX,
@@ -21,7 +22,6 @@ from server.physics import (
 )
 
 BUDGET: Final[int] = 6
-N_FIELD_PERIODS: Final[int] = 3
 
 PARAMETER_RANGES: Final[dict[str, tuple[float, float]]] = {
     "aspect_ratio": (3.2, 3.8),
@@ -37,27 +37,6 @@ PARAMETER_DELTAS: Final[dict[str, dict[str, float]]] = {
     "triangularity_scale": {"small": 0.02, "medium": 0.05, "large": 0.1},
 }
 
-RESET_SEEDS: Final[tuple[LowDimBoundaryParams, ...]] = (
-    LowDimBoundaryParams(
-        aspect_ratio=3.6,
-        elongation=1.4,
-        rotational_transform=1.5,
-        triangularity_scale=0.55,
-    ),
-    LowDimBoundaryParams(
-        aspect_ratio=3.4,
-        elongation=1.4,
-        rotational_transform=1.6,
-        triangularity_scale=0.55,
-    ),
-    LowDimBoundaryParams(
-        aspect_ratio=3.8,
-        elongation=1.4,
-        rotational_transform=1.5,
-        triangularity_scale=0.55,
-    ),
-)
-
 TARGET_SPEC: Final[str] = (
     "Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
     "from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
@@ -240,6 +219,7 @@ class StellaratorEnvironment(
         done: bool,
         initial_reference_score: float | None = None,
     ) -> float:
+        recovered_from_failure = self._recovered_from_failed_evaluation(metrics)
         previous_metrics = self._reference_metrics(metrics)
         if metrics.evaluation_failed:
             reward = FAILURE_PENALTY
@@ -266,6 +246,9 @@ class StellaratorEnvironment(
         if intent != "submit":
            reward -= 0.1
 
+        if recovered_from_failure:
+            reward += 1.0
+
         if intent == "submit" or done:
             base_score = (
                 initial_reference_score
@@ -347,6 +330,13 @@ class StellaratorEnvironment(
                 f"Low-fidelity evaluation failed: {metrics.failure_reason}"
             )
 
+        if self._recovered_from_failed_evaluation(metrics):
+            return (
+                f"Applied {action.parameter} {action.direction} {action.magnitude}. "
+                "Low-fidelity evaluation recovered from the previous failed evaluation. "
+                f"feasibility={metrics.p1_feasibility:.6f}."
+            )
+
         previous_metrics = self._reference_metrics(metrics)
         if metrics.constraints_satisfied and previous_metrics.constraints_satisfied:
             delta = previous_metrics.max_elongation - metrics.max_elongation
@@ -430,6 +420,13 @@ class StellaratorEnvironment(
             return self._last_successful_metrics
         return fallback
 
+    def _recovered_from_failed_evaluation(self, metrics: EvaluationMetrics) -> bool:
+        return (
+            not metrics.evaluation_failed
+            and self._last_metrics is not None
+            and self._last_metrics.evaluation_failed
+        )
+
     def _initial_high_fidelity_metrics(self) -> EvaluationMetrics:
         if self._state.initial_high_fidelity_score is not None:
             return self._evaluate_params(self._state.initial_params, fidelity="high")