CreativeEngineer committed on
Commit eb446cf
1 Parent(s): 5271cce

fix: preserve recovery signal after VMEC failure
docs/P1_ENV_CONTRACT_V1.md CHANGED
@@ -201,6 +201,9 @@ Target structure:
 - reward lower `max_elongation`
 - non-submit step:
   - small step cost
+- recovery after a failed evaluation:
+  - modest positive signal for returning to a valid verifier result
+  - do not compute this from the failed sentinel feasibility value
 - explicit `submit`:
   - better than passive budget exhaustion when the design is improved
 
@@ -248,6 +251,7 @@ If the answer is no, fix:
 - the boundary family
 - the step magnitudes
 - the seed pool
+- the observation semantics around low-fi vs high-fi best-state reporting
 
 before tuning reward further
 
@@ -260,9 +264,10 @@ before tuning reward further
 5. Update the task summary and public action description in [server/app.py](../server/app.py).
 6. Add explicit VMEC failure semantics in [server/environment.py](../server/environment.py).
 7. Run a small measured sweep to choose ranges, deltas, and reset seeds.
-8. Freeze 1-2 repaired low-dimensional fixtures.
-9. Run manual playtesting.
-10. Refresh the heuristic baseline only after that evidence exists.
+8. Verify that observation semantics are human-readable and that low-fi versus high-fi best-state reporting is explicit.
+9. Freeze 1-2 repaired low-dimensional fixtures.
+10. Run manual playtesting.
+11. Refresh the heuristic baseline only after that evidence exists.
 
 ## Out of Scope
 
docs/P1_PARAMETERIZATION_DEEPDIVE.md CHANGED
@@ -216,10 +216,10 @@ is approximately `rot_transform ≥ 2.0` combined with `tri_scale ≥ 0.7`.
 - `average_triangularity ≤ -0.5`
 - `edge_rotational_transform / n_field_periods ≥ 0.3`
 
-### What needs to change
+### Current implementation
 
-- `evaluate_params` currently takes `RotatingEllipseParams` and calls
-  `generate_rotating_ellipse` directly. It should be split into:
+- The old `evaluate_params` helper has been retired.
+- The runtime is now split into:
   - `build_boundary_from_params(...)` → `SurfaceRZFourier` (handles mode expansion + tri_scale injection)
   - `evaluate_boundary(boundary, fidelity)` → `EvaluationMetrics` (pure evaluation, no parameterization knowledge)
 
@@ -234,6 +234,7 @@ Feasibility transition: ±3.0 on crossing the feasible/infeasible boundary
 Dual-track step shaping:
   feasible + feasible → (prev_elongation - curr_elongation) * 10.0
   otherwise → (prev_feasibility - curr_feasibility) * 5.0
+Post-failure recovery: +1.0 on the first successful step after a failed evaluation
 Per-step cost: -0.1 for non-submit actions
 Terminal bonus (submit): 5.0 * improvement_ratio + budget_efficiency
 Terminal bonus (exhaust): 2.0 * improvement_ratio
@@ -242,10 +243,15 @@ Not improved penalty: -1.0 (submit) / -0.5 (exhaust)
 
 ### Assessment
 
-**The reward is well-designed and should stay unchanged.** It only uses two scalars
+**The reward is still simple and should stay close to unchanged.** It mostly uses two scalars
 from the verifier: `feasibility` and `objective (max_elongation)`. These are
 problem-agnostic quantities that `GeometricalProblem` provides for any problem variant.
 
+One small exception is now explicit: recovery from a failed VMEC evaluation gets a
+modest fixed bonus instead of comparing against the failure sentinel. The previous
+behavior could erase recovery signal by comparing a successful step against itself,
+while a naive sentinel comparison would explode the reward into an unbounded spike.
+
 Things the reward correctly avoids:
 - Per-constraint shaping (would overfit to P1's specific constraint structure)
 - Tolerance-exploit bonus (would overfit to the 1% evaluator quirk)
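The shaping rules above can be sketched as one standalone function. Everything here is illustrative rather than the project's actual code: the `Metrics` type, the `FAILURE_PENALTY` value, and the simplification that the recovery step receives only the fixed bonus (the real environment also compares against the last successful metrics). Terminal bonuses are omitted.

```python
from dataclasses import dataclass

FAILURE_PENALTY = -2.0  # illustrative value; the project defines its own constant


@dataclass
class Metrics:
    # Minimal stand-in for the verifier's evaluation result.
    evaluation_failed: bool
    feasible: bool
    feasibility: float
    max_elongation: float


def step_reward(prev: Metrics, curr: Metrics, *, is_submit: bool) -> float:
    """Sketch of the per-step shaping described above (terminal bonuses omitted)."""
    if curr.evaluation_failed:
        return FAILURE_PENALTY
    if prev.evaluation_failed:
        # Post-failure recovery: a fixed +1.0 bonus. Deriving a delta from the
        # failed step's sentinel feasibility would produce an unbounded spike.
        reward = 1.0
    elif prev.feasible and curr.feasible:
        # Dual-track shaping, feasible track: reward elongation reduction.
        reward = (prev.max_elongation - curr.max_elongation) * 10.0
    else:
        # Infeasible track: reward feasibility progress.
        reward = (prev.feasibility - curr.feasibility) * 5.0
    if not is_submit:
        reward -= 0.1  # per-step cost for non-submit actions
    return reward
```

Note how the recovery branch never reads `prev.feasibility` when the previous evaluation failed, which is exactly the sentinel hazard the assessment calls out.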
server/environment.py CHANGED
@@ -10,6 +10,7 @@ from fusion_lab.models import (
     StellaratorObservation,
     StellaratorState,
 )
+from server.contract import N_FIELD_PERIODS, RESET_SEEDS
 from server.physics import (
     ASPECT_RATIO_MAX,
     AVERAGE_TRIANGULARITY_MAX,
@@ -21,7 +22,6 @@ from server.physics import (
 )
 
 BUDGET: Final[int] = 6
-N_FIELD_PERIODS: Final[int] = 3
 
 PARAMETER_RANGES: Final[dict[str, tuple[float, float]]] = {
     "aspect_ratio": (3.2, 3.8),
@@ -37,27 +37,6 @@ PARAMETER_DELTAS: Final[dict[str, dict[str, float]]] = {
     "triangularity_scale": {"small": 0.02, "medium": 0.05, "large": 0.1},
 }
 
-RESET_SEEDS: Final[tuple[LowDimBoundaryParams, ...]] = (
-    LowDimBoundaryParams(
-        aspect_ratio=3.6,
-        elongation=1.4,
-        rotational_transform=1.5,
-        triangularity_scale=0.55,
-    ),
-    LowDimBoundaryParams(
-        aspect_ratio=3.4,
-        elongation=1.4,
-        rotational_transform=1.6,
-        triangularity_scale=0.55,
-    ),
-    LowDimBoundaryParams(
-        aspect_ratio=3.8,
-        elongation=1.4,
-        rotational_transform=1.5,
-        triangularity_scale=0.55,
-    ),
-)
-
 TARGET_SPEC: Final[str] = (
     "Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
     "from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
@@ -240,6 +219,7 @@ class StellaratorEnvironment(
         done: bool,
         initial_reference_score: float | None = None,
     ) -> float:
+        recovered_from_failure = self._recovered_from_failed_evaluation(metrics)
         previous_metrics = self._reference_metrics(metrics)
         if metrics.evaluation_failed:
             reward = FAILURE_PENALTY
@@ -266,6 +246,9 @@ class StellaratorEnvironment(
         if intent != "submit":
            reward -= 0.1
 
+        if recovered_from_failure:
+            reward += 1.0
+
         if intent == "submit" or done:
             base_score = (
                 initial_reference_score
@@ -347,6 +330,13 @@ class StellaratorEnvironment(
                 f"Low-fidelity evaluation failed: {metrics.failure_reason}"
             )
 
+        if self._recovered_from_failed_evaluation(metrics):
+            return (
+                f"Applied {action.parameter} {action.direction} {action.magnitude}. "
+                "Low-fidelity evaluation recovered from the previous failed evaluation. "
+                f"feasibility={metrics.p1_feasibility:.6f}."
+            )
+
         previous_metrics = self._reference_metrics(metrics)
         if metrics.constraints_satisfied and previous_metrics.constraints_satisfied:
             delta = previous_metrics.max_elongation - metrics.max_elongation
@@ -430,6 +420,13 @@ class StellaratorEnvironment(
             return self._last_successful_metrics
         return fallback
 
+    def _recovered_from_failed_evaluation(self, metrics: EvaluationMetrics) -> bool:
+        return (
+            not metrics.evaluation_failed
+            and self._last_metrics is not None
+            and self._last_metrics.evaluation_failed
+        )
+
     def _initial_high_fidelity_metrics(self) -> EvaluationMetrics:
         if self._state.initial_high_fidelity_score is not None:
             return self._evaluate_params(self._state.initial_params, fidelity="high")