Commit · eb446cf
Parent(s): 5271cce
fix: preserve recovery signal after VMEC failure
- docs/P1_ENV_CONTRACT_V1.md +8 -3
- docs/P1_PARAMETERIZATION_DEEPDIVE.md +10 -4
- server/environment.py +19 -22
docs/P1_ENV_CONTRACT_V1.md
CHANGED
|
@@ -201,6 +201,9 @@ Target structure:
|
|
| 201 |
- reward lower `max_elongation`
|
| 202 |
- non-submit step:
|
| 203 |
- small step cost
|
| 204 |
- explicit `submit`:
|
| 205 |
- better than passive budget exhaustion when the design is improved
|
| 206 |
|
|
@@ -248,6 +251,7 @@ If the answer is no, fix:
|
|
| 248 |
- the boundary family
|
| 249 |
- the step magnitudes
|
| 250 |
- the seed pool
|
| 251 |
|
| 252 |
before tuning reward further
|
| 253 |
|
|
@@ -260,9 +264,10 @@ before tuning reward further
|
|
| 260 |
5. Update the task summary and public action description in [server/app.py](../server/app.py).
|
| 261 |
6. Add explicit VMEC failure semantics in [server/environment.py](../server/environment.py).
|
| 262 |
7. Run a small measured sweep to choose ranges, deltas, and reset seeds.
|
| 263 |
-
8.
|
| 264 |
-
9.
|
| 265 |
-
10.
|
| 266 |
|
| 267 |
## Out of Scope
|
| 268 |
|
|
|
|
| 201 |
- reward lower `max_elongation`
|
| 202 |
- non-submit step:
|
| 203 |
- small step cost
|
| 204 |
+
- recovery after a failed evaluation:
|
| 205 |
+
- modest positive signal for returning to a valid verifier result
|
| 206 |
+
- do not compute this from the failed sentinel feasibility value
|
| 207 |
- explicit `submit`:
|
| 208 |
- better than passive budget exhaustion when the design is improved
|
| 209 |
|
|
|
|
| 251 |
- the boundary family
|
| 252 |
- the step magnitudes
|
| 253 |
- the seed pool
|
| 254 |
+
- the observation semantics around low-fi vs high-fi best-state reporting
|
| 255 |
|
| 256 |
before tuning reward further
|
| 257 |
|
|
|
|
| 264 |
5. Update the task summary and public action description in [server/app.py](../server/app.py).
|
| 265 |
6. Add explicit VMEC failure semantics in [server/environment.py](../server/environment.py).
|
| 266 |
7. Run a small measured sweep to choose ranges, deltas, and reset seeds.
|
| 267 |
+
8. Verify that observation semantics are human-readable and that low-fi versus high-fi best-state reporting is explicit.
|
| 268 |
+
9. Freeze 1-2 repaired low-dimensional fixtures.
|
| 269 |
+
10. Run manual playtesting.
|
| 270 |
+
11. Refresh the heuristic baseline only after that evidence exists.
|
| 271 |
|
| 272 |
## Out of Scope
|
| 273 |
|
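The contract rule "do not compute this from the failed sentinel feasibility value" can be illustrated with a small sketch. It assumes feasibility is a violation measure where lower is better (inferred from the `prev_feasibility - curr_feasibility` shaping term). `FAILED_SENTINEL` is a hypothetical placeholder, since the actual sentinel value is not shown in this diff.

```python
# Illustrative sketch only; FAILED_SENTINEL is a hypothetical value,
# not taken from server/environment.py.
FEASIBILITY_WEIGHT = 5.0        # weight of the documented feasibility shaping term
RECOVERY_BONUS = 1.0            # documented post-failure recovery bonus
FAILED_SENTINEL = 1_000_000.0   # hypothetical stand-in for the failure sentinel


def sentinel_comparison(curr_feasibility: float) -> float:
    """Naive shaping: treat the sentinel as the previous feasibility."""
    return (FAILED_SENTINEL - curr_feasibility) * FEASIBILITY_WEIGHT


def self_comparison(curr_feasibility: float) -> float:
    """Degenerate shaping: the successful step is compared against itself."""
    return (curr_feasibility - curr_feasibility) * FEASIBILITY_WEIGHT


def fixed_bonus() -> float:
    """Shaping described by the contract: a modest constant signal."""
    return RECOVERY_BONUS


if __name__ == "__main__":
    recovered_feasibility = 0.25
    print("sentinel comparison:", sentinel_comparison(recovered_feasibility))
    print("self comparison:    ", self_comparison(recovered_feasibility))
    print("fixed bonus:        ", fixed_bonus())
```

The first variant is dominated by the sentinel, the second yields no signal at all, and the third stays on the same scale as the other per-step terms.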
docs/P1_PARAMETERIZATION_DEEPDIVE.md
CHANGED
|
@@ -216,10 +216,10 @@ is approximately `rot_transform ≥ 2.0` combined with `tri_scale ≥ 0.7`.
|
|
| 216 |
- `average_triangularity ≤ -0.5`
|
| 217 |
- `edge_rotational_transform / n_field_periods ≥ 0.3`
|
| 218 |
|
| 219 |
-
###
|
| 220 |
|
| 221 |
-
- `evaluate_params`
|
| 222 |
-
|
| 223 |
- `build_boundary_from_params(...)` → `SurfaceRZFourier` (handles mode expansion + tri_scale injection)
|
| 224 |
- `evaluate_boundary(boundary, fidelity)` → `EvaluationMetrics` (pure evaluation, no parameterization knowledge)
|
| 225 |
|
|
@@ -234,6 +234,7 @@ Feasibility transition: ±3.0 on crossing the feasible/infeasible boundary
|
|
| 234 |
Dual-track step shaping:
|
| 235 |
feasible + feasible → (prev_elongation - curr_elongation) * 10.0
|
| 236 |
otherwise → (prev_feasibility - curr_feasibility) * 5.0
|
| 237 |
Per-step cost: -0.1 for non-submit actions
|
| 238 |
Terminal bonus (submit): 5.0 * improvement_ratio + budget_efficiency
|
| 239 |
Terminal bonus (exhaust): 2.0 * improvement_ratio
|
|
@@ -242,10 +243,15 @@ Not improved penalty: -1.0 (submit) / -0.5 (exhaust)
|
|
| 242 |
|
| 243 |
### Assessment
|
| 244 |
|
| 245 |
-
**The reward is
|
| 246 |
from the verifier: `feasibility` and `objective (max_elongation)`. These are
|
| 247 |
problem-agnostic quantities that `GeometricalProblem` provides for any problem variant.
|
| 248 |
|
| 249 |
Things the reward correctly avoids:
|
| 250 |
- Per-constraint shaping (would overfit to P1's specific constraint structure)
|
| 251 |
- Tolerance-exploit bonus (would overfit to the 1% evaluator quirk)
|
|
|
|
| 216 |
- `average_triangularity ≤ -0.5`
|
| 217 |
- `edge_rotational_transform / n_field_periods ≥ 0.3`
|
| 218 |
|
| 219 |
+
### Current implementation
|
| 220 |
|
| 221 |
+
- The old `evaluate_params` helper has been retired.
|
| 222 |
+
- The runtime is now split into:
|
| 223 |
- `build_boundary_from_params(...)` → `SurfaceRZFourier` (handles mode expansion + tri_scale injection)
|
| 224 |
- `evaluate_boundary(boundary, fidelity)` → `EvaluationMetrics` (pure evaluation, no parameterization knowledge)
|
| 225 |
|
|
|
|
| 234 |
Dual-track step shaping:
|
| 235 |
feasible + feasible → (prev_elongation - curr_elongation) * 10.0
|
| 236 |
otherwise → (prev_feasibility - curr_feasibility) * 5.0
|
| 237 |
+
Post-failure recovery: +1.0 on the first successful step after a failed evaluation
|
| 238 |
Per-step cost: -0.1 for non-submit actions
|
| 239 |
Terminal bonus (submit): 5.0 * improvement_ratio + budget_efficiency
|
| 240 |
Terminal bonus (exhaust): 2.0 * improvement_ratio
|
|
|
|
| 243 |
|
| 244 |
### Assessment
|
| 245 |
|
| 246 |
+
**The reward is still simple and should stay close to unchanged.** It mostly uses two scalars
|
| 247 |
from the verifier: `feasibility` and `objective (max_elongation)`. These are
|
| 248 |
problem-agnostic quantities that `GeometricalProblem` provides for any problem variant.
|
| 249 |
|
| 250 |
+
One small exception is now explicit: recovery from a failed VMEC evaluation gets a
|
| 251 |
+
modest fixed bonus instead of comparing against the failure sentinel. The previous
|
| 252 |
+
behavior could erase recovery signal by comparing a successful step against itself,
|
| 253 |
+
while a naive sentinel comparison would inflate the reward into an outsized spike.
|
| 254 |
+
|
| 255 |
Things the reward correctly avoids:
|
| 256 |
- Per-constraint shaping (would overfit to P1's specific constraint structure)
|
| 257 |
- Tolerance-exploit bonus (would overfit to the 1% evaluator quirk)
|
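The per-step rules listed in the reward section of this file can be sketched as one function. This is a reading of the documented rules, not the code in `server/environment.py`. The `Metrics` dataclass, the `FAILURE_PENALTY` value, and the handling of a failed step are assumptions made for illustration, and the terminal bonuses are omitted because `improvement_ratio` and `budget_efficiency` are not defined in the visible hunks.

```python
from dataclasses import dataclass

FAILURE_PENALTY = -2.0  # hypothetical value; the real constant is not shown in the diff


@dataclass(frozen=True)
class Metrics:
    """Hypothetical stand-in for the verifier output used by the reward."""
    feasible: bool
    feasibility: float      # assumed to be a violation measure, lower is better
    max_elongation: float
    failed: bool = False


def step_reward(prev: Metrics, curr: Metrics, *, intent: str, recovered: bool) -> float:
    """Per-step reward following the documented shaping rules."""
    if curr.failed:
        # Assumption: a failed evaluation replaces the shaping terms entirely.
        reward = FAILURE_PENALTY
    else:
        reward = 0.0
        # Feasibility transition: +/-3.0 on crossing the feasible boundary.
        if curr.feasible != prev.feasible:
            reward += 3.0 if curr.feasible else -3.0
        # Dual-track shaping: elongation when both sides are feasible,
        # feasibility otherwise.
        if curr.feasible and prev.feasible:
            reward += (prev.max_elongation - curr.max_elongation) * 10.0
        else:
            reward += (prev.feasibility - curr.feasibility) * 5.0
    # Per-step cost for non-submit actions.
    if intent != "submit":
        reward -= 0.1
    # Post-failure recovery: fixed bonus on the first successful step.
    if recovered:
        reward += 1.0
    return reward


if __name__ == "__main__":
    before = Metrics(feasible=True, feasibility=0.0, max_elongation=6.0)
    after = Metrics(feasible=True, feasibility=0.0, max_elongation=5.5)
    print(step_reward(before, after, intent="run", recovered=False))
```

With identical infeasible metrics on both sides and `recovered=True`, the shaping term is zero and the result is the bonus minus the step cost, which is the case the fixed bonus is meant to cover.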
server/environment.py
CHANGED
|
@@ -10,6 +10,7 @@ from fusion_lab.models import (
|
|
| 10 |
StellaratorObservation,
|
| 11 |
StellaratorState,
|
| 12 |
)
|
| 13 |
from server.physics import (
|
| 14 |
ASPECT_RATIO_MAX,
|
| 15 |
AVERAGE_TRIANGULARITY_MAX,
|
|
@@ -21,7 +22,6 @@ from server.physics import (
|
|
| 21 |
)
|
| 22 |
|
| 23 |
BUDGET: Final[int] = 6
|
| 24 |
-
N_FIELD_PERIODS: Final[int] = 3
|
| 25 |
|
| 26 |
PARAMETER_RANGES: Final[dict[str, tuple[float, float]]] = {
|
| 27 |
"aspect_ratio": (3.2, 3.8),
|
|
@@ -37,27 +37,6 @@ PARAMETER_DELTAS: Final[dict[str, dict[str, float]]] = {
|
|
| 37 |
"triangularity_scale": {"small": 0.02, "medium": 0.05, "large": 0.1},
|
| 38 |
}
|
| 39 |
|
| 40 |
-
RESET_SEEDS: Final[tuple[LowDimBoundaryParams, ...]] = (
|
| 41 |
-
LowDimBoundaryParams(
|
| 42 |
-
aspect_ratio=3.6,
|
| 43 |
-
elongation=1.4,
|
| 44 |
-
rotational_transform=1.5,
|
| 45 |
-
triangularity_scale=0.55,
|
| 46 |
-
),
|
| 47 |
-
LowDimBoundaryParams(
|
| 48 |
-
aspect_ratio=3.4,
|
| 49 |
-
elongation=1.4,
|
| 50 |
-
rotational_transform=1.6,
|
| 51 |
-
triangularity_scale=0.55,
|
| 52 |
-
),
|
| 53 |
-
LowDimBoundaryParams(
|
| 54 |
-
aspect_ratio=3.8,
|
| 55 |
-
elongation=1.4,
|
| 56 |
-
rotational_transform=1.5,
|
| 57 |
-
triangularity_scale=0.55,
|
| 58 |
-
),
|
| 59 |
-
)
|
| 60 |
-
|
| 61 |
TARGET_SPEC: Final[str] = (
|
| 62 |
"Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
|
| 63 |
"from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
|
|
@@ -240,6 +219,7 @@ class StellaratorEnvironment(
|
|
| 240 |
done: bool,
|
| 241 |
initial_reference_score: float | None = None,
|
| 242 |
) -> float:
|
| 243 |
previous_metrics = self._reference_metrics(metrics)
|
| 244 |
if metrics.evaluation_failed:
|
| 245 |
reward = FAILURE_PENALTY
|
|
@@ -266,6 +246,9 @@ class StellaratorEnvironment(
|
|
| 266 |
if intent != "submit":
|
| 267 |
reward -= 0.1
|
| 268 |
|
| 269 |
if intent == "submit" or done:
|
| 270 |
base_score = (
|
| 271 |
initial_reference_score
|
|
@@ -347,6 +330,13 @@ class StellaratorEnvironment(
|
|
| 347 |
f"Low-fidelity evaluation failed: {metrics.failure_reason}"
|
| 348 |
)
|
| 349 |
|
| 350 |
previous_metrics = self._reference_metrics(metrics)
|
| 351 |
if metrics.constraints_satisfied and previous_metrics.constraints_satisfied:
|
| 352 |
delta = previous_metrics.max_elongation - metrics.max_elongation
|
|
@@ -430,6 +420,13 @@ class StellaratorEnvironment(
|
|
| 430 |
return self._last_successful_metrics
|
| 431 |
return fallback
|
| 432 |
|
| 433 |
def _initial_high_fidelity_metrics(self) -> EvaluationMetrics:
|
| 434 |
if self._state.initial_high_fidelity_score is not None:
|
| 435 |
return self._evaluate_params(self._state.initial_params, fidelity="high")
|
|
|
|
| 10 |
StellaratorObservation,
|
| 11 |
StellaratorState,
|
| 12 |
)
|
| 13 |
+
from server.contract import N_FIELD_PERIODS, RESET_SEEDS
|
| 14 |
from server.physics import (
|
| 15 |
ASPECT_RATIO_MAX,
|
| 16 |
AVERAGE_TRIANGULARITY_MAX,
|
|
|
|
| 22 |
)
|
| 23 |
|
| 24 |
BUDGET: Final[int] = 6
|
| 25 |
|
| 26 |
PARAMETER_RANGES: Final[dict[str, tuple[float, float]]] = {
|
| 27 |
"aspect_ratio": (3.2, 3.8),
|
|
|
|
| 37 |
"triangularity_scale": {"small": 0.02, "medium": 0.05, "large": 0.1},
|
| 38 |
}
|
| 39 |
|
| 40 |
TARGET_SPEC: Final[str] = (
|
| 41 |
"Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
|
| 42 |
"from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
|
|
|
|
| 219 |
done: bool,
|
| 220 |
initial_reference_score: float | None = None,
|
| 221 |
) -> float:
|
| 222 |
+
recovered_from_failure = self._recovered_from_failed_evaluation(metrics)
|
| 223 |
previous_metrics = self._reference_metrics(metrics)
|
| 224 |
if metrics.evaluation_failed:
|
| 225 |
reward = FAILURE_PENALTY
|
|
|
|
| 246 |
if intent != "submit":
|
| 247 |
reward -= 0.1
|
| 248 |
|
| 249 |
+
if recovered_from_failure:
|
| 250 |
+
reward += 1.0
|
| 251 |
+
|
| 252 |
if intent == "submit" or done:
|
| 253 |
base_score = (
|
| 254 |
initial_reference_score
|
|
|
|
| 330 |
f"Low-fidelity evaluation failed: {metrics.failure_reason}"
|
| 331 |
)
|
| 332 |
|
| 333 |
+
if self._recovered_from_failed_evaluation(metrics):
|
| 334 |
+
return (
|
| 335 |
+
f"Applied {action.parameter} {action.direction} {action.magnitude}. "
|
| 336 |
+
"Low-fidelity evaluation recovered from the previous failed evaluation. "
|
| 337 |
+
f"feasibility={metrics.p1_feasibility:.6f}."
|
| 338 |
+
)
|
| 339 |
+
|
| 340 |
previous_metrics = self._reference_metrics(metrics)
|
| 341 |
if metrics.constraints_satisfied and previous_metrics.constraints_satisfied:
|
| 342 |
delta = previous_metrics.max_elongation - metrics.max_elongation
|
|
|
|
| 420 |
return self._last_successful_metrics
|
| 421 |
return fallback
|
| 422 |
|
| 423 |
+
def _recovered_from_failed_evaluation(self, metrics: EvaluationMetrics) -> bool:
|
| 424 |
+
return (
|
| 425 |
+
not metrics.evaluation_failed
|
| 426 |
+
and self._last_metrics is not None
|
| 427 |
+
and self._last_metrics.evaluation_failed
|
| 428 |
+
)
|
| 429 |
+
|
| 430 |
def _initial_high_fidelity_metrics(self) -> EvaluationMetrics:
|
| 431 |
if self._state.initial_high_fidelity_score is not None:
|
| 432 |
return self._evaluate_params(self._state.initial_params, fidelity="high")
|
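The predicate added in `_recovered_from_failed_evaluation` fires only when the current evaluation succeeded and the immediately preceding one failed. A standalone sketch of that behavior follows. `Evaluation` and `RecoveryTracker` are hypothetical names, and updating the stored metrics inside `observe` is an assumption about when `_last_metrics` is refreshed, since that code is not part of the visible hunks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical minimal stand-in for EvaluationMetrics."""
    evaluation_failed: bool


class RecoveryTracker:
    """Reports recovery on the first successful step after a failure."""

    def __init__(self) -> None:
        self._last: Optional[Evaluation] = None

    def observe(self, current: Evaluation) -> bool:
        # Same three-part condition as the predicate in the diff.
        recovered = (
            not current.evaluation_failed
            and self._last is not None
            and self._last.evaluation_failed
        )
        # Assumption: the stored metrics are refreshed after the check.
        self._last = current
        return recovered


if __name__ == "__main__":
    tracker = RecoveryTracker()
    outcomes = [False, True, True, False, False]  # True means the step failed
    print([tracker.observe(Evaluation(evaluation_failed=o)) for o in outcomes])
```

In the diff the predicate is consulted twice per step, once for the reward and once for the summary text, so the stored metrics would need to be refreshed only after both calls for the two to agree.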