Commit · 2fccde8
Parent(s): 2cb6617
feat: add verifier-native reward v1
- README.md +2 -2
- TODO.md +7 -5
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +1 -1
- docs/P1_ENV_CONTRACT_V1.md +18 -5
- docs/P1_MANUAL_PLAYTEST_LOG.md +3 -1
- docs/P1_PARAMETERIZATION_DEEPDIVE.md +1 -1
- fusion_lab/llm_agent.py +6 -2
- fusion_lab/models.py +13 -0
- models.py +2 -0
- server/data/p1/lowfi_feasible_local.json +1 -1
- server/environment.py +58 -7
- server/physics.py +55 -1
- training/notebooks/fusion_design_lab_training.ipynb +2 -2
README.md
CHANGED
|
@@ -13,7 +13,7 @@ An RL environment where agents optimize stellarator fusion reactor designs by ad
|
|
| 13 |
|---|---|
|
| 14 |
| `aspect_ratio` | ≤ 4.0 |
|
| 15 |
| `average_triangularity` | ≤ -0.5 |
|
| 16 |
-
| `edge_iota_over_nfp` | ≥ 0.3 |
|
| 17 |
|
| 18 |
The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the physics verifier — low-fidelity (~0.6s) for the RL inner loop, high-fidelity (~4s) for terminal submit. The live environment still exposes **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + restore_best + submit), but the standard GRPO notebook and `training/llm_rollout.py` `monitor` / `evaluate` workflows stay on the low-fidelity `run` surface and ignore `submit` by default.
|
| 19 |
|
|
@@ -66,7 +66,7 @@ The environment uses [`constellaration`](https://pypi.org/project/constellaratio
|
|
| 66 |
- Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
|
| 67 |
- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity fixture checks.
|
| 68 |
- The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
|
| 69 |
-
- `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `
|
| 70 |
- The standard LLM training and evaluation workflow is now low-fidelity-only: the repo notebook and `training/llm_rollout.py` `monitor` / `evaluate` ignore `submit` by default. Reserve `submit` for explicit replay/debug work, paired fixture checks, submit-side traces, and final evidence.
|
| 71 |
- VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
|
| 72 |
- Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
|
|
|
|
| 13 |
|---|---|
|
| 14 |
| `aspect_ratio` | ≤ 4.0 |
|
| 15 |
| `average_triangularity` | ≤ -0.5 |
|
| 16 |
+
| `abs(edge_iota_over_nfp)` | ≥ 0.3 |
|
| 17 |
|
| 18 |
The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the physics verifier — low-fidelity (~0.6s) for the RL inner loop, high-fidelity (~4s) for terminal submit. The live environment still exposes **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + restore_best + submit), but the standard GRPO notebook and `training/llm_rollout.py` `monitor` / `evaluate` workflows stay on the low-fidelity `run` surface and ignore `submit` by default.
|
| 19 |
|
|
|
|
| 66 |
- Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
|
| 67 |
- The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity fixture checks.
|
| 68 |
- The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
|
| 69 |
+
- `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `from_boundary_resolution`; do not present step-time metrics as final submission metrics.
|
| 70 |
- The standard LLM training and evaluation workflow is now low-fidelity-only: the repo notebook and `training/llm_rollout.py` `monitor` / `evaluate` ignore `submit` by default. Reserve `submit` for explicit replay/debug work, paired fixture checks, submit-side traces, and final evidence.
|
| 71 |
- VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
|
| 72 |
- Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
|
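As a quick check of the README's action count, the sketch below enumerates 4 parameters × 2 directions × 3 magnitudes plus `restore_best` and `submit`. Only the total of 26 and the magnitude names come from this commit. The parameter list is pieced together from identifiers that appear elsewhere in the diff, and the direction labels (`increase` / `decrease`) are assumptions made for illustration.

```python
from itertools import product

# Assumed knob names: "aspect_ratio" and "elongation" appear in ParameterName,
# the other two are taken from the fixture notes in this commit.
PARAMETERS = ("aspect_ratio", "elongation", "rotational_transform", "triangularity_scale")
DIRECTIONS = ("increase", "decrease")  # hypothetical labels, not confirmed by the diff
MAGNITUDES = ("small", "medium", "large")  # keys of STEP_COST_BY_MAGNITUDE


def enumerate_actions() -> list[tuple]:
    """Return every discrete action as (intent, parameter, direction, magnitude)."""
    run_actions = [
        ("run", parameter, direction, magnitude)
        for parameter, direction, magnitude in product(PARAMETERS, DIRECTIONS, MAGNITUDES)
    ]
    # restore_best and submit carry no parameter, direction, or magnitude.
    terminal_like = [("restore_best", None, None, None), ("submit", None, None, None)]
    return run_actions + terminal_like


ACTIONS = enumerate_actions()
print(len(ACTIONS))  # 26
```

A training loop that stays on the low-fidelity `run` surface would, under the same assumptions, simply filter out the `submit` entry and work with the remaining 25 actions.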
TODO.md
CHANGED
|
@@ -72,7 +72,7 @@ flowchart TD
|
|
| 72 |
|
| 73 |
- [x] Lock the exact `P1` environment contract
|
| 74 |
Goal:
|
| 75 |
-
freeze observation schema, action schema, episode loop, terminal conditions, and
|
| 76 |
Related:
|
| 77 |
[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
|
| 78 |
[Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)
|
|
@@ -213,16 +213,16 @@ flowchart TD
|
|
| 213 |
[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
|
| 214 |
[Deliverables Map](docs/archive/FUSION_DELIVERABLES_MAP.md)
|
| 215 |
|
| 216 |
-
- [
|
| 217 |
Goal:
|
| 218 |
keep a short exploit -> fix -> behavior improvement story
|
| 219 |
Related:
|
| 220 |
[AGENTS.md](AGENTS.md),
|
| 221 |
[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
|
| 222 |
|
| 223 |
-
- [
|
| 224 |
Goal:
|
| 225 |
-
|
| 226 |
Related:
|
| 227 |
[README.md](README.md),
|
| 228 |
[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
|
|
@@ -294,4 +294,6 @@ flowchart TD
|
|
| 294 |
- [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
|
| 295 |
- [x] Do not compare high-fidelity submit scores against low-fidelity best/initial score state in the final story
|
| 296 |
- [ ] Do not describe the current baseline reset state as feasible or near-feasible
|
| 297 |
-
- [
|
|
| 72 |
|
| 73 |
- [x] Lock the exact `P1` environment contract
|
| 74 |
Goal:
|
| 75 |
+
freeze observation schema, action schema, episode loop, terminal conditions, and the live reward contract
|
| 76 |
Related:
|
| 77 |
[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
|
| 78 |
[Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)
|
|
|
|
| 213 |
[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
|
| 214 |
[Deliverables Map](docs/archive/FUSION_DELIVERABLES_MAP.md)
|
| 215 |
|
| 216 |
+
- [x] Update reward from `V0` to `V1` after playtesting exposed a real repair-path pathology
|
| 217 |
Goal:
|
| 218 |
keep a short exploit -> fix -> behavior improvement story
|
| 219 |
Related:
|
| 220 |
[AGENTS.md](AGENTS.md),
|
| 221 |
[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
|
| 222 |
|
| 223 |
+
- [x] Write down why `Reward V0` did not survive unchanged
|
| 224 |
Goal:
|
| 225 |
+
document the concrete pathology: pure `Δ official_feasibility` hid useful non-dominant repairs because official feasibility is a max over normalized constraint violations
|
| 226 |
Related:
|
| 227 |
[README.md](README.md),
|
| 228 |
[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
|
|
|
|
| 294 |
- [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
|
| 295 |
- [x] Do not compare high-fidelity submit scores against low-fidelity best/initial score state in the final story
|
| 296 |
- [ ] Do not describe the current baseline reset state as feasible or near-feasible
|
| 297 |
+
- [x] Do not force a `Reward V1` story if `Reward V0` survives manual playtesting
|
| 298 |
+
Note:
|
| 299 |
+
completed by recording the concrete `Reward V0` pathology and only then moving to `Reward V1`
|
docs/FUSION_DESIGN_LAB_PLAN_V2.md
CHANGED
|
@@ -138,7 +138,7 @@ The environment contract must stay narrow and legible:
|
|
| 138 |
- low-fidelity verifier for normal steps
|
| 139 |
- high-fidelity verifier for `submit`
|
| 140 |
- readable observation surface with explicit fidelity labeling
|
| 141 |
-
- `Reward
|
| 142 |
|
| 143 |
The live technical details belong in [`P1_ENV_CONTRACT_V1.md`](P1_ENV_CONTRACT_V1.md), not here.
|
| 144 |
|
|
|
|
| 138 |
- low-fidelity verifier for normal steps
|
| 139 |
- high-fidelity verifier for `submit`
|
| 140 |
- readable observation surface with explicit fidelity labeling
|
| 141 |
+
- `Reward V1` kept verifier-native and repair-first, with official normalized violation telemetry
|
| 142 |
|
| 143 |
The live technical details belong in [`P1_ENV_CONTRACT_V1.md`](P1_ENV_CONTRACT_V1.md), not here.
|
| 144 |
|
docs/P1_ENV_CONTRACT_V1.md
CHANGED
|
@@ -92,6 +92,10 @@ Required fields:
|
|
| 92 |
- `aspect_ratio`
|
| 93 |
- `average_triangularity`
|
| 94 |
- `edge_iota_over_nfp`
|
|
| 95 |
- `p1_feasibility`
|
| 96 |
- `p1_score`
|
| 97 |
- `constraints_satisfied`
|
|
@@ -118,6 +122,8 @@ Interpretation rules:
|
|
| 118 |
- high-fidelity `submit` metrics must be labeled as high-fidelity
|
| 119 |
- low-fidelity and high-fidelity best-state reporting must stay separate
|
| 120 |
- the observation must be understandable without hidden state
|
|
|
|
|
|
|
| 121 |
- reward telemetry must expose which bonuses, penalties, and shaping terms contributed to the scalar reward
|
| 122 |
- action telemetry must expose parameter values before and after the action, including clamped and no-op moves
|
| 123 |
|
|
@@ -183,24 +189,31 @@ Training and evaluation rule:
|
|
| 183 |
- keep higher-fidelity `submit` for terminal truth, explicit replay/debug work, paired fixture checks, and submit-side manual traces
|
| 184 |
- do not move VMEC-backed submit evaluation into every training step unless the contract is deliberately redefined
|
| 185 |
|
| 186 |
-
## 9. Reward
|
| 187 |
|
| 188 |
-
`Reward
|
|
| 189 |
|
| 190 |
Target behavior:
|
| 191 |
|
| 192 |
- infeasible to feasible crossing gets a clear positive bonus
|
| 193 |
- feasible to infeasible regression gets a clear penalty
|
| 194 |
-
- when both states are infeasible, reduced official feasibility violation should help
|
|
|
|
|
|
|
| 195 |
- when both states are feasible, lower `max_elongation` should help
|
| 196 |
-
-
|
|
|
|
| 197 |
- `submit` should be better than passive exhaustion when the design is genuinely improved
|
| 198 |
- recovery after a failed evaluation may receive a modest bounded bonus
|
| 199 |
|
| 200 |
Rules:
|
| 201 |
|
| 202 |
- keep reward scalar and verifier-driven
|
| 203 |
-
-
|
|
|
|
| 204 |
- do not use reward complexity to compensate for blocked parameterization, poor seeds, or unclear observations
|
| 205 |
|
| 206 |
## 10. Reset and Fixture Policy
|
|
|
|
| 92 |
- `aspect_ratio`
|
| 93 |
- `average_triangularity`
|
| 94 |
- `edge_iota_over_nfp`
|
| 95 |
+
- `aspect_ratio_violation`
|
| 96 |
+
- `triangularity_violation`
|
| 97 |
+
- `iota_violation`
|
| 98 |
+
- `dominant_constraint`
|
| 99 |
- `p1_feasibility`
|
| 100 |
- `p1_score`
|
| 101 |
- `constraints_satisfied`
|
|
|
|
| 122 |
- high-fidelity `submit` metrics must be labeled as high-fidelity
|
| 123 |
- low-fidelity and high-fidelity best-state reporting must stay separate
|
| 124 |
- the observation must be understandable without hidden state
|
| 125 |
+
- normalized constraint-violation telemetry must follow the official `P1` constraint scales
|
| 126 |
+
- the dominant active constraint must be visible so a human can explain repair-phase rewards
|
| 127 |
- reward telemetry must expose which bonuses, penalties, and shaping terms contributed to the scalar reward
|
| 128 |
- action telemetry must expose parameter values before and after the action, including clamped and no-op moves
|
| 129 |
|
|
|
|
| 189 |
- keep higher-fidelity `submit` for terminal truth, explicit replay/debug work, paired fixture checks, and submit-side manual traces
|
| 190 |
- do not move VMEC-backed submit evaluation into every training step unless the contract is deliberately redefined
|
| 191 |
|
| 192 |
+
## 9. Reward V1
|
| 193 |
|
| 194 |
+
`Reward V1` replaces `Reward V0` because the old infeasible shaping only used `Δ official_feasibility`.
|
| 195 |
+
That was too coarse once the transferred P1 findings made the main pathology clear: official
|
| 196 |
+
feasibility is a max over normalized constraint violations, so useful repair steps on
|
| 197 |
+
non-dominant constraints could be nearly invisible to the reward.
|
| 198 |
|
| 199 |
Target behavior:
|
| 200 |
|
| 201 |
- infeasible to feasible crossing gets a clear positive bonus
|
| 202 |
- feasible to infeasible regression gets a clear penalty
|
| 203 |
+
- when both states are infeasible, reduced official feasibility violation should still help
|
| 204 |
+
- when both states are infeasible, reduced normalized triangularity violation should help the most
|
| 205 |
+
- when both states are infeasible, reduced normalized aspect-ratio and edge-iota violations should also help
|
| 206 |
- when both states are feasible, lower `max_elongation` should help
|
| 207 |
+
- larger `run` actions should pay a larger step cost than smaller `run` actions
|
| 208 |
+
- `restore_best` should keep a flat non-submit step cost
|
| 209 |
- `submit` should be better than passive exhaustion when the design is genuinely improved
|
| 210 |
- recovery after a failed evaluation may receive a modest bounded bonus
|
| 211 |
|
| 212 |
Rules:
|
| 213 |
|
| 214 |
- keep reward scalar and verifier-driven
|
| 215 |
+
- keep the infeasible shaping tied to official normalized constraint violations, not family-name priors
|
| 216 |
+
- do not add family-specific reward shaping from `scadena`, `CreativeEngineer`, `Samet`, or `egodos`
|
| 217 |
- do not use reward complexity to compensate for blocked parameterization, poor seeds, or unclear observations
|
| 218 |
|
| 219 |
## 10. Reset and Fixture Policy
|
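The pathology described in the `Reward V1` section above can be illustrated with a small numeric sketch. It compares a shaping term built only from `Δ official_feasibility` with one that also adds per-constraint repair terms. The weights are the constants added to `server/environment.py` in this commit. The sample violation values are invented for illustration, and reusing the same feasibility weight for the `V0`-style term is an assumption, since the old multiplier is truncated in the diff. Step cost, crossing bonus, and the other reward terms are deliberately left out.

```python
# Weights copied from the constants introduced in this commit.
FEASIBILITY_DELTA_WEIGHT = 2.0
REPAIR_WEIGHTS = {
    "triangularity_violation": 2.0,
    "aspect_ratio_violation": 1.0,
    "iota_violation": 1.0,
}


def official_feasibility(violations: dict[str, float]) -> float:
    """Official feasibility as a max over normalized constraint violations."""
    return max(violations.values())


def shaping_v0(prev: dict[str, float], curr: dict[str, float]) -> float:
    """V0-style infeasible shaping: only the change in the max violation counts."""
    return (official_feasibility(prev) - official_feasibility(curr)) * FEASIBILITY_DELTA_WEIGHT


def shaping_v1(prev: dict[str, float], curr: dict[str, float]) -> float:
    """V1-style infeasible shaping: feasibility delta plus weighted per-constraint repairs."""
    repair = sum(weight * (prev[name] - curr[name]) for name, weight in REPAIR_WEIGHTS.items())
    return shaping_v0(prev, curr) + repair


# Illustrative values: triangularity dominates, and the step only repairs iota.
prev = {"aspect_ratio_violation": 0.0, "triangularity_violation": 1.0, "iota_violation": 0.5}
curr = {**prev, "iota_violation": 0.25}

print(shaping_v0(prev, curr), shaping_v1(prev, curr))  # 0.0 0.25
```

Because triangularity remains the dominant constraint, the max does not move and the `V0`-style term is zero, while the `V1`-style term still credits the iota repair.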
docs/P1_MANUAL_PLAYTEST_LOG.md
CHANGED
|
@@ -49,7 +49,7 @@ Step 1:
|
|
| 49 |
|
| 50 |
Current conclusion:
|
| 51 |
|
| 52 |
-
- Reward V0
|
| 53 |
- a real `submit` trace is now recorded; next manual validation is to extend beyond the initial 5-10 episode path and look for one clear exploit or ambiguity
|
| 54 |
|
| 55 |
Episode C: submit-side manual trace
|
|
@@ -80,3 +80,5 @@ Step sequence:
|
|
| 80 |
Artifact:
|
| 81 |
|
| 82 |
- [manual submit trace JSON](../baselines/submit_side_trace.json)
|
|
|
|
|
|
|
|
|
| 49 |
|
| 50 |
Current conclusion:
|
| 51 |
|
| 52 |
+
- At the time of this initial playtest, Reward V0 was legible on the low-fidelity repair path around the default reset seed
|
| 53 |
- a real `submit` trace is now recorded; next manual validation is to extend beyond the initial 5-10 episode path and look for one clear exploit or ambiguity
|
| 54 |
|
| 55 |
Episode C: submit-side manual trace
|
|
|
|
| 80 |
Artifact:
|
| 81 |
|
| 82 |
- [manual submit trace JSON](../baselines/submit_side_trace.json)
|
| 83 |
+
Note:
|
| 84 |
+
this is a historical submit-side artifact from the earlier Reward V0 / pre-telemetry contract surface. Keep it as supporting evidence for the old submit path, not as the current Reward V1 observation-format example.
|
docs/P1_PARAMETERIZATION_DEEPDIVE.md
CHANGED
|
@@ -37,7 +37,7 @@ Observed behavior:
|
|
| 37 |
|
| 38 |
`generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` does not meaningfully expose the Fourier mode that controls triangularity.
|
| 39 |
|
| 40 |
-
The historical `rotational_transform` range was also too low to reach the `edge_iota_over_nfp >= 0.3` requirement reliably.
|
| 41 |
|
| 42 |
## 2. Original Winning Session
|
| 43 |
|
|
|
|
| 37 |
|
| 38 |
`generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` does not meaningfully expose the Fourier mode that controls triangularity.
|
| 39 |
|
| 40 |
+
The historical `rotational_transform` range was also too low to reach the `abs(edge_iota_over_nfp) >= 0.3` requirement reliably.
|
| 41 |
|
| 42 |
## 2. Original Winning Session
|
| 43 |
|
fusion_lab/llm_agent.py
CHANGED
|
@@ -45,7 +45,7 @@ Action rules:
|
|
| 45 |
Constraint directions:
|
| 46 |
- aspect_ratio <= 4.0
|
| 47 |
- average_triangularity <= -0.5
|
| 48 |
-
- edge_iota_over_nfp >= 0.3"""
|
| 49 |
|
| 50 |
|
| 51 |
def _extract_json_array(text: str) -> str | None:
|
|
@@ -143,7 +143,11 @@ def format_observation(observation: StellaratorObservation) -> str:
|
|
| 143 |
f"- average_triangularity: {observation.average_triangularity:.6f} "
|
| 144 |
"(must stay <= -0.5)\n"
|
| 145 |
f"- edge_iota_over_nfp: {observation.edge_iota_over_nfp:.4f} "
|
| 146 |
-
"(must
|
|
| 147 |
f"- p1_score: {observation.p1_score:.4f}\n"
|
| 148 |
f"- p1_feasibility: {observation.p1_feasibility:.6f}\n"
|
| 149 |
f"- constraints_satisfied: {observation.constraints_satisfied}\n"
|
|
|
|
| 45 |
Constraint directions:
|
| 46 |
- aspect_ratio <= 4.0
|
| 47 |
- average_triangularity <= -0.5
|
| 48 |
+
- abs(edge_iota_over_nfp) >= 0.3"""
|
| 49 |
|
| 50 |
|
| 51 |
def _extract_json_array(text: str) -> str | None:
|
|
|
|
| 143 |
f"- average_triangularity: {observation.average_triangularity:.6f} "
|
| 144 |
"(must stay <= -0.5)\n"
|
| 145 |
f"- edge_iota_over_nfp: {observation.edge_iota_over_nfp:.4f} "
|
| 146 |
+
"(must satisfy abs(.) >= 0.3)\n"
|
| 147 |
+
f"- aspect_ratio_violation: {observation.aspect_ratio_violation:.6f}\n"
|
| 148 |
+
f"- triangularity_violation: {observation.triangularity_violation:.6f}\n"
|
| 149 |
+
f"- iota_violation: {observation.iota_violation:.6f}\n"
|
| 150 |
+
f"- dominant_constraint: {observation.dominant_constraint}\n"
|
| 151 |
f"- p1_score: {observation.p1_score:.4f}\n"
|
| 152 |
f"- p1_feasibility: {observation.p1_feasibility:.6f}\n"
|
| 153 |
f"- constraints_satisfied: {observation.constraints_satisfied}\n"
|
fusion_lab/models.py
CHANGED
|
@@ -6,6 +6,12 @@ from openenv.core import Action, Observation, State
|
|
| 6 |
from pydantic import BaseModel, Field
|
| 7 |
|
| 8 |
ActionIntent = Literal["run", "submit", "restore_best"]
|
|
| 9 |
ParameterName = Literal[
|
| 10 |
"aspect_ratio",
|
| 11 |
"elongation",
|
|
@@ -51,6 +57,9 @@ class RewardBreakdown(BaseModel):
|
|
| 51 |
feasibility_crossing_bonus: float = 0.0
|
| 52 |
feasibility_regression_penalty: float = 0.0
|
| 53 |
feasibility_delta_reward: float = 0.0
|
|
| 54 |
objective_delta_reward: float = 0.0
|
| 55 |
step_cost: float = 0.0
|
| 56 |
recovery_bonus: float = 0.0
|
|
@@ -94,6 +103,10 @@ class StellaratorObservation(Observation):
|
|
| 94 |
aspect_ratio: float = 0.0
|
| 95 |
average_triangularity: float = 0.0
|
| 96 |
edge_iota_over_nfp: float = 0.0
|
|
| 97 |
p1_score: float = 0.0
|
| 98 |
p1_feasibility: float = 0.0
|
| 99 |
vacuum_well: float = 0.0
|
|
|
|
| 6 |
from pydantic import BaseModel, Field
|
| 7 |
|
| 8 |
ActionIntent = Literal["run", "submit", "restore_best"]
|
| 9 |
+
ConstraintName = Literal[
|
| 10 |
+
"none",
|
| 11 |
+
"aspect_ratio",
|
| 12 |
+
"average_triangularity",
|
| 13 |
+
"edge_iota_over_nfp",
|
| 14 |
+
]
|
| 15 |
ParameterName = Literal[
|
| 16 |
"aspect_ratio",
|
| 17 |
"elongation",
|
|
|
|
| 57 |
feasibility_crossing_bonus: float = 0.0
|
| 58 |
feasibility_regression_penalty: float = 0.0
|
| 59 |
feasibility_delta_reward: float = 0.0
|
| 60 |
+
aspect_ratio_repair_reward: float = 0.0
|
| 61 |
+
triangularity_repair_reward: float = 0.0
|
| 62 |
+
iota_repair_reward: float = 0.0
|
| 63 |
objective_delta_reward: float = 0.0
|
| 64 |
step_cost: float = 0.0
|
| 65 |
recovery_bonus: float = 0.0
|
|
|
|
| 103 |
aspect_ratio: float = 0.0
|
| 104 |
average_triangularity: float = 0.0
|
| 105 |
edge_iota_over_nfp: float = 0.0
|
| 106 |
+
aspect_ratio_violation: float = 0.0
|
| 107 |
+
triangularity_violation: float = 0.0
|
| 108 |
+
iota_violation: float = 0.0
|
| 109 |
+
dominant_constraint: ConstraintName = "none"
|
| 110 |
p1_score: float = 0.0
|
| 111 |
p1_feasibility: float = 0.0
|
| 112 |
vacuum_well: float = 0.0
|
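The new observation fields (`aspect_ratio_violation`, `triangularity_violation`, `iota_violation`, `dominant_constraint`) could be derived roughly as sketched below. This is not the code from `server/physics.py`, which is not visible here. The normalization (excess over the bound divided by the bound's magnitude, clamped at zero), the tie-breaking, and the `"none"` rule are all assumptions. The normalization does at least reproduce the README's historical figures, where `average_triangularity` near `+0.004975` corresponds to `p1_feasibility` near `1.00995`.

```python
# Bounds taken from the constraint table in the README diff.
ASPECT_RATIO_MAX = 4.0
AVERAGE_TRIANGULARITY_MAX = -0.5
EDGE_IOTA_OVER_NFP_MIN = 0.3


def normalized_violations(
    aspect_ratio: float, average_triangularity: float, edge_iota_over_nfp: float
) -> dict[str, float]:
    """Assumed normalization: (amount past the bound) / |bound|, clamped at zero."""
    return {
        "aspect_ratio": max(0.0, (aspect_ratio - ASPECT_RATIO_MAX) / abs(ASPECT_RATIO_MAX)),
        "average_triangularity": max(
            0.0,
            (average_triangularity - AVERAGE_TRIANGULARITY_MAX) / abs(AVERAGE_TRIANGULARITY_MAX),
        ),
        # The iota constraint applies to the absolute value, per this commit.
        "edge_iota_over_nfp": max(
            0.0,
            (EDGE_IOTA_OVER_NFP_MIN - abs(edge_iota_over_nfp)) / abs(EDGE_IOTA_OVER_NFP_MIN),
        ),
    }


def dominant_constraint(violations: dict[str, float]) -> str:
    """Name of the largest violation, or "none" when every constraint is satisfied."""
    name, worst = max(violations.items(), key=lambda item: item[1])
    return name if worst > 0.0 else "none"


violations = normalized_violations(3.5, 0.004975, -0.15)
print(dominant_constraint(violations))  # average_triangularity
```

Under these assumptions `p1_feasibility` would equal `max(violations.values())`, which is why a repair on a non-dominant constraint leaves it unchanged.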
models.py
CHANGED
|
@@ -3,6 +3,7 @@
|
|
| 3 |
from fusion_lab.models import (
|
| 4 |
ActionMonitor,
|
| 5 |
ActionIntent,
|
|
|
|
| 6 |
DirectionName,
|
| 7 |
EvaluationFidelityName,
|
| 8 |
LowDimBoundaryParams,
|
|
@@ -20,6 +21,7 @@ from fusion_lab.models import (
|
|
| 20 |
__all__ = [
|
| 21 |
"ActionIntent",
|
| 22 |
"ActionMonitor",
|
|
|
|
| 23 |
"DirectionName",
|
| 24 |
"EvaluationFidelityName",
|
| 25 |
"LowDimBoundaryParams",
|
|
|
|
| 3 |
from fusion_lab.models import (
|
| 4 |
ActionMonitor,
|
| 5 |
ActionIntent,
|
| 6 |
+
ConstraintName,
|
| 7 |
DirectionName,
|
| 8 |
EvaluationFidelityName,
|
| 9 |
LowDimBoundaryParams,
|
|
|
|
| 21 |
__all__ = [
|
| 22 |
"ActionIntent",
|
| 23 |
"ActionMonitor",
|
| 24 |
+
"ConstraintName",
|
| 25 |
"DirectionName",
|
| 26 |
"EvaluationFidelityName",
|
| 27 |
"LowDimBoundaryParams",
|
server/data/p1/lowfi_feasible_local.json
CHANGED
|
@@ -3,7 +3,7 @@
|
|
| 3 |
"status": "low_fidelity_calibrated",
|
| 4 |
"notes": [
|
| 5 |
"Local repair target reached from the default reset band by increasing rotational_transform and triangularity_scale.",
|
| 6 |
-
"Useful as a low-fidelity feasibility reference for
|
| 7 |
"High-fidelity submit spot check is complete."
|
| 8 |
],
|
| 9 |
"params": {
|
|
|
|
| 3 |
"status": "low_fidelity_calibrated",
|
| 4 |
"notes": [
|
| 5 |
"Local repair target reached from the default reset band by increasing rotational_transform and triangularity_scale.",
|
| 6 |
+
"Useful as a low-fidelity feasibility reference for reward sanity checks.",
|
| 7 |
"High-fidelity submit spot check is complete."
|
| 8 |
],
|
| 9 |
"params": {
|
server/environment.py
CHANGED
|
@@ -8,6 +8,7 @@ from fusion_lab.models import (
|
|
| 8 |
ActionMonitor,
|
| 9 |
ActionIntent,
|
| 10 |
LowDimBoundaryParams,
|
|
|
|
| 11 |
RewardBreakdown,
|
| 12 |
StellaratorAction,
|
| 13 |
StellaratorObservation,
|
|
@@ -43,12 +44,22 @@ PARAMETER_DELTAS: Final[dict[str, dict[str, float]]] = {
|
|
| 43 |
TARGET_SPEC: Final[str] = (
|
| 44 |
"Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
|
| 45 |
"from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
|
| 46 |
-
"triangularity <= -0.5, edge rotational transform / n_field_periods >= 0.3. "
|
| 47 |
"Run actions use low-fidelity verification. Submit uses high-fidelity verification. "
|
| 48 |
"Budget: 6 evaluations."
|
| 49 |
)
|
| 50 |
|
| 51 |
FAILURE_PENALTY: Final[float] = -2.0
|
|
| 52 |
|
| 53 |
|
| 54 |
class StellaratorEnvironment(
|
|
@@ -150,7 +161,12 @@ class StellaratorEnvironment(
|
|
| 150 |
self._update_best(params, metrics)
|
| 151 |
|
| 152 |
done = self._state.budget_remaining <= 0
|
| 153 |
-
reward_breakdown = self._compute_reward_breakdown(
|
|
| 154 |
reward = reward_breakdown.total
|
| 155 |
summary = self._summary_run(action, metrics, action_monitor)
|
| 156 |
self._state.history.append(summary)
|
|
@@ -265,6 +281,7 @@ class StellaratorEnvironment(
|
|
| 265 |
metrics: EvaluationMetrics,
|
| 266 |
intent: ActionIntent,
|
| 267 |
done: bool,
|
|
|
|
| 268 |
initial_reference_score: float | None = None,
|
| 269 |
) -> RewardBreakdown:
|
| 270 |
recovered_from_failure = self._recovered_from_failed_evaluation(metrics)
|
|
@@ -282,7 +299,7 @@ class StellaratorEnvironment(
|
|
| 282 |
if metrics.evaluation_failed:
|
| 283 |
breakdown.failure_penalty = FAILURE_PENALTY
|
| 284 |
if intent != "submit":
|
| 285 |
-
breakdown.step_cost =
|
| 286 |
if intent == "submit":
|
| 287 |
breakdown.failure_submit_penalty = -1.0
|
| 288 |
elif done:
|
|
@@ -302,10 +319,19 @@ class StellaratorEnvironment(
|
|
| 302 |
else:
|
| 303 |
breakdown.feasibility_delta_reward = (
|
| 304 |
previous_metrics.p1_feasibility - metrics.p1_feasibility
|
| 305 |
-
) *
|
|
| 306 |
|
| 307 |
if intent != "submit":
|
| 308 |
-
breakdown.step_cost =
|
| 309 |
|
| 310 |
if recovered_from_failure:
|
| 311 |
breakdown.recovery_bonus = 1.0
|
|
@@ -365,8 +391,15 @@ class StellaratorEnvironment(
|
|
| 365 |
f"max_elongation={metrics.max_elongation:.4f}",
|
| 366 |
f"aspect_ratio={metrics.aspect_ratio:.4f} (<= {ASPECT_RATIO_MAX:.1f})",
|
| 367 |
f"average_triangularity={metrics.average_triangularity:.4f} (<= {AVERAGE_TRIANGULARITY_MAX:.1f})",
|
| 368 |
-
|
|
| 369 |
f"feasibility={metrics.p1_feasibility:.6f}",
|
|
| 370 |
f"best_low_fidelity_score={best_low_fidelity_score:.6f}",
|
| 371 |
f"best_low_fidelity_feasibility={best_low_fidelity_feasibility:.6f}",
|
| 372 |
(
|
|
@@ -394,6 +427,10 @@ class StellaratorEnvironment(
|
|
| 394 |
aspect_ratio=metrics.aspect_ratio,
|
| 395 |
average_triangularity=metrics.average_triangularity,
|
| 396 |
edge_iota_over_nfp=metrics.edge_iota_over_nfp,
|
|
| 397 |
p1_score=metrics.p1_score,
|
| 398 |
p1_feasibility=metrics.p1_feasibility,
|
| 399 |
vacuum_well=metrics.vacuum_well,
|
|
@@ -450,7 +487,8 @@ class StellaratorEnvironment(
|
|
| 450 |
else:
|
| 451 |
delta = previous_metrics.p1_feasibility - metrics.p1_feasibility
|
| 452 |
objective_summary = (
|
| 453 |
-
f"feasibility changed by {delta:+.6f} to {metrics.p1_feasibility:.6f}
|
|
|
|
| 454 |
)
|
| 455 |
return (
|
| 456 |
f"Applied {action.parameter} {action.direction} {action.magnitude}. "
|
|
@@ -606,6 +644,13 @@ class StellaratorEnvironment(
|
|
| 606 |
return "The requested move was clipped to stay inside the allowed parameter range. "
|
| 607 |
return ""
|
| 608 |
|
|
|
| 609 |
def _reward_total(self, breakdown: RewardBreakdown) -> float:
|
| 610 |
total = (
|
| 611 |
breakdown.invalid_action_penalty
|
|
@@ -615,6 +660,9 @@ class StellaratorEnvironment(
|
|
| 615 |
+ breakdown.feasibility_crossing_bonus
|
| 616 |
+ breakdown.feasibility_regression_penalty
|
| 617 |
+ breakdown.feasibility_delta_reward
|
|
| 618 |
+ breakdown.objective_delta_reward
|
| 619 |
+ breakdown.step_cost
|
| 620 |
+ breakdown.recovery_bonus
|
|
@@ -633,6 +681,9 @@ class StellaratorEnvironment(
|
|
| 633 |
("feasibility_crossing_bonus", breakdown.feasibility_crossing_bonus),
|
| 634 |
("feasibility_regression_penalty", breakdown.feasibility_regression_penalty),
|
| 635 |
("feasibility_delta_reward", breakdown.feasibility_delta_reward),
|
|
| 636 |
("objective_delta_reward", breakdown.objective_delta_reward),
|
| 637 |
("step_cost", breakdown.step_cost),
|
| 638 |
("recovery_bonus", breakdown.recovery_bonus),
|
|
|
|
| 8 |
ActionMonitor,
|
| 9 |
ActionIntent,
|
| 10 |
LowDimBoundaryParams,
|
| 11 |
+
MagnitudeName,
|
| 12 |
RewardBreakdown,
|
| 13 |
StellaratorAction,
|
| 14 |
StellaratorObservation,
|
|
|
|
| 44 |
TARGET_SPEC: Final[str] = (
|
| 45 |
"Optimize the P1 benchmark using a custom low-dimensional boundary family derived "
|
| 46 |
"from a rotating-ellipse seed. Constraints: aspect ratio <= 4.0, average "
|
| 47 |
+
"triangularity <= -0.5, abs(edge rotational transform / n_field_periods) >= 0.3. "
|
| 48 |
"Run actions use low-fidelity verification. Submit uses high-fidelity verification. "
|
| 49 |
"Budget: 6 evaluations."
|
| 50 |
)
|
| 51 |
|
| 52 |
FAILURE_PENALTY: Final[float] = -2.0
|
| 53 |
+
FEASIBILITY_DELTA_WEIGHT: Final[float] = 2.0
|
| 54 |
+
TRIANGULARITY_REPAIR_WEIGHT: Final[float] = 2.0
|
| 55 |
+
ASPECT_RATIO_REPAIR_WEIGHT: Final[float] = 1.0
|
| 56 |
+
IOTA_REPAIR_WEIGHT: Final[float] = 1.0
|
| 57 |
+
STEP_COST_BY_MAGNITUDE: Final[dict[MagnitudeName, float]] = {
|
| 58 |
+
"small": -0.05,
|
| 59 |
+
"medium": -0.1,
|
| 60 |
+
"large": -0.2,
|
| 61 |
+
}
|
| 62 |
+
RESTORE_STEP_COST: Final[float] = -0.1
|
| 63 |
|
| 64 |
|
| 65 |
class StellaratorEnvironment(
|
|
|
|
| 161 |
self._update_best(params, metrics)
|
| 162 |
|
| 163 |
done = self._state.budget_remaining <= 0
|
| 164 |
+
reward_breakdown = self._compute_reward_breakdown(
|
| 165 |
+
metrics,
|
| 166 |
+
action.intent,
|
| 167 |
+
done,
|
| 168 |
+
magnitude=action.magnitude,
|
| 169 |
+
)
|
| 170 |
reward = reward_breakdown.total
|
| 171 |
summary = self._summary_run(action, metrics, action_monitor)
|
| 172 |
self._state.history.append(summary)
|
|
|
|
| 281 |
metrics: EvaluationMetrics,
|
| 282 |
intent: ActionIntent,
|
| 283 |
done: bool,
|
| 284 |
+
magnitude: MagnitudeName | None = None,
|
| 285 |
initial_reference_score: float | None = None,
|
| 286 |
) -> RewardBreakdown:
|
| 287 |
recovered_from_failure = self._recovered_from_failed_evaluation(metrics)
|
|
|
|
| 299 |
if metrics.evaluation_failed:
|
| 300 |
breakdown.failure_penalty = FAILURE_PENALTY
|
| 301 |
if intent != "submit":
|
| 302 |
+
breakdown.step_cost = self._step_cost(intent=intent, magnitude=magnitude)
|
| 303 |
if intent == "submit":
|
| 304 |
breakdown.failure_submit_penalty = -1.0
|
| 305 |
elif done:
|
|
|
|
| 319 |
else:
|
| 320 |
breakdown.feasibility_delta_reward = (
|
| 321 |
previous_metrics.p1_feasibility - metrics.p1_feasibility
|
| 322 |
+
) * FEASIBILITY_DELTA_WEIGHT
|
| 323 |
+
breakdown.triangularity_repair_reward = (
|
| 324 |
+
previous_metrics.triangularity_violation - metrics.triangularity_violation
|
| 325 |
+
) * TRIANGULARITY_REPAIR_WEIGHT
|
| 326 |
+
breakdown.aspect_ratio_repair_reward = (
|
| 327 |
+
previous_metrics.aspect_ratio_violation - metrics.aspect_ratio_violation
|
| 328 |
+
) * ASPECT_RATIO_REPAIR_WEIGHT
|
| 329 |
+
breakdown.iota_repair_reward = (
|
| 330 |
+
previous_metrics.iota_violation - metrics.iota_violation
|
| 331 |
+
) * IOTA_REPAIR_WEIGHT
|
| 332 |
|
| 333 |
if intent != "submit":
|
| 334 |
+
breakdown.step_cost = self._step_cost(intent=intent, magnitude=magnitude)
|
| 335 |
|
| 336 |
if recovered_from_failure:
|
| 337 |
breakdown.recovery_bonus = 1.0
|
|
|
|
| 391 |
f"max_elongation={metrics.max_elongation:.4f}",
|
| 392 |
f"aspect_ratio={metrics.aspect_ratio:.4f} (<= {ASPECT_RATIO_MAX:.1f})",
|
| 393 |
f"average_triangularity={metrics.average_triangularity:.4f} (<= {AVERAGE_TRIANGULARITY_MAX:.1f})",
|
| 394 |
+
(
|
| 395 |
+
"edge_iota_over_nfp="
|
| 396 |
+
f"{metrics.edge_iota_over_nfp:.4f} (abs(.) >= {EDGE_IOTA_OVER_NFP_MIN:.1f})"
|
| 397 |
+
),
|
| 398 |
f"feasibility={metrics.p1_feasibility:.6f}",
|
| 399 |
+
f"aspect_ratio_violation={metrics.aspect_ratio_violation:.6f}",
|
| 400 |
+
f"triangularity_violation={metrics.triangularity_violation:.6f}",
|
| 401 |
+
f"iota_violation={metrics.iota_violation:.6f}",
|
| 402 |
+
f"dominant_constraint={metrics.dominant_constraint}",
|
| 403 |
f"best_low_fidelity_score={best_low_fidelity_score:.6f}",
|
| 404 |
f"best_low_fidelity_feasibility={best_low_fidelity_feasibility:.6f}",
|
| 405 |
(
|
|
|
|
| 427 |
aspect_ratio=metrics.aspect_ratio,
|
| 428 |
average_triangularity=metrics.average_triangularity,
|
| 429 |
edge_iota_over_nfp=metrics.edge_iota_over_nfp,
|
| 430 |
+
aspect_ratio_violation=metrics.aspect_ratio_violation,
|
| 431 |
+
triangularity_violation=metrics.triangularity_violation,
|
| 432 |
+
iota_violation=metrics.iota_violation,
|
| 433 |
+
dominant_constraint=metrics.dominant_constraint,
|
| 434 |
p1_score=metrics.p1_score,
|
| 435 |
p1_feasibility=metrics.p1_feasibility,
|
| 436 |
vacuum_well=metrics.vacuum_well,
|
|
|
|
| 487 |
else:
|
| 488 |
delta = previous_metrics.p1_feasibility - metrics.p1_feasibility
|
| 489 |
objective_summary = (
|
| 490 |
+
f"feasibility changed by {delta:+.6f} to {metrics.p1_feasibility:.6f}; "
|
| 491 |
+
f"dominant_constraint={metrics.dominant_constraint}."
|
| 492 |
)
|
| 493 |
return (
|
| 494 |
f"Applied {action.parameter} {action.direction} {action.magnitude}. "
|
|
|
|
| 644 |
return "The requested move was clipped to stay inside the allowed parameter range. "
|
| 645 |
return ""
|
| 646 |
|
| 647 |
+
def _step_cost(self, *, intent: ActionIntent, magnitude: MagnitudeName | None) -> float:
|
| 648 |
+
if intent == "restore_best":
|
| 649 |
+
return RESTORE_STEP_COST
|
| 650 |
+
if magnitude is None:
|
| 651 |
+
return STEP_COST_BY_MAGNITUDE["medium"]
|
| 652 |
+
return STEP_COST_BY_MAGNITUDE[magnitude]
|
| 653 |
+
|
| 654 |
def _reward_total(self, breakdown: RewardBreakdown) -> float:
|
| 655 |
total = (
|
| 656 |
breakdown.invalid_action_penalty
|
|
|
|
| 660 |
+ breakdown.feasibility_crossing_bonus
|
| 661 |
+ breakdown.feasibility_regression_penalty
|
| 662 |
+ breakdown.feasibility_delta_reward
|
| 663 |
+
+ breakdown.aspect_ratio_repair_reward
|
| 664 |
+
+ breakdown.triangularity_repair_reward
|
| 665 |
+
+ breakdown.iota_repair_reward
|
| 666 |
+ breakdown.objective_delta_reward
|
| 667 |
+ breakdown.step_cost
|
| 668 |
+ breakdown.recovery_bonus
|
|
|
|
| 681 |
("feasibility_crossing_bonus", breakdown.feasibility_crossing_bonus),
|
| 682 |
("feasibility_regression_penalty", breakdown.feasibility_regression_penalty),
|
| 683 |
("feasibility_delta_reward", breakdown.feasibility_delta_reward),
|
| 684 |
+
("aspect_ratio_repair_reward", breakdown.aspect_ratio_repair_reward),
|
| 685 |
+
("triangularity_repair_reward", breakdown.triangularity_repair_reward),
|
| 686 |
+
("iota_repair_reward", breakdown.iota_repair_reward),
|
| 687 |
("objective_delta_reward", breakdown.objective_delta_reward),
|
| 688 |
("step_cost", breakdown.step_cost),
|
| 689 |
("recovery_bonus", breakdown.recovery_bonus),
|
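The `server/environment.py` hunks above add per-constraint repair terms and a magnitude-dependent step cost to the reward. The following is a minimal sketch of how those two pieces could behave in isolation, not the repository's implementation. The numeric values of `IOTA_REPAIR_WEIGHT`, `RESTORE_STEP_COST`, and `STEP_COST_BY_MAGNITUDE` are placeholders because the diff does not show the real constants. The `"run"`, `"small"`, and `"large"` literals are assumed names, and costs are assumed to be negative because `_reward_total` adds `step_cost` to the total.

```python
from typing import Literal, Optional

# Assumed literal names; only "restore_best", "submit" and "medium" appear in the diff.
ActionIntent = Literal["run", "restore_best", "submit"]
MagnitudeName = Literal["small", "medium", "large"]

# Placeholder values: the real constants are not visible in this diff.
IOTA_REPAIR_WEIGHT = 1.0
RESTORE_STEP_COST = -0.1
STEP_COST_BY_MAGNITUDE: dict[MagnitudeName, float] = {
    "small": -0.05,
    "medium": -0.1,
    "large": -0.2,
}


def step_cost(*, intent: ActionIntent, magnitude: Optional[MagnitudeName]) -> float:
    """Free-function mirror of the `_step_cost` branches shown in the diff."""
    if intent == "restore_best":
        return RESTORE_STEP_COST
    if magnitude is None:
        # No magnitude attached: fall back to the medium cost.
        return STEP_COST_BY_MAGNITUDE["medium"]
    return STEP_COST_BY_MAGNITUDE[magnitude]


def iota_repair_reward(previous_violation: float, violation: float) -> float:
    """Positive when the normalized iota violation shrinks, negative when it grows."""
    return (previous_violation - violation) * IOTA_REPAIR_WEIGHT
```

Under these assumptions, a step that reduces `iota_violation` earns a positive repair term, a step that increases it is penalized by the same formula, and larger moves cost more than smaller ones.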
server/physics.py
CHANGED
|
@@ -15,7 +15,7 @@ from constellaration.geometry.surface_rz_fourier import SurfaceRZFourier
|
|
| 15 |
from constellaration.initial_guess import generate_rotating_ellipse
|
| 16 |
from constellaration.problems import GeometricalProblem
|
| 17 |
|
| 18 |
-
from fusion_lab.models import LowDimBoundaryParams
|
| 19 |
|
| 20 |
ASPECT_RATIO_MAX: Final[float] = 4.0
|
| 21 |
AVERAGE_TRIANGULARITY_MAX: Final[float] = -0.5
|
|
@@ -35,6 +35,10 @@ class EvaluationMetrics:
|
|
| 35 |
aspect_ratio: float
|
| 36 |
average_triangularity: float
|
| 37 |
edge_iota_over_nfp: float
|
| 38 |
p1_score: float
|
| 39 |
p1_feasibility: float
|
| 40 |
constraints_satisfied: bool
|
|
@@ -119,12 +123,19 @@ def _to_evaluation_metrics(
|
|
| 119 |
if not minimize_objective:
|
| 120 |
raise ValueError("P1 objective is expected to be minimize-only.")
|
| 121 |
p1_score = _score_from_objective(float(objective)) if constraints_satisfied else 0.0
|
| 122 |
|
| 123 |
return EvaluationMetrics(
|
| 124 |
max_elongation=float(objective),
|
| 125 |
aspect_ratio=float(metrics.aspect_ratio),
|
| 126 |
average_triangularity=float(metrics.average_triangularity),
|
| 127 |
edge_iota_over_nfp=float(metrics.edge_rotational_transform_over_n_field_periods),
|
| 128 |
p1_score=p1_score,
|
| 129 |
p1_feasibility=p1_feasibility,
|
| 130 |
constraints_satisfied=constraints_satisfied,
|
|
@@ -145,6 +156,10 @@ def _failure_metrics(
|
|
| 145 |
aspect_ratio=0.0,
|
| 146 |
average_triangularity=0.0,
|
| 147 |
edge_iota_over_nfp=0.0,
|
| 148 |
p1_score=0.0,
|
| 149 |
p1_feasibility=FAILED_FEASIBILITY,
|
| 150 |
constraints_satisfied=False,
|
|
@@ -158,3 +173,42 @@ def _failure_metrics(
|
|
| 158 |
def _score_from_objective(objective: float) -> float:
|
| 159 |
normalized = min(max((objective - 1.0) / 9.0, 0.0), 1.0)
|
| 160 |
return 1.0 - normalized
|
|
|
|
|
| 15 |
from constellaration.initial_guess import generate_rotating_ellipse
|
| 16 |
from constellaration.problems import GeometricalProblem
|
| 17 |
|
| 18 |
+
from fusion_lab.models import ConstraintName, LowDimBoundaryParams
|
| 19 |
|
| 20 |
ASPECT_RATIO_MAX: Final[float] = 4.0
|
| 21 |
AVERAGE_TRIANGULARITY_MAX: Final[float] = -0.5
|
|
|
|
| 35 |
aspect_ratio: float
|
| 36 |
average_triangularity: float
|
| 37 |
edge_iota_over_nfp: float
|
| 38 |
+
aspect_ratio_violation: float
|
| 39 |
+
triangularity_violation: float
|
| 40 |
+
iota_violation: float
|
| 41 |
+
dominant_constraint: ConstraintName
|
| 42 |
p1_score: float
|
| 43 |
p1_feasibility: float
|
| 44 |
constraints_satisfied: bool
|
|
|
|
| 123 |
if not minimize_objective:
|
| 124 |
raise ValueError("P1 objective is expected to be minimize-only.")
|
| 125 |
p1_score = _score_from_objective(float(objective)) if constraints_satisfied else 0.0
|
| 126 |
+
aspect_ratio_violation, triangularity_violation, iota_violation, dominant_constraint = (
|
| 127 |
+
_constraint_violation_metrics(metrics)
|
| 128 |
+
)
|
| 129 |
|
| 130 |
return EvaluationMetrics(
|
| 131 |
max_elongation=float(objective),
|
| 132 |
aspect_ratio=float(metrics.aspect_ratio),
|
| 133 |
average_triangularity=float(metrics.average_triangularity),
|
| 134 |
edge_iota_over_nfp=float(metrics.edge_rotational_transform_over_n_field_periods),
|
| 135 |
+
aspect_ratio_violation=aspect_ratio_violation,
|
| 136 |
+
triangularity_violation=triangularity_violation,
|
| 137 |
+
iota_violation=iota_violation,
|
| 138 |
+
dominant_constraint=dominant_constraint,
|
| 139 |
p1_score=p1_score,
|
| 140 |
p1_feasibility=p1_feasibility,
|
| 141 |
constraints_satisfied=constraints_satisfied,
|
|
|
|
| 156 |
aspect_ratio=0.0,
|
| 157 |
average_triangularity=0.0,
|
| 158 |
edge_iota_over_nfp=0.0,
|
| 159 |
+
aspect_ratio_violation=0.0,
|
| 160 |
+
triangularity_violation=0.0,
|
| 161 |
+
iota_violation=0.0,
|
| 162 |
+
dominant_constraint="none",
|
| 163 |
p1_score=0.0,
|
| 164 |
p1_feasibility=FAILED_FEASIBILITY,
|
| 165 |
constraints_satisfied=False,
|
|
|
|
| 173 |
def _score_from_objective(objective: float) -> float:
|
| 174 |
normalized = min(max((objective - 1.0) / 9.0, 0.0), 1.0)
|
| 175 |
return 1.0 - normalized
|
| 176 |
+
|
| 177 |
+
|
| 178 |
+
def _constraint_violation_metrics(
|
| 179 |
+
metrics: ConstellarationMetrics,
|
| 180 |
+
) -> tuple[float, float, float, ConstraintName]:
|
| 181 |
+
aspect_ratio_violation = max(float(metrics.aspect_ratio) - ASPECT_RATIO_MAX, 0.0) / (
|
| 182 |
+
ASPECT_RATIO_MAX
|
| 183 |
+
)
|
| 184 |
+
triangularity_violation = max(
|
| 185 |
+
float(metrics.average_triangularity) - AVERAGE_TRIANGULARITY_MAX,
|
| 186 |
+
0.0,
|
| 187 |
+
) / abs(AVERAGE_TRIANGULARITY_MAX)
|
| 188 |
+
iota_violation = (
|
| 189 |
+
max(
|
| 190 |
+
EDGE_IOTA_OVER_NFP_MIN
|
| 191 |
+
- abs(float(metrics.edge_rotational_transform_over_n_field_periods)),
|
| 192 |
+
0.0,
|
| 193 |
+
)
|
| 194 |
+
/ EDGE_IOTA_OVER_NFP_MIN
|
| 195 |
+
)
|
| 196 |
+
|
| 197 |
+
dominant_constraint: ConstraintName = "none"
|
| 198 |
+
dominant_violation = 0.0
|
| 199 |
+
constraint_violations: tuple[tuple[ConstraintName, float], ...] = (
|
| 200 |
+
("aspect_ratio", aspect_ratio_violation),
|
| 201 |
+
("average_triangularity", triangularity_violation),
|
| 202 |
+
("edge_iota_over_nfp", iota_violation),
|
| 203 |
+
)
|
| 204 |
+
for constraint_name, violation in constraint_violations:
|
| 205 |
+
if violation > dominant_violation:
|
| 206 |
+
dominant_constraint = constraint_name
|
| 207 |
+
dominant_violation = violation
|
| 208 |
+
|
| 209 |
+
return (
|
| 210 |
+
aspect_ratio_violation,
|
| 211 |
+
triangularity_violation,
|
| 212 |
+
iota_violation,
|
| 213 |
+
dominant_constraint,
|
| 214 |
+
)
|
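The new `_constraint_violation_metrics` helper normalizes each constraint violation by its bound and reports the largest one as the dominant constraint. The sketch below restates that logic on plain floats so it can be run without `constellaration`. The function name `constraint_violations` and the `ConstraintName` alias are illustrative assumptions (the real alias lives in `fusion_lab/models.py` and is not shown here). The 0.3 bound for `EDGE_IOTA_OVER_NFP_MIN` is taken from the README constraint table.

```python
from typing import Literal

# Assumed alias; the literals are the constraint names that appear in the diff.
ConstraintName = Literal[
    "none", "aspect_ratio", "average_triangularity", "edge_iota_over_nfp"
]

ASPECT_RATIO_MAX = 4.0
AVERAGE_TRIANGULARITY_MAX = -0.5
EDGE_IOTA_OVER_NFP_MIN = 0.3  # bound taken from the README constraint table


def constraint_violations(
    aspect_ratio: float,
    average_triangularity: float,
    edge_iota_over_nfp: float,
) -> tuple[float, float, float, ConstraintName]:
    """Return normalized violations (0.0 when satisfied) and the largest offender."""
    aspect_ratio_violation = max(aspect_ratio - ASPECT_RATIO_MAX, 0.0) / ASPECT_RATIO_MAX
    triangularity_violation = max(
        average_triangularity - AVERAGE_TRIANGULARITY_MAX, 0.0
    ) / abs(AVERAGE_TRIANGULARITY_MAX)
    # The iota constraint is applied to the absolute value, as in the diff.
    iota_violation = (
        max(EDGE_IOTA_OVER_NFP_MIN - abs(edge_iota_over_nfp), 0.0)
        / EDGE_IOTA_OVER_NFP_MIN
    )

    dominant: ConstraintName = "none"
    dominant_violation = 0.0
    candidates: tuple[tuple[ConstraintName, float], ...] = (
        ("aspect_ratio", aspect_ratio_violation),
        ("average_triangularity", triangularity_violation),
        ("edge_iota_over_nfp", iota_violation),
    )
    for name, violation in candidates:
        # Strict comparison: on a tie, the earlier constraint stays dominant.
        if violation > dominant_violation:
            dominant, dominant_violation = name, violation

    return aspect_ratio_violation, triangularity_violation, iota_violation, dominant
```

For example, `constraint_violations(4.4, 0.0, -0.35)` reports `average_triangularity` as dominant: the triangularity term is a full unit away from its bound, the aspect-ratio term is only about 10% over, and the negative iota passes because only its magnitude is checked.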
training/notebooks/fusion_design_lab_training.ipynb
CHANGED
|
@@ -12,7 +12,7 @@
|
|
| 12 |
"The agent interacts with a constrained optimization environment where it adjusts 4 geometric knobs of a stellarator boundary, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:\n",
|
| 13 |
"- `aspect_ratio ≤ 4.0`\n",
|
| 14 |
"- `average_triangularity ≤ -0.5`\n",
|
| 15 |
-
"- `edge_iota_over_nfp ≥ 0.3`\n",
|
| 16 |
"\n",
|
| 17 |
"Each episode has **6 evaluations** budgeted. The agent produces a plan of actions and the environment scores it via the `constellaration` physics verifier.\n",
|
| 18 |
"\n",
|
|
@@ -198,7 +198,7 @@
|
|
| 198 |
"source": [
|
| 199 |
"## 6. Reward Function\n",
|
| 200 |
"\n",
|
| 201 |
-
"The environment reward executes each generated action plan in the stellarator environment and returns the cumulative low-fidelity Reward
|
| 202 |
"\n",
|
| 203 |
"For the current training workflow, the notebook ignores `submit` and does not auto-submit. GRPO therefore optimizes the low-fidelity `run` path only. The live observation telemetry still exposes `reward_breakdown` and `action_monitor` for debugging reward behavior.\n"
|
| 204 |
]
|
|
|
|
| 12 |
"The agent interacts with a constrained optimization environment where it adjusts 4 geometric knobs of a stellarator boundary, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:\n",
|
| 13 |
"- `aspect_ratio ≤ 4.0`\n",
|
| 14 |
"- `average_triangularity ≤ -0.5`\n",
|
| 15 |
+
"- `abs(edge_iota_over_nfp) ≥ 0.3`\n",
|
| 16 |
"\n",
|
| 17 |
"Each episode has **6 evaluations** budgeted. The agent produces a plan of actions and the environment scores it via the `constellaration` physics verifier.\n",
|
| 18 |
"\n",
|
|
|
|
| 198 |
"source": [
|
| 199 |
"## 6. Reward Function\n",
|
| 200 |
"\n",
|
| 201 |
+
"The environment reward executes each generated action plan in the stellarator environment and returns the cumulative low-fidelity Reward V1 from the live environment. The environment's built-in reward decomposes feasibility (+3/-3 crossing bonuses, official feasibility progress, weighted triangularity/aspect/iota repair terms), objective (max elongation improvement), step costs, and failure penalties — see `server/environment.py:_compute_reward_breakdown(...)`.\n",
|
| 202 |
"\n",
|
| 203 |
"For the current training workflow, the notebook ignores `submit` and does not auto-submit. GRPO therefore optimizes the low-fidelity `run` path only. The live observation telemetry still exposes `reward_breakdown` and `action_monitor` for debugging reward behavior.\n"
|
| 204 |
]
|
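The notebook text describes Reward V1 as a sum of decomposed terms. The sketch below shows one way to hold and total such a breakdown. It includes only the field names visible in the `_reward_total` hunk, and the real `RewardBreakdown` may carry more fields, since several lines are elided in the diff. The `total` method and the example `step_cost` and repair values are illustrative assumptions. Only the size of the +3 crossing bonus comes from the notebook description.

```python
from dataclasses import dataclass, fields


@dataclass
class RewardBreakdown:
    # Only the terms visible in the diff; the real class may define others.
    invalid_action_penalty: float = 0.0
    feasibility_crossing_bonus: float = 0.0
    feasibility_regression_penalty: float = 0.0
    feasibility_delta_reward: float = 0.0
    aspect_ratio_repair_reward: float = 0.0
    triangularity_repair_reward: float = 0.0
    iota_repair_reward: float = 0.0
    objective_delta_reward: float = 0.0
    step_cost: float = 0.0
    recovery_bonus: float = 0.0

    def total(self) -> float:
        """Sum every term; penalties and costs are assumed to be stored as negatives."""
        return sum(getattr(self, f.name) for f in fields(self))


# Hypothetical step: the design crosses into feasibility after a triangularity repair.
breakdown = RewardBreakdown(
    feasibility_crossing_bonus=3.0,   # +3 crossing bonus, per the notebook text
    triangularity_repair_reward=0.5,  # placeholder value
    step_cost=-0.1,                   # placeholder value
)
reward = breakdown.total()
```

Keeping each term as a separate field matches the `reward_breakdown` telemetry mentioned above, so a surprising total can be traced to the term that produced it.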