Commit f238af4
Parent: 3bfd80a

feat: refresh heuristic baseline and sync docs

Changed files:

- README.md +5 -5
- TODO.md +3 -1
- baselines/README.md +25 -2
- baselines/heuristic_agent.py +33 -8
- docs/findings/FUSION_DESIGN_LAB_PLAN_V2.md +2 -4
- docs/findings/P1_REPLAY_PLAYTEST_REPORT.md +31 -18
README.md (CHANGED)

@@ -55,7 +55,7 @@ Implementation status:
 - [x] Add tracked `P1` fixtures under `server/data/p1/`
 - [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
 - [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
-- [
+- [x] Refresh the heuristic baseline for the real verifier path
 - [ ] Deploy the real environment to HF Space

 ## Known Gaps

@@ -69,14 +69,14 @@ Implementation status:
 - Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
 - Observation best-state reporting is now split explicitly between low-fidelity rollout state and high-fidelity submit state; baseline traces and demo copy should use those explicit fields rather than infer a mixed best-state story.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
-- The real-verifier
-- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now
+- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible high-fidelity finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
+- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired high-fidelity evidence.
 - The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). It produced a valid trajectory artifact and exposed a repeated-action local failure, which is the right outcome for a smoke run.

 Current mode:

 - strategic task choice is already locked
-- the next work is
+- the next work is reset-seed confirmation, trace export, and deployment
 - new planning text should only appear when a real blocker forces a decision change

 ## Planned Repository Layout

@@ -135,7 +135,7 @@ uv sync --extra notebooks
 - [x] Run at least one submit-side manual trace before any broader training push, then record the first real reward pathology, if any.
 - [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
 - [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
-- [ ]
+- [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
 - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
 - [ ] Deploy the environment to HF Space.
 - [ ] Add the Colab notebook under `training/notebooks`.
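The submit-versus-exhaustion asymmetry noted in the Known Gaps above can be sketched as a tiny payoff function. The name `terminal_reward` and the scale factors here are illustrative stand-ins, not the actual `environment.py` implementation:

```python
def terminal_reward(final_score: float, submitted: bool) -> float:
    """Illustrative terminal payoff: explicit submit keeps the full
    score-based reward; budget exhaustion pays strictly less for the
    same design, so agents prefer deliberate submission."""
    base = 10.0 * final_score  # illustrative scale, not the repo's constant
    if submitted:
        return base
    return 0.5 * base  # exhaustion discount (illustrative)


# Same design, different terminal payoff: the asymmetry to preserve when tuning.
assert terminal_reward(0.3, submitted=True) > terminal_reward(0.3, submitted=False)
```

The point of the sketch is only the strict inequality: any reward retune should keep explicit `submit` dominant over running the budget out.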
TODO.md (CHANGED)

@@ -46,7 +46,9 @@ Priority source:
 - [x] tiny low-fi PPO smoke run exists
 Note:
 `training/ppo_smoke.py` now runs a diagnostic-only low-fidelity PPO smoke pass and the first artifact is summarized in `docs/P1_PPO_SMOKE_NOTE.md`
-- [
+- [x] refresh the heuristic baseline for the real verifier path
+Note:
+the refreshed heuristic now uses the measured `rotational_transform -> triangularity_scale -> elongation -> submit` path; a fresh `uv run python baselines/compare.py 5` rerun finished at `5/5` feasible high-fidelity finals and `5/5` wins over random

 ## Execution Graph
baselines/README.md (CHANGED)

@@ -8,11 +8,34 @@ Random and heuristic baselines will live here.
 - [x] baseline comparison rerun completed on the real verifier path
 - [x] verified that the current 3-knob family is blocked on P1 triangularity under the real verifier path
 - [x] repair the low-dimensional parameterization before further heuristic work
-- [
-- [
+- [x] use the measured repaired-family evidence and current frozen seed set before retuning the heuristic
+- [x] heuristic refreshed after the real-verifier rerun
 - [ ] near-boundary fixture-backed baseline start chosen for manual playtesting
 - [ ] presentation-ready comparison trace exported

+## Current Heuristic
+
+The refreshed heuristic now follows the measured repaired-family transition pattern:
+
+- if a low-fidelity evaluation fails, `restore_best`
+- if a reset starts with low `edge_iota_over_nfp`, push `rotational_transform increase medium` first
+- once `average_triangularity` is close enough, push `triangularity_scale increase medium`
+- once feasible, take at most a small amount of `elongation decrease small`
+- submit as soon as the design is feasible and the elongation is in the safe band
+
+This keeps the baseline on the real verifier path instead of relying on the older threshold-only policy that over-pushed triangularity and missed the feasible sequence.
+
+## Latest Rerun
+
+`uv run python baselines/compare.py 5`
+
+- random mean reward: `-2.2438`
+- heuristic mean reward: `+5.2825`
+- random mean final `P1` score: `0.000000`
+- heuristic mean final `P1` score: `0.291951`
+- feasible high-fidelity finals: `0/5` random vs `5/5` heuristic
+- heuristic wins: `5/5`
+
 The first baseline milestone is:

 - one random agent
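A comparison harness in the spirit of `baselines/compare.py` could aggregate per-episode traces into summary numbers like the rerun stats above. The episode dicts and stub data below are illustrative stand-ins (the real script's internals are not shown in this commit); only the field names mirror the trace entries in `heuristic_agent.py`:

```python
from statistics import mean


def summarize(episodes: list[dict]) -> dict:
    """Aggregate a list of final-episode records into summary stats."""
    return {
        "mean_reward": mean(e["reward"] for e in episodes),
        "mean_final_score": mean(e["score"] for e in episodes),
        "feasible_finals": sum(e["constraints_satisfied"] for e in episodes),
    }


def head_to_head(random_eps: list[dict], heuristic_eps: list[dict]):
    """Pair episodes by index and count heuristic wins on episode reward."""
    wins = sum(h["reward"] > r["reward"] for r, h in zip(random_eps, heuristic_eps))
    return summarize(random_eps), summarize(heuristic_eps), wins


# Stub data only, not the measured rerun numbers.
random_eps = [{"reward": -2.0, "score": 0.0, "constraints_satisfied": False}] * 5
heur_eps = [{"reward": 5.0, "score": 0.29, "constraints_satisfied": True}] * 5
_, heur_summary, wins = head_to_head(random_eps, heur_eps)
print(wins, heur_summary["feasible_finals"])  # -> 5 5
```

Pairing by index keeps the win count comparable across reruns as long as both agents see the same frozen seed set.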
baselines/heuristic_agent.py (CHANGED)

@@ -7,6 +7,12 @@ import sys
 from fusion_lab.models import StellaratorAction, StellaratorObservation
 from server.environment import StellaratorEnvironment

+FEASIBLE_SUBMIT_ELONGATION_MAX = 7.45
+TRIANGULARITY_TARGET_MAX = -0.5
+LOW_IOTA_RESET_THRESHOLD = 0.305
+IOTA_RECOVERY_THRESHOLD = 0.3
+ASPECT_RATIO_TARGET_MAX = 4.0
+

 def heuristic_episode(
     env: StellaratorEnvironment, seed: int | None = None

@@ -19,6 +25,10 @@ def heuristic_episode(
             "score": obs.p1_score,
             "evaluation_fidelity": obs.evaluation_fidelity,
             "constraints_satisfied": obs.constraints_satisfied,
+            "feasibility": obs.p1_feasibility,
+            "max_elongation": obs.max_elongation,
+            "average_triangularity": obs.average_triangularity,
+            "edge_iota_over_nfp": obs.edge_iota_over_nfp,
         }
     ]

@@ -35,6 +45,10 @@ def heuristic_episode(
             "score": obs.p1_score,
             "evaluation_fidelity": obs.evaluation_fidelity,
             "constraints_satisfied": obs.constraints_satisfied,
+            "feasibility": obs.p1_feasibility,
+            "max_elongation": obs.max_elongation,
+            "average_triangularity": obs.average_triangularity,
+            "edge_iota_over_nfp": obs.edge_iota_over_nfp,
             "reward": obs.reward,
             "failure": obs.evaluation_failed,
         }

@@ -44,8 +58,15 @@ def heuristic_episode(


 def _choose_action(obs: StellaratorObservation) -> StellaratorAction:
+    if obs.evaluation_failed:
+        return StellaratorAction(intent="restore_best")
+
     if obs.constraints_satisfied:
-        if
+        if (
+            obs.max_elongation <= FEASIBLE_SUBMIT_ELONGATION_MAX
+            or obs.budget_remaining <= 2
+            or obs.step_number >= 3
+        ):
             return StellaratorAction(intent="submit")
         return StellaratorAction(
             intent="run",

@@ -54,18 +75,22 @@ def _choose_action(obs: StellaratorObservation) -> StellaratorAction:
             magnitude="small",
         )

-    if obs.
+    if obs.average_triangularity > TRIANGULARITY_TARGET_MAX:
+        if obs.step_number == 0 and obs.edge_iota_over_nfp < LOW_IOTA_RESET_THRESHOLD:
+            return StellaratorAction(
+                intent="run",
+                parameter="rotational_transform",
+                direction="increase",
+                magnitude="medium",
+            )
         return StellaratorAction(
             intent="run",
             parameter="triangularity_scale",
             direction="increase",
-            magnitude="
+            magnitude="medium",
         )

-    if obs.edge_iota_over_nfp <
+    if obs.edge_iota_over_nfp < IOTA_RECOVERY_THRESHOLD:
         return StellaratorAction(
             intent="run",
             parameter="rotational_transform",

@@ -73,7 +98,7 @@ def _choose_action(obs: StellaratorObservation) -> StellaratorAction:
             magnitude="small",
         )

-    if obs.aspect_ratio >
+    if obs.aspect_ratio > ASPECT_RATIO_TARGET_MAX:
         return StellaratorAction(
             intent="run",
             parameter="aspect_ratio",
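The refreshed branch order above can be exercised without the server stack via a stand-in observation. `Obs` and `choose` below are self-contained sketches that mirror, but are not, the real `StellaratorObservation` and `_choose_action`; the feasible-side submit guards and the final aspect-ratio action details are elided:

```python
from dataclasses import dataclass


@dataclass
class Obs:
    """Stand-in for StellaratorObservation with only the fields the policy reads."""
    evaluation_failed: bool = False
    constraints_satisfied: bool = False
    average_triangularity: float = 0.6
    edge_iota_over_nfp: float = 0.25
    aspect_ratio: float = 3.6
    step_number: int = 0


def choose(obs: Obs) -> str:
    # Mirrors the branch order of the refreshed _choose_action.
    if obs.evaluation_failed:
        return "restore_best"
    if obs.constraints_satisfied:
        return "submit"  # submit guards elided in this sketch
    if obs.average_triangularity > -0.5:  # TRIANGULARITY_TARGET_MAX
        if obs.step_number == 0 and obs.edge_iota_over_nfp < 0.305:
            return "rotational_transform increase medium"
        return "triangularity_scale increase medium"
    if obs.edge_iota_over_nfp < 0.3:  # IOTA_RECOVERY_THRESHOLD
        return "rotational_transform increase small"
    return "aspect_ratio step"  # direction/magnitude elided in this sketch


# A fresh low-iota reset walks the measured feasible sequence first.
assert choose(Obs()) == "rotational_transform increase medium"
assert choose(Obs(step_number=1)) == "triangularity_scale increase medium"
```

Checks like these keep the decision order pinned down even when the real environment is too slow to run in a unit test.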
docs/findings/FUSION_DESIGN_LAB_PLAN_V2.md (CHANGED)

@@ -40,9 +40,7 @@ Completed:

 Still open:

-- tiny low-fidelity PPO smoke evidence
 - decision on whether reset-seed pool should change from paired checks
-- heuristic baseline refresh on the repaired real-verifier path
 - HF Space deployment evidence
 - Colab artifact wiring
 - demo and README polish after the artifacts are real

@@ -144,7 +142,7 @@ The live technical details belong in [`P1_ENV_CONTRACT_V1.md`](P1_ENV_CONTRACT_V1.md)
 - [ ] Decide whether the reset pool should change based on the measured sweep plus those paired checks.
 - [x] Run at least one submit-side manual trace, then expand to 5 to 10 episodes and record the first real confusion point, exploit, or reward pathology.
 - [ ] Adjust reward or penalties only if playtesting exposes a concrete problem.
-- [
+- [x] Refresh the heuristic baseline using the repaired-family evidence.
 - [ ] Prove a stable local episode path.
 - [ ] Deploy the same task contract to HF Space and prove one clean remote episode.
 - [ ] Wire the Colab artifact to the live environment.

@@ -226,5 +224,5 @@ If the repaired family is too easy:
 - [x] Run a tiny low-fidelity PPO smoke pass and save a few trajectories.
 - [x] Pair the tracked fixtures with high-fidelity submit checks.
 - [x] Record one submit-side manual trace.
-- [
+- [x] Refresh the heuristic baseline from that playtest evidence.
 - [ ] Verify one clean HF Space episode with the same contract.
docs/findings/P1_REPLAY_PLAYTEST_REPORT.md (CHANGED)

@@ -2,6 +2,16 @@

 Date: 2026-03-07

+Update: 2026-03-08
+
+This report is still useful for reward-branch coverage and low-fidelity failure
+pathologies, but its Episode 5 submit result is now historical only. The newer
+manual submit trace in `../P1_MANUAL_PLAYTEST_LOG.md` records the same
+`rotational_transform increase medium -> triangularity_scale increase medium ->
+elongation decrease small -> submit` path succeeding at high fidelity with
+score `0.296059`. Do not use this replay report as the current source of truth
+for submit viability.
+
 ## Purpose

 Expand reward branch coverage beyond the initial manual playtest (Episodes A-B in

@@ -148,11 +158,13 @@ Branches exercised:
 - **submit high-fidelity evaluation** (step 4)
 - **submit failure penalty** (-3.0, step 4: VMEC crash at high fidelity)

+Historical finding: the state at
 `(ar=3.6, elong=1.35, rt=1.6, tri=0.60)` passes low-fidelity evaluation
 (step 3: score=0.296, constraints satisfied) but **crashes at high-fidelity
-evaluation** (step 4: VMEC failure).
-the
+evaluation** in this replay run (step 4: VMEC failure). A newer manual submit
+trace now records the same action sequence succeeding at high fidelity, so this
+episode should be treated as a historical discrepancy rather than live evidence
+of a persistent cross-fidelity gap.

 ## Reward branch coverage summary

@@ -168,26 +180,27 @@ the real final check for this particular path.
 | Budget exhaustion done-penalty | `environment.py:264-265` | not tested | Ep 3 step 6 |
 | Recovery bonus (+1.0) | `environment.py:248-249` | not tested | Ep 1 step 6, Ep 4 step 4 |
 | Budget exhaustion done-bonus | `environment.py:258-263` | not tested | Ep 1 step 6, Ep 2 step 6, Ep 4 step 6 |
-| Submit improvement bonus | `environment.py:260-261` | not tested |
+| Submit improvement bonus | `environment.py:260-261` | not tested | historical replay did not trigger it |
 | Clamping (no physics change) | `environment.py:412-414` | not tested | Ep 3 step 1 |
 | restore_best | `environment.py:175-195` | not tested | Ep 4 step 4 |

-Coverage: 12 of 13 branches exercised. The only untested branch
-**submit improvement bonus**
+Coverage: 12 of 13 branches exercised in this replay. The only untested branch
+here is the **submit improvement bonus**. A newer manual submit trace now
+provides positive high-fidelity submit evidence, but that branch was not
+exercised in this historical replay artifact.

 ## Critical findings

-### 1.
+### 1. Historical submit discrepancy (Episode 5)

 The canonical repair path from seed 0 (increase rt medium, increase tri medium,
-decrease elong small)
-fidelity
-line 53 and `FUSION_DESIGN_LAB_PLAN_V2.md` open items.
+decrease elong small) produced a low-fi feasible state that crashed at high
+fidelity in this replay run.

+Update: this is no longer the live repo conclusion. The newer manual submit
+trace in `../P1_MANUAL_PLAYTEST_LOG.md` records the same path succeeding at
+high fidelity. Treat Episode 5 as evidence that submit behavior needed repeated
+checking, not as proof that seed 0 lacks a viable submit path.

 ### 2. Elongation crash pocket (Episode 1)

@@ -224,13 +237,13 @@ monotonically improve the design.
 | Feasible-side shaping | not tested | confirmed legible |
 | VMEC crash handling | not tested | confirmed legible |
 | restore_best | not tested | confirmed working |
-| Submit tested | no | yes (
-| Cross-fidelity evidence | none |
+| Submit tested | no | yes (historical replay crash) |
+| Cross-fidelity evidence | none | mixed; superseded by newer successful manual submit trace |

 ## Open items

-1. **
+1. **Export the newer high-fidelity-safe submit trace** alongside this replay so
+   the historical Episode 5 crash is not read as the live repo conclusion.
 2. **Map the elongation crash pocket** with a targeted sweep over the elongation
    dimension at feasible parameter combinations.
 3. **Update the measured sweep report** to document the elongation crash zone.