P1 Replay Playtest Report
Date: 2026-03-07
Update: 2026-03-08
This report is still useful for reward-branch coverage and low-fidelity failure
pathologies, but its Episode 5 submit result is now historical only. The newer
manual submit trace in ../P1_MANUAL_PLAYTEST_LOG.md records the same path
(rotational_transform increase medium -> triangularity_scale increase medium ->
elongation decrease small -> submit) succeeding at high fidelity with score
0.296059. Do not use this replay report as the current source of truth
for submit viability.
Purpose
Expand reward branch coverage beyond the initial manual playtest (Episodes A-B in
P1_MANUAL_PLAYTEST_LOG.md). That log covered 1 seed, 2 episodes, 3 steps.
This replay covers all 3 seeds, 5 episodes, 27 steps, and exercises every
reward branch in server/environment.py:_compute_reward.
Method
Script: baselines/replay_playtest.py
- direct StellaratorEnvironment instantiation (no server)
- fixed action sequences for reproducibility
- same pattern as baselines/random_agent.py and baselines/heuristic_agent.py
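The replay approach described above can be sketched as a small harness that feeds a fixed action list into an environment and records each reward. This is a minimal illustration, not the contents of baselines/replay_playtest.py: the `reset(seed)` / `step(action)` interface, the `ReplayableEnv` protocol, and the `ToyEnv` stand-in are all assumptions made so the example runs on its own, and the real StellaratorEnvironment API may differ.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class ReplayableEnv(Protocol):
    """Hypothetical minimal interface; the real environment API may differ."""

    def reset(self, seed: int) -> dict: ...

    def step(self, action: str) -> tuple[dict, float, bool]: ...


@dataclass
class StepRecord:
    step: int
    action: str
    reward: float
    done: bool


def replay(env: ReplayableEnv, seed: int, actions: Sequence[str]) -> list[StepRecord]:
    """Run a fixed action sequence and record the reward at each step."""
    env.reset(seed)
    records = []
    for i, action in enumerate(actions, start=1):
        _obs, reward, done = env.step(action)
        records.append(StepRecord(i, action, reward, done))
        if done:  # stop once the environment reports the episode is over
            break
    return records


class ToyEnv:
    """Stand-in for the demo only: flat step cost, ends when the budget is spent."""

    def __init__(self, budget: int = 6, step_cost: float = 0.1):
        self.budget = budget
        self.step_cost = step_cost
        self.remaining = budget

    def reset(self, seed: int) -> dict:
        self.remaining = self.budget
        return {"seed": seed}

    def step(self, action: str) -> tuple[dict, float, bool]:
        self.remaining -= 1
        return {"budget": self.remaining}, -self.step_cost, self.remaining == 0


# Action labels follow the tables in this report; ToyEnv ignores their meaning.
episode_2_actions = ["rt increase medium", "tri increase medium"] + ["elong decrease small"] * 4
records = replay(ToyEnv(), seed=1, actions=episode_2_actions)
for r in records:
    print(r.step, r.action, f"{r.reward:+.4f}", r.done)
```

Because the action list is fixed and the seed is passed explicitly, rerunning the harness against a deterministic environment should reproduce the same table row for row.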
Episode results
Episode 1: seed 0 — repair + objective shaping + budget exhaustion
Start: ar=3.6, elong=1.4, rt=1.5, tri=0.55, feasibility=0.050653, score=0.0
| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | rt increase medium | -0.1000 | 0.000000 | 0.050653 | 6.7295 | viol | 5 |
| 2 | tri increase medium | +3.1533 | 0.291660 | 0.000000 | 7.3751 | OK | 4 |
| 3 | elong decrease small | +0.2665 | 0.295731 | 0.000865 | 7.3384 | OK | 3 |
| 4 | elong decrease small | -2.1000 | 0.000000 | 1000000 | 10.0000 | FAIL | 2 |
| 5 | elong decrease small | -2.1000 | 0.000000 | 1000000 | 10.0000 | FAIL | 1 |
| 6 | elong decrease small | +2.5350 | 0.307074 | 0.004561 | 7.2363 | OK | 0 |
Total reward: +1.6548
Branches exercised:
- feasibility crossing bonus (+3.0, step 2)
- feasible-side elongation shaping (step 3)
- VMEC failure penalty (-2.1, steps 4-5)
- recovery bonus (+1.0, step 6)
- budget exhaustion done-time improvement bonus (step 6)
Finding: elongation crash pocket at elong ~1.25-1.30. Steps 4-5 crashed
during low-fi evaluation after elongation was decreased from 1.35 to 1.30 and then to 1.25.
Recovery occurred at elong=1.20 (step 6). This crash zone is within the
documented parameter range (1.2, 1.8) and is not mapped in the measured sweep.
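To make the crash pocket concrete, the elongation value at each of the four elong decrease small steps can be reconstructed from the start value. The sketch below is bookkeeping only: the 0.05 step size is inferred from the 1.35 -> 1.30 -> 1.25 -> 1.20 sequence reported here, not read from the environment code, and the statuses are copied from the Episode 1 table.

```python
from decimal import Decimal

# Start value from the Episode 1 header; step size inferred from the finding text.
ELONG_START = Decimal("1.40")
ELONG_SMALL_STEP = Decimal("0.05")  # assumption: size of "elong decrease small"

# (step number, status) for the four "elong decrease small" actions in Episode 1.
statuses = [(3, "OK"), (4, "FAIL"), (5, "FAIL"), (6, "OK")]

elong = ELONG_START
trace = []
for step, status in statuses:
    elong -= ELONG_SMALL_STEP  # Decimal keeps the values exact (no float drift)
    trace.append((step, elong, status))

crashed = [value for _, value, status in trace if status == "FAIL"]

for step, value, status in trace:
    print(f"step {step}: elong={value} status={status}")
print("crashed at:", [str(v) for v in crashed])
```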
Episode 2: seed 1 — repair from different seed
Start: ar=3.4, elong=1.4, rt=1.6, tri=0.55, feasibility=0.050653, score=0.0
| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | rt increase medium | -0.1000 | 0.000000 | 0.050653 | 6.8493 | viol | 5 |
| 2 | tri increase medium | +3.1042 | 0.276209 | 0.009819 | 7.5141 | OK | 4 |
| 3 | elong decrease small | +0.2824 | 0.280458 | 0.001415 | 7.4759 | OK | 3 |
| 4 | elong decrease small | +0.2724 | 0.284596 | 0.002252 | 7.4386 | OK | 2 |
| 5 | elong decrease small | +0.2557 | 0.288548 | 0.003499 | 7.4031 | OK | 1 |
| 6 | elong decrease small | +0.8212 | 0.292289 | 0.004561 | 7.3694 | OK | 0 |
Total reward: +4.6359
Branches exercised:
- feasibility crossing from a non-default seed (step 2)
- sustained feasible-side elongation shaping (steps 3-6)
- budget exhaustion done-time improvement bonus (step 6)
Finding: cleanest full-episode success. Six consecutive successful evaluations, monotonic score improvement, positive reward every step after crossing. Confirms that the repair+optimize arc is legible across a full episode from seed 1.
Episode 3: seed 2 — boundary clamping + feasibility regression
Start: ar=3.8, elong=1.4, rt=1.5, tri=0.55, feasibility=0.050653, score=0.0
| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | ar increase large | -0.1000 | 0.000000 | 0.050653 | 6.5502 | viol | 5 |
| 2 | tri increase medium | +3.1533 | 0.314255 | 0.000000 | 7.1717 | OK | 4 |
| 3 | tri increase medium | -3.3598 | 0.000000 | 0.051950 | 7.8596 | viol | 3 |
| 4 | elong decrease small | -0.0715 | 0.000000 | 0.046243 | 7.8309 | viol | 2 |
| 5 | ar decrease large | -0.4932 | 0.000000 | 0.124880 | 7.3386 | viol | 1 |
| 6 | elong decrease small | -0.5650 | 0.000000 | 0.117873 | 7.3091 | viol | 0 |
Total reward: -1.4362
Branches exercised:
- boundary clamping (step 1: ar=3.8 + 0.2 clamped at 3.8, no physics change, reward = step cost only)
- feasibility crossing bonus (+3.0, step 2)
- feasibility regression penalty (-3.0, step 3: pushed tri too far, lost feasibility)
- infeasible feasibility shaping (steps 4-6)
- budget exhaustion done-time penalty (step 6: not improved)
Finding: feasibility is non-monotonic in triangularity_scale. The design becomes feasible at tri=0.60 (score=0.314), but a further increase to tri=0.65 breaks feasibility again (feas=0.052). The feasible zone is a narrow band, not an open region. The regression penalty (-3.36 total) is clearly legible.
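The clamping behaviour seen in step 1 can be illustrated with a small helper. This is a sketch of the general technique, not the code at environment.py:412-414: the function name `apply_clamped` and the lower bound `AR_LOW` are hypothetical, while the 3.8 upper bound and the 0.2 large step come from the branch note above.

```python
def apply_clamped(value: float, delta: float, low: float, high: float) -> tuple[float, bool]:
    """Return the clamped new value and whether the parameter actually changed."""
    new_value = min(max(value + delta, low), high)
    return new_value, new_value != value


AR_HIGH = 3.8  # upper bound implied by the Episode 3 step 1 note
AR_LOW = 3.2   # hypothetical lower bound, for illustration only

# Episode 3 step 1: "ar increase large" from ar=3.8 is clamped back to 3.8.
new_ar, changed = apply_clamped(3.8, 0.2, AR_LOW, AR_HIGH)
print(new_ar, changed)
```

When `changed` is false the physics inputs are identical to the previous step, which matches the report's observation that the reward for that step is the step cost alone.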
Episode 4: seed 0 — crash recovery + restore_best
Start: ar=3.6, elong=1.4, rt=1.5, tri=0.55, feasibility=0.050653, score=0.0
| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | tri increase medium | -0.2593 | 0.000000 | 0.082515 | 6.7218 | viol | 5 |
| 2 | rt increase large | +3.3126 | 0.210239 | 0.000000 | 8.1079 | OK | 4 |
| 3 | rt increase large | -2.1000 | 0.000000 | 1000000 | 10.0000 | FAIL | 3 |
| 4 | restore_best | +0.9000 | 0.210239 | 0.000000 | 8.1079 | OK | 2 |
| 5 | elong decrease small | +0.2541 | 0.214174 | 0.000865 | 8.0724 | OK | 1 |
| 6 | elong decrease small | +0.6821 | 0.218018 | 0.002252 | 8.0378 | OK | 0 |
Total reward: +2.7895
Branches exercised:
- infeasible feasibility shaping (step 1: tri alone worsened feasibility)
- feasibility crossing via large rt jump (step 2)
- VMEC failure at rt=1.9 (-2.1, step 3: crash zone as documented in sweep report)
- restore_best + recovery bonus (+0.9, step 4: reverts to best-known state, +1.0 recovery -0.1 step cost)
- feasible-side elongation shaping (steps 5-6)
- budget exhaustion done-time improvement bonus (step 6)
Finding: restore_best works correctly and the recovery bonus (+1.0) is legible. After reverting from a VMEC crash, the agent can continue improving from its saved best state.
Note: step 1 reveals that triangularity_scale increase medium alone (without
a preceding rt increase) worsens feasibility for seed 0. The feasibility
boundary is a multi-parameter surface, not a single-knob threshold.
Episode 5: seed 0 — repair + objective move + explicit submit
Start: ar=3.6, elong=1.4, rt=1.5, tri=0.55, feasibility=0.050653, score=0.0
| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | rt increase medium | -0.1000 | 0.000000 | 0.050653 | 6.7295 | viol | 5 |
| 2 | tri increase medium | +3.1533 | 0.291660 | 0.000000 | 7.3751 | OK | 4 |
| 3 | elong decrease small | +0.2665 | 0.295731 | 0.000865 | 7.3384 | OK | 3 |
| 4 | submit | -3.0000 | 0.000000 | 1000000 | 10.0000 | FAIL | 3 |
Total reward: +0.3198
Branches exercised:
- feasibility crossing (step 2)
- feasible-side elongation shaping (step 3)
- submit high-fidelity evaluation (step 4)
- submit failure penalty (-3.0, step 4: VMEC crash at high fidelity)
Historical finding: the state at
(ar=3.6, elong=1.35, rt=1.6, tri=0.60) passes low-fidelity evaluation
(step 3: score=0.296, constraints satisfied) but crashes at high-fidelity
evaluation in this replay run (step 4: VMEC failure). A newer manual submit
trace now records the same action sequence succeeding at high fidelity, so this
episode should be treated as a historical discrepancy rather than live evidence
of a persistent cross-fidelity gap.
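As a consistency check, the total reward reported for each episode can be recomputed from the per-step rewards in the tables above. The snippet only re-adds numbers already in this report; the dictionary names are illustrative.

```python
import math

# Per-step rewards copied from the five episode tables.
step_rewards = {
    1: [-0.1000, 3.1533, 0.2665, -2.1000, -2.1000, 2.5350],
    2: [-0.1000, 3.1042, 0.2824, 0.2724, 0.2557, 0.8212],
    3: [-0.1000, 3.1533, -3.3598, -0.0715, -0.4932, -0.5650],
    4: [-0.2593, 3.3126, -2.1000, 0.9000, 0.2541, 0.6821],
    5: [-0.1000, 3.1533, 0.2665, -3.0000],
}

# "Total reward" lines as reported under each table.
reported_totals = {1: 1.6548, 2: 4.6359, 3: -1.4362, 4: 2.7895, 5: 0.3198}

# fsum avoids accumulating float error; round to the report's 4 decimals.
computed = {ep: round(math.fsum(r), 4) for ep, r in step_rewards.items()}
mismatches = {
    ep: (computed[ep], reported_totals[ep])
    for ep in computed
    if computed[ep] != reported_totals[ep]
}

for ep in sorted(computed):
    print(f"Episode {ep}: computed {computed[ep]:+.4f}, reported {reported_totals[ep]:+.4f}")
print("mismatches:", mismatches)
```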
Reward branch coverage summary
| Branch | Code reference | First run | This replay |
|---|---|---|---|
| Feasibility crossing bonus (+3.0) | environment.py:235-236 | Ep A step 2 | Ep 1-4 |
| Feasibility regression penalty (-3.0) | environment.py:237-238 | not tested | Ep 3 step 3 |
| Feasible-side elongation shaping | environment.py:240-241 | not tested | Ep 1-2, Ep 4 |
| Infeasible feasibility shaping | environment.py:242-243 | Ep A step 1 | Ep 3 steps 4-6 |
| Step cost (-0.1) | environment.py:245-246 | Ep A step 1 | all run steps |
| VMEC failure penalty (-2.1) | environment.py:223-226 | not tested | Ep 1 steps 4-5, Ep 4 step 3 |
| Submit failure penalty (-3.0) | environment.py:227-228 | not tested | Ep 5 step 4 |
| Budget exhaustion done-penalty | environment.py:264-265 | not tested | Ep 3 step 6 |
| Recovery bonus (+1.0) | environment.py:248-249 | not tested | Ep 1 step 6, Ep 4 step 4 |
| Budget exhaustion done-bonus | environment.py:258-263 | not tested | Ep 1 step 6, Ep 2 step 6, Ep 4 step 6 |
| Submit improvement bonus | environment.py:260-261 | not tested | historical replay did not trigger it |
| Clamping (no physics change) | environment.py:412-414 | not tested | Ep 3 step 1 |
| restore_best | environment.py:175-195 | not tested | Ep 4 step 4 |
Coverage: 12 of 13 branches exercised in this replay. The only untested branch here is the submit improvement bonus. A newer manual submit trace now provides positive high-fidelity submit evidence, but that branch was not exercised in this historical replay artifact.
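The coverage figure can be tallied directly from the summary table. The sketch below encodes the "This replay" column as booleans; the dictionary is a hand transcription of that table, not output from any coverage tool.

```python
# True where the "This replay" column lists at least one episode/step.
replay_coverage = {
    "Feasibility crossing bonus": True,
    "Feasibility regression penalty": True,
    "Feasible-side elongation shaping": True,
    "Infeasible feasibility shaping": True,
    "Step cost": True,
    "VMEC failure penalty": True,
    "Submit failure penalty": True,
    "Budget exhaustion done-penalty": True,
    "Recovery bonus": True,
    "Budget exhaustion done-bonus": True,
    "Submit improvement bonus": False,  # not triggered in this historical replay
    "Clamping (no physics change)": True,
    "restore_best": True,
}

exercised = sum(replay_coverage.values())
untested = [name for name, hit in replay_coverage.items() if not hit]
print(f"{exercised} of {len(replay_coverage)} branches exercised; untested: {untested}")
```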
Critical findings
1. Historical submit discrepancy (Episode 5)
The canonical repair path from seed 0 (increase rt medium, increase tri medium, decrease elong small) produced a low-fi feasible state that crashed at high fidelity in this replay run.
Update: this is no longer the live repo conclusion. The newer manual submit
trace in ../P1_MANUAL_PLAYTEST_LOG.md records the same path succeeding at
high fidelity. Treat Episode 5 as evidence that submit behavior needed repeated
checking, not as proof that seed 0 lacks a viable submit path.
2. Elongation crash pocket (Episode 1)
VMEC crashes at elongation ~1.25-1.30 during low-fi evaluation, with recovery
at elongation=1.20. This crash zone is inside the documented parameter range
(1.2, 1.8) and was not discovered by the measured sweep (which only varied
rotational_transform and triangularity_scale in the targeted grid).
Implication: the elongation dimension has internal crash pockets that the current sweep does not map. Agents that decrease elongation aggressively will hit unexpected crashes.
3. Feasibility boundary is multi-parametric (Episode 4 step 1)
triangularity_scale increase medium alone worsens feasibility for seed 0
(0.051 to 0.083). The original manual playtest crossed feasibility only because
rotational_transform was already increased to 1.6 first. The feasibility
boundary is a surface in 4D parameter space, not a threshold on a single knob.
4. Feasibility is non-monotonic in triangularity (Episode 3 steps 2-3)
triangularity_scale=0.60 is feasible but 0.65 is not (from seed 2). The
feasible zone is a narrow band. Pushing a single knob further does not
monotonically improve the design.
Comparison with initial manual playtest
| Property | Initial (Ep A-B) | This replay |
|---|---|---|
| Seeds tested | 1 (seed 0) | 3 (seeds 0, 1, 2) |
| Episodes | 2 | 5 |
| Total steps | 3 | 27 |
| Reward branches covered | 3 of 13 | 12 of 13 |
| Feasible-side shaping | not tested | confirmed legible |
| VMEC crash handling | not tested | confirmed legible |
| restore_best | not tested | confirmed working |
| Submit tested | no | yes (historical replay crash) |
| Cross-fidelity evidence | none | mixed; superseded by newer successful manual submit trace |
Open items
- Export the newer high-fidelity-safe submit trace alongside this replay so the historical Episode 5 crash is not read as the live repo conclusion.
- Map the elongation crash pocket with a targeted sweep over the elongation dimension at feasible parameter combinations.
- Update the measured sweep report to document the elongation crash zone.
- Consider narrowing the elongation range or documenting the crash pocket as a known hazard in the environment contract.
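A targeted elongation sweep of the kind proposed in the open items could be organised roughly as follows. This is a sketch under stated assumptions: the 0.05 grid spacing is a choice made here, `placeholder_evaluate` is a hypothetical stand-in for a real low-fidelity evaluation, and it fails only at the two values observed to crash in Episode 1. Every other grid point is unmeasured and is treated as passing purely so the example runs.

```python
from decimal import Decimal
from typing import Callable


def elongation_grid(low: Decimal, high: Decimal, step: Decimal) -> list[Decimal]:
    """Inclusive grid from low to high in exact decimal steps."""
    values = []
    value = low
    while value <= high:
        values.append(value)
        value += step
    return values


def crash_pockets(
    grid: list[Decimal], evaluate: Callable[[Decimal], str]
) -> list[tuple[Decimal, Decimal]]:
    """Group consecutive failing grid points into (first, last) intervals."""
    pockets = []
    start = end = None
    for value in grid:
        if evaluate(value) == "FAIL":
            if start is None:
                start = value
            end = value
        elif start is not None:
            pockets.append((start, end))
            start = end = None
    if start is not None:  # pocket that runs to the end of the grid
        pockets.append((start, end))
    return pockets


# Only these two failures are measured (Episode 1, steps 4-5).
OBSERVED_FAILURES = {Decimal("1.25"), Decimal("1.30")}


def placeholder_evaluate(elong: Decimal) -> str:
    """Hypothetical evaluator; replace with a real low-fidelity evaluation."""
    return "FAIL" if elong in OBSERVED_FAILURES else "OK"


# Documented elongation range from this report: (1.2, 1.8).
grid = elongation_grid(Decimal("1.20"), Decimal("1.80"), Decimal("0.05"))
pockets = crash_pockets(grid, placeholder_evaluate)
print([(str(a), str(b)) for a, b in pockets])
```

In a real sweep the evaluator would be called at several feasible (ar, rt, tri) combinations, since the findings above suggest the crash boundary depends on more than one parameter.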