P1 Replay Playtest Report

Date: 2026-03-07

Update: 2026-03-08

This report remains useful for reward-branch coverage and low-fidelity failure pathologies, but its Episode 5 submit result is now historical only. The newer manual submit trace in ../P1_MANUAL_PLAYTEST_LOG.md records the same path (rotational_transform increase medium, then triangularity_scale increase medium, then elongation decrease small, then submit) succeeding at high fidelity with score 0.296059. Do not use this replay report as the current source of truth for submit viability.

Purpose

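Expand reward branch coverage beyond the initial manual playtest (Episodes A-B in P1_MANUAL_PLAYTEST_LOG.md). That log covered 1 seed, 2 episodes, 3 steps. This replay covers all 3 seeds in 5 episodes and 27 budgeted steps (plus one explicit submit in Episode 5). It was designed to exercise every reward branch in server/environment.py:_compute_reward and reached 12 of the 13 tracked branches; the exception is the submit improvement bonus (see the coverage summary).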

Method

Script: baselines/replay_playtest.py

  • direct StellaratorEnvironment instantiation (no server)
  • fixed action sequences for reproducibility
  • same pattern as baselines/random_agent.py and baselines/heuristic_agent.py
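The replay pattern described above can be sketched as follows. This is a minimal illustration, not the contents of baselines/replay_playtest.py: the `EnvLike` interface, the `reset`/`step` signatures, and the `StubEnv` stand-in are all assumptions made so the example runs without VMEC. The real StellaratorEnvironment API may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol, Tuple


class EnvLike(Protocol):
    """Hypothetical interface; the real StellaratorEnvironment may differ."""

    def reset(self, seed: int) -> None: ...
    def step(self, action: str) -> Tuple[float, bool]: ...


@dataclass
class StubEnv:
    """Toy stand-in: every step costs 0.1 and the budget is 6 steps."""

    budget: int = 6

    def reset(self, seed: int) -> None:
        self.budget = 6

    def step(self, action: str) -> Tuple[float, bool]:
        self.budget -= 1
        return -0.1, self.budget == 0


def replay(env: EnvLike, seed: int, actions: Iterable[str]) -> List[float]:
    """Run a fixed action sequence and return the per-step rewards."""
    env.reset(seed)
    rewards: List[float] = []
    for action in actions:
        reward, done = env.step(action)
        rewards.append(reward)
        if done:  # stop once the budget is exhausted
            break
    return rewards


# Eight actions are requested, but the stub's budget ends the episode after six.
rewards = replay(StubEnv(), seed=0, actions=["rt increase medium"] * 8)
print(len(rewards), round(sum(rewards), 4))
```

Because the action list is fixed and the environment is instantiated directly, rerunning the same sequence on the same seed is the whole reproducibility mechanism; no server or policy is involved.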

Episode results

Episode 1: seed 0 — repair + objective shaping + budget exhaustion

Start: ar=3.6, elong=1.4, rt=1.5, tri=0.55, feasibility=0.050653, score=0.0

| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | rt increase medium | -0.1000 | 0.000000 | 0.050653 | 6.7295 | viol | 5 |
| 2 | tri increase medium | +3.1533 | 0.291660 | 0.000000 | 7.3751 | OK | 4 |
| 3 | elong decrease small | +0.2665 | 0.295731 | 0.000865 | 7.3384 | OK | 3 |
| 4 | elong decrease small | -2.1000 | 0.000000 | 1000000 | 10.0000 | FAIL | 2 |
| 5 | elong decrease small | -2.1000 | 0.000000 | 1000000 | 10.0000 | FAIL | 1 |
| 6 | elong decrease small | +2.5350 | 0.307074 | 0.004561 | 7.2363 | OK | 0 |

Total reward: +1.6548

Branches exercised:

  • feasibility crossing bonus (+3.0, step 2)
  • feasible-side elongation shaping (step 3)
  • VMEC failure penalty (-2.1, steps 4-5)
  • recovery bonus (+1.0, step 6)
  • budget exhaustion done-time improvement bonus (step 6)

Finding: elongation crash pocket at elong ~1.25-1.30. Steps 4-5 crashed during low-fi evaluation as elongation was decreased from 1.35 to 1.30 and then to 1.25. Recovery occurred at elong=1.20 (step 6). This crash zone lies within the documented parameter range (1.2, 1.8) and is not mapped in the measured sweep.

Episode 2: seed 1 — repair from different seed

Start: ar=3.4, elong=1.4, rt=1.6, tri=0.55, feasibility=0.050653, score=0.0

| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | rt increase medium | -0.1000 | 0.000000 | 0.050653 | 6.8493 | viol | 5 |
| 2 | tri increase medium | +3.1042 | 0.276209 | 0.009819 | 7.5141 | OK | 4 |
| 3 | elong decrease small | +0.2824 | 0.280458 | 0.001415 | 7.4759 | OK | 3 |
| 4 | elong decrease small | +0.2724 | 0.284596 | 0.002252 | 7.4386 | OK | 2 |
| 5 | elong decrease small | +0.2557 | 0.288548 | 0.003499 | 7.4031 | OK | 1 |
| 6 | elong decrease small | +0.8212 | 0.292289 | 0.004561 | 7.3694 | OK | 0 |

Total reward: +4.6359

Branches exercised:

  • feasibility crossing from a non-default seed (step 2)
  • sustained feasible-side elongation shaping (steps 3-6)
  • budget exhaustion done-time improvement bonus (step 6)

Finding: cleanest full-episode success. Six consecutive successful evaluations, monotonic score improvement, positive reward every step after crossing. Confirms that the repair+optimize arc is legible across a full episode from seed 1.

Episode 3: seed 2 — boundary clamping + feasibility regression

Start: ar=3.8, elong=1.4, rt=1.5, tri=0.55, feasibility=0.050653, score=0.0

| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | ar increase large | -0.1000 | 0.000000 | 0.050653 | 6.5502 | viol | 5 |
| 2 | tri increase medium | +3.1533 | 0.314255 | 0.000000 | 7.1717 | OK | 4 |
| 3 | tri increase medium | -3.3598 | 0.000000 | 0.051950 | 7.8596 | viol | 3 |
| 4 | elong decrease small | -0.0715 | 0.000000 | 0.046243 | 7.8309 | viol | 2 |
| 5 | ar decrease large | -0.4932 | 0.000000 | 0.124880 | 7.3386 | viol | 1 |
| 6 | elong decrease small | -0.5650 | 0.000000 | 0.117873 | 7.3091 | viol | 0 |

Total reward: -1.4362

Branches exercised:

  • boundary clamping (step 1: ar=3.8 + 0.2 clamped at 3.8, no physics change, reward = step cost only)
  • feasibility crossing bonus (+3.0, step 2)
  • feasibility regression penalty (-3.0, step 3: pushed tri too far, lost feasibility)
  • infeasible feasibility shaping (steps 4-6)
  • budget exhaustion done-time penalty (step 6: not improved)
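The clamping case in step 1 can be illustrated with a short sketch. The upper bound of 3.8 for ar and the 0.2 "large" increment come from this report; the lower bound, the function name, and the return shape are hypothetical and are not taken from server/environment.py.

```python
from typing import Tuple

# ASSUMPTION: only the upper bound (3.8) is documented in this report;
# the lower bound here is a placeholder for illustration.
AR_BOUNDS = (3.2, 3.8)
LARGE_DELTA = 0.2  # "large" increment, as implied by "ar=3.8 + 0.2"


def apply_clamped(value: float, delta: float, bounds: Tuple[float, float]) -> Tuple[float, bool]:
    """Apply delta, clamp to bounds, and report whether the value moved."""
    lo, hi = bounds
    new_value = min(max(value + delta, lo), hi)
    changed = abs(new_value - value) > 1e-12
    return new_value, changed


# Seed 2 starts at ar=3.8, so "ar increase large" cannot move the parameter.
new_ar, changed = apply_clamped(3.8, LARGE_DELTA, AR_BOUNDS)
print(new_ar, changed)
```

When the parameter does not move, the physics is unchanged, which is consistent with step 1 showing the starting feasibility (0.050653) and a reward equal to the step cost alone.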

Finding: feasibility is non-monotonic in triangularity_scale. Feasibility is crossed at tri=0.60 (score=0.314), but tri=0.65 breaks it again (feas=0.052). The feasible zone is a narrow band, not an open region. The regression penalty is clearly legible in the step 3 reward (-3.36 total, of which -3.0 is the penalty itself).

Episode 4: seed 0 — crash recovery + restore_best

Start: ar=3.6, elong=1.4, rt=1.5, tri=0.55, feasibility=0.050653, score=0.0

| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | tri increase medium | -0.2593 | 0.000000 | 0.082515 | 6.7218 | viol | 5 |
| 2 | rt increase large | +3.3126 | 0.210239 | 0.000000 | 8.1079 | OK | 4 |
| 3 | rt increase large | -2.1000 | 0.000000 | 1000000 | 10.0000 | FAIL | 3 |
| 4 | restore_best | +0.9000 | 0.210239 | 0.000000 | 8.1079 | OK | 2 |
| 5 | elong decrease small | +0.2541 | 0.214174 | 0.000865 | 8.0724 | OK | 1 |
| 6 | elong decrease small | +0.6821 | 0.218018 | 0.002252 | 8.0378 | OK | 0 |

Total reward: +2.7895

Branches exercised:

  • infeasible feasibility shaping (step 1: tri alone worsened feasibility)
  • feasibility crossing via large rt jump (step 2)
  • VMEC failure at rt=1.9 (-2.1, step 3: crash zone as documented in sweep report)
  • restore_best + recovery bonus (+0.9, step 4: reverts to best-known state, +1.0 recovery -0.1 step cost)
  • feasible-side elongation shaping (steps 5-6)
  • budget exhaustion done-time improvement bonus (step 6)

Finding: restore_best works correctly and the recovery bonus (+1.0) is legible. After reverting from a VMEC crash, the agent can continue improving from its saved best state.
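The step 4 decomposition (+1.0 recovery, -0.1 step cost) can be checked with simple arithmetic, and the same subtraction isolates what is left over on other steps. The sketch below assumes the fixed terms add linearly to the step reward; that is an inference from the numbers in this report, not a reading of _compute_reward, and the `residual` helper is hypothetical.

```python
# Fixed reward terms as quoted in this report.
STEP_COST = -0.1
CROSSING_BONUS = 3.0
RECOVERY_BONUS = 1.0


def residual(step_reward: float, *fixed_terms: float) -> float:
    """Reward left over after subtracting the named fixed terms."""
    return round(step_reward - sum(fixed_terms), 4)


# Episode 4 step 4 (restore_best): recovery bonus plus step cost explain it fully.
restore_residual = residual(0.9, RECOVERY_BONUS, STEP_COST)

# Episode 4 step 2: after the crossing bonus and step cost, the remainder is
# presumably the shaping component (assumption: linear composition).
crossing_residual = residual(3.3126, CROSSING_BONUS, STEP_COST)

print(restore_residual, crossing_residual)
```

A zero residual on the restore_best step is what makes the recovery bonus easy to read in the trace: nothing else contributes to that reward.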

Note: step 1 reveals that triangularity_scale increase medium alone (without a preceding rt increase) worsens feasibility for seed 0. The feasibility boundary is a multi-parameter surface, not a single-knob threshold.

Episode 5: seed 0 — repair + objective move + explicit submit

Start: ar=3.6, elong=1.4, rt=1.5, tri=0.55, feasibility=0.050653, score=0.0

| Step | Action | Reward | Score | Feasibility | Elongation | Status | Budget |
|---|---|---|---|---|---|---|---|
| 1 | rt increase medium | -0.1000 | 0.000000 | 0.050653 | 6.7295 | viol | 5 |
| 2 | tri increase medium | +3.1533 | 0.291660 | 0.000000 | 7.3751 | OK | 4 |
| 3 | elong decrease small | +0.2665 | 0.295731 | 0.000865 | 7.3384 | OK | 3 |
| 4 | submit | -3.0000 | 0.000000 | 1000000 | 10.0000 | FAIL | 3 |

Total reward: +0.3198

Branches exercised:

  • feasibility crossing (step 2)
  • feasible-side elongation shaping (step 3)
  • submit high-fidelity evaluation (step 4)
  • submit failure penalty (-3.0, step 4: VMEC crash at high fidelity)

Historical finding: the state at (ar=3.6, elong=1.35, rt=1.6, tri=0.60) passes low-fidelity evaluation (step 3: score=0.296, constraints satisfied) but crashes at high-fidelity evaluation in this replay run (step 4: VMEC failure). A newer manual submit trace now records the same action sequence succeeding at high fidelity, so this episode should be treated as a historical discrepancy rather than live evidence of a persistent cross-fidelity gap.
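The per-episode totals reported in this section can be re-derived from the step rewards. The sketch below copies the reward columns from the five episode tables and compares each sum with the reported total; it is a bookkeeping check only and does not touch the environment.

```python
# Step rewards copied from the episode tables in this report.
EPISODE_REWARDS = {
    1: [-0.1000, 3.1533, 0.2665, -2.1000, -2.1000, 2.5350],
    2: [-0.1000, 3.1042, 0.2824, 0.2724, 0.2557, 0.8212],
    3: [-0.1000, 3.1533, -3.3598, -0.0715, -0.4932, -0.5650],
    4: [-0.2593, 3.3126, -2.1000, 0.9000, 0.2541, 0.6821],
    5: [-0.1000, 3.1533, 0.2665, -3.0000],
}
REPORTED_TOTALS = {1: 1.6548, 2: 4.6359, 3: -1.4362, 4: 2.7895, 5: 0.3198}

totals = {ep: sum(rewards) for ep, rewards in EPISODE_REWARDS.items()}

for ep, total in totals.items():
    # Compare with a tolerance to absorb floating-point rounding.
    matches = abs(total - REPORTED_TOTALS[ep]) < 5e-5
    print(f"Episode {ep}: {total:+.4f} (matches report: {matches})")

# Count of reward-bearing rows across all tables, including the submit row.
row_count = sum(len(rewards) for rewards in EPISODE_REWARDS.values())
print(row_count)
```

The tables contain 28 reward-bearing rows in total; the Episode 5 submit row leaves the budget at 3, which is consistent with 27 budget-consuming steps plus one submit.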

Reward branch coverage summary

| Branch | Code reference | First run | This replay |
|---|---|---|---|
| Feasibility crossing bonus (+3.0) | environment.py:235-236 | Ep A step 2 | Ep 1-4 |
| Feasibility regression penalty (-3.0) | environment.py:237-238 | not tested | Ep 3 step 3 |
| Feasible-side elongation shaping | environment.py:240-241 | not tested | Ep 1-2, Ep 4 |
| Infeasible feasibility shaping | environment.py:242-243 | Ep A step 1 | Ep 3 steps 4-6 |
| Step cost (-0.1) | environment.py:245-246 | Ep A step 1 | all run steps |
| VMEC failure penalty (-2.1) | environment.py:223-226 | not tested | Ep 1 steps 4-5, Ep 4 step 3 |
| Submit failure penalty (-3.0) | environment.py:227-228 | not tested | Ep 5 step 4 |
| Budget exhaustion done-penalty | environment.py:264-265 | not tested | Ep 3 step 6 |
| Recovery bonus (+1.0) | environment.py:248-249 | not tested | Ep 1 step 6, Ep 4 step 4 |
| Budget exhaustion done-bonus | environment.py:258-263 | not tested | Ep 1 step 6, Ep 2 step 6, Ep 4 step 6 |
| Submit improvement bonus | environment.py:260-261 | not tested | historical replay did not trigger it |
| Clamping (no physics change) | environment.py:412-414 | not tested | Ep 3 step 1 |
| restore_best | environment.py:175-195 | not tested | Ep 4 step 4 |

Coverage: 12 of 13 branches exercised in this replay. The only untested branch here is the submit improvement bonus. A newer manual submit trace now provides positive high-fidelity submit evidence, but that branch was not exercised in this historical replay artifact.

Critical findings

1. Historical submit discrepancy (Episode 5)

The canonical repair path from seed 0 (increase rt medium, increase tri medium, decrease elong small) produced a low-fi feasible state that crashed at high fidelity in this replay run.

Update: this is no longer the live repo conclusion. The newer manual submit trace in ../P1_MANUAL_PLAYTEST_LOG.md records the same path succeeding at high fidelity. Treat Episode 5 as evidence that submit behavior needed repeated checking, not as proof that seed 0 lacks a viable submit path.

2. Elongation crash pocket (Episode 1)

VMEC crashes at elongation ~1.25-1.30 during low-fi evaluation, with recovery at elongation=1.20. This crash zone is inside the documented parameter range (1.2, 1.8) and was not discovered by the measured sweep (which only varied rotational_transform and triangularity_scale in the targeted grid).

Implication: the elongation dimension has internal crash pockets that the current sweep does not map. Agents that decrease elongation aggressively will hit unexpected crashes.
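A targeted scan of the kind this finding calls for could look like the sketch below. It is an illustration under stated assumptions: `find_crash_pockets` and `stub_evaluate` are hypothetical names, and the stub simply mimics the Episode 1 observations (crashes at 1.25 and 1.30, success elsewhere) instead of calling the low-fidelity evaluation. A real sweep would pass in a callable that evaluates each elongation value at a feasible parameter combination.

```python
from typing import Callable, List, Tuple


def find_crash_pockets(
    evaluate: Callable[[float], bool],
    start: float,
    stop: float,
    step: float,
) -> List[Tuple[float, float]]:
    """Scan elongation values and return (low, high) intervals that crashed.

    `evaluate` returns True when the evaluation succeeds, False on a crash.
    """
    n = int(round((stop - start) / step))
    values = [round(start + i * step, 6) for i in range(n + 1)]
    pockets: List[Tuple[float, float]] = []
    run: List[float] = []  # consecutive crashing values seen so far
    for value in values:
        if evaluate(value):
            if run:
                pockets.append((run[0], run[-1]))
                run = []
        else:
            run.append(value)
    if run:  # a pocket that extends to the end of the range
        pockets.append((run[0], run[-1]))
    return pockets


def stub_evaluate(elong: float) -> bool:
    """Toy stand-in mimicking Episode 1: crashes at 1.25 and 1.30 only."""
    return not (1.25 - 1e-9 <= elong <= 1.30 + 1e-9)


pockets = find_crash_pockets(stub_evaluate, start=1.20, stop=1.80, step=0.05)
print(pockets)
```

With a 0.05 grid this only brackets the pocket to the resolution of the "small" action; a finer step would be needed to locate its edges.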

3. Feasibility boundary is multi-parametric (Episode 4 step 1)

triangularity_scale increase medium alone worsens feasibility for seed 0 (0.051 to 0.083). The original manual playtest crossed feasibility only because rotational_transform was already increased to 1.6 first. The feasibility boundary is a surface in 4D parameter space, not a threshold on a single knob.

4. Feasibility is non-monotonic in triangularity (Episode 3 steps 2-3)

triangularity_scale=0.60 is feasible but 0.65 is not (from seed 2). The feasible zone is a narrow band. Pushing a single knob further does not monotonically improve the design.

Comparison with initial manual playtest

| Property | Initial (Ep A-B) | This replay |
|---|---|---|
| Seeds tested | 1 (seed 0) | 3 (seeds 0, 1, 2) |
| Episodes | 2 | 5 |
| Total steps | 3 | 27 |
| Reward branches covered | 3 of 13 | 12 of 13 |
| Feasible-side shaping | not tested | confirmed legible |
| VMEC crash handling | not tested | confirmed legible |
| restore_best | not tested | confirmed working |
| Submit tested | no | yes (historical replay crash) |
| Cross-fidelity evidence | none | mixed; superseded by newer successful manual submit trace |

Open items

  1. Export the newer high-fidelity-safe submit trace alongside this replay so the historical Episode 5 crash is not read as the live repo conclusion.
  2. Map the elongation crash pocket with a targeted sweep over the elongation dimension at feasible parameter combinations.
  3. Update the measured sweep report to document the elongation crash zone.
  4. Consider narrowing elongation range or documenting the crash pocket as a known hazard in the environment contract.