CreativeEngineer Claude Opus 4.6 committed on
Commit
25f0cfa
·
1 Parent(s): 9d7dc15

docs: archive submission plan with SSOT-aligned fixes


Move to docs/archive/ since it is a tactical checklist, not a planning
SSOT. Fix training-optimism, align fallback language to SSOT section 10,
split training milestones, and soften reward branch claim.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs/SUBMISSION_PLAN_14H.md DELETED
@@ -1,111 +0,0 @@
- # Submission Plan — 14 Hours Remaining
-
- Date: 2026-03-07
-
- ## Current state
-
- Done:
-
- - environment contract locked and stable
- - official constellaration verifier wired (low-fi run, high-fi submit)
- - 3 frozen reset seeds validated by measured sweep
- - reward V0 tested across 12 of 13 branches (replay playtest report)
- - random and heuristic baselines committed
- - PPO smoke passed on Northflank H100 (plumbing gate)
- - replay playtest script committed with full trace output
-
- Not done:
-
- - [ ] trained policy with demo-quality trajectories
- - [ ] HF Space deployment
- - [ ] Colab notebook
- - [ ] 1-minute demo video
- - [ ] README polish
- - [ ] paired high-fidelity fixture checks
- - [ ] submit-side manual playtest with successful high-fi outcome
-
- ## Key finding: cross-fidelity gap
-
- The replay playtest (episode 5) confirmed that the canonical low-fi repair
- path from seed 0 crashes at high-fidelity evaluation. The state
- `(ar=3.6, elong=1.35, rt=1.6, tri=0.60)` is low-fi feasible but high-fi
- VMEC failure.
-
- Decision: **do not attempt to fix this in the remaining time**. Reasons:
-
- 1. finding a high-fi-safe path requires sweep work with no time guarantee
- 2. the plan doc already frames the trained policy as supporting evidence, not the product
- 3. the gap itself is a strong honest finding for the submission narrative
-
- ## Time allocation
-
- | Task | Hours | Notes |
- |------|-------|-------|
- | Train low-fi PPO | 2-3 | PPO smoke already passed. 24 discrete actions, 6-step budget. Target: visible score improvement across seeds. Run on Northflank H100. |
- | HF Space deployment | 2-3 | Hard requirement. Deploy FastAPI server, prove one clean remote episode. Debug dependency issues. |
- | Colab notebook | 1-2 | Connect to HF Space, run trained policy, show trajectory. Minimal but working. |
- | Demo video | 1 | Script around: environment clarity, human playability, trained agent, reward story. |
- | README and repo polish | 1 | Last step. Only after artifacts exist. |
- | Buffer | 2-3 | Deployment issues, training bugs, unexpected blockers. |
- | **Total** | **~11-14** | |
-
- ## Execution order
-
- 1. **Training** — start first because it can run while doing other work.
-    Use the existing `training/ppo_smoke.py` as the base. Train on all 3 seeds.
-    Stop when a few trajectories show clear repair-arc behavior (cross
-    feasibility, improve score). Do not overfit to one seed.
-
- 2. **HF Space** — deploy while training runs or immediately after. The server
-    is already in `server/app.py`. Need to verify dependencies resolve on HF
-    infra and that one reset-step-submit cycle completes cleanly.
-
- 3. **Colab notebook** — wire to the live HF Space endpoint. Load the trained
-    checkpoint. Run a short episode. Add minimal narrative connecting the
-    environment design to the trajectory evidence.
-
- 4. **Demo video** — 1 minute. Structure:
-    - the problem (stellarator design is hard)
-    - the environment (narrow, human-playable, real verifier)
-    - human playtest (replay output showing legible reward)
-    - trained agent (trajectory with visible improvement)
-    - honest findings (cross-fidelity gap as a real insight)
-
- 5. **README polish** — update with links to HF Space, Colab, and video.
-    Keep claims conservative. Reference the evidence docs.
-
- ## What to cut if time runs short
-
- Priority order (cut from the bottom):
-
- 1. Colab polish — minimal working notebook is enough
- 2. Training length — a few readable improving trajectories over a long run
- 3. README depth — link to docs, keep top-level short
- 4. Do NOT cut HF Space — hard requirement
- 5. Do NOT cut demo video — primary judge-facing artifact
-
- ## Submission narrative
-
- Frame the cross-fidelity gap as a strength, not a failure:
-
- > The environment is instrumented well enough to reveal that low-fidelity
- > feasibility does not guarantee high-fidelity success. This is a real
- > challenge in multi-fidelity scientific design and exactly the kind of
- > insight a well-built environment should surface.
-
- The story is:
-
- 1. we built a clear, narrow environment for one constrained design task
- 2. we tested it thoroughly (sweep, baselines, replay playtest, 12/13 reward branches)
- 3. we trained a policy that learns the low-fi repair arc
- 4. we discovered and documented an honest cross-fidelity gap
- 5. the environment is the product; the policy is evidence that it works
-
- ## Risk assessment
-
- | Risk | Likelihood | Mitigation |
- |------|-----------|------------|
- | Training does not converge | Low — smoke already passed | Show the smoke trajectories as evidence. Document what was tried. |
- | HF Space dependency issues | Medium | Start deployment early. Have a local-only fallback with screen recording. |
- | High-fi submit never works | Already confirmed | Frame as documented finding. Do not promise high-fi results. |
- | Run out of time | Medium | Follow cut order above. Prioritize video and HF Space over polish. |
docs/archive/SUBMISSION_PLAN_14H.md ADDED
@@ -0,0 +1,137 @@
+ # Submission Plan — 14 Hours Remaining
+
+ Date: 2026-03-07
+
+ Status: late-stage submission checklist, not a planning SSOT. The planning
+ SSOT is `FUSION_DESIGN_LAB_PLAN_V2.md`. This file captures time allocation
+ and cut priorities for the final push.
+
+ ## Current state (aligned to SSOT)
+
+ Done:
+
+ - environment contract locked and stable
+ - official constellaration verifier wired (low-fi run, high-fi submit)
+ - 3 frozen reset seeds validated by measured sweep
+ - reward V0 replay playtest committed (see `P1_REPLAY_PLAYTEST_REPORT.md` for branch-level detail)
+ - random and heuristic baselines committed
+ - paired high-fidelity fixture checks complete (all 3 fixtures)
+ - one successful submit-side manual trace recorded (Episode C in playtest log)
+ - PPO smoke completed as a plumbing/debugging gate (exposed repeated-action collapse, not training success)
+ - replay playtest script committed with full trace output
+
+ Not done (per SSOT execution order):
+
+ - [ ] reset-seed pool decision from paired checks
+ - [ ] heuristic baseline refresh on repaired-family evidence
+ - [ ] trained policy with non-degenerate trajectories (current smoke collapses to a single repeated action)
+ - [ ] if non-degenerate: push toward demo-quality trajectories showing feasibility crossing
+ - [ ] HF Space deployment
+ - [ ] Colab notebook
+ - [ ] 1-minute demo video
+ - [ ] README polish
+
+ ## Cross-fidelity status
+
+ The cross-fidelity picture is nuanced, not a blanket failure:
+
+ - `lowfi_feasible_local.json` shows a successful paired high-fi evaluation at
+   `(ar=3.6, elong=1.4, rt=1.6, tri=0.60)` — constraints satisfied, score=0.292
+ - Episode C manual trace shows a successful high-fi submit from seed 0 with
+   score=0.296, constraints satisfied
+ - the replay playtest episode 5 crashed at high-fi from
+   `(ar=3.6, elong=1.35, rt=1.6, tri=0.60)` — one elongation decrease away
+
+ So: high-fi works for some feasible states but not all. The gap is
+ **path-dependent**, not universal. States closer to the original params
+ (elong=1.4) survive high-fi; states with decreased elongation (elong=1.35)
+ may not. This is a real multi-fidelity challenge, not a total blocker.
+
+ Decision: do not spend time trying to map the full high-fi-safe region.
+ Use the known-good submit path for the demo. Document the path-dependent
+ gap honestly.
+
+ ## Time allocation
+
+ | Task | Hours | Notes |
+ |------|-------|-------|
+ | Heuristic refresh + reset-seed decision | 1 | Per SSOT, this is the next execution step. Quick because the sweep and fixture evidence already exist. |
+ | Train low-fi PPO | 2-3 | Smoke passed as plumbing only. 25 discrete actions (24 run + restore_best), 6-step budget. The repeated-action collapse needs more timesteps or reward tuning. Convergence risk is real — the smoke exposed a failure mode, not success. |
+ | HF Space deployment | 2-3 | Hard requirement. Deploy FastAPI server, prove one clean remote episode. |
+ | Colab notebook | 1-2 | Connect to HF Space, run trained policy, show trajectory. Minimal but working. |
+ | Demo video | 1 | Script around: environment clarity, human playability, trained agent, reward story. |
+ | README and repo polish | 1 | Last step. Only after artifacts exist. |
+ | Buffer | 2-3 | Deployment issues, training tuning, unexpected blockers. |
+ | **Total** | **~11-14** | |
+
+ ## Execution order (aligned to SSOT)
+
+ 1. **Heuristic refresh + reset-seed decision** — per SSOT, this comes before
+    broader training. The measured sweep and paired fixture evidence already
+    exist. Decide whether any seed should move, refresh the heuristic to use
+    the repair path from the playtest log, and save one comparison trace.
+
+ 2. **Training** — start after the heuristic is refreshed so training runs
+    against a confirmed environment configuration. Use `training/ppo_smoke.py`
+    as the base but increase timesteps significantly. The smoke ran 64
+    timesteps and collapsed to a single repeated action — this is the expected
+    outcome of a plumbing gate, not evidence that training will converge
+    easily. First milestone: non-degenerate trajectories (varied actions,
+    not single-action collapse). Second milestone: feasibility crossing in
+    at least one evaluation episode. Do not assume demo-quality trajectories
+    are reachable without tuning. Can run on Northflank H100 in the background.
+    **Do not block HF Space, notebook, or video on training success.**
+
+ 3. **HF Space** β€” deploy while training runs. The server is in `server/app.py`.
86
+ Verify dependencies, prove one clean remote episode.
87
+
88
+ 4. **Colab notebook** β€” wire to the live HF Space endpoint. Load trained
89
+ checkpoint if available; otherwise show the heuristic or manual
90
+ trajectory as evidence.
91
+
92
+ 5. **Demo video** β€” 1 minute. Structure:
93
+ - the problem (stellarator design is hard)
94
+ - the environment (narrow, human-playable, real verifier)
95
+ - the evidence (successful submit trace, replay playtest coverage)
96
+ - trained agent if available (trajectory with visible improvement)
97
+ - honest findings (path-dependent cross-fidelity gap)
98
+
99
+ 6. **README polish** β€” update with links to HF Space, Colab, and video.
100
+ Keep claims conservative. Reference the evidence docs.
101
+
102
+ ## What to cut if time runs short
103
+
104
+ Priority order (cut from the bottom):
105
+
106
+ 1. Colab polish β€” minimal working notebook is enough
107
+ 2. Training length β€” a few readable trajectories over a long run
108
+ 3. README depth β€” link to docs, keep top-level short
109
+ 4. Reset-seed decision β€” keep current seeds if evidence is ambiguous
110
+ 5. Do NOT cut HF Space β€” hard requirement
111
+ 6. Do NOT cut demo video β€” primary judge-facing artifact
112
+
113
+ If training remains weak or degenerate:
114
+
115
+ - still ship the trained-policy demonstration, even if it only shows collapse or weak behavior
116
+ - supplement with the heuristic baseline or manual playtest Episode C as the primary evidence of environment usability
117
+ - document the training limitations plainly in the video and README
118
+ - per SSOT fallback rules (`FUSION_DESIGN_LAB_PLAN_V2.md` section 10): "keep claims conservative about policy quality" and "still ship a trained-policy demonstration and document its limitations plainly"
119
+ - do NOT wait for strong PPO before shipping HF Space, notebook, and video
120
+
121
+ ## Submission narrative
122
+
123
+ The story is:
124
+
125
+ 1. we built a clear, narrow environment for one constrained design task
126
+ 2. we tested it thoroughly (sweep, baselines, replay playtest with broad reward branch coverage)
127
+ 3. the environment has a known-good submit path (Episode C: successful high-fi, score=0.296)
128
+ 4. we discovered a path-dependent cross-fidelity gap (some low-fi feasible states crash at high-fi)
129
+ 5. the environment is the product; the policy is evidence that it works
130
+
131
+ ## Risk assessment
132
+
133
+ | Risk | Likelihood | Mitigation |
134
+ |------|-----------|------------|
135
+ | Training does not converge | Medium β€” smoke exposed collapse, not success | Show the heuristic trajectory or manual playtest as fallback evidence. Document what was tried. Keep claims conservative per SSOT fallback rules. |
136
+ | HF Space dependency issues | Medium | Start deployment early. Have a local-only fallback with screen recording. |
137
+ | Run out of time | Medium | Follow cut order above. Prioritize video and HF Space over polish. |