CreativeEngineer Claude Opus 4.6 committed
Commit a02ffad · 1 parent: bca349b

docs: add 14-hour submission plan with time allocation and cut priorities

Files changed (1): docs/SUBMISSION_PLAN_14H.md (new file, +111 lines)
# Submission Plan: 14 Hours Remaining

Date: 2026-03-07

## Current state

Done:

- environment contract locked and stable
- official constellaration verifier wired (low-fi run, high-fi submit)
- 3 frozen reset seeds validated by measured sweep
- reward V0 tested across 12 of 13 branches (replay playtest report)
- random and heuristic baselines committed
- PPO smoke passed on Northflank H100 (plumbing gate)
- replay playtest script committed with full trace output

Not done:

- [ ] trained policy with demo-quality trajectories
- [ ] HF Space deployment
- [ ] Colab notebook
- [ ] 1-minute demo video
- [ ] README polish
- [ ] paired high-fidelity fixture checks
- [ ] submit-side manual playtest with successful high-fi outcome
## Key finding: cross-fidelity gap

The replay playtest (episode 5) confirmed that the canonical low-fi repair
path from seed 0 crashes at high-fidelity evaluation. The state
`(ar=3.6, elong=1.35, rt=1.6, tri=0.60)` is low-fi feasible but fails
high-fi VMEC evaluation.

Decision: **do not attempt to fix this in the remaining time**. Reasons:

1. finding a high-fi-safe path requires sweep work with no time guarantee
2. the plan doc already frames the trained policy as supporting evidence, not the product
3. the gap itself is a strong, honest finding for the submission narrative
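The paired high-fidelity fixture check on the not-done list could start as a plain verdict comparison per frozen state. A minimal sketch, where `BoundaryState`, its field names, and `classify_gap` are all hypothetical illustrations, not the repo's actual API:

```python
# Hypothetical sketch of a paired-fidelity verdict check. None of these
# names come from the repo; they only illustrate the comparison.
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundaryState:
    ar: float      # aspect ratio
    elong: float   # elongation
    rt: float      # rotational transform proxy
    tri: float     # triangularity

def classify_gap(low_fi_feasible: bool, high_fi_ok: bool) -> str:
    """Label how the two fidelities agree or disagree for one state."""
    if low_fi_feasible and not high_fi_ok:
        return "cross-fidelity gap"      # the episode-5 failure mode
    if low_fi_feasible and high_fi_ok:
        return "consistent-feasible"
    if not low_fi_feasible and high_fi_ok:
        return "low-fi too strict"
    return "consistent-infeasible"

# The episode-5 state: low-fi feasible, but high-fi VMEC fails.
state = BoundaryState(ar=3.6, elong=1.35, rt=1.6, tri=0.60)
label = classify_gap(low_fi_feasible=True, high_fi_ok=False)
```

A fixture like this, run over the three frozen seeds' canonical end states, would have surfaced the gap before the replay playtest did.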
## Time allocation

| Task | Hours | Notes |
|------|-------|-------|
| Train low-fi PPO | 2-3 | PPO smoke already passed. 24 discrete actions, 6-step budget. Target: visible score improvement across seeds. Run on Northflank H100. |
| HF Space deployment | 2-3 | Hard requirement. Deploy FastAPI server, prove one clean remote episode. Debug dependency issues. |
| Colab notebook | 1-2 | Connect to HF Space, run trained policy, show trajectory. Minimal but working. |
| Demo video | 1 | Script around: environment clarity, human playability, trained agent, reward story. |
| README and repo polish | 1 | Last step. Only after artifacts exist. |
| Buffer | 2-3 | Deployment issues, training bugs, unexpected blockers. |
| **Total** | **~9-13** | |
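A quick cross-check of the table's arithmetic (task order as listed above):

```python
# Sum the low and high ends of each task's hour range.
# Order: PPO training, HF Space, Colab, demo video, README, buffer.
ranges = [(2, 3), (2, 3), (1, 2), (1, 1), (1, 1), (2, 3)]
low = sum(lo for lo, _ in ranges)
high = sum(hi for _, hi in ranges)
print(low, high)  # 9 13
```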
## Execution order

1. **Training**: start first because it can run while doing other work.
   Use the existing `training/ppo_smoke.py` as the base. Train on all 3 seeds.
   Stop when a few trajectories show clear repair-arc behavior (cross
   feasibility, improve score). Do not overfit to one seed.
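The stopping rule above can be made mechanical. A sketch under assumed trajectory shapes (per-step scores plus feasibility flags; these names are illustrative, not the repo's):

```python
# Hypothetical early-stop check for the training step: a trajectory counts
# as a "repair arc" if it starts infeasible, ends feasible, and its score
# improves from first to last step. Data shapes are assumptions.
def shows_repair_arc(scores, feasible_flags) -> bool:
    """True if the episode crosses into feasibility with a score gain."""
    return (
        len(scores) >= 2
        and not feasible_flags[0]        # started outside the feasible set
        and feasible_flags[-1]           # crossed into feasibility
        and scores[-1] > scores[0]       # visible score improvement
    )

def should_stop(trajectories, needed: int = 3) -> bool:
    """Stop once a few trajectories (default 3) show the repair arc."""
    hits = sum(shows_repair_arc(s, f) for s, f in trajectories)
    return hits >= needed
```

Requiring the qualifying trajectories to come from different seeds would also guard against the overfitting risk the step warns about.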
2. **HF Space**: deploy while training runs or immediately after. The server
   is already in `server/app.py`. Need to verify dependencies resolve on HF
   infra and that one reset-step-submit cycle completes cleanly.
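That one clean cycle is worth scripting so it can be rerun after every deploy. A sketch with an injectable `post` callable; the `/reset`, `/step`, `/submit` routes and payload shapes are guesses at `server/app.py`, not its verified API:

```python
# Hypothetical remote smoke check: one reset -> step -> submit cycle
# against the deployed Space. Route names and payloads are assumptions.
def smoke_cycle(post, base_url: str) -> bool:
    """post(url, json=...) -> dict; True if the full cycle succeeds."""
    obs = post(f"{base_url}/reset", json={"seed": 0})
    for _ in range(6):  # 6-step budget from the environment contract
        obs = post(f"{base_url}/step", json={"action": 0})
        if obs.get("done"):
            break
    result = post(f"{base_url}/submit", json={})
    return result.get("ok", False)
```

In a live check, `post` could wrap `requests.post(url, json=payload).json()`; keeping it injectable lets the cycle be exercised locally before the Space exists.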
3. **Colab notebook**: wire to the live HF Space endpoint. Load the trained
   checkpoint. Run a short episode. Add minimal narrative connecting the
   environment design to the trajectory evidence.
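The core notebook cell is a short remote rollout. A sketch where the route names, payload fields, and `policy` interface are all assumptions about the eventual Space API:

```python
# Hypothetical Colab cell: roll one episode through the remote env with a
# trained policy and record the trajectory. All names are assumptions.
def run_episode(post, policy, base_url: str, seed: int = 0, budget: int = 6):
    """Return (trajectory, total_reward) for one remote episode."""
    obs = post(f"{base_url}/reset", json={"seed": seed})
    trajectory, total = [obs], 0.0
    for _ in range(budget):
        action = policy(obs)            # e.g. argmax over 24 discrete actions
        obs = post(f"{base_url}/step", json={"action": action})
        trajectory.append(obs)
        total += obs.get("reward", 0.0)
        if obs.get("done"):
            break
    return trajectory, total
```

Printing the trajectory states alongside the rewards gives the notebook its narrative hook: the repair arc should be visible line by line.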
4. **Demo video**: 1 minute. Structure:
   - the problem (stellarator design is hard)
   - the environment (narrow, human-playable, real verifier)
   - human playtest (replay output showing legible reward)
   - trained agent (trajectory with visible improvement)
   - honest findings (cross-fidelity gap as a real insight)

5. **README polish**: update with links to HF Space, Colab, and video.
   Keep claims conservative. Reference the evidence docs.
## What to cut if time runs short

Priority order (cut from the bottom):

1. Colab polish: a minimal working notebook is enough
2. Training length: a few readable improving trajectories beat a long run
3. README depth: link to docs, keep the top level short
4. Do NOT cut HF Space: hard requirement
5. Do NOT cut demo video: primary judge-facing artifact
## Submission narrative

Frame the cross-fidelity gap as a strength, not a failure:

> The environment is instrumented well enough to reveal that low-fidelity
> feasibility does not guarantee high-fidelity success. This is a real
> challenge in multi-fidelity scientific design and exactly the kind of
> insight a well-built environment should surface.

The story is:

1. we built a clear, narrow environment for one constrained design task
2. we tested it thoroughly (sweep, baselines, replay playtest, 12/13 reward branches)
3. we trained a policy that learns the low-fi repair arc
4. we discovered and documented an honest cross-fidelity gap
5. the environment is the product; the policy is evidence that it works
## Risk assessment

| Risk | Likelihood | Mitigation |
|------|-----------|------------|
| Training does not converge | Low (smoke already passed) | Show the smoke trajectories as evidence. Document what was tried. |
| HF Space dependency issues | Medium | Start deployment early. Have a local-only fallback with screen recording. |
| High-fi submit never works | Already confirmed | Frame as a documented finding. Do not promise high-fi results. |
| Run out of time | Medium | Follow the cut order above. Prioritize video and HF Space over polish. |