Commit a02ffad
Parent(s): bca349b
docs: add 14-hour submission plan with time allocation and cut priorities
docs/SUBMISSION_PLAN_14H.md (ADDED, +111 -0)
@@ -0,0 +1,111 @@
# Submission Plan: 14 Hours Remaining

Date: 2026-03-07

## Current state

Done:

- environment contract locked and stable
- official constellaration verifier wired (low-fi run, high-fi submit)
- 3 frozen reset seeds validated by measured sweep
- reward V0 tested across 12 of 13 branches (replay playtest report)
- random and heuristic baselines committed
- PPO smoke passed on Northflank H100 (plumbing gate)
- replay playtest script committed with full trace output

Not done:

- [ ] trained policy with demo-quality trajectories
- [ ] HF Space deployment
- [ ] Colab notebook
- [ ] 1-minute demo video
- [ ] README polish
- [ ] paired high-fidelity fixture checks
- [ ] submit-side manual playtest with successful high-fi outcome

## Key finding: cross-fidelity gap

The replay playtest (episode 5) confirmed that the canonical low-fi repair
path from seed 0 crashes at high-fidelity evaluation. The state
`(ar=3.6, elong=1.35, rt=1.6, tri=0.60)` is feasible at low fidelity but
causes a VMEC failure at high fidelity.

Decision: **do not attempt to fix this in the remaining time**. Reasons:

1. finding a high-fi-safe path requires sweep work with no time guarantee
2. the plan doc already frames the trained policy as supporting evidence, not the product
3. the gap itself is a strong honest finding for the submission narrative

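The gap described above can be stated as a simple predicate: a state is a cross-fidelity gap when it passes the low-fi check but fails the high-fi one. The sketch below is illustrative only. `DesignState`, `Evaluator`, and `cross_fidelity_gaps` are hypothetical names, and the two stub evaluators stand in for the real verifier calls, which are not shown in this document. The second state in the usage is an arbitrary placeholder, not a measured result.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class DesignState:
    """Four design parameters, named after the abbreviations used in the plan."""
    ar: float
    elong: float
    rt: float
    tri: float


# Hypothetical evaluator type: returns True when the state is feasible.
# A crashed high-fi run (for example a VMEC failure) counts as False.
Evaluator = Callable[[DesignState], bool]


def cross_fidelity_gaps(
    states: Iterable[DesignState], low_fi: Evaluator, high_fi: Evaluator
) -> List[DesignState]:
    """Return the states that pass the low-fi check but fail the high-fi one."""
    return [s for s in states if low_fi(s) and not high_fi(s)]


# The state reported in the replay playtest (episode 5, seed 0).
REPORTED_GAP_STATE = DesignState(ar=3.6, elong=1.35, rt=1.6, tri=0.60)
# Arbitrary placeholder state, used only to exercise the function.
PLACEHOLDER_STATE = DesignState(ar=3.6, elong=1.35, rt=1.5, tri=0.55)


def stub_low_fi(state: DesignState) -> bool:
    """Stand-in for the low-fi verifier: accepts everything."""
    return True


def stub_high_fi(state: DesignState) -> bool:
    """Stand-in for the high-fi verifier: rejects only the reported state."""
    return state != REPORTED_GAP_STATE


gaps = cross_fidelity_gaps(
    [REPORTED_GAP_STATE, PLACEHOLDER_STATE], stub_low_fi, stub_high_fi
)
print(gaps)
```

With real evaluators plugged in, the same function could turn the paired high-fidelity fixture checks into a list of documented gap states for the submission narrative.
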
## Time allocation

| Task | Hours | Notes |
|------|-------|-------|
| Train low-fi PPO | 2-3 | PPO smoke already passed. 24 discrete actions, 6-step budget. Target: visible score improvement across seeds. Run on Northflank H100. |
| HF Space deployment | 2-3 | Hard requirement. Deploy FastAPI server, prove one clean remote episode. Debug dependency issues. |
| Colab notebook | 1-2 | Connect to HF Space, run trained policy, show trajectory. Minimal but working. |
| Demo video | 1 | Script around: environment clarity, human playability, trained agent, reward story. |
| README and repo polish | 1 | Last step. Only after artifacts exist. |
| Buffer | 2-3 | Deployment issues, training bugs, unexpected blockers. |
| **Total** | **9-13** | |

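As a quick sanity check on the budget, the per-task ranges in the table can be summed and compared against the 14 hours remaining. This is a minimal sketch: the numbers are copied from the table, while `ESTIMATES`, `HOURS_REMAINING`, and `total_range` are names introduced here for illustration.

```python
# (low, high) hour estimates copied from the time allocation table.
ESTIMATES = {
    "Train low-fi PPO": (2, 3),
    "HF Space deployment": (2, 3),
    "Colab notebook": (1, 2),
    "Demo video": (1, 1),
    "README and repo polish": (1, 1),
    "Buffer": (2, 3),
}
HOURS_REMAINING = 14


def total_range(estimates):
    """Sum the optimistic and pessimistic ends of every task estimate."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


low, high = total_range(ESTIMATES)
slack = HOURS_REMAINING - high  # hours left if every task hits its upper bound
print(f"total: {low}-{high} h, worst-case slack: {slack} h")
# prints "total: 9-13 h, worst-case slack: 1 h"
```

The per-task ranges sum to 9 to 13 hours, so even the pessimistic case leaves about one hour beyond the explicit buffer row.
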
## Execution order

1. **Training**: start first because it can run while doing other work.
   Use the existing `training/ppo_smoke.py` as the base. Train on all 3 seeds.
   Stop when a few trajectories show clear repair-arc behavior (cross into
   feasibility, then improve score). Do not overfit to one seed.

2. **HF Space**: deploy while training runs or immediately after. The server
   is already in `server/app.py`. Verify that dependencies resolve on HF
   infra and that one reset-step-submit cycle completes cleanly.

3. **Colab notebook**: wire to the live HF Space endpoint. Load the trained
   checkpoint. Run a short episode. Add minimal narrative connecting the
   environment design to the trajectory evidence.

4. **Demo video**: 1 minute. Structure:
   - the problem (stellarator design is hard)
   - the environment (narrow, human-playable, real verifier)
   - human playtest (replay output showing legible reward)
   - trained agent (trajectory with visible improvement)
   - honest findings (cross-fidelity gap as a real insight)

5. **README polish**: update with links to HF Space, Colab, and video.
   Keep claims conservative. Reference the evidence docs.

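The training stop criterion in step 1 (a few trajectories that cross into feasibility and then improve their score) can be made mechanical. The sketch below is one possible reading, not the project's actual check. It assumes each step exposes a low-fi feasibility flag and a scalar score where higher is better. `Step`, `shows_repair_arc`, `should_stop_training`, and the default threshold of 3 are all hypothetical.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    feasible: bool  # assumed: low-fi verifier accepted this state
    score: float    # assumed: scalar score, higher is better


def shows_repair_arc(trajectory: List[Step]) -> bool:
    """True if the episode starts infeasible, becomes feasible, and ends
    feasible with a higher score than at its first feasible step."""
    if not trajectory or trajectory[0].feasible:
        return False
    first_ok = next((i for i, s in enumerate(trajectory) if s.feasible), None)
    if first_ok is None:
        return False  # never crossed into feasibility
    final = trajectory[-1]
    return final.feasible and final.score > trajectory[first_ok].score


def should_stop_training(trajectories: List[List[Step]], min_arcs: int = 3) -> bool:
    """Stop once at least `min_arcs` evaluation episodes show a repair arc."""
    return sum(shows_repair_arc(t) for t in trajectories) >= min_arcs


# Synthetic episodes for illustration only, not measured data.
repaired = [Step(False, 0.0), Step(True, 0.4), Step(True, 0.7)]
stuck = [Step(False, 0.0), Step(False, 0.1), Step(False, 0.1)]
crossed_only = [Step(False, 0.0), Step(False, 0.1), Step(True, 0.4)]

print(shows_repair_arc(repaired), shows_repair_arc(stuck), shows_repair_arc(crossed_only))
```

Running the check separately on episodes from each of the 3 seeds would also guard against the overfitting risk the step mentions, since a policy that only repairs one seed would fail the count on the others.
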
## What to cut if time runs short

Cut order (cut item 1 first; items 4 and 5 are never cut):

1. Colab polish: a minimal working notebook is enough
2. Training length: prefer a few readable, improving trajectories over a long run
3. README depth: link to docs, keep top-level short
4. Do NOT cut HF Space: hard requirement
5. Do NOT cut demo video: primary judge-facing artifact

## Submission narrative

Frame the cross-fidelity gap as a strength, not a failure:

> The environment is instrumented well enough to reveal that low-fidelity
> feasibility does not guarantee high-fidelity success. This is a real
> challenge in multi-fidelity scientific design and exactly the kind of
> insight a well-built environment should surface.

The story is:

1. we built a clear, narrow environment for one constrained design task
2. we tested it thoroughly (sweep, baselines, replay playtest, 12/13 reward branches)
3. we trained a policy that learns the low-fi repair arc
4. we discovered and documented an honest cross-fidelity gap
5. the environment is the product; the policy is evidence that it works

## Risk assessment

| Risk | Likelihood | Mitigation |
|------|-----------|------------|
| Training does not converge | Low (smoke already passed) | Show the smoke trajectories as evidence. Document what was tried. |
| HF Space dependency issues | Medium | Start deployment early. Have a local-only fallback with screen recording. |
| High-fi submit never works | Already confirmed | Frame as documented finding. Do not promise high-fi results. |
| Run out of time | Medium | Follow cut order above. Prioritize video and HF Space over polish. |