Commit · 25f0cfa
Parent(s): 9d7dc15
docs: archive submission plan with SSOT-aligned fixes
Move to docs/archive/ since it is a tactical checklist, not a planning
SSOT. Fix training-optimism, align fallback language to SSOT section 10,
split training milestones, and soften reward branch claim.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- docs/SUBMISSION_PLAN_14H.md +0 -111
- docs/archive/SUBMISSION_PLAN_14H.md +137 -0
docs/SUBMISSION_PLAN_14H.md
DELETED
|
@@ -1,111 +0,0 @@
|
|
| 1 |
-
# Submission Plan - 14 Hours Remaining
|
| 2 |
-
|
| 3 |
-
Date: 2026-03-07
|
| 4 |
-
|
| 5 |
-
## Current state
|
| 6 |
-
|
| 7 |
-
Done:
|
| 8 |
-
|
| 9 |
-
- environment contract locked and stable
|
| 10 |
-
- official constellaration verifier wired (low-fi run, high-fi submit)
|
| 11 |
-
- 3 frozen reset seeds validated by measured sweep
|
| 12 |
-
- reward V0 tested across 12 of 13 branches (replay playtest report)
|
| 13 |
-
- random and heuristic baselines committed
|
| 14 |
-
- PPO smoke passed on Northflank H100 (plumbing gate)
|
| 15 |
-
- replay playtest script committed with full trace output
|
| 16 |
-
|
| 17 |
-
Not done:
|
| 18 |
-
|
| 19 |
-
- [ ] trained policy with demo-quality trajectories
|
| 20 |
-
- [ ] HF Space deployment
|
| 21 |
-
- [ ] Colab notebook
|
| 22 |
-
- [ ] 1-minute demo video
|
| 23 |
-
- [ ] README polish
|
| 24 |
-
- [ ] paired high-fidelity fixture checks
|
| 25 |
-
- [ ] submit-side manual playtest with successful high-fi outcome
|
| 26 |
-
|
| 27 |
-
## Key finding: cross-fidelity gap
|
| 28 |
-
|
| 29 |
-
The replay playtest (episode 5) confirmed that the canonical low-fi repair
|
| 30 |
-
path from seed 0 crashes at high-fidelity evaluation. The state
|
| 31 |
-
`(ar=3.6, elong=1.35, rt=1.6, tri=0.60)` is low-fi feasible but high-fi
|
| 32 |
-
VMEC failure.
|
| 33 |
-
|
| 34 |
-
Decision: **do not attempt to fix this in the remaining time**. Reasons:
|
| 35 |
-
|
| 36 |
-
1. finding a high-fi-safe path requires sweep work with no time guarantee
|
| 37 |
-
2. the plan doc already frames the trained policy as supporting evidence, not the product
|
| 38 |
-
3. the gap itself is a strong honest finding for the submission narrative
|
| 39 |
-
|
| 40 |
-
## Time allocation
|
| 41 |
-
|
| 42 |
-
| Task | Hours | Notes |
|
| 43 |
-
|------|-------|-------|
|
| 44 |
-
| Train low-fi PPO | 2-3 | PPO smoke already passed. 24 discrete actions, 6-step budget. Target: visible score improvement across seeds. Run on Northflank H100. |
|
| 45 |
-
| HF Space deployment | 2-3 | Hard requirement. Deploy FastAPI server, prove one clean remote episode. Debug dependency issues. |
|
| 46 |
-
| Colab notebook | 1-2 | Connect to HF Space, run trained policy, show trajectory. Minimal but working. |
|
| 47 |
-
| Demo video | 1 | Script around: environment clarity, human playability, trained agent, reward story. |
|
| 48 |
-
| README and repo polish | 1 | Last step. Only after artifacts exist. |
|
| 49 |
-
| Buffer | 2-3 | Deployment issues, training bugs, unexpected blockers. |
|
| 50 |
-
| **Total** | **~11-14** | |
|
| 51 |
-
|
| 52 |
-
## Execution order
|
| 53 |
-
|
| 54 |
-
1. **Training** - start first because it can run while doing other work.
|
| 55 |
-
Use the existing `training/ppo_smoke.py` as the base. Train on all 3 seeds.
|
| 56 |
-
Stop when a few trajectories show clear repair-arc behavior (cross
|
| 57 |
-
feasibility, improve score). Do not overfit to one seed.
|
| 58 |
-
|
| 59 |
-
2. **HF Space** - deploy while training runs or immediately after. The server
|
| 60 |
-
is already in `server/app.py`. Need to verify dependencies resolve on HF
|
| 61 |
-
infra and that one reset-step-submit cycle completes cleanly.
|
| 62 |
-
|
| 63 |
-
3. **Colab notebook** - wire to the live HF Space endpoint. Load the trained
|
| 64 |
-
checkpoint. Run a short episode. Add minimal narrative connecting the
|
| 65 |
-
environment design to the trajectory evidence.
|
| 66 |
-
|
| 67 |
-
4. **Demo video** - 1 minute. Structure:
|
| 68 |
-
- the problem (stellarator design is hard)
|
| 69 |
-
- the environment (narrow, human-playable, real verifier)
|
| 70 |
-
- human playtest (replay output showing legible reward)
|
| 71 |
-
- trained agent (trajectory with visible improvement)
|
| 72 |
-
- honest findings (cross-fidelity gap as a real insight)
|
| 73 |
-
|
| 74 |
-
5. **README polish** - update with links to HF Space, Colab, and video.
|
| 75 |
-
Keep claims conservative. Reference the evidence docs.
|
| 76 |
-
|
| 77 |
-
## What to cut if time runs short
|
| 78 |
-
|
| 79 |
-
Priority order (cut from the bottom):
|
| 80 |
-
|
| 81 |
-
1. Colab polish - minimal working notebook is enough
|
| 82 |
-
2. Training length - a few readable improving trajectories over a long run
|
| 83 |
-
3. README depth - link to docs, keep top-level short
|
| 84 |
-
4. Do NOT cut HF Space - hard requirement
|
| 85 |
-
5. Do NOT cut demo video - primary judge-facing artifact
|
| 86 |
-
|
| 87 |
-
## Submission narrative
|
| 88 |
-
|
| 89 |
-
Frame the cross-fidelity gap as a strength, not a failure:
|
| 90 |
-
|
| 91 |
-
> The environment is instrumented well enough to reveal that low-fidelity
|
| 92 |
-
> feasibility does not guarantee high-fidelity success. This is a real
|
| 93 |
-
> challenge in multi-fidelity scientific design and exactly the kind of
|
| 94 |
-
> insight a well-built environment should surface.
|
| 95 |
-
|
| 96 |
-
The story is:
|
| 97 |
-
|
| 98 |
-
1. we built a clear, narrow environment for one constrained design task
|
| 99 |
-
2. we tested it thoroughly (sweep, baselines, replay playtest, 12/13 reward branches)
|
| 100 |
-
3. we trained a policy that learns the low-fi repair arc
|
| 101 |
-
4. we discovered and documented an honest cross-fidelity gap
|
| 102 |
-
5. the environment is the product; the policy is evidence that it works
|
| 103 |
-
|
| 104 |
-
## Risk assessment
|
| 105 |
-
|
| 106 |
-
| Risk | Likelihood | Mitigation |
|
| 107 |
-
|------|-----------|------------|
|
| 108 |
-
| Training does not converge | Low - smoke already passed | Show the smoke trajectories as evidence. Document what was tried. |
|
| 109 |
-
| HF Space dependency issues | Medium | Start deployment early. Have a local-only fallback with screen recording. |
|
| 110 |
-
| High-fi submit never works | Already confirmed | Frame as documented finding. Do not promise high-fi results. |
|
| 111 |
-
| Run out of time | Medium | Follow cut order above. Prioritize video and HF Space over polish. |
|
docs/archive/SUBMISSION_PLAN_14H.md
ADDED
|
@@ -0,0 +1,137 @@
| 1 |
+
# Submission Plan - 14 Hours Remaining
|
| 2 |
+
|
| 3 |
+
Date: 2026-03-07
|
| 4 |
+
|
| 5 |
+
Status: late-stage submission checklist, not a planning SSOT. The planning
|
| 6 |
+
SSOT is `FUSION_DESIGN_LAB_PLAN_V2.md`. This file captures time allocation
|
| 7 |
+
and cut priorities for the final push.
|
| 8 |
+
|
| 9 |
+
## Current state (aligned to SSOT)
|
| 10 |
+
|
| 11 |
+
Done:
|
| 12 |
+
|
| 13 |
+
- environment contract locked and stable
|
| 14 |
+
- official constellaration verifier wired (low-fi run, high-fi submit)
|
| 15 |
+
- 3 frozen reset seeds validated by measured sweep
|
| 16 |
+
- reward V0 replay playtest committed (see `P1_REPLAY_PLAYTEST_REPORT.md` for branch-level detail)
|
| 17 |
+
- random and heuristic baselines committed
|
| 18 |
+
- paired high-fidelity fixture checks complete (all 3 fixtures)
|
| 19 |
+
- one successful submit-side manual trace recorded (Episode C in playtest log)
|
| 20 |
+
- PPO smoke completed as a plumbing/debugging gate (exposed repeated-action collapse, not training success)
|
| 21 |
+
- replay playtest script committed with full trace output
|
| 22 |
+
|
| 23 |
+
Not done (per SSOT execution order):
|
| 24 |
+
|
| 25 |
+
- [ ] reset-seed pool decision from paired checks
|
| 26 |
+
- [ ] heuristic baseline refresh on repaired-family evidence
|
| 27 |
+
- [ ] trained policy with non-degenerate trajectories (current smoke collapses to a single repeated action)
|
| 28 |
+
- [ ] if non-degenerate: push toward demo-quality trajectories showing feasibility crossing
|
| 29 |
+
- [ ] HF Space deployment
|
| 30 |
+
- [ ] Colab notebook
|
| 31 |
+
- [ ] 1-minute demo video
|
| 32 |
+
- [ ] README polish
|
| 33 |
+
|
| 34 |
+
## Cross-fidelity status
|
| 35 |
+
|
| 36 |
+
The cross-fidelity picture is nuanced, not a blanket failure:
|
| 37 |
+
|
| 38 |
+
- `lowfi_feasible_local.json` shows a successful paired high-fi evaluation at
|
| 39 |
+
`(ar=3.6, elong=1.4, rt=1.6, tri=0.60)` - constraints satisfied, score=0.292
|
| 40 |
+
- Episode C manual trace shows a successful high-fi submit from seed 0 with
|
| 41 |
+
score=0.296, constraints satisfied
|
| 42 |
+
- the replay playtest episode 5 crashed at high-fi from
|
| 43 |
+
`(ar=3.6, elong=1.35, rt=1.6, tri=0.60)` - one elongation decrease away
|
| 44 |
+
|
| 45 |
+
So: high-fi works for some feasible states but not all. The gap is
|
| 46 |
+
**path-dependent**, not universal. States closer to the original params
|
| 47 |
+
(elong=1.4) survive high-fi; states with decreased elongation (elong=1.35)
|
| 48 |
+
may not. This is a real multi-fidelity challenge, not a total blocker.
|
| 49 |
+
|
| 50 |
+
Decision: do not spend time trying to map the full high-fi-safe region.
|
| 51 |
+
Use the known-good submit path for the demo. Document the path-dependent
|
| 52 |
+
gap honestly.
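The three observations above can be kept as a small record so the known-good submit path is easy to pick out. This is a minimal sketch, not the project's own tooling: the `PairedCheck` class and both helper functions are hypothetical names introduced here, and the Episode C state is left as `None` because the text does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

State = Tuple[float, float, float, float]  # (ar, elong, rt, tri)


@dataclass(frozen=True)
class PairedCheck:
    """One low-fi feasible state and its recorded high-fi outcome (hypothetical record type)."""
    label: str
    state: Optional[State]   # None when the plan does not state the parameters
    highfi_ok: bool
    score: Optional[float]   # None when high-fi evaluation crashed


# Values copied from the cross-fidelity status list in this plan.
CHECKS = [
    PairedCheck("lowfi_feasible_local.json", (3.6, 1.4, 1.6, 0.60), True, 0.292),
    PairedCheck("Episode C manual trace (seed 0)", None, True, 0.296),
    PairedCheck("replay playtest episode 5", (3.6, 1.35, 1.6, 0.60), False, None),
]


def known_good(checks: List[PairedCheck]) -> List[PairedCheck]:
    """Return the checks whose high-fi evaluation succeeded."""
    return [c for c in checks if c.highfi_ok]


def differing_params(a: State, b: State) -> List[str]:
    """Name the parameters on which two states differ."""
    names = ("ar", "elong", "rt", "tri")
    return [n for n, x, y in zip(names, a, b) if x != y]


print([c.label for c in known_good(CHECKS)])
print(differing_params(CHECKS[0].state, CHECKS[2].state))
```

Comparing the fixture state with the crashing episode 5 state isolates elongation as the only parameter that differs, which is the path dependence described above.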
|
| 53 |
+
|
| 54 |
+
## Time allocation
|
| 55 |
+
|
| 56 |
+
| Task | Hours | Notes |
|
| 57 |
+
|------|-------|-------|
|
| 58 |
+
| Heuristic refresh + reset-seed decision | 1 | Per SSOT, this is the next execution step. Quick because the sweep and fixture evidence already exist. |
|
| 59 |
+
| Train low-fi PPO | 2-3 | Smoke passed as plumbing only. 25 discrete actions (24 run + restore_best), 6-step budget. The repeated-action collapse needs more timesteps or reward tuning. Convergence risk is real - the smoke exposed a failure mode, not success. |
|
| 60 |
+
| HF Space deployment | 2-3 | Hard requirement. Deploy FastAPI server, prove one clean remote episode. |
|
| 61 |
+
| Colab notebook | 1-2 | Connect to HF Space, run trained policy, show trajectory. Minimal but working. |
|
| 62 |
+
| Demo video | 1 | Script around: environment clarity, human playability, trained agent, reward story. |
|
| 63 |
+
| README and repo polish | 1 | Last step. Only after artifacts exist. |
|
| 64 |
+
| Buffer | 2-3 | Deployment issues, training tuning, unexpected blockers. |
|
| 65 |
+
| **Total** | **~11-14** | |
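As a quick sanity check on the table, the per-task ranges can be summed and compared with the 14-hour window. This is a sketch only: the dictionary restates the rows above, and `total_range` and `HOURS_REMAINING` are illustrative names, not part of the repository.

```python
# (low, high) hour estimates per task, copied from the time allocation table.
ALLOCATION = {
    "Heuristic refresh + reset-seed decision": (1, 1),
    "Train low-fi PPO": (2, 3),
    "HF Space deployment": (2, 3),
    "Colab notebook": (1, 2),
    "Demo video": (1, 1),
    "README and repo polish": (1, 1),
    "Buffer": (2, 3),
}

HOURS_REMAINING = 14  # from the plan title


def total_range(allocation):
    """Sum the low and high estimates across all tasks."""
    low = sum(lo for lo, _ in allocation.values())
    high = sum(hi for _, hi in allocation.values())
    return low, high


low, high = total_range(ALLOCATION)
print(f"total: {low}-{high} h, slack at worst case: {HOURS_REMAINING - high} h")
```

The rows sum to 10 to 14 hours, so the worst case leaves no slack beyond the buffer row itself, which is why the cut order matters.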
|
| 66 |
+
|
| 67 |
+
## Execution order (aligned to SSOT)
|
| 68 |
+
|
| 69 |
+
1. **Heuristic refresh + reset-seed decision** - per SSOT, this comes before
|
| 70 |
+
broader training. The measured sweep and paired fixture evidence already
|
| 71 |
+
exist. Decide whether any seed should move, refresh the heuristic to use
|
| 72 |
+
the repair path from the playtest log, and save one comparison trace.
|
| 73 |
+
|
| 74 |
+
2. **Training** - start after the heuristic is refreshed so training runs
|
| 75 |
+
against a confirmed environment configuration. Use `training/ppo_smoke.py`
|
| 76 |
+
as the base but increase timesteps significantly. The smoke ran 64
|
| 77 |
+
timesteps and collapsed to a single repeated action - this is the expected
|
| 78 |
+
outcome of a plumbing gate, not evidence that training will converge
|
| 79 |
+
easily. First milestone: non-degenerate trajectories (varied actions,
|
| 80 |
+
not single-action collapse). Second milestone: feasibility crossing in
|
| 81 |
+
at least one evaluation episode. Do not assume demo-quality trajectories
|
| 82 |
+
are reachable without tuning. Can run on Northflank H100 in the background.
|
| 83 |
+
**Do not block HF Space, notebook, or video on training success.**
|
| 84 |
+
|
| 85 |
+
3. **HF Space** - deploy while training runs. The server is in `server/app.py`.
|
| 86 |
+
Verify dependencies, prove one clean remote episode.
|
| 87 |
+
|
| 88 |
+
4. **Colab notebook** - wire to the live HF Space endpoint. Load trained
|
| 89 |
+
checkpoint if available; otherwise show the heuristic or manual
|
| 90 |
+
trajectory as evidence.
|
| 91 |
+
|
| 92 |
+
5. **Demo video** - 1 minute. Structure:
|
| 93 |
+
- the problem (stellarator design is hard)
|
| 94 |
+
- the environment (narrow, human-playable, real verifier)
|
| 95 |
+
- the evidence (successful submit trace, replay playtest coverage)
|
| 96 |
+
- trained agent if available (trajectory with visible improvement)
|
| 97 |
+
- honest findings (path-dependent cross-fidelity gap)
|
| 98 |
+
|
| 99 |
+
6. **README polish** - update with links to HF Space, Colab, and video.
|
| 100 |
+
Keep claims conservative. Reference the evidence docs.
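The first training milestone in step 2 (varied actions rather than single-action collapse) can be checked mechanically on an evaluation episode. The sketch below assumes an episode is available as a plain list of integer action indices; `action_diversity`, `is_degenerate`, and the `max_top_fraction` threshold are hypothetical names and are not taken from `training/ppo_smoke.py`.

```python
from collections import Counter
from typing import Dict, Sequence


def action_diversity(actions: Sequence[int]) -> Dict[str, float]:
    """Summarize how concentrated an episode's actions are."""
    if not actions:
        raise ValueError("episode has no actions")
    counts = Counter(actions)
    top_action, top_count = counts.most_common(1)[0]
    return {
        "unique_actions": len(counts),
        "top_action": top_action,
        "top_fraction": top_count / len(actions),
    }


def is_degenerate(actions: Sequence[int], max_top_fraction: float = 1.0) -> bool:
    """Flag an episode whose most common action reaches the given share.

    The default of 1.0 flags only pure single-action collapse; a lower
    threshold is an assumption the caller would have to justify.
    """
    return action_diversity(actions)["top_fraction"] >= max_top_fraction


collapsed = [7, 7, 7, 7, 7, 7]    # one action repeated over the 6-step budget
varied = [7, 3, 7, 12, 24, 7]     # illustrative indices within 25 discrete actions

print(is_degenerate(collapsed), is_degenerate(varied))
```

A check like this only covers the first milestone. Feasibility crossing, the second milestone, still has to be read from the environment's own evaluation output.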
|
| 101 |
+
|
| 102 |
+
## What to cut if time runs short
|
| 103 |
+
|
| 104 |
+
Priority order (cut from the bottom):
|
| 105 |
+
|
| 106 |
+
1. Colab polish - minimal working notebook is enough
|
| 107 |
+
2. Training length - a few readable trajectories over a long run
|
| 108 |
+
3. README depth - link to docs, keep top-level short
|
| 109 |
+
4. Reset-seed decision - keep current seeds if evidence is ambiguous
|
| 110 |
+
5. Do NOT cut HF Space - hard requirement
|
| 111 |
+
6. Do NOT cut demo video - primary judge-facing artifact
|
| 112 |
+
|
| 113 |
+
If training remains weak or degenerate:
|
| 114 |
+
|
| 115 |
+
- still ship the trained-policy demonstration, even if it only shows collapse or weak behavior
|
| 116 |
+
- supplement with the heuristic baseline or manual playtest Episode C as the primary evidence of environment usability
|
| 117 |
+
- document the training limitations plainly in the video and README
|
| 118 |
+
- per SSOT fallback rules (`FUSION_DESIGN_LAB_PLAN_V2.md` section 10): "keep claims conservative about policy quality" and "still ship a trained-policy demonstration and document its limitations plainly"
|
| 119 |
+
- do NOT wait for strong PPO before shipping HF Space, notebook, and video
|
| 120 |
+
|
| 121 |
+
## Submission narrative
|
| 122 |
+
|
| 123 |
+
The story is:
|
| 124 |
+
|
| 125 |
+
1. we built a clear, narrow environment for one constrained design task
|
| 126 |
+
2. we tested it thoroughly (sweep, baselines, replay playtest with broad reward branch coverage)
|
| 127 |
+
3. the environment has a known-good submit path (Episode C: successful high-fi, score=0.296)
|
| 128 |
+
4. we discovered a path-dependent cross-fidelity gap (some low-fi feasible states crash at high-fi)
|
| 129 |
+
5. the environment is the product; the policy is evidence that it works
|
| 130 |
+
|
| 131 |
+
## Risk assessment
|
| 132 |
+
|
| 133 |
+
| Risk | Likelihood | Mitigation |
|
| 134 |
+
|------|-----------|------------|
|
| 135 |
+
| Training does not converge | Medium - smoke exposed collapse, not success | Show the heuristic trajectory or manual playtest as fallback evidence. Document what was tried. Keep claims conservative per SSOT fallback rules. |
|
| 136 |
+
| HF Space dependency issues | Medium | Start deployment early. Have a local-only fallback with screen recording. |
|
| 137 |
+
| Run out of time | Medium | Follow cut order above. Prioritize video and HF Space over polish. |
|