Commit 513a2e2 (parent: 8bf0155)
docs: codify multifidelity training policy

Files changed:

- README.md +2 -0
- TODO.md +3 -1
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +3 -0
- docs/P1_ENV_CONTRACT_V1.md +6 -0
- docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md +4 -3
- training/README.md +6 -0
- training/notebooks/README.md +2 -0
README.md
CHANGED

@@ -62,6 +62,7 @@ Implementation status:
 - The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity fixture checks.
 - The tracked fixtures in `server/data/p1/` are currently low-fidelity-calibrated. Do not narrate them as fully paired low-fi/high-fi references until the submit-side spot checks land.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
+- High-fidelity VMEC-backed `submit` is too expensive to serve as the normal RL inner loop. Keep training rollouts on low-fidelity `run`, then use high-fidelity calls for paired fixtures, submit-side traces, sparse checkpoint evaluation, and final evidence.
 - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
 - Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
 - Observation best-state reporting is now split explicitly between low-fidelity rollout state and high-fidelity submit state; baseline traces and demo copy should use those explicit fields rather than infer a mixed best-state story.

@@ -130,6 +131,7 @@ uv sync --extra notebooks
 - [ ] Pair the tracked low-fidelity fixtures with high-fidelity submit spot checks immediately after the PPO smoke run.
 - [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
 - [ ] Run at least one submit-side manual trace before any broader training push, then record the first real reward pathology, if any.
+- [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
 - [ ] Refresh the heuristic baseline using measured sweep and playtest evidence, then save one comparison trace.
 - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
 - [ ] Deploy the environment to HF Space.
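The fidelity split codified above (low-fidelity `run` on every training step, high-fidelity `submit` only at sparse checkpoints) can be sketched as a toy loop. The function names and cadence here are illustrative assumptions, not the repo's actual API:

```python
# Toy sketch of the multifidelity cadence: cheap low-fidelity evaluation on
# every training step, expensive high-fidelity evaluation only every
# `hifi_every` steps. `lofi_eval`/`hifi_eval` are hypothetical stand-ins.

def train(total_steps, hifi_every, lofi_eval, hifi_eval):
    """Count how often each fidelity surface is touched."""
    lofi_calls = hifi_calls = 0
    for step in range(1, total_steps + 1):
        lofi_eval(step)               # `run`-style inner-loop metric
        lofi_calls += 1
        if step % hifi_every == 0:
            hifi_eval(step)           # sparse `submit`-style truth check
            hifi_calls += 1
    return lofi_calls, hifi_calls

lofi, hifi = train(1000, 250, lambda s: None, lambda s: None)
print(lofi, hifi)  # 1000 low-fidelity calls, only 4 high-fidelity checkpoints
```

The point of the sketch is the ratio: the expensive surface scales with the checkpoint cadence, not with the step count.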
TODO.md
CHANGED

@@ -198,6 +198,7 @@ flowchart TD
 treat this as a smoke test, not as proof that the terminal `submit` contract is already validated
 stop after a few readable trajectories or one clear failure mode
 paired high-fidelity fixture checks must happen immediately after this smoke pass
+high-fidelity VMEC-backed `submit` should stay out of the normal RL inner loop

 - [ ] Manual-playtest 5-10 episodes
 Goal:

@@ -270,7 +271,7 @@ flowchart TD
 Files:
 [README.md](README.md)

-- [ ] Only
+- [ ] Only treat training evidence as submission-ready if low-fidelity gains survive sparse high-fidelity evaluation
 Related:
 [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
 [Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)

@@ -283,6 +284,7 @@ flowchart TD
 - [ ] Do not port the old `ai-sci-feasible-designs` harness
 - [ ] Do not let notebook or demo work outrun environment evidence
 - [ ] Do not let tiny low-fi smoke training replace paired high-fidelity checks or submit-side manual playtesting
+- [ ] Do not move high-fidelity VMEC-backed `submit` into the normal RL inner loop
 - [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
 - [x] Do not compare high-fidelity submit scores against low-fidelity best/initial score state in the final story
 - [ ] Do not describe the current baseline reset state as feasible or near-feasible
docs/FUSION_DESIGN_LAB_PLAN_V2.md
CHANGED

@@ -50,6 +50,7 @@ Current caution:

 - do not present repaired-family ranges, deltas, or budget choices as settled defaults until the measured sweep is recorded
 - do not narrate low-fidelity rollout metrics as final submission truth
+- do not move high-fidelity VMEC-backed `submit` into the normal RL inner loop; keep it for truth checks and sparse evaluation

 ## 3. Locked Decisions

@@ -83,6 +84,7 @@ Practical fail-fast rule:
 - stop after a few readable trajectories or one clear failure mode
 - run paired high-fidelity fixture checks and one real submit-side trace immediately after the smoke run
 - do not use low-fidelity training alone as proof that the terminal `submit` contract is trustworthy
+- keep any checkpoint high-fidelity evaluation sparse enough that it does not replace the low-fidelity inner loop

 ## 5. Document Roles

@@ -107,6 +109,7 @@ Compute surfaces:
 - Northflank is the main compute workspace for verifier-heavy work
 - HF Space is the hosted environment surface
 - Colab is the required public artifact and should show trained-policy behavior against the live environment
+- trained-policy work should still iterate on low-fidelity `run`; use high-fidelity `submit` only for sparse checkpoint evaluation and final evidence

 Evidence order:
docs/P1_ENV_CONTRACT_V1.md
CHANGED

@@ -163,6 +163,12 @@ The verifier should stay boundary-based:

 Do not treat parameterization-specific logic as verifier truth.

+Training and evaluation rule:
+
+- use low-fidelity `run` as the RL inner-loop surface
+- keep high-fidelity `submit` for terminal truth, paired fixture checks, submit-side manual traces, and sparse checkpoint evaluation
+- do not move high-fidelity VMEC-backed evaluation into every training step unless the contract is deliberately redefined
+
 ## 9. Reward V0

 `Reward V0` is the live reward contract until playtesting proves a concrete pathology.
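One way to read the contract rule added here is as an enforceable invariant rather than a convention. A minimal guard sketch, assuming a hypothetical wrapper around the environment client (none of these names are the real contract's API):

```python
# Hypothetical guard for the fidelity rule: low-fidelity `run` is the free
# inner-loop call, while high-fidelity `submit` is refused unless enough
# cheap steps have happened since the last expensive one.

class FidelityGuard:
    def __init__(self, min_runs_between_submits):
        self.min_runs = min_runs_between_submits
        self.runs_since_submit = 0

    def run(self):
        # low-fidelity inner-loop evaluation: always allowed
        self.runs_since_submit += 1

    def submit(self):
        # high-fidelity terminal evaluation: keep it sparse
        if self.runs_since_submit < self.min_runs:
            raise RuntimeError("high-fidelity submit called too often")
        self.runs_since_submit = 0

guard = FidelityGuard(min_runs_between_submits=3)
for _ in range(3):
    guard.run()
guard.submit()  # allowed after three low-fidelity steps
```

A wrapper like this makes an accidental per-step `submit` fail loudly instead of silently burning the evaluation budget.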
docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md
CHANGED

@@ -17,6 +17,7 @@ Current execution priority remains:
 2. tiny PPO smoke pass as a diagnostic-only check
 3. tracked fixtures with paired high-fidelity submit checks
 4. one submit-side manual trace, then broader manual playtest
-5.
-6.
-7.
+5. sparse checkpoint high-fidelity evaluation to confirm low-fidelity gains survive `submit`
+6. heuristic baseline refresh
+7. HF Space proof
+8. notebook, demo, and repo polish
training/README.md
CHANGED

@@ -2,6 +2,12 @@ Training and evaluation notebooks belong here.

 This repository treats notebooks and trained-policy runs as supporting evidence for the environment, not the primary product.

+Training policy:
+
+- train on the low-fidelity `run` surface for the normal RL inner loop
+- use high-fidelity `submit` only for sparse checkpoint evaluation, paired fixture checks, submit-side traces, and final evidence
+- if low-fidelity gains do not survive high-fidelity `submit`, stop training and fix the environment or reward before pushing further
+
 ## Status

 - [ ] Northflank notebook artifacts saved
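The stop rule in the training policy above can be phrased as a concrete gate. A sketch under assumed conventions (scores where higher is better; all names are illustrative, not the repo's API):

```python
# Hypothetical "gains must survive" gate: accept training evidence only when
# a low-fidelity improvement is not contradicted by the high-fidelity
# `submit` score. `tolerance` absorbs small high-fidelity noise.

def gains_survive(lofi_before, lofi_after, hifi_before, hifi_after,
                  tolerance=0.0):
    lofi_gain = lofi_after - lofi_before
    hifi_gain = hifi_after - hifi_before
    return lofi_gain > 0 and hifi_gain >= -tolerance

# A low-fidelity gain that regresses under high-fidelity submit is rejected:
print(gains_survive(0.40, 0.55, 0.42, 0.30))  # False -> stop, fix env/reward
print(gains_survive(0.40, 0.55, 0.42, 0.44))  # True -> evidence may proceed
```

Run at each sparse checkpoint, a gate like this turns "stop training and fix the environment or reward" into a mechanical decision instead of a judgment call.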
training/notebooks/README.md
CHANGED

@@ -25,6 +25,8 @@ Operational defaults:

 - use the same Python dependency set as the repo runtime
 - keep heavy verifier and training work on Northflank
+- keep low-fidelity `run` as the training inner loop; do not put high-fidelity `submit` in every RL step
+- use high-fidelity `submit` only for sparse checkpoint evaluation, paired fixture checks, manual traces, and final evidence
 - keep the Colab notebook focused on connecting to the deployed HF Space and exporting visible traces
 - prefer a public HF Space for the hackathon; if private, document the token setup directly in the notebook