CreativeEngineer committed
Commit 513a2e2 · 1 Parent(s): 8bf0155

docs: codify multifidelity training policy
README.md CHANGED
@@ -62,6 +62,7 @@ Implementation status:
  - The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity fixture checks.
  - The tracked fixtures in `server/data/p1/` are currently low-fidelity-calibrated. Do not narrate them as fully paired low-fi/high-fi references until the submit-side spot checks land.
  - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
+ - High-fidelity VMEC-backed `submit` is too expensive to serve as the normal RL inner loop. Keep training rollouts on low-fidelity `run`, then use high-fidelity calls for paired fixtures, submit-side traces, sparse checkpoint evaluation, and final evidence.
  - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
  - Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
  - Observation best-state reporting is now split explicitly between low-fidelity rollout state and high-fidelity submit state; baseline traces and demo copy should use those explicit fields rather than infer a mixed best-state story.
@@ -130,6 +131,7 @@ uv sync --extra notebooks
  - [ ] Pair the tracked low-fidelity fixtures with high-fidelity submit spot checks immediately after the PPO smoke run.
  - [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
  - [ ] Run at least one submit-side manual trace before any broader training push, then record the first real reward pathology, if any.
+ - [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
  - [ ] Refresh the heuristic baseline using measured sweep and playtest evidence, then save one comparison trace.
  - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
  - [ ] Deploy the environment to HF Space.
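The schedule codified above — low-fidelity `run` as the training inner loop, high-fidelity `submit` only at sparse checkpoints — can be sketched as below. All function names and values are illustrative placeholders, not this repository's real API; the point is only the call pattern.

```python
# Sketch of the multifidelity schedule: the RL inner loop stays on cheap
# low-fidelity evaluation, and expensive high-fidelity `submit`-style
# evaluation runs only at sparse checkpoints. Everything here is a
# placeholder, not this repository's actual training code.

def low_fi_eval(epoch: int) -> float:
    """Placeholder for a cheap low-fidelity `run` metric."""
    return float(epoch)

def high_fi_eval(epoch: int) -> float:
    """Placeholder for an expensive high-fidelity `submit` re-evaluation."""
    return float(epoch) * 0.9

def train(num_epochs: int, high_fi_every: int = 50) -> list[dict]:
    history = []
    for epoch in range(1, num_epochs + 1):
        # ... low-fidelity rollouts and policy updates would happen here ...
        entry = {"epoch": epoch, "low_fi": low_fi_eval(epoch), "high_fi": None}
        if epoch % high_fi_every == 0:  # sparse checkpoint only
            entry["high_fi"] = high_fi_eval(epoch)
        history.append(entry)
    return history

history = train(num_epochs=120, high_fi_every=50)
high_fi_calls = [e for e in history if e["high_fi"] is not None]
print(len(high_fi_calls))  # only 2 high-fidelity calls across 120 epochs
```

With `high_fi_every=50`, a 120-epoch run triggers the expensive evaluation only at epochs 50 and 100, which is the "sparse enough that the low-fidelity inner loop stays fast" property the checklist item asks for.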
TODO.md CHANGED
@@ -198,6 +198,7 @@ flowchart TD
  treat this as a smoke test, not as proof that the terminal `submit` contract is already validated
  stop after a few readable trajectories or one clear failure mode
  paired high-fidelity fixture checks must happen immediately after this smoke pass
+ high-fidelity VMEC-backed `submit` should stay out of the normal RL inner loop

  - [ ] Manual-playtest 5-10 episodes
  Goal:
@@ -270,7 +271,7 @@ flowchart TD
  Files:
  [README.md](README.md)

- - [ ] Only add training evidence if it is actually persuasive
+ - [ ] Only treat training evidence as submission-ready if low-fidelity gains survive sparse high-fidelity evaluation
  Related:
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
  [Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)
@@ -283,6 +284,7 @@ flowchart TD
  - [ ] Do not port the old `ai-sci-feasible-designs` harness
  - [ ] Do not let notebook or demo work outrun environment evidence
  - [ ] Do not let tiny low-fi smoke training replace paired high-fidelity checks or submit-side manual playtesting
+ - [ ] Do not move high-fidelity VMEC-backed `submit` into the normal RL inner loop
  - [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
  - [x] Do not compare high-fidelity submit scores against low-fidelity best/initial score state in the final story
  - [ ] Do not describe the current baseline reset state as feasible or near-feasible
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -50,6 +50,7 @@ Current caution:

  - do not present repaired-family ranges, deltas, or budget choices as settled defaults until the measured sweep is recorded
  - do not narrate low-fidelity rollout metrics as final submission truth
+ - do not move high-fidelity VMEC-backed `submit` into the normal RL inner loop; keep it for truth checks and sparse evaluation

  ## 3. Locked Decisions

@@ -83,6 +84,7 @@ Practical fail-fast rule:
  - stop after a few readable trajectories or one clear failure mode
  - run paired high-fidelity fixture checks and one real submit-side trace immediately after the smoke run
  - do not use low-fidelity training alone as proof that the terminal `submit` contract is trustworthy
+ - keep any checkpoint high-fidelity evaluation sparse enough that it does not replace the low-fidelity inner loop

  ## 5. Document Roles

@@ -107,6 +109,7 @@ Compute surfaces:
  - Northflank is the main compute workspace for verifier-heavy work
  - HF Space is the hosted environment surface
  - Colab is the required public artifact and should show trained-policy behavior against the live environment
+ - trained-policy work should still iterate on low-fidelity `run`; use high-fidelity `submit` only for sparse checkpoint evaluation and final evidence

  Evidence order:

docs/P1_ENV_CONTRACT_V1.md CHANGED
@@ -163,6 +163,12 @@ The verifier should stay boundary-based:

  Do not treat parameterization-specific logic as verifier truth.

+ Training and evaluation rule:
+
+ - use low-fidelity `run` as the RL inner-loop surface
+ - keep high-fidelity `submit` for terminal truth, paired fixture checks, submit-side manual traces, and sparse checkpoint evaluation
+ - do not move high-fidelity VMEC-backed evaluation into every training step unless the contract is deliberately redefined
+
  ## 9. Reward V0

  `Reward V0` is the live reward contract until playtesting proves a concrete pathology.
docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED
@@ -17,6 +17,7 @@ Current execution priority remains:
  2. tiny PPO smoke pass as a diagnostic-only check
  3. tracked fixtures with paired high-fidelity submit checks
  4. one submit-side manual trace, then broader manual playtest
- 5. heuristic baseline refresh
- 6. HF Space proof
- 7. notebook, demo, and repo polish
+ 5. sparse checkpoint high-fidelity evaluation to confirm low-fidelity gains survive `submit`
+ 6. heuristic baseline refresh
+ 7. HF Space proof
+ 8. notebook, demo, and repo polish
training/README.md CHANGED
@@ -2,6 +2,12 @@ Training and evaluation notebooks belong here.

  This repository treats notebooks and trained-policy runs as supporting evidence for the environment, not the primary product.

+ Training policy:
+
+ - train on the low-fidelity `run` surface for the normal RL inner loop
+ - use high-fidelity `submit` only for sparse checkpoint evaluation, paired fixture checks, submit-side traces, and final evidence
+ - if low-fidelity gains do not survive high-fidelity `submit`, stop training and fix the environment or reward before pushing further
+
  ## Status

  - [ ] Northflank notebook artifacts saved
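The "gains must survive `submit`" stop rule in the training policy above amounts to a simple promotion gate, sketched below. The retention threshold and the helper name are assumptions for illustration; the repo's actual acceptance criterion may differ.

```python
# Sketch of the promotion gate: low-fidelity gains only count as
# submission-ready if they survive high-fidelity re-evaluation.
# `gains_survive` and its 0.5 retention threshold are hypothetical,
# not values taken from this repository.

def gains_survive(low_fi_delta: float, high_fi_delta: float,
                  min_retained_fraction: float = 0.5) -> bool:
    """Return True if the high-fidelity improvement retains enough of
    the low-fidelity improvement to be treated as real progress."""
    if low_fi_delta <= 0:
        return False  # no low-fidelity gain to begin with
    return high_fi_delta >= min_retained_fraction * low_fi_delta

# A +0.10 low-fi gain that collapses to +0.01 under high-fi `submit`
# should stop training rather than be narrated as progress.
print(gains_survive(low_fi_delta=0.10, high_fi_delta=0.08))  # True
print(gains_survive(low_fi_delta=0.10, high_fi_delta=0.01))  # False
```

When the gate fails, the policy says to stop training and fix the environment or reward first, which keeps the final story consistent with the "do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results" rule.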
training/notebooks/README.md CHANGED
@@ -25,6 +25,8 @@ Operational defaults:

  - use the same Python dependency set as the repo runtime
  - keep heavy verifier and training work on Northflank
+ - keep low-fidelity `run` as the training inner loop; do not put high-fidelity `submit` in every RL step
+ - use high-fidelity `submit` only for sparse checkpoint evaluation, paired fixture checks, manual traces, and final evidence
  - keep the Colab notebook focused on connecting to the deployed HF Space and exporting visible traces
  - prefer a public HF Space for the hackathon; if private, document the token setup directly in the notebook