CreativeEngineer committed on
Commit 5271cce · 1 Parent(s): 3270c54

docs: clarify repaired env runtime status

README.md CHANGED
@@ -53,19 +53,20 @@ Implementation status:
 
  ## Known Gaps
 
- - The current 3-knob family is structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That means reward tuning is secondary until the parameterization is repaired.
+ - Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
  - The repaired family now uses frozen exact seeds with explicit triangularity control. Those seeds are near-boundary references, not yet tracked fixtures.
  - The repaired low-dimensional family still needs measured ranges and deltas. Do not narrate guessed `rotational_transform` bounds, `triangularity_scale` deltas, or a larger budget as validated facts until they are measured on the repaired environment.
  - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
  - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
  - Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
+ - `best_score` and `best_feasibility` are currently context-dependent in observations: run-time views reflect low-fidelity rollout state, while submit-time views can reflect high-fidelity best state. Keep that distinction explicit in docs, traces, and baseline interpretation until the contract is simplified further.
  - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
  - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
 
  Current mode:
 
  - strategic task choice is already locked
- - the next work is parameterization repair, then fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
+ - the next work is measured sweep validation, then fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
  - new planning text should only appear when a real blocker forces a decision change
 
  ## Planned Repository Layout
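
Reviewer sketch for the README hunk above: the Known Gaps bullets on VMEC failure semantics and on the `submit` versus budget-exhaustion asymmetry describe rules that are easy to state in code. The snippet below is a minimal illustration only, not the repository's runtime. Every name in it (`StepOutcome`, `run_step`, `terminal_reward`) and both constants are hypothetical, since the docs give the semantics but not the magnitudes.

```python
from dataclasses import dataclass

# Hypothetical values: the docs state the semantics, not the magnitudes.
FAILURE_PENALTY = 1.0       # subtracted when a VMEC evaluation fails
EXHAUSTION_HANDICAP = 0.25  # subtracted when the episode ends without `submit`


@dataclass
class StepOutcome:
    reward: float
    budget_left: int
    done: bool
    evaluation_failed: bool
    message: str


def run_step(budget_left: int, evaluation_ok: bool, shaped_reward: float) -> StepOutcome:
    """Apply one `run` step; the budget is charged whether or not evaluation succeeds."""
    budget_left -= 1
    if evaluation_ok:
        reward, message = shaped_reward, "low-fidelity evaluation ok"
    else:
        # The failure is surfaced in the observation and penalized, never skipped silently.
        reward, message = -FAILURE_PENALTY, "VMEC evaluation failed"
    return StepOutcome(reward, budget_left, budget_left <= 0, not evaluation_ok, message)


def terminal_reward(base: float, explicit_submit: bool) -> float:
    """Budget exhaustion pays strictly less than an explicit `submit` for the same base."""
    return base if explicit_submit else base - EXHAUSTION_HANDICAP
```

The handicap is subtracted rather than applied as a multiplier so that the ordering also holds when the base terminal reward is negative; scaling a negative value toward zero would reward exhaustion over submission.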
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -251,6 +251,13 @@ The observation should expose:
 
  The observation must be interpretable by a human without additional hidden state.
 
+ Current runtime note:
+
+ - `best_score` and `best_feasibility` are not yet fully split by fidelity in the observation schema
+ - low-fidelity run observations display rollout best state
+ - high-fidelity submit observations may display high-fidelity best state instead
+ - keep that distinction explicit in docs and traces until the contract is simplified further
+
  ### Action Space
 
  The live action space stays intentionally small and discrete while exposing the repaired 4-knob low-dimensional family.
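
Reviewer sketch for the "Current runtime note" added in this hunk: one way to keep `best_score` and `best_feasibility` unambiguous is to track the best state per fidelity and tag every observation with the fidelity it came from. This is an assumption about how the contract could be simplified, not the current schema. `Fidelity`, `BestState`, `BestTracker`, and the `best_fidelity` field are hypothetical names, and "best" here is assumed to mean the highest score.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Fidelity(str, Enum):
    LOW = "low"    # step-time `run` evaluations
    HIGH = "high"  # submit-time re-evaluation


@dataclass
class BestState:
    score: float
    feasibility: float


class BestTracker:
    """Keeps one best state per fidelity so the two are never mixed."""

    def __init__(self) -> None:
        self._best: Dict[Fidelity, Optional[BestState]] = {f: None for f in Fidelity}

    def update(self, fidelity: Fidelity, score: float, feasibility: float) -> None:
        current = self._best[fidelity]
        # Assumption: a higher score is better; feasibility is recorded alongside it.
        if current is None or score > current.score:
            self._best[fidelity] = BestState(score, feasibility)

    def observation_fields(self, fidelity: Fidelity) -> dict:
        """Return the best-state fields together with an explicit fidelity label."""
        best = self._best[fidelity]
        return {
            "best_fidelity": fidelity.value,
            "best_score": None if best is None else best.score,
            "best_feasibility": None if best is None else best.feasibility,
        }
```

With a label like `best_fidelity` in every observation, a trace reader or baseline script can filter by fidelity instead of inferring it from whether the step was `run` or `submit`.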
docs/P1_ENV_CONTRACT_V1.md CHANGED
@@ -170,6 +170,7 @@ Add clarity about fidelity:
  - low-fidelity step-time metrics should be labeled as such
  - high-fidelity submit-time metrics should be labeled as such
  - do not expose them as if they are the same truth surface
+ - in the current runtime, `best_score` and `best_feasibility` can switch meaning with fidelity context, so traces and baselines should not treat them as one invariant metric yet
 
  This can be done either by:
 
docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED
@@ -155,6 +155,8 @@ target_spec: str
155
  Current requirement:
156
 
157
  - the observation and diagnostics text should make the low-fi vs high-fi distinction explicit
 
 
158
 
159
  ### Reward V0
160
 
 
155
  Current requirement:
156
 
157
  - the observation and diagnostics text should make the low-fi vs high-fi distinction explicit
158
+ - in the current runtime, `best_score` and `best_feasibility` may reflect low-fidelity rollout state during `run` and high-fidelity best state during `submit`
159
+ - do not narrate those fields as one fidelity-independent quantity until the contract is simplified further
160
 
161
  ### Reward V0
162