Spaces: CreativeEngineer/fusion-design-lab


CreativeEngineer committed on Mar 8

Commit 5271cce (1 parent: 3270c54)

docs: clarify repaired env runtime status

Files changed (4)

README.md +3 -2
docs/FUSION_DESIGN_LAB_PLAN_V2.md +7 -0
docs/P1_ENV_CONTRACT_V1.md +1 -0
docs/PIVOT_P1_ROTATING_ELLIPSE.md +2 -0

README.md CHANGED

@@ -53,19 +53,20 @@ Implementation status:
 ## Known Gaps
-- The current 3-knob family is structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That means reward tuning is secondary until the parameterization is repaired.
 - The repaired family now uses frozen exact seeds with explicit triangularity control. Those seeds are near-boundary references, not yet tracked fixtures.
 - The repaired low-dimensional family still needs measured ranges and deltas. Do not narrate guessed `rotational_transform` bounds, `triangularity_scale` deltas, or a larger budget as validated facts until they are measured on the repaired environment.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
 - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
 - Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
 Current mode:
 - strategic task choice is already locked
-- the next work is parameterization repair, then fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
 - new planning text should only appear when a real blocker forces a decision change
 ## Planned Repository Layout

 ## Known Gaps
+- Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
 - The repaired family now uses frozen exact seeds with explicit triangularity control. Those seeds are near-boundary references, not yet tracked fixtures.
 - The repaired low-dimensional family still needs measured ranges and deltas. Do not narrate guessed `rotational_transform` bounds, `triangularity_scale` deltas, or a larger budget as validated facts until they are measured on the repaired environment.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
 - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
 - Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
+- `best_score` and `best_feasibility` are currently context-dependent in observations: run-time views reflect low-fidelity rollout state, while submit-time views can reflect high-fidelity best state. Keep that distinction explicit in docs, traces, and baseline interpretation until the contract is simplified further.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
 Current mode:
 - strategic task choice is already locked
+- the next work is measured sweep validation, then fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
 - new planning text should only appear when a real blocker forces a decision change
 ## Planned Repository Layout
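
The failure and terminal-reward rules listed under Known Gaps in the README hunk above can be sketched roughly as follows. This is only an illustration, not the repository's implementation: `step_run`, `terminal_reward`, `StepResult`, and every numeric constant are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical constants; the real penalty and reward scales are not stated in the docs.
FAILURE_PENALTY = -1.0
SUBMIT_BONUS = 1.0
EXHAUSTION_SCALE = 0.5


@dataclass
class StepResult:
    budget_remaining: int
    reward: float
    failed: bool  # surfaced so a failed evaluation is visible in the observation
    done: bool


def step_run(budget_remaining: int, evaluation_ok: bool, score_delta: float) -> StepResult:
    """One `run` step: every evaluation spends budget, including failed ones."""
    budget_remaining -= 1
    reward = score_delta if evaluation_ok else FAILURE_PENALTY
    return StepResult(
        budget_remaining=budget_remaining,
        reward=reward,
        failed=not evaluation_ok,
        done=budget_remaining <= 0,
    )


def terminal_reward(improvement: float, submitted: bool) -> float:
    """Terminal reward; budget exhaustion always pays less than an explicit submit."""
    base = max(improvement, 0.0)
    if submitted:
        return base + SUBMIT_BONUS
    return base * EXHAUSTION_SCALE
```

With these assumed constants, an explicit `submit` is worth more than running out of budget at any non-negative improvement, which is the asymmetry the README asks to preserve when tuning reward.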

docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED

@@ -251,6 +251,13 @@ The observation should expose:
 The observation must be interpretable by a human without additional hidden state.
+Current runtime note:
+- `best_score` and `best_feasibility` are not yet fully split by fidelity in the observation schema
+- low-fidelity run observations display rollout best state
+- high-fidelity submit observations may display high-fidelity best state instead
+- keep that distinction explicit in docs and traces until the contract is simplified further
 ### Action Space
 The live action space stays intentionally small and discrete while exposing the repaired 4-knob low-dimensional family.
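
One way to keep the fidelity distinction explicit until the schema is split is to tag each observation with the fidelity that produced it. The sketch below is a minimal illustration under assumptions: `best_score` and `best_feasibility` come from the runtime note above, while the `fidelity` field, `make_observation`, and `describe` are hypothetical and not part of the current contract.

```python
from dataclasses import dataclass
from typing import Literal

Fidelity = Literal["low", "high"]


@dataclass(frozen=True)
class Observation:
    best_score: float
    best_feasibility: float
    fidelity: Fidelity  # hypothetical tag, not in the current observation schema


def make_observation(action: str, best_score: float, best_feasibility: float) -> Observation:
    """Tag the observation by the action that produced it (`run` or `submit`)."""
    if action not in ("run", "submit"):
        raise ValueError(f"unknown action: {action!r}")
    fidelity: Fidelity = "high" if action == "submit" else "low"
    return Observation(best_score, best_feasibility, fidelity)


def describe(obs: Observation) -> str:
    """Render the best-state fields with their fidelity label attached."""
    label = f"{obs.fidelity}-fidelity"
    return (
        f"best_score={obs.best_score:.4f} ({label}), "
        f"best_feasibility={obs.best_feasibility:.4f} ({label})"
    )
```

A human reading the rendered text can then tell which evaluation path a value came from without any hidden state.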

docs/P1_ENV_CONTRACT_V1.md CHANGED

@@ -170,6 +170,7 @@ Add clarity about fidelity:
 - low-fidelity step-time metrics should be labeled as such
 - high-fidelity submit-time metrics should be labeled as such
 - do not expose them as if they are the same truth surface
+- in the current runtime, `best_score` and `best_feasibility` can switch meaning with fidelity context, so traces and baselines should not treat them as one invariant metric yet
 This can be done either by:

docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED

@@ -155,6 +155,8 @@ target_spec: str
 Current requirement:
 - the observation and diagnostics text should make the low-fi vs high-fi distinction explicit
+- in the current runtime, `best_score` and `best_feasibility` may reflect low-fidelity rollout state during `run` and high-fidelity best state during `submit`
+- do not narrate those fields as one fidelity-independent quantity until the contract is simplified further
 ### Reward V0
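
When summarizing traces or baselines, the requirement above suggests reporting `best_score` per fidelity rather than as a single number. A minimal sketch, assuming each trace record is a dict with hypothetical keys `action` and `best_score`, and assuming a higher score is better:

```python
from typing import Dict, Iterable, Mapping


def best_score_by_fidelity(trace: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Return the best `best_score` seen per fidelity, never mixing the two.

    Records from `submit` are treated as high-fidelity; all other records
    are treated as low-fidelity rollout state.
    """
    best: Dict[str, float] = {}
    for record in trace:
        fidelity = "high" if record["action"] == "submit" else "low"
        score = float(record["best_score"])
        # Assumption: larger scores are better.
        if fidelity not in best or score > best[fidelity]:
            best[fidelity] = score
    return best
```

An episode with no `submit` then simply has no `"high"` entry, which avoids presenting a rollout value as a final submission metric.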