Commit 5271cce
Parent(s): 3270c54
docs: clarify repaired env runtime status

- README.md +3 -2
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +7 -0
- docs/P1_ENV_CONTRACT_V1.md +1 -0
- docs/PIVOT_P1_ROTATING_ELLIPSE.md +2 -0
README.md CHANGED
@@ -53,19 +53,20 @@ Implementation status:

 ## Known Gaps

--
+- Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
 - The repaired family now uses frozen exact seeds with explicit triangularity control. Those seeds are near-boundary references, not yet tracked fixtures.
 - The repaired low-dimensional family still needs measured ranges and deltas. Do not narrate guessed `rotational_transform` bounds, `triangularity_scale` deltas, or a larger budget as validated facts until they are measured on the repaired environment.
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
 - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
 - Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
+- `best_score` and `best_feasibility` are currently context-dependent in observations: run-time views reflect low-fidelity rollout state, while submit-time views can reflect high-fidelity best state. Keep that distinction explicit in docs, traces, and baseline interpretation until the contract is simplified further.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.

 Current mode:

 - strategic task choice is already locked
-- the next work is
+- the next work is measured sweep validation, then fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
 - new planning text should only appear when a real blocker forces a decision change

 ## Planned Repository Layout
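The failure and terminal-reward bullets in the README hunk can be illustrated with a minimal sketch. Every name and number below (`FAILURE_PENALTY`, `EXHAUSTION_PENALTY`, `StepResult`, `run_step`, `terminal_reward`) is hypothetical and not taken from the repository. Only the qualitative behavior comes from the text: a failed evaluation still costs budget, is visible in the observation, and is penalized, and budget exhaustion pays strictly less than an explicit `submit`.

```python
from dataclasses import dataclass

# Hypothetical values: the source states only the ordering, not the magnitudes.
FAILURE_PENALTY = -1.0
EXHAUSTION_PENALTY = 0.5  # amount subtracted when the episode ends by exhaustion


@dataclass(frozen=True)
class StepResult:
    reward: float
    budget_remaining: int
    failed: bool
    message: str


def run_step(budget_remaining: int, evaluation_ok: bool, score_delta: float) -> StepResult:
    """One `run` step: every evaluation, failed or not, spends one unit of budget."""
    remaining = budget_remaining - 1
    if not evaluation_ok:
        # The failure is surfaced in the observation rather than hidden.
        return StepResult(FAILURE_PENALTY, remaining, True, "evaluation failed")
    return StepResult(score_delta, remaining, False, "ok")


def terminal_reward(improvement: float, explicit_submit: bool) -> float:
    """Exhaustion always pays less than `submit` for the same improvement."""
    if explicit_submit:
        return improvement
    return improvement - EXHAUSTION_PENALTY
```

A subtractive penalty is used here instead of a multiplicative scale so that the ordering also holds when `improvement` is negative; the real runtime may implement the asymmetry differently.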
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -251,6 +251,13 @@ The observation should expose:

 The observation must be interpretable by a human without additional hidden state.

+Current runtime note:
+
+- `best_score` and `best_feasibility` are not yet fully split by fidelity in the observation schema
+- low-fidelity run observations display rollout best state
+- high-fidelity submit observations may display high-fidelity best state instead
+- keep that distinction explicit in docs and traces until the contract is simplified further
+
 ### Action Space

 The live action space stays intentionally small and discrete while exposing the repaired 4-knob low-dimensional family.
docs/P1_ENV_CONTRACT_V1.md CHANGED
@@ -170,6 +170,7 @@ Add clarity about fidelity:
 - low-fidelity step-time metrics should be labeled as such
 - high-fidelity submit-time metrics should be labeled as such
 - do not expose them as if they are the same truth surface
+- in the current runtime, `best_score` and `best_feasibility` can switch meaning with fidelity context, so traces and baselines should not treat them as one invariant metric yet

 This can be done either by:

docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED
@@ -155,6 +155,8 @@ target_spec: str
 Current requirement:

 - the observation and diagnostics text should make the low-fi vs high-fi distinction explicit
+- in the current runtime, `best_score` and `best_feasibility` may reflect low-fidelity rollout state during `run` and high-fidelity best state during `submit`
+- do not narrate those fields as one fidelity-independent quantity until the contract is simplified further

 ### Reward V0

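One way to keep the low-fidelity versus high-fidelity distinction explicit in traces is to carry a fidelity tag next to `best_score` and `best_feasibility`. The sketch below is an assumption for illustration only: `Fidelity`, `BestState`, `label`, and `score_delta` are hypothetical names, not part of the observation schema described in these docs.

```python
from dataclasses import dataclass
from enum import Enum


class Fidelity(Enum):
    LOW = "low"    # step-time state produced during `run`
    HIGH = "high"  # submit-time state produced during `submit`


@dataclass(frozen=True)
class BestState:
    best_score: float
    best_feasibility: float
    fidelity: Fidelity


def label(state: BestState) -> str:
    """Render both fields with their fidelity so a trace never shows a bare number."""
    tag = state.fidelity.value
    return (
        f"best_score[{tag}]={state.best_score:.4f}, "
        f"best_feasibility[{tag}]={state.best_feasibility:.4f}"
    )


def score_delta(before: BestState, after: BestState) -> float:
    """Compare scores only when both were measured at the same fidelity."""
    if before.fidelity is not after.fidelity:
        raise ValueError("cannot compare best_score across fidelity levels")
    return after.best_score - before.best_score
```

With a tag like this, a baseline script that mixes run-time and submit-time values fails loudly instead of averaging two different quantities.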