Commit 567ff67
Parent(s): 1c1f314
docs: align hybrid validation gates
- README.md +1 -1
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +5 -5
README.md
CHANGED
@@ -52,7 +52,7 @@ Implementation status:
 - [x] Label low-fi `run` truth vs high-fi `submit` truth in observations and task docs
 - [x] Separate high-fidelity submit scoring/reporting from low-fidelity rollout score state
 - [x] Add tracked `P1` fixtures under `server/data/p1/`
-- [ ] Run
+- [ ] Run a tiny low-fi PPO smoke run, then record at least one submit-side manual trace and the first real reward pathology
 - [ ] Refresh the heuristic baseline for the real verifier path
 - [ ] Deploy the real environment to HF Space
 
docs/FUSION_DESIGN_LAB_PLAN_V2.md
CHANGED
@@ -151,15 +151,15 @@ Gate 1: measured sweep exists
 
 - repaired-family ranges, deltas, and reset seeds are justified by recorded evidence
 
-Gate 2:
-
-- good, boundary, and bad references behave as expected
-
-Gate 3: tiny PPO smoke is sane
+Gate 2: tiny PPO smoke is sane
 
 - a small low-fidelity policy can improve or at least reveal a concrete failure mode quickly
 - trajectories are readable enough to debug
 
+Gate 3: fixture checks pass
+
+- good, boundary, and bad references behave as expected
+
 Gate 4: manual playtest passes
 
 - a human can read the observation
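The gate order after this commit can be pictured as an ordered checklist. The sketch below is illustrative only: the `Gate` class, `first_failing_gate`, and the placeholder lambdas are hypothetical names that do not come from the repository, and it assumes gates are evaluated in order and stop at the first failure, which the plan does not state explicitly. Only the gate names are taken from the diff.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Gate:
    """One validation gate: a display name plus a pass/fail check."""
    name: str
    check: Callable[[], bool]  # hypothetical: returns True when the gate passes


def first_failing_gate(gates: List[Gate]) -> Optional[str]:
    """Run gates in order and return the name of the first one that fails.

    Returns None when every gate passes (or when the list is empty).
    """
    for gate in gates:
        if not gate.check():
            return gate.name
    return None


# Gate names follow the updated plan; the lambdas are placeholder results,
# not real checks against the environment.
GATES_AFTER_COMMIT = [
    Gate("Gate 1: measured sweep exists", lambda: True),
    Gate("Gate 2: tiny PPO smoke is sane", lambda: True),
    Gate("Gate 3: fixture checks pass", lambda: False),
    Gate("Gate 4: manual playtest passes", lambda: True),
]

blocked_at = first_failing_gate(GATES_AFTER_COMMIT)
```

With these placeholder results, `blocked_at` names Gate 3, so the manual playtest in Gate 4 would not be reached until the fixture checks pass.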