CreativeEngineer committed on
Commit 567ff67 · 1 Parent(s): 1c1f314

docs: align hybrid validation gates

Files changed (2)
  1. README.md +1 -1
  2. docs/FUSION_DESIGN_LAB_PLAN_V2.md +5 -5
README.md CHANGED
@@ -52,7 +52,7 @@ Implementation status:
  - [x] Label low-fi `run` truth vs high-fi `submit` truth in observations and task docs
  - [x] Separate high-fidelity submit scoring/reporting from low-fidelity rollout score state
  - [x] Add tracked `P1` fixtures under `server/data/p1/`
- - [ ] Run manual playtesting and record the first reward pathology
+ - [ ] Run a tiny low-fi PPO smoke run, then record at least one submit-side manual trace and the first real reward pathology
  - [ ] Refresh the heuristic baseline for the real verifier path
  - [ ] Deploy the real environment to HF Space

docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -151,15 +151,15 @@ Gate 1: measured sweep exists

  - repaired-family ranges, deltas, and reset seeds are justified by recorded evidence

- Gate 2: fixture checks pass
-
- - good, boundary, and bad references behave as expected
-
- Gate 3: tiny PPO smoke is sane
+ Gate 2: tiny PPO smoke is sane

  - a small low-fidelity policy can improve or at least reveal a concrete failure mode quickly
  - trajectories are readable enough to debug

+ Gate 3: fixture checks pass
+
+ - good, boundary, and bad references behave as expected
+
  Gate 4: manual playtest passes

  - a human can read the observation
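This commit swaps Gates 2 and 3, so the tiny PPO smoke run now comes before the fixture checks. If the numbered gates are read as a sequence in which each must pass before the next is attempted (an assumption; the plan excerpt does not state this explicitly), the new ordering can be sketched as below. The `Gate` type, `first_failing_gate`, and the `stub` checks are hypothetical placeholders, not code from this repository; only the gate names come from the plan.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical gate: a display name plus a zero-argument check that returns True on pass.
Gate = Tuple[str, Callable[[], bool]]


def first_failing_gate(gates: List[Gate]) -> Optional[str]:
    """Run gates in order and return the name of the first failing gate, or None.

    Gates after a failure are not run.
    """
    for name, check in gates:
        if not check():
            return name
    return None


# Records which placeholder checks were actually invoked.
calls: List[str] = []


def stub(label: str, result: bool) -> Callable[[], bool]:
    """Build a placeholder check that logs its label and returns a fixed result."""
    def check() -> bool:
        calls.append(label)
        return result
    return check


# Gate order after this commit; names come from the plan, the results are made up.
gates: List[Gate] = [
    ("Gate 1: measured sweep exists", stub("sweep", True)),
    ("Gate 2: tiny PPO smoke is sane", stub("ppo_smoke", True)),
    ("Gate 3: fixture checks pass", stub("fixtures", False)),
    ("Gate 4: manual playtest passes", stub("playtest", True)),
]

print(first_failing_gate(gates))  # Gate 3: fixture checks pass
print(calls)  # ['sweep', 'ppo_smoke', 'fixtures']
```

Under this sequential reading, a failing fixture check stops the run before the manual playtest, and the PPO smoke check has already executed by then.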