## Checkpoints (sync-or-die contract)

Goal: keep three engineers aligned and prevent "cool demo" scope creep from killing the submission. Source: `../prd.md` §12.

### Checkpoint 1: Midnight (00:00 IST), scope freeze + Phase 1 gate

**Everyone must demonstrate (live, locally or on the Space):**
- **Env server runs** and responds to `GET /health`
- **OpenEnv loop works**: `reset` → `step` → done, without crashing
- **Action parser is robust**: malformed XML doesn't crash the server; it returns a safe error
- **No-leak invariant**: the observation contains no ground-truth fields
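
The parser and no-leak items above can be checked with a few lines of standard-library Python. The following is a minimal sketch, not the project's actual parser: the function names, the error-dict shape, and the set of ground-truth keys are all assumptions, and `ALLOWED_ACTIONS` lists only the two actions this document names (`analyze`, `verdict`).

```python
import xml.etree.ElementTree as ET

# Assumptions for illustration: only the two actions named in this document,
# and a guessed set of keys that would count as ground truth.
ALLOWED_ACTIONS = {"analyze", "verdict"}
GROUND_TRUTH_KEYS = {"label", "is_vulnerable", "cwe_id", "ground_truth"}


def parse_action(raw):
    """Parse an XML action string; never raise, always return a dict."""
    try:
        root = ET.fromstring(raw)
    except (ET.ParseError, TypeError, ValueError) as exc:
        return {"ok": False, "error": f"malformed XML: {exc}"}
    if root.tag not in ALLOWED_ACTIONS:
        return {"ok": False, "error": f"unknown action: {root.tag}"}
    return {"ok": True, "action": root.tag, "content": (root.text or "").strip()}


def leaked_keys(observation, forbidden=GROUND_TRUTH_KEYS):
    """Return the forbidden keys found anywhere in a nested observation."""
    found = set()
    if isinstance(observation, dict):
        for key, value in observation.items():
            if key in forbidden:
                found.add(key)
            found |= leaked_keys(value, forbidden)
    elif isinstance(observation, (list, tuple)):
        for item in observation:
            found |= leaked_keys(item, forbidden)
    return found
```

During the live demo, feeding `parse_action` a truncated string such as `"<verdict>oops"` should yield an error dict rather than a traceback, and `leaked_keys(obs)` should be empty for every observation the env returns.
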

**Role deliverables:**
- **Env/Server owner**: endpoints exist (`/health`, `/reset`, `/step`, `/state`, `/docs`)
- **Reward owner**: reward function wired and deterministic on handcrafted cases
- **Training owner**: mock training loop can call env repeatedly (even if reward is dummy)
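
One way to demonstrate "deterministic on handcrafted cases" is to call the reward function several times per case and compare against a hand-written expected value. This is a sketch under assumptions: `binary_reward` is a placeholder standing in for the real reward function, and the case tuples are invented for illustration.

```python
def binary_reward(predicted, expected):
    """Placeholder reward: 1.0 on a case-insensitive verdict match, else 0.0."""
    return 1.0 if predicted.strip().lower() == expected.strip().lower() else 0.0


# Handcrafted cases: (predicted verdict, expected verdict, expected reward).
CASES = [
    ("vulnerable", "vulnerable", 1.0),
    ("Safe ", "safe", 1.0),
    ("safe", "vulnerable", 0.0),
]


def check_reward(reward_fn, cases, repeats=5):
    """Return a list of failure messages; an empty list means the gate is green."""
    failures = []
    for predicted, expected, want in cases:
        # Collect the distinct values seen across repeated calls.
        results = {reward_fn(predicted, expected) for _ in range(repeats)}
        if len(results) != 1:
            failures.append(f"non-deterministic for {predicted!r}: {sorted(results)}")
        elif results != {want}:
            failures.append(f"{predicted!r} vs {expected!r}: got {results.pop()}, want {want}")
    return failures
```

Swapping the real reward function in for `binary_reward` and printing `check_reward(...)` gives a quick red/green signal at the checkpoint.
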

**If any of these are red, trigger a scope cut immediately:**
- 3-action env incomplete → cut to 2-action env (analyze + verdict)
- Tiered reward unstable → cut to binary reward only

**After this checkpoint:**
- **Scope freeze is active.** New features go to `.agent/FUTURE_WORK.md` only.

### Checkpoint 2: 9:00 AM Sunday, training evidence gate

**Everyone must demonstrate:**
- Training run launched (HF Jobs A10G preferred) or fallback running
- Wandb logging works (reward curve visible)
- Evaluation script/notebook can run 100 held-out samples
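
The held-out evaluation item can be reduced to a small accuracy loop. The sketch below assumes samples are `(code, expected_verdict)` pairs and that the model is exposed as a `predict(code)` callable; both shapes are hypothetical and would need adapting to the real dataset and policy.

```python
def evaluate(predict, samples, limit=100):
    """Score `predict` on up to `limit` held-out samples.

    `samples` is assumed to be an iterable of (code, expected_verdict) pairs.
    """
    subset = list(samples)[:limit]
    if not subset:
        return {"n": 0, "correct": 0, "accuracy": 0.0}
    correct = sum(1 for code, expected in subset if predict(code) == expected)
    return {"n": len(subset), "correct": correct, "accuracy": correct / len(subset)}
```

Running it once with a trivial baseline such as `lambda code: "safe"` gives a floor to compare the trained model against.
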

**Scope-cut triggers:**
- Training blocked by infra for more than 30 min → move to GCP A10G fallback
- Training curve still flat by 10:00 AM → commit to qualitative narrative (no more training tweaks)
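
"Flat" is easier to agree on at 10:00 AM if it is defined before the run. One possible definition, offered only as a sketch: compare the mean reward over the first and last windows of logged steps. The window size and the minimum gain are assumed values, not thresholds from the PRD.

```python
from statistics import mean


def is_flat(rewards, window=20, min_gain=0.05):
    """True if the last `window` rewards do not beat the first `window` by `min_gain`.

    Runs with fewer than 2 * window points are treated as flat, on the assumption
    that too little data is not evidence of learning.
    """
    if len(rewards) < 2 * window:
        return True
    return mean(rewards[-window:]) - mean(rewards[:window]) < min_gain
```

The reward list can be whatever per-step series is already being logged to wandb, exported or collected locally.
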

**What gets cut first (in order):**
1. P2 items (web UI polish, blog post)
2. Per-CWE breakdown (keep overall accuracy)
3. Exploit sketch bonus (keep binary + CWE if stable)
4. CWE classification bonus (keep binary only)
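
Cuts 3 and 4 are cheap to apply if each reward tier sits behind a config value. The following sketch shows one way to structure that; the bonus weights, the field names (`verdict`, `cwe`), and the precomputed `exploit_ok` flag are all illustrative assumptions rather than the project's actual reward design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    # Weights are illustrative assumptions, not values from the PRD.
    cwe_bonus: float = 0.5       # set to 0.0 to apply cut 4
    exploit_bonus: float = 0.25  # set to 0.0 to apply cut 3


def tiered_reward(pred, truth, cfg=RewardConfig()):
    """Binary verdict reward plus optional bonuses, granted only on a correct verdict."""
    if pred.get("verdict") != truth["verdict"]:
        return 0.0
    reward = 1.0
    # CWE bonus: only when the ground truth has a CWE and the prediction matches it.
    if cfg.cwe_bonus and truth.get("cwe") is not None and pred.get("cwe") == truth["cwe"]:
        reward += cfg.cwe_bonus
    # Exploit bonus: `exploit_ok` is a hypothetical flag computed elsewhere.
    if cfg.exploit_bonus and pred.get("exploit_ok"):
        reward += cfg.exploit_bonus
    return reward


BINARY_ONLY = RewardConfig(cwe_bonus=0.0, exploit_bonus=0.0)
```

With this layout, falling back to the binary reward is a one-line change (`cfg=BINARY_ONLY`) instead of an edit to the reward logic during the freeze.
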

### Checkpoint 3: 3:00 PM Sunday, feature freeze gate

**Everyone must demonstrate:**
- HF Space is live and stable; `/health` returns 200; `/docs` loads
- `tests/` pass (see `.agent/test_contracts.md`)
- Demo artifact path is locked (video or text-trace fallback)
- README has all submission links (Space, notebook, video, wandb, repo)
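
The Space liveness item can be scripted so that it is re-run right before submission. This is a minimal standard-library sketch; the function names are hypothetical, and the base URL must be supplied by whoever owns the Space. The `fetch` parameter exists so the check can be exercised without network access.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status of a GET request, or 0 if the request cannot be made."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return 0  # unreachable, timed out, or malformed URL


def check_space(base_url, paths=("/health", "/docs"), fetch=http_status):
    """Map each path to True if it answered with status 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in paths}
```

A call such as `all(check_space(space_url).values())`, with `space_url` set to the deployed Space's address, returns `True` only when both endpoints answer 200.
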

**Hard rule:**
- **No changes after 3:00 PM** except emergency fixes that prevent submission failure.

**Final scope cuts (if needed to protect submission):**
1. Video → text trace in README
2. Training curve → single plot + narrative
3. Held-out eval → small-N sanity check