## Checkpoints (sync-or-die contract)
Goal: keep three engineers aligned and prevent "cool demo" scope creep from killing the submission. Source: `../prd.md` §12.
### Checkpoint 1: Midnight (00:00 IST), scope freeze + Phase 1 gate
**Everyone must demonstrate (live, locally or on Space):**
- **Env server runs** and responds to `GET /health`
- **OpenEnv loop works**: `reset` → `step` → done, without crashing
- **Action parser is robust**: malformed XML doesn't crash the server; the parser returns a safe error
- **No-leak invariant**: the observation contains no ground-truth fields
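
The parser and no-leak checks above can be sketched as follows. This is a minimal illustration, not the project's implementation: `parse_action`, `find_leaks`, the error-dict shape, and the `FORBIDDEN_KEYS` names are all hypothetical, and only the `analyze` and `verdict` action names come from this document.

```python
import xml.etree.ElementTree as ET

# Only "analyze" and "verdict" are named in this document; extend as needed.
ALLOWED_ACTIONS = {"analyze", "verdict"}

# Hypothetical ground-truth keys that must never reach the agent.
FORBIDDEN_KEYS = {"label", "ground_truth", "cwe_id"}


def parse_action(raw):
    """Parse an XML action string; never raise, return an error dict instead."""
    try:
        root = ET.fromstring(raw)
    except (ET.ParseError, TypeError, ValueError) as exc:
        return {"ok": False, "error": f"malformed action: {type(exc).__name__}"}
    if root.tag not in ALLOWED_ACTIONS:
        return {"ok": False, "error": f"unknown action: {root.tag}"}
    return {"ok": True, "action": root.tag, "content": (root.text or "").strip()}


def find_leaks(observation, forbidden=FORBIDDEN_KEYS):
    """Return the paths of forbidden keys found anywhere in a nested observation."""
    leaks = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                here = f"{path}.{key}" if path else str(key)
                if key in forbidden:
                    leaks.append(here)
                walk(value, here)
        elif isinstance(node, (list, tuple)):
            for i, item in enumerate(node):
                walk(item, f"{path}[{i}]")

    walk(observation, "")
    return leaks
```

Returning an error dict instead of raising keeps a bad model output from taking down the episode, and walking the whole observation catches ground truth nested inside metadata, not just top-level keys.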
**Role deliverables:**
- **Env/Server owner**: endpoints exist (`/health`, `/reset`, `/step`, `/state`, `/docs`)
- **Reward owner**: reward function wired and deterministic on handcrafted cases
- **Training owner**: mock training loop can call env repeatedly (even if reward is dummy)
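
One way to demonstrate "deterministic on handcrafted cases" is to call the reward function several times per case and confirm it always returns the expected value. The sketch below assumes a simple binary reward over verdict strings; `binary_reward`, `HANDCRAFTED_CASES`, and `check_deterministic` are illustrative names, not the project's actual API.

```python
def binary_reward(predicted, expected):
    """Hypothetical binary reward: 1.0 on a matching verdict, else 0.0."""
    return 1.0 if predicted.strip().lower() == expected.strip().lower() else 0.0


# (predicted, expected, wanted reward); cases are illustrative only.
HANDCRAFTED_CASES = [
    ("vulnerable", "vulnerable", 1.0),
    ("safe", "vulnerable", 0.0),
    (" Safe ", "safe", 1.0),
]


def check_deterministic(reward_fn, cases, repeats=5):
    """Return the cases whose reward is wrong or varies across repeated calls."""
    failures = []
    for predicted, expected, want in cases:
        results = {reward_fn(predicted, expected) for _ in range(repeats)}
        if results != {want}:
            failures.append((predicted, expected, want, sorted(results)))
    return failures
```

An empty list from `check_deterministic(binary_reward, HANDCRAFTED_CASES)` means the gate is green for these cases.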
**If any of these are red, trigger a scope cut immediately:**
- 3-action env incomplete → cut to 2-action env (analyze + verdict)
- Tiered reward unstable → cut to binary reward only
**After this checkpoint:**
- **Scope freeze is active.** New features go to `.agent/FUTURE_WORK.md` only.
### Checkpoint 2: 9:00 AM Sunday, training evidence gate
**Everyone must demonstrate:**
- Training run launched (HF Jobs A10G preferred) or fallback running
- Wandb logging works (reward curve visible)
- Evaluation script/notebook can run 100 held-out samples
**Scope-cut triggers:**
- Training blocked by infra for >30 min → move to GCP A10G fallback
- Training curve still flat by 10:00 AM → commit to qualitative narrative (no more training tweaks)
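
"Flat" is not defined in this document, so the team may want to agree on a rule before 10:00 AM instead of arguing over the plot. The sketch below is one possible rule under stated assumptions: it compares the mean reward of the last `window` steps against the first `window` steps, and both `window` and `min_gain` are hypothetical thresholds.

```python
from statistics import mean


def is_flat(rewards, window=20, min_gain=0.05):
    """Heuristic: the curve is flat if late rewards barely beat early rewards.

    window and min_gain are illustrative defaults, not values from the PRD.
    """
    if len(rewards) < 2 * window:
        # Too few logged steps to show improvement; treat as flat.
        return True
    gain = mean(rewards[-window:]) - mean(rewards[:window])
    return gain < min_gain
```

The input would be the per-step reward series already logged to wandb, exported as a plain list.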
**What gets cut first (in order):**
1. P2 items (web UI polish, blog post)
2. Per-CWE breakdown (keep overall accuracy)
3. Exploit sketch bonus (keep binary + CWE if stable)
4. CWE classification bonus (keep binary only)
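
Items 3 and 4 are cheap to execute if the reward tiers are switchable by configuration. The following is a sketch only: `RewardConfig`, `tiered_reward`, and the bonus weights are hypothetical and do not reflect the project's actual reward values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    use_cwe_bonus: bool = True
    use_exploit_bonus: bool = True
    cwe_bonus: float = 0.5       # hypothetical weight
    exploit_bonus: float = 0.25  # hypothetical weight


FULL = RewardConfig()
BINARY_PLUS_CWE = RewardConfig(use_exploit_bonus=False)                   # after cut 3
BINARY_ONLY = RewardConfig(use_cwe_bonus=False, use_exploit_bonus=False)  # after cut 4


def tiered_reward(verdict_correct, cwe_correct, exploit_plausible, cfg=FULL):
    """Binary base reward plus optional bonuses; a wrong verdict earns nothing."""
    if not verdict_correct:
        return 0.0
    reward = 1.0
    if cfg.use_cwe_bonus and cwe_correct:
        reward += cfg.cwe_bonus
    if cfg.use_exploit_bonus and exploit_plausible:
        reward += cfg.exploit_bonus
    return reward
```

With this shape, each cut is a one-line config change and the binary path stays exercised by every configuration.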
### Checkpoint 3: 3:00 PM Sunday, feature freeze gate
**Everyone must demonstrate:**
- HF Space is live and stable; `/health` returns 200; `/docs` loads
- `tests/` pass (see `.agent/test_contracts.md`)
- Demo artifact path is locked (video or text-trace fallback)
- README has all submission links (Space, notebook, video, wandb, repo)
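
The liveness item can be checked with a small script instead of by hand. This is a minimal sketch using only the standard library; `http_status` and `smoke_check` are illustrative names, and the base URL is whatever the deployed Space exposes (not stated in this document).

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status of a GET request, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or an invalid URL.
        return 0


def smoke_check(base_url, paths=("/health", "/docs"), fetch=http_status):
    """Map each path to True if it answers 200, else False."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in paths}
```

Calling `smoke_check("<space-url>")` a few times over several minutes gives a rough stability signal; the `fetch` parameter exists so the check can be tested without network access.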
**Hard rule:**
- **No changes after 3:00 PM** except emergency fixes that prevent submission failure.
**Final scope cuts (if needed to protect submission):**
1. Video → text trace in README
2. Training curve → single plot + narrative
3. Held-out eval → small-N sanity check