Checkpoints (sync-or-die contract)

Goal: keep three engineers aligned and prevent "cool demo" scope creep from killing the submission. Source: ../prd.md §12.

Checkpoint 1: Midnight (00:00 IST), scope freeze + Phase 1 gate

Everyone must demonstrate (live, locally or on Space):

  • Env server runs and responds to GET /health

  • OpenEnv loop works: reset → step → done, without crashing

  • Action parser is robust: malformed XML doesn't crash; it returns a safe error

  • No-leak invariant: observation contains no ground-truth fields
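A minimal sketch of the parser and no-leak checks, assuming actions arrive as `<action type="...">body</action>` XML. The tag shape, `ALLOWED_ACTIONS`, and the ground-truth key names are hypothetical placeholders, not the env's actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Any

# Hypothetical action vocabulary, taken from the 2-action fallback (analyze + verdict).
ALLOWED_ACTIONS = {"analyze", "verdict"}

# Hypothetical ground-truth keys; replace with whatever the env keeps server-side.
GROUND_TRUTH_KEYS = {"label", "is_vulnerable", "cwe_id"}


def parse_action(raw: str) -> dict[str, Any]:
    """Parse '<action type="...">body</action>' without ever raising."""
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        # Malformed XML becomes a structured error the env can return as an observation.
        return {"ok": False, "error": f"malformed_xml: {exc}"}
    if root.tag != "action":
        return {"ok": False, "error": f"unexpected_root: {root.tag}"}
    kind = root.get("type")
    if kind not in ALLOWED_ACTIONS:
        return {"ok": False, "error": f"unknown_action: {kind}"}
    return {"ok": True, "type": kind, "body": (root.text or "").strip()}


def leaked_keys(observation: Any) -> set[str]:
    """Return every ground-truth key found anywhere in a nested observation."""
    found: set[str] = set()
    stack = [observation]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            found |= node.keys() & GROUND_TRUTH_KEYS
            stack.extend(node.values())
        elif isinstance(node, (list, tuple)):
            stack.extend(node)
    return found
```

Returning the error as data instead of raising lets `/step` hand it back as an observation, so a bad generation costs one step rather than the whole episode. A test can then assert `leaked_keys(obs) == set()` on every observation the env emits.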

Role deliverables:

  • Env/Server owner: endpoints exist (/health, /reset, /step, /state, /docs)

  • Reward owner: reward function wired and deterministic on handcrafted cases

  • Training owner: mock training loop can call env repeatedly (even if the reward is a dummy)
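One way to smoke-test the reward and training deliverables offline, sketched with a hypothetical `StubEnv` in place of the real server client and an assumed binary `vulnerable`/`safe` label scheme. The handcrafted cases must return identical values on every call, and the mock loop only has to drive `reset`/`step` repeatedly without crashing.

```python
import random
from dataclasses import dataclass, field
from typing import Any

# Hypothetical labels; the real env may use different verdict strings.
LABELS = ("vulnerable", "safe")


def binary_reward(prediction: str, label: str) -> float:
    """Deterministic binary reward: 1.0 for a correct verdict, else 0.0."""
    return 1.0 if prediction.strip().lower() == label else 0.0


# Handcrafted (prediction, label, expected_reward) cases, including a whitespace/case variant.
HANDCRAFTED = [
    ("vulnerable", "vulnerable", 1.0),
    ("safe", "vulnerable", 0.0),
    ("  Vulnerable ", "vulnerable", 1.0),
    ("", "safe", 0.0),
]


def reward_is_deterministic(cases, repeats: int = 3) -> bool:
    """Each case must return exactly its expected value on every repeat."""
    for prediction, label, expected in cases:
        seen = {binary_reward(prediction, label) for _ in range(repeats)}
        if seen != {expected}:
            return False
    return True


@dataclass
class StubEnv:
    """Hypothetical stand-in for the env client: reset() -> obs, step(action) -> (obs, reward, done)."""

    samples: list[dict[str, Any]]
    _idx: int = field(default=0, init=False)
    _current: dict[str, Any] = field(default_factory=dict, init=False)

    def reset(self) -> dict[str, Any]:
        self._current = self.samples[self._idx % len(self.samples)]
        self._idx += 1
        # Only the code is exposed; the label stays inside the env.
        return {"code": self._current["code"]}

    def step(self, action: str) -> tuple[dict[str, Any], float, bool]:
        # Single-step episodes: one verdict ends the episode.
        return {}, binary_reward(action, self._current["label"]), True


def mock_training_loop(env: StubEnv, episodes: int, seed: int = 0) -> list[float]:
    """Call reset/step repeatedly with a random dummy policy; no learning happens here."""
    rng = random.Random(seed)
    rewards: list[float] = []
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            _, reward, done = env.step(rng.choice(LABELS))
        rewards.append(reward)
    return rewards
```

Swapping `StubEnv` for the real client later should not change the loop, provided the client keeps the same `reset`/`step` shape.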

If any of these are red, trigger a scope cut immediately:

  • 3-action env incomplete → cut to 2-action env (analyze + verdict)

  • Tiered reward unstable → cut to binary reward only
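A sketch of how the tiered reward could keep the binary fallback one flag away, assuming tiers of verdict correctness, CWE match, and an exploit sketch (the bonuses named in Checkpoint 2's cut list). The weights and the `Verdict` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bonus weights; tune or drop as the cut order dictates.
CWE_BONUS = 0.5
EXPLOIT_BONUS = 0.25


@dataclass(frozen=True)
class Verdict:
    is_vulnerable: bool
    cwe: Optional[str] = None
    exploit_sketch: Optional[str] = None


def reward(pred: Verdict, truth: Verdict, tiered: bool = True) -> float:
    """Binary correctness plus optional bonuses; tiered=False is the scope-cut fallback."""
    base = 1.0 if pred.is_vulnerable == truth.is_vulnerable else 0.0
    if not tiered or base == 0.0 or not truth.is_vulnerable:
        # Binary-only mode, wrong verdicts, and correct "safe" calls earn no bonus.
        return base
    bonus = 0.0
    if pred.cwe is not None and pred.cwe == truth.cwe:
        bonus += CWE_BONUS
    if pred.exploit_sketch:
        # Placeholder check: a real grader would validate the sketch, not just its presence.
        bonus += EXPLOIT_BONUS
    return base + bonus
```

Passing `tiered=False` drops every bonus without touching call sites, which keeps the cut cheap if it has to happen at midnight.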

After this checkpoint:

  • Scope freeze is active. New features go to .agent/FUTURE_WORK.md only.

Checkpoint 2: 9:00 AM Sunday, training evidence gate

Everyone must demonstrate:

  • Training run launched (HF Jobs A10G preferred) or fallback running

  • Wandb logging works (reward curve visible)

  • Evaluation script/notebook can run 100 held-out samples
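A hedged sketch of the evaluation side, assuming each sample is a dict with hypothetical `id`, `code`, `label`, and optional `cwe` keys, and that `predict` wraps the trained policy. Hashing the id gives a held-out split that stays fixed across reruns without storing a separate file.

```python
import hashlib
from collections import defaultdict
from typing import Any, Callable


def is_held_out(sample_id: str, fraction: float = 0.1) -> bool:
    """Stable split: the same id always lands in the same bucket across runs and machines."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def evaluate(predict: Callable[[str], str], samples: list[dict[str, Any]], n: int = 100) -> dict[str, Any]:
    """Overall accuracy on the first n held-out samples, plus an optional per-CWE breakdown."""
    subset = samples[:n]
    if not subset:
        raise ValueError("no held-out samples to evaluate")
    correct = 0
    per_cwe: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # cwe -> [correct, total]
    for sample in subset:
        hit = predict(sample["code"]) == sample["label"]
        correct += hit
        stats = per_cwe[sample.get("cwe", "none")]
        stats[0] += hit
        stats[1] += 1
    return {
        "n": len(subset),
        "accuracy": correct / len(subset),
        "per_cwe": {cwe: c / t for cwe, (c, t) in per_cwe.items()},
    }
```

Typical use: `held_out = [s for s in samples if is_held_out(s["id"])]`, then `evaluate(predict, held_out)`. The per-CWE numbers are the first part to drop under cut 2; overall accuracy stays.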

Scope-cut triggers:

  • Training blocked by infra for >30 min → move to GCP A10G fallback

  • Training curve still flat by 10:00 AM → commit to the qualitative narrative (no more training tweaks)
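The two triggers can be written as a small decision function. The 30-minute limit and the 10:00 AM deadline come from this checkpoint; what counts as "flat" is not defined here, so `WINDOW` and `MIN_GAIN` below are assumptions (the mean reward of the last window must beat the first by at least `MIN_GAIN`).

```python
from datetime import time
from statistics import mean

INFRA_BLOCK_LIMIT_MIN = 30   # from this checkpoint: >30 min blocked -> fallback
FLAT_DEADLINE = time(10, 0)  # from this checkpoint: flat by 10:00 AM -> narrative
WINDOW = 50                  # hypothetical: steps per comparison window
MIN_GAIN = 0.02              # hypothetical: minimum mean-reward gain that counts as "not flat"


def curve_is_flat(rewards: list[float], window: int = WINDOW, min_gain: float = MIN_GAIN) -> bool:
    """Compare mean reward of the first and last windows; too little data also counts as flat."""
    if len(rewards) < 2 * window:
        return True
    return mean(rewards[-window:]) - mean(rewards[:window]) < min_gain


def training_decision(blocked_minutes: float, now: time, rewards: list[float]) -> str:
    """Apply the infra trigger first, then the flat-curve trigger once the deadline has passed."""
    if blocked_minutes > INFRA_BLOCK_LIMIT_MIN:
        return "move to GCP A10G fallback"
    if now >= FLAT_DEADLINE and curve_is_flat(rewards):
        return "commit to qualitative narrative"
    return "continue"
```

Treating "not enough data by 10:00" as flat is deliberate: a run that has not produced two windows by the deadline cannot show a curve either.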

What gets cut first (in order):

  1. P2 items (web UI polish, blog post)

  2. Per-CWE breakdown (keep overall accuracy)

  3. Exploit sketch bonus (keep binary + CWE if stable)

  4. CWE classification bonus (keep binary only)
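Encoding the cut order as data is one way to make sure cuts happen front to back and that the reward function actually sees them. The feature identifiers and the `reward_flags` switches are hypothetical names, not existing config keys.

```python
# The cut order above, encoded so cuts are always applied front to back.
CUT_ORDER = (
    "p2_items",           # web UI polish, blog post
    "per_cwe_breakdown",  # keep overall accuracy
    "exploit_bonus",      # keep binary + CWE
    "cwe_bonus",          # keep binary only
)


def remaining_features(cuts_applied: int) -> tuple[str, ...]:
    """Features still in scope after the first `cuts_applied` cuts."""
    if not 0 <= cuts_applied <= len(CUT_ORDER):
        raise ValueError(f"cuts_applied must be in [0, {len(CUT_ORDER)}]")
    return CUT_ORDER[cuts_applied:]


def reward_flags(cuts_applied: int) -> dict[str, bool]:
    """Map the remaining scope onto (hypothetical) reward-function switches."""
    remaining = remaining_features(cuts_applied)
    return {
        "exploit_bonus": "exploit_bonus" in remaining,
        "cwe_bonus": "cwe_bonus" in remaining,
    }
```

A single integer then records how far the team has cut, which is easy to state at the next checkpoint.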

Checkpoint 3: 3:00 PM Sunday, feature freeze gate

Everyone must demonstrate:

  • HF Space is live and stable; /health returns 200; /docs loads

  • tests/ pass (see .agent/test_contracts.md)

  • Demo artifact path is locked (video or text-trace fallback)

  • README has all submission links (Space, notebook, video, wandb, repo)
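A sketch of a pre-freeze smoke check using only the standard library. It assumes the README uses markdown links whose label mentions each item (e.g. `[wandb run](...)`); that label-matching rule is an assumption, and `endpoint_ok` should be pointed at the Space's own `/health` and `/docs` URLs.

```python
import http.client
import re
import urllib.request

# Items the README must link to, per this checkpoint. Drop "video" if the text-trace fallback is used.
REQUIRED_LINKS = ("space", "notebook", "video", "wandb", "repo")

# Assumes markdown links of the form [label](https://...).
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def missing_links(readme: str, required: tuple[str, ...] = REQUIRED_LINKS) -> list[str]:
    """Return required items with no markdown link whose label mentions them."""
    labels = [label.lower() for label, _url in MD_LINK.findall(readme)]
    return [item for item in required if not any(item in label for label in labels)]


def endpoint_ok(url: str, timeout: float = 10.0) -> bool:
    """True only if a GET on url returns HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError subclass OSError; ValueError covers malformed URLs.
        return False
```

If the demo falls back to a text trace, pass `required=("space", "notebook", "wandb", "repo")` so the missing video link does not block the gate.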

Hard rule:

  • No changes after 3:00 PM except emergency fixes that prevent submission failure.

Final scope cuts (if needed to protect submission):

  1. Video → text trace in README

  2. Training curve → single plot + narrative

  3. Held-out eval → small-N sanity check