## Test contracts (merge blockers)
These tests are **merge gates**. If any fails, do not merge to `main`. See `git_workflow.md`.
Listed owners are initial assignments; if you change an area, you also own its test.
### `tests/test_no_leak.py`
- **Asserts**:
  - `Observation` serialization never includes ground-truth fields (e.g., `is_vulnerable`, `ground_truth`, `label`, `cwe_type`).
  - Response payloads from `/reset` and `/step` do not contain forbidden keys or suspicious strings that imply labels.
- **Owner**: Niti (env integrity)
- **Blocking condition**: Any leakage is a submission-killer and must be fixed immediately.
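One way to implement the forbidden-key assertion is a recursive walk over the JSON-like payload that reports the path of every offending key. This is a minimal sketch, assuming observations and responses serialize to nested dicts and lists; `find_forbidden_keys` is a hypothetical helper name, and only the four keys listed above are checked (the string-level "suspicious strings" scan is not shown).

```python
from typing import Any, Iterable, List

# Ground-truth keys named in the contract; extend this set if the schema grows.
FORBIDDEN_KEYS = {"is_vulnerable", "ground_truth", "label", "cwe_type"}


def find_forbidden_keys(payload: Any,
                        forbidden: Iterable[str] = FORBIDDEN_KEYS,
                        path: str = "$") -> List[str]:
    """Recursively walk dicts/lists and return JSON-style paths of forbidden keys.

    Hypothetical helper: assumes payloads are plain dicts/lists after serialization.
    """
    forbidden = set(forbidden)
    hits: List[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in forbidden:
                hits.append(child)
            # Keep descending: a leak can be nested anywhere below this key.
            hits.extend(find_forbidden_keys(value, forbidden, child))
    elif isinstance(payload, (list, tuple)):
        for index, item in enumerate(payload):
            hits.extend(find_forbidden_keys(item, forbidden, f"{path}[{index}]"))
    return hits
```

In a test this would typically reduce to `assert find_forbidden_keys(payload) == []`, where `payload` is the parsed `/reset` or `/step` response; returning paths rather than a boolean makes a failing gate point straight at the leaking field.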
### `tests/test_reward.py`
- **Asserts**: `compute_reward(...)` returns expected values for **5 handcrafted cases**:
  1. True positive + correct CWE + exploit match
  2. True positive + wrong CWE
  3. False positive
  4. False negative
  5. Malformed action penalty (and no crash)
- **Owner**: Deepak (reward design)
- **Blocking condition**: If the tiered reward is flaky, fall back to a binary reward and log the switch in `decision_log.md`.
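The five cases map naturally onto a table-driven check. The sketch below is hypothetical: `RewardCase` and `check_reward_cases` are illustrative names, the real `compute_reward` signature is not specified here, and the expected values must come from the reward design rather than from this example. A crash inside the reward function is reported as a failure instead of propagating, which covers the "no crash" part of case 5.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class RewardCase:
    name: str               # e.g. "false_positive"
    inputs: Dict[str, Any]  # keyword arguments for the reward function (hypothetical shape)
    expected: float         # value agreed in the reward design


def check_reward_cases(reward_fn: Callable[..., float],
                       cases: List[RewardCase],
                       tol: float = 1e-9) -> List[str]:
    """Return failure messages; an empty list means every case passed."""
    failures: List[str] = []
    for case in cases:
        try:
            got = reward_fn(**case.inputs)
            ok = math.isclose(got, case.expected, rel_tol=0.0, abs_tol=tol)
        except Exception as exc:  # a crash (e.g. on a malformed action) counts as a failure
            failures.append(f"{case.name}: raised {exc!r}")
            continue
        if not ok:
            failures.append(f"{case.name}: expected {case.expected}, got {got}")
    return failures
```

Usage would look like `assert check_reward_cases(compute_reward, CASES) == []`, with `CASES` holding the five handcrafted entries. Collecting all failures before asserting shows every broken tier at once, which helps decide whether a binary fallback is needed.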
### `tests/test_action_parser.py`
- **Asserts**:
  - XML action parsing works for all three action types.
  - The parser is robust to malformed inputs (missing tags, invalid XML, extra text).
  - The parser never raises; it returns a safe `Action` plus error info.
- **Owner**: Divyank (agent I/O contract)
- **Blocking condition**: Any parser crash blocks training and demo; fix before anything else.
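A never-raising parser can be sketched with the standard library: extract the first `<action>...</action>` span (tolerating extra text around it), parse it with `xml.etree.ElementTree`, and fall back to a safe default plus an error message on any problem. Everything here is an assumption for illustration: the `Action` fields, the `type` attribute, the `noop` default, and the three action names `inspect`, `flag`, and `submit` are hypothetical stand-ins for the real contract. For untrusted model output, a hardened parser such as `defusedxml` may be preferable.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical action names; the real environment defines its own three types.
ACTION_TYPES = {"inspect", "flag", "submit"}
ACTION_RE = re.compile(r"<action\b.*?</action>", re.DOTALL)


@dataclass
class Action:
    kind: str = "noop"  # safe default returned whenever parsing fails (assumption)
    args: Dict[str, str] = field(default_factory=dict)


def parse_action(text: object) -> Tuple[Action, Optional[str]]:
    """Parse model output into (Action, error). Never raises."""
    if not isinstance(text, str):
        return Action(), "input is not a string"
    # Ignore surrounding prose by extracting only the first <action>...</action> block.
    match = ACTION_RE.search(text)
    if match is None:
        return Action(), "missing <action> tag"
    try:
        root = ET.fromstring(match.group(0))
    except Exception as exc:  # contract: the parser must never raise
        return Action(), f"invalid XML: {exc}"
    kind = root.get("type", "")
    if kind not in ACTION_TYPES:
        return Action(), f"unknown action type: {kind!r}"
    # Each child element becomes a string argument, e.g. <cwe>CWE-79</cwe>.
    args = {child.tag: (child.text or "").strip() for child in root}
    return Action(kind=kind, args=args), None
```

Returning an `(Action, error)` pair keeps the env loop simple: it always receives an `Action`, and the error string can drive the malformed-action penalty without any exception handling at the call site.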
### `tests/test_env_smoke.py`
- **Asserts**:
  - 100 random episodes do not crash.
  - `reset`/`step` latency stays reasonable, and the budget cap terminates episodes.
  - Malformed actions do not crash the environment and return `done` when appropriate.
- **Owner**: Niti (env reliability)
- **Blocking condition**: If smoke test fails, training is not allowed to run.
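The smoke contract can be expressed as a loop over random episodes that tracks worst-case latency and fails if an episode outlives its budget. This is a sketch under stated assumptions: `ToyEnv` is a hypothetical stand-in exposing a gym-style `reset`/`step` and a `budget` attribute, the action names are illustrative, and the 0.5 s latency ceiling is a placeholder, since the contract only says "reasonable".

```python
import random
import time
from typing import Any, Callable, Dict, Tuple


class ToyEnv:
    """Hypothetical stand-in with a step budget; swap in the real environment."""

    def __init__(self, budget: int = 20) -> None:
        self.budget = budget
        self.steps = 0

    def reset(self, seed: int = 0) -> Dict[str, Any]:
        self.steps = 0
        return {"step": 0}

    def step(self, action: str) -> Tuple[Dict[str, Any], float, bool, Dict[str, Any]]:
        self.steps += 1
        # The episode ends on an explicit submit or when the budget cap is reached.
        done = action == "submit" or self.steps >= self.budget
        info = {"error": "malformed action"} if action.startswith("<") else {}
        return {"step": self.steps}, 0.0, done, info


def run_smoke(env_factory: Callable[[], Any], episodes: int = 100,
              max_latency_s: float = 0.5, seed: int = 0) -> float:
    """Run random episodes and return the worst reset/step latency in seconds."""
    rng = random.Random(seed)
    actions = ["inspect", "flag", "submit", "<not xml"]  # includes a malformed action
    worst = 0.0
    for episode in range(episodes):
        env = env_factory()
        start = time.perf_counter()
        env.reset(seed=episode)
        worst = max(worst, time.perf_counter() - start)
        done, steps = False, 0
        while not done:
            start = time.perf_counter()
            _, _, done, _ = env.step(rng.choice(actions))
            worst = max(worst, time.perf_counter() - start)
            steps += 1
            # The budget cap must end the episode; raise explicitly so this survives `python -O`.
            if steps > env.budget:
                raise AssertionError(f"episode {episode} ran past its budget")
    if worst > max_latency_s:
        raise AssertionError(f"worst latency {worst:.3f}s exceeds {max_latency_s}s")
    return worst
```

Seeding the action RNG makes a failing episode reproducible, and checking the step count inside the loop turns a runaway episode into a fast failure instead of a hung CI job.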
## Required behavior under failure
- If a test reveals a scope-level failure, use a PRD-approved fallback (see `project_context.md`) rather than inventing new features.
- If a failure requires a new decision, log it in `decision_log.md` with a timestamp and author.
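Appending entries programmatically keeps the timestamp and author consistent across contributors. This is a minimal sketch: `log_decision` is a hypothetical helper, and the one-bullet-per-entry Markdown format with a UTC ISO-8601 timestamp is an assumed convention, not one defined by the project.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_decision(path: str, author: str, decision: str, reason: str) -> str:
    """Append one timestamped entry to a Markdown decision log (hypothetical format)."""
    # UTC avoids ambiguity when contributors work across time zones.
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"- **{timestamp}** ({author}): {decision}. Reason: {reason}\n"
    # Append mode never rewrites earlier entries, so the log stays an audit trail.
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

For example, `log_decision("decision_log.md", "Deepak", "Switch to binary reward", "tiered reward flaky")` records the fallback described under `tests/test_reward.py`.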