## Test contracts (merge blockers)

These tests are **merge gates**. If any of them fails, do not merge to `main`. See `git_workflow.md`.

The owners listed below are initial assignments; if you change code in an area, you also own its test.
### `tests/test_no_leak.py`

- **Asserts**:
  - `Observation` serialization never includes ground-truth fields (e.g., `is_vulnerable`, `ground_truth`, `label`, `cwe_type`).
  - Response payloads from `/reset` and `/step` do not contain forbidden keys or suspicious strings that imply labels.
- **Owner**: Niti (env integrity)
- **Blocking condition**: Any leakage is a submission-killer. Must be fixed immediately.
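
The key check above could be sketched roughly as follows. This is a minimal illustration, not the project's actual test: the helper names `find_forbidden_keys` and `assert_no_leak` are hypothetical, and it assumes the `/reset` and `/step` payloads have already been decoded into plain dicts and lists. It covers forbidden keys only, not the "suspicious strings" check.

```python
from typing import Any, Iterator

# Forbidden keys taken from the contract above; extend as the schema grows.
FORBIDDEN_KEYS = {"is_vulnerable", "ground_truth", "label", "cwe_type"}


def find_forbidden_keys(payload: Any, path: str = "$") -> Iterator[str]:
    """Yield the path of every forbidden key found anywhere in a JSON-like payload."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if str(key).lower() in FORBIDDEN_KEYS:
                yield child
            # Recurse so nested objects are checked too, not just the top level.
            yield from find_forbidden_keys(value, child)
    elif isinstance(payload, (list, tuple)):
        for index, item in enumerate(payload):
            yield from find_forbidden_keys(item, f"{path}[{index}]")


def assert_no_leak(payload: Any) -> None:
    """Fail with the offending paths if any ground-truth field is present."""
    leaks = list(find_forbidden_keys(payload))
    assert not leaks, f"ground-truth fields leaked at: {leaks}"
```

Walking the payload recursively matters because a label can hide inside a nested metadata object even when the top-level keys look clean.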
### `tests/test_reward.py`

- **Asserts**: `compute_reward(...)` returns the expected values for **5 handcrafted cases**:
  1. True positive + correct CWE + exploit match
  2. True positive + wrong CWE
  3. False positive
  4. False negative
  5. Malformed action (penalty applied, no crash)
- **Owner**: Deepak (reward design)
- **Blocking condition**: If the tiered reward is flaky, fall back to binary reward and log the switch in `decision_log.md`.
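
One way to lay out the five cases is as a table-driven check. The sketch below is illustrative only: `compute_reward_stub`, its `(action, truth)` signature, the dict field names, and every reward value are assumptions made for the example, since the real `compute_reward(...)` signature and tier values are not given here.

```python
from typing import Optional

# Placeholder tier values for illustration only; the project's real numbers may differ.
R_FULL, R_PARTIAL, R_FALSE_POS, R_FALSE_NEG, R_MALFORMED = 1.0, 0.5, -0.5, -1.0, -0.25


def compute_reward_stub(action: Optional[dict], truth: dict) -> float:
    """Hypothetical stand-in for compute_reward(...); returns a penalty instead of raising."""
    if not isinstance(action, dict) or "flag" not in action:
        return R_MALFORMED  # malformed action: penalise, do not crash
    flagged, vulnerable = bool(action["flag"]), bool(truth["is_vulnerable"])
    if flagged and not vulnerable:
        return R_FALSE_POS
    if not flagged and vulnerable:
        return R_FALSE_NEG
    if not flagged:
        return 0.0  # true negative; not one of the five contract cases
    cwe_ok = action.get("cwe") == truth.get("cwe")
    exploit_ok = action.get("exploit") == truth.get("exploit")
    return R_FULL if cwe_ok and exploit_ok else R_PARTIAL


VULN = {"is_vulnerable": True, "cwe": "CWE-89", "exploit": "sqli"}
SAFE = {"is_vulnerable": False}

# (case name, action, ground truth, expected reward), one row per contract case.
CASES = [
    ("tp_correct_cwe_exploit", {"flag": True, "cwe": "CWE-89", "exploit": "sqli"}, VULN, R_FULL),
    ("tp_wrong_cwe", {"flag": True, "cwe": "CWE-79", "exploit": "sqli"}, VULN, R_PARTIAL),
    ("false_positive", {"flag": True, "cwe": "CWE-89"}, SAFE, R_FALSE_POS),
    ("false_negative", {"flag": False}, VULN, R_FALSE_NEG),
    ("malformed", None, VULN, R_MALFORMED),
]


def failing_cases() -> list:
    """Return the names of cases whose reward differs from the expected value."""
    return [name for name, action, truth, expected in CASES
            if compute_reward_stub(action, truth) != expected]
```

In a pytest suite the same `CASES` table would typically be fed to `pytest.mark.parametrize`, so each case reports as its own test and a flaky tier is easy to spot.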
### `tests/test_action_parser.py`

- **Asserts**:
  - XML action parsing works for all 3 action types.
  - The parser is robust to malformed inputs (missing tags, invalid XML, extra text).
  - The parser never throws; it returns a safe `Action` plus error info.
- **Owner**: Divyank (agent I/O contract)
- **Blocking condition**: Any parser crash blocks training and the demo; fix it before anything else.
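
A parser that honours the "never throws" contract might look something like the sketch below. It is not the project's parser: the `<action type="...">` wire format, the three type names, the `noop` fallback, and the `Action` fields are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Hypothetical action names; the contract only says there are three types.
KNOWN_TYPES = {"inspect", "report", "skip"}
# Pull the first <action>...</action> span out of surrounding free text.
ACTION_RE = re.compile(r"<action\b.*?</action>", re.DOTALL)


@dataclass
class Action:
    type: str = "noop"            # safe default returned on any failure
    body: str = ""
    error: Optional[str] = None   # None means the action parsed cleanly


def parse_action(text: object) -> Action:
    """Parse an XML action out of model output; returns a safe Action instead of raising."""
    try:
        match = ACTION_RE.search(text if isinstance(text, str) else "")
        if match is None:
            return Action(error="no <action> element found")
        root = ET.fromstring(match.group(0))
        kind = root.get("type", "")
        if kind not in KNOWN_TYPES:
            return Action(error=f"unknown action type: {kind!r}")
        return Action(type=kind, body=(root.text or "").strip())
    except ET.ParseError as exc:
        return Action(error=f"invalid XML: {exc}")
    except Exception as exc:  # last-resort guard so the "never throws" contract holds
        return Action(error=f"unexpected parser failure: {exc}")
```

Returning the error on the `Action` itself, rather than raising, lets the environment decide how to penalise a malformed action without wrapping every call in its own `try` block.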
### `tests/test_env_smoke.py`

- **Asserts**:
  - 100 random episodes run without crashing.
  - `reset`/`step` latency stays reasonable, and the budget cap terminates episodes.
  - Malformed actions do not crash the environment and return `done` when appropriate.
- **Owner**: Niti (env reliability)
- **Blocking condition**: If the smoke test fails, training must not run.
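
The shape of such a smoke loop is sketched below against a toy environment. `ToyEnv`, its `reset`/`step` return values, the action names, the budget of 10, and the rule that a malformed action ends the episode are all assumptions for the example; the real test would drive the project's environment instead.

```python
import random
import time


class ToyEnv:
    """Hypothetical stand-in for the real environment, with a per-episode step budget."""

    def __init__(self, budget: int = 10) -> None:
        self.budget = budget
        self.steps = 0

    def reset(self) -> dict:
        self.steps = 0
        return {"steps_left": self.budget}

    def step(self, action: object) -> tuple:
        self.steps += 1
        malformed = not isinstance(action, str)
        # Assumed policy: malformed actions, a final report, or an exhausted budget end the episode.
        done = malformed or action == "report" or self.steps >= self.budget
        return {"steps_left": self.budget - self.steps}, 0.0, done


def run_smoke(episodes: int = 100, budget: int = 10, seed: int = 0) -> dict:
    """Run random episodes and return simple stats; raises if the budget cap is ignored."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    env = ToyEnv(budget)
    choices = ["inspect", "report", "skip", None, 42]  # includes malformed actions
    longest, slowest = 0, 0.0
    for _ in range(episodes):
        env.reset()
        done, steps = False, 0
        while not done:
            start = time.perf_counter()
            _, _, done = env.step(rng.choice(choices))
            slowest = max(slowest, time.perf_counter() - start)
            steps += 1
            assert steps <= budget, "budget cap failed to terminate the episode"
        longest = max(longest, steps)
    return {"episodes": episodes, "longest": longest, "slowest_step_s": slowest}
```

A real latency assertion would compare `slowest_step_s` against whatever threshold the team agrees on; no threshold is given here, so the sketch only records the value.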
## Required behavior under failure

- If a test reveals a scope-level failure, use a PRD-approved fallback (see `project_context.md`) rather than inventing new features.
- If a failure requires a new decision, log it in `decision_log.md` with a timestamp and the author.