Test contracts (merge blockers)
These tests are merge gates. If any fails, do not merge to main. See git_workflow.md.
The owners listed are initial assignments; if you touch the area, you own the test too.
tests/test_no_leak.py
Asserts:
Observation serialization never includes ground-truth fields (e.g., is_vulnerable, ground_truth, label, cwe_type).
Response payloads from /reset and /step do not contain forbidden keys or suspicious strings that imply labels.
Owner: Niti (env integrity)
Blocking condition: Any leakage is a submission-killer. Must be fixed immediately.
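The forbidden-key assertion could be sketched as a recursive walk over a serialized payload. This is a minimal illustration, not the project's test: the helper names find_forbidden_keys and assert_no_leak are hypothetical, the key set contains only the examples listed above, and the "suspicious strings" check is not covered.

```python
from typing import Any, Iterator

# Keys taken from the examples in this section; the real list may be longer.
FORBIDDEN_KEYS = {"is_vulnerable", "ground_truth", "label", "cwe_type"}


def find_forbidden_keys(payload: Any, path: str = "$") -> Iterator[str]:
    """Yield the path of every forbidden key found anywhere in payload."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if str(key).lower() in FORBIDDEN_KEYS:
                yield child
            # Keep descending: a leak can hide under any nesting level.
            yield from find_forbidden_keys(value, child)
    elif isinstance(payload, (list, tuple)):
        for index, item in enumerate(payload):
            yield from find_forbidden_keys(item, f"{path}[{index}]")


def assert_no_leak(payload: Any) -> None:
    """Fail with the offending paths if any forbidden key is present."""
    leaks = list(find_forbidden_keys(payload))
    assert not leaks, f"ground-truth fields leaked at: {leaks}"
```

In a test, the same helper would be applied to the decoded JSON returned by /reset and /step as well as to the serialized Observation, so that one rule covers both surfaces.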
tests/test_reward.py
Asserts:
compute_reward(...) returns expected values for 5 handcrafted cases:
True positive + correct CWE + exploit match
True positive + wrong CWE
False positive
False negative
Malformed action penalty (and no crash)
Owner: Deepak (reward design)
Blocking condition: If the tiered reward is flaky, fall back to the binary reward and log the switch in
decision_log.md.
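The five handcrafted cases lend themselves to a table-driven test. The sketch below shows one possible shape only. The signature of compute_reward is not given in this document, so compute_reward_stub, the Verdict type, and every numeric tier are placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    """Hypothetical parsed agent answer; None stands for a malformed action."""
    flagged: bool
    cwe: Optional[str] = None
    exploit_ok: bool = False


def compute_reward_stub(verdict: Optional[Verdict], truth_vulnerable: bool,
                        truth_cwe: Optional[str]) -> float:
    """Stand-in with placeholder tiers; not the project's compute_reward."""
    if verdict is None:
        return -1.0  # malformed action: penalty, no exception
    if verdict.flagged and not truth_vulnerable:
        return -0.5  # false positive
    if not verdict.flagged and truth_vulnerable:
        return -0.5  # false negative
    if not verdict.flagged:
        return 0.0   # true negative (not one of the five cases)
    reward = 0.5     # true positive
    if verdict.cwe == truth_cwe:
        reward += 0.25           # correct CWE
        if verdict.exploit_ok:
            reward += 0.25       # exploit match
    return reward


# (name, verdict, truth_vulnerable, truth_cwe, expected reward)
CASES = [
    ("tp_correct_cwe_exploit", Verdict(True, "CWE-89", True), True, "CWE-89", 1.0),
    ("tp_wrong_cwe", Verdict(True, "CWE-79"), True, "CWE-89", 0.5),
    ("false_positive", Verdict(True, "CWE-89"), False, None, -0.5),
    ("false_negative", Verdict(False), True, "CWE-89", -0.5),
    ("malformed_action", None, True, "CWE-89", -1.0),
]


def run_cases() -> list:
    """Return the names of cases whose reward differs from the expected value."""
    failures = []
    for name, verdict, vulnerable, cwe, expected in CASES:
        got = compute_reward_stub(verdict, vulnerable, cwe)
        if abs(got - expected) > 1e-9:
            failures.append(name)
    return failures
```

Keeping the cases in one table makes a flaky tier easy to spot: the failing case name says which branch drifted, which helps when deciding whether to switch to the binary reward.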
tests/test_action_parser.py
Asserts:
XML action parsing works for all 3 action types.
Parser is robust to malformed inputs (missing tags, invalid XML, extra text).
Parser never throws; returns a safe Action + error info.
Owner: Divyank (agent I/O contract)
Blocking condition: Any parser crash blocks training and demo; fix before anything else.
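A parser that never throws can be sketched with the standard library's xml.etree.ElementTree. Everything project-specific here is an assumption: the <action type="..."> schema, the Action fields, the "noop" fallback, and the three placeholder type names (this document says there are 3 action types but does not list them).

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Placeholder names; substitute the project's real action types.
ALLOWED_TYPES = frozenset({"type_a", "type_b", "type_c"})

# Pull the first <action>...</action> span out of any surrounding text.
_ACTION_SPAN = re.compile(r"<action\b.*?</action>", re.DOTALL)


@dataclass
class Action:
    kind: str                    # "noop" is the safe fallback
    body: str = ""
    error: Optional[str] = None  # populated whenever parsing failed


def parse_action(text: str) -> Action:
    """Parse the first <action> element in text; never raise."""
    try:
        match = _ACTION_SPAN.search(text)
        if match is None:
            return Action("noop", error="no <action> element found")
        root = ET.fromstring(match.group(0))
        kind = root.get("type", "")
        if kind not in ALLOWED_TYPES:
            return Action("noop", error=f"unknown action type: {kind!r}")
        return Action(kind, body=(root.text or "").strip())
    except ET.ParseError as exc:
        return Action("noop", error=f"invalid XML: {exc}")
    except Exception as exc:  # last-resort guard so the parser never throws
        return Action("noop", error=f"unexpected input: {exc!r}")
```

Returning a safe Action with an error field, rather than raising, lets the environment apply the malformed-action penalty and continue the episode.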
tests/test_env_smoke.py
Asserts:
100 random episodes do not crash.
reset/step latency stays reasonable and budget cap terminates episodes.
Malformed actions do not crash and return done when appropriate.
Owner: Niti (env reliability)
Blocking condition: If smoke test fails, training is not allowed to run.
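The smoke loop could look roughly like the sketch below. StubEnv is a hypothetical stand-in for the real environment, the budget of 10 steps and the 1-second latency limit are placeholders (this document only says latency should stay "reasonable"), and ending the episode on a malformed action is one possible reading of "return done when appropriate".

```python
import random
import time


class StubEnv:
    """Hypothetical stand-in for the real environment; only the loop shape matters."""

    def __init__(self, budget: int = 10) -> None:
        self.budget = budget
        self.steps = 0

    def reset(self) -> dict:
        self.steps = 0
        return {"steps_left": self.budget}

    def step(self, action: str) -> tuple:
        self.steps += 1
        malformed = not action.startswith("<action")
        # The episode ends on the budget cap or, in this stub, on a malformed action.
        done = malformed or self.steps >= self.budget
        return {"steps_left": self.budget - self.steps}, 0.0, done


def run_smoke(episodes: int = 100, seed: int = 0, max_step_seconds: float = 1.0) -> int:
    """Run random episodes; return the total number of steps taken."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    env = StubEnv()
    total_steps = 0
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            action = rng.choice(["<action type='x'/>", "garbage", ""])
            start = time.perf_counter()
            _, _, done = env.step(action)
            elapsed = time.perf_counter() - start
            assert elapsed < max_step_seconds, "step latency over budget"
            total_steps += 1
            assert env.steps <= env.budget, "budget cap did not terminate the episode"
    return total_steps
```

Seeding the random generator keeps the 100 episodes reproducible, so a crash found in CI can be replayed locally with the same seed.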
Required behavior under failure
If a test reveals a scope-level failure, use a PRD-approved fallback (see
project_context.md) rather than inventing new features.
If a failure requires a new decision, log it in
decision_log.md with timestamp + author.