
Test contracts (merge blockers)

These tests are merge gates. If any fails, do not merge to main. See git_workflow.md.

Owners listed below are initial assignments; if you change an area, you also own its test.

tests/test_no_leak.py

  • Asserts:

    • Observation serialization never includes ground-truth fields (e.g., is_vulnerable, ground_truth, label, cwe_type).

    • Response payloads from /reset and /step do not contain forbidden keys or suspicious strings that imply labels.

  • Owner: Niti (env integrity)

  • Blocking condition: Any leakage is a submission-killer and must be fixed immediately.
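A minimal sketch of the key-level half of this check, assuming /reset and /step responses are JSON-like dicts and lists. FORBIDDEN_KEYS mirrors the example fields above (the real list may be longer), and find_leaks and assert_no_leak are hypothetical helper names, not existing project code. Scanning for suspicious label-like strings would need a separate pass.

```python
from typing import Any, Iterator

# Mirrors the example fields above; the project's real list may be longer.
FORBIDDEN_KEYS = {"is_vulnerable", "ground_truth", "label", "cwe_type"}


def find_leaks(payload: Any, path: str = "$") -> Iterator[str]:
    """Yield the JSON-style path of every forbidden key found anywhere in payload."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                yield child
            yield from find_leaks(value, child)  # recurse into nested dicts
    elif isinstance(payload, (list, tuple)):
        for i, item in enumerate(payload):
            yield from find_leaks(item, f"{path}[{i}]")


def assert_no_leak(payload: Any) -> None:
    """Fail with the offending paths if any forbidden key is present."""
    leaks = list(find_leaks(payload))
    assert not leaks, f"ground-truth fields leaked at: {leaks}"


# Example: a /step-style response with a label buried in a nested debug dict.
response = {
    "observation": {"code": "int main(void) { return 0; }", "budget_left": 3},
    "reward": 0.0,
    "done": False,
    "info": {"debug": {"label": 1}},
}
print(list(find_leaks(response)))  # ['$.info.debug.label']
```

Walking the whole structure, rather than checking only top-level keys, catches labels that slip in through nested info or debug dictionaries.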

tests/test_reward.py

  • Asserts: compute_reward(...) returns expected values for 5 handcrafted cases:

    1. True positive + correct CWE + exploit match

    2. True positive + wrong CWE

    3. False positive

    4. False negative

    5. Malformed action penalty (and no crash)

  • Owner: Deepak (reward design)

  • Blocking condition: If the tiered reward is flaky, fall back to a binary reward and log the switch in decision_log.md.
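The five cases lend themselves to a table-driven test. The sketch below uses hypothetical Action and Sample shapes and a stand-in compute_reward whose weights are placeholders invented for illustration; the real signature and reward values live in the project's reward code, not here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """Hypothetical parsed action; verdict=None marks a malformed action."""
    verdict: Optional[bool]
    cwe: Optional[str] = None
    exploit: Optional[str] = None


@dataclass
class Sample:
    """Hypothetical ground-truth record, never exposed to the agent."""
    is_vulnerable: bool
    cwe: Optional[str] = None
    exploit: Optional[str] = None


def compute_reward(action: Action, sample: Sample) -> float:
    """Stand-in tiered reward; every weight here is a placeholder."""
    if action.verdict is None:                    # malformed: penalize, never raise
        return -0.5
    if action.verdict and sample.is_vulnerable:   # true positive, then bonus tiers
        reward = 0.5
        if action.cwe == sample.cwe:
            reward += 0.3
        if action.exploit is not None and action.exploit == sample.exploit:
            reward += 0.2
        return reward
    if action.verdict:                            # flagged a safe sample
        return -0.2
    if sample.is_vulnerable:                      # missed a vulnerable sample
        return -0.4
    return 0.0                                    # true negative (not a contract case)


VULN = Sample(is_vulnerable=True, cwe="CWE-787", exploit="overflow-buf")
SAFE = Sample(is_vulnerable=False)

# (name, action, sample, expected reward) for the five contract cases.
CASES = [
    ("tp_correct_cwe_exploit", Action(True, "CWE-787", "overflow-buf"), VULN, 1.0),
    ("tp_wrong_cwe", Action(True, "CWE-79"), VULN, 0.5),
    ("false_positive", Action(True, "CWE-787"), SAFE, -0.2),
    ("false_negative", Action(False), VULN, -0.4),
    ("malformed", Action(None), VULN, -0.5),
]

for name, action, sample, expected in CASES:
    got = compute_reward(action, sample)
    assert math.isclose(got, expected), f"{name}: expected {expected}, got {got}"
print("all 5 reward cases pass")
```

Because the expected values sit in one table, moving to the binary fallback would mostly mean editing the expected column rather than restructuring the test.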

tests/test_action_parser.py

  • Asserts:

    • XML action parsing works for all 3 action types.

    • Parser is robust to malformed inputs (missing tags, invalid XML, extra text).

    • Parser never throws; it returns a safe Action plus error info.

  • Owner: Divyank (agent I/O contract)

  • Blocking condition: Any parser crash blocks both training and the demo; fix it before anything else.
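One way to meet the never-throw contract is sketched below, assuming each action arrives as an `<action type="...">` element embedded in free text. The tag layout, the three ACTION_TYPES names, and ParsedAction are hypothetical placeholders; only the standard-library re and xml.etree.ElementTree calls are real.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Hypothetical action vocabulary; the real three action types are not named here.
ACTION_TYPES = {"inspect", "flag", "submit"}

# Grab the first <action ...>...</action> element so surrounding chatter is ignored.
ACTION_RE = re.compile(r"<action\b.*?</action>", re.DOTALL)


@dataclass
class ParsedAction:
    kind: str                   # one of ACTION_TYPES, or "noop" on failure
    body: str = ""
    error: Optional[str] = None


def parse_action(text: str) -> ParsedAction:
    """Parse model output into an action; never raises, returns a safe no-op on error."""
    try:
        match = ACTION_RE.search(text or "")
        if match is None:
            return ParsedAction("noop", error="missing <action> tag")
        elem = ET.fromstring(match.group(0))
        kind = elem.get("type", "")
        if kind not in ACTION_TYPES:
            return ParsedAction("noop", error=f"unknown action type: {kind!r}")
        return ParsedAction(kind, body=(elem.text or "").strip())
    except ET.ParseError as exc:
        return ParsedAction("noop", error=f"invalid XML: {exc}")
    except Exception as exc:    # last-resort guard so the env loop never crashes
        return ParsedAction("noop", error=f"parser error: {exc!r}")


print(parse_action('Sure. <action type="flag">CWE-787</action> Done.'))
```

The broad `except Exception` at the end is deliberate: the contract is that the env loop never sees an exception, so any unexpected failure (even a non-string input) degrades to a no-op carrying an error message.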

tests/test_env_smoke.py

  • Asserts:

    • 100 random episodes do not crash.

    • reset/step latency stays reasonable, and the budget cap terminates every episode.

    • Malformed actions do not crash and return done when appropriate.

  • Owner: Niti (env reliability)

  • Blocking condition: If smoke test fails, training is not allowed to run.
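A sketch of the smoke loop, run here against a hypothetical ToyEnv so it is self-contained. MAX_STEPS and MAX_STEP_SECONDS are assumed values (the contract only says latency should stay reasonable), and the (obs, reward, done) return shape of step is also an assumption; swap in the real environment and its budget.

```python
import random
import time
from typing import Tuple

MAX_STEPS = 10          # hypothetical per-episode budget cap
MAX_STEP_SECONDS = 0.5  # hypothetical latency ceiling for one reset/step call

# Mix of well-formed, malformed, and terminating actions (hypothetical formats).
ACTIONS = ['<action type="inspect">1</action>', "garbage <<", '<action type="submit"/>']


class ToyEnv:
    """Stand-in for the real env; only the reset/step shape matters here."""

    def reset(self) -> dict:
        self.steps = 0
        return {"code": "int main(void) { return 0; }", "budget_left": MAX_STEPS}

    def step(self, action: str) -> Tuple[dict, float, bool]:
        self.steps += 1
        done = action == '<action type="submit"/>' or self.steps >= MAX_STEPS
        return {"budget_left": MAX_STEPS - self.steps}, 0.0, done


def run_smoke(env, episodes: int = 100, seed: int = 0) -> int:
    """Run random episodes; raise AssertionError on slow calls or an uncapped episode."""
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(episodes):
        start = time.perf_counter()
        env.reset()
        assert time.perf_counter() - start < MAX_STEP_SECONDS, "reset too slow"
        for _ in range(MAX_STEPS):
            start = time.perf_counter()
            _, _, done = env.step(rng.choice(ACTIONS))
            assert time.perf_counter() - start < MAX_STEP_SECONDS, "step too slow"
            total_steps += 1
            if done:
                break
        else:  # loop finished without done: the budget cap did not fire
            raise AssertionError("budget cap did not terminate the episode")
    return total_steps


print(run_smoke(ToyEnv()), "steps across 100 episodes")
```

The for/else only raises when an episode runs MAX_STEPS steps without done, so an off-by-one in the budget cap fails the test instead of passing silently.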

Required behavior under failure

  • If a test reveals a scope-level failure, use a PRD-approved fallback (see project_context.md) rather than inventing new features.

  • If a failure requires a new decision, log it in decision_log.md with timestamp + author.
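If entries are appended by a script rather than by hand, a helper along these lines keeps the timestamp and author consistent. The one-line entry layout and the log_decision name are hypothetical; adapt them to whatever format decision_log.md already uses.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_decision(log_path: Path, author: str, decision: str) -> str:
    """Append one timestamped, attributed entry and return the line written."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"- {stamp} | {author} | {decision}\n"   # hypothetical entry layout
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry


# Example: record the reward fallback described for tests/test_reward.py.
# log_decision(Path("decision_log.md"), "Deepak", "Tiered reward flaky; switched to binary reward.")
```

Using UTC avoids ambiguity when teammates log decisions from different time zones.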