Spaces:

Nitishkumar-ai
/

commitguard-env

Sleeping

App Files Files Community

commitguard-env / .agent /test_contracts.md

Nitishkumar-ai

Deployment Build (Final): Professional Structure + Blog

95cbc5b about 1 month ago

preview code

raw

history blame contribute delete

2.14 kB

	## Test contracts (merge blockers)

	These tests are merge gates. If any fails, do not merge to `main`. See `git_workflow.md`.

	Owners are initial; if you touch the area, you own the test too.

	### `tests/test_no_leak.py`

	- Asserts:
	- `Observation` serialization never includes ground-truth fields (e.g., `is_vulnerable`, `ground_truth`, `label`, `cwe_type`).
	- Response payloads from `/reset` and `/step` do not contain forbidden keys or suspicious strings that imply labels.
	- Owner: Niti (env integrity)
	- Blocking condition: Any leakage is a submission-killer. Must be fixed immediately.

	### `tests/test_reward.py`

	- Asserts: `compute_reward(...)` returns expected values for 5 handcrafted cases:
	1. True positive + correct CWE + exploit match
	2. True positive + wrong CWE
	3. False positive
	4. False negative
	5. Malformed action penalty (and no crash)
	- Owner: Deepak (reward design)
	- Blocking condition: If tiered reward is flaky, trigger fallback to binary reward (log in `decision_log.md`).

	### `tests/test_action_parser.py`

	- Asserts:
	- XML action parsing works for all 3 action types.
	- Parser is robust to malformed inputs (missing tags, invalid XML, extra text).
	- Parser never throws; returns a safe Action + error info.
	- Owner: Divyank (agent I/O contract)
	- Blocking condition: Any parser crash blocks training and demo; fix before anything else.

	### `tests/test_env_smoke.py`

	- Asserts:
	- 100 random episodes do not crash.
	- `reset`/`step` latency stays reasonable and budget cap terminates episodes.
	- Malformed actions do not crash and return done when appropriate.
	- Owner: Niti (env reliability)
	- Blocking condition: If smoke test fails, training is not allowed to run.

	## Required behavior under failure

	- If a test reveals a scope-level failure, use a PRD-approved fallback (see `project_context.md`) rather than inventing new features.
	- If a failure requires a new decision, log it in `decision_log.md` with timestamp + author.