	# Testing Strategy

	Philosophy: tests must encode the product's conservative framing. The
	bar is higher for area-risk, trust summary, matching, and signal-family
	classification than for plumbing.

	## Test layout

	```
	backend/app/pots_shutdown_tracker/tests/
	  fixtures/                   # HTML/PDF/TXT carrier samples and
	                              # JSON area-risk scenarios
	  test_ai.py
	  test_api.py                 # end-to-end request/response
	  test_area_risk_grading.py   # NEW in P1-T1 / P3-T2
	  test_bulk_lookup.py         # xlsx parser, worker, cleanup
	  test_config.py
	  test_connectors.py
	  test_crawler.py
	  test_embeddings.py
	  test_extractors.py
	  test_locks.py
	  test_matching.py
	  test_parsers.py
	  test_policy.py
	  test_review.py
	  test_runtime_startup.py
	  test_scheduler.py
	  test_signal_links.py        # NEW in P2-T6
	  test_smoke_hosted.py
	  test_storage.py
	```

	Frontend tests live under `frontend/src/**/*.test.tsx` (Vitest +
	Testing Library).

	## Minimum local gate before PR

	```
	# backend
	cd backend
	ruff check app
	ruff format --check app
	POTS_TRACKER_DB_URL=sqlite:///:memory: \
	POTS_TRACKER_AUTO_CREATE_SCHEMA=true \
	POTS_TRACKER_ENABLE_AI=false \
	OPENAI_API_KEY= \
	pytest -q

	# frontend
	cd ../frontend
	npm run typecheck
	npm run lint
	npm run test
	npm run build
	```

	## Category-by-category requirements

	### Area-risk (highest-risk logic)

	Any change under `services/area_risk.py` or `services/coverage.py`
	must extend `test_area_risk_grading.py`. Required invariants the tests
	must encode:

	1. Past-target shutdowns are first-class evidence. They grade red
	permanently, regardless of `is_active`, and there is no separate
	historical-context channel anymore.
	2. `area_at_risk=True` for non-green grades. Green is the only
	false case.
	3. Grade mapping is stable:
	   - `past_effective_date` → red
	   - `scheduled_within_12mo` → orange
	   - `scheduled_beyond_12mo` / `undated_shutdown` → yellow
	   - `mac_freeze_only` → blue
	   - `no_evidence_found` → green
	   - `insufficient_confidence` stays green when the evidence is too
	     weak to classify confidently
	4. Structured city evidence beats text-city evidence when both are present.
	5. `parsed_state_fallback` is penalized in `status_confidence`.
	6. Geography conflict never changes grade but always emits a
	caveat.
	7. Airport sub-geographies like `Miami Airport` can promote to a
	direct match for the parent city.
	8. Nearby-municipality evidence uses the configured threshold (3 by
	default) and can nudge a direct green/blue result up to yellow, but
	never above yellow.
	9. Nearby evidence never downgrades a direct grade.
	10. MAC-Freeze notices grade blue on their own and remain
	follow-on context when shutdown evidence is present.
	11. State-level MAC-Freeze filings grade city searches as blue with
	`status_confidence=low` when no shutdown evidence conflicts; the
	caveat must name the state.
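A few of these invariants (2, 3, 8 and 9) can be pinned with plain assertions. The sketch below is hypothetical: `STATUS_TO_GRADE`, `area_at_risk` and `apply_nearby_nudge` are illustrative names, not the real `services/area_risk.py` API, and it assumes the nearby-municipality threshold is inclusive (`>=`).

```python
# Hypothetical model of invariants 2, 3, 8 and 9. The names here are
# illustrative; the real logic lives in services/area_risk.py.

STATUS_TO_GRADE = {
    "past_effective_date": "red",
    "scheduled_within_12mo": "orange",
    "scheduled_beyond_12mo": "yellow",
    "undated_shutdown": "yellow",
    "mac_freeze_only": "blue",
    "no_evidence_found": "green",
    "insufficient_confidence": "green",
}


def area_at_risk(grade: str) -> bool:
    # Invariant 2: green is the only grade that is not "at risk".
    return grade != "green"


def apply_nearby_nudge(direct_grade: str, nearby_count: int, threshold: int = 3) -> str:
    # Invariants 8 and 9 (threshold assumed inclusive): enough nearby evidence
    # lifts a direct green/blue to yellow and never higher; every other direct
    # grade is returned unchanged, so nearby evidence never downgrades.
    if nearby_count >= threshold and direct_grade in ("green", "blue"):
        return "yellow"
    return direct_grade


def test_grade_mapping_is_stable() -> None:
    assert STATUS_TO_GRADE["past_effective_date"] == "red"
    assert STATUS_TO_GRADE["scheduled_within_12mo"] == "orange"
    assert STATUS_TO_GRADE["undated_shutdown"] == "yellow"
    assert STATUS_TO_GRADE["mac_freeze_only"] == "blue"
    assert STATUS_TO_GRADE["insufficient_confidence"] == "green"


def test_only_green_is_not_at_risk() -> None:
    assert not area_at_risk("green")
    assert all(area_at_risk(g) for g in ("blue", "yellow", "orange", "red"))


def test_nearby_nudge_caps_at_yellow_and_never_downgrades() -> None:
    assert apply_nearby_nudge("green", nearby_count=3) == "yellow"
    assert apply_nearby_nudge("blue", nearby_count=5) == "yellow"
    assert apply_nearby_nudge("green", nearby_count=2) == "green"
    assert apply_nearby_nudge("orange", nearby_count=10) == "orange"
    assert apply_nearby_nudge("red", nearby_count=0) == "red"
```

In the real suite the grade would come from the grading service for a seeded scenario rather than from a local dict; the assertions are the part worth keeping.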

	### MAC-Freeze vs shutdown

	Any change under `parsers/source_specific.py` that touches
	`signal_family` assignment must add or update a test under
	`test_parsers.py` covering:

	- A pure shutdown notice stays `shutdown` even when prose contains
	weak MAC-Freeze keywords (`withdraw`, `grandfather`).
	- A notice with explicit restriction tokens and availability /
	grandfather language becomes `att_mac_freeze`.
	- A notice with one weak signal abstains (stays `shutdown`) and logs
	`classifier=att_mac_freeze_guard decision=abstain`.
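The three cases map onto a small table of tests. The classifier below is a toy stand-in, not the real guard in `parsers/source_specific.py`: the token lists and the "both signals required" rule are assumptions made for illustration. It shows how a test can capture the abstain log line with a plain `logging.Handler`; under pytest, the built-in `caplog` fixture does the same job.

```python
import logging
from typing import List

logger = logging.getLogger("signal_family_sketch")

# Hypothetical vocabularies; the real guard has its own token lists.
RESTRICTION_TOKENS = ("mac freeze", "no moves, adds, or changes")
AVAILABILITY_TOKENS = ("grandfather", "no longer available", "withdraw")


def classify_signal_family(text: str) -> str:
    lowered = text.lower()
    has_restriction = any(tok in lowered for tok in RESTRICTION_TOKENS)
    has_availability = any(tok in lowered for tok in AVAILABILITY_TOKENS)
    if has_restriction and has_availability:
        return "att_mac_freeze"
    if has_restriction or has_availability:
        # Only one signal present: abstain and keep the conservative default.
        logger.info("classifier=att_mac_freeze_guard decision=abstain")
    return "shutdown"


class ListHandler(logging.Handler):
    """Collects log messages so a test can assert on them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def test_weak_keywords_alone_stay_shutdown() -> None:
    text = "Copper service will be withdrawn; existing lines are grandfathered."
    assert classify_signal_family(text) == "shutdown"


def test_restriction_plus_grandfather_is_mac_freeze() -> None:
    text = "MAC Freeze: no moves, adds, or changes; existing service is grandfathered."
    assert classify_signal_family(text) == "att_mac_freeze"


def test_single_weak_signal_abstains_and_logs() -> None:
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        assert classify_signal_family("Orders are subject to a MAC freeze.") == "shutdown"
    finally:
        logger.removeHandler(handler)
    assert "classifier=att_mac_freeze_guard decision=abstain" in handler.messages
```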

	### Trust gate

	Any change under `services/trust.py` or the `require_queryable_corpus`
	dependency must cover:

	- Empty corpus → `is_queryable=false` → `/search`, `/area-risk`,
	`/match/address`, `/coverage` return 503.
	- Degraded due to missing coverage metadata → 503 on gated routes.
	- Healthy corpus → 200 on gated routes.
	- `/trust-summary`, `/notices/`, `/dashboard/`, `/healthz`, `/readyz`
	stay available regardless of corpus state.
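One way to keep the gate tests exhaustive is to generate the route × corpus-state matrix instead of hand-writing cases. This is a sketch: the corpus-state labels are invented for it, and it assumes that "stay available" means HTTP 200 for the ungated routes.

```python
from typing import List, Tuple

# Routes taken from the list above; the corpus-state labels are hypothetical.
GATED_ROUTES = ("/search", "/area-risk", "/match/address", "/coverage")
UNGATED_ROUTES = ("/trust-summary", "/notices/", "/dashboard/", "/healthz", "/readyz")
CORPUS_STATES = ("empty", "degraded_missing_coverage", "healthy")


def expected_status(route: str, corpus_state: str) -> int:
    """Status a trust-gate test should expect (200 assumed for 'available')."""
    if route in UNGATED_ROUTES:
        return 200
    if route in GATED_ROUTES:
        return 200 if corpus_state == "healthy" else 503
    raise ValueError(f"route {route!r} is not in the trust-gate matrix")


def build_cases() -> List[Tuple[str, str, int]]:
    # Every route is checked against every corpus state.
    return [
        (route, state, expected_status(route, state))
        for route in GATED_ROUTES + UNGATED_ROUTES
        for state in CORPUS_STATES
    ]
```

Under pytest the cases could feed `@pytest.mark.parametrize("route,state,status", build_cases())`, with each case seeding the corpus into the given state, calling the route through the app's test client, and comparing the response status code.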

	### Bulk lookup

	Any change under `services/bulk_lookup.py`, the `/bulk-lookup/*`
	routes, or the frontend Bulk Lookup page must cover:

	- Parser accepts case-insensitive city/state headers and the supported
	aliases (`Location City`, `Province`, `town`, `municipality`).
	- Parser rejects non-xlsx uploads, missing required columns, and files
	over `bulk_lookup_max_rows`.
	- State normalization accepts both two-letter abbreviations and full
	state names, and flags unrecognized states without failing valid rows.
	- Partial-failure tolerance: valid rows still process when sibling rows
	are missing city/state or contain invalid states.
	- Output xlsx has exactly the original columns plus `color`,
	`grade_letter`, `as_of`, and `notes`, and includes a `Summary` sheet
	with color counts, flagged count, top carriers, and metadata.
	- Background jobs transition from queued to running to either completed
	or failed, using a fresh SQLAlchemy session inside the worker thread.
	- Expired jobs are swept by both scheduler wiring and the
	`cleanup-bulk-lookup-jobs` CLI path, nulling blob references after
	deletion.
	- All four `/bulk-lookup/*` endpoints remain protected by the same
	`require_queryable_corpus` trust gate used by Search and Area Risk.
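The parser and partial-failure requirements can be exercised without a real workbook by feeding header and row lists directly. In the sketch below, `HEADER_ALIASES`, `normalize_state` and `process_rows` are hypothetical names, not the `services/bulk_lookup.py` API, and the state table is a deliberately truncated excerpt.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical alias table built from the headers listed above; lookups are
# case-insensitive because every header is lower-cased first.
HEADER_ALIASES: Dict[str, str] = {
    "city": "city",
    "location city": "city",
    "town": "city",
    "municipality": "city",
    "state": "state",
    "province": "state",
}

# Truncated excerpt for the sketch; a real table covers every state.
STATE_NAMES: Dict[str, str] = {"florida": "FL", "illinois": "IL", "texas": "TX"}
STATE_CODES = set(STATE_NAMES.values())


def normalize_state(raw: str) -> Optional[str]:
    """Accept a two-letter code or a full name; return None if unrecognized."""
    value = raw.strip()
    if value.upper() in STATE_CODES:
        return value.upper()
    return STATE_NAMES.get(value.lower())


def process_rows(
    headers: Sequence[object], rows: Sequence[Sequence[object]]
) -> Tuple[List[Tuple[int, str, str]], List[int]]:
    """Return (valid rows, flagged spreadsheet row numbers)."""
    columns = {HEADER_ALIASES.get(str(h or "").strip().lower()): i for i, h in enumerate(headers)}
    if "city" not in columns or "state" not in columns:
        raise ValueError("missing required city/state column")
    valid: List[Tuple[int, str, str]] = []
    flagged: List[int] = []
    # Spreadsheet row 1 holds the headers, so data starts at row 2.
    for row_number, row in enumerate(rows, start=2):
        city = str(row[columns["city"]] or "").strip()
        state = normalize_state(str(row[columns["state"]] or ""))
        if not city or state is None:
            flagged.append(row_number)  # flag the row, keep processing siblings
            continue
        valid.append((row_number, city, state))
    return valid, flagged
```

Example: with headers `["Location City", "Province", "Site ID"]`, a row with a blank city and a row with an unknown state are flagged while the valid rows around them still process.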

	### Admin auth

	Changes touching any `/admin/*` endpoint must cover:

	- Disabled (no key configured) → 503.
	- Missing header → 401.
	- Wrong key → 401 (constant-time comparison).
	- Correct key → 2xx success path, and one audit row written.
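The four cases reduce to a small decision function. The helper below is hypothetical (the real dependency and header name are not shown here); it uses `hmac.compare_digest` from the standard library for the constant-time comparison, and 200 stands in for "proceed to the 2xx handler". The audit-row check would sit in the endpoint test itself.

```python
import hmac
from typing import Optional


def admin_auth_status(configured_key: Optional[str], provided: Optional[str]) -> int:
    """Hypothetical pre-handler decision for an /admin/* request."""
    if not configured_key:
        return 503  # admin disabled: no key configured
    if provided is None:
        return 401  # header missing
    # Compare bytes in constant time so timing does not leak key prefixes.
    if not hmac.compare_digest(provided.encode("utf-8"), configured_key.encode("utf-8")):
        return 401  # wrong key
    return 200
```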

	### Parsing

	Use fixture corpora — never assert on strings pulled from the live web.
	Fixtures live under `tests/fixtures/`. Every fixture should carry a
	short comment at the top of the file explaining which carrier / notice
	family it represents and which invariant it's there to protect.
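The header-comment rule can be enforced by a meta-test rather than by review alone. This sketch assumes text fixtures use `#` as the comment marker (the convention is not specified above) and only checks `.txt` files, since JSON scenario files cannot hold comments; `fixtures_missing_header` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import List


def fixtures_missing_header(fixture_dir: Path, comment_prefix: str = "#") -> List[str]:
    """Return names of .txt fixtures whose first line is not a comment."""
    missing = []
    for path in sorted(fixture_dir.glob("*.txt")):
        with path.open(encoding="utf-8") as fh:
            first_line = fh.readline()
        if not first_line.lstrip().startswith(comment_prefix):
            missing.append(path.name)
    return missing


def test_fixture_header_check() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "att_good.txt").write_text(
            "# AT&T copper retirement; protects the shutdown family\nbody\n",
            encoding="utf-8",
        )
        (root / "att_bad.txt").write_text("body without header\n", encoding="utf-8")
        assert fixtures_missing_header(root) == ["att_bad.txt"]
```

In the real suite the same helper would point at `Path(__file__).parent / "fixtures"` and assert that the result is an empty list.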

	## Fixtures

	### Adding an area-risk fixture

	1. Add or update a scenario entry in
	   `tests/fixtures/area_risk/regressions.json`:
	   ```json
	   [
	     {
	       "scenario_id": "historical_past_target_returns_red",
	       "request": { "city": "Chicago", "state": "IL" },
	       "seed_notices": [
	         {
	           "city": "Chicago",
	           "state": "IL",
	           "title": "Archived Chicago copper retirement notice",
	           "summary": "Archived Chicago copper retirement notice.",
	           "impact_text": "Chicago legacy analog service was retired in a prior filing.",
	           "source_excerpt": "Chicago historical analog shutdown evidence.",
	           "notice_id": "CHICAGO-ARCHIVED",
	           "issue_date": "2024-01-15",
	           "target_date": "2024-01-15",
	           "is_active": false
	         }
	       ],
	       "expected": {
	         "grade_bucket": "red",
	         "status": "past_effective_date",
	         "area_at_risk": true,
	         "earliest_past_target_date": "2024-01-15",
	         "supporting_notice_count": 1
	       }
	     }
	   ]
	   ```
	2. The parametrized loader in `test_area_risk_grading.py` reads the
	scenario list, seeds the in-memory test DB from `seed_notices`, and
	asserts the `expected` block. Use the same keys that
	`seed_notice(...)` accepts, plus optional `parsed_states`,
	`location_state`, and `signal_family` for geography conflicts and
	MAC-Freeze exclusions. The regression projection also exposes
	`supporting_notice_count`, `supporting_match_sources`,
	`supporting_geography_conflicts`, and `nearby_municipality_count`
	for compact assertions.
	3. Update the
	[area-risk-city-qa-checklist.md](area-risk-city-qa-checklist.md) if
	the scenario should also be in the pre-release regression sweep.
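The loader in step 2 boils down to "parse the list, then compare only the keys each scenario pins". The helpers below, `parse_scenarios` and `mismatches`, are hypothetical names sketching that shape; they also reject duplicate `scenario_id` values so pytest IDs stay unique.

```python
import json
from typing import Any, Dict, List, Mapping, Tuple


def parse_scenarios(raw: str) -> List[Dict[str, Any]]:
    """Parse regressions.json content and reject duplicate scenario IDs."""
    scenarios = json.loads(raw)
    ids = [s["scenario_id"] for s in scenarios]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate scenario_id values: {duplicates}")
    return scenarios


def mismatches(
    projection: Mapping[str, Any], expected: Mapping[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Compare only the keys a scenario pins; extra projection keys are ignored."""
    return {
        key: (projection.get(key), want)
        for key, want in expected.items()
        if projection.get(key) != want
    }
```

A test module could then use `@pytest.mark.parametrize("scenario", scenarios, ids=[s["scenario_id"] for s in scenarios])`, seed each entry through `seed_notice(...)`, and assert `mismatches(projection, scenario["expected"]) == {}` so a failure reports exactly which keys drifted.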

	### Adding a parser fixture

	1. Place raw notice text under
	`tests/fixtures/<carrier>_<short_slug>.txt` (for HTML/PDF, serialize
	extracted text — the parser operates on text).
	2. Prepend a one-line comment describing the fixture.
	3. Add a `test_parsers.py` entry asserting the parsed
	`signal_family`, `notice_type`, `rule_family`,
	`restriction_types`, `states`, and any distinctive fields.
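A parser test then needs two pieces: loading the fixture without its header comment, and checking the pinned fields. This sketch assumes the comment uses `#` and is stripped before the text reaches the parser; `load_fixture_text` and `assert_parsed_fields` are hypothetical helpers, and the field values in the test are placeholders.

```python
import tempfile
from pathlib import Path
from typing import Any, Mapping

# Fields every parser test should pin, per the checklist above.
REQUIRED_FIELDS = ("signal_family", "notice_type", "rule_family", "restriction_types", "states")


def load_fixture_text(path: Path, comment_prefix: str = "#") -> str:
    """Read a fixture and drop its one-line header comment (assumed '#')."""
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    if lines and lines[0].lstrip().startswith(comment_prefix):
        lines = lines[1:]
    return "".join(lines)


def assert_parsed_fields(parsed: Mapping[str, Any], expected: Mapping[str, Any]) -> None:
    missing = [f for f in REQUIRED_FIELDS if f not in expected]
    assert not missing, f"expected block must pin: {missing}"
    for field, value in expected.items():
        assert parsed.get(field) == value, f"{field}: {parsed.get(field)!r} != {value!r}"


def test_fixture_loader_and_field_check() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "att_example.txt"
        path.write_text("# AT&T example; protects state parsing\nNotice body\n", encoding="utf-8")
        text = load_fixture_text(path)
    assert text == "Notice body\n"
    # Placeholder values; a real test would run the parser on `text`.
    parsed = {
        "signal_family": "shutdown",
        "notice_type": "placeholder",
        "rule_family": "placeholder",
        "restriction_types": [],
        "states": ["IL"],
    }
    assert_parsed_fields(parsed, dict(parsed))
```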

	## CI

	The CI workflow (`.github/workflows/ci.yml`) runs, per P3-T1:

	- Backend ruff + pytest against a Postgres service container.
	- Frontend typecheck + lint + vitest + build.
	- Area-risk regression gate (targeted pytest + the
	`check-area-risk-conservative-framing.sh` script).

	## Hosted smoke

	`backend/app/pots_shutdown_tracker/scripts/smoke_hosted.py` hits a real
	deployment's trust summary, search, and dashboard routes. It
	intentionally fails when `is_queryable=false`; that failure is the
	correct signal for an empty Space.

	Run it via:

	```
	python3 -m pots_shutdown_tracker.scripts.smoke_hosted \
	--base-url "$BASE_URL" \
	--min-structured-results 1
	```
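The core of the trust check can be sketched with the standard library alone. This is not the actual `smoke_hosted.py` code: `check_trust_summary`, `fetch_json` and `SmokeFailure` are hypothetical, and the sketch assumes `/trust-summary` returns JSON with a top-level `is_queryable` field.

```python
import json
import urllib.request
from typing import Any, Mapping


class SmokeFailure(RuntimeError):
    """Raised when a hosted deployment fails a smoke check."""


def check_trust_summary(payload: Mapping[str, Any]) -> None:
    # An empty or degraded corpus must fail the smoke run loudly.
    if payload.get("is_queryable") is not True:
        raise SmokeFailure(f"corpus not queryable: {payload!r}")


def fetch_json(base_url: str, path: str, timeout: float = 10.0) -> Any:
    url = base_url.rstrip("/") + path
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Against a live Space this would run as `check_trust_summary(fetch_json(base_url, "/trust-summary"))`; keeping the check separate from the fetch lets it be unit-tested without network access.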

	## Coverage expectations

	No hard coverage threshold — instead, require that any new business
	logic comes with a named test that would fail if the logic regressed.
	Reviewers should push back on untested behavior change.

	## What we do not test

	- External network calls. Connectors are tested with recorded fixtures.
	- OpenAI responses. AI service tests use a fake client and exercise the
	deterministic grounded fallback.
	- HF dataset uploads. Mocked via `HfApi` fakes.
	- Neon-specific features in unit tests — integration coverage is at the
	hosted smoke level.
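The fake-client pattern for AI tests looks roughly like the sketch below. Everything here is hypothetical: `CompletionClient`, `FailingFakeClient` and `grounded_answer` stand in for the real AI service and its fallback, and the fallback wording is invented. The point is that the fake records calls, never touches the network, and forces the deterministic path.

```python
from typing import List, Optional, Protocol


class CompletionClient(Protocol):
    def complete(self, prompt: str) -> str: ...


class FailingFakeClient:
    """Stands in for the OpenAI client; records prompts and always fails."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        raise RuntimeError("fake client: AI disabled in tests")


def grounded_answer(question: str, excerpts: List[str], client: Optional[CompletionClient]) -> str:
    """Try the model first; otherwise fall back to quoting the excerpts."""
    if client is not None:
        try:
            return client.complete(question + "\n\n" + "\n".join(excerpts))
        except RuntimeError:
            pass  # deliberate: fall through to the deterministic answer
    if not excerpts:
        return "No supporting notices found."
    return "Based on notices: " + " | ".join(excerpts)
```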