Testing Strategy
Philosophy: tests must encode the product's conservative framing. The bar is higher for area-risk, trust summary, matching, and signal-family classification than for plumbing.
Test layout
backend/app/pots_shutdown_tracker/tests/
  fixtures/                    # HTML/PDF/TXT carrier samples and
                               # JSON area-risk scenarios
  test_ai.py
  test_api.py                  # end-to-end request/response
  test_area_risk_grading.py    # NEW in P1-T1 / P3-T2
  test_bulk_lookup.py          # xlsx parser, worker, cleanup
  test_config.py
  test_connectors.py
  test_crawler.py
  test_embeddings.py
  test_extractors.py
  test_locks.py
  test_matching.py
  test_parsers.py
  test_policy.py
  test_review.py
  test_runtime_startup.py
  test_scheduler.py
  test_signal_links.py         # NEW in P2-T6
  test_smoke_hosted.py
  test_storage.py
Frontend tests live under frontend/src/**/*.test.tsx (Vitest +
Testing Library).
Minimum local gate before PR
# backend
cd backend
ruff check app
ruff format --check app
POTS_TRACKER_DB_URL=sqlite:///:memory: \
POTS_TRACKER_AUTO_CREATE_SCHEMA=true \
POTS_TRACKER_ENABLE_AI=false \
OPENAI_API_KEY= \
pytest -q
# frontend
cd frontend
npm run typecheck
npm run lint
npm run test
npm run build
Category-by-category requirements
Area-risk (highest-risk logic)
Any change under services/area_risk.py or services/coverage.py
must extend test_area_risk_grading.py. Required invariants the tests
must encode:
- Past-target shutdowns are first-class evidence. They grade red permanently, regardless of is_active, and there is no separate historical-context channel anymore.
- area_at_risk=True for non-green grades. Green is the only false case.
- Grade mapping is stable:
  - past_effective_date → red
  - scheduled_within_12mo → orange
  - scheduled_beyond_12mo / undated_shutdown → yellow
  - mac_freeze_only → blue
  - no_evidence_found → green
  - insufficient_confidence stays green when the evidence is too weak to classify confidently
- Structured city evidence beats text-city when both are present.
- parsed_state_fallback is penalized in status_confidence.
- Geography conflict never changes grade but always emits a caveat.
- Airport sub-geographies like Miami Airport can promote to a direct match for the parent city.
- Nearby-municipality evidence uses the configured threshold (3 by default) and can nudge a direct green/blue result up to yellow, but never above yellow.
- Nearby evidence never downgrades a direct grade.
- MAC-Freeze notices grade blue on their own and remain follow-on context when shutdown evidence is present.
- State-level MAC-Freeze filings grade city searches as blue with status_confidence=low when no shutdown evidence conflicts; the caveat must name the state.
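The grade-mapping and nearby-evidence invariants above lend themselves to small table-driven checks. The sketch below is illustrative only: STATUS_TO_GRADE, SEVERITY, area_at_risk, and apply_nearby_nudge are hypothetical stand-ins, not the functions in services/area_risk.py. The status names, colors, and default threshold of 3 come from the list above; the relative order of orange and red above yellow is an assumption of the sketch.

```python
# Hypothetical stand-ins for the real grading code; all names are assumptions.
STATUS_TO_GRADE = {
    "past_effective_date": "red",
    "scheduled_within_12mo": "orange",
    "scheduled_beyond_12mo": "yellow",
    "undated_shutdown": "yellow",
    "mac_freeze_only": "blue",
    "no_evidence_found": "green",
    "insufficient_confidence": "green",
}

# Severity order used only by this sketch to compare grades (assumed).
SEVERITY = ["green", "blue", "yellow", "orange", "red"]


def area_at_risk(grade: str) -> bool:
    """Green is the only grade that reports area_at_risk=False."""
    return grade != "green"


def apply_nearby_nudge(direct_grade: str, nearby_count: int, threshold: int = 3) -> str:
    """Raise a direct green/blue grade to yellow when enough nearby evidence exists."""
    below_yellow = SEVERITY.index(direct_grade) < SEVERITY.index("yellow")
    if nearby_count >= threshold and below_yellow:
        return "yellow"
    return direct_grade  # never downgraded, never pushed above yellow


def check_invariants() -> None:
    """Exhaustively check the nudge rules over every grade and a range of counts."""
    for status, grade in STATUS_TO_GRADE.items():
        assert grade in SEVERITY, status
    for grade in SEVERITY:
        for count in range(0, 6):
            nudged = apply_nearby_nudge(grade, count)
            # Nearby evidence never downgrades a direct grade.
            assert SEVERITY.index(nudged) >= SEVERITY.index(grade)
            # Any change lands exactly on yellow, never above it.
            if nudged != grade:
                assert nudged == "yellow"


check_invariants()
```

In the real suite the same checks would run against the grading service and the seeded test DB rather than these stand-ins; the point is that each invariant becomes one named assertion that fails if the rule regresses.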
MAC-Freeze vs shutdown
Any change under parsers/source_specific.py that touches
signal_family assignment must add or update a test under
test_parsers.py covering:
- A pure shutdown notice stays shutdown even when prose contains weak MAC-Freeze keywords (withdraw, grandfather).
- A notice with explicit restriction tokens and availability / grandfather language becomes att_mac_freeze.
- A notice with one weak signal abstains (stays shutdown) and logs classifier=att_mac_freeze_guard decision=abstain.
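To make the three cases concrete, here is a minimal sketch of a guard with that shape. It is not the classifier in parsers/source_specific.py: the function name classify_signal_family, the logger name, and both token lists are hypothetical placeholders, and the real restriction tokens are not given in this document. Only the two family names and the log line are taken from the list above.

```python
import logging

logger = logging.getLogger("signal_family_sketch")  # hypothetical logger name

# Placeholder token lists; the real ones are not documented here (assumption).
RESTRICTION_TOKENS = ("no new orders", "moves, adds, or changes")
WEAK_KEYWORDS = ("withdraw", "grandfather", "availability")


def classify_signal_family(text: str) -> str:
    """Return 'att_mac_freeze' only when both kinds of evidence are present."""
    lowered = text.lower()
    has_restriction = any(token in lowered for token in RESTRICTION_TOKENS)
    weak_hits = [word for word in WEAK_KEYWORDS if word in lowered]
    if has_restriction and weak_hits:
        return "att_mac_freeze"
    if has_restriction or weak_hits:
        # Some MAC-Freeze evidence, but not enough: abstain and keep the default.
        logger.info("classifier=att_mac_freeze_guard decision=abstain")
    return "shutdown"
```

A test for the abstain case would assert both the returned family and the captured log line (for example with pytest's caplog fixture), so a silent change to either one fails.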
Trust gate
Any change under services/trust.py or the require_queryable_corpus
dependency must cover:
- Empty corpus → is_queryable=false → /search, /area-risk, /match/address, /coverage return 503.
- Degraded due to missing coverage metadata → 503 on gated routes.
- Healthy corpus → 200 on gated routes.
- /trust-summary, /notices/*, /dashboard/*, /healthz, /readyz stay available regardless.
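One way to keep this matrix in a single place is a small expectation helper that API tests can call for each route. The sketch below assumes nothing about services/trust.py: expected_status and the two tuples are hypothetical names, and the helper treats "available" as a 200 on an existing resource.

```python
from fnmatch import fnmatchcase

# Route lists copied from the requirements above; the helper itself is hypothetical.
GATED_ROUTES = ("/search", "/area-risk", "/match/address", "/coverage")
UNGATED_PATTERNS = ("/trust-summary", "/notices/*", "/dashboard/*", "/healthz", "/readyz")


def expected_status(path: str, is_queryable: bool) -> int:
    """Status a test should expect for `path` given the corpus trust state."""
    if any(fnmatchcase(path, pattern) for pattern in UNGATED_PATTERNS):
        return 200  # stays available regardless of corpus health
    if path in GATED_ROUTES:
        return 200 if is_queryable else 503
    # Fail loudly so a new route cannot slip past the matrix unclassified.
    raise ValueError(f"route not covered by the trust-gate matrix: {path}")
```

Raising on unknown paths is a deliberate choice in this sketch: adding a route without deciding whether it is gated then breaks the test instead of passing by default.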
Bulk lookup
Any change under services/bulk_lookup.py, the /bulk-lookup/*
routes, or the frontend Bulk Lookup page must cover:
- Parser accepts case-insensitive city/state headers and the supported aliases (Location City, Province, town, municipality).
- Parser rejects non-xlsx uploads, missing required columns, and files over bulk_lookup_max_rows.
- State normalization accepts both two-letter abbreviations and full state names, and flags unrecognized states without failing valid rows.
- Partial-failure tolerance: valid rows still process when sibling rows are missing city/state or contain invalid states.
- Output xlsx has exactly the original columns plus color, grade_letter, as_of, and notes, and includes a Summary sheet with color counts, flagged count, top carriers, and metadata.
- Background jobs transition from queued to running to either completed or failed, using a fresh SQLAlchemy session inside the worker thread.
- Expired jobs are swept by both scheduler wiring and the cleanup-bulk-lookup-jobs CLI path, nulling blob references after deletion.
- All four /bulk-lookup/* endpoints remain protected by the same require_queryable_corpus trust gate used by Search and Area Risk.
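The header rules in the first two bullets can be sketched as a pure function that is easy to test without building an xlsx file. This is an assumption-laden illustration, not the parser in services/bulk_lookup.py: resolve_columns is a hypothetical name, and which field each alias maps to (Province to state, the others to city) is inferred from the alias names rather than stated in this document.

```python
# Alias-to-field mapping is an assumption inferred from the alias names.
CITY_HEADERS = {"city", "location city", "town", "municipality"}
STATE_HEADERS = {"state", "province"}


def resolve_columns(headers: list) -> dict:
    """Return the column index of 'city' and 'state', matching case-insensitively."""
    found = {}
    for index, raw in enumerate(headers):
        name = str(raw or "").strip().lower()  # tolerate empty header cells
        if name in CITY_HEADERS:
            found.setdefault("city", index)  # first matching column wins
        elif name in STATE_HEADERS:
            found.setdefault("state", index)
    missing = [field for field in ("city", "state") if field not in found]
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")
    return found
```

For example, resolve_columns(["Site", "LOCATION CITY", "Province"]) maps city to column 1 and state to column 2, while a sheet with only a Town column raises because state is missing.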
Admin auth
Changes touching any /admin/* endpoint must cover:
- Disabled (no key configured) → 503.
- Missing header → 401.
- Wrong key → 401 (constant-time comparison).
- Correct key → 2xx success path, and one audit row written.
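The status matrix above, including the constant-time comparison, can be sketched with the standard library alone. admin_auth_status is a hypothetical helper, not the project's dependency, and it leaves out the audit row, which needs a database assertion of its own.

```python
import hmac
from typing import Optional


def admin_auth_status(configured_key: Optional[str], provided_key: Optional[str]) -> int:
    """Map the admin-key state to the HTTP status the tests should expect."""
    if not configured_key:
        return 503  # admin endpoints disabled: no key configured
    if provided_key is None:
        return 401  # header missing
    # hmac.compare_digest runs in time independent of where the inputs differ.
    matches = hmac.compare_digest(
        provided_key.encode("utf-8"), configured_key.encode("utf-8")
    )
    return 200 if matches else 401
```

Encoding both values to bytes before comparing avoids the TypeError that hmac.compare_digest raises for non-ASCII str arguments.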
Parsing
Use fixture corpora; never assert on strings pulled from the live web.
Fixtures live under tests/fixtures/. Every fixture should carry a
short comment at the top of the file explaining which carrier / notice
family it represents and which invariant it's there to protect.
Fixtures
Adding an area-risk fixture
- Add or update a scenario entry in tests/fixtures/area_risk/regressions.json:

    [
      {
        "scenario_id": "historical_past_target_returns_red",
        "request": { "city": "Chicago", "state": "IL" },
        "seed_notices": [
          {
            "city": "Chicago",
            "state": "IL",
            "title": "Archived Chicago copper retirement notice",
            "summary": "Archived Chicago copper retirement notice.",
            "impact_text": "Chicago legacy analog service was retired in a prior filing.",
            "source_excerpt": "Chicago historical analog shutdown evidence.",
            "notice_id": "CHICAGO-ARCHIVED",
            "issue_date": "2024-01-15",
            "target_date": "2024-01-15",
            "is_active": false
          }
        ],
        "expected": {
          "grade_bucket": "red",
          "status": "past_effective_date",
          "area_at_risk": true,
          "earliest_past_target_date": "2024-01-15",
          "supporting_notice_count": 1
        }
      }
    ]

- The parametrized loader in test_area_risk_grading.py reads the scenario list, seeds the in-memory test DB from seed_notices, and asserts the expected block. Use the same keys that seed_notice(...) accepts, plus optional parsed_states, location_state, and signal_family for geography conflicts and MAC-Freeze exclusions. The regression projection also exposes supporting_notice_count, supporting_match_sources, supporting_geography_conflicts, and nearby_municipality_count for compact assertions.
- Update the area-risk-city-qa-checklist.md if the scenario should also be in the pre-release regression sweep.
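The loader flow described above can be sketched as follows. This is not the code in test_area_risk_grading.py: load_scenarios and expected_mismatches are hypothetical helpers, and the duplicate-id check is an extra safeguard the source does not mention. The sketch shows only the file-reading and comparison steps; seeding the DB and calling the grading service are left to the real test.

```python
import json
from pathlib import Path


def load_scenarios(path: Path) -> list:
    """Read the scenario list and reject duplicate scenario_id values."""
    scenarios = json.loads(path.read_text(encoding="utf-8"))
    ids = [scenario["scenario_id"] for scenario in scenarios]
    duplicates = sorted({sid for sid in ids if ids.count(sid) > 1})
    if duplicates:
        raise ValueError(f"duplicate scenario_id values: {duplicates}")
    return scenarios


def expected_mismatches(expected: dict, projection: dict) -> list:
    """Compare only the keys named in `expected`; extra projection keys are ignored."""
    return [
        f"{key}: expected {value!r}, got {projection.get(key)!r}"
        for key, value in expected.items()
        if projection.get(key) != value
    ]
```

With pytest, the list returned by load_scenarios can be passed to pytest.mark.parametrize with ids=lambda s: s["scenario_id"], so each scenario appears under its own name in the test report and a failure prints the mismatch list.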
Adding a parser fixture
- Place raw notice text under tests/fixtures/<carrier>_<short_slug>.txt (for HTML/PDF, serialize the extracted text; the parser operates on text).
- Prepend a one-line comment describing the fixture.
- Add a test_parsers.py entry asserting the parsed signal_family, notice_type, rule_family, restriction_types, states, and any distinctive fields.
CI
The CI workflow (.github/workflows/ci.yml) runs, per P3-T1:
- Backend ruff + pytest against Postgres service container.
- Frontend typecheck + lint + vitest + build.
- Area-risk regression gate (targeted pytest + the check-area-risk-conservative-framing.sh script).
Hosted smoke
backend/app/pots_shutdown_tracker/scripts/smoke_hosted.py hits a real
deployment's trust summary, search, and dashboard routes. It
intentionally fails when is_queryable=false; that failure is the
correct signal for an empty Space.
Run it via:
python3 -m pots_shutdown_tracker.scripts.smoke_hosted \
--base-url "$BASE_URL" \
--min-structured-results 1
Coverage expectations
No hard coverage threshold. Instead, require that any new business logic comes with a named test that would fail if the logic regressed. Reviewers should push back on untested behavior changes.
What we do not test
- External network calls. Connectors are tested with recorded fixtures.
- OpenAI responses. AI service tests use a fake client and exercise the deterministic grounded fallback.
- HF dataset uploads. Mocked via HfApi fakes.
- Neon-specific features in unit tests. Integration coverage is at the hosted smoke level.