# Testing Strategy

Philosophy: tests must encode the product's conservative framing. The bar is higher for area-risk, trust summary, matching, and signal-family classification than for plumbing.

## Test layout

```
backend/app/pots_shutdown_tracker/tests/
  fixtures/                     # HTML/PDF/TXT carrier samples and
                                # JSON area-risk scenarios
  test_ai.py
  test_api.py                   # end-to-end request/response
  test_area_risk_grading.py     # NEW in P1-T1 / P3-T2
  test_bulk_lookup.py           # xlsx parser, worker, cleanup
  test_config.py
  test_connectors.py
  test_crawler.py
  test_embeddings.py
  test_extractors.py
  test_locks.py
  test_matching.py
  test_parsers.py
  test_policy.py
  test_review.py
  test_runtime_startup.py
  test_scheduler.py
  test_signal_links.py          # NEW in P2-T6
  test_smoke_hosted.py
  test_storage.py
```

Frontend tests live under `frontend/src/**/*.test.tsx` (Vitest + Testing Library).

## Minimum local gate before PR

```
# backend
cd backend
ruff check app
ruff format --check app
POTS_TRACKER_DB_URL=sqlite:///:memory: \
POTS_TRACKER_AUTO_CREATE_SCHEMA=true \
POTS_TRACKER_ENABLE_AI=false \
OPENAI_API_KEY= \
pytest -q

# frontend
cd frontend
npm run typecheck
npm run lint
npm run test
npm run build
```

## Category-by-category requirements

### Area-risk (highest-risk logic)

Any change under `services/area_risk.py` or `services/coverage.py` must extend `test_area_risk_grading.py`. Required invariants the tests must encode:

1. **Past-target shutdowns are first-class evidence.** They grade red permanently, regardless of `is_active`, and there is no separate historical-context channel anymore.
2. **`area_at_risk=True` for non-green grades.** Green is the only false case.
3. **Grade mapping is stable:**
   - `past_effective_date` → red
   - `scheduled_within_12mo` → orange
   - `scheduled_beyond_12mo` / `undated_shutdown` → yellow
   - `mac_freeze_only` → blue
   - `no_evidence_found` → green
   - `insufficient_confidence` stays green when the evidence is too weak to classify confidently
4. **Structured city evidence beats text-city** when both are present.
5. **`parsed_state_fallback` is penalized** in `status_confidence`.
6. **Geography conflict never changes grade** but always emits a caveat.
7. **Airport sub-geographies** like `Miami Airport` can promote to a direct match for the parent city.
8. **Nearby-municipality evidence** uses the configured threshold (3 by default) and can nudge a direct green/blue result up to yellow, but never above yellow.
9. **Nearby evidence never downgrades** a direct grade.
10. **MAC-Freeze notices** grade blue on their own and remain follow-on context when shutdown evidence is present.
11. **State-level MAC-Freeze filings** grade city searches as blue with `status_confidence=low` when no shutdown evidence conflicts; the caveat must name the state.

### MAC-Freeze vs shutdown

Any change under `parsers/source_specific.py` that touches `signal_family` assignment must add or update a test under `test_parsers.py` covering:

- A pure shutdown notice stays `shutdown` even when prose contains weak MAC-Freeze keywords (`withdraw`, `grandfather`).
- A notice with explicit restriction tokens **and** availability / grandfather language becomes `att_mac_freeze`.
- A notice with one weak signal abstains (stays `shutdown`) and logs `classifier=att_mac_freeze_guard decision=abstain`.

### Trust gate

Any change under `services/trust.py` or the `require_queryable_corpus` dependency must cover:

- Empty corpus → `is_queryable=false` → `/search`, `/area-risk`, `/match/address`, `/coverage` return 503.
- Degraded due to missing coverage metadata → 503 on gated routes.
- Healthy corpus → 200 on gated routes.
- `/trust-summary`, `/notices/*`, `/dashboard/*`, `/healthz`, `/readyz` stay available regardless.
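
The trust-gate requirements above form a small matrix of corpus states against routes, which lends itself to a table-driven test. The following is a minimal sketch, not the project's actual test code: `GATED_ROUTES`, `OPEN_PREFIXES`, `CORPUS_STATES`, `expected_status`, and `trust_gate_cases` are hypothetical names, and it assumes "stay available" means a 200 response.

```python
"""Sketch of a table-driven trust-gate matrix (all names here are hypothetical)."""
from itertools import product

# Routes taken from the trust-gate list; prefixes stand in for wildcard entries.
GATED_ROUTES = ("/search", "/area-risk", "/match/address", "/coverage")
OPEN_PREFIXES = ("/trust-summary", "/notices/", "/dashboard/", "/healthz", "/readyz")

# Corpus state name -> whether the trust summary reports is_queryable.
CORPUS_STATES = {
    "empty": False,
    "degraded_missing_coverage_metadata": False,
    "healthy": True,
}


def expected_status(path: str, is_queryable: bool) -> int:
    """Return the status code the trust gate should produce for `path`."""
    if path.startswith(OPEN_PREFIXES):
        return 200  # assumption: "available" is treated as 200
    if path in GATED_ROUTES:
        return 200 if is_queryable else 503
    raise ValueError(f"route not covered by the trust-gate matrix: {path}")


def trust_gate_cases(sample_open_paths):
    """Build (corpus_state, path, expected_status) tuples for parametrization."""
    paths = GATED_ROUTES + tuple(sample_open_paths)
    return [
        (state, path, expected_status(path, queryable))
        for (state, queryable), path in product(CORPUS_STATES.items(), paths)
    ]
```

The returned tuples could be fed to `pytest.mark.parametrize`, with a fixture that seeds the corpus into the named state before the request is made. Raising on unknown routes is deliberate: a new route then has to be classified as gated or open before the matrix will run.
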

### Bulk lookup

Any change under `services/bulk_lookup.py`, the `/bulk-lookup/*` routes, or the frontend Bulk Lookup page must cover:

- Parser accepts case-insensitive city/state headers and the supported aliases (`Location City`, `Province`, `town`, `municipality`).
- Parser rejects non-xlsx uploads, missing required columns, and files over `bulk_lookup_max_rows`.
- State normalization accepts both two-letter abbreviations and full state names, and flags unrecognized states without failing valid rows.
- Partial-failure tolerance: valid rows still process when sibling rows are missing city/state or contain invalid states.
- Output xlsx has exactly the original columns plus `color`, `grade_letter`, `as_of`, and `notes`, and includes a `Summary` sheet with color counts, flagged count, top carriers, and metadata.
- Background jobs transition from queued to running to either completed or failed, using a fresh SQLAlchemy session inside the worker thread.
- Expired jobs are swept by both scheduler wiring and the `cleanup-bulk-lookup-jobs` CLI path, nulling blob references after deletion.
- All four `/bulk-lookup/*` endpoints remain protected by the same `require_queryable_corpus` trust gate used by Search and Area Risk.

### Admin auth

Changes touching any `/admin/*` endpoint must cover:

- Disabled (no key configured) → 503.
- Missing header → 401.
- Wrong key → 401 (constant-time comparison).
- Correct key → 2xx success path, and one audit row written.

### Parsing

Use fixture corpora; never assert on strings pulled from the live web. Fixtures live under `tests/fixtures/`. Every fixture should carry a short comment at the top of the file explaining which carrier / notice family it represents and which invariant it's there to protect.

## Fixtures

### Adding an area-risk fixture

1. Add or update a scenario entry in `tests/fixtures/area_risk/regressions.json`:

   ```json
   [
     {
       "scenario_id": "historical_past_target_returns_red",
       "request": { "city": "Chicago", "state": "IL" },
       "seed_notices": [
         {
           "city": "Chicago",
           "state": "IL",
           "title": "Archived Chicago copper retirement notice",
           "summary": "Archived Chicago copper retirement notice.",
           "impact_text": "Chicago legacy analog service was retired in a prior filing.",
           "source_excerpt": "Chicago historical analog shutdown evidence.",
           "notice_id": "CHICAGO-ARCHIVED",
           "issue_date": "2024-01-15",
           "target_date": "2024-01-15",
           "is_active": false
         }
       ],
       "expected": {
         "grade_bucket": "red",
         "status": "past_effective_date",
         "area_at_risk": true,
         "earliest_past_target_date": "2024-01-15",
         "supporting_notice_count": 1
       }
     }
   ]
   ```

2. The parametrized loader in `test_area_risk_grading.py` reads the scenario list, seeds the in-memory test DB from `seed_notices`, and asserts the `expected` block. Use the same keys that `seed_notice(...)` accepts, plus optional `parsed_states`, `location_state`, and `signal_family` for geography conflicts and MAC-Freeze exclusions. The regression projection also exposes `supporting_notice_count`, `supporting_match_sources`, `supporting_geography_conflicts`, and `nearby_municipality_count` for compact assertions.
3. Update the [area-risk-city-qa-checklist.md](area-risk-city-qa-checklist.md) if the scenario should also be in the pre-release regression sweep.

### Adding a parser fixture

1. Place raw notice text under `tests/fixtures/_.txt` (for HTML/PDF, serialize the extracted text, since the parser operates on text).
2. Prepend a one-line comment describing the fixture.
3. Add a `test_parsers.py` entry asserting the parsed `signal_family`, `notice_type`, `rule_family`, `restriction_types`, `states`, and any distinctive fields.

## CI

The CI workflow (`.github/workflows/ci.yml`) runs, per P3-T1:

- Backend ruff + pytest against Postgres service container.
- Frontend typecheck + lint + vitest + build.
- Area-risk regression gate (targeted pytest + the `check-area-risk-conservative-framing.sh` script).

## Hosted smoke

`backend/app/pots_shutdown_tracker/scripts/smoke_hosted.py` hits a real deployment's trust summary, search, and dashboard routes. It intentionally fails when `is_queryable=false`; that failure is the correct signal for an empty Space. Run it via:

```
python3 -m pots_shutdown_tracker.scripts.smoke_hosted \
  --base-url "$BASE_URL" \
  --min-structured-results 1
```

## Coverage expectations

There is no hard coverage threshold. Instead, any new business logic must come with a named test that would fail if the logic regressed. Reviewers should push back on untested behavior changes.

## What we do not test

- External network calls. Connectors are tested with recorded fixtures.
- OpenAI responses. AI service tests use a fake client and exercise the deterministic grounded fallback.
- HF dataset uploads. Mocked via `HfApi` fakes.
- Neon-specific features in unit tests. Integration coverage for these is at the hosted smoke level.
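
To illustrate the fake-client approach named in the list above, here is a minimal sketch of how a test can exercise a deterministic fallback without calling OpenAI. Everything in it is hypothetical: `FakeAIClient`, `grounded_fallback`, and `summarize_notice` are illustrative names and do not reflect the real AI service's interface.

```python
"""Sketch of testing an AI path with a fake client (all names are hypothetical)."""


class FakeAIClient:
    """Stand-in for the real client: records prompts and can simulate failure."""

    def __init__(self, reply="", fail=False):
        self.reply = reply
        self.fail = fail
        self.prompts = []

    def complete(self, prompt):
        self.prompts.append(prompt)
        if self.fail:
            raise RuntimeError("simulated upstream failure")
        return self.reply


def grounded_fallback(title, target_date):
    """Deterministic summary built only from stored notice fields."""
    return f"{title} (target date: {target_date})"


def summarize_notice(client, title, target_date):
    """Use the client when it returns text; otherwise fall back deterministically."""
    if client is None:  # assumption: None models AI being disabled by configuration
        return grounded_fallback(title, target_date)
    try:
        reply = client.complete(f"Summarize: {title}")
    except RuntimeError:
        return grounded_fallback(title, target_date)
    # An empty or whitespace-only reply also falls back.
    return reply.strip() or grounded_fallback(title, target_date)
```

A test built this way asserts on the fallback string and on `client.prompts`, so it stays deterministic and never depends on a model response.
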