| # Testing Strategy |
|
|
| Philosophy: tests must encode the product's conservative framing. The |
| bar is higher for area-risk, trust summary, matching, and signal-family |
| classification than for plumbing. |
|
|
| ## Test layout |
|
|
| ``` |
| backend/app/pots_shutdown_tracker/tests/ |
| fixtures/ # HTML/PDF/TXT carrier samples and |
| # JSON area-risk scenarios |
| test_ai.py |
| test_api.py # end-to-end request/response |
| test_area_risk_grading.py # NEW in P1-T1 / P3-T2 |
| test_bulk_lookup.py # xlsx parser, worker, cleanup |
| test_config.py |
| test_connectors.py |
| test_crawler.py |
| test_embeddings.py |
| test_extractors.py |
| test_locks.py |
| test_matching.py |
| test_parsers.py |
| test_policy.py |
| test_review.py |
| test_runtime_startup.py |
| test_scheduler.py |
| test_signal_links.py # NEW in P2-T6 |
| test_smoke_hosted.py |
| test_storage.py |
| ``` |
|
|
| Frontend tests live under `frontend/src/**/*.test.tsx` (Vitest + |
| Testing Library). |
|
|
| ## Minimum local gate before PR |
|
|
| ``` |
| # backend |
| cd backend |
| ruff check app |
| ruff format --check app |
| POTS_TRACKER_DB_URL=sqlite:///:memory: \ |
| POTS_TRACKER_AUTO_CREATE_SCHEMA=true \ |
| POTS_TRACKER_ENABLE_AI=false \ |
| OPENAI_API_KEY= \ |
| pytest -q |
| |
| # frontend |
| cd ../frontend |
| npm run typecheck |
| npm run lint |
| npm run test |
| npm run build |
| ``` |
|
|
| ## Category-by-category requirements |
|
|
| ### Area-risk (highest-risk logic) |
|
|
| Any change under `services/area_risk.py` or `services/coverage.py` |
| must extend `test_area_risk_grading.py`. Required invariants the tests |
| must encode: |
|
|
| 1. **Past-target shutdowns are first-class evidence.** They grade red |
| permanently, regardless of `is_active`, and there is no separate |
| historical-context channel anymore. |
| 2. **`area_at_risk=True` for non-green grades.** Green is the only |
| false case. |
| 3. **Grade mapping is stable:** |
| - `past_effective_date` → red |
| - `scheduled_within_12mo` → orange |
| - `scheduled_beyond_12mo` / `undated_shutdown` → yellow |
| - `mac_freeze_only` → blue |
| - `no_evidence_found` → green |
| - `insufficient_confidence` stays green when the evidence is too |
| weak to classify confidently |
| 4. **Structured city evidence beats text-derived city evidence** when both are present. |
| 5. **`parsed_state_fallback` is penalized** in `status_confidence`. |
| 6. **Geography conflict never changes grade** but always emits a |
| caveat. |
| 7. **Airport sub-geographies** like `Miami Airport` can promote to a |
| direct match for the parent city. |
| 8. **Nearby-municipality evidence** uses the configured threshold (3 by |
| default) and can nudge a direct green/blue result up to yellow, but |
| never above yellow. |
| 9. **Nearby evidence never downgrades** a direct grade. |
| 10. **MAC-Freeze notices** grade blue on their own and are kept as |
| follow-on context when shutdown evidence is present. |
| 11. **State-level MAC-Freeze filings** grade city searches as blue with |
| `status_confidence=low` when no shutdown evidence conflicts; the |
| caveat must name the state. |
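Invariants 2, 3, 8, and 9 lend themselves to a table-driven check. The sketch below is hypothetical: `Grade`, `GRADE_BY_STATUS`, `area_at_risk`, and `apply_nearby` are illustrative stand-ins rather than the real `services/area_risk.py` API. It assumes a severity order of green < blue < yellow < orange < red (implied by "nudge green/blue up to yellow") and that the nearby threshold is met at `>= 3` municipalities.

```python
from enum import IntEnum


class Grade(IntEnum):
    # Assumed severity order: green < blue < yellow < orange < red.
    GREEN = 0
    BLUE = 1
    YELLOW = 2
    ORANGE = 3
    RED = 4


# Invariant 3. insufficient_confidence maps to green here, which only
# covers the "evidence too weak to classify" case the invariant describes.
GRADE_BY_STATUS = {
    "past_effective_date": Grade.RED,
    "scheduled_within_12mo": Grade.ORANGE,
    "scheduled_beyond_12mo": Grade.YELLOW,
    "undated_shutdown": Grade.YELLOW,
    "mac_freeze_only": Grade.BLUE,
    "no_evidence_found": Grade.GREEN,
    "insufficient_confidence": Grade.GREEN,
}

NEARBY_THRESHOLD = 3  # configured default; ">=" semantics is an assumption


def area_at_risk(grade: Grade) -> bool:
    """Invariant 2: green is the only grade that is not at risk."""
    return grade is not Grade.GREEN


def apply_nearby(direct: Grade, nearby_count: int,
                 threshold: int = NEARBY_THRESHOLD) -> Grade:
    """Invariants 8 and 9: nearby evidence may lift green/blue to yellow,
    never above yellow, and never lowers a direct grade."""
    if nearby_count >= threshold and direct < Grade.YELLOW:
        return Grade.YELLOW
    return direct


def test_nearby_evidence_caps_at_yellow_and_never_downgrades():
    for direct in Grade:
        nudged = apply_nearby(direct, nearby_count=NEARBY_THRESHOLD)
        assert nudged >= direct  # invariant 9: no downgrade
        if direct <= Grade.YELLOW:
            assert nudged == max(direct, Grade.YELLOW)
        # invariant 8: nearby evidence alone never pushes past yellow
        assert nudged == direct or nudged == Grade.YELLOW
```

Iterating over every `Grade` member, rather than hand-picking cases, means a new grade added later is automatically covered by the no-downgrade and yellow-cap checks.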
| |
| ### MAC-Freeze vs shutdown |
|
|
| Any change under `parsers/source_specific.py` that touches |
| `signal_family` assignment must add or update a test under |
| `test_parsers.py` covering: |
|
|
| - A pure shutdown notice stays `shutdown` even when prose contains |
| weak MAC-Freeze keywords (`withdraw`, `grandfather`). |
| - A notice with explicit restriction tokens **and** availability / |
| grandfather language becomes `att_mac_freeze`. |
| - A notice with one weak signal abstains (stays `shutdown`) and logs |
| `classifier=att_mac_freeze_guard decision=abstain`. |
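The three cases above amount to a two-signal guard with an abstain branch. This is a minimal sketch only: the token lists, the logger name, and `classify_signal_family` are hypothetical, not the contents of `parsers/source_specific.py`. The log line matches the `classifier=att_mac_freeze_guard decision=abstain` format the tests should assert on.

```python
import logging

logger = logging.getLogger("parsers.att_mac_freeze_guard")  # hypothetical name

# Hypothetical token lists for illustration only.
RESTRICTION_TOKENS = ("mac freeze", "no new orders", "moves, adds, and changes")
AVAILABILITY_TOKENS = ("withdraw", "grandfather", "no longer available")


def classify_signal_family(text: str) -> str:
    """Return 'att_mac_freeze' only when BOTH signal kinds are present."""
    lowered = text.lower()
    has_restriction = any(token in lowered for token in RESTRICTION_TOKENS)
    has_availability = any(token in lowered for token in AVAILABILITY_TOKENS)
    if has_restriction and has_availability:
        return "att_mac_freeze"
    if has_restriction or has_availability:
        # Only one kind of signal: abstain and keep the shutdown default.
        logger.info("classifier=att_mac_freeze_guard decision=abstain")
    return "shutdown"
```

Under pytest, the abstain case can be checked with the built-in `caplog` fixture, for example `with caplog.at_level(logging.INFO): ...` followed by an assertion on `caplog.text`.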
|
|
| ### Trust gate |
|
|
| Any change under `services/trust.py` or the `require_queryable_corpus` |
| dependency must cover: |
|
|
| - Empty corpus → `is_queryable=false` → `/search`, `/area-risk`, |
| `/match/address`, `/coverage` return 503. |
| - Degraded due to missing coverage metadata → 503 on gated routes. |
| - Healthy corpus → 200 on gated routes. |
| - `/trust-summary`, `/notices/*`, `/dashboard/*`, `/healthz`, `/readyz` |
| stay available regardless. |
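The gate cases form a small route-by-corpus-state matrix. The sketch below models that matrix in plain Python. `TrustSummary`, `summarize_corpus`, and `gate_status` are hypothetical stand-ins for `services/trust.py` and `require_queryable_corpus`; the only facts taken from this document are the route lists and the expected status codes.

```python
from dataclasses import dataclass

GATED_ROUTES = ("/search", "/area-risk", "/match/address", "/coverage")
UNGATED_PREFIXES = ("/trust-summary", "/notices/", "/dashboard/", "/healthz", "/readyz")


@dataclass(frozen=True)
class TrustSummary:
    is_queryable: bool
    reason: str


def summarize_corpus(notice_count: int, has_coverage_metadata: bool) -> TrustSummary:
    """Hypothetical trust evaluation: empty or degraded corpora are not queryable."""
    if notice_count == 0:
        return TrustSummary(False, "empty")
    if not has_coverage_metadata:
        return TrustSummary(False, "degraded: missing coverage metadata")
    return TrustSummary(True, "healthy")


def gate_status(path: str, summary: TrustSummary) -> int:
    """Status the gate would produce before the route handler runs."""
    if path.startswith(UNGATED_PREFIXES):
        return 200  # always available, regardless of corpus state
    if path in GATED_ROUTES and not summary.is_queryable:
        return 503
    return 200
```

A real test would drive the same matrix through the HTTP client; the value of writing it as a table is that adding a new gated route becomes a one-line change to `GATED_ROUTES`.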
|
|
| ### Bulk lookup |
|
|
| Any change under `services/bulk_lookup.py`, the `/bulk-lookup/*` |
| routes, or the frontend Bulk Lookup page must cover: |
|
|
| - Parser accepts case-insensitive city/state headers and the supported |
| aliases (`Location City`, `Province`, `town`, `municipality`). |
| - Parser rejects non-xlsx uploads, missing required columns, and files |
| over `bulk_lookup_max_rows`. |
| - State normalization accepts both two-letter abbreviations and full |
| state names, and flags unrecognized states without failing valid rows. |
| - Partial-failure tolerance: valid rows still process when sibling rows |
| are missing city/state or contain invalid states. |
| - Output xlsx has exactly the original columns plus `color`, |
| `grade_letter`, `as_of`, and `notes`, and includes a `Summary` sheet |
| with color counts, flagged count, top carriers, and metadata. |
| - Background jobs transition from queued to running to either completed |
| or failed, using a fresh SQLAlchemy session inside the worker thread. |
| - Expired jobs are swept by both scheduler wiring and the |
| `cleanup-bulk-lookup-jobs` CLI path, nulling blob references after |
| deletion. |
| - All four `/bulk-lookup/*` endpoints remain protected by the same |
| `require_queryable_corpus` trust gate used by Search and Area Risk. |
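Header aliasing, state normalization, partial-failure tolerance, and the output column contract can be sketched without touching xlsx at all. Everything here is hypothetical: the alias table (in particular, mapping `Province` to state and the other aliases to city), the deliberately truncated three-state table, and the helper names. xlsx parsing and the `bulk_lookup_max_rows` cap are left out.

```python
from __future__ import annotations

# Hypothetical alias table; the real one lives in services/bulk_lookup.py.
HEADER_ALIASES = {
    "city": "city", "location city": "city", "town": "city", "municipality": "city",
    "state": "state", "province": "state",
}
STATES = {"IL": "Illinois", "FL": "Florida", "TX": "Texas"}  # truncated for the sketch
STATE_BY_NAME = {name.lower(): code for code, name in STATES.items()}
OUTPUT_EXTRA_COLUMNS = ("color", "grade_letter", "as_of", "notes")


def resolve_columns(headers: list[str]) -> dict[str, int]:
    """Map canonical column names to indexes, matching headers case-insensitively."""
    found: dict[str, int] = {}
    for index, header in enumerate(headers):
        canonical = HEADER_ALIASES.get(str(header).strip().lower())
        if canonical and canonical not in found:
            found[canonical] = index
    missing = {"city", "state"} - found.keys()
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return found


def normalize_state(value: str) -> str | None:
    """Accept two-letter codes or full names; return None when unrecognized."""
    cleaned = value.strip()
    if cleaned.upper() in STATES:
        return cleaned.upper()
    return STATE_BY_NAME.get(cleaned.lower())


def triage_rows(headers: list[str], rows: list[list]) -> tuple[list, list]:
    """Split rows into valid (city, state) pairs and flagged rows with a reason."""
    cols = resolve_columns(headers)
    valid, flagged = [], []
    for row in rows:
        city = str(row[cols["city"]] or "").strip()
        raw_state = str(row[cols["state"]] or "")
        state = normalize_state(raw_state)
        if not city or not raw_state.strip():
            flagged.append((row, "missing city/state"))
        elif state is None:
            flagged.append((row, f"unrecognized state: {raw_state!r}"))
        else:
            valid.append((city, state))  # a bad sibling row never blocks this one
    return valid, flagged


def output_headers(original: list[str]) -> list[str]:
    """Output keeps the original columns and appends exactly the four extras."""
    return [*original, *OUTPUT_EXTRA_COLUMNS]
```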
|
|
| ### Admin auth |
|
|
| Changes touching any `/admin/*` endpoint must cover: |
|
|
| - Disabled (no key configured) → 503. |
| - Missing header → 401. |
| - Wrong key → 401 (constant-time comparison). |
| - Correct key → 2xx success path, and one audit row written. |
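The four admin cases reduce to one ordered decision. A minimal sketch, assuming a hypothetical `authorize_admin` helper and an in-memory `audit_log` list standing in for the audit table; the constant-time comparison uses the standard library's `hmac.compare_digest`.

```python
import hmac
from typing import Optional

audit_log: list = []  # hypothetical stand-in for the audit table


def authorize_admin(configured_key: Optional[str], provided_key: Optional[str],
                    action: str) -> int:
    """Return the HTTP status for an /admin/* request (hypothetical helper)."""
    if not configured_key:
        return 503  # admin surface disabled: no key configured
    if provided_key is None:
        return 401  # header missing
    # Constant-time comparison so response timing does not leak key prefixes.
    if not hmac.compare_digest(configured_key.encode(), provided_key.encode()):
        return 401
    audit_log.append({"action": action})  # exactly one audit row per success
    return 200
```

The check order matters: the disabled case is evaluated first so that an unconfigured deployment answers 503 even when a caller sends a header.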
|
|
| ### Parsing |
|
|
| Use fixture corpora; never assert on strings pulled from the live web. |
| Fixtures live under `tests/fixtures/`. Every fixture should carry a |
| short comment at the top of the file explaining which carrier / notice |
| family it represents and which invariant it's there to protect. |
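The header-comment rule can be enforced by a meta-test. This is a sketch under stated assumptions: it treats a leading `#` as the comment marker (a hypothetical convention) and checks only `.txt` fixtures, because JSON scenario files cannot hold comments. `fixtures_missing_header` is a hypothetical helper name.

```python
from pathlib import Path

COMMENT_PREFIX = "#"  # assumed comment marker for text fixtures


def fixtures_missing_header(fixtures_dir: Path) -> list:
    """Return relative paths of .txt fixtures whose first line is not a comment."""
    missing = []
    for path in sorted(fixtures_dir.rglob("*.txt")):
        with path.open(encoding="utf-8") as handle:
            first_line = handle.readline()
        if not first_line.lstrip().startswith(COMMENT_PREFIX):
            missing.append(path.relative_to(fixtures_dir).as_posix())
    return missing
```

A test would then assert that `fixtures_missing_header(Path("tests/fixtures"))` returns an empty list, which names the offending files directly in the failure message.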
|
|
| ## Fixtures |
|
|
| ### Adding an area-risk fixture |
|
|
| 1. Add or update a scenario entry in |
| `tests/fixtures/area_risk/regressions.json`: |
| ```json |
| [ |
| { |
| "scenario_id": "historical_past_target_returns_red", |
| "request": { "city": "Chicago", "state": "IL" }, |
| "seed_notices": [ |
| { |
| "city": "Chicago", |
| "state": "IL", |
| "title": "Archived Chicago copper retirement notice", |
| "summary": "Archived Chicago copper retirement notice.", |
| "impact_text": "Chicago legacy analog service was retired in a prior filing.", |
| "source_excerpt": "Chicago historical analog shutdown evidence.", |
| "notice_id": "CHICAGO-ARCHIVED", |
| "issue_date": "2024-01-15", |
| "target_date": "2024-01-15", |
| "is_active": false |
| } |
| ], |
| "expected": { |
| "grade_bucket": "red", |
| "status": "past_effective_date", |
| "area_at_risk": true, |
| "earliest_past_target_date": "2024-01-15", |
| "supporting_notice_count": 1 |
| } |
| } |
| ] |
| ``` |
| 2. The parametrized loader in `test_area_risk_grading.py` reads the |
| scenario list, seeds the in-memory test DB from `seed_notices`, and |
| asserts the `expected` block. Use the same keys that |
| `seed_notice(...)` accepts, plus optional `parsed_states`, |
| `location_state`, and `signal_family` for geography conflicts and |
| MAC-Freeze exclusions. The regression projection also exposes |
| `supporting_notice_count`, `supporting_match_sources`, |
| `supporting_geography_conflicts`, and `nearby_municipality_count` |
| for compact assertions. |
| 3. Update the |
| [area-risk-city-qa-checklist.md](area-risk-city-qa-checklist.md) if |
| the scenario should also be in the pre-release regression sweep. |
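The loader in step 2 can be sketched as two small helpers: one reads the scenario list and rejects duplicate IDs, and one compares only the keys a scenario pins. The helper names and the duplicate-ID check are hypothetical additions; seeding and grading remain in the real `test_area_risk_grading.py`.

```python
import json
from collections import Counter
from pathlib import Path


def load_scenarios(path: Path) -> list:
    """Return (scenario_id, scenario) pairs, rejecting duplicate IDs."""
    scenarios = json.loads(path.read_text(encoding="utf-8"))
    counts = Counter(s["scenario_id"] for s in scenarios)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate scenario_id: {duplicates}")
    return [(s["scenario_id"], s) for s in scenarios]


def mismatches(projection: dict, expected: dict) -> dict:
    """Compare only the keys the scenario pins; extra projection keys are ignored."""
    return {
        key: (projection.get(key), value)
        for key, value in expected.items()
        if projection.get(key) != value
    }


# Possible pytest wiring (grade_scenario is hypothetical):
# SCENARIOS = load_scenarios(Path(__file__).parent / "fixtures/area_risk/regressions.json")
# @pytest.mark.parametrize("scenario_id,scenario", SCENARIOS, ids=[s[0] for s in SCENARIOS])
# def test_area_risk_regression(scenario_id, scenario):
#     assert mismatches(grade_scenario(scenario), scenario["expected"]) == {}
```

Using `scenario_id` as the pytest ID means a failing regression is reported by name, and the `mismatches` dict shows actual versus expected for each pinned key.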
|
|
| ### Adding a parser fixture |
|
|
| 1. Place raw notice text under |
| `tests/fixtures/<carrier>_<short_slug>.txt` (for HTML/PDF, serialize the |
| extracted text; the parser operates on text). |
| 2. Prepend a one-line comment describing the fixture. |
| 3. Add a `test_parsers.py` entry asserting the parsed |
| `signal_family`, `notice_type`, `rule_family`, |
| `restriction_types`, `states`, and any distinctive fields. |
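A small loader keeps the one-line description out of the text the parser sees. This is a sketch assuming a `#` comment marker; `load_notice_fixture` is a hypothetical helper, and `parse_notice` in the commented example is a hypothetical parser entry point, not a confirmed name.

```python
from pathlib import Path


def load_notice_fixture(fixtures_dir: Path, carrier: str, slug: str) -> tuple:
    """Return (description, notice_text) for <carrier>_<slug>.txt.

    The first line is the descriptive comment and is not passed to the parser.
    """
    path = fixtures_dir / f"{carrier}_{slug}.txt"
    header, _, body = path.read_text(encoding="utf-8").partition("\n")
    if not header.startswith("#"):
        raise ValueError(f"{path.name} is missing its one-line description comment")
    return header.lstrip("# ").strip(), body


# Example test shape (parse_notice and its fields are hypothetical):
# def test_att_copper_retirement_is_shutdown():
#     _, text = load_notice_fixture(FIXTURES_DIR, "att", "copper_retirement")
#     notice = parse_notice(text)
#     assert notice.signal_family == "shutdown"
```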
|
|
| ## CI |
|
|
| The CI workflow (`.github/workflows/ci.yml`) runs, per P3-T1: |
|
|
| - Backend ruff + pytest against a Postgres service container. |
| - Frontend typecheck + lint + vitest + build. |
| - Area-risk regression gate (targeted pytest + the |
| `check-area-risk-conservative-framing.sh` script). |
|
|
| ## Hosted smoke |
|
|
| `backend/app/pots_shutdown_tracker/scripts/smoke_hosted.py` hits a real |
| deployment's trust summary, search, and dashboard routes. It |
| intentionally fails when `is_queryable=false`; that failure is the |
| correct signal for an empty Space. |
|
|
| Run it via: |
|
|
| ``` |
| python3 -m pots_shutdown_tracker.scripts.smoke_hosted \ |
| --base-url "$BASE_URL" \ |
| --min-structured-results 1 |
| ``` |
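The pass/fail decision behind the smoke run can be expressed as a pure function over the fetched payloads, which makes it testable offline. This is a sketch only: `smoke_failures` and the payload fields `results` and `structured` are hypothetical; the real script's response shapes may differ. `min_structured_results` corresponds to the `--min-structured-results` flag.

```python
def smoke_failures(trust_summary: dict, search_payload: dict,
                   min_structured_results: int) -> list:
    """Return human-readable failures; an empty list means the smoke run passes."""
    failures = []
    if not trust_summary.get("is_queryable", False):
        # Intentional: an empty Space must fail the smoke run.
        failures.append("trust summary reports is_queryable=false")
    results = search_payload.get("results", [])
    structured = [r for r in results if r.get("structured")]
    if len(structured) < min_structured_results:
        failures.append(
            f"expected >= {min_structured_results} structured results, got {len(structured)}"
        )
    return failures
```

A CLI wrapper would then exit non-zero whenever the list is non-empty, for example `raise SystemExit(1 if failures else 0)`.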
|
|
| ## Coverage expectations |
|
|
| There is no hard coverage threshold. Instead, require that any new |
| business logic comes with a named test that would fail if the logic |
| regressed. |
| Reviewers should push back on untested behavior change. |
|
|
| ## What we do not test |
|
|
| - External network calls. Connectors are tested with recorded fixtures. |
| - OpenAI responses. AI service tests use a fake client and exercise the |
| deterministic grounded fallback. |
| - HF dataset uploads. These are mocked via `HfApi` fakes. |
| - Neon-specific features in unit tests; integration coverage comes from |
| the hosted smoke test. |
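The fake-client pattern for AI tests can be sketched as a client double that records prompts and can simulate failure, plus a service that falls back to quoting its evidence. Both `FakeAIClient` and `grounded_answer` are hypothetical illustrations of the pattern, not the real AI service or OpenAI client interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeAIClient:
    """Hypothetical client double: records prompts, never touches the network."""
    fail: bool = True
    calls: list = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        if self.fail:
            raise RuntimeError("simulated upstream failure")
        return "model answer"


def grounded_answer(client, question: str, excerpts: list[str]) -> str:
    """Hypothetical service: use the model if it answers, else quote the evidence."""
    try:
        return client.complete(question + "\n\n" + "\n".join(excerpts))
    except RuntimeError:
        # Deterministic fallback: output depends only on the supplied excerpts.
        if not excerpts:
            return "No supporting notices found."
        return "Based on notices: " + " | ".join(excerpts)
```

Because the fallback path is deterministic, tests can assert its exact text, while the recorded `calls` confirm the service attempted the model first.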
|
|