Testing Strategy

Philosophy: tests must encode the product's conservative framing. The bar is higher for area-risk, trust summary, matching, and signal-family classification than for plumbing.

Test layout

backend/app/pots_shutdown_tracker/tests/
  fixtures/                         # HTML/PDF/TXT carrier samples and
                                    # JSON area-risk scenarios
  test_ai.py
  test_api.py                       # end-to-end request/response
  test_area_risk_grading.py         # NEW in P1-T1 / P3-T2
  test_bulk_lookup.py               # xlsx parser, worker, cleanup
  test_config.py
  test_connectors.py
  test_crawler.py
  test_embeddings.py
  test_extractors.py
  test_locks.py
  test_matching.py
  test_parsers.py
  test_policy.py
  test_review.py
  test_runtime_startup.py
  test_scheduler.py
  test_signal_links.py              # NEW in P2-T6
  test_smoke_hosted.py
  test_storage.py

Frontend tests live under frontend/src/**/*.test.tsx (Vitest + Testing Library).

Minimum local gate before PR

# backend (run from the repository root)
cd backend
ruff check app
ruff format --check app
POTS_TRACKER_DB_URL=sqlite:///:memory: \
POTS_TRACKER_AUTO_CREATE_SCHEMA=true \
POTS_TRACKER_ENABLE_AI=false \
OPENAI_API_KEY= \
pytest -q

# frontend (run from the repository root)
cd frontend
npm run typecheck
npm run lint
npm run test
npm run build

Category-by-category requirements

Area-risk (highest-risk logic)

Any change under services/area_risk.py or services/coverage.py must extend test_area_risk_grading.py. Required invariants the tests must encode:

  1. Past-target shutdowns are first-class evidence. They grade red permanently, regardless of is_active, and there is no separate historical-context channel anymore.
  2. area_at_risk=True for non-green grades. Green is the only false case.
  3. Grade mapping is stable:
    • past_effective_date → red
    • scheduled_within_12mo → orange
    • scheduled_beyond_12mo / undated_shutdown → yellow
    • mac_freeze_only → blue
    • no_evidence_found → green
    • insufficient_confidence stays green when the evidence is too weak to classify confidently
  4. Structured city evidence beats text-city when both are present.
  5. parsed_state_fallback is penalized in status_confidence.
  6. Geography conflict never changes grade but always emits a caveat.
  7. Airport sub-geographies like Miami Airport can promote to a direct match for the parent city.
  8. Nearby-municipality evidence uses the configured threshold (3 by default) and can nudge a direct green/blue result up to yellow, but never above yellow.
  9. Nearby evidence never downgrades a direct grade.
  10. MAC-Freeze notices grade blue on their own and remain follow-on context when shutdown evidence is present.
  11. State-level MAC-Freeze filings grade city searches as blue with status_confidence=low when no shutdown evidence conflicts; the caveat must name the state.
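
Invariants 2, 3, 8, and 9 can be checked with a small table-driven test. The following is a minimal sketch, not the project's implementation: area_at_risk, apply_nearby_nudge, and the SEVERITY ordering are hypothetical stand-ins for whatever services/area_risk.py exposes, and treating the threshold as inclusive (count >= 3) is an assumption.

```python
# Status-to-grade pairs copied from invariant 3.
STATUS_TO_GRADE = {
    "past_effective_date": "red",
    "scheduled_within_12mo": "orange",
    "scheduled_beyond_12mo": "yellow",
    "undated_shutdown": "yellow",
    "mac_freeze_only": "blue",
    "no_evidence_found": "green",
    "insufficient_confidence": "green",
}

# Assumed severity order, lowest to highest (green and blue sit below yellow).
SEVERITY = ("green", "blue", "yellow", "orange", "red")


def area_at_risk(grade: str) -> bool:
    """Invariant 2: green is the only grade that is not at risk."""
    return grade != "green"


def apply_nearby_nudge(direct_grade: str, nearby_count: int, threshold: int = 3) -> str:
    """Invariants 8 and 9: lift green/blue to yellow, leave every other grade alone."""
    if nearby_count >= threshold and direct_grade in ("green", "blue"):
        return "yellow"
    return direct_grade


def check_invariants() -> None:
    # Only the two green statuses may report area_at_risk=False.
    safe = {s for s, g in STATUS_TO_GRADE.items() if not area_at_risk(g)}
    assert safe == {"no_evidence_found", "insufficient_confidence"}
    for grade in SEVERITY:
        for count in range(6):
            nudged = apply_nearby_nudge(grade, count)
            # Nearby evidence never downgrades a direct grade.
            assert SEVERITY.index(nudged) >= SEVERITY.index(grade)
            # A nudge from green or blue never lands above yellow.
            if grade in ("green", "blue"):
                assert SEVERITY.index(nudged) <= SEVERITY.index("yellow")


check_invariants()
```

In the real suite the same loops would call the production grading function; the point of the sketch is that each invariant becomes an assertion that fails loudly if the mapping drifts.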

MAC-Freeze vs shutdown

Any change under parsers/source_specific.py that touches signal_family assignment must add or update a test under test_parsers.py covering:

  • A pure shutdown notice stays shutdown even when prose contains weak MAC-Freeze keywords (withdraw, grandfather).
  • A notice with explicit restriction tokens and availability / grandfather language becomes att_mac_freeze.
  • A notice with one weak signal abstains (stays shutdown) and logs classifier=att_mac_freeze_guard decision=abstain.
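
The three cases above can be illustrated with a toy guard. This is a sketch under stated assumptions, not the logic in parsers/source_specific.py: classify_signal_family, RESTRICTION_TOKENS, and WEAK_KEYWORDS are hypothetical, the token lists are invented for illustration, and logging the abstain decision whenever only one kind of signal is present is a guess at the real behavior.

```python
import logging

logger = logging.getLogger("signal_family_sketch")

# Hypothetical token lists; the real classifier's vocabulary is not shown in this doc.
RESTRICTION_TOKENS = ("no new orders", "moves, adds, or changes")
WEAK_KEYWORDS = ("withdraw", "grandfather")


def classify_signal_family(text: str) -> str:
    """Return "att_mac_freeze" only when both signal kinds are present."""
    lowered = text.lower()
    has_restriction = any(token in lowered for token in RESTRICTION_TOKENS)
    has_weak = any(token in lowered for token in WEAK_KEYWORDS)
    if has_restriction and has_weak:
        return "att_mac_freeze"
    if has_restriction or has_weak:
        # One signal alone is not enough: abstain and leave the notice as shutdown.
        logger.info("classifier=att_mac_freeze_guard decision=abstain")
    return "shutdown"
```

A test for the abstain case would capture log output (for example with pytest's caplog fixture) and assert that the decision=abstain line appears alongside the shutdown result.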

Trust gate

Any change under services/trust.py or the require_queryable_corpus dependency must cover:

  • Empty corpus → is_queryable=false → /search, /area-risk, /match/address, /coverage return 503.
  • Degraded due to missing coverage metadata → 503 on gated routes.
  • Healthy corpus → 200 on gated routes.
  • /trust-summary, /notices/*, /dashboard/*, /healthz, /readyz stay available regardless.
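
One way to keep these four requirements from drifting apart is to generate the case matrix once and feed it to a parametrized test. The sketch below only builds the matrix; build_cases and the corpus-state labels are hypothetical names, and the wildcard routes (/notices/*, /dashboard/*) are left out because their concrete paths are not given here.

```python
GATED_ROUTES = ("/search", "/area-risk", "/match/address", "/coverage")
UNGATED_ROUTES = ("/trust-summary", "/healthz", "/readyz")

# Hypothetical labels for the three corpus states, mapped to is_queryable.
CORPUS_STATES = {
    "empty": False,
    "degraded_missing_coverage": False,
    "healthy": True,
}


def build_cases() -> list[tuple[str, str, bool]]:
    """Return (corpus_state, route, expect_503) for every state/route pair."""
    cases = []
    for state, is_queryable in CORPUS_STATES.items():
        for route in GATED_ROUTES:
            # Gated routes are blocked exactly when the corpus is not queryable.
            cases.append((state, route, not is_queryable))
        for route in UNGATED_ROUTES:
            # Ungated routes must never be blocked by the trust gate.
            cases.append((state, route, False))
    return cases
```

A parametrized test would then seed the corpus for each state, call the route, and assert that the status is 503 when expect_503 is true and anything other than 503 otherwise.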

Bulk lookup

Any change under services/bulk_lookup.py, the /bulk-lookup/* routes, or the frontend Bulk Lookup page must cover:

  • Parser accepts case-insensitive city/state headers and the supported aliases (Location City, Province, town, municipality).
  • Parser rejects non-xlsx uploads, missing required columns, and files over bulk_lookup_max_rows.
  • State normalization accepts both two-letter abbreviations and full state names, and flags unrecognized states without failing valid rows.
  • Partial-failure tolerance: valid rows still process when sibling rows are missing city/state or contain invalid states.
  • Output xlsx has exactly the original columns plus color, grade_letter, as_of, and notes, and includes a Summary sheet with color counts, flagged count, top carriers, and metadata.
  • Background jobs transition from queued to running to either completed or failed, using a fresh SQLAlchemy session inside the worker thread.
  • Expired jobs are swept by both scheduler wiring and the cleanup-bulk-lookup-jobs CLI path, nulling blob references after deletion.
  • All four /bulk-lookup/* endpoints remain protected by the same require_queryable_corpus trust gate used by Search and Area Risk.
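
The header-alias, state-normalization, and partial-failure requirements can be sketched without touching xlsx parsing at all. Everything below is illustrative: map_headers, normalize_state, and process_rows are hypothetical names, the mapping of each alias to city or state is an assumption (the list above names the aliases but not their targets), and STATE_NAMES is a deliberately tiny sample.

```python
from typing import Optional

# Assumed alias targets; only the alias names themselves come from the list above.
HEADER_ALIASES = {
    "city": "city",
    "location city": "city",
    "town": "city",
    "municipality": "city",
    "state": "state",
    "province": "state",
}

# Tiny sample table; a real one would cover every state.
STATE_NAMES = {"illinois": "IL", "florida": "FL", "texas": "TX"}


def map_headers(headers: list) -> dict:
    """Return {"city": index, "state": index}, matching headers case-insensitively."""
    found = {}
    for index, raw in enumerate(headers):
        canonical = HEADER_ALIASES.get(str(raw).strip().lower())
        if canonical and canonical not in found:
            found[canonical] = index
    missing = sorted({"city", "state"} - found.keys())
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return found


def normalize_state(value: str) -> Optional[str]:
    """Accept a two-letter abbreviation or a full name; return None if unrecognized."""
    cleaned = value.strip()
    if cleaned.upper() in STATE_NAMES.values():
        return cleaned.upper()
    return STATE_NAMES.get(cleaned.lower())


def process_rows(headers: list, rows: list) -> list:
    """Process every row; bad rows are flagged instead of aborting the batch."""
    columns = map_headers(headers)
    results = []
    for row in rows:
        city = str(row[columns["city"]] or "").strip()
        state = normalize_state(str(row[columns["state"]] or ""))
        if not city or state is None:
            results.append({"ok": False, "notes": "missing city or unrecognized state"})
        else:
            results.append({"ok": True, "city": city, "state": state})
    return results
```

Returning a per-row result rather than raising on the first bad row is what lets a test assert that valid rows still process when their siblings are flagged.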

Admin auth

Changes touching any /admin/* endpoint must cover:

  • Disabled (no key configured) → 503.
  • Missing header → 401.
  • Wrong key → 401 (constant-time comparison).
  • Correct key → 2xx success path, and one audit row written.
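
The status ladder above, including the constant-time comparison, can be sketched with the standard library's hmac.compare_digest. The function name admin_auth_status and its signature are hypothetical; the real dependency, its header name, and the audit-row write are not shown here.

```python
import hmac
from typing import Optional


def admin_auth_status(configured_key: Optional[str], presented_key: Optional[str]) -> int:
    """Map the admin-key check onto the status codes listed above."""
    if not configured_key:
        # Admin endpoints are disabled when no key is configured.
        return 503
    if presented_key is None:
        # The request carried no admin key header at all.
        return 401
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(presented_key.encode("utf-8"), configured_key.encode("utf-8")):
        return 401
    return 200
```

Encoding both values to bytes keeps compare_digest usable with non-ASCII keys. A full test would additionally assert that exactly one audit row exists after the success path.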

Parsing

Use fixture corpora; never assert on strings pulled from the live web. Fixtures live under tests/fixtures/. Every fixture should carry a short comment at the top of the file explaining which carrier / notice family it represents and which invariant it's there to protect.

Fixtures

Adding an area-risk fixture

  1. Add or update a scenario entry in tests/fixtures/area_risk/regressions.json:
    [
      {
        "scenario_id": "historical_past_target_returns_red",
        "request": { "city": "Chicago", "state": "IL" },
        "seed_notices": [
          {
            "city": "Chicago",
            "state": "IL",
            "title": "Archived Chicago copper retirement notice",
            "summary": "Archived Chicago copper retirement notice.",
            "impact_text": "Chicago legacy analog service was retired in a prior filing.",
            "source_excerpt": "Chicago historical analog shutdown evidence.",
            "notice_id": "CHICAGO-ARCHIVED",
            "issue_date": "2024-01-15",
            "target_date": "2024-01-15",
            "is_active": false
          }
        ],
        "expected": {
          "grade_bucket": "red",
          "status": "past_effective_date",
          "area_at_risk": true,
          "earliest_past_target_date": "2024-01-15",
          "supporting_notice_count": 1
        }
      }
    ]
    
  2. The parametrized loader in test_area_risk_grading.py reads the scenario list, seeds the in-memory test DB from seed_notices, and asserts the expected block. Use the same keys that seed_notice(...) accepts, plus optional parsed_states, location_state, and signal_family for geography conflicts and MAC-Freeze exclusions. The regression projection also exposes supporting_notice_count, supporting_match_sources, supporting_geography_conflicts, and nearby_municipality_count for compact assertions.
  3. Update the area-risk-city-qa-checklist.md if the scenario should also be in the pre-release regression sweep.
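
For orientation, the loader described in step 2 might look roughly like the sketch below. It is not the code in test_area_risk_grading.py: load_scenarios and assert_expected are hypothetical helpers, the sample JSON is an abbreviated copy of the scenario in step 1, and the duplicate-id check is an added assumption about what a loader would want to enforce.

```python
import json

# Abbreviated copy of the scenario from step 1, inlined so the sketch is self-contained.
SAMPLE = """
[
  {
    "scenario_id": "historical_past_target_returns_red",
    "request": {"city": "Chicago", "state": "IL"},
    "seed_notices": [
      {"city": "Chicago", "state": "IL", "notice_id": "CHICAGO-ARCHIVED",
       "target_date": "2024-01-15", "is_active": false}
    ],
    "expected": {"grade_bucket": "red", "status": "past_effective_date",
                 "area_at_risk": true, "supporting_notice_count": 1}
  }
]
"""

REQUIRED_KEYS = {"scenario_id", "request", "seed_notices", "expected"}


def load_scenarios(raw: str) -> list:
    """Parse the scenario list and reject malformed or duplicated entries."""
    scenarios = json.loads(raw)
    seen = set()
    for scenario in scenarios:
        missing = REQUIRED_KEYS - scenario.keys()
        if missing:
            raise ValueError(f"scenario is missing keys: {sorted(missing)}")
        if scenario["scenario_id"] in seen:
            raise ValueError(f"duplicate scenario_id: {scenario['scenario_id']}")
        seen.add(scenario["scenario_id"])
    return scenarios


def assert_expected(projection: dict, expected: dict) -> None:
    """Compare only the keys named in the expected block."""
    mismatches = {
        key: (projection.get(key), value)
        for key, value in expected.items()
        if projection.get(key) != value
    }
    assert not mismatches, mismatches
```

Comparing only the keys present in expected keeps each scenario compact: a fixture asserts what it cares about and ignores the rest of the projection.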

Adding a parser fixture

  1. Place raw notice text under tests/fixtures/<carrier>_<short_slug>.txt (for HTML/PDF, serialize the extracted text, since the parser operates on text).
  2. Prepend a one-line comment describing the fixture.
  3. Add a test_parsers.py entry asserting the parsed signal_family, notice_type, rule_family, restriction_types, states, and any distinctive fields.

CI

The CI workflow (.github/workflows/ci.yml) runs, per P3-T1:

  • Backend ruff + pytest against Postgres service container.
  • Frontend typecheck + lint + vitest + build.
  • Area-risk regression gate (targeted pytest + the check-area-risk-conservative-framing.sh script).

Hosted smoke

backend/app/pots_shutdown_tracker/scripts/smoke_hosted.py hits a real deployment's trust summary, search, and dashboard routes. It intentionally fails when is_queryable=false; that failure is the correct signal for an empty Space.

Run it via:

python3 -m pots_shutdown_tracker.scripts.smoke_hosted \
  --base-url "$BASE_URL" \
  --min-structured-results 1

Coverage expectations

There is no hard coverage threshold. Instead, any new business logic must come with a named test that would fail if the logic regressed. Reviewers should push back on untested behavior changes.

What we do not test

  • External network calls. Connectors are tested with recorded fixtures.
  • OpenAI responses. AI service tests use a fake client and exercise the deterministic grounded fallback.
  • HF dataset uploads. Mocked via HfApi fakes.
  • Neon-specific features in unit tests. Integration coverage for these is at the hosted smoke level.