Testing Strategy

Philosophy: tests must encode the product's conservative framing. The bar is higher for area-risk, trust summary, matching, and signal-family classification than for plumbing.

Test layout

backend/app/pots_shutdown_tracker/tests/
  fixtures/                         # HTML/PDF/TXT carrier samples and
                                    # JSON area-risk scenarios
  test_ai.py
  test_api.py                       # end-to-end request/response
  test_area_risk_grading.py         # NEW in P1-T1 / P3-T2
  test_bulk_lookup.py               # xlsx parser, worker, cleanup
  test_config.py
  test_connectors.py
  test_crawler.py
  test_embeddings.py
  test_extractors.py
  test_locks.py
  test_matching.py
  test_parsers.py
  test_policy.py
  test_review.py
  test_runtime_startup.py
  test_scheduler.py
  test_signal_links.py              # NEW in P2-T6
  test_smoke_hosted.py
  test_storage.py

Frontend tests live under frontend/src/**/*.test.tsx (Vitest + Testing Library).

Minimum local gate before PR

# backend
cd backend
ruff check app
ruff format --check app
POTS_TRACKER_DB_URL=sqlite:///:memory: \
POTS_TRACKER_AUTO_CREATE_SCHEMA=true \
POTS_TRACKER_ENABLE_AI=false \
OPENAI_API_KEY= \
pytest -q

# frontend (run from the repo root, not from backend/)
cd frontend
npm run typecheck
npm run lint
npm run test
npm run build

Category-by-category requirements

Area-risk (highest-risk logic)

Any change under services/area_risk.py or services/coverage.py must extend test_area_risk_grading.py. Required invariants the tests must encode:

Past-target shutdowns are first-class evidence. They grade red permanently, regardless of is_active, and there is no separate historical-context channel anymore.
area_at_risk=True for non-green grades. Green is the only false case.
Grade mapping is stable:
- past_effective_date → red
- scheduled_within_12mo → orange
- scheduled_beyond_12mo / undated_shutdown → yellow
- mac_freeze_only → blue
- no_evidence_found → green
- insufficient_confidence → green (the evidence is too weak to classify confidently)
Structured city evidence beats text-city when both are present.
parsed_state_fallback is penalized in status_confidence.
Geography conflict never changes grade but always emits a caveat.
Airport sub-geographies like Miami Airport can promote to a direct match for the parent city.
Nearby-municipality evidence uses the configured threshold (3 by default) and can nudge a direct green/blue result up to yellow, but never above yellow.
Nearby evidence never downgrades a direct grade.
MAC-Freeze notices grade blue on their own and remain follow-on context when shutdown evidence is present.
State-level MAC-Freeze filings grade city searches as blue with status_confidence=low when no shutdown evidence conflicts; the caveat must name the state.
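The grade mapping and the nearby-evidence rules above can be sketched as a few pure functions. This is an illustration only, not the code in services/area_risk.py: the names GRADE_BY_STATUS, SEVERITY, grade_for_status, area_at_risk, and apply_nearby_nudge are hypothetical, and both the severity ordering and the "count >= threshold" comparison are assumptions.

```python
# Hypothetical sketch of the invariants listed above. The real logic lives in
# services/area_risk.py and may be structured differently.

# Status -> grade, copied from the mapping in this section.
GRADE_BY_STATUS = {
    "past_effective_date": "red",
    "scheduled_within_12mo": "orange",
    "scheduled_beyond_12mo": "yellow",
    "undated_shutdown": "yellow",
    "mac_freeze_only": "blue",
    "no_evidence_found": "green",
    "insufficient_confidence": "green",
}

# Assumed ordering, least severe first.
SEVERITY = ["green", "blue", "yellow", "orange", "red"]


def grade_for_status(status: str) -> str:
    """Look up the grade for a status; unknown statuses raise KeyError."""
    return GRADE_BY_STATUS[status]


def area_at_risk(grade: str) -> bool:
    """Green is the only grade that is not at risk."""
    return grade != "green"


def apply_nearby_nudge(direct_grade: str, nearby_count: int, threshold: int = 3) -> str:
    """Raise a green/blue direct grade to yellow when enough nearby
    municipalities carry evidence. Never lowers a grade, never exceeds yellow."""
    below_yellow = SEVERITY.index(direct_grade) < SEVERITY.index("yellow")
    if nearby_count >= threshold and below_yellow:
        return "yellow"
    return direct_grade
```

A test in test_area_risk_grading.py would typically pin each row of the mapping and both directions of the nudge (raises green/blue, leaves yellow and above untouched).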

MAC-Freeze vs shutdown

Any change under parsers/source_specific.py that touches signal_family assignment must add or update a test under test_parsers.py covering:

A pure shutdown notice stays shutdown even when prose contains weak MAC-Freeze keywords (withdraw, grandfather).
A notice with explicit restriction tokens and availability / grandfather language becomes att_mac_freeze.
For a notice with only one weak signal, the classifier abstains (the notice stays shutdown) and logs classifier=att_mac_freeze_guard decision=abstain.
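The three cases above could be exercised against a guard shaped roughly like the one below. This is a sketch under stated assumptions, not the classifier in parsers/source_specific.py: classify_signal_family, RESTRICTION_TOKENS, and WEAK_KEYWORDS are hypothetical, and the token lists are invented for illustration. Only the signal_family values and the log line come from this document.

```python
import logging

logger = logging.getLogger("att_mac_freeze_guard")

# Hypothetical token lists, for illustration only.
RESTRICTION_TOKENS = ("no new orders", "moves, adds, or changes")
WEAK_KEYWORDS = ("withdraw", "grandfather", "availability")


def classify_signal_family(text: str) -> str:
    """Return 'att_mac_freeze' only when an explicit restriction token AND
    availability/grandfather-style language are both present."""
    lowered = text.lower()
    has_restriction = any(token in lowered for token in RESTRICTION_TOKENS)
    has_weak = any(keyword in lowered for keyword in WEAK_KEYWORDS)
    if has_restriction and has_weak:
        return "att_mac_freeze"
    if has_restriction or has_weak:
        # Only one kind of signal is present: abstain and keep the default.
        logger.info("classifier=att_mac_freeze_guard decision=abstain")
    return "shutdown"
```

In a pytest test, the abstain log line can be asserted with the built-in caplog fixture.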

Trust gate

Any change under services/trust.py or the require_queryable_corpus dependency must cover:

Empty corpus → is_queryable=false → /search, /area-risk, /match/address, /coverage return 503.
Degraded due to missing coverage metadata → 503 on gated routes.
Healthy corpus → 200 on gated routes.
/trust-summary, /notices/*, /dashboard/*, /healthz, /readyz stay available regardless.
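One way to keep these cases from drifting is a single expectation table that the API tests iterate over. The sketch below assumes a hypothetical helper, expected_status, and a hypothetical GATED_PREFIXES tuple; the route names come from this section and the Bulk lookup section, and the helper matches on path only (no query strings).

```python
# Hypothetical expectation helper for trust-gate tests; not the actual
# require_queryable_corpus dependency.

# Routes that sit behind the trust gate, per this document.
GATED_PREFIXES = ("/search", "/area-risk", "/match/address", "/coverage", "/bulk-lookup")


def is_gated(path: str) -> bool:
    """True when the path is a gated route or a sub-path of one."""
    return any(path == prefix or path.startswith(prefix + "/") for prefix in GATED_PREFIXES)


def expected_status(path: str, is_queryable: bool) -> int:
    """503 for gated routes on a non-queryable corpus, otherwise 200."""
    if is_gated(path) and not is_queryable:
        return 503
    return 200
```

A parametrized API test can then request each path under an empty, degraded, and healthy corpus and compare the response code with expected_status(path, is_queryable).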

Bulk lookup

Any change under services/bulk_lookup.py, the /bulk-lookup/* routes, or the frontend Bulk Lookup page must cover:

Parser accepts case-insensitive city/state headers and the supported aliases (Location City, Province, town, municipality).
Parser rejects non-xlsx uploads, missing required columns, and files over bulk_lookup_max_rows.
State normalization accepts both two-letter abbreviations and full state names, and flags unrecognized states without failing valid rows.
Partial-failure tolerance: valid rows still process when sibling rows are missing city/state or contain invalid states.
Output xlsx has exactly the original columns plus color, grade_letter, as_of, and notes, and includes a Summary sheet with color counts, flagged count, top carriers, and metadata.
Background jobs transition from queued to running to either completed or failed, using a fresh SQLAlchemy session inside the worker thread.
Expired jobs are swept by both the scheduled job and the cleanup-bulk-lookup-jobs CLI path; each sweep nulls blob references after deletion.
All four /bulk-lookup/* endpoints remain protected by the same require_queryable_corpus trust gate used by Search and Area Risk.
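The header and state rules above can be illustrated with two small helpers. This is a sketch, not the code in services/bulk_lookup.py: resolve_columns and normalize_state are hypothetical names, the assignment of each alias to city or state is an assumption, and the state table is truncated to three entries for brevity.

```python
# Hypothetical helpers illustrating the parser rules above.

# Assumed alias grouping; the document lists the aliases but not their targets.
CITY_ALIASES = {"city", "location city", "town", "municipality"}
STATE_ALIASES = {"state", "province"}

# Truncated for illustration; a real table would cover every state.
US_STATES = {"IL": "Illinois", "FL": "Florida", "TX": "Texas"}
_NAME_TO_ABBR = {name.lower(): abbr for abbr, name in US_STATES.items()}


def resolve_columns(headers):
    """Map case-insensitive headers to {'city': index, 'state': index}.
    Raises ValueError when a required column is missing."""
    found = {}
    for index, header in enumerate(headers):
        key = str(header or "").strip().lower()
        if key in CITY_ALIASES:
            found.setdefault("city", index)
        elif key in STATE_ALIASES:
            found.setdefault("state", index)
    missing = {"city", "state"} - set(found)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return found


def normalize_state(raw):
    """Return a two-letter code, or None so the caller can flag the row
    without failing its valid siblings."""
    value = str(raw or "").strip()
    if value.upper() in US_STATES:
        return value.upper()
    return _NAME_TO_ABBR.get(value.lower())
```

Returning None instead of raising from normalize_state is what lets the partial-failure tests keep processing the remaining rows.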

Admin auth

Changes touching any /admin/* endpoint must cover:

Disabled (no key configured) → 503.
Missing header → 401.
Wrong key → 401 (constant-time comparison).
Correct key → 2xx success path, and one audit row written.
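The status matrix above maps onto a small decision function. The sketch below is not the project's admin dependency; admin_auth_status and its parameters are hypothetical. It uses hmac.compare_digest from the standard library for the constant-time comparison and leaves the audit-row write to the caller.

```python
import hmac
from typing import Optional


def admin_auth_status(configured_key: Optional[str], presented_key: Optional[str]) -> int:
    """Return the HTTP status an admin request should receive."""
    if not configured_key:
        # Admin endpoints are disabled when no key is configured.
        return 503
    if presented_key is None:
        # Header missing entirely.
        return 401
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(presented_key.encode("utf-8"), configured_key.encode("utf-8")):
        return 401
    return 200
```

A full endpoint test would additionally count audit rows before and after the correct-key request and assert that exactly one was added.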

Parsing

Use fixture corpora — never assert on strings pulled from the live web. Fixtures live under tests/fixtures/. Every fixture should carry a short comment at the top of the file explaining which carrier / notice family it represents and which invariant it's there to protect.

Fixtures

Adding an area-risk fixture

Add or update a scenario entry in tests/fixtures/area_risk/regressions.json:

[
  {
    "scenario_id": "historical_past_target_returns_red",
    "request": { "city": "Chicago", "state": "IL" },
    "seed_notices": [
      {
        "city": "Chicago",
        "state": "IL",
        "title": "Archived Chicago copper retirement notice",
        "summary": "Archived Chicago copper retirement notice.",
        "impact_text": "Chicago legacy analog service was retired in a prior filing.",
        "source_excerpt": "Chicago historical analog shutdown evidence.",
        "notice_id": "CHICAGO-ARCHIVED",
        "issue_date": "2024-01-15",
        "target_date": "2024-01-15",
        "is_active": false
      }
    ],
    "expected": {
      "grade_bucket": "red",
      "status": "past_effective_date",
      "area_at_risk": true,
      "earliest_past_target_date": "2024-01-15",
      "supporting_notice_count": 1
    }
  }
]

The parametrized loader in test_area_risk_grading.py reads the scenario list, seeds the in-memory test DB from seed_notices, and asserts the expected block. Use the same keys that seed_notice(...) accepts, plus optional parsed_states, location_state, and signal_family for geography conflicts and MAC-Freeze exclusions. The regression projection also exposes supporting_notice_count, supporting_match_sources, supporting_geography_conflicts, and nearby_municipality_count for compact assertions.
Also update area-risk-city-qa-checklist.md if the scenario belongs in the pre-release regression sweep.
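For orientation, a loader of this kind can be as small as the sketch below. It is not the loader in test_area_risk_grading.py: load_scenarios and mismatches are hypothetical names, and the duplicate-ID check is an added assumption. Seeding the database and computing the regression projection are left out.

```python
import json
from pathlib import Path


def load_scenarios(path):
    """Read the scenario list and reject duplicate scenario_id values,
    since the IDs double as test IDs."""
    scenarios = json.loads(Path(path).read_text(encoding="utf-8"))
    ids = [scenario["scenario_id"] for scenario in scenarios]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate scenario_id in regressions file")
    return scenarios


def mismatches(expected, actual):
    """Return {key: (expected, actual)} for each expected key that differs.
    Keys absent from `expected` are ignored, so scenarios stay compact."""
    return {
        key: (value, actual.get(key))
        for key, value in expected.items()
        if actual.get(key) != value
    }


# With pytest, the scenarios would typically feed a parametrized test, e.g.
#   @pytest.mark.parametrize("scenario", SCENARIOS, ids=lambda s: s["scenario_id"])
# followed by: assert mismatches(scenario["expected"], projection) == {}
```

Comparing only the keys present in the expected block means a scenario can assert on grade_bucket alone without restating every projected field.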

Adding a parser fixture

Place raw notice text under tests/fixtures/<carrier>_<short_slug>.txt (for HTML/PDF sources, save the extracted text, since the parser operates on text).
Prepend a one-line comment describing the fixture.
Add a test_parsers.py entry asserting the parsed signal_family, notice_type, rule_family, restriction_types, states, and any distinctive fields.
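If the leading comment should not reach the parser, a test helper can split it off before parsing. The sketch below assumes the comment is a single first line starting with "#"; that marker and the helper name split_fixture are assumptions, since this document does not specify a comment syntax for fixtures.

```python
def split_fixture(raw: str):
    """Split a fixture into (description, notice_text).

    Assumes the first line is a '#' comment describing the fixture; raises
    ValueError otherwise so undocumented fixtures fail loudly."""
    first_line, _, rest = raw.partition("\n")
    if not first_line.startswith("#"):
        raise ValueError("fixture must start with a one-line '#' comment")
    return first_line.lstrip("#").strip(), rest
```

A test would read the file, call split_fixture, pass the notice text to the parser, and assert on the fields listed in the last step.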

CI

The CI workflow (.github/workflows/ci.yml) runs, per P3-T1:

Backend ruff + pytest against a Postgres service container.
Frontend typecheck + lint + vitest + build.
Area-risk regression gate (targeted pytest + the check-area-risk-conservative-framing.sh script).

Hosted smoke

backend/app/pots_shutdown_tracker/scripts/smoke_hosted.py hits a real deployment's trust summary, search, and dashboard routes. It intentionally fails when is_queryable=false; that failure is the correct signal for an empty Space.

Run it via:

python3 -m pots_shutdown_tracker.scripts.smoke_hosted \
  --base-url "$BASE_URL" \
  --min-structured-results 1

Coverage expectations

There is no hard coverage threshold. Instead, any new business logic must come with a named test that would fail if the logic regressed. Reviewers should push back on untested behavior changes.

What we do not test

External network calls. Connectors are tested with recorded fixtures.
OpenAI responses. AI service tests use a fake client and exercise the deterministic grounded fallback.
HF dataset uploads. Mocked via HfApi fakes.
Neon-specific features. These are not unit-tested; integration coverage comes from the hosted smoke run.