# Testing Strategy
Philosophy: tests must encode the product's conservative framing. The
bar is higher for area-risk, trust summary, matching, and signal-family
classification than for plumbing.
## Test layout
```
backend/app/pots_shutdown_tracker/tests/
  fixtures/                   # HTML/PDF/TXT carrier samples and
                              # JSON area-risk scenarios
  test_ai.py
  test_api.py                 # end-to-end request/response
  test_area_risk_grading.py   # NEW in P1-T1 / P3-T2
  test_bulk_lookup.py         # xlsx parser, worker, cleanup
  test_config.py
  test_connectors.py
  test_crawler.py
  test_embeddings.py
  test_extractors.py
  test_locks.py
  test_matching.py
  test_parsers.py
  test_policy.py
  test_review.py
  test_runtime_startup.py
  test_scheduler.py
  test_signal_links.py        # NEW in P2-T6
  test_smoke_hosted.py
  test_storage.py
```
Frontend tests live under `frontend/src/**/*.test.tsx` (Vitest +
Testing Library).
## Minimum local gate before PR
```bash
# backend (run from the repository root)
cd backend
ruff check app
ruff format --check app
POTS_TRACKER_DB_URL=sqlite:///:memory: \
POTS_TRACKER_AUTO_CREATE_SCHEMA=true \
POTS_TRACKER_ENABLE_AI=false \
OPENAI_API_KEY= \
  pytest -q

# frontend (backend/ and frontend/ are siblings)
cd ../frontend
npm run typecheck
npm run lint
npm run test
npm run build
```
## Category-by-category requirements
### Area-risk (highest-risk logic)
Any change under `services/area_risk.py` or `services/coverage.py`
must extend `test_area_risk_grading.py`. Required invariants the tests
must encode:
1. **Past-target shutdowns are first-class evidence.** They grade red
permanently, regardless of `is_active`, and there is no separate
historical-context channel anymore.
2. **`area_at_risk=True` for every non-green grade.** Green is the only
grade with `area_at_risk=False`.
3. **Grade mapping is stable:**
- `past_effective_date` → red
- `scheduled_within_12mo` → orange
- `scheduled_beyond_12mo` / `undated_shutdown` → yellow
- `mac_freeze_only` → blue
- `no_evidence_found` → green
- `insufficient_confidence` stays green when the evidence is too
weak to classify confidently
4. **Structured city evidence beats text-city** when both are present.
5. **`parsed_state_fallback` is penalized** in `status_confidence`.
6. **Geography conflict never changes grade** but always emits a
caveat.
7. **Airport sub-geographies** like `Miami Airport` can promote to a
direct match for the parent city.
8. **Nearby-municipality evidence** uses the configured threshold (3 by
default) and can nudge a direct green/blue result up to yellow, but
never above yellow.
9. **Nearby evidence never downgrades** a direct grade.
10. **MAC-Freeze notices** grade blue on their own and remain
follow-on context when shutdown evidence is present.
11. **State-level MAC-Freeze filings** grade city searches as blue with
`status_confidence=low` when no shutdown evidence conflicts; the
caveat must name the state.
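Invariants 2, 3, 8, and 9 are compact enough to pin down as a pure function, which makes them easy to property-test. The sketch below is illustrative only: `Grade`, `STATUS_TO_GRADE`, `area_at_risk`, and `apply_nearby_evidence` are hypothetical names, not the API of `services/area_risk.py`; the severity order green < blue < yellow < orange < red and the inclusive `>=` threshold check are assumptions.

```python
from enum import IntEnum


class Grade(IntEnum):
    # Assumed severity order: a larger value is a more severe grade.
    GREEN = 0
    BLUE = 1
    YELLOW = 2
    ORANGE = 3
    RED = 4


# Invariant 3: status -> grade mapping, using the statuses listed above.
STATUS_TO_GRADE = {
    "past_effective_date": Grade.RED,
    "scheduled_within_12mo": Grade.ORANGE,
    "scheduled_beyond_12mo": Grade.YELLOW,
    "undated_shutdown": Grade.YELLOW,
    "mac_freeze_only": Grade.BLUE,
    "no_evidence_found": Grade.GREEN,
    "insufficient_confidence": Grade.GREEN,
}


def area_at_risk(grade: Grade) -> bool:
    # Invariant 2: green is the only grade that is not "at risk".
    return grade is not Grade.GREEN


def apply_nearby_evidence(direct: Grade, nearby_count: int, threshold: int = 3) -> Grade:
    # Invariant 8: enough nearby evidence lifts green/blue to yellow, never higher.
    # The inclusive >= comparison is an assumption about the threshold semantics.
    if nearby_count >= threshold and direct < Grade.YELLOW:
        return Grade.YELLOW
    # Invariant 9: otherwise the direct grade is returned unchanged (never downgraded).
    return direct
```

Because the grades are ordered, invariant 9 reduces to a one-line property check: for every grade `g`, `apply_nearby_evidence(g, n) >= g`.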
### MAC-Freeze vs shutdown
Any change under `parsers/source_specific.py` that touches
`signal_family` assignment must add or update a test in
`test_parsers.py` covering:
- A pure shutdown notice stays `shutdown` even when prose contains
weak MAC-Freeze keywords (`withdraw`, `grandfather`).
- A notice with explicit restriction tokens **and** availability /
grandfather language becomes `att_mac_freeze`.
- A notice with one weak signal abstains (stays `shutdown`) and logs
`classifier=att_mac_freeze_guard decision=abstain`.
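The three cases above describe a two-signal guard that only reclassifies when both kinds of evidence are present. A minimal sketch, assuming hypothetical token lists (`RESTRICTION_TOKENS`, `AVAILABILITY_TOKENS`) and a hypothetical `classify_signal_family` function; the real vocabulary and structure of `parsers/source_specific.py` are not shown in this doc, and only the log line format is taken from it.

```python
import logging

logger = logging.getLogger("att_mac_freeze_guard")

# Hypothetical vocabularies; the real lists in the parser may differ.
RESTRICTION_TOKENS = ("no new orders", "moves, adds, and changes")
AVAILABILITY_TOKENS = ("no longer available", "grandfather", "withdraw")


def classify_signal_family(text: str) -> str:
    lowered = text.lower()
    has_restriction = any(token in lowered for token in RESTRICTION_TOKENS)
    has_availability = any(token in lowered for token in AVAILABILITY_TOKENS)
    if has_restriction and has_availability:
        # Both kinds of evidence: classify as a MAC-Freeze notice.
        return "att_mac_freeze"
    if has_restriction or has_availability:
        # A single weak signal: abstain and keep the conservative default.
        logger.info("classifier=att_mac_freeze_guard decision=abstain")
    return "shutdown"
```

Defaulting to `shutdown` on abstention keeps the failure mode conservative: a misread notice is over-reported as a shutdown rather than softened to a freeze.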
### Trust gate
Any change under `services/trust.py` or the `require_queryable_corpus`
dependency must cover:
- Empty corpus → `is_queryable=false` → `/search`, `/area-risk`,
`/match/address`, `/coverage` return 503.
- Degraded due to missing coverage metadata → 503 on gated routes.
- Healthy corpus → 200 on gated routes.
- `/trust-summary`, `/notices/*`, `/dashboard/*`, `/healthz`, `/readyz`
stay available regardless.
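These cases form a small route-by-corpus-state matrix that is convenient to drive from one table in a parametrized test. The sketch encodes the expected outcomes as plain Python; `is_queryable`, `is_gated`, and `expected_status` are hypothetical helpers, the rule that an empty or metadata-less corpus is not queryable is a simplification of `services/trust.py`, and `/bulk-lookup` is included because the Bulk lookup requirements put it behind the same gate.

```python
# Route prefixes this doc lists as gated by require_queryable_corpus.
GATED_PREFIXES = ("/search", "/area-risk", "/match/address", "/coverage", "/bulk-lookup")


def is_queryable(notice_count: int, has_coverage_metadata: bool) -> bool:
    # Assumed rule: an empty corpus, or one degraded by missing coverage
    # metadata, is not queryable.
    return notice_count > 0 and has_coverage_metadata


def is_gated(path: str) -> bool:
    # Match the prefix itself or any sub-path, but not e.g. "/searchable".
    return any(path == prefix or path.startswith(prefix + "/") for prefix in GATED_PREFIXES)


def expected_status(path: str, queryable: bool) -> int:
    # Gated routes return 503 until the corpus is queryable; everything else stays up.
    if is_gated(path) and not queryable:
        return 503
    return 200
```

In a real test, each `(path, corpus state)` row would be sent through the app's test client and compared against `expected_status`.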
### Bulk lookup
Any change under `services/bulk_lookup.py`, the `/bulk-lookup/*`
routes, or the frontend Bulk Lookup page must cover:
- Parser accepts case-insensitive city/state headers and the supported
aliases (`Location City`, `Province`, `town`, `municipality`).
- Parser rejects non-xlsx uploads, missing required columns, and files
over `bulk_lookup_max_rows`.
- State normalization accepts both two-letter abbreviations and full
state names, and flags unrecognized states without failing valid rows.
- Partial-failure tolerance: valid rows still process when sibling rows
are missing city/state or contain invalid states.
- Output xlsx has exactly the original columns plus `color`,
`grade_letter`, `as_of`, and `notes`, and includes a `Summary` sheet
with color counts, flagged count, top carriers, and metadata.
- Background jobs transition from queued to running to either completed
or failed, using a fresh SQLAlchemy session inside the worker thread.
- Expired jobs are swept by both scheduler wiring and the
`cleanup-bulk-lookup-jobs` CLI path, nulling blob references after
deletion.
- All four `/bulk-lookup/*` endpoints remain protected by the same
`require_queryable_corpus` trust gate used by Search and Area Risk.
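The header, state-normalization, and partial-failure requirements can be sketched independently of xlsx I/O. This is an illustration, not `services/bulk_lookup.py`: `resolve_columns`, `normalize_state`, and `validate_rows` are hypothetical, `max_rows` stands in for the `bulk_lookup_max_rows` setting, and `STATE_NAMES` is deliberately truncated to three states.

```python
from typing import Optional

# Header aliases named in this doc; matching is case-insensitive.
CITY_HEADERS = {"city", "location city", "town", "municipality"}
STATE_HEADERS = {"state", "province"}

# Truncated for illustration; a real table would cover every state.
STATE_NAMES = {"illinois": "IL", "florida": "FL", "texas": "TX"}
STATE_CODES = set(STATE_NAMES.values())


def resolve_columns(headers: list) -> tuple:
    """Return (city_index, state_index); raise ValueError if either is missing."""
    city_idx = state_idx = None
    for i, header in enumerate(headers):
        key = str(header or "").strip().lower()
        if key in CITY_HEADERS and city_idx is None:
            city_idx = i
        elif key in STATE_HEADERS and state_idx is None:
            state_idx = i
    if city_idx is None or state_idx is None:
        raise ValueError("missing required city/state column")
    return city_idx, state_idx


def normalize_state(raw) -> Optional[str]:
    """Accept 'IL', 'il', or 'Illinois'; return None if unrecognized."""
    value = str(raw or "").strip()
    if value.upper() in STATE_CODES:
        return value.upper()
    return STATE_NAMES.get(value.lower())


def validate_rows(headers: list, rows: list, max_rows: int) -> list:
    """Flag bad rows with a note instead of failing the whole upload."""
    if len(rows) > max_rows:
        # File-level problem: reject the upload outright.
        raise ValueError(f"{len(rows)} rows exceeds the limit of {max_rows}")
    city_idx, state_idx = resolve_columns(headers)
    results = []
    for row in rows:
        city = str(row[city_idx] or "").strip()
        raw_state = str(row[state_idx] or "").strip()
        state = normalize_state(raw_state)
        if not city or not raw_state:
            note = "missing city/state"
        elif state is None:
            note = f"unrecognized state: {raw_state}"
        else:
            note = ""
        results.append({"city": city, "state": state, "note": note})
    return results
```

The split matters for the tests: whole-file problems (missing columns, too many rows) raise, while per-row problems become notes so sibling rows still process.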
### Admin auth
Changes touching any `/admin/*` endpoint must cover:
- Disabled (no key configured) → 503.
- Missing header → 401.
- Wrong key → 401 (constant-time comparison).
- Correct key → 2xx success path, and one audit row written.
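The four admin-auth cases map onto a short decision function. A minimal sketch, assuming a hypothetical `authorize_admin` helper and a plain list as the audit log; the header name and the real audit table are not specified here. `hmac.compare_digest` is the standard-library comparison designed to resist timing attacks.

```python
import hmac
from typing import Optional


def authorize_admin(configured_key: Optional[str], provided_key: Optional[str],
                    action: str, audit_log: list) -> int:
    """Return the HTTP status an /admin/* request should receive (hypothetical)."""
    if not configured_key:
        return 503  # admin surface disabled: no key configured
    if not provided_key:
        return 401  # header missing
    # compare_digest avoids short-circuiting on the first differing byte.
    if not hmac.compare_digest(provided_key.encode("utf-8"), configured_key.encode("utf-8")):
        return 401  # wrong key
    audit_log.append({"action": action})  # exactly one audit row per success
    return 200
```

A test for the success path should assert both the status and that the audit log grew by exactly one entry, and the failure paths should assert it did not grow.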
### Parsing
Use fixture corpora; never assert on strings pulled from the live web.
Fixtures live under `tests/fixtures/`. Every fixture should carry a
short comment at the top of the file explaining which carrier / notice
family it represents and which invariant it's there to protect.
## Fixtures
### Adding an area-risk fixture
1. Add or update a scenario entry in
`tests/fixtures/area_risk/regressions.json`:
```json
[
{
"scenario_id": "historical_past_target_returns_red",
"request": { "city": "Chicago", "state": "IL" },
"seed_notices": [
{
"city": "Chicago",
"state": "IL",
"title": "Archived Chicago copper retirement notice",
"summary": "Archived Chicago copper retirement notice.",
"impact_text": "Chicago legacy analog service was retired in a prior filing.",
"source_excerpt": "Chicago historical analog shutdown evidence.",
"notice_id": "CHICAGO-ARCHIVED",
"issue_date": "2024-01-15",
"target_date": "2024-01-15",
"is_active": false
}
],
"expected": {
"grade_bucket": "red",
"status": "past_effective_date",
"area_at_risk": true,
"earliest_past_target_date": "2024-01-15",
"supporting_notice_count": 1
}
}
]
```
2. The parametrized loader in `test_area_risk_grading.py` reads the
scenario list, seeds the in-memory test DB from `seed_notices`, and
asserts the `expected` block. Use the same keys that
`seed_notice(...)` accepts, plus optional `parsed_states`,
`location_state`, and `signal_family` for geography conflicts and
MAC-Freeze exclusions. The regression projection also exposes
`supporting_notice_count`, `supporting_match_sources`,
`supporting_geography_conflicts`, and `nearby_municipality_count`
for compact assertions.
3. Update the
[area-risk-city-qa-checklist.md](area-risk-city-qa-checklist.md) if
the scenario should also be in the pre-release regression sweep.
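The loader's two jobs, reading the scenario list and comparing only the keys a scenario pins, can be sketched as follows. `parse_scenarios`, `load_scenarios`, and `projection_mismatches` are hypothetical names, and treating `scenario_id`, `request`, `seed_notices`, and `expected` as required keys is an assumption based on the example above. Seeding the database and computing the projection are left out.

```python
import json
from pathlib import Path

# Assumed required keys, taken from the example scenario.
REQUIRED_KEYS = {"scenario_id", "request", "seed_notices", "expected"}


def parse_scenarios(text: str) -> list:
    """Parse a regressions.json payload and check each entry has the required keys."""
    scenarios = json.loads(text)
    for scenario in scenarios:
        missing = REQUIRED_KEYS - set(scenario)
        if missing:
            raise ValueError(f"{scenario.get('scenario_id', '?')}: missing {sorted(missing)}")
    return scenarios


def load_scenarios(path) -> list:
    return parse_scenarios(Path(path).read_text(encoding="utf-8"))


def projection_mismatches(projection: dict, expected: dict) -> dict:
    """Return {key: (expected, actual)} for every pinned key that differs."""
    return {
        key: (want, projection.get(key))
        for key, want in expected.items()
        if projection.get(key) != want
    }

# In pytest, the scenarios would typically drive a parametrized test, e.g.:
# @pytest.mark.parametrize("scenario", load_scenarios(PATH), ids=lambda s: s["scenario_id"])
```

Comparing only pinned keys keeps scenarios compact: a scenario that does not mention `nearby_municipality_count` is not broken when that field changes.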
### Adding a parser fixture
1. Place raw notice text under
`tests/fixtures/<carrier>_<short_slug>.txt` (for HTML/PDF, serialize the
extracted text, since the parser operates on text).
2. Prepend a one-line comment describing the fixture.
3. Add a `test_parsers.py` entry asserting the parsed
`signal_family`, `notice_type`, `rule_family`,
`restriction_types`, `states`, and any distinctive fields.
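Because each text fixture starts with a description line, tests should strip that line before handing the body to the parser. A minimal sketch, assuming the comment uses a `#` prefix; `split_fixture` and `read_fixture` are hypothetical helpers, not existing test utilities.

```python
from pathlib import Path
from typing import Tuple


def split_fixture(raw: str, comment_prefix: str = "#") -> Tuple[str, str]:
    """Split fixture text into (description, body), requiring a leading comment line."""
    first_line, _, body = raw.partition("\n")
    if not first_line.startswith(comment_prefix):
        raise ValueError("fixture is missing its one-line description comment")
    return first_line[len(comment_prefix):].strip(), body


def read_fixture(path) -> Tuple[str, str]:
    # Fixtures are plain text, so read them as UTF-8 and split off the header.
    return split_fixture(Path(path).read_text(encoding="utf-8"))
```

Failing loudly on a missing header enforces the one-line-comment rule at test time instead of relying on review.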
## CI
The CI workflow (`.github/workflows/ci.yml`) runs, per P3-T1:
- Backend ruff + pytest against a Postgres service container.
- Frontend typecheck + lint + vitest + build.
- Area-risk regression gate (targeted pytest + the
`check-area-risk-conservative-framing.sh` script).
## Hosted smoke
`backend/app/pots_shutdown_tracker/scripts/smoke_hosted.py` hits a real
deployment's trust summary, search, and dashboard routes. It
intentionally fails when `is_queryable=false`; that failure is the
correct signal for an empty Space.
Run it via:
```
python3 -m pots_shutdown_tracker.scripts.smoke_hosted \
--base-url "$BASE_URL" \
--min-structured-results 1
```
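The trust-summary part of the smoke check can be sketched with the standard library. This is not the contents of `smoke_hosted.py`: it assumes the trust summary is served as JSON at `<base-url>/trust-summary` with a top-level `is_queryable` field, and `fetch_trust_summary` / `trust_gate_failure` are hypothetical names. Keeping the pass/fail decision in a pure function lets it be unit-tested without a live deployment.

```python
import json
import urllib.request
from typing import Optional


def fetch_trust_summary(base_url: str, timeout: float = 10.0) -> dict:
    # Assumes the route lives at <base_url>/trust-summary and returns JSON.
    url = f"{base_url.rstrip('/')}/trust-summary"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def trust_gate_failure(summary: dict) -> Optional[str]:
    """Return an error message when the deployment is not queryable, else None."""
    # A missing field is treated as not queryable, matching the conservative default.
    if not summary.get("is_queryable", False):
        return "trust summary reports is_queryable=false; corpus is empty or degraded"
    return None
```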
## Coverage expectations
There is no hard coverage threshold. Instead, require that any new business
logic comes with a named test that would fail if the logic regressed.
Reviewers should push back on untested behavior change.
## What we do not test
- External network calls. Connectors are tested with recorded fixtures.
- OpenAI responses. AI service tests use a fake client and exercise the
deterministic grounded fallback.
- HF dataset uploads. Mocked via `HfApi` fakes.
- Neon-specific features in unit tests; integration coverage for these
lives at the hosted smoke level.
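The fake-client pattern for AI tests can be sketched as follows. `FakeAIClient.complete` and `summarize` are hypothetical and do not mirror the OpenAI SDK or the project's AI service. The point is that the fake records calls, never touches the network, and lets a test force the deterministic grounded fallback.

```python
class FakeAIClient:
    """Stand-in for a model client; never makes network calls (hypothetical interface)."""

    def __init__(self, fail: bool = True):
        self.fail = fail
        self.calls = []

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)  # record prompts so tests can assert on them
        if self.fail:
            raise RuntimeError("simulated model outage")
        return "model answer"


def summarize(notice_excerpt: str, client) -> str:
    try:
        return client.complete(f"Summarize: {notice_excerpt}")
    except RuntimeError:
        # Deterministic grounded fallback: quote the source instead of generating text.
        return f"Source excerpt: {notice_excerpt}"
```

The same shape works for `HfApi` fakes: record the arguments of each upload call and assert on them instead of pushing to a real dataset.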