# Testing Strategy

Philosophy: tests must encode the product's conservative framing. The
bar is higher for area-risk, trust summary, matching, and signal-family
classification than for plumbing.

## Test layout

```
backend/app/pots_shutdown_tracker/tests/
  fixtures/                         # HTML/PDF/TXT carrier samples and
                                    # JSON area-risk scenarios
  test_ai.py
  test_api.py                       # end-to-end request/response
  test_area_risk_grading.py         # NEW in P1-T1 / P3-T2
  test_bulk_lookup.py               # xlsx parser, worker, cleanup
  test_config.py
  test_connectors.py
  test_crawler.py
  test_embeddings.py
  test_extractors.py
  test_locks.py
  test_matching.py
  test_parsers.py
  test_policy.py
  test_review.py
  test_runtime_startup.py
  test_scheduler.py
  test_signal_links.py              # NEW in P2-T6
  test_smoke_hosted.py
  test_storage.py
```

Frontend tests live under `frontend/src/**/*.test.tsx` (Vitest +
Testing Library).

## Minimum local gate before PR

```
# backend
cd backend
ruff check app
ruff format --check app
POTS_TRACKER_DB_URL=sqlite:///:memory: \
POTS_TRACKER_AUTO_CREATE_SCHEMA=true \
POTS_TRACKER_ENABLE_AI=false \
OPENAI_API_KEY= \
pytest -q

# frontend
cd ../frontend
npm run typecheck
npm run lint
npm run test
npm run build
```

## Category-by-category requirements

### Area-risk (highest-risk logic)

Any change under `services/area_risk.py` or `services/coverage.py`
must extend `test_area_risk_grading.py`. Required invariants the tests
must encode:

1. **Past-target shutdowns are first-class evidence.** They grade red
   permanently, regardless of `is_active`, and there is no separate
   historical-context channel anymore.
2. **`area_at_risk=True` for non-green grades.** Green is the only
   false case.
3. **Grade mapping is stable:**
   - `past_effective_date` → red
   - `scheduled_within_12mo` → orange
   - `scheduled_beyond_12mo` / `undated_shutdown` → yellow
   - `mac_freeze_only` → blue
   - `no_evidence_found` → green
   - `insufficient_confidence` → green (the evidence is too weak to
     classify confidently)
4. **Structured city evidence beats text-city** when both are present.
5. **`parsed_state_fallback` is penalized** in `status_confidence`.
6. **Geography conflict never changes grade** but always emits a
   caveat.
7. **Airport sub-geographies** like `Miami Airport` can be promoted to
   a direct match for the parent city.
8. **Nearby-municipality evidence** uses the configured threshold (3 by
   default) and can nudge a direct green/blue result up to yellow, but
   never above yellow.
9. **Nearby evidence never downgrades** a direct grade.
10. **MAC-Freeze notices** grade blue on their own and remain
    follow-on context when shutdown evidence is present.
11. **State-level MAC-Freeze filings** grade city searches as blue with
    `status_confidence=low` when no shutdown evidence conflicts; the
    caveat must name the state.
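
As an illustration of invariants 2, 3, 8, and 9, the mapping can be written down as a small table that a parametrized test checks row by row. This is a minimal sketch, not the code in `services/area_risk.py`: `GRADE_BY_STATUS`, `area_at_risk`, and `apply_nearby_nudge` are hypothetical names, and treating the threshold as inclusive (`>=`) is an assumption.

```python
# Hypothetical sketch of the grade invariants; names are illustrative only.
GRADE_BY_STATUS = {
    "past_effective_date": "red",
    "scheduled_within_12mo": "orange",
    "scheduled_beyond_12mo": "yellow",
    "undated_shutdown": "yellow",
    "mac_freeze_only": "blue",
    "no_evidence_found": "green",
    "insufficient_confidence": "green",
}


def area_at_risk(grade: str) -> bool:
    """Invariant 2: green is the only grade that is not at risk."""
    return grade != "green"


def apply_nearby_nudge(direct_grade: str, nearby_count: int, threshold: int = 3) -> str:
    """Invariants 8 and 9: nearby evidence lifts green/blue to yellow at most.

    Assumption: the threshold is inclusive. Grades at or above yellow are
    returned unchanged, so nearby evidence never downgrades a direct grade.
    """
    if nearby_count >= threshold and direct_grade in ("green", "blue"):
        return "yellow"
    return direct_grade
```

A test built on this table would assert every row of `GRADE_BY_STATUS` plus the boundary cases of the nudge (one below the threshold, exactly at it, and a direct red or orange grade that must not move).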

### MAC-Freeze vs shutdown

Any change under `parsers/source_specific.py` that touches
`signal_family` assignment must add or update a test under
`test_parsers.py` covering:

- A pure shutdown notice stays `shutdown` even when prose contains
  weak MAC-Freeze keywords (`withdraw`, `grandfather`).
- A notice with explicit restriction tokens **and** availability /
  grandfather language becomes `att_mac_freeze`.
- A notice with one weak signal abstains (stays `shutdown`) and logs
  `classifier=att_mac_freeze_guard decision=abstain`.
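
The three cases above can be pictured with a toy guard. This is a sketch under stated assumptions, not the logic in `parsers/source_specific.py`: `classify_signal_family` and the `RESTRICTION_TOKENS` values are invented for illustration, while the weak keywords and the log line are taken from the bullets above.

```python
import logging

logger = logging.getLogger("signal_family_sketch")

# Weak keywords come from the requirements above.
WEAK_KEYWORDS = ("withdraw", "grandfather")
# Placeholder restriction tokens, invented for this sketch.
RESTRICTION_TOKENS = ("no new orders", "moves, adds, or changes")


def classify_signal_family(text: str) -> str:
    """Return 'att_mac_freeze' only when both signal kinds are present."""
    lowered = text.lower()
    has_restriction = any(token in lowered for token in RESTRICTION_TOKENS)
    has_weak = any(keyword in lowered for keyword in WEAK_KEYWORDS)
    if has_restriction and has_weak:
        return "att_mac_freeze"
    if has_restriction or has_weak:
        # A single signal is not enough: abstain and leave a trace in the log.
        logger.info("classifier=att_mac_freeze_guard decision=abstain")
    return "shutdown"
```

In a pytest test, the abstain log line could be asserted with the built-in `caplog` fixture.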

### Trust gate

Any change under `services/trust.py` or the `require_queryable_corpus`
dependency must cover:

- Empty corpus → `is_queryable=false` → `/search`, `/area-risk`,
  `/match/address`, `/coverage` return 503.
- Degraded due to missing coverage metadata → 503 on gated routes.
- Healthy corpus → 200 on gated routes.
- `/trust-summary`, `/notices/*`, `/dashboard/*`, `/healthz`, `/readyz`
  stay available regardless.
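
One way to keep these expectations in a single place is a small route table that a parametrized API test iterates over. The sketch below is framework-free and hypothetical: `expected_status` is not part of the codebase, and matching on path prefixes is an assumption about how the routes are laid out.

```python
# Route prefixes taken from the requirements above.
GATED = ("/search", "/area-risk", "/match/address", "/coverage", "/bulk-lookup/")
ALWAYS_AVAILABLE = ("/trust-summary", "/notices/", "/dashboard/", "/healthz", "/readyz")


def expected_status(path: str, is_queryable: bool) -> int:
    """Expected HTTP status for a route given the corpus trust state."""
    if path.startswith(ALWAYS_AVAILABLE):
        return 200
    if path.startswith(GATED):
        return 200 if is_queryable else 503
    # Fail loudly so a new route cannot slip past the table unnoticed.
    raise ValueError(f"route not covered by the trust-gate table: {path}")
```

Raising on unknown paths is a deliberate choice here: adding a route without deciding whether it is gated breaks the test instead of passing silently.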

### Bulk lookup

Any change under `services/bulk_lookup.py`, the `/bulk-lookup/*`
routes, or the frontend Bulk Lookup page must cover:

- Parser accepts case-insensitive city/state headers and the supported
  aliases (`Location City`, `Province`, `town`, `municipality`).
- Parser rejects non-xlsx uploads, missing required columns, and files
  over `bulk_lookup_max_rows`.
- State normalization accepts both two-letter abbreviations and full
  state names, and flags unrecognized states without failing valid rows.
- Partial-failure tolerance: valid rows still process when sibling rows
  are missing city/state or contain invalid states.
- Output xlsx has exactly the original columns plus `color`,
  `grade_letter`, `as_of`, and `notes`, and includes a `Summary` sheet
  with color counts, flagged count, top carriers, and metadata.
- Background jobs transition from queued to running to either completed
  or failed, using a fresh SQLAlchemy session inside the worker thread.
- Expired jobs are swept by both scheduler wiring and the
  `cleanup-bulk-lookup-jobs` CLI path, nulling blob references after
  deletion.
- All four `/bulk-lookup/*` endpoints remain protected by the same
  `require_queryable_corpus` trust gate used by Search and Area Risk.
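
The header, state, and partial-failure requirements can be sketched as follows. All names here (`canonical_header`, `normalize_state`, `annotate_rows`) are hypothetical, the alias-to-column mapping is an assumption, and the state table is truncated to three entries for brevity.

```python
from typing import Optional

# Assumed mapping of the supported aliases onto the two required columns.
HEADER_ALIASES = {
    "city": "city",
    "location city": "city",
    "town": "city",
    "municipality": "city",
    "state": "state",
    "province": "state",
}

# Truncated for the sketch; a real table would cover every state.
STATE_NAMES = {"illinois": "IL", "florida": "FL", "texas": "TX"}


def canonical_header(raw: str) -> Optional[str]:
    """Case-insensitive header lookup; None means the column is ignored."""
    return HEADER_ALIASES.get(raw.strip().lower())


def normalize_state(raw: str) -> Optional[str]:
    """Accept two-letter abbreviations and full names; None if unrecognized."""
    value = raw.strip()
    if len(value) == 2 and value.upper() in STATE_NAMES.values():
        return value.upper()
    return STATE_NAMES.get(value.lower())


def annotate_rows(rows: list) -> list:
    """Flag bad rows in `notes` instead of raising, so valid rows still process."""
    annotated = []
    for row in rows:
        city = (row.get("city") or "").strip()
        state = normalize_state(row.get("state") or "")
        if not city or state is None:
            annotated.append({**row, "notes": "missing or unrecognized city/state"})
        else:
            annotated.append({**row, "state": state, "notes": ""})
    return annotated
```

A partial-failure test would feed a mix of valid and invalid rows and assert that the valid ones come back normalized while the others carry a non-empty `notes` value.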

### Admin auth

Changes touching any `/admin/*` endpoint must cover:

- Disabled (no key configured) → 503.
- Missing header → 401.
- Wrong key → 401 (constant-time comparison).
- Correct key → 2xx success path, and one audit row written.
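
The decision table above, including the constant-time comparison, can be sketched with the standard library's `hmac.compare_digest`. The function name `admin_auth_status` is hypothetical, and returning 200 stands in for whatever 2xx code the real endpoint uses; audit logging is left out.

```python
import hmac
from typing import Optional


def admin_auth_status(configured_key: Optional[str], presented_key: Optional[str]) -> int:
    """Map the admin-key state onto the status codes listed above."""
    if not configured_key:
        return 503  # admin surface disabled: no key configured
    if presented_key is None:
        return 401  # header missing
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(presented_key.encode(), configured_key.encode()):
        return 401
    return 200
```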

### Parsing

Use fixture corpora — never assert on strings pulled from the live web.
Fixtures live under `tests/fixtures/`. Every fixture should carry a
short comment at the top of the file explaining which carrier / notice
family it represents and which invariant it's there to protect.

## Fixtures

### Adding an area-risk fixture

1. Add or update a scenario entry in
   `tests/fixtures/area_risk/regressions.json`:
   ```json
   [
     {
       "scenario_id": "historical_past_target_returns_red",
       "request": { "city": "Chicago", "state": "IL" },
       "seed_notices": [
         {
           "city": "Chicago",
           "state": "IL",
           "title": "Archived Chicago copper retirement notice",
           "summary": "Archived Chicago copper retirement notice.",
           "impact_text": "Chicago legacy analog service was retired in a prior filing.",
           "source_excerpt": "Chicago historical analog shutdown evidence.",
           "notice_id": "CHICAGO-ARCHIVED",
           "issue_date": "2024-01-15",
           "target_date": "2024-01-15",
           "is_active": false
         }
       ],
       "expected": {
         "grade_bucket": "red",
         "status": "past_effective_date",
         "area_at_risk": true,
         "earliest_past_target_date": "2024-01-15",
         "supporting_notice_count": 1
       }
     }
   ]
   ```
2. The parametrized loader in `test_area_risk_grading.py` reads the
   scenario list, seeds the in-memory test DB from `seed_notices`, and
   asserts the `expected` block. Use the same keys that
   `seed_notice(...)` accepts, plus optional `parsed_states`,
   `location_state`, and `signal_family` for geography conflicts and
   MAC-Freeze exclusions. The regression projection also exposes
   `supporting_notice_count`, `supporting_match_sources`,
   `supporting_geography_conflicts`, and `nearby_municipality_count`
   for compact assertions.
3. Update the
   [area-risk-city-qa-checklist.md](area-risk-city-qa-checklist.md) if
   the scenario should also be in the pre-release regression sweep.
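
For orientation, the loading and comparison steps in item 2 might look roughly like the sketch below. It is not the loader in `test_area_risk_grading.py`: `parse_scenarios`, `load_scenarios`, and `mismatches` are hypothetical helpers, and the duplicate-id check is an added safeguard rather than documented behavior.

```python
import json
from pathlib import Path


def parse_scenarios(text: str) -> list:
    """Parse the regression JSON and reject duplicate scenario ids."""
    scenarios = json.loads(text)
    ids = [scenario["scenario_id"] for scenario in scenarios]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate scenario_id values: {duplicates}")
    return scenarios


def load_scenarios(path: Path) -> list:
    """Read a scenario file such as tests/fixtures/area_risk/regressions.json."""
    return parse_scenarios(Path(path).read_text(encoding="utf-8"))


def mismatches(projection: dict, expected: dict) -> dict:
    """Return {key: (expected, actual)} for each expected key that differs."""
    return {
        key: (want, projection.get(key))
        for key, want in expected.items()
        if projection.get(key) != want
    }
```

With pytest, the scenario list could be passed to `pytest.mark.parametrize` with `ids` taken from `scenario_id`, and each test would assert that `mismatches(...)` is empty so a failure names exactly which expected keys drifted.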

### Adding a parser fixture

1. Place raw notice text under
   `tests/fixtures/<carrier>_<short_slug>.txt` (for HTML/PDF, serialize
   extracted text — the parser operates on text).
2. Prepend a one-line comment describing the fixture.
3. Add a `test_parsers.py` entry asserting the parsed
   `signal_family`, `notice_type`, `rule_family`,
   `restriction_types`, `states`, and any distinctive fields.
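
Because each fixture starts with a descriptive comment, a test helper needs to separate that line from the notice text before parsing. A minimal sketch, assuming the comment line starts with `#` (the comment syntax is not specified above) and using a hypothetical helper name:

```python
def split_fixture(raw: str) -> tuple:
    """Split a fixture into (description, notice_text).

    Assumption: the first line is a '#'-prefixed comment describing the
    carrier / notice family and the invariant the fixture protects.
    """
    first_line, _, body = raw.partition("\n")
    if not first_line.startswith("#"):
        raise ValueError("fixture is missing its leading one-line comment")
    return first_line.lstrip("# ").strip(), body
```

Rejecting fixtures without the comment turns the documentation rule into something a test run enforces.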

## CI

The CI workflow (`.github/workflows/ci.yml`) runs, per P3-T1:

- Backend ruff + pytest against a Postgres service container.
- Frontend typecheck + lint + vitest + build.
- Area-risk regression gate (targeted pytest + the
  `check-area-risk-conservative-framing.sh` script).

## Hosted smoke

`backend/app/pots_shutdown_tracker/scripts/smoke_hosted.py` hits a real
deployment's trust summary, search, and dashboard routes. It
intentionally fails when `is_queryable=false`; that failure is the
correct signal for an empty Space.

Run it via:

```
python3 -m pots_shutdown_tracker.scripts.smoke_hosted \
  --base-url "$BASE_URL" \
  --min-structured-results 1
```

## Coverage expectations

There is no hard coverage threshold. Instead, any new business logic
must come with a named test that would fail if the logic regressed.
Reviewers should push back on untested behavior changes.

## What we do not test

- External network calls. Connectors are tested with recorded fixtures.
- OpenAI responses. AI service tests use a fake client and exercise the
  deterministic grounded fallback.
- HF dataset uploads. Mocked via `HfApi` fakes.
- Neon-specific features in unit tests — integration coverage is at the
  hosted smoke level.