# Architecture

Snapshot of how the POTS Shutdown Tracker works end-to-end. Update this file when behavior changes.

## High-level flow

The core path is crawl → parse → normalize → classify → review → search → coverage → area-risk → UI. The standalone Mermaid source lives in [docs/diagrams/data-flow.mmd](diagrams/data-flow.mmd).

```mermaid
flowchart LR
    source["Public source pages / PDFs"] --> crawler["connectors/<carrier>.py"]
    crawler --> raw["services/crawler.py<br/>fetch, dedupe, write RawNotice"]
    raw --> parse["services/parser.py + parsers/source_specific.py<br/>parse_notice()"]
    parse --> normalize["Normalize into RawNotice + NormalizedNotice graph"]
    normalize --> classify["services/review.py + services/att_mac_freeze.py<br/>classify and auto-dispose"]
    classify --> review["Review queue + admin actions"]
    review --> trust["Trust summary / queryable corpus gate"]
    trust --> search["services/search.py"]
    search --> coverage["services/coverage.py"]
    coverage --> area["services/area_risk.py"]
    area --> ui["API + UI"]
    normalize -. audit trail .-> audit["services/audit.py"]
    normalize -. embeddings .-> embeddings["services/embeddings.py"]
```

```mermaid
sequenceDiagram
    actor User
    participant UI as UI
    participant API as api/router.py
    participant Trust as services/trust.py
    participant Search as services/search.py
    User->>UI: Enter a search
    UI->>API: POST /search
    API->>Trust: require_queryable_corpus()
    Trust-->>API: is_queryable?
    alt corpus queryable
        API->>Search: execute search
        Search-->>API: matches + supporting notices
        API-->>UI: 200 results
    else corpus not queryable
        API-->>UI: 503 trust-gate refusal
    end
```

## Component responsibilities

### Ingestion

- **Connectors** (`connectors/.py`) own source-specific URL discovery. They emit `DiscoveredDocument` objects with a URL, title hints, and optional pre-fetched raw text or bytes.
- **Crawler** (`services/crawler.py`) fetches with `services/connectors/common.py`, respects per-source rate budgets, and writes `RawNotice` rows. Dedupe is by content hash: identical bytes are skipped, but `last_seen_at` is still advanced (see [IMPROVEMENT_PLAN.md](../IMPROVEMENT_PLAN.md) P2-T1).
- **Storage backend** (`services/storage.py`) is pluggable: filesystem (local/dev) or Hugging Face Dataset (hosted). The Dataset repo must be private (enforced by P2-T2).

### Parsing and normalization

- **Rule parser** (`parsers/rule_parser.py`) is the source of truth for `notice_type`, `rule_family`, `restriction_types`, and the default `signal_family="shutdown"`.
- **Source-specific overrides** (`parsers/source_specific.py`) apply per carrier. AT&T has the deepest overrides (DSA tracker, network disclosure, tariff/MAC Freeze paths). Overrides run before the generic parser, and an override's classification can be reverted to `shutdown` only when the MAC-Freeze guard's dual-signal rule is satisfied (see P1-T5).
- **Active window policy** (`utils/policy.py`) decides `is_active` from `issue_date`, `revised_date`, `target_date`, `fetched_at`, and the configured lookback in months (default 60). Shutdown notices with `target_date <= today` stay active permanently; other signal families still respect the post-target grace window from P2-T5.

### User-facing APIs

- **Trust summary** (`services/trust.py`) reports corpus health, `is_queryable`, covered carriers, and recently announced and updated notices. It is the single source of truth for whether the app should answer user queries.
- **Search** (`services/search.py`) mixes structured filters with vector ranking and respects `signal_family` (default `shutdown`).
- **Coverage** (`services/coverage.py`) resolves city/state/ZIP against structured `ImpactedAddress` and `NoticeLocation` rows, then falls back to text-city geography with address/contact suppression heuristics. Match-source priority, highest first: `impacted_address (4) > notice_location (3) > text_city_geography (2) > structured_state (1) > parsed_state_fallback (1)`.
- **Area risk** (`services/area_risk.py`) builds a grade bubble per area using a five-tier model (`green/yellow/blue/orange/red`), plus a per-carrier breakdown, caveats, and nearby-municipality context.
  - Past-target shutdown notices are permanent red evidence.
  - MAC Freeze alone is blue.
  - Scheduled shutdowns split at the 12-month boundary.
  - Nearby municipalities can nudge a weak direct result up to yellow, but never above it.
  - Historical context is now folded into the main supporting-notices list instead of a separate channel.
  - State-level MAC Freeze tariff filings surface as blue for a city search when no shutdown evidence conflicts. Confidence is `low`, and the caveat explicitly attributes the signal to the state-wide tariff rather than a city-level filing.
- **Matching** (`services/matching.py`) returns precise matches (exact_address, fuzzy_address, wire_center, CLLI) and, post P1-T6, a separate `area_awareness` list of state-level hits that carry no risk level.
- **MAC Freeze assessment** (`services/att_mac_freeze.py`) is a dedicated endpoint used when `signal_family` is `att_mac_freeze` or `all`. It never feeds into shutdown grade bubbles.
- **Signal links** (`services/signal_links.py`) link MAC Freeze notices to shutdown notices only when both geography and carrier root match.
- **Bulk lookup** (`services/bulk_lookup.py`) accepts xlsx uploads at `/bulk-lookup/jobs`, validates city/state rows, queues background processing through a `ThreadPoolExecutor` on `app.state`, calls `assess_area_risk` per valid row, and writes an enriched xlsx with a `Summary` sheet back to the shared storage backend. The job list, detail, and download endpoints are trust-gated with `require_queryable_corpus`. Uploads are retained for 7 days by default and swept by the scheduler/CLI cleanup path.

### Admin workflows

- **Review queue** surfaces low-confidence or structurally incomplete notices. Auto-disposition is opt-in and dry-run by default.
- **Reparse / reapply-active-window / sanitize-notice-dates / rebuild-embeddings** can be triggered via `/admin/*` or the CLI. All admin mutations require the `X-Admin-Key` header (P1-T3) and write to `AdminAuditEvent` (P3-T5).

## Frontend responsibilities

- **Audience split** is resolved at Vite build time (`VITE_POTS_TRACKER_AUDIENCE`). Consumer sees Search + trust gate. Internal sees Search, Dashboard, Match, Admin, and Notice Detail. The audience split is UI-only; server auth is the real guardrail for admin.
- **Trust gate** blocks Search and (post P1-T4) Match when `is_queryable=false`. Bulk lookup uses the same trust gate for upload, job list/detail, and download.
- **Area-risk UI** clearly labels evidence scope (city vs state), match source (impacted address vs text geography), and caveats.
- **MAC Freeze** renders in its own panel with explicit "not by itself a shutdown" copy.
- **Bulk lookup UI** exposes `/bulk-lookup` for both internal and consumer audiences. It uploads xlsx files, polls queued/running jobs, renders color counts and top carriers, and links to the enriched workbook download when complete.

### Brand and visual system

- The SPA uses the Masters Telecom palette: Navy `#1A2B62`, Adriatic Blue `#4194BC`, Light Blue `#A0C6DB`, Grey `#808085`, and White `#FFFFFF`.
- Brand colors are chrome only: header, links, primary buttons, subtle info surfaces, and borders.
- Tier colors stay semantic and visually distinct from the brand blues: Red `#B42318`, Orange `#DC6803`, Yellow `#CA8504`, Violet `#6B5CA5`, and Green `#079455`.
- The backend `grade_bucket="blue"` contract is unchanged, but the UI renders that tier as Violet so MAC Freeze does not read as generic info next to the Navy/Adriatic chrome.
- User-facing blue-tier copy keeps the label "Future Shutdown Notice Anticipated"; only references to the color itself say Violet.

## Hosted deployment shape

- The Docker image (see `Dockerfile`) builds the frontend, then copies the dist into a Python runtime that serves both the SPA and the API on port 7860.
- A Hugging Face Space runs the image. The optional storage backend is a private HF Dataset repo.
- Neon Postgres holds the corpus; the `vector` extension is required.
- APScheduler runs on a weekly schedule. In multi-instance deployments, the scheduler acquires a PG advisory lock before firing (P2-T3).
- APScheduler also runs a daily bulk-lookup retention cleanup at 03:00 with `misfire_grace_time=300` and `coalesce=True`.

## Bulk lookup

Bulk lookup is a trust-gated, user-facing workflow for operators who need to enrich a spreadsheet of locations instead of checking one area at a time.
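The per-row enrichment behind a bulk job can be sketched in plain Python: validate each city/state row, assess the valid ones, flag invalid rows without failing the job, and tally color counts and top carriers for the Summary sheet. This is a hypothetical illustration, not the code in `services/bulk_lookup.py`. The names `process_rows` and `RowResult`, the two-letter state check, the header-row offset, `top_n=5`, and the `(grade, carriers)` return shape of the injected `assess` callable are all assumptions. xlsx I/O, storage, and the worker pool are left out.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Five-tier grade buckets from the area-risk model (backend names, not UI colors).
GRADE_BUCKETS = ("green", "yellow", "blue", "orange", "red")

# Hypothetical assessor shape: (city, state) -> (grade_bucket, carrier names).
Assessor = Callable[[str, str], Tuple[str, List[str]]]


@dataclass
class RowResult:
    """One enriched output row (hypothetical shape)."""
    row_number: int
    city: str
    state: str
    grade: Optional[str] = None
    carriers: List[str] = field(default_factory=list)
    error: Optional[str] = None


def process_rows(
    rows: Iterable[Dict[str, str]],
    assess: Assessor,
    top_n: int = 5,
) -> Tuple[List[RowResult], Dict[str, object]]:
    results: List[RowResult] = []
    color_counts: Counter = Counter()
    carrier_counts: Counter = Counter()

    # Spreadsheet row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(rows, start=2):
        city = (row.get("city") or "").strip()
        state = (row.get("state") or "").strip().upper()

        # Assumed validation rule: non-empty city and a two-letter state code.
        if not city or len(state) != 2 or not state.isalpha():
            # Flag the row and keep going; one bad row does not fail the job.
            results.append(RowResult(row_number, city, state, error="invalid city/state"))
            continue

        grade, carriers = assess(city, state)
        if grade not in GRADE_BUCKETS:
            raise ValueError(f"unexpected grade bucket: {grade!r}")
        color_counts[grade] += 1
        carrier_counts.update(carriers)
        results.append(RowResult(row_number, city, state, grade=grade, carriers=list(carriers)))

    valid = sum(1 for r in results if r.error is None)
    summary = {
        "valid_rows": valid,
        "invalid_rows": len(results) - valid,
        # Report every tier, including zeros, so the summary layout is stable.
        "color_counts": {g: color_counts[g] for g in GRADE_BUCKETS},
        "top_carriers": [name for name, _ in carrier_counts.most_common(top_n)],
    }
    return results, summary
```

Injecting `assess` keeps the sketch testable without a database. In the service, that role is played by `assess_area_risk`, whose actual signature and return type are not documented here.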
- `POST /bulk-lookup/jobs` accepts one `.xlsx` file with city/state columns, validates headers and row limits synchronously, stores the input blob, creates a `BulkLookupJob`, and queues background processing.
- `GET /bulk-lookup/jobs` lists the 20 most recent jobs.
- `GET /bulk-lookup/jobs/{id}` returns status, row counts, color counts, top carriers, and download availability.
- `GET /bulk-lookup/jobs/{id}/download` streams the completed enriched workbook. It returns `409` until processing is complete and `410` after expiry.

Processing uses `ThreadPoolExecutor(max_workers=settings.bulk_lookup_concurrent_workers)` on `app.state`. Each worker opens its own SQLAlchemy session, reads the input from the configured storage backend, calls `assess_area_risk` for each valid row, flags invalid rows without failing the whole job, and writes a Results sheet plus a Summary sheet. The executor is shut down in the FastAPI lifespan `finally` block.

Retention defaults to 7 days. The scheduler cleanup job and `python app/cli.py cleanup-bulk-lookup-jobs` both call the same cleanup service to expire rows and delete stored input/output blobs.

## Active window

A notice is `is_active=True` when:

- The latest of its `issue_date`, `revised_date`, `target_date`, or `fetched_at` falls within `POTS_TRACKER_LOOKBACK_MONTHS` (default 60).
- Non-shutdown notices still respect the post-target grace window from P2-T5: `target_date is None` or `target_date + active_window_post_target_grace_days >= today`.
- Shutdown notices with `target_date <= today` remain active permanently so they stay available as historical red evidence for area-risk grading.

Archived notices can still exist in the corpus, but they no longer flow through a separate historical-context channel.

## Signal families

| Value            | Meaning                                         |
|------------------|-------------------------------------------------|
| `shutdown`       | Analog / POTS / copper retirement, discontinuance, switch decommissioning, TDM→IP transition, network disclosure. Default. |
| `att_mac_freeze` | AT&T commercial availability restriction (no new orders / moves / adds / changes, grandfathering). Not a shutdown by itself. |

Queries default to `shutdown`. The UI never mixes the two.

## Storage model

See `backend/app/pots_shutdown_tracker/models/entities.py` for the full schema. Key relationships:

- `RawNotice 1 ←→ 1 NormalizedNotice`
- `NormalizedNotice 1 ←→ N NoticeLocation`
- `NormalizedNotice 1 ←→ N ImpactedAddress`
- `NormalizedNotice 1 ←→ N NoticeChunk`
- `NormalizedNotice 1 ←→ N NoticeAuditEvent`
- `CustomerMatch` is a persisted record of matching results.
- `AIJob` records every AI prompt/response for audit.
- `BulkLookupJob` tracks uploaded xlsx jobs, blob references, row counts, color counts, top carriers, and retention expiry.

## See also

- [docs/ONBOARDING.md](ONBOARDING.md): first-day setup.
- [docs/TESTING.md](TESTING.md): test strategy and fixtures.
- [docs/SECURITY.md](SECURITY.md): threat model and auth posture.
- [docs/diagrams/data-flow.mmd](diagrams/data-flow.mmd): standalone Mermaid source for the architecture diagrams.
- [docs/RUNBOOK.md](RUNBOOK.md): operational playbook (create per P4-T2).
- [IMPROVEMENT_PLAN.md](../IMPROVEMENT_PLAN.md): prioritized work list.