| # Architecture |
|
|
| Snapshot of how the POTS Shutdown Tracker works end-to-end. Update this |
| file when behavior changes. |
|
|
| ## High-level flow |
|
|
| The core path is crawl → parse → normalize → classify → review → |
| search → coverage → area-risk → UI. The standalone Mermaid source lives |
| in [docs/diagrams/data-flow.mmd](diagrams/data-flow.mmd). |
|
|
| ```mermaid |
| flowchart LR |
| source["Public source pages / PDFs"] --> crawler["connectors/<carrier>.py"] |
| crawler --> raw["services/crawler.py<br/>fetch, dedupe, write RawNotice"] |
| raw --> parse["services/parser.py + parsers/source_specific.py<br/>parse_notice()"] |
| parse --> normalize["Normalize into RawNotice + NormalizedNotice graph"] |
| normalize --> classify["services/review.py + services/att_mac_freeze.py<br/>classify and auto-dispose"] |
| classify --> review["Review queue + admin actions"] |
| review --> trust["Trust summary / queryable corpus gate"] |
| trust --> search["services/search.py"] |
| search --> coverage["services/coverage.py"] |
| coverage --> area["services/area_risk.py"] |
| area --> ui["API + UI"] |
| normalize -. audit trail .-> audit["services/audit.py"] |
| normalize -. embeddings .-> embeddings["services/embeddings.py"] |
| ``` |
|
|
| ```mermaid |
| sequenceDiagram |
| actor User |
| participant UI as UI |
| participant API as api/router.py |
| participant Trust as services/trust.py |
| participant Search as services/search.py |
| |
| User->>UI: Enter a search |
| UI->>API: POST /search |
| API->>Trust: require_queryable_corpus() |
| Trust-->>API: is_queryable? |
| alt corpus queryable |
| API->>Search: execute search |
| Search-->>API: matches + supporting notices |
| API-->>UI: 200 results |
| else corpus not queryable |
| API-->>UI: 503 trust-gate refusal |
| end |
| ``` |
|
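The alt/else branches in the sequence diagram above can be sketched in plain Python. This is a minimal illustration, not the code in `api/router.py`: the `TrustSummary` dataclass, the `CorpusNotQueryable` exception, the `handle_search` helper, and the argument passed to `require_queryable_corpus` are all assumptions made for the example.

```python
from dataclasses import dataclass


class CorpusNotQueryable(Exception):
    """Hypothetical error for a trust-gate refusal; maps to HTTP 503."""
    status_code = 503


@dataclass
class TrustSummary:
    # Stand-in for the trust summary; only the gate flag is modeled.
    is_queryable: bool


def require_queryable_corpus(summary: TrustSummary) -> None:
    # Refuse before any search work happens.
    if not summary.is_queryable:
        raise CorpusNotQueryable("corpus is not queryable")


def handle_search(summary: TrustSummary, query: str, search):
    """Return (status_code, body), mirroring the two diagram branches."""
    try:
        require_queryable_corpus(summary)
    except CorpusNotQueryable as exc:
        return exc.status_code, {"detail": str(exc)}
    return 200, search(query)
```

The point of the shape is that the gate runs first, so a non-queryable corpus never reaches the search service.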
|
| ## Component responsibilities |
|
|
| ### Ingestion |
|
|
| - **Connectors** (`connectors/<source>.py`) own source-specific URL |
| discovery. They emit `DiscoveredDocument` objects with a URL, title |
| hints, and optional pre-fetched raw text or bytes. |
| - **Crawler** (`services/crawler.py`) fetches with |
| `services/connectors/common.py`, respects per-source rate budgets, |
| and writes `RawNotice` rows. Content-hash dedupe: identical bytes are |
| skipped but `last_seen_at` is advanced (see |
| [IMPROVEMENT_PLAN.md](../IMPROVEMENT_PLAN.md) P2-T1). |
| - **Storage backend** (`services/storage.py`) is pluggable: |
| filesystem (local/dev) or Hugging Face Dataset (hosted). Dataset repo |
| must be private (enforced by P2-T2). |
|
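The content-hash dedupe described for the crawler can be illustrated with a small in-memory sketch. It assumes SHA-256 as the hash and a dict keyed by digest as the store; the `RawNoticeRecord` dataclass and `ingest` function are hypothetical names, and the real `RawNotice` model may differ.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class RawNoticeRecord:  # hypothetical stand-in for a RawNotice row
    url: str
    content_hash: str
    first_seen_at: datetime
    last_seen_at: datetime


def ingest(store: Dict[str, RawNoticeRecord], url: str, body: bytes,
           now: Optional[datetime] = None) -> bool:
    """Return True if a new record was written, False if deduped."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(body).hexdigest()  # assumption: SHA-256
    existing = store.get(digest)
    if existing is not None:
        # Identical bytes: skip the write but advance last_seen_at.
        existing.last_seen_at = now
        return False
    store[digest] = RawNoticeRecord(url, digest, now, now)
    return True
```

Advancing `last_seen_at` on a duplicate keeps a record of when the source was last confirmed unchanged without writing a second row.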
|
| ### Parsing and normalization |
|
|
| - **Rule parser** (`parsers/rule_parser.py`) is the source of truth for |
| `notice_type`, `rule_family`, `restriction_types`, and default |
| `signal_family="shutdown"`. |
| - **Source-specific overrides** (`parsers/source_specific.py`) apply per |
| carrier. AT&T has the deepest overrides (DSA tracker, network |
| disclosure, tariff/MAC Freeze paths). Overrides run before the |
| generic parser; a notice they reclassify can be set back to |
| `shutdown` only when the MAC-Freeze guard's dual-signal rule is |
| satisfied (see P1-T5). |
| - **Active window policy** (`utils/policy.py`) decides `is_active` from |
| `issue_date`, `revised_date`, `target_date`, `fetched_at`, and the |
| configured lookback months. Default lookback is 60 months. Shutdown |
| notices with `target_date <= today` stay active permanently; other |
| signal families still respect the post-target grace window from P2-T5. |
|
|
| ### User-facing APIs |
|
|
| - **Trust summary** (`services/trust.py`) reports corpus health, |
| `is_queryable`, covered carriers, recently announced and updated |
| notices. This is the single source of truth for whether the app |
| should answer user queries. |
| - **Search** (`services/search.py`) mixes structured filters and vector |
| ranking; respects `signal_family` (defaults to `shutdown`). |
| - **Coverage** (`services/coverage.py`) resolves city/state/ZIP against |
| structured `ImpactedAddress` and `NoticeLocation`, then falls back to |
| text-city geography with address/contact suppression heuristics. |
| Priority: `impacted_address (4) > notice_location (3) > |
| text_city_geography (2) > structured_state (1) > parsed_state_fallback (1)`. |
| - **Area risk** (`services/area_risk.py`) builds a grade bubble per |
| area using a five-tier model (`green/yellow/blue/orange/red`), plus a |
| per-carrier breakdown, caveats, and nearby-municipality context. |
| Past-target shutdown notices are permanent red evidence; MAC Freeze |
| alone is blue; scheduled shutdowns split at the 12-month boundary; |
| nearby municipalities can nudge a weak direct result up to yellow but |
| never above it. Historical context is now folded into the main |
| supporting-notices list instead of a separate channel. |
| State-level MAC Freeze tariff filings surface as blue for a city |
| search when no shutdown evidence conflicts. Confidence is `low` and |
| the caveat explicitly attributes the signal to the state-wide tariff |
| rather than a city-level filing. |
| - **Matching** (`services/matching.py`) returns precise matches |
| (exact_address, fuzzy_address, wire_center, CLLI) and, post P1-T6, a |
| separate `area_awareness` list of state-level hits with no risk |
| level. |
| - **MAC Freeze assessment** (`services/att_mac_freeze.py`) backs a |
| dedicated endpoint used when `signal_family` is `att_mac_freeze` or |
| `all`. It never reaches shutdown grade bubbles. |
| - **Signal links** (`services/signal_links.py`) links MAC Freeze notices |
| to shutdown notices only when both geography and carrier root match. |
| - **Bulk lookup** (`services/bulk_lookup.py`) accepts xlsx uploads at |
| `/bulk-lookup/jobs`, validates city/state rows, queues background |
| processing through a `ThreadPoolExecutor` on `app.state`, calls |
| `assess_area_risk` per valid row, and writes an enriched xlsx with a |
| `Summary` sheet back to the shared storage backend. The job list, |
| detail, and download endpoints are trust-gated with |
| `require_queryable_corpus`; uploads are retained for 7 days by |
| default and swept by the scheduler/CLI cleanup path. |
|
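The coverage priority order listed above can be read as a weight table. The sketch below copies the weights from that list; the `best_match_source` helper is hypothetical, and because the two state-level sources share weight 1, breaking that tie by input order is an assumption of the example rather than documented behavior.

```python
# Weights copied from the coverage priority list.
MATCH_SOURCE_PRIORITY = {
    "impacted_address": 4,
    "notice_location": 3,
    "text_city_geography": 2,
    "structured_state": 1,
    "parsed_state_fallback": 1,
}


def best_match_source(sources):
    """Pick the highest-priority source among those that matched."""
    known = [s for s in sources if s in MATCH_SOURCE_PRIORITY]
    if not known:
        return None
    # max() returns the first maximal item, so ties keep input order.
    return max(known, key=MATCH_SOURCE_PRIORITY.__getitem__)
```

For example, a city that matched through both `structured_state` and `notice_location` would report `notice_location` as its match source.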
|
| ### Admin workflows |
|
|
| - **Review queue** surfaces low-confidence or structurally incomplete |
| notices. Auto-disposition is opt-in and dry-run by default. |
| - **Reparse / reapply-active-window / sanitize-notice-dates / |
| rebuild-embeddings** are triggerable via `/admin/*` or CLI. All admin |
| mutations require the `X-Admin-Key` header (P1-T3) and write to |
| `AdminAuditEvent` (P3-T5). |
|
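A header check like the `X-Admin-Key` requirement could look roughly like the sketch below. The `is_admin_request` function, the plain-dict header container, and the constant-time comparison are assumptions for illustration; the document only states that the header is required.

```python
import hmac


def is_admin_request(headers: dict, configured_key: str) -> bool:
    """Return True only when the supplied key matches the configured one."""
    # Note: a plain dict lookup is case-sensitive; web frameworks usually
    # normalize header names before this point.
    supplied = headers.get("X-Admin-Key", "")
    if not configured_key or not supplied:
        # An unset server key must never authorize anyone.
        return False
    # compare_digest avoids leaking match position through timing.
    return hmac.compare_digest(supplied.encode(), configured_key.encode())
```

Rejecting an empty configured key closes the failure mode where a missing environment variable would otherwise match an empty header.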
|
| ## Frontend responsibilities |
|
|
| - **Audience split** is Vite-time (`VITE_POTS_TRACKER_AUDIENCE`). |
| Consumer sees Search + trust gate. Internal sees Search, Dashboard, |
| Match, Admin, Notice Detail. The audience split is UI-only; server |
| auth is the real guardrail for admin. |
| - **Trust gate** blocks Search and (post P1-T4) Match when |
| `is_queryable=false`. Bulk lookup uses the same trust gate for upload, |
| job list/detail, and download. |
| - **Area-risk UI** clearly labels evidence scope (city vs state), match |
| source (impacted address vs text geography), and caveats. |
| - **MAC Freeze** renders in its own panel with explicit "not by itself a |
| shutdown" copy. |
| - **Bulk lookup UI** exposes `/bulk-lookup` for both internal and |
| consumer audiences. It uploads xlsx files, polls queued/running jobs, |
| renders color counts and top carriers, and links to the enriched |
| workbook download when complete. |
|
|
| ### Brand and visual system |
|
|
| - The SPA uses the Masters Telecom palette: Navy `#1A2B62`, |
| Adriatic Blue `#4194BC`, Light Blue `#A0C6DB`, Grey `#808085`, |
| and White `#FFFFFF`. |
| - Brand colors are chrome only: header, links, primary buttons, subtle |
| info surfaces, and borders. |
| - Tier colors stay semantic and visually distinct from the brand blues: |
| Red `#B42318`, Orange `#DC6803`, Yellow `#CA8504`, Violet |
| `#6B5CA5`, and Green `#079455`. |
| - The backend `grade_bucket="blue"` contract is unchanged, but the UI |
| renders that tier as Violet so MAC-Freeze does not read as generic |
| info next to Navy/Adriatic chrome. |
| - User-facing blue-tier copy keeps the label "Future Shutdown Notice |
| Anticipated"; only color-word references say Violet. |
|
|
| ## Hosted deployment shape |
|
|
| - Docker image (see `Dockerfile`) builds the frontend, then copies the |
| dist into a Python runtime that serves both the SPA and the API on |
| port 7860. |
| - Hugging Face Space executes the image. Optional storage backend is a |
| private HF Dataset repo. |
| - Neon Postgres holds the corpus. `vector` extension required. |
| - APScheduler runs weekly. In multi-instance deployments the scheduler |
| acquires a PG advisory lock before firing (P2-T3). |
| - APScheduler also runs daily bulk-lookup retention cleanup at 03:00 |
| with `misfire_grace_time=300` and `coalesce=True`. |
|
|
| ## Bulk lookup |
|
|
| Bulk lookup is a trust-gated user-facing workflow for operators who need |
| to enrich a spreadsheet of locations instead of checking one area at a |
| time. |
|
|
| - `POST /bulk-lookup/jobs` accepts one `.xlsx` file with city/state |
| columns, validates headers and row limits synchronously, stores the |
| input blob, creates a `BulkLookupJob`, and queues background |
| processing. |
| - `GET /bulk-lookup/jobs` lists the 20 most recent jobs. |
| - `GET /bulk-lookup/jobs/{id}` returns status, row counts, color counts, |
| top carriers, and download availability. |
| - `GET /bulk-lookup/jobs/{id}/download` streams the completed enriched |
| workbook and returns `409` until processing is complete or `410` after |
| expiry. |
|
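The download endpoint's status rules can be summarized in a few lines. This sketch assumes a `"completed"` status string, an `expires_at` timestamp on the job, and that expiry is checked before completion; none of these names or orderings are confirmed by the source.

```python
from datetime import datetime, timezone
from typing import Optional


def download_status(job_status: str, expires_at: Optional[datetime],
                    now: datetime) -> int:
    """Map a job's state to the HTTP status the download would return."""
    if expires_at is not None and now >= expires_at:
        return 410  # retention window has passed
    if job_status != "completed":  # assumption: status name
        return 409  # still queued or running
    return 200
```

A client polling the job detail endpoint can treat `409` as "try again later" and `410` as terminal.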
|
| Processing uses `ThreadPoolExecutor(max_workers=settings.bulk_lookup_concurrent_workers)` |
| on `app.state`. Each worker opens its own SQLAlchemy session, reads the |
| input from the configured storage backend, calls `assess_area_risk` for |
| each valid row, flags invalid rows without failing the whole job, and |
| writes a Results sheet plus Summary sheet. The executor is shut down in |
| the FastAPI lifespan `finally` block. |
|
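The processing model above can be sketched with the standard library alone. This is an illustration under assumptions: `run_job` and `process_in_background` are hypothetical names, `assess` stands in for `assess_area_risk`, and the storage backend, SQLAlchemy session, and workbook writing are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(rows, assess):
    """Process one job's rows; bad rows are flagged, not fatal."""
    results = []
    for row in rows:
        city = (row.get("city") or "").strip()
        state = (row.get("state") or "").strip()
        if not city or not state:
            # Flag the row and keep going so one bad line cannot fail the job.
            results.append({**row, "status": "invalid", "grade": None})
            continue
        results.append({**row, "status": "ok", "grade": assess(city, state)})
    return results


def process_in_background(rows, assess, max_workers=2):
    """Submit one job to a pool and always shut the pool down afterwards."""
    executor = ThreadPoolExecutor(max_workers=max_workers)
    try:
        return executor.submit(run_job, rows, assess).result()
    finally:
        # Mirrors the lifespan `finally` block described above.
        executor.shutdown(wait=True)
```

In the service the executor is long-lived on `app.state`; creating and closing it inside one function here only keeps the example self-contained.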
|
| Retention defaults to 7 days. The scheduler cleanup job and |
| `python app/cli.py cleanup-bulk-lookup-jobs` both call the same cleanup |
| service to expire rows and delete stored input/output blobs. |
|
|
| ## Active window |
|
|
| A notice has `is_active=True` under these rules: |
|
|
| - The latest of its `issue_date`, `revised_date`, `target_date`, or |
| `fetched_at` falls within `POTS_TRACKER_LOOKBACK_MONTHS` (default 60). |
| - Non-shutdown notices still respect the post-target grace window from |
| P2-T5: `target_date is None` or |
| `target_date + active_window_post_target_grace_days >= today`. |
| - Shutdown notices with `target_date <= today` remain active |
| permanently so they stay available as historical red evidence for |
| area-risk grading. |
|
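One way to read the three rules together is sketched below. How the rules combine is an interpretation, not confirmed by the source: past-target shutdown notices short-circuit to active, everything else must pass the lookback check, and non-shutdown notices must also pass the grace check. The function names are hypothetical, all dates are treated as `date` values, and `grace_days` has no default because the document does not state one.

```python
import calendar
from datetime import date, timedelta


def months_ago(today: date, months: int) -> date:
    """Subtract whole months, clamping the day to the target month's length."""
    idx = today.year * 12 + (today.month - 1) - months
    year, month0 = divmod(idx, 12)
    day = min(today.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def is_active(*, signal_family, issue_date, revised_date, target_date,
              fetched_at, today, grace_days, lookback_months=60):
    # Past-target shutdown notices stay active as historical red evidence.
    if (signal_family == "shutdown" and target_date is not None
            and target_date <= today):
        return True
    dates = [d for d in (issue_date, revised_date, target_date, fetched_at)
             if d is not None]
    # The latest known date must fall inside the lookback window.
    if not dates or max(dates) < months_ago(today, lookback_months):
        return False
    # Non-shutdown notices also respect the post-target grace window.
    if signal_family != "shutdown" and target_date is not None:
        return target_date + timedelta(days=grace_days) >= today
    return True
```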
|
| Archived notices can still exist in the corpus, but they no longer flow |
| through a separate historical-context channel. |
|
|
| ## Signal families |
|
|
| | Value | Meaning | |
| |------------------|-------------------------------------------------| |
| | `shutdown` | Analog / POTS / copper retirement, discontinuance, switch decommissioning, TDM→IP transition, network disclosure. Default. | |
| | `att_mac_freeze` | AT&T commercial availability restriction (no new orders / moves / adds / changes, grandfathering). Not a shutdown by itself. | |
|
|
| Queries default to `shutdown`. The UI never mixes the two. |
|
|
| ## Storage model |
|
|
| See `backend/app/pots_shutdown_tracker/models/entities.py` for the full |
| schema. Key relationships: |
|
|
| - `RawNotice 1 ── 1 NormalizedNotice` |
| - `NormalizedNotice 1 ── N NoticeLocation` |
| - `NormalizedNotice 1 ── N ImpactedAddress` |
| - `NormalizedNotice 1 ── N NoticeChunk` |
| - `NormalizedNotice 1 ── N NoticeAuditEvent` |
| - `CustomerMatch` is a persisted record of matching results. |
| - `AIJob` records every AI prompt/response for audit. |
| - `BulkLookupJob` tracks uploaded xlsx jobs, blob references, row |
| counts, color counts, top carriers, and retention expiry. |
|
|
| ## See also |
|
|
| - [docs/ONBOARDING.md](ONBOARDING.md): first-day setup. |
| - [docs/TESTING.md](TESTING.md): test strategy and fixtures. |
| - [docs/SECURITY.md](SECURITY.md): threat model and auth posture. |
| - [docs/diagrams/data-flow.mmd](diagrams/data-flow.mmd): standalone Mermaid source for the architecture diagrams. |
| - [docs/RUNBOOK.md](RUNBOOK.md): operational playbook (to be created per P4-T2). |
| - [IMPROVEMENT_PLAN.md](../IMPROVEMENT_PLAN.md): prioritized work list. |
|
|