Architecture
Snapshot of how the POTS Shutdown Tracker works end-to-end. Update this file when behavior changes.
High-level flow
The core path is crawl → parse → normalize → classify → review → search → coverage → area-risk → UI. The standalone Mermaid source lives in docs/diagrams/data-flow.mmd.
flowchart LR
source["Public source pages / PDFs"] --> crawler["connectors/<carrier>.py"]
crawler --> raw["services/crawler.py<br/>fetch, dedupe, write RawNotice"]
raw --> parse["services/parser.py + parsers/source_specific.py<br/>parse_notice()"]
parse --> normalize["Normalize into RawNotice + NormalizedNotice graph"]
normalize --> classify["services/review.py + services/att_mac_freeze.py<br/>classify and auto-dispose"]
classify --> review["Review queue + admin actions"]
review --> trust["Trust summary / queryable corpus gate"]
trust --> search["services/search.py"]
search --> coverage["services/coverage.py"]
coverage --> area["services/area_risk.py"]
area --> ui["API + UI"]
normalize -. audit trail .-> audit["services/audit.py"]
normalize -. embeddings .-> embeddings["services/embeddings.py"]
sequenceDiagram
actor User
participant UI as UI
participant API as api/router.py
participant Trust as services/trust.py
participant Search as services/search.py
User->>UI: Enter a search
UI->>API: POST /search
API->>Trust: require_queryable_corpus()
Trust-->>API: is_queryable?
alt corpus queryable
API->>Search: execute search
Search-->>API: matches + supporting notices
API-->>UI: 200 results
else corpus not queryable
API-->>UI: 503 trust-gate refusal
end
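The trust-gate branch in the sequence above can be sketched in a few lines. This is a minimal, framework-free illustration rather than the actual router code: GateResponse, handle_search, and the two callables are hypothetical stand-ins for the API layer, services/trust.py, and services/search.py.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GateResponse:
    """Hypothetical response shape: an HTTP status plus a JSON-like body."""
    status: int
    body: dict


def handle_search(
    query: str,
    is_queryable: Callable[[], bool],
    run_search: Callable[[str], List[str]],
) -> GateResponse:
    """Answer a search only when the corpus passes the trust gate."""
    if not is_queryable():
        # "corpus not queryable" branch: refuse before touching search.
        return GateResponse(503, {"detail": "trust-gate refusal"})
    # "corpus queryable" branch: run the search and return its results.
    return GateResponse(200, {"results": run_search(query)})
```

The point of the shape is that the gate is checked before the search service is called at all, so an untrusted corpus can never leak partial results.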
Component responsibilities
Ingestion
- Connectors (connectors/<source>.py) own source-specific URL discovery. They emit DiscoveredDocument objects with a URL, title hints, and optional pre-fetched raw text or bytes.
- Crawler (services/crawler.py) fetches with services/connectors/common.py, respects per-source rate budgets, and writes RawNotice rows. Content-hash dedupe: identical bytes are skipped but last_seen_at is advanced (see IMPROVEMENT_PLAN.md P2-T1).
- Storage backend (services/storage.py) is pluggable: filesystem (local/dev) or Hugging Face Dataset (hosted). Dataset repo must be private (enforced by P2-T2).
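The content-hash dedupe rule for the crawler can be illustrated with a small sketch. It assumes SHA-256 as the hash and an in-memory dict in place of the RawNotice table; record_fetch and the field names other than last_seen_at are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def record_fetch(store: dict, url: str, content: bytes, now: datetime) -> str:
    """Store new content, or only advance last_seen_at for identical bytes."""
    digest = hashlib.sha256(content).hexdigest()  # assumed hash function
    existing = store.get(digest)
    if existing is not None:
        # Identical bytes: skip the write but record that we saw them again.
        existing["last_seen_at"] = now
        return "skipped"
    store[digest] = {"url": url, "first_seen_at": now, "last_seen_at": now}
    return "written"


# Usage: the second fetch of the same bytes does not create a second row.
store = {}
first = datetime(2025, 1, 1, tzinfo=timezone.utc)
second = datetime(2025, 1, 8, tzinfo=timezone.utc)
record_fetch(store, "https://example.com/notice.pdf", b"same bytes", first)
record_fetch(store, "https://example.com/notice.pdf", b"same bytes", second)
```

Advancing last_seen_at on a skip keeps "is this notice still published?" answerable without storing duplicate blobs.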
Parsing and normalization
- Rule parser (parsers/rule_parser.py) is the source of truth for notice_type, rule_family, restriction_types, and default signal_family="shutdown".
- Source-specific overrides (parsers/source_specific.py) apply per carrier. AT&T has the deepest overrides (DSA tracker, network disclosure, tariff/MAC Freeze paths). Overrides run before the generic parser and can be overridden back to shutdown only when the MAC-Freeze guard's dual-signal rule is satisfied (see P1-T5).
- Active window policy (utils/policy.py) decides is_active from issue_date, revised_date, target_date, fetched_at, and the configured lookback months. Default lookback is 60 months. Shutdown notices with target_date <= today stay active permanently; other signal families still respect the post-target grace window from P2-T5.
User-facing APIs
- Trust summary (services/trust.py) reports corpus health, is_queryable, covered carriers, recently announced and updated notices. This is the single source of truth for whether the app should answer user queries.
- Search (services/search.py) mixes structured filters and vector ranking; respects signal_family (defaults to shutdown).
- Coverage (services/coverage.py) resolves city/state/ZIP against structured ImpactedAddress and NoticeLocation, then falls back to text-city geography with address/contact suppression heuristics. Priority: impacted_address (4) > notice_location (3) > text_city_geography (2) > structured_state (1) > parsed_state_fallback (1).
- Area risk (services/area_risk.py) builds a grade bubble per area using a five-tier model (green/yellow/blue/orange/red), plus a per-carrier breakdown, caveats, and nearby-municipality context. Past-target shutdown notices are permanent red evidence; MAC Freeze alone is blue; scheduled shutdowns split at the 12-month boundary; nearby municipalities can nudge a weak direct result up to yellow but never above it. Historical context is now folded into the main supporting-notices list instead of a separate channel. State-level MAC Freeze tariff filings surface as blue for a city search when no shutdown evidence conflicts. Confidence is low and the caveat explicitly attributes the signal to the state-wide tariff rather than a city-level filing.
- Matching (services/matching.py) returns precise matches (exact_address, fuzzy_address, wire_center, CLLI) and, post P1-T6, a separate area_awareness list of state-level hits with no risk level.
- MAC Freeze assessment (services/att_mac_freeze.py) is a dedicated endpoint when signal_family is att_mac_freeze or all. It never reaches shutdown grade bubbles.
- Signal links (services/signal_links.py) crosses MAC Freeze to shutdown only by geography + carrier-root match.
- Bulk lookup (services/bulk_lookup.py) accepts xlsx uploads at /bulk-lookup/jobs, validates city/state rows, queues background processing through a ThreadPoolExecutor on app.state, calls assess_area_risk per valid row, and writes an enriched xlsx with a Summary sheet back to the shared storage backend. The job list, detail, and download endpoints are trust-gated with require_queryable_corpus; uploads are retained for 7 days by default and swept by the scheduler/CLI cleanup path.
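The coverage priority order listed above lends itself to a small lookup table. The sketch below is illustrative only: the weights come from the list, while best_source_per_notice, the (notice_id, source) input shape, and the first-seen tie-break are assumptions, not the behavior of services/coverage.py.

```python
from typing import Dict, Iterable, Tuple

# Weights as documented; the two state-level sources share the lowest weight.
SOURCE_PRIORITY = {
    "impacted_address": 4,
    "notice_location": 3,
    "text_city_geography": 2,
    "structured_state": 1,
    "parsed_state_fallback": 1,
}


def best_source_per_notice(matches: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Keep, for each notice id, the match source with the highest weight."""
    best: Dict[int, str] = {}
    for notice_id, source in matches:
        current = best.get(notice_id)
        # Strict ">" means the first-seen source wins a tie (an assumption).
        if current is None or SOURCE_PRIORITY[source] > SOURCE_PRIORITY[current]:
            best[notice_id] = source
    return best
```

A notice matched both by state and by an impacted address would then be reported once, attributed to impacted_address.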
Admin workflows
- Review queue surfaces low-confidence or structurally incomplete notices. Auto-disposition is opt-in and dry-run by default.
- Reparse / reapply-active-window / sanitize-notice-dates / rebuild-embeddings are triggerable via /admin/* or CLI. All admin mutations require the X-Admin-Key header (P1-T3) and write to AdminAuditEvent (P3-T5).
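As a rough illustration of the X-Admin-Key requirement, a header check might look like the following. is_admin_request is a hypothetical helper; the constant-time comparison and the "empty configured key rejects everything" rule are assumptions about good practice, not a description of the real guard.

```python
import hmac
from typing import Mapping


def is_admin_request(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True only when X-Admin-Key matches the configured key."""
    if not expected_key:
        # Assumed policy: an unset key must never authorize anyone.
        return False
    supplied = headers.get("X-Admin-Key", "")
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

In a real handler, a False result would translate into a 401 or 403 before any mutation or AdminAuditEvent write is attempted.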
Frontend responsibilities
- Audience split is decided at Vite build time (VITE_POTS_TRACKER_AUDIENCE). Consumer sees Search + trust gate. Internal sees Search, Dashboard, Match, Admin, Notice Detail. The audience split is UI-only; server auth is the real guardrail for admin.
- Trust gate blocks Search and (post P1-T4) Match when is_queryable=false. Bulk lookup uses the same trust gate for upload, job list/detail, and download.
- Area-risk UI clearly labels evidence scope (city vs state), match source (impacted address vs text geography), and caveats.
- MAC Freeze renders in its own panel with explicit "not by itself a shutdown" copy.
- Bulk lookup UI exposes /bulk-lookup for both internal and consumer audiences. It uploads xlsx files, polls queued/running jobs, renders color counts and top carriers, and links to the enriched workbook download when complete.
Brand and visual system
- The SPA uses the Masters Telecom palette: Navy #1A2B62, Adriatic Blue #4194BC, Light Blue #A0C6DB, Grey #808085, and White #FFFFFF.
- Brand colors are chrome only: header, links, primary buttons, subtle info surfaces, and borders.
- Tier colors stay semantic and visually distinct from the brand blues: Red #B42318, Orange #DC6803, Yellow #CA8504, Violet #6B5CA5, and Green #079455.
- The backend grade_bucket="blue" contract is unchanged, but the UI renders that tier as Violet so MAC-Freeze does not read as generic info next to Navy/Adriatic chrome.
- User-facing blue-tier copy keeps the label "Future Shutdown Notice Anticipated"; only color-word references say Violet.
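The split between the backend tier name and the rendered color can be expressed as a lookup table. The real mapping lives in the frontend; this Python sketch only illustrates the contract, and TIER_COLORS, BRAND_PALETTE, and tier_style are hypothetical names.

```python
# Backend grade_bucket value -> (color word shown in the UI, hex value).
# Note that the "blue" bucket keeps its backend name but renders as Violet.
TIER_COLORS = {
    "red": ("Red", "#B42318"),
    "orange": ("Orange", "#DC6803"),
    "yellow": ("Yellow", "#CA8504"),
    "blue": ("Violet", "#6B5CA5"),
    "green": ("Green", "#079455"),
}

# Brand palette, reserved for chrome (header, links, buttons, borders).
BRAND_PALETTE = {"#1A2B62", "#4194BC", "#A0C6DB", "#808085", "#FFFFFF"}


def tier_style(grade_bucket: str) -> tuple:
    """Return the (color word, hex) pair used to render a backend tier."""
    return TIER_COLORS[grade_bucket]
```

Keeping the translation in one table means the API contract never has to change when the visual treatment does.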
Hosted deployment shape
- Docker image (see Dockerfile) builds the frontend, then copies the dist into a Python runtime that serves both the SPA and the API on port 7860.
- Hugging Face Space executes the image. Optional storage backend is a private HF Dataset repo.
- Neon Postgres holds the corpus. The vector extension is required.
- APScheduler runs weekly. In multi-instance deployments the scheduler acquires a PG advisory lock before firing (P2-T3).
- APScheduler also runs daily bulk-lookup retention cleanup at 03:00 with misfire_grace_time=300 and coalesce=True.
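The advisory-lock guard for multi-instance deployments can be sketched as follows. This is one possible shape, not the P2-T3 implementation: advisory_lock_key and run_if_leader are hypothetical, the key derivation via SHA-256 is an assumption, and the cursor is assumed to be a DB-API cursor using the %s parameter style (as psycopg does). pg_try_advisory_lock and pg_advisory_unlock are standard PostgreSQL functions that take a bigint key.

```python
import hashlib
from typing import Callable


def advisory_lock_key(name: str) -> int:
    """Derive a stable signed 64-bit key (PostgreSQL bigint) from a job name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def run_if_leader(cursor, job_name: str, job: Callable[[], None]) -> bool:
    """Run job only if this instance wins the session-level advisory lock."""
    key = advisory_lock_key(job_name)
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (key,))
    (acquired,) = cursor.fetchone()
    if not acquired:
        # Another instance holds the lock, so this one skips the run.
        return False
    try:
        job()
    finally:
        # Release on the same session that took the lock.
        cursor.execute("SELECT pg_advisory_unlock(%s)", (key,))
    return True
```

The non-blocking "try" variant is used so that losing instances skip the run immediately instead of queueing behind the winner and firing the job a second time.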
Bulk lookup
Bulk lookup is a trust-gated user-facing workflow for operators who need to enrich a spreadsheet of locations instead of checking one area at a time.
- POST /bulk-lookup/jobs accepts one .xlsx file with city/state columns, validates headers and row limits synchronously, stores the input blob, creates a BulkLookupJob, and queues background processing.
- GET /bulk-lookup/jobs lists the 20 most recent jobs.
- GET /bulk-lookup/jobs/{id} returns status, row counts, color counts, top carriers, and download availability.
- GET /bulk-lookup/jobs/{id}/download streams the completed enriched workbook and returns 409 until processing is complete or 410 after expiry.
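The download endpoint's status codes can be summarized as a small decision function. This is a sketch under assumptions: the status string "completed", the expires_at field, and the choice to check expiry before completion are all hypothetical details not confirmed by the description above.

```python
from datetime import datetime, timezone
from typing import Optional


def download_status(job_status: str, expires_at: Optional[datetime], now: datetime) -> int:
    """Map a bulk-lookup job's state to the download endpoint's HTTP status."""
    if expires_at is not None and now >= expires_at:
        return 410  # Gone: retention window has passed.
    if job_status != "completed":
        return 409  # Conflict: processing has not finished yet.
    return 200  # OK: the enriched workbook can be streamed.
```

Distinguishing 409 from 410 lets the polling UI tell "try again shortly" apart from "this job will never be downloadable".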
Processing uses ThreadPoolExecutor(max_workers=settings.bulk_lookup_concurrent_workers)
on app.state. Each worker opens its own SQLAlchemy session, reads the
input from the configured storage backend, calls assess_area_risk for
each valid row, flags invalid rows without failing the whole job, and
writes a Results sheet plus Summary sheet. The executor is shut down in
the FastAPI lifespan finally block.
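The processing model described above, one job per worker with invalid rows flagged rather than fatal, can be sketched like this. It is a simplified illustration: process_job and run_jobs are hypothetical, the validation rule (non-empty city, two-letter state) is an assumption, and the assess callable stands in for assess_area_risk. Sessions, storage, and workbook writing are omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def process_job(rows: List[Dict[str, str]], assess: Callable[[str, str], str]) -> List[dict]:
    """Assess each valid row; flag invalid rows without failing the job."""
    results = []
    for row in rows:
        city = (row.get("city") or "").strip()
        state = (row.get("state") or "").strip()
        if not city or len(state) != 2:  # assumed validation rule
            results.append({**row, "status": "invalid"})
            continue
        results.append({**row, "status": "ok", "grade": assess(city, state)})
    return results


def run_jobs(jobs: List[List[Dict[str, str]]], assess, max_workers: int = 2) -> List[List[dict]]:
    """Run each job on the pool and always shut the pool down afterwards."""
    executor = ThreadPoolExecutor(max_workers=max_workers)
    try:
        futures = [executor.submit(process_job, rows, assess) for rows in jobs]
        return [future.result() for future in futures]
    finally:
        # Mirrors shutting the executor down in the lifespan finally block.
        executor.shutdown(wait=True)
```

Parallelism is per job rather than per row here, which matches the idea that each worker owns a single database session for the duration of its job.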
Retention defaults to 7 days. The scheduler cleanup job and
python app/cli.py cleanup-bulk-lookup-jobs both call the same cleanup
service to expire rows and delete stored input/output blobs.
Active window
A notice is is_active=True when:
- The latest of its issue_date, revised_date, target_date, or fetched_at falls within POTS_TRACKER_LOOKBACK_MONTHS (default 60).
- Non-shutdown notices still respect the post-target grace window from P2-T5: target_date is None or target_date + active_window_post_target_grace_days >= today.
- Shutdown notices with target_date <= today remain active permanently so they stay available as historical red evidence for area-risk grading.
Archived notices can still exist in the corpus, but they no longer flow through a separate historical-context channel.
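The rules above can be combined into a single predicate. The sketch below is one reading of them, not the code in utils/policy.py: it assumes the past-target shutdown rule overrides the lookback check, that the remaining rules are combined with AND, and that all inputs are plain dates. months_ago and is_active are hypothetical helpers, and grace_days stands in for active_window_post_target_grace_days.

```python
import calendar
from datetime import date, timedelta
from typing import Optional


def months_ago(today: date, months: int) -> date:
    """Return the date `months` calendar months before today, clamping the day."""
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(today.day, last_day))


def is_active(
    signal_family: str,
    today: date,
    *,
    grace_days: int,
    issue_date: Optional[date] = None,
    revised_date: Optional[date] = None,
    target_date: Optional[date] = None,
    fetched_at: Optional[date] = None,
    lookback_months: int = 60,
) -> bool:
    # Past-target shutdown notices stay active permanently.
    if signal_family == "shutdown" and target_date is not None and target_date <= today:
        return True
    # Otherwise the most recent known date must fall inside the lookback window.
    known = [d for d in (issue_date, revised_date, target_date, fetched_at) if d is not None]
    if not known or max(known) < months_ago(today, lookback_months):
        return False
    # Non-shutdown notices also expire once the post-target grace window ends.
    if signal_family != "shutdown" and target_date is not None:
        return target_date + timedelta(days=grace_days) >= today
    return True
```

For example, under these assumptions a MAC Freeze notice whose target date passed more than grace_days ago becomes inactive, while a shutdown notice with the same dates stays active as red evidence.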
Signal families
| Value | Meaning |
|---|---|
| shutdown | Analog / POTS / copper retirement, discontinuance, switch decommissioning, TDM→IP transition, network disclosure. Default. |
| att_mac_freeze | AT&T commercial availability restriction (no new orders / moves / adds / changes, grandfathering). Not a shutdown by itself. |
Queries default to shutdown. The UI never cross-mingles the two.
Storage model
See backend/app/pots_shutdown_tracker/models/entities.py for the full
schema. Key relationships:
- RawNotice 1 ── 1 NormalizedNotice
- NormalizedNotice 1 ── N NoticeLocation
- NormalizedNotice 1 ── N ImpactedAddress
- NormalizedNotice 1 ── N NoticeChunk
- NormalizedNotice 1 ── N NoticeAuditEvent
- CustomerMatch is a persisted record of matching results.
- AIJob records every AI prompt/response for audit.
- BulkLookupJob tracks uploaded xlsx jobs, blob references, row counts, color counts, top carriers, and retention expiry.
See also
- docs/ONBOARDING.md: first-day setup.
- docs/TESTING.md: test strategy and fixtures.
- docs/SECURITY.md: threat model and auth posture.
- docs/diagrams/data-flow.mmd: standalone Mermaid source for the architecture diagrams.
- docs/RUNBOOK.md: operational playbook (create per P4-T2).
- IMPROVEMENT_PLAN.md: prioritized work list.