
Architecture

Snapshot of how the POTS Shutdown Tracker works end-to-end. Update this file when behavior changes.

High-level flow

The core path is crawl → parse → normalize → classify → review → search → coverage → area-risk → UI. The standalone Mermaid source lives in docs/diagrams/data-flow.mmd.

```mermaid
flowchart LR
    source["Public source pages / PDFs"] --> crawler["connectors/<carrier>.py"]
    crawler --> raw["services/crawler.py<br/>fetch, dedupe, write RawNotice"]
    raw --> parse["services/parser.py + parsers/source_specific.py<br/>parse_notice()"]
    parse --> normalize["Normalize into RawNotice + NormalizedNotice graph"]
    normalize --> classify["services/review.py + services/att_mac_freeze.py<br/>classify and auto-dispose"]
    classify --> review["Review queue + admin actions"]
    review --> trust["Trust summary / queryable corpus gate"]
    trust --> search["services/search.py"]
    search --> coverage["services/coverage.py"]
    coverage --> area["services/area_risk.py"]
    area --> ui["API + UI"]
    normalize -. audit trail .-> audit["services/audit.py"]
    normalize -. embeddings .-> embeddings["services/embeddings.py"]
```

```mermaid
sequenceDiagram
    actor User
    participant UI as UI
    participant API as api/router.py
    participant Trust as services/trust.py
    participant Search as services/search.py

    User->>UI: Enter a search
    UI->>API: POST /search
    API->>Trust: require_queryable_corpus()
    Trust-->>API: is_queryable?
    alt corpus queryable
        API->>Search: execute search
        Search-->>API: matches + supporting notices
        API-->>UI: 200 results
    else corpus not queryable
        API-->>UI: 503 trust-gate refusal
    end
```
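The trust-gate branch in the sequence diagram can be sketched without any web framework. This sketch assumes require_queryable_corpus receives an already-computed trust summary (its real signature is not documented here). TrustSummary, CorpusNotQueryable, and handle_search are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class TrustSummary:
    # Hypothetical subset of what services/trust.py reports.
    is_queryable: bool


class CorpusNotQueryable(Exception):
    """Trust-gate refusal; the API surfaces it as HTTP 503."""

    status_code = 503


def require_queryable_corpus(summary: TrustSummary) -> None:
    # The one check every query path runs before doing any search work.
    if not summary.is_queryable:
        raise CorpusNotQueryable("corpus is not queryable")


def handle_search(
    summary: TrustSummary, run_search: Callable[[str], Any], query: str
) -> Tuple[int, Any]:
    """Return (status_code, body) for a search request."""
    try:
        require_queryable_corpus(summary)
    except CorpusNotQueryable as exc:
        return exc.status_code, {"detail": str(exc)}
    return 200, run_search(query)
```

Raising an exception rather than returning a flag means a route cannot forget to honor the gate. This matters because the search, match, and bulk-lookup paths all share the same refusal behavior.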

Component responsibilities

Ingestion

  • Connectors (connectors/<source>.py) own source-specific URL discovery. They emit DiscoveredDocument objects with a URL, title hints, and optional pre-fetched raw text or bytes.
  • Crawler (services/crawler.py) fetches with services/connectors/common.py, respects per-source rate budgets, and writes RawNotice rows. Content-hash dedupe: identical bytes are skipped but last_seen_at is advanced (see IMPROVEMENT_PLAN.md P2-T1).
  • Storage backend (services/storage.py) is pluggable: filesystem (local/dev) or Hugging Face Dataset (hosted). Dataset repo must be private (enforced by P2-T2).
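Content-hash dedupe in the crawler can be sketched as follows. The sketch hashes the fetched bytes with SHA-256 and keys on (url, hash). The hash function, the per-URL keying, and the in-memory store are assumptions; StoredRawNotice and record_fetch are hypothetical stand-ins for the RawNotice write path.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class StoredRawNotice:
    # Hypothetical stand-in for a RawNotice row.
    url: str
    content_hash: str
    first_seen_at: datetime
    last_seen_at: datetime


def record_fetch(
    store: dict[tuple[str, str], StoredRawNotice],
    url: str,
    body: bytes,
    now: datetime | None = None,
) -> bool:
    """Write a row for new content; return False when identical bytes were skipped."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(body).hexdigest()
    existing = store.get((url, digest))
    if existing is not None:
        # Same bytes as a previous fetch: no new row, but record that we saw it.
        existing.last_seen_at = now
        return False
    store[(url, digest)] = StoredRawNotice(url, digest, now, now)
    return True
```

If a source republishes a revised PDF at the same URL, the bytes hash differently and a new row is written. An unchanged re-fetch only moves last_seen_at forward.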

Parsing and normalization

  • Rule parser (parsers/rule_parser.py) is the source of truth for notice_type, rule_family, restriction_types, and default signal_family="shutdown".
  • Source-specific overrides (parsers/source_specific.py) apply per carrier. AT&T has the deepest overrides (DSA tracker, network disclosure, and tariff/MAC Freeze paths). Overrides run before the generic parser, and a notice an override reclassifies can be switched back to shutdown only when the MAC-Freeze guard's dual-signal rule is satisfied (see P1-T5).
  • Active window policy (utils/policy.py) decides is_active from issue_date, revised_date, target_date, fetched_at, and the configured lookback months. Default lookback is 60 months. Shutdown notices with target_date <= today stay active permanently; other signal families still respect the post-target grace window from P2-T5.

User-facing APIs

  • Trust summary (services/trust.py) reports corpus health, is_queryable, covered carriers, and recently announced and recently updated notices. This is the single source of truth for whether the app should answer user queries.
  • Search (services/search.py) mixes structured filters and vector ranking; respects signal_family (defaults to shutdown).
  • Coverage (services/coverage.py) resolves city/state/ZIP against structured ImpactedAddress and NoticeLocation, then falls back to text-city geography with address/contact suppression heuristics. Priority: impacted_address (4) > notice_location (3) > text_city_geography (2) > structured_state (1) > parsed_state_fallback (1).
  • Area risk (services/area_risk.py) builds a grade bubble per area using a five-tier model (green/yellow/blue/orange/red), plus a per-carrier breakdown, caveats, and nearby-municipality context. Past-target shutdown notices are permanent red evidence; MAC Freeze alone is blue; scheduled shutdowns split at the 12-month boundary; nearby municipalities can nudge a weak direct result up to yellow but never above it. Historical context is now folded into the main supporting-notices list instead of a separate channel. For a city search, state-level MAC Freeze tariff filings surface as blue when no shutdown evidence conflicts; in that case confidence is low, and the caveat explicitly attributes the signal to the state-wide tariff rather than to a city-level filing.
  • Matching (services/matching.py) returns precise matches (exact_address, fuzzy_address, wire_center, CLLI) and, post P1-T6, a separate area_awareness list of state-level hits with no risk level.
  • MAC Freeze assessment (services/att_mac_freeze.py) runs through a dedicated endpoint when signal_family is att_mac_freeze or all. It never reaches shutdown grade bubbles.
  • Signal links (services/signal_links.py) cross-links MAC Freeze notices to shutdown notices only on a geography plus carrier-root match.
  • Bulk lookup (services/bulk_lookup.py) accepts xlsx uploads at /bulk-lookup/jobs, validates city/state rows, queues background processing through a ThreadPoolExecutor on app.state, calls assess_area_risk per valid row, and writes an enriched xlsx with a Summary sheet back to the shared storage backend. The job list, detail, and download endpoints are trust-gated with require_queryable_corpus; uploads are retained for 7 days by default and swept by the scheduler/CLI cleanup path.
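Two of the rules above are compact enough to sketch: coverage's match-source priority and area risk's cap on nearby-municipality nudges. The weights are quoted from the Coverage bullet. Two things are assumptions: that equal weights fall back to the listed order, and that the five tiers are ordered by severity as listed (green lowest, red highest). The function names are hypothetical.

```python
# Match-source weights as listed in the Coverage bullet (higher wins).
MATCH_SOURCE_PRIORITY = {
    "impacted_address": 4,
    "notice_location": 3,
    "text_city_geography": 2,
    "structured_state": 1,
    "parsed_state_fallback": 1,
}
_SOURCE_ORDER = list(MATCH_SOURCE_PRIORITY)

# Assumption: tiers ordered from least to most severe, as listed.
TIER_ORDER = ["green", "yellow", "blue", "orange", "red"]


def best_match_source(sources: list[str]) -> str:
    """Highest weight wins; equal weights fall back to the listed order (assumed)."""
    return max(
        sources,
        key=lambda s: (MATCH_SOURCE_PRIORITY[s], -_SOURCE_ORDER.index(s)),
    )


def apply_nearby_nudge(direct: str, nearby: str) -> str:
    """Nearby evidence can raise a weak direct grade, but never above yellow."""
    capped = min(TIER_ORDER.index(nearby), TIER_ORDER.index("yellow"))
    return TIER_ORDER[max(TIER_ORDER.index(direct), capped)]
```

Applying the cap before taking the maximum is what keeps a red neighbor from pushing a green city past yellow. Stronger direct evidence is left untouched.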

Admin workflows

  • Review queue surfaces low-confidence or structurally incomplete notices. Auto-disposition is opt-in and dry-run by default.
  • Reparse / reapply-active-window / sanitize-notice-dates / rebuild-embeddings are triggerable via /admin/* or CLI. All admin mutations require the X-Admin-Key header (P1-T3) and write to AdminAuditEvent (P3-T5).
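A minimal sketch of the admin guard, assuming a plain header dict and a callable for the mutation. Real HTTP headers are case-insensitive, which this dict is not. Several details are assumptions: the 401 status, the audit-after-success ordering, and the AdminAuditEvent fields shown. run_admin_action is a hypothetical name.

```python
import hmac
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class AdminAuditEvent:
    # Hypothetical shape; the real AdminAuditEvent columns are not listed here.
    action: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AdminAuthError(Exception):
    status_code = 401  # assumed; the actual status code is not documented here


def check_admin_key(headers: Dict[str, str], expected_key: str) -> None:
    supplied = headers.get("X-Admin-Key", "")
    # Refuse everything if no key is configured; otherwise compare in constant time.
    if not expected_key or not hmac.compare_digest(
        supplied.encode(), expected_key.encode()
    ):
        raise AdminAuthError("missing or invalid X-Admin-Key")


def run_admin_action(
    headers: Dict[str, str],
    expected_key: str,
    action: str,
    perform: Callable[[], None],
    audit_log: List[AdminAuditEvent],
) -> None:
    check_admin_key(headers, expected_key)
    perform()
    # Record the mutation only after it ran successfully.
    audit_log.append(AdminAuditEvent(action=action))
```

hmac.compare_digest avoids leaking how much of the key matched through timing. Refusing when no key is configured keeps an unset secret from silently opening the admin routes.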

Frontend responsibilities

  • Audience split is Vite-time (VITE_POTS_TRACKER_AUDIENCE). Consumer sees Search + trust gate. Internal sees Search, Dashboard, Match, Admin, Notice Detail. The audience split is UI-only; server auth is the real guardrail for admin.
  • Trust gate blocks Search and (post P1-T4) Match when is_queryable=false. Bulk lookup uses the same trust gate for upload, job list/detail, and download.
  • Area-risk UI clearly labels evidence scope (city vs state), match source (impacted address vs text geography), and caveats.
  • MAC Freeze renders in its own panel with explicit "not by itself a shutdown" copy.
  • Bulk lookup UI exposes /bulk-lookup for both internal and consumer audiences. It uploads xlsx files, polls queued/running jobs, renders color counts and top carriers, and links to the enriched workbook download when complete.

Brand and visual system

  • The SPA uses the Masters Telecom palette: Navy #1A2B62, Adriatic Blue #4194BC, Light Blue #A0C6DB, Grey #808085, and White #FFFFFF.
  • Brand colors are chrome only: header, links, primary buttons, subtle info surfaces, and borders.
  • Tier colors stay semantic and visually distinct from the brand blues: Red #B42318, Orange #DC6803, Yellow #CA8504, Violet #6B5CA5, and Green #079455.
  • The backend grade_bucket="blue" contract is unchanged, but the UI renders that tier as Violet so MAC-Freeze does not read as generic info next to Navy/Adriatic chrome.
  • User-facing blue-tier copy keeps the label "Future Shutdown Notice Anticipated"; only color-word references say Violet.
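The backend-to-UI tier mapping can be written out as data, here as a Python dict such as a shared test fixture might hold. The SPA itself presumably keeps this in its own frontend code. BRAND_CHROME, TIER_DISPLAY, and tier_color are hypothetical names; the hex values and labels come from the list above.

```python
# Brand chrome colors (header, links, buttons, borders only).
BRAND_CHROME = {
    "navy": "#1A2B62",
    "adriatic_blue": "#4194BC",
    "light_blue": "#A0C6DB",
    "grey": "#808085",
    "white": "#FFFFFF",
}

# Backend grade_bucket -> (UI color name, hex). The API value "blue" is
# unchanged; only its rendering switches to Violet.
TIER_DISPLAY = {
    "red": ("Red", "#B42318"),
    "orange": ("Orange", "#DC6803"),
    "yellow": ("Yellow", "#CA8504"),
    "blue": ("Violet", "#6B5CA5"),
    "green": ("Green", "#079455"),
}

# User-facing label for the blue tier stays the same.
BLUE_TIER_LABEL = "Future Shutdown Notice Anticipated"


def tier_color(grade_bucket: str) -> str:
    """Hex color the UI should paint for a backend grade_bucket."""
    return TIER_DISPLAY[grade_bucket][1]
```

Keeping the mapping keyed on the backend value means the API contract never has to change when the palette does.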

Hosted deployment shape

  • Docker image (see Dockerfile) builds the frontend, then copies the dist into a Python runtime that serves both the SPA and the API on port 7860.
  • A Hugging Face Space runs the image. The optional storage backend is a private HF Dataset repo.
  • Neon Postgres holds the corpus; the vector (pgvector) extension is required.
  • APScheduler runs weekly. In multi-instance deployments the scheduler acquires a PG advisory lock before firing (P2-T3).
  • APScheduler also runs daily bulk-lookup retention cleanup at 03:00 with misfire_grace_time=300 and coalesce=True.
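misfire_grace_time and coalesce are standard APScheduler job options. The sketch below covers only the advisory-lock half: it derives a stable signed 64-bit key from a job name (the type pg_try_advisory_lock(bigint) takes) and runs the job only on the instance that wins the lock. Hashing the job name and the run_if_lock_acquired wrapper are assumptions. The try_lock/unlock callables stand in for SQL run on one connection, because session-level advisory locks must be released by the session that holds them.

```python
import hashlib
from typing import Callable


def advisory_lock_key(job_name: str) -> int:
    """Stable signed 64-bit key derived from the job name."""
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def run_if_lock_acquired(
    job_name: str,
    try_lock: Callable[[int], bool],  # e.g. SELECT pg_try_advisory_lock(%s)
    unlock: Callable[[int], None],  # e.g. SELECT pg_advisory_unlock(%s)
    job: Callable[[], None],
) -> bool:
    key = advisory_lock_key(job_name)
    if not try_lock(key):
        return False  # another instance holds the lock; skip this firing
    try:
        job()
    finally:
        unlock(key)
    return True
```

The non-blocking pg_try_advisory_lock suits a scheduler: losing instances skip the run instead of queueing up behind the winner.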

Bulk lookup

Bulk lookup is a trust-gated user-facing workflow for operators who need to enrich a spreadsheet of locations instead of checking one area at a time.

  • POST /bulk-lookup/jobs accepts one .xlsx file with city/state columns, validates headers and row limits synchronously, stores the input blob, creates a BulkLookupJob, and queues background processing.
  • GET /bulk-lookup/jobs lists the 20 most recent jobs.
  • GET /bulk-lookup/jobs/{id} returns status, row counts, color counts, top carriers, and download availability.
  • GET /bulk-lookup/jobs/{id}/download streams the completed enriched workbook and returns 409 until processing is complete or 410 after expiry.
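The download endpoint's status rules can be sketched as one pure function. The job-status names (queued, running, complete), the expires_at field, and the output_blob field are assumptions modeled on the description above. Checking expiry first means an expired job answers 410 even if its workbook was once complete.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class JobStatus(str, Enum):
    # Assumed status names.
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETE = "complete"


@dataclass
class BulkLookupJobView:
    # Hypothetical subset of BulkLookupJob fields.
    status: JobStatus
    expires_at: datetime
    output_blob: Optional[str] = None


def download_status(job: BulkLookupJobView, now: datetime) -> int:
    """HTTP status for GET /bulk-lookup/jobs/{id}/download."""
    if now >= job.expires_at:
        return 410  # past retention; blobs may already be deleted
    if job.status is not JobStatus.COMPLETE or job.output_blob is None:
        return 409  # still queued or running
    return 200
```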

Processing uses ThreadPoolExecutor(max_workers=settings.bulk_lookup_concurrent_workers) on app.state. Each worker opens its own SQLAlchemy session, reads the input from the configured storage backend, calls assess_area_risk for each valid row, flags invalid rows without failing the whole job, and writes a Results sheet plus Summary sheet. The executor is shut down in the FastAPI lifespan finally block.
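The per-row loop and its summary can be sketched without spreadsheet I/O. Several parts are simplifications or assumptions:

- Rows are plain dicts instead of xlsx rows.
- assess is a hypothetical stand-in for assess_area_risk, returning grade_bucket and carriers keys.
- The two-letter state check is an assumed validation rule.
- Opening a SQLAlchemy session per worker is omitted.

```python
from collections import Counter
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for assess_area_risk: (city, state) -> result dict.
Assess = Callable[[str, str], dict]


def process_rows(rows: List[dict], assess: Assess) -> Tuple[List[dict], Dict]:
    """Enrich each row; invalid rows are flagged instead of failing the job."""
    results: List[dict] = []
    colors: Counter = Counter()
    carriers: Counter = Counter()
    invalid = 0
    for row in rows:
        city = str(row.get("city") or "").strip()
        state = str(row.get("state") or "").strip().upper()
        if not city or len(state) != 2:  # assumed validation rule
            invalid += 1
            results.append({**row, "error": "invalid city/state"})
            continue
        risk = assess(city, state)
        colors[risk["grade_bucket"]] += 1
        carriers.update(risk.get("carriers", []))
        results.append({**row, "grade_bucket": risk["grade_bucket"]})
    summary = {
        "rows": len(rows),
        "invalid_rows": invalid,
        "color_counts": dict(colors),
        "top_carriers": [name for name, _ in carriers.most_common(3)],
    }
    return results, summary


def queue_job(executor: ThreadPoolExecutor, rows: List[dict], assess: Assess) -> Future:
    # In the app the executor lives on app.state and is shut down in lifespan.
    return executor.submit(process_rows, rows, assess)
```

The results list maps onto the Results sheet and the summary dict onto the Summary sheet.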

Retention defaults to 7 days. The scheduler cleanup job and python app/cli.py cleanup-bulk-lookup-jobs both call the same cleanup service to expire rows and delete stored input/output blobs.
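A sketch of a shared cleanup routine, assuming each job stores an expires_at set to creation time plus seven days, and that blob deletion is an injected callable (standing in for the storage backend). BulkJobRecord, expiry_for, and cleanup_expired_jobs are hypothetical names. Skipping already-expired rows makes repeated scheduler and CLI runs harmless.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Optional

DEFAULT_RETENTION = timedelta(days=7)


@dataclass
class BulkJobRecord:
    # Hypothetical subset of BulkLookupJob columns.
    id: int
    expires_at: datetime
    input_blob: Optional[str]
    output_blob: Optional[str]
    expired: bool = False


def expiry_for(created_at: datetime, retention: timedelta = DEFAULT_RETENTION) -> datetime:
    return created_at + retention


def cleanup_expired_jobs(
    jobs: Iterable[BulkJobRecord], delete_blob: Callable[[str], None], now: datetime
) -> int:
    """Expire overdue jobs and delete their blobs; safe to run repeatedly."""
    expired = 0
    for job in jobs:
        if job.expired or job.expires_at > now:
            continue
        for blob in (job.input_blob, job.output_blob):
            if blob:
                delete_blob(blob)
        job.input_blob = None
        job.output_blob = None
        job.expired = True
        expired += 1
    return expired
```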

Active window

A notice has is_active=True when:

  • The latest of its issue_date, revised_date, target_date, or fetched_at falls within POTS_TRACKER_LOOKBACK_MONTHS (default 60).
  • Non-shutdown notices still respect the post-target grace window from P2-T5: target_date is None or target_date + active_window_post_target_grace_days >= today.
  • Shutdown notices with target_date <= today remain active permanently so they stay available as historical red evidence for area-risk grading.

Archived notices can still exist in the corpus, but they no longer flow through a separate historical-context channel.
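The three rules above can be combined into one predicate. This is a sketch under stated assumptions:

- All inputs are dates, so fetched_at is treated as a date.
- The past-target shutdown rule overrides the lookback check.
- The month arithmetic clamps the day.

grace_days stands in for active_window_post_target_grace_days, whose default is not given here. months_ago and is_active are hypothetical names, not the utils/policy.py API.

```python
import calendar
from datetime import date, timedelta
from typing import Optional


def months_ago(today: date, months: int) -> date:
    """Calendar-month subtraction, clamping the day (Mar 31 minus 1 month -> Feb 28/29)."""
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_active(
    signal_family: str,
    *,
    issue_date: Optional[date],
    revised_date: Optional[date],
    target_date: Optional[date],
    fetched_at: Optional[date],
    today: date,
    grace_days: int,
    lookback_months: int = 60,
) -> bool:
    # Past-target shutdowns stay active permanently as historical red evidence.
    if signal_family == "shutdown" and target_date is not None and target_date <= today:
        return True
    dates = [d for d in (issue_date, revised_date, target_date, fetched_at) if d is not None]
    if not dates or max(dates) < months_ago(today, lookback_months):
        return False
    # Non-shutdown families must also be inside the post-target grace window.
    if signal_family != "shutdown" and target_date is not None:
        return target_date + timedelta(days=grace_days) >= today
    return True
```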

Signal families

| Value | Meaning |
| --- | --- |
| shutdown | Analog / POTS / copper retirement, discontinuance, switch decommissioning, TDM→IP transition, network disclosure. Default. |
| att_mac_freeze | AT&T commercial availability restriction (no new orders / moves / adds / changes, grandfathering). Not a shutdown by itself. |

Queries default to shutdown. The UI never mixes the two families.
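The default-to-shutdown, never-mix rule can be sketched as a filter over a two-member enum. SignalFamily and filter_by_family are hypothetical names. The all value mentioned for the MAC Freeze endpoint is deliberately not modeled. Unknown values raise ValueError instead of silently widening the result set.

```python
from enum import Enum
from typing import Iterable, List, Mapping, Union


class SignalFamily(str, Enum):
    SHUTDOWN = "shutdown"
    ATT_MAC_FREEZE = "att_mac_freeze"


def filter_by_family(
    notices: Iterable[Mapping[str, str]],
    family: Union[str, SignalFamily] = SignalFamily.SHUTDOWN,
) -> List[Mapping[str, str]]:
    """Return notices of exactly one family; unknown values raise ValueError."""
    wanted = SignalFamily(family)
    return [n for n in notices if SignalFamily(n["signal_family"]) is wanted]
```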

Storage model

See backend/app/pots_shutdown_tracker/models/entities.py for the full schema. Key relationships:

  • RawNotice 1 ←→ 1 NormalizedNotice
  • NormalizedNotice 1 ←→ N NoticeLocation
  • NormalizedNotice 1 ←→ N ImpactedAddress
  • NormalizedNotice 1 ←→ N NoticeChunk
  • NormalizedNotice 1 ←→ N NoticeAuditEvent
  • CustomerMatch is a persisted record of matching results.
  • AIJob records every AI prompt/response for audit.
  • BulkLookupJob tracks uploaded xlsx jobs, blob references, row counts, color counts, top carriers, and retention expiry.

See also