open-navigator / website /docs /data-sources /data-model-erd.md
jcbowyer's picture
Clean HuggingFace deployment without binary files
61d29fc
metadata
displayed_sidebar: developersSidebar
sidebar_position: 1

import ZoomableMermaid from '@site/src/components/ZoomableMermaid';

Data Model & Entity Relationship Diagram

Comprehensive overview of all data entities extracted, processed, and uploaded to HuggingFace datasets.

οΏ½ HuggingFace Dataset Structure

Current Datasets Being Uploaded

open-navigator-data/
β”œβ”€β”€ jurisdictions/          # πŸ›οΈ Core jurisdiction data
β”‚   β”œβ”€β”€ cities              # 19,000+ incorporated places
β”‚   β”œβ”€β”€ counties            # 3,144 U.S. counties
β”‚   β”œβ”€β”€ states              # 50 states + DC, territories
β”‚   β”œβ”€β”€ school_districts    # 13,000+ districts (NCES data)
β”‚   └── census_data         # Basic FIPS codes & census year reference
β”‚
β”œβ”€β”€ demographics/           # πŸ‘₯ Comprehensive demographic data (U.S. Census)
β”‚   β”œβ”€β”€ population          # Total population, age distribution
β”‚   β”œβ”€β”€ race_ethnicity      # Race and ethnicity breakdowns
β”‚   β”œβ”€β”€ income_economics    # Income, poverty, SNAP benefits
β”‚   β”œβ”€β”€ education           # Educational attainment levels
β”‚   β”œβ”€β”€ housing             # Housing units, ownership, values
β”‚   β”œβ”€β”€ employment          # Unemployment, labor force participation
β”‚   └── health_insurance    # Insurance coverage (uninsured, Medicaid, Medicare)
β”‚
β”œβ”€β”€ social/                 # πŸ“± Social media presence
β”‚   β”œβ”€β”€ twitter             # Twitter/X accounts
β”‚   β”œβ”€β”€ facebook            # Facebook pages
β”‚   β”œβ”€β”€ instagram           # Instagram accounts
β”‚   └── linkedin            # LinkedIn pages
β”‚
β”œβ”€β”€ videos/                 # πŸ“Ή Video & streaming platforms
β”‚   β”œβ”€β”€ youtube_channels    # Government YouTube channels
β”‚   β”œβ”€β”€ vimeo              # Vimeo accounts
β”‚   └── livestreams        # Live meeting streams
β”‚
β”œβ”€β”€ platforms/              # πŸ–₯️ Meeting management systems
β”‚   β”œβ”€β”€ legistar           # Legistar URLs
β”‚   β”œβ”€β”€ granicus           # Granicus links
β”‚   β”œβ”€β”€ suiteone           # SuiteOne systems
β”‚   └── civicplus          # CivicPlus platforms
β”‚
β”œβ”€β”€ domains/                # 🌐 Official government websites
β”‚   β”œβ”€β”€ gsa_domains        # .gov domain registry
β”‚   β”œβ”€β”€ municipal_websites # City/county websites
β”‚   └── state_portals      # State government sites
β”‚
β”œβ”€β”€ events/                 # πŸ“‹ Meetings, Hearings & Public Events
β”‚   β”œβ”€β”€ events             # Government meetings, public hearings, community forums, town halls
β”‚   β”œβ”€β”€ event_participants # Officials and organizations participating in events
β”‚   β”œβ”€β”€ event_agenda_items # Individual agenda topics discussed
β”‚   β”œβ”€β”€ event_documents    # Agendas, minutes, presentations, handouts
β”‚   β”œβ”€β”€ event_media        # Video recordings, livestreams, audio files
β”‚   └── event_bills        # Bills discussed or considered at meetings
β”‚
β”œβ”€β”€ contacts/               # πŸ‘₯ All People - Officials, Candidates, Donors, Constituents
β”‚   β”œβ”€β”€ officials          # Elected and appointed officials (mayors, council members, legislators)
β”‚   β”œβ”€β”€ official_roles     # Current and historical positions held
β”‚   β”œβ”€β”€ official_contacts  # Email, phone, office addresses
β”‚   β”œβ”€β”€ official_identifiers # External IDs (Twitter, OpenStates, Ballotpedia)
β”‚   β”œβ”€β”€ official_links     # Websites, social media profiles
β”‚   β”œβ”€β”€ candidates         # Political candidates (House, Senate, President - FEC data)
β”‚   β”œβ”€β”€ nonprofit_donors   # Nonprofit leadership political giving (FEC analysis)
β”‚   └── constituents       # Donors, volunteers, members, beneficiaries (all people)
β”‚
β”œβ”€β”€ nonprofits/             # 🏒 Nonprofit organizations & churches
β”‚   β”œβ”€β”€ irs_eobmf          # IRS EO-BMF bulk data (1.9M+ organizations) - PRIMARY SOURCE
β”‚   β”œβ”€β”€ irs_nonprofits     # Legacy IRS 990 data (deprecated - use irs_eobmf)
β”‚   β”œβ”€β”€ propublica_data    # ProPublica API (financials, NTEE codes)
β”‚   β”œβ”€β”€ everyorg_data      # Every.org API (missions, causes, logos)
β”‚   β”œβ”€β”€ nonprofit_990s     # Detailed Form 990 financials (yearly filings)
β”‚   β”œβ”€β”€ congregations      # πŸ› Church & congregation data (ARDA, HIFLD, NCS)
β”‚   β”œβ”€β”€ constituents       # 🀝 Donors, volunteers, members, beneficiaries (Microsoft CDM)
β”‚   β”œβ”€β”€ donations          # πŸ’ Financial contributions and in-kind gifts (Microsoft CDM)
β”‚   β”œβ”€β”€ campaigns          # πŸ“£ Fundraising campaigns and appeals (Microsoft CDM)
β”‚   β”œβ”€β”€ memberships        # 🎫 Member enrollment and renewals (Microsoft CDM)
β”‚   β”œβ”€β”€ volunteer_activities # πŸ™‹ Volunteer hours and activities (Microsoft CDM)
β”‚   β”œβ”€β”€ program_delivery   # 🎯 Programs and services delivered (Microsoft CDM)
β”‚   └── program_outcomes   # πŸ“Š Impact metrics and outcome measurements (Microsoft CDM)
β”‚
β”œβ”€β”€ grants/                 # πŸ’΅ Grant funding transactions
β”‚   β”œβ”€β”€ nonprofit_grants   # Grants to nonprofits (from 990 Schedule I)
β”‚   β”œβ”€β”€ government_grants  # Government grants to orgs/jurisdictions
β”‚   β”œβ”€β”€ foundation_grants  # Private foundation grants
β”‚   └── federal_grants     # Federal funding programs
β”‚
β”œβ”€β”€ causes/                 # 🎯 Cause & category taxonomy
β”‚   β”œβ”€β”€ ntee_codes         # IRS NTEE classification system
β”‚   └── everyorg_causes    # Every.org cause tags
β”‚
β”œβ”€β”€ budgets/                # πŸ’° Government budgets & finances
β”‚   β”œβ”€β”€ city_budgets       # City/municipal budgets & spending
β”‚   β”œβ”€β”€ county_budgets     # County budgets & expenditures
β”‚   β”œβ”€β”€ state_budgets      # State government finances
β”‚   β”œβ”€β”€ school_budgets     # School district finances (NCES F-33)
β”‚   β”œβ”€β”€ bond_debt          # Municipal bonds & debt obligations
β”‚   β”œβ”€β”€ budget_line_items  # πŸ“Š Detailed budget categories & line items
β”‚   └── budget_deltas      # πŸ” Budget-to-Minutes Delta analysis (political economy)
β”‚
β”œβ”€β”€ decisions/              # βš–οΈ Policy decisions & political economy analysis
β”‚   β”œβ”€β”€ policy_decisions   # Extracted decisions from meetings
β”‚   β”œβ”€β”€ decision_frames    # Frame analysis (rhetoric patterns)
β”‚   β”œβ”€β”€ decision_options   # Options considered & rejected
β”‚   β”œβ”€β”€ decision_tradeoffs # Tradeoffs discussed (cost vs benefit, etc.)
β”‚   β”œβ”€β”€ stakeholder_positions # πŸ‘₯ Who spoke for/against (Influence Radar)
β”‚   β”œβ”€β”€ decision_votes     # Detailed vote records per decision
β”‚   β”œβ”€β”€ deferral_patterns  # πŸ“… Stalling detection (same topic, multiple deferrals)
β”‚   β”œβ”€β”€ deferral_instances # Individual tabling events linked to patterns
β”‚   β”œβ”€β”€ keyword_density    # Quantitative indicators (grant/taxpayer/emergency)
β”‚   └── deferral_patterns  # Tabled/delayed decisions (temporal analysis)
β”‚
β”œβ”€β”€ elections/              # πŸ“… Election cycles & temporal analysis
β”‚   β”œβ”€β”€ election_cycles    # Election dates & periods
β”‚   └── election_influences # Pre/post-election decision patterns
β”‚
β”œβ”€β”€ campaigns/              # πŸ’° Political campaign finance (FEC data)
β”‚   β”œβ”€β”€ committees         # PACs, Super PACs, campaign committees
β”‚   └── contributions      # Individual political contributions $200+
β”‚
β”œβ”€β”€ civic/                  # πŸ—³οΈ Google Civic & Wikidata
β”‚   β”œβ”€β”€ civic_divisions    # OCD divisions
β”‚   β”œβ”€β”€ representatives    # From Google Civic API
β”‚   β”œβ”€β”€ wikidata_entities  # Structured entities
β”‚   └── dbpedia_resources  # Wikipedia infobox data
β”‚
β”œβ”€β”€ ballots/                # πŸ—³οΈ Ballot initiatives & referendums
β”‚   β”œβ”€β”€ state_measures      # State propositions (fluoridation votes!)
β”‚   β”œβ”€β”€ local_measures      # City/county ballot questions
β”‚   └── election_results    # Historical voting outcomes
β”‚
β”œβ”€β”€ bills/                  # πŸ“œ Legislation & Lawmaking
β”‚   β”œβ”€β”€ bills              # Bills and resolutions (1.5M+ from all 50 states)
β”‚   β”œβ”€β”€ bill_actions       # Bill history (introduced, committee, floor vote, signed)
β”‚   β”œβ”€β”€ bill_sponsorships  # Primary sponsors and co-sponsors
β”‚   β”œβ”€β”€ bill_abstracts     # Bill summaries and descriptions
β”‚   β”œβ”€β”€ bill_versions      # Different versions of bill text (introduced, amended, enrolled)
β”‚   β”œβ”€β”€ bill_version_links # Links to PDF/HTML bill text
β”‚   β”œβ”€β”€ bill_documents     # Supporting documents (fiscal notes, amendments, analysis)
β”‚   β”œβ”€β”€ bill_document_links # Links to supporting documents
β”‚   β”œβ”€β”€ bill_subjects      # Subject/topic tags per bill
β”‚   β”œβ”€β”€ legislative_sessions # Session identifiers (2023 Regular, 2024 Special, etc.)
β”‚   β”œβ”€β”€ vote_events        # Roll call votes on bills
β”‚   β”œβ”€β”€ vote_counts        # Vote tallies (yes, no, abstain, absent)
β”‚   └── individual_votes   # Individual legislator votes (yes/no/abstain)
β”‚
β”œβ”€β”€ topics/                 # 🎯 Advocacy causes & campaigns
β”‚   β”œβ”€β”€ topic_definitions   # Validated survey questions from Roper Center
β”‚   β”œβ”€β”€ survey_questions    # Public opinion question wording library
β”‚   β”œβ”€β”€ jurisdiction_topics # What each city is discussing
β”‚   └── advocacy_alerts     # Opportunities for engagement
β”‚
β”œβ”€β”€ surveys/                # πŸ“Š Public opinion research & polling data
β”‚   β”œβ”€β”€ survey_providers    # Polling organizations (Gallup, Pew, Roper, etc.)
β”‚   β”œβ”€β”€ survey_studies      # Individual survey studies/waves
β”‚   β”œβ”€β”€ survey_variables    # Questions/items asked in surveys
β”‚   β”œβ”€β”€ survey_responses    # Aggregate and individual response data
β”‚   β”œβ”€β”€ ipoll_metadata      # Roper iPoll catalog metadata
β”‚   └── survey_crosstabs    # Breakdowns by demographics, geography
β”‚
β”œβ”€β”€ factchecks/             # βœ… Fact-checking & claim verification
β”‚   β”œβ”€β”€ claim_reviews       # Google Fact Check API (ClaimReview schema)
β”‚   β”œβ”€β”€ politifact          # PolitiFact Truth-O-Meter ratings
β”‚   β”œβ”€β”€ factcheck_org       # FactCheck.org verified claims
β”‚   └── verified_claims     # Aggregated fact-check database
β”‚
β”œβ”€β”€ civic_tech/             # πŸ’» Open source projects & hackathons
β”‚   β”œβ”€β”€ github_repositories # Civic tech projects (GitHub API)
β”‚   β”œβ”€β”€ project_metadata    # Code for America, USDR, Civic Tech Field Guide
β”‚   β”œβ”€β”€ contributors        # Maintainers and core contributors
β”‚   β”œβ”€β”€ project_issues      # Good first issues, contribution opportunities
β”‚   β”œβ”€β”€ hackathons          # Civic hackathon events
β”‚   β”œβ”€β”€ hackathon_projects  # Projects built at hackathons
β”‚   β”œβ”€β”€ brigade_chapters    # Code for America brigade locations
β”‚   └── project_funding     # GitHub Sponsors, grants, OpenCollective
β”‚
β”œβ”€β”€ community_solutions/    # 🌟 Community engagement & use cases
β”‚   β”œβ”€β”€ engagement_spectrum # Spectrum of Community Engagement to Ownership
β”‚   β”œβ”€β”€ use_case_catalog   # Harvard Data-Smart City Solutions examples
β”‚   β”œβ”€β”€ data_academies     # Brookings Institution training programs
β”‚   β”œβ”€β”€ success_stories    # Real-world outcomes (Providence, Portland, Tempe)
β”‚   β”œβ”€β”€ metric_templates   # Pre-built analytics for common challenges
β”‚   └── workflow_guides    # Step-by-step community data workflows
β”‚
β”œβ”€β”€ analytics/              # πŸ“Š Time dimensions & metric views
β”‚   β”œβ”€β”€ date_dimension      # Date/time reference table (YYYY-MM-DD, day_of_week, fiscal_year)
β”‚   β”œβ”€β”€ temporal_relationships  # Time-series joins for all entities
β”‚   β”œβ”€β”€ metric_views        # Pre-computed analytics (advocacy, spending, nonprofit impact)
β”‚   β”œβ”€β”€ aggregated_stats    # Monthly/quarterly/yearly rollups
β”‚   └── dashboard_metrics   # Real-time dashboard data feeds
β”‚
β”œβ”€β”€ standards/              # 🌐 Schema.org, Popolo, CEDS, IATI exports
β”‚   β”œβ”€β”€ schema_org_jsonld   # JSON-LD exports (Event, Person, Organization, Legislation, ClaimReview)
β”‚   β”œβ”€β”€ popolo_exports      # Popolo-compliant JSON (Person, Organization, Membership, VoteEvent)
β”‚   β”œβ”€β”€ ceds_aligned        # CEDS-compliant education data (Element IDs, Option Sets)
β”‚   β”œβ”€β”€ ocd_divisions       # Open Civic Data division IDs
β”‚   β”œβ”€β”€ iati_activities     # IATI Standard v2.03 XML (programs, grants, humanitarian aid)
β”‚   └── rdf_triples         # RDF/Turtle semantic web exports
β”‚
β”œβ”€β”€ vocabulary/             # πŸ”§ OMOP-inspired concept & terminology (SYSTEM-INTERNAL)
β”‚   β”œβ”€β”€ concept             # Master concept table (cities, causes, officials)
β”‚   β”œβ”€β”€ vocabulary          # Vocabulary sources (OCD_ID, IRS_NTEE, US_Census)
β”‚   β”œβ”€β”€ concept_class       # Concept classifications (City, County, 501c3, Mayor)
β”‚   β”œβ”€β”€ concept_relationship # Relationships (City β†’ County, Topic β†’ Legislation)
β”‚   └── domain              # Domain groupings (Jurisdiction, Nonprofit, Policy)
β”‚
└── exports/                # πŸ“€ API-ready formatted exports
    β”œβ”€β”€ csv_bulk            # CSV downloads for all datasets
    β”œβ”€β”€ json_api            # REST API JSON responses
    β”œβ”€β”€ graphql_schema      # GraphQL schema definitions
    └── parquet_optimized   # Compressed Parquet (default format)

Parquet File Naming Convention

Rule: Use underscores (_) consistently, NOT hyphens (-)

Format: {category}_{subcategory}.parquet

Examples:

βœ… CORRECT (using underscores):
jurisdictions_cities.parquet
jurisdictions_counties.parquet
jurisdictions_states.parquet
jurisdictions_school_districts.parquet
social_twitter.parquet
social_facebook.parquet
videos_youtube_channels.parquet
events_events.parquet
contacts_officials.parquet
bills_bills.parquet
nonprofits_organizations.parquet
nonprofits_financials.parquet
nonprofits_programs.parquet
nonprofits_locations.parquet
nonprofits_irs_eobmf.parquet
nonprofits_constituents.parquet
nonprofits_donations.parquet
nonprofits_campaigns.parquet        # Nonprofit fundraising campaigns (NOT political)
contacts_candidates.parquet         # Political candidates (FEC)
contacts_nonprofit_donors.parquet   # Nonprofit leadership political giving (FEC analysis)
contacts_constituents.parquet       # Donors, volunteers, members, beneficiaries
campaigns_committees.parquet        # Political committees/PACs (FEC)
campaigns_contributions.parquet     # Political contributions (FEC)
nonprofits_memberships.parquet
nonprofits_volunteer_activities.parquet
nonprofits_program_delivery.parquet
nonprofits_program_outcomes.parquet
grants_federal_grants.parquet
contacts_officials.parquet
contacts_official_roles.parquet
contacts_official_contacts.parquet
contacts_official_identifiers.parquet
contacts_official_links.parquet
contacts_candidates.parquet
contacts_nonprofit_donors.parquet
contacts_constituents.parquet
bills_bills.parquet
bills_bill_actions.parquet
bills_bill_sponsorships.parquet
bills_bill_abstracts.parquet
bills_bill_versions.parquet
bills_bill_version_links.parquet
bills_bill_documents.parquet
bills_bill_document_links.parquet
bills_bill_subjects.parquet
bills_legislative_sessions.parquet
bills_vote_events.parquet
bills_vote_counts.parquet
bills_individual_votes.parquet
events.parquet
event_participants.parquet
event_agenda_items.parquet
event_documents.parquet
event_media.parquet
event_bills.parquet
budgets_city_budgets.parquet
surveys_national_polls.parquet
surveys_roper_questions.parquet
surveys_survey_providers.parquet
surveys_survey_studies.parquet
surveys_survey_variables.parquet
surveys_survey_responses.parquet
surveys_ipoll_metadata.parquet
factchecks_claim_reviews.parquet
factchecks_politifact.parquet
analytics_date_dimension.parquet
analytics_metric_views.parquet
analytics_temporal_relationships.parquet
standards_schema_org_jsonld.parquet
standards_popolo_exports.parquet
standards_ceds_aligned.parquet
standards_iati_activities.parquet
vocabulary_concept.parquet
vocabulary_vocabulary.parquet
vocabulary_concept_class.parquet
vocabulary_concept_relationship.parquet

❌ INCORRECT (using hyphens):
jurisdictions-cities.parquet
social-twitter.parquet
meetings-government-meetings.parquet
surveys-national-polls.parquet
factchecks-claim-reviews.parquet
analytics-date-dimension.parquet
standards-schema-org.parquet

Why Underscores?

  • βœ… Python-friendly variable names (can use data.jurisdictions_cities)
  • βœ… SQL-compatible column names
  • βœ… Consistent with folder structure (school_districts, not school-districts)
  • βœ… Better for programmatic access
  • βœ… Avoids shell escaping issues

Repository Name Exception:

  • HuggingFace repo: CommunityOne/open-navigator-data (hyphen is fine for URLs)
  • File names inside repo: Use underscores (jurisdictions_cities.parquet)

πŸ”„ Data Extraction Pipeline

Phase 1: Discovery (Bronze Layer)

  1. Census Data β†’ Jurisdictions list
  2. GSA Domains β†’ Government websites
  3. NCES β†’ School districts with financial data (F-33 forms)
  4. IRS EO-BMF β†’ ALL 1.9M+ U.S. tax-exempt organizations (PRIMARY SOURCE)
  5. IRS TEOS β†’ Legacy nonprofit EINs (deprecated - use IRS EO-BMF)
  6. Census of Governments β†’ Municipal budgets & finances
  7. URL Discovery β†’ Meeting platforms, YouTube, budget PDFs
  8. Social Media β†’ Twitter, Facebook accounts

Phase 2: Enrichment (Silver Layer)

  1. IRS EO-BMF β†’ Complete nonprofit registry with 28 data fields per organization
  2. ProPublica Nonprofit Explorer β†’ Enhanced financial data, detailed 990 filings
  3. Every.org API β†’ Nonprofit causes, missions, logos
  4. ARDA (Association of Religion Data Archives) β†’ Congregation characteristics, health ministries
  5. HIFLD Places of Worship β†’ Geospatial church locations (350K+ congregations)
  6. National Congregations Study β†’ Social service provision patterns
  7. NCES F-33 Finance Survey β†’ School district budgets, per-pupil spending
  8. Census Annual Survey β†’ State/local government finances
  9. Municipal Securities Rulemaking Board (EMMA) β†’ Bond debt data
  10. YouTube API β†’ Channel statistics
  11. Open States PostgreSQL Database β†’ Complete legislative data (~10 GB monthly dump)
    • 8,600+ people (legislators, governors, mayors) across all 50 states + DC + Puerto Rico
    • 1.5M+ bills with full text and history
    • 13M+ bill actions (introduced, committee, amendments, floor votes, signed)
    • 7.2M+ sponsorships (primary sponsors and co-sponsors)
    • 3.5M+ bill versions (as introduced, committee substitute, enrolled, enacted)
    • 180K+ events (legislative meetings, hearings, committee sessions)
    • 835K+ event participants (who spoke, testified, or attended)
    • 524K+ agenda items from meetings
    • Vote events with individual legislator positions
    • Organizations (legislative bodies, committees)
    • Jurisdictions (states, territories)
    • Updated monthly from https://data.openstates.org/postgres/monthly/
  12. OpenStates People Repository β†’ Current legislator contact info
  13. Wikidata SPARQL β†’ Entity relationships
  14. DBpedia β†’ Wikipedia structured data
  15. Google Civic β†’ Representatives
  16. OpenFEC API β†’ Political contributions, candidates, committees (campaign finance)
  17. GitHub API β†’ Civic tech projects, contributors, issues
  18. Civic Tech Field Guide β†’ Curated project taxonomy
  19. Code for America β†’ Brigade projects and hackathons
  20. Digital Public Goods Alliance β†’ DPG-certified open source projects

Phase 3: Processing (Gold Layer)

  1. Meeting Extraction β†’ Agenda/minutes text
  2. Video Transcripts β†’ YouTube captions
  3. Document Analysis β†’ Keyword detection
  4. Relationship Mapping β†’ Entity connections
  5. Oral Health Filtering β†’ Topic classification
  6. Temporal Indexing β†’ Date dimension table, time-series relationships
  7. Metric View Creation β†’ Pre-computed analytics (advocacy activity, government spending, nonprofit impact)
  8. Schema.org JSON-LD β†’ Structured data exports (Event, Person, Organization, Legislation, ClaimReview)
  9. Popolo Compliance β†’ Open government standard exports (Person, Organization, Membership, VoteEvent)
  10. CEDS Alignment β†’ Education data mapping to NCES Element IDs and Option Sets

New Dataset Categories Explained

πŸ“Š Analytics Datasets

Purpose: Enable time-series analysis, trend detection, and dashboard metrics without complex SQL queries.

Dataset Description Refresh Frequency
analytics_date_dimension Calendar reference table with fiscal years, quarters, day-of-week, holidays Static (updated annually)
analytics_temporal_relationships Pre-joined date keys for all time-based entities (meetings, votes, budgets, filings) Daily
analytics_metric_views Pre-computed analytics like advocacy_activity, government_spending, nonprofit_impact Hourly
analytics_aggregated_stats Monthly/quarterly/yearly rollups (meeting counts, budget totals, grant sums) Daily
analytics_dashboard_metrics Real-time feeds for dashboards (active meetings today, trending topics, hot ballot measures) Every 5 minutes

Example Use Case:

-- Instead of complex joins, use metric view:
SELECT * FROM analytics_metric_views 
WHERE metric_name = 'advocacy_activity' 
  AND jurisdiction_id = 'ocd-division/country:us/state:al/place:birmingham'
  AND date_period = '2024-Q1';

🌐 Standards-Compliant Exports

Purpose: Maximum interoperability with civic tech platforms, search engines, and semantic web tools.

Dataset Standard Use Case Consumers
standards_schema_org_jsonld Schema.org JSON-LD Google Search rich results, voice assistants Google, Bing, Alexa, Siri
standards_popolo_exports Popolo Project Civic tech platform integration mySociety, OpenNorth, Sunlight Foundation
standards_ceds_aligned Common Education Data Standards Education data exchange, NCES reporting State education depts, Ed-Fi, IMS Global
standards_ocd_divisions Open Civic Data IDs Cross-platform jurisdiction referencing Google Civic, Ballotpedia, Vote Smart
standards_rdf_triples RDF/Turtle Linked open data, knowledge graphs DBpedia, Wikidata, SPARQL endpoints

Example Schema.org Export:

{
  "@context": "https://schema.org",
  "@type": "GovernmentOrganization",
  "name": "Birmingham City Council",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Birmingham",
    "addressRegion": "AL"
  },
  "event": [{
    "@type": "Event",
    "name": "Regular City Council Meeting",
    "startDate": "2024-01-15T18:00:00-06:00"
  }]
}

βœ… Fact-Checking Datasets

Purpose: Verify claims made in meetings, legislation, and political speech.

Dataset Source Fields Update Frequency
factchecks_claim_reviews Google Fact Check API claimReviewed, reviewRating, author, datePublished Daily
factchecks_politifact PolitiFact web scraping claim, ruling, truth_o_meter, context Daily
factchecks_factcheck_org FactCheck.org API/scraping claim, verdict, analysis, sources Daily
factchecks_verified_claims Aggregated + deduplicated claim_text, consensus_rating, verification_sources Daily

Integration with Meetings:

  • Cross-reference meeting transcripts with verified claims
  • Flag unverified statements in legislative debates
  • Track politician accuracy scores over time

οΏ½πŸ“Š Complete Data Model (ERD)

<ZoomableMermaid title="Interactive Entity Relationship Diagram" value={` erDiagram %% ======================================== %% CORE JURISDICTION ENTITIES %% ======================================== %% Schema.org type: AdministrativeArea %% OCD-ID format: ocd-division/country:us/state:al/place:birmingham

JURISDICTION ||--o{ EVENT : hosts
JURISDICTION ||--o{ LEADER : employs
JURISDICTION ||--o{ YOUTUBE_CHANNEL : operates
JURISDICTION ||--o{ SOCIAL_MEDIA : maintains
JURISDICTION ||--o{ MEETING_PLATFORM : uses
JURISDICTION {
    string jurisdiction_id PK
    string name
    string jurisdiction_type
    string state_code
    string county_name
    string website_url
    float latitude
    float longitude
    string fips_code
    int completeness_score
    datetime discovered_at
}

%% Census and Government Data
JURISDICTION ||--o| CENSUS_DATA : has
CENSUS_DATA {
    string jurisdiction_id PK
    string county_fips
    string place_fips
    datetime census_year
}

JURISDICTION ||--o| GSA_DOMAIN : uses
GSA_DOMAIN {
    string domain PK
    string jurisdiction_id FK
    string domain_type
    string agency_name
    string organization_type
    string city
    string state
    datetime created_date
}

%% Government Finances
JURISDICTION ||--o{ GOVERNMENT_BUDGET : has
GOVERNMENT_BUDGET {
    string budget_id PK
    string jurisdiction_id FK
    int fiscal_year
    float total_revenue
    float total_expenditures
    float total_debt
    float property_tax_revenue
    float sales_tax_revenue
    float federal_grants
    float state_grants
    float general_fund_balance
    string budget_document_url
    datetime published_date
}

%% ========================================
%% SCHOOL DISTRICTS (NCES)
%% ========================================
%% Based on CEDS (Common Education Data Standards) and NCES data elements
%% See: https://ceds.ed.gov/ and https://github.com/CEDStandards
%% Schema.org type: EducationalOrganization

JURISDICTION ||--o{ SCHOOL_DISTRICT : contains
SCHOOL_DISTRICT ||--o{ LEADER : governed_by
SCHOOL_DISTRICT {
    string nces_id PK
    string district_name
    string jurisdiction_id FK
    string state_code
    string county_name
    string district_type
    int total_students
    int total_schools
    float total_revenue
    float total_expenditures
    float per_pupil_spending
    float federal_revenue
    float state_revenue
    float local_revenue
    string phone
    string website
    string superintendent
    datetime school_year
}

%% ========================================
%% EVENTS & MEETINGS
%% ========================================
%% Schema.org types: Event, GovernmentEvent, VideoObject, DigitalDocument
%% Parquet files: events.parquet, event_documents.parquet, event_participants.parquet, etc.

%% EVENT - Parent entity for all public events
JURISDICTION ||--o{ EVENT : hosts
EVENT ||--o{ EVENT_PARTICIPANT : includes
EVENT ||--o{ EVENT_AGENDA_ITEM : contains
EVENT ||--o{ EVENT_DOCUMENT : produces
EVENT ||--o{ EVENT_MEDIA : recorded_as
EVENT ||--o{ EVENT_BILL : discusses
EVENT {
    string event_id PK
    string jurisdiction_id FK
    string event_type "meeting/hearing/forum/workshop/town_hall"
    string event_category "legislative/planning/public_health/education"
    string meeting_type "regular/special/emergency/work_session/executive"
    string platform "legistar/granicus/zoom/youtube_live"
    string meeting_number "Sequential identifier"
    datetime event_date
    datetime end_date
    string event_title
    string body_name "City Council/Planning Commission/School Board"
    string status "scheduled/in_progress/completed/cancelled/postponed"
    string location_type "in_person/virtual/hybrid"
    string venue_name
    string venue_address
    string agenda_packet_url
    string minutes_url
    string source_url
    boolean requires_registration
    float registration_fee
    int max_capacity
    string presenter
    string training_topic
    string target_audience
    int agenda_item_count
    int document_count
    boolean has_video
    boolean has_transcript
    boolean oral_health_related
    string data_source "openstates/legistar/granicus/manual"
    datetime agenda_published_at
    datetime minutes_approved_at
    datetime created_at
    datetime extracted_at
}

%% EVENT_PARTICIPANT - Who participated in the event
EVENT_PARTICIPANT {
    string participant_id PK
    string event_id FK
    string participant_name
    string participant_type "official/legislator/staff/public/expert/advocate"
    string organization_name
    string organization_id FK "Links to ORGANIZATION or LEADER"
    string role "chair/member/speaker/witness/observer"
    string participation_type "voting/testifying/presenting/attending"
    int speaking_order
    int speaking_duration_seconds
    string remarks_summary
    datetime participation_date
}

%% EVENT_AGENDA_ITEM - Individual topics discussed at event
EVENT_AGENDA_ITEM {
    string agenda_item_id PK
    string event_id FK
    string item_number "1.A/2.B/etc"
    string title
    string description
    string item_type "action/discussion/report/public_comment/consent"
    string sponsor_name
    string department "Public Works/Health Dept/Finance"
    int sequence_order
    int estimated_duration_minutes
    int actual_duration_minutes
    string outcome "approved/rejected/tabled/amended/continued"
    string vote_result "5-2/unanimous/voice vote"
    boolean requires_public_hearing
    datetime scheduled_time
    string keywords_found
    boolean oral_health_related
}

%% EVENT_DOCUMENT - All documents (agendas, minutes, presentations, handouts)
%% Consolidates former AGENDA, MINUTES, DOCUMENT entities
EVENT_DOCUMENT {
    string document_id PK
    string event_id FK
    string agenda_item_id FK "Optional - links to specific agenda item"
    string document_type "agenda/minutes/presentation/handout/staff_report/ordinance/resolution"
    string title
    string full_text
    string summary_text
    string file_url
    string pdf_url
    string file_type "pdf/docx/pptx/txt"
    int file_size_bytes
    int page_count
    string action_items "For minutes - extracted action items"
    string votes "For minutes - vote records"
    string keywords_found
    boolean oral_health_related
    datetime published_at
    datetime approved_at "For minutes"
    datetime uploaded_at
}

%% EVENT_MEDIA - Video recordings, livestreams, audio files
%% Consolidates former VIDEO entity
EVENT_MEDIA {
    string media_id PK
    string event_id FK
    string media_type "video/audio/livestream/recording"
    string platform "youtube/vimeo/granicus/zoom"
    string media_url
    string embed_url
    string thumbnail_url
    int duration_seconds
    int view_count
    string transcript_text
    string transcript_url
    string caption_language "en/es/etc"
    boolean has_captions
    string video_quality "720p/1080p/4K"
    datetime published_at
    datetime recorded_at
}

%% EVENT_BILL - Bills discussed or voted on at event
%% Links events to legislative bills
EVENT_BILL {
    string event_bill_id PK
    string event_id FK
    string bill_id FK "Links to BILL entity"
    string agenda_item_id FK "Optional - specific agenda item"
    string action_taken "introduced/discussed/committee_vote/floor_vote/signed"
    string vote_result "passed/failed/tabled"
    int yes_votes
    int no_votes
    int abstain_votes
    string discussion_summary
    datetime action_date
}

%% ========================================
%% POLITICAL ECONOMY ANALYSIS
%% ========================================
%% Entities supporting the 4-step advocacy framework:
%%   Step 1: Rhetoric Gap (Frame Analysis)
%%   Step 2: Displacement Matrix (Budget Delta)
%%   Step 3: Influence Radar (Stakeholder Analysis)
%%   Step 4: Deferral Pattern (Temporal Voting Analysis)
%% See: extraction/decision_analyzer.py, extraction/budget_analyzer.py

EVENT ||--o{ POLICY_DECISION : produces
POLICY_DECISION ||--o{ DECISION_FRAME : framed_as
POLICY_DECISION ||--o{ DECISION_OPTION : considered
POLICY_DECISION ||--o{ DECISION_TRADEOFF : discussed
POLICY_DECISION ||--o{ STAKEHOLDER_POSITION : influenced_by
POLICY_DECISION ||--o{ DECISION_VOTE : voted_on
POLICY_DECISION {
    string decision_id PK
    string event_id FK "Links to EVENT"
    string agenda_item_id FK "Optional - links to specific agenda item"
    string decision_summary
    string outcome "approved/rejected/tabled/amended"
    string chosen_option
    string primary_frame "Economic Development/Public Safety/Equity"
    string primary_rationale
    string vote_result "5-2/unanimous/voice vote"
    int contention_score "0-100: Ratio of dissent"
    string implementation_timeline
    string cost_estimate
    float confidence_score
    datetime event_date
    datetime extracted_at
}

%% STEP 1: RHETORIC GAP - Frame Analysis
DECISION_FRAME {
    string frame_id PK
    string decision_id FK
    string frame_type "primary/competing"
    string frame_name "public health/fiscal responsibility/equity"
    string framing_language "Specific phrases used"
    int mention_count
}

DECISION_OPTION {
    string option_id PK
    string decision_id FK
    string option_description
    boolean was_chosen
    string rejection_reason
    string opportunity_cost
}

DECISION_TRADEOFF {
    string tradeoff_id PK
    string decision_id FK
    string tradeoff_type "cost vs benefit/autonomy vs community"
    string discussion_text
    string evidence_cited
}

%% STEP 3: INFLUENCE RADAR - Stakeholder Analysis
STAKEHOLDER_POSITION {
    string position_id PK
    string decision_id FK
    string stakeholder_name
    string stakeholder_role "Health Dept/PTA/Taxpayer Assoc"
    string stakeholder_affiliation
    string position_type "supporter/opponent/undecided"
    string main_argument
    string evidence_type "study/expert opinion/data"
    int speaking_order "Track who speaks when"
}

%% STEP 4: DEFERRAL PATTERN - Temporal Voting Analysis
DECISION_VOTE {
    string vote_record_id PK
    string decision_id FK
    string leader_id FK
    string vote_value "yes/no/abstain/absent"
    string stated_reason
    boolean switched_position
    int days_since_election "Temporal context"
}

%% Deferral Pattern Tracking - Links same topic across multiple meetings
POLICY_DECISION ||--o{ DEFERRAL_INSTANCE : part_of
DEFERRAL_PATTERN ||--o{ DEFERRAL_INSTANCE : tracks
DEFERRAL_PATTERN {
    string pattern_id PK
    string topic "Community Dental Clinic Funding"
    string conclusion "Strategic delay or genuine study"
    datetime first_mentioned "When first introduced"
    datetime last_discussed "Most recent tabling"
    int total_deferrals "How many times tabled"
    int months_in_limbo "Time since first mention"
    string pattern_type "Rationale of Attrition/Sincere Analysis/Political Timing"
    string strategic_inference "Waiting for momentum to fade"
    int discomfort_score "1-10: Political sensitivity"
    string next_review_date "When scheduled to revisit"
    boolean still_pending
}

DEFERRAL_INSTANCE {
    string instance_id PK
    string pattern_id FK
    string decision_id FK
    datetime deferral_date
    string stated_reason "More study needed/Budget constraints/Public input"
    string speaker "Who gave the justification"
    int months_since_first "Time elapsed"
    string previous_reason "What they said last time"
    boolean reason_changed "Shifting justifications"
}

%% STEP 2: DISPLACEMENT MATRIX - Budget-to-Minutes Delta
GOVERNMENT_BUDGET ||--o{ BUDGET_LINE_ITEM : contains
BUDGET_LINE_ITEM ||--o{ BUDGET_DELTA : analyzed_as
BUDGET_LINE_ITEM {
    string line_item_id PK
    string budget_id FK
    string category "Education/Health/Infrastructure"
    string subcategory
    string description
    float current_amount
    float previous_amount
    float change_amount
    float percent_change
    string funding_source "grants/taxpayer/bonds"
    int fiscal_year
}

BUDGET_DELTA {
    string delta_id PK
    string line_item_id FK
    int event_mentions "How many times discussed at events"
    string praise_level "High/Medium/Low/None"
    string funding_change "Expansion/Stagnant/Decreased"
    string delta_type "Expansion/Lip Service/Hidden Priority/Aligned"
    float delta_score "-1 to +1: rhetoric vs reality gap"
    string stated_rationale "What they said at events"
    string inferred_rationale "What budget reveals"
    string underlying_logic "Genuine/Performative/Bureaucratic"
    datetime analyzed_at
}

%% QUANTITATIVE INDICATORS
EVENT ||--o{ KEYWORD_DENSITY : measured_by
KEYWORD_DENSITY {
    string density_id PK
    string event_id FK "Links to EVENT"
    string keyword "grant/taxpayer/emergency/equity"
    string keyword_category "funding_source/urgency/values"
    int occurrence_count
    float per_1000_words
    datetime analyzed_at
}

%% ELECTION CYCLE ANALYSIS
JURISDICTION ||--o{ ELECTION_CYCLE : holds
ELECTION_CYCLE ||--o{ POLICY_DECISION : influences
ELECTION_CYCLE {
    string election_id PK
    string jurisdiction_id FK
    datetime election_date
    string election_type "municipal/school board/state"
    int decisions_12mo_before
    int decisions_6mo_before
    int decisions_3mo_before
    int decisions_6mo_after
    float avg_project_cost_before
    float avg_project_cost_after
    boolean pre_election_spike_detected
    string inference "incumbency protection/normal variance"
}

%% ========================================
%% LEADERS & OFFICIALS
%% ========================================
%% Based on Popolo Project schema for representing people and positions
%% See: https://github.com/popolo-project/popolo-spec
%% Schema.org types: Person, GovernmentOfficial

LEADER ||--o{ VOTE : casts
LEADER ||--o{ SOCIAL_MEDIA : maintains
LEADER {
    string leader_id PK
    string jurisdiction_id FK
    string full_name
    string title
    string position_type
    string office
    string party_affiliation
    string email
    string phone
    string website
    string photo_url
    datetime term_start
    datetime term_end
    boolean is_active
    datetime verified_at
}

VOTE {
    string vote_id PK
    string leader_id FK
    string event_id FK "Links to EVENT"
    string agenda_item_id FK "Optional - links to specific agenda item"
    string item_description
    string vote_value
    datetime vote_date
}

%% ========================================
%% STATE LEGISLATORS & LEGISLATIVE DATA (Open States/Plural Policy)
%% ========================================
%% Data Source: Open States PostgreSQL dump (~10 GB)
%% Coverage: 7,300+ state legislators across all 50 states + DC + Puerto Rico
%% See: https://open.pluralpolicy.com/data/ and https://github.com/openstates/people/blob/master/schema.md
%% Schema.org types: Person, GovernmentOfficial, Legislation, VoteAction
%% Popolo Project compliance: https://www.popoloproject.com/

JURISDICTION ||--o{ LEGISLATOR : represents
LEGISLATOR ||--o{ LEGISLATOR_OFFICE : has
LEGISLATOR ||--o{ COMMITTEE_MEMBERSHIP : serves_on
LEGISLATOR ||--o{ BILL_SPONSOR : sponsors
LEGISLATOR ||--o{ LEGISLATOR_VOTE : casts
LEGISLATOR ||--o{ SOCIAL_MEDIA : maintains
LEGISLATOR {
    string legislator_id PK "ocd-person/{uuid}"
    string jurisdiction_id FK "ocd-division/country:us/state:al"
    string full_name
    string given_name "First name"
    string family_name "Last name"
    string middle_name
    string suffix
    string gender "Male/Female/Other"
    string email "Official email"
    string biography "Official bio text"
    string birth_date "YYYY-MM-DD"
    string death_date "YYYY-MM-DD"
    string image_url "Official photo"
    string twitter_handle "@username"
    string youtube_handle
    string instagram_handle
    string facebook_handle
    string party_name "Democratic/Republican/Independent"
    datetime party_start_date
    datetime party_end_date
    string chamber "upper/lower/legislature"
    string district "District name/number"
    datetime term_start_date
    datetime term_end_date
    string end_reason "resignation/death/term_limit/defeated"
    string website_url
    boolean is_active
    datetime last_updated
}

LEGISLATOR_OFFICE {
    string office_id PK
    string legislator_id FK
    string office_type "District Office/Capitol Office"
    string address "Mailing address"
    string voice "Phone number"
    string fax "Fax number"
    string email "Office email"
    string city
    string state
    string zip_code
    boolean is_primary
}

COMMITTEE ||--o{ COMMITTEE_MEMBERSHIP : includes
COMMITTEE {
    string committee_id PK "ocd-organization/{uuid}"
    string jurisdiction_id FK
    string name "Committee on Health and Human Services"
    string chamber "upper/lower/legislature"
    string classification "committee/subcommittee"
    string parent_committee_id FK "If subcommittee"
    datetime created_date
    datetime abolished_date
    boolean is_active
}

COMMITTEE_MEMBERSHIP {
    string membership_id PK
    string committee_id FK
    string legislator_id FK
    string role "chair/vice_chair/ranking_member/member"
    datetime start_date
    datetime end_date
    boolean is_active
}

BILL ||--o{ BILL_SPONSOR : has
BILL ||--o{ BILL_ACTION : tracked_by
BILL ||--o{ BILL_VERSION : has_versions
BILL ||--o{ VOTE_EVENT : subject_of
BILL {
    string bill_id PK "ocd-bill/{uuid}"
    string jurisdiction_id FK
    string session_id "2024 Regular Session"
    string chamber "upper/lower"
    string identifier "HB 123/SB 456"
    string title "An Act to provide dental screenings in schools"
    string classification "bill/resolution/concurrent_resolution"
    string subject "Health/Education/Budget"
    string full_text "Complete bill text"
    string summary "Bill summary"
    datetime introduced_date
    datetime first_reading_date
    datetime second_reading_date
    datetime third_reading_date
    datetime committee_referral_date
    string committee_assigned
    datetime committee_vote_date
    string committee_outcome "favorable/unfavorable"
    datetime floor_vote_date
    string floor_outcome "passed/failed"
    datetime signed_date
    datetime effective_date
    string status "introduced/committee/floor/passed/failed/signed/vetoed"
    string source_url "Link to official bill text"
    datetime last_action_date
    string last_action_description
    boolean is_oral_health_related
    datetime created_at
    datetime updated_at
}

BILL_SPONSOR {
    string sponsor_id PK
    string bill_id FK
    string legislator_id FK
    string sponsor_type "primary/cosponsor"
    int sponsor_order "1 for primary, 2+ for cosponsors"
    datetime sponsored_date
}

BILL_ACTION {
    string action_id PK
    string bill_id FK
    datetime action_date
    string action_description "Referred to Committee on Health"
    string action_type "introduction/referral/committee/amendment/vote/signing"
    string chamber "upper/lower"
    int sequence_number
}

BILL_VERSION {
    string version_id PK
    string bill_id FK
    string version_name "As Introduced/Committee Substitute/Enrolled"
    string full_text "Complete version text"
    string pdf_url
    datetime version_date
    string note "Amendments adopted"
}

VOTE_EVENT ||--o{ LEGISLATOR_VOTE : includes
VOTE_EVENT {
    string vote_event_id PK "ocd-vote/{uuid}"
    string bill_id FK
    string jurisdiction_id FK
    string chamber "upper/lower"
    datetime vote_date
    string motion_text "Passage of HB 123"
    string motion_classification "passage/amendment/procedural"
    string result "passed/failed"
    int yes_count
    int no_count
    int abstain_count
    int absent_count
    int not_voting_count
    int total_count
    float pass_threshold "0.5 for simple majority, 0.67 for supermajority"
    boolean passed
    string source_url "Link to official vote record"
}

LEGISLATOR_VOTE {
    string legislator_vote_id PK
    string vote_event_id FK
    string legislator_id FK
    string vote_position "yes/no/abstain/absent/not_voting/excused"
    string voter_name "Name at time of vote"
    string voter_party "Party at time of vote"
    string voter_district "District at time of vote"
}

%% ========================================
%% ORGANIZATIONS (NONPROFITS & CHURCHES)
%% ========================================
%% Based on Popolo Project schema for organizations
%% See: https://github.com/popolo-project/popolo-spec
%% Schema.org types: Organization, NGO, Church, WorshipPlace
%% PRIMARY DATA SOURCE: IRS EO-BMF (1.9M+ organizations, 28 fields)

ORGANIZATION ||--o{ SOCIAL_MEDIA : maintains
ORGANIZATION ||--o{ LEADER : employs
ORGANIZATION ||--o{ NONPROFIT_FINANCES : files
ORGANIZATION ||--o{ CAMPAIGN : runs
ORGANIZATION ||--o{ PROGRAM_DELIVERY : delivers
ORGANIZATION ||--o{ CONGREGATION : has
ORGANIZATION {
    string org_id PK
    string ein "Employer ID Number (IRS EO-BMF)"
    string name "Organization legal name (IRS EO-BMF)"
    string sort_name "Alphabetic sort name (IRS EO-BMF)"
    string ntee_code "NTEE classification (IRS EO-BMF)"
    string ntee_description
    string subsection_code "501(c)(3) = 03, etc (IRS EO-BMF)"
    string foundation_code "15=Public Charity, etc (IRS EO-BMF)"
    string organization_code "1=Corporation, 2=Trust (IRS EO-BMF)"
    string deductibility_status "1=Deductible (IRS EO-BMF)"
    string exempt_status_code "1=Unconditional (IRS EO-BMF)"
    string causes "Every.org tags"
    string org_type
    string state_code "2-letter state (IRS EO-BMF)"
    string city "City name (IRS EO-BMF)"
    string street_address "Street address (IRS EO-BMF)"
    string zip_code "ZIP code (IRS EO-BMF)"
    float asset_amount "Total assets (IRS EO-BMF)"
    float income_amount "Annual income (IRS EO-BMF)"
    float revenue_amount "Total revenue (IRS EO-BMF)"
    string ruling_date "Tax-exempt ruling date YYYYMM (IRS EO-BMF)"
    string tax_period "Tax period YYYYMM (IRS EO-BMF)"
    string activity_codes "Activity classification (IRS EO-BMF)"
    string group_exemption "Group ruling number (IRS EO-BMF)"
    string affiliation_code "Subordinate code (IRS EO-BMF)"
    int employee_count "From Form 990"
    string mission_statement "ProPublica/Every.org"
    string description "ProPublica/Every.org"
    string logo_url "Every.org"
    string website "Discovered URL"
    boolean is_verified "Data quality flag"
    string data_source "IRS_EO_BMF/ProPublica/EveryOrg"
    datetime last_updated
}

%% Churches & Congregations (org_type='church', ntee_code='X20')
%% Data sources: IRS TEOS, ARDA, HIFLD Places of Worship, National Congregations Study
%% Many churches provide health ministries, food programs, and community services
CONGREGATION {
    string congregation_id PK
    string org_id FK
    string denomination
    string religious_tradition
    int congregation_size
    int weekly_attendance
    boolean has_health_ministry
    boolean has_food_program
    boolean has_youth_program
    boolean has_senior_program
    string service_schedule
    string clergy_name
    string clergy_title
    int volunteer_count
    float community_outreach_budget
    string facility_type
    int seating_capacity
    datetime founded_year
    string parent_denomination
    boolean serves_underserved
}

%% Nonprofit 990 Financial Data
NONPROFIT_FINANCES {
    string filing_id PK
    string ein FK
    int tax_year
    float total_revenue
    float total_expenses
    float total_assets
    float total_liabilities
    float net_assets
    float program_expenses
    float admin_expenses
    float fundraising_expenses
    float grants_paid
    float contributions_received
    float government_grants
    float foundation_grants
    float corporate_donations
    float individual_donations
    float membership_dues
    float special_events_revenue
    float program_service_revenue
    float investment_income
    float rental_income
    float sale_of_assets
    float other_revenue
    float employee_compensation
    int employee_count
    int volunteer_count
    float overhead_ratio
    float fundraising_efficiency
    string form_990_url
    datetime filing_date
}

%% Grant Transactions (Individual Grants)
ORGANIZATION ||--o{ GRANT : receives
JURISDICTION ||--o{ GRANT : awards
GRANT {
    string grant_id PK
    string recipient_ein FK
    string recipient_name
    string recipient_type
    string funder_name
    string funder_ein
    string funder_type
    float grant_amount
    string grant_purpose
    string program_area
    datetime award_date
    datetime start_date
    datetime end_date
    int grant_duration_months
    string grant_status
    string funding_source
    boolean multi_year
    string restrictions
    string reporting_requirements
}

%% ========================================
%% MEDIA & COMMUNICATIONS
%% ========================================

YOUTUBE_CHANNEL {
    string channel_id PK
    string jurisdiction_id FK
    string channel_name
    string channel_url
    int subscriber_count
    int video_count
    int total_views
    string description
    datetime created_date
    datetime last_scraped
}

SOCIAL_MEDIA {
    string account_id PK
    string entity_id FK
    string entity_type
    string platform
    string handle
    string profile_url
    int follower_count
    int post_count
    boolean is_verified
    datetime last_updated
}

MEETING_PLATFORM {
    string platform_id PK
    string jurisdiction_id FK
    string platform_name
    string base_url
    string api_endpoint
    string calendar_url
    string archive_url
    boolean has_api
    datetime discovered_at
}

%% ========================================
%% OPEN STATES (LEGISLATIVE DATA)
%% ========================================

STATE_LEGISLATURE ||--o{ STATE_LEGISLATOR : has_member
STATE_LEGISLATURE ||--o{ STATE_BILL : introduces
STATE_LEGISLATURE ||--o{ STATE_COMMITTEE : contains
STATE_LEGISLATURE {
    string legislature_id PK
    string state_code
    string session_year
    string session_type
    datetime session_start
    datetime session_end
}

STATE_LEGISLATOR {
    string legislator_id PK
    string legislature_id FK
    string full_name
    string party
    string district
    string chamber
    string email
    string capitol_phone
    string photo_url
    string openstates_id
}

STATE_BILL {
    string bill_id PK
    string legislature_id FK
    string bill_number
    string title
    string summary
    string status
    string sponsors
    datetime introduced_date
    datetime last_action_date
    string openstates_url
}

STATE_COMMITTEE {
    string committee_id PK
    string legislature_id FK
    string name
    string chamber
    string parent_id FK
    string members
}

%% ========================================
%% CIVIC TECH & OPEN SOURCE PROJECTS
%% ========================================
%% Data sources: GitHub API, Civic Tech Field Guide, Code for America, USDR, Digital Public Goods Alliance
%% Treats open source projects as first-class civic entities

CIVIC_TECH_PROJECT ||--o{ PROJECT_CONTRIBUTOR : has
CIVIC_TECH_PROJECT ||--o{ PROJECT_ISSUE : tracks
CIVIC_TECH_PROJECT ||--o{ PROJECT_FUNDING : receives
CIVIC_TECH_PROJECT ||--o{ HACKATHON_PROJECT : originates_from
CIVIC_TECH_PROJECT {
    string project_id PK
    string repository_url
    string github_owner
    string github_repo
    string project_name
    string description
    string readme_content
    string primary_language
    string tech_stack
    int star_count
    int fork_count
    int contributor_count
    int open_issue_count
    string license_type
    string project_category
    string civic_tech_topics
    string issue_area
    boolean is_dpg_certified
    string project_status
    string homepage_url
    string documentation_url
    datetime created_at
    datetime last_commit
    datetime last_updated
}

PROJECT_CONTRIBUTOR {
    string contributor_id PK
    string project_id FK
    string github_username
    string display_name
    string email
    string role
    int commit_count
    int pr_count
    int issue_count
    boolean is_maintainer
    boolean is_core_contributor
    string sponsor_url
    datetime first_contribution
    datetime last_contribution
}

PROJECT_ISSUE {
    string issue_id PK
    string project_id FK
    int issue_number
    string title
    string description
    string status
    string labels
    boolean is_good_first_issue
    boolean is_help_wanted
    int reaction_count
    int comment_count
    string assigned_to
    datetime created_at
    datetime closed_at
}

PROJECT_FUNDING {
    string funding_id PK
    string project_id FK
    string funding_source
    float monthly_revenue
    int sponsor_count
    string grant_name
    float grant_amount
    string funding_platform
    datetime funding_date
}

%% Hackathons & Civic Tech Events
HACKATHON ||--o{ HACKATHON_PROJECT : produces
HACKATHON ||--o{ HACKATHON_PARTICIPANT : includes
HACKATHON {
    string hackathon_id PK
    string event_name
    string organizer
    string location
    string city
    string state
    datetime start_date
    datetime end_date
    string event_type
    string focus_area
    int participant_count
    int project_count
    string event_url
    string registration_url
    boolean is_virtual
    string brigade_chapter
    string sponsor_organizations
    datetime created_at
}

HACKATHON_PROJECT {
    string hackathon_project_id PK
    string hackathon_id FK
    string project_id FK
    string project_name
    string description
    string team_members
    string repository_url
    string demo_url
    string presentation_url
    boolean won_award
    string award_category
    boolean became_ongoing_project
    datetime created_at
}

HACKATHON_PARTICIPANT {
    string participant_id PK
    string hackathon_id FK
    string participant_name
    string email
    string github_username
    string role
    string skills
    string team_name
    datetime registered_at
}

%% Code for America Brigades
BRIGADE_CHAPTER ||--o{ HACKATHON : hosts
BRIGADE_CHAPTER ||--o{ CIVIC_TECH_PROJECT : maintains
BRIGADE_CHAPTER {
    string brigade_id PK
    string brigade_name
    string city
    string state
    string website_url
    string github_org
    string meetup_url
    int member_count
    int project_count
    string meeting_schedule
    string contact_email
    boolean is_active
    datetime founded_date
    datetime last_activity
}

%% ========================================
%% WIKIDATA ENTITIES
%% ========================================

WIKIDATA_ENTITY ||--o{ WIKIDATA_RELATIONSHIP : source
WIKIDATA_ENTITY ||--o{ WIKIDATA_RELATIONSHIP : target
WIKIDATA_ENTITY {
    string wikidata_id PK
    string entity_type
    string label
    string description
    string wikipedia_url
    string image_url
    string aliases
    string properties
    datetime last_updated
}

WIKIDATA_RELATIONSHIP {
    string relationship_id PK
    string source_id FK
    string target_id FK
    string predicate
    datetime start_date
    datetime end_date
    string qualifiers
}

%% ========================================
%% COMMUNITY SOLUTIONS - USE CASE TEMPLATES
%% ========================================

COMMUNITY_SOLUTION ||--o{ SOLUTION_STAKEHOLDER : involves
COMMUNITY_SOLUTION ||--o{ SOLUTION_METRIC : tracks
COMMUNITY_SOLUTION {
    string solution_id PK
    string title
    string challenge_description
    string engagement_level "Spectrum: Inform/Consult/Involve/Collaborate/Defer"
    string sector_focus "CBOs/City-County/Philanthropy/Facilitative Leaders"
    string harvard_use_case_id "Links to Harvard Data-Smart catalog"
    string brookings_academy_id "Links to Data Academy programs"
    string ocd_division_id FK "Jurisdiction implementing solution"
    date implementation_start
    date implementation_end
    string status "Planning/Active/Completed"
    string outcome_summary
    string data_sources "Which datasets used"
    timestamp created_at
    timestamp updated_at
}

SOLUTION_STAKEHOLDER {
    string stakeholder_id PK
    string solution_id FK
    string stakeholder_type "CBO/Government/Funder/Facilitator"
    string organization_id "Links to NONPROFIT/JURISDICTION/GRANT_MAKER/OFFICIAL"
    string role_description
    string engagement_level
    boolean is_lead_organization
    timestamp created_at
    timestamp updated_at
}

SOLUTION_METRIC {
    string metric_id PK
    string solution_id FK
    string metric_name
    string metric_type "KPI/Output/Outcome/Impact"
    string data_source "Which dataset(s)"
    string metric_view_id "Links to /analytics/metric_views"
    decimal baseline_value
    decimal target_value
    decimal current_value
    date measurement_date
    string notes
    timestamp created_at
    timestamp updated_at
}

%% ========================================
%% DBPEDIA ENTITIES
%% ========================================

DBPEDIA_RESOURCE {
    string resource_uri PK
    string label
    string description
    string categories
    string classes
    string infobox_properties
    string wikipedia_url
    int ref_count
    datetime extracted_at
}

%% ========================================
%% GOOGLE CIVIC DATA
%% ========================================

CIVIC_DIVISION ||--o{ CIVIC_REPRESENTATIVE : has
CIVIC_DIVISION {
    string ocd_id PK
    string division_name
    string division_type
    string state
    string county
    string office_types
    string boundaries_geojson
}

CIVIC_REPRESENTATIVE {
    string representative_id PK
    string ocd_id FK
    string name
    string office_name
    string party
    string phones
    string emails
    string urls
    string social_channels
    string photo_url
}

CIVIC_ELECTION {
    string election_id PK
    string name
    datetime election_day
    string ocd_division FK
    string contests
    string polling_locations
}

%% ========================================
%% USER & SOCIAL FEATURES
%% ========================================

USER ||--o{ USER_FOLLOW : follows
USER ||--o{ LEADER_FOLLOW : follows_leader
USER ||--o{ ORG_FOLLOW : follows_org
USER ||--o{ CAUSE_FOLLOW : follows_cause
USER {
    int user_id PK
    string email
    string username
    string full_name
    string oauth_provider
    string state
    string county
    string city
    string school_board
    boolean profile_completed
    datetime created_at
}

USER_FOLLOW {
    int id PK
    int follower_id FK
    int following_id FK
    datetime created_at
}

LEADER_FOLLOW {
    int id PK
    int user_id FK
    string leader_id FK
    datetime created_at
}

ORG_FOLLOW {
    int id PK
    int user_id FK
    string org_id FK
    datetime created_at
}

CAUSE ||--o{ CAUSE_FOLLOW : followed_by
CAUSE {
    int cause_id PK
    string name
    string slug
    string description
    string category
    string icon_url
    string color
    int follower_count
}

CAUSE_FOLLOW {
    int id PK
    int user_id FK
    int cause_id FK
    datetime created_at
}

%% ========================================
%% MEETINGBANK (HUGGINGFACE DATASET)
%% ========================================

MEETINGBANK_MEETING {
    string instance_id PK
    string city_name
    datetime meeting_date
    string transcript_text
    string summary_text
    string source_url
    string split
    datetime ingested_at
}

%% ========================================
%% MICROSOFT CDM: NONPROFIT CONSTITUENT MANAGEMENT
%% Based on Microsoft Common Data Model for Nonprofits
%% See: https://github.com/microsoft/Nonprofits/
%% ========================================

CONSTITUENT ||--o{ DONATION : makes
CONSTITUENT ||--o{ MEMBERSHIP : enrolls_in
CONSTITUENT ||--o{ VOLUNTEER_ACTIVITY : participates_in
CONSTITUENT {
    string constituent_id PK
    string constituent_type "Donor, Volunteer, Member, Beneficiary"
    string first_name
    string last_name
    string email
    string phone
    string address
    string city
    string state_code
    string zip_code
    datetime first_engagement_date
    datetime last_engagement_date
    float lifetime_giving_total
    int volunteer_hours_total
    string preferred_communication
    boolean is_active
    datetime created_at
}

CAMPAIGN ||--o{ DONATION : receives
ORGANIZATION ||--o{ CAMPAIGN : runs
CAMPAIGN {
    string campaign_id PK
    string org_id FK
    string campaign_name
    string campaign_type "Annual Fund, Capital, Major Gifts, Peer-to-Peer"
    datetime start_date
    datetime end_date
    float goal_amount
    float raised_amount
    int donor_count
    string status "Planning, Active, Completed, Cancelled"
    string description
    datetime created_at
}

DONATION ||--o| DESIGNATION : allocated_to
DONATION {
    string donation_id PK
    string constituent_id FK
    string campaign_id FK
    string designation_id FK "Program, Fund, or Campaign"
    float amount
    string donation_type "Cash, Stock, In-Kind, Pledge"
    datetime donation_date
    string payment_method "Check, Credit Card, ACH, Wire"
    string acknowledgment_status "Pending, Sent, Thanked"
    boolean is_recurring
    string recurring_frequency "Monthly, Quarterly, Annual"
    string receipt_number
    datetime created_at
}

DESIGNATION {
    string designation_id PK
    string designation_name "General Fund, Building Fund, Program X"
    string designation_type "Unrestricted, Restricted, Endowment"
    string fund_code
    float current_balance
    string description
}

MEMBERSHIP {
    string membership_id PK
    string constituent_id FK
    string org_id FK
    string membership_type "Individual, Family, Corporate, Lifetime"
    datetime start_date
    datetime end_date
    float membership_fee
    string status "Active, Expired, Cancelled, Grace Period"
    boolean auto_renew
    datetime renewal_date
    string benefits "Newsletter, Events, Discounts"
    datetime created_at
}

VOLUNTEER_ACTIVITY {
    string activity_id PK
    string constituent_id FK
    string org_id FK
    string activity_type "Event, Ongoing, Skilled, Board Service"
    datetime activity_date
    float hours_logged
    string role "Coordinator, Assistant, Specialist"
    string skills_used "IT, Legal, Marketing, Manual Labor"
    string supervisor
    string notes
    datetime created_at
}

PROGRAM_DELIVERY ||--o{ PROGRAM_OUTCOME : achieves
ORGANIZATION ||--o{ PROGRAM_DELIVERY : delivers
JURISDICTION ||--o{ PROGRAM_DELIVERY : serves
PROGRAM_DELIVERY {
    string program_id PK "Also iati-identifier"
    string org_id FK
    string program_name
    string program_type "Direct Service, Advocacy, Education, Research"
    string target_population "Youth, Seniors, Low-Income, Veterans"
    int beneficiaries_served
    datetime start_date "IATI: activity-date[@type='start-planned']"
    datetime end_date "IATI: activity-date[@type='end-actual']"
    float program_budget "IATI: budget[@type='original']"
    float program_expenses "IATI: transaction[@type='4']"
    string status "Active, Completed, On Hold"
    string description
    string iati_identifier "Unique IATI activity ID"
    string iati_activity_status "1=Pipeline, 2=Active, 3=Completed, 4=Suspended, 5=Cancelled"
    string iati_sector_code "OECD DAC 5-digit sector code"
    string iati_sector_vocabulary "DAC, NTEE, Custom"
    string recipient_country_code "ISO 3166-1 alpha-2"
    string recipient_region "Sub-national region"
    datetime created_at
}

PROGRAM_OUTCOME {
    string outcome_id PK
    string program_id FK
    string outcome_name "Literacy Rate, Job Placement, Health Improvement"
    string metric_type "Percentage, Count, Score, Binary"
    float baseline_value "IATI: baseline[@year, @value]"
    float target_value "IATI: target[@value]"
    float actual_value "IATI: actual[@value]"
    string measurement_period "Monthly, Quarterly, Annual"
    datetime measurement_date
    datetime period_start "IATI: period-start[@iso-date]"
    datetime period_end "IATI: period-end[@iso-date]"
    string data_source "Survey, Administrative, Third-Party"
    string iati_result_type "1=Output, 2=Outcome, 3=Impact, 9=Other"
    string iati_indicator_measure "1=Unit, 2=Percentage, 3=Nominal, 4=Ordinal, 5=Qualitative"
    boolean iati_ascending "True if higher is better"
    string notes
    datetime created_at
}

%% ========================================
%% BALLOT MEASURES & ADVOCACY
%% Data Sources: Ballotpedia (comprehensive measures), 
%%               MIT Election Lab (federal results),
%%               OpenElections (certified state results)
%% See: ballot-election-sources.md
%% Schema.org type: Legislation (with referendumProposal property)
%% ========================================

JURISDICTION ||--o{ BALLOT_MEASURE : hosts
STATE_LEGISLATURE ||--o{ BALLOT_MEASURE : proposes
POLICY_TOPIC ||--o{ BALLOT_MEASURE : addresses
BALLOT_MEASURE {
    string measure_id PK
    string jurisdiction_id FK "OCD-ID format"
    string state_code "Two-letter code"
    datetime election_date
    string measure_number "Proposition 15, Question 2"
    string title "Measure title"
    string description "Full description"
    string measure_type "Initiative, Referendum, Bond"
    string topic_category "fluoridation, education, tax"
    string status "qualified, certified, passed, failed"
    string result "passed, failed, pending"
    int yes_votes "Total yes votes"
    int no_votes "Total no votes"
    float yes_percentage "Yes vote percentage"
    string full_text_url "Official measure text"
    string ballotpedia_url "Ballotpedia reference"
    string openelections_source "OpenElections CSV file"
    datetime created_at
}

POLICY_TOPIC ||--o{ MEETING : discussed_in
POLICY_TOPIC ||--o{ LEGISLATION : addresses
POLICY_TOPIC ||--o{ SURVEY_VARIABLE : measured_by
POLICY_TOPIC {
    string topic_id PK
    string topic_name "Water Fluoridation Support"
    string category "health_policy"
    string description "Public opinion on fluoridation"
    string keywords "fluoride, water treatment"
    int priority_level "1-10 importance ranking"
    string icon "🦷"
    int jurisdiction_count "How many jurisdictions discuss"
    datetime created_at
}

%% ========================================
%% PUBLIC OPINION SURVEYS & POLLING
%% Data Sources: Roper Center iPoll, Pew Research, Gallup, ANES
%% Standardized survey metadata and response data
%% ========================================

SURVEY_PROVIDER ||--o{ SURVEY : conducts
SURVEY_PROVIDER {
    string provider_id PK
    string provider_name "Gallup, Pew Research Center, Roper Center"
    string organization_type "Academic, Commercial, Government, Nonprofit"
    string country "US, UK, International"
    string website_url
    string methodology_url
    boolean is_member_aapor "American Association for Public Opinion Research"
    boolean is_member_ncpp "National Council on Public Polls"
    string data_access_policy "Open, Restricted, Membership, Purchase"
    datetime founded_date
    datetime created_at
}

SURVEY ||--o{ SURVEY_VARIABLE : contains
JURISDICTION ||--o{ SURVEY : targets
SURVEY {
    string survey_id PK
    string provider_id FK
    string study_name "Gallup Poll Social Series: Environment"
    string study_number "USGALLUP.031915 (Roper format)"
    string roper_ipoll_id "Unique iPoll identifier"
    datetime field_start_date "First day of data collection"
    datetime field_end_date "Last day of data collection"
    int sample_size "Total respondents"
    string population "U.S. adults 18+, Registered voters"
    string sampling_method "Random digit dial, Address-based, Online panel"
    string mode "Telephone, Web, In-person, Mail, Mixed"
    float response_rate "AAPOR RR1 response rate percentage"
    float margin_of_error "95% confidence interval MOE"
    string weighting_variables "Age, gender, race, education, region"
    string geographic_coverage "National, State, County, City"
    string jurisdiction_id FK "If targeted to specific jurisdiction"
    string funding_source "NSF, Private foundation, Media sponsor"
    string data_collection_firm "Contract research organization"
    boolean is_cross_sectional "True if one-time, False if panel/longitudinal"
    string language "English, Spanish, Bilingual"
    string questionnaire_url "Link to full questionnaire PDF"
    string topline_results_url "Frequency tables"
    string microdata_url "Individual-level data file"
    datetime created_at
}

SURVEY_VARIABLE ||--o{ SURVEY_RESPONSE : has_responses
POLICY_TOPIC ||--o{ SURVEY_VARIABLE : measures
SURVEY_VARIABLE {
    string variable_id PK
    string survey_id FK
    string topic_id FK "Links to POLICY_TOPIC"
    string variable_name "Q12, V0023, FLUORIDE_SUPPORT"
    string question_text "Do you favor or oppose adding fluoride to your community's water supply?"
    string question_wording_exact "Full verbatim question including intro"
    int question_number "Order in questionnaire"
    string variable_label "Fluoride support"
    string response_type "Single choice, Multiple choice, Numeric, Text"
    string scale_type "Binary, Likert 5-point, 0-10, Thermometer"
    string response_options_json "JSON array of coded responses"
    boolean is_required "Skip logic"
    string filter_condition "If Q11=Yes, ask Q12"
    string universe "All respondents, Homeowners only, etc"
    string notes "Interviewer instructions, context"
    datetime created_at
}

SURVEY_RESPONSE {
    string response_id PK
    string variable_id FK
    string jurisdiction_id FK "For geographic crosstabs"
    string response_value "Favor, Oppose, Don't know"
    int response_code "1, 2, 98"
    int unweighted_count "Raw number of respondents"
    int weighted_count "Weighted frequency"
    float percentage "Weighted percentage"
    float standard_error "SE for percentage estimate"
    string demographic_filter "Age 18-29, College grad, Republican"
    string geographic_filter "Northeast region, Urban, Alabama"
    string crosstab_type "Total, By age, By education, By party ID"
    datetime created_at
}

JURISDICTION ||--o{ LEGISLATION : enacts
STATE_LEGISLATURE ||--o{ LEGISLATION : proposes
%% Schema.org type: Legislation
LEGISLATION {
    string bill_id PK
    string jurisdiction_id FK
    string state_code
    string bill_number
    string title
    string description
    string status
    string sponsor
    datetime introduced_date
    datetime passed_date
    datetime effective_date
    string full_text_url
    string openstates_url
    string topic_category
    string chamber
    int vote_yes
    int vote_no
}

%% ========================================
%% POLITICAL CAMPAIGN FINANCE (FEC DATA)
%% ========================================
%% Data Source: OpenFEC API (https://api.open.fec.gov/developers/)
%% Tracks individual political contributions, candidates, and committees
%% Links to nonprofits via employer matching and officer name matching

JURISDICTION ||--o{ POLITICAL_CANDIDATE : represents
POLITICAL_CANDIDATE ||--o{ POLITICAL_CONTRIBUTION : receives
POLITICAL_COMMITTEE ||--o{ POLITICAL_CONTRIBUTION : receives
ORGANIZATION ||--o{ NONPROFIT_POLITICAL_DONOR : employs
POLITICAL_CONTRIBUTION ||--o{ NONPROFIT_POLITICAL_DONOR : matched_from

POLITICAL_CANDIDATE {
    string candidate_id PK "FEC candidate ID"
    string state_code FK
    string candidate_name
    string party "DEM, REP, IND, etc"
    string office "H=House, S=Senate, P=President"
    string office_full "House, Senate, President"
    string district "Congressional district (for House)"
    string election_year
    string incumbent_challenge "I=Incumbent, C=Challenger, O=Open"
    string candidate_status "Active, Inactive"
    int cycle "Election cycle year"
    datetime created_at
}

POLITICAL_COMMITTEE {
    string committee_id PK "FEC committee ID"
    string state_code FK
    string committee_name
    string committee_type "H=House, S=Senate, P=Presidential, N=PAC, Q=Super PAC"
    string committee_type_full
    string designation "P=Principal, A=Authorized, J=Joint"
    string designation_full
    string party "DEM, REP, etc"
    string treasurer_name
    string organization_type
    string filing_frequency
    datetime created_at
}

POLITICAL_CONTRIBUTION {
    string contribution_id PK "FEC sub_id"
    string state_code FK
    string contributor_name
    string contributor_city
    string contributor_state
    string contributor_zip
    string contributor_employer "Links to ORGANIZATION"
    string contributor_occupation
    float contribution_amount
    datetime contribution_date
    string recipient_committee_id FK
    string recipient_committee_name
    string candidate_id FK
    string candidate_name
    string election_type
    string entity_type "Individual, Committee, etc"
    string memo_text
    datetime created_at
}

%% Analysis table linking political contributions to nonprofit leadership
NONPROFIT_POLITICAL_DONOR {
    string donor_analysis_id PK
    string ein FK "Links to ORGANIZATION"
    string organization_name
    string contributor_name
    string contributor_title "Inferred from employer or matched from officers"
    float contribution_amount
    datetime contribution_date
    string recipient_name "Committee or candidate name"
    string candidate_name
    string match_method "Employer Name, Officer Name Match"
    datetime created_at
}

%% ========================================
%% SYSTEM-INTERNAL VOCABULARY TABLES
%% (OMOP-Inspired Concept System)
%% ========================================
%% Based on OHDSI Athena vocabulary structure
%% ID range: 2,000,000,000+ for custom civic concepts
%% These tables appear at bottom to avoid confusing non-technical users

VOCABULARY ||--o{ CONCEPT : contains
CONCEPT_CLASS ||--o{ CONCEPT : categorizes
DOMAIN ||--o{ CONCEPT : groups
CONCEPT ||--o{ CONCEPT_RELATIONSHIP : source
CONCEPT ||--o{ CONCEPT_RELATIONSHIP : target

%% Concept references from other entities
CONCEPT ||--o{ JURISDICTION : city_concept_id
CONCEPT ||--o{ ORGANIZATION : organization_concept_id
CONCEPT ||--o{ LEADER : position_concept_id
CONCEPT ||--o{ POLICY_TOPIC : topic_concept_id

VOCABULARY {
    string vocabulary_id PK "OCD_ID, IRS_NTEE, US_Census, OHDSI_Gender"
    string vocabulary_name
    string vocabulary_reference "URL or citation"
    string vocabulary_version
    string vocabulary_concept_id FK
}

DOMAIN {
    string domain_id PK "Jurisdiction, Nonprofit, Policy, Gender"
    string domain_name
    string domain_concept_id FK
}

CONCEPT_CLASS {
    string concept_class_id PK "City, County, 501c3, Mayor"
    string concept_class_name
    string concept_class_concept_id FK
}

CONCEPT {
    int concept_id PK "2B+ for custom civic concepts"
    string concept_name "Display name"
    string domain_id FK "Jurisdiction, Nonprofit, Policy"
    string vocabulary_id FK "OCD_ID, IRS_NTEE, US_Census"
    string concept_class_id FK "City, County, 501c3, Mayor"
    string standard_concept "S=Standard, C=Classification"
    string concept_code "External identifier"
    datetime valid_start_date
    datetime valid_end_date
    string invalid_reason
}

CONCEPT_RELATIONSHIP {
    int concept_id_1 FK "Source concept"
    int concept_id_2 FK "Target concept"
    string relationship_id "Is part of, Regulates, Addresses"
    datetime valid_start_date
    datetime valid_end_date
    string invalid_reason
}

`} />

βš–οΈ Political Economy Analysis Framework

The ERD includes specialized entities for political economy analysis - understanding the "WHY" behind government decisions, not just the "WHAT". These entities support a 4-step advocacy framework to expose gaps between rhetoric and reality:

The 4-Step Framework for Effective Change

Step 1: Rhetoric Gap - Frame Analysis

Goal: Establish they ALREADY agree it's important (stop the "need" debate)

ERD Support:

  • DECISION_FRAME: Tracks how decisions are framed ("public health" vs "fiscal responsibility" vs "equity")
  • KEYWORD_DENSITY: Measures rhetoric patterns ("priority", "essential", "critical" per 1000 words)
  • POLICY_DECISION.primary_frame: Primary framing language used

Analysis Output:

Frame Distribution:
  12x public health
   8x fiscal responsibility  
   5x equity/access
   3x economic development

β†’ They SAY oral health is a "priority" - hold them to it!

Step 2: Displacement Matrix - Budget-to-Minutes Delta

Goal: Show they HAD the money (stop the "budget constraint" excuse)

ERD Support:

  • BUDGET_LINE_ITEM: Detailed budget categories with year-over-year changes
  • BUDGET_DELTA: Compares meeting rhetoric to actual funding changes
  • BUDGET_DELTA.delta_type: Classifies as "Expansion", "Lip Service", or "Hidden Priority"
  • BUDGET_DELTA.delta_score: Quantifies rhetoric vs reality gap (-1 to +1)

Analysis Output:

🎭 Lip Service Detected:
   β€’ School dental program: -$50,000 decrease
     Mentioned 12x in meetings as "critical for children"
     Logic: PERFORMATIVE POLITICS

πŸ’° Hidden Priority (Where the money REALLY went):
   β€’ IT Infrastructure: +$200,000 increase
     Only mentioned 1x in meetings
     Logic: Bureaucratic inertia - avoiding scrutiny

Step 3: Influence Radar - Stakeholder Analysis

Goal: Name who's blocking it (force personal accountability)

ERD Support:

  • STAKEHOLDER_POSITION: Who spoke for/against with their arguments
  • STAKEHOLDER_POSITION.speaking_order: Track who speaks when (early speakers often most influential)
  • DECISION_VOTE: Individual vote records with stated reasons
  • DECISION_VOTE.switched_position: Flag when officials change their stance

Analysis Output:

πŸ‘₯ Blocking Coalition:
   β€’ Taxpayer Association (opponent) - spoke 1st
     Argument: "Cost concerns in tight budget"
     Counter: School nurses budget UP $300K same meeting
   
   β€’ Council Member Smith (voted NO)
     Stated reason: "Need more study"
     Pattern: Has voted NO on 3 oral health items since 2022

Step 4: Deferral Pattern - Temporal Voting Analysis

Goal: Show they're stalling, not studying (expose the tactic)

ERD Support:

  • DEFERRAL_PATTERN: Tracks same topic across multiple meetings
    • first_mentioned: When decision was first introduced
    • last_discussed: Most recent tabling date
    • total_deferrals: How many times tabled/postponed
    • months_in_limbo: Time elapsed since first mention
    • pattern_type: "Rationale of Attrition" / "Sincere Analysis" / "Political Timing"
    • strategic_inference: Inferred reason for delay
    • next_review_date: When scheduled to revisit (if stated)
  • DEFERRAL_INSTANCE: Individual tabling events with dates
    • deferral_date: When it was tabled
    • stated_reason: Official justification given
    • speaker: Who gave the reason
    • months_since_first: Time elapsed
    • reason_changed: Flag for shifting justifications
  • POLICY_DECISION.outcome: Captures "tabled/deferred/postponed"
  • DECISION_OPTION.rejection_reason: Stated excuses for delay
  • ELECTION_CYCLE: Links decisions to election timelines
  • ELECTION_CYCLE.pre_election_spike_detected: Flag incumbent protection patterns

Complete Date Tracking:

-- All dates needed to track stalling:
MEETING.meeting_date              -- When meeting occurred
POLICY_DECISION.meeting_date      -- When decision was discussed
DEFERRAL_PATTERN.first_mentioned  -- First introduction date
DEFERRAL_PATTERN.last_discussed   -- Most recent tabling
DEFERRAL_INSTANCE.deferral_date   -- Each individual deferral
ELECTION_CYCLE.election_date      -- Election timing context
DECISION_VOTE.days_since_election -- Temporal political context

Analysis Output:

πŸ“… Stalling Pattern DETECTED:
   β€’ Topic: "Fluoridation proposal"
   β€’ First mentioned: 2020-03-15 (4 years, 1 month ago)
   β€’ Total deferrals: 4 times
   β€’ Pattern type: Rationale of Attrition
   
   πŸ“Š Timeline of Shifting Justifications:
     - 2020-03: "Need more study" (12mo before election)
     - 2020-09: "Budget constraints" (6mo before election)  
     - 2022-01: "Need public input" (new council members)
     - 2024-05: "Awaiting staff report" (still pending)
   
   πŸ” Strategic Inference:
   "They're NOT studying - they're AVOIDING! The board isn't 
   debating the merit; they're waiting for the advocate's 
   momentum to fade before the next election cycle."
   
   🚨 Discomfort Score: 10/10 (Extremely politically sensitive)

Quantitative "Why" Indicators

Additional metrics to infer governance logic:

Entity Metric Reveals
POLICY_DECISION.contention_score Ratio of dissent (0-100) Political sensitivity of topic
KEYWORD_DENSITY "grant" vs "taxpayer" frequency Decision driver: outside funding vs local demand
KEYWORD_DENSITY "emergency" occurrence Reactive vs planned governance
ELECTION_CYCLE.avg_project_cost_before Spending spikes pre-election Incumbency protection tactics
BUDGET_DELTA.underlying_logic Genuine vs Performative vs Bureaucratic Real priorities revealed

Implementation Files

These analyses are fully implemented in the codebase:

  • extraction/decision_analyzer.py - Frame analysis, stakeholder extraction
  • extraction/budget_analyzer.py - Budget delta calculation, opportunity cost mapping
  • extraction/temporal_analyzer.py - Election cycle analysis, deferral patterns
  • examples/tuscaloosa_political_economy.py - Complete end-to-end analysis

See Political Economy Analysis Guide for detailed implementation status and usage examples.


🌐 Data Standards & Interoperability

Popolo Project Alignment

Our data model follows the Popolo Project specification for representing people, organizations, and elected positions. Popolo is an international open government data standard adopted by 30+ civic tech organizations worldwide including mySociety, Sunlight Foundation, OpenNorth, and Civic Commons.

Popolo Class Mappings

Popolo Class Our Entity Description Key Fields
Person LEADER Elected officials, government employees full_name, email, phone, photo_url
Organization ORGANIZATION Nonprofits, government agencies, companies name, org_type, address, website
Membership LEADER ↔ ORGANIZATION Relationship between people and organizations leader_id, jurisdiction_id, term_start, term_end
Post LEADER.position_type Positions within organizations title, office, position_type
Contact Detail Embedded fields Communication methods email, phone, website in multiple entities
Motion AGENDA, LEGISLATION Formal proposals for decision title, description, status
Vote Event VOTE Voting on motions/bills vote_date, item_description
Count VOTE, LEGISLATION Vote tallies vote_yes, vote_no, vote_value
Area JURISDICTION Geographic/political boundaries jurisdiction_type, state_code, county_name
Event MEETING Gatherings with agendas meeting_date, meeting_type, body_name
Speech MINUTES, VIDEO Spoken statements transcript_text, summary_text

Schema.org Type Mappings

Our entities map to Schema.org types for SEO-optimized structured data and semantic web compatibility:

Our Entity Schema.org Type Properties JSON-LD Export
JURISDICTION AdministrativeArea name, address, geo, telephone, url βœ… City/county pages
MEETING Event name, startDate, endDate, location, organizer βœ… Google Calendar rich results
LEADER Person + GovernmentOfficial name, email, telephone, jobTitle βœ… Official profiles
ORGANIZATION Organization + NGO name, address, telephone, foundingDate βœ… Nonprofit listings
LEGISLATION Legislation name, legislationDate, legislationPassedBy βœ… Bill tracking
BALLOT_MEASURE Legislation name, datePosted, legislationChanges βœ… Ballot guides
VOTE VoteAction agent, candidate, actionOption βœ… Voting records
FACT_CHECK ClaimReview claimReviewed, reviewRating, author βœ… Google Fact Check Explorer
SCHOOL_DISTRICT EducationalOrganization name, numberOfStudents, address βœ… School district info
VIDEO VideoObject name, description, uploadDate, duration βœ… YouTube integration
DOCUMENT DigitalDocument name, fileFormat, datePublished βœ… Document library
Microsoft CDM Nonprofit Entities
CONSTITUENT Person name, email, telephone, address βœ… Donor/volunteer profiles
DONATION DonateAction agent (Person), recipient (Organization), price βœ… Donation receipts
CAMPAIGN FundingScheme name, startDate, endDate, url βœ… Fundraising campaigns
MEMBERSHIP ProgramMembership member (Person), hostingOrganization, membershipNumber βœ… Member cards
VOLUNTEER_ACTIVITY VolunteerAction agent (Person), startTime, endTime, location βœ… Volunteer tracking
PROGRAM_DELIVERY Service name, provider, serviceType, areaServed βœ… Program catalog
PROGRAM_OUTCOME Observation measurementTechnique, measuredValue, observationDate βœ… Impact reporting

Example: Meeting as Schema.org Event

{
  "@context": "https://schema.org",
  "@type": "Event",
  "@id": "https://www.communityone.com/meetings/city-council-2024-01-15",
  "name": "Birmingham City Council Regular Meeting",
  "description": "Monthly city council meeting covering budget, zoning, and public health initiatives",
  "startDate": "2024-01-15T18:00:00-06:00",
  "endDate": "2024-01-15T20:30:00-06:00",
  "eventStatus": "https://schema.org/EventScheduled",
  "eventAttendanceMode": "https://schema.org/MixedEventAttendanceMode",
  "location": {
    "@type": "Place",
    "name": "Birmingham City Hall",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "710 N 20th St",
      "addressLocality": "Birmingham",
      "addressRegion": "AL",
      "postalCode": "35203"
    }
  },
  "organizer": {
    "@type": "GovernmentOrganization",
    "name": "Birmingham City Council",
    "url": "https://www.birminghamal.gov/council/"
  },
  "recordedIn": {
    "@type": "VideoObject",
    "name": "City Council Meeting Recording",
    "uploadDate": "2024-01-16",
    "duration": "PT2H30M",
    "thumbnailUrl": "https://example.com/thumbnail.jpg",
    "contentUrl": "https://youtube.com/watch?v=example"
  },
  "subEvent": [
    {
      "@type": "Event",
      "name": "Public Comment Period",
      "startDate": "2024-01-15T18:15:00-06:00"
    }
  ]
}

Common Education Data Standards (CEDS) Alignment

Our SCHOOL_DISTRICT entity follows CEDS specifications for education data interoperability:

Our Field CEDS Element Element ID NCES Alignment
nces_id LEA Identifier (NCES) 000827 CCD LEA ID
district_name Name of Institution 000168 Official district name
district_type LEA Type 000108 Regular/Specialized/Service Agency
total_students Student Count 001475 Fall enrollment
total_schools Number of Schools 000856 Operational schools count
total_revenue Total Revenue 000612 F-33 Survey Line A09
total_expenditures Total Expenditures 000611 F-33 Survey Line B13
per_pupil_spending Expenditure per Student 000613 Total exp / enrollment
federal_revenue Federal Revenue 000614 ESEA Title I, IDEA
state_revenue State Revenue 000615 State aid formulas
local_revenue Local Revenue 000616 Property tax, bonds
superintendent Chief Administrator 000240 District superintendent
school_year School Year 000243 YYYY-YYYY format

CEDS Option Sets:

  • LEA Type (000108): Regular local school district, Specialized (charter/magnet), Supervisory Union, Service Agency, State/Federal Agency
  • Operational Status (000533): Open, Closed, New, Changed Agency, Temporarily Closed
  • Locale Type (001315): City (Large/Midsize/Small), Suburb, Town, Rural (NCES Urban-centric codes)

Benefits:

  • βœ… Compatible with NCES Common Core of Data (CCD) and F-33 Finance Survey
  • βœ… Aligns with Ed-Fi Alliance, IMS Global, and SIF Association standards
  • βœ… Supports federal reporting for ESSA, Title I, IDEA compliance

Microsoft Common Data Model for Nonprofits

Our nonprofit constituent management entities follow Microsoft's Common Data Model for Nonprofits, enabling seamless integration with Dynamics 365 and Power Platform:

Our Entity Microsoft CDM Entity Description Key Relationships
CONSTITUENT Constituent Donors, volunteers, members, beneficiaries β†’ DONATION, MEMBERSHIP, VOLUNTEER_ACTIVITY
DONATION Donation Financial contributions and in-kind gifts ← CONSTITUENT, β†’ CAMPAIGN, β†’ DESIGNATION
CAMPAIGN Campaign Fundraising campaigns and appeals β†’ DONATION
DESIGNATION Designation Fund allocation (programs, unrestricted, endowment) ← DONATION
MEMBERSHIP Membership Member enrollment and renewals ← CONSTITUENT, ← ORGANIZATION
VOLUNTEER_ACTIVITY Volunteer Preference Volunteer activities and hours ← CONSTITUENT
PROGRAM_DELIVERY Delivery Framework Programs and services delivered ← ORGANIZATION, β†’ PROGRAM_OUTCOME
PROGRAM_OUTCOME Objective Measurable impact and KPIs ← PROGRAM_DELIVERY

Microsoft CDM Core Patterns:

  1. Constituent-Centric Design: All engagement activities (donations, volunteering, membership) link to CONSTITUENT
  2. Designation-Based Accounting: Donations are allocated to specific designations (funds, programs, campaigns)
  3. Campaign Tracking: Multi-channel fundraising campaigns track goals, raised amounts, donor counts
  4. Outcome Measurement: Programs track objectives with target vs. actual metrics
  5. Temporal Tracking: Start/end dates on memberships, campaigns, and programs for lifecycle management

Integration Points:

Microsoft Product Integration Type Use Case
Dynamics 365 Nonprofit Native CDM compatibility CRM for constituent relationship management
Power BI Direct data connection Fundraising dashboards, donor analytics
Power Apps Low-code app builder Volunteer management apps, event registration
Power Automate Workflow automation Donation receipts, membership renewals
Azure Synapse Cloud analytics Large-scale constituent analytics

Example: Constituent-Donation Relationship

-- Find top 10 donors by lifetime giving
SELECT 
    c.constituent_id,
    c.first_name,
    c.last_name,
    c.email,
    c.lifetime_giving_total,
    COUNT(d.donation_id) as donation_count,
    AVG(d.amount) as avg_donation,
    MAX(d.donation_date) as last_donation_date
FROM CONSTITUENT c
JOIN DONATION d ON c.constituent_id = d.constituent_id
WHERE c.constituent_type = 'Donor'
GROUP BY c.constituent_id, c.first_name, c.last_name, c.email, c.lifetime_giving_total
ORDER BY c.lifetime_giving_total DESC
LIMIT 10;

Benefits:

  • βœ… Microsoft Ecosystem: Native compatibility with Dynamics 365, Power Platform, Azure
  • βœ… Industry Standard: Used by large nonprofits (United Way, Boys & Girls Clubs, etc.)
  • βœ… Grant Reporting: Built-in support for outcome tracking and funder reporting
  • βœ… LYBNT Analysis: "Last Year But Not This Year" donor reactivation queries
  • βœ… Constituent 360: Unified view of all engagement touchpoints

Underlying Standards

Popolo builds upon W3C, IETF, and DCMI specifications for maximum interoperability:

Standard Use Case Example in Our Model
FOAF (Friend of a Friend) Social network, people relationships LEADER connections, ORGANIZATION networks
vCard (IETF RFC 6350) Contact information email, phone, address fields across entities
Schema.org Structured web data Meeting metadata, organization profiles for SEO
DCMI Terms Metadata, provenance created_at, updated_at, source_url timestamps
W3C Organization Ontology Hierarchical organizations Government hierarchy, nonprofit structures
ISA Location Core Vocabulary Address standardization address, city, state_code, latitude, longitude
GeoNames Ontology Geographic identifiers Place names, jurisdiction boundaries
SKOS Taxonomies and classification NTEE codes, policy topic categories

Benefits of Standards Compliance

  1. Interoperability: Data can be easily shared with other civic tech platforms
  2. API Compatibility: Standard field names work with existing tools (e.g., EveryPolitician, Open Civic Data)
  3. Semantic Web: RDF/JSON-LD export capabilities for linked open data
  4. Tooling: Existing libraries and validators (e.g., pupa, everypolitician-popolo)
  5. Documentation: Well-documented schemas reduce onboarding time

Example: Popolo-Compatible JSON-LD Export

{
  "@context": "http://www.popoloproject.com/contexts/person.jsonld",
  "@type": "Person",
  "id": "ocd-person/12345678-90ab-cdef-1234-567890abcdef",
  "name": "Jane Doe",
  "email": "jane.doe@example.gov",
  "links": [{
    "note": "official website",
    "url": "https://example.gov/mayor"
  }],
  "memberships": [{
    "@type": "Membership",
    "organization_id": "ocd-organization/jurisdiction/us/city/springfield",
    "post_id": "mayor",
    "role": "Mayor",
    "start_date": "2022-01-01",
    "end_date": "2026-12-31"
  }],
  "contact_details": [{
    "type": "email",
    "value": "jane.doe@example.gov",
    "note": "official"
  }, {
    "type": "voice",
    "value": "+1-555-123-4567"
  }],
  "sources": [{
    "url": "https://example.gov/government/officials",
    "note": "Official city website"
  }]
}

Open Civic Data (OCD-ID) Identifiers

We use OCD-IDs for jurisdiction identifiers following OCDEP 2:

Format: ocd-division/country:<country_code>/<type>:<type_id>

Examples:
- State: ocd-division/country:us/state:al
- County: ocd-division/country:us/state:al/county:jefferson
- City: ocd-division/country:us/state:al/place:birmingham
- School District: ocd-division/country:us/state:al/school_district:birmingham_city

Normalization Rules:

  • Lowercase ASCII characters (a-z)
  • Numbers (0-9)
  • Valid punctuation: ._~-
  • Spaces β†’ underscores
  • Remove special characters

πŸ“Š Data Statistics

Entity Type Estimated Count Source
Jurisdictions 22,000+ Census Gazetteer
Counties 3,144 FIPS codes
Cities 19,000+ Incorporated places
School Districts 13,000+ NCES CCD
School District Budgets 13,000+ NCES F-33 Finance Survey
Government Budgets 22,000+ Census of Governments
Municipal Bonds TBD EMMA (MSRB)
Nonprofits 3,000,000+ IRS TEOS
Nonprofit 990 Filings 10,000,000+ ProPublica (10+ years)
Microsoft CDM: Nonprofit Engagement
Constituents TBD Donors, volunteers, members, beneficiaries (Microsoft CDM)
Donations TBD Financial contributions and in-kind gifts (Microsoft CDM)
Campaigns TBD Fundraising campaigns and appeals (Microsoft CDM)
Memberships TBD Member enrollments and renewals (Microsoft CDM)
Volunteer Activities TBD Volunteer hours and service events (Microsoft CDM)
Program Delivery Records TBD Programs and services delivered (Microsoft CDM)
Program Outcomes TBD Impact metrics and KPIs (Microsoft CDM)
Grants (Individual Awards) TBD IRS 990-I, USASpending.gov, Foundation Center
Federal Grants 100,000+ USASpending.gov API
Nonprofit Causes 600+ NTEE + Every.org
YouTube Channels 5,000+ Discovery pipeline
Meeting Platforms 10,000+ URL detection
State Legislators 7,300+ Open States
Meetings & Events 500,000+ Scraped (govt, hearings, events, trainings)
Trainings TBD Professional development, workshops
Documents 2,000,000+ PDF extraction
Ballot Measures TBD State/local election sites
State Bills 100,000+ Open States API
Policy Topics ~50 Curated + extracted
Analytics & Standards
Date Dimension Records ~7,300 20 years (2010-2030)
Metric Views ~100 Pre-computed analytics definitions
Temporal Relationships ~1M+ Date keys for all time-based entities
Schema.org JSON-LD Exports ~500K+ Event, Person, Organization, Legislation, ClaimReview
Popolo Exports ~100K+ Person, Organization, Membership, VoteEvent
CEDS-Aligned Records 13,000+ School districts with NCES Element IDs
OCD Division IDs 22,000+ All jurisdictions with standardized identifiers
IATI Activity Files TBD Programs, grants, humanitarian aid (v2.03 XML)
Fact-Checking
Verified Claims ~50K+ Google Fact Check API
PolitiFact Ratings ~20K+ Truth-O-Meter rulings
FactCheck.org Articles ~10K+ Verified fact-checks
Vocabulary & Concepts (OMOP-Inspired)
Concept Entries ~2M+ Cities, nonprofits, officials, topics, demographics
Vocabularies 10+ OCD_ID, IRS_NTEE, US_Census, NCES, OHDSI (Gender/Race/Ethnicity)
Concept Classes 20+ City, County, 501c3, Mayor, Health Policy, etc.
Concept Relationships ~5M+ Hierarchies (City→County→State), Associations (Topic→Legislation)
OHDSI Gender Concepts 3 MALE, FEMALE, OTHER (Athena standard)
OHDSI Race Concepts 20+ Census OMB categories (Athena standard)
OHDSI Ethnicity Concepts 2 Hispanic or Latino, Not Hispanic or Latino (Athena standard)

οΏ½ See Also

:::tip[Complete Citations & Attributions] For full citations, licenses, BibTeX references, and detailed attribution for all data sources, standards, and research:

πŸ‘‰ View Citations & Data Sources

Includes academic research, government data APIs, civic tech standards (OCD-ID, Popolo, Schema.org, CEDS, IATI), Microsoft CDM, OMOP CDM, fact-checking sources, and more. :::

οΏ½πŸ“ Meeting & Event Types

Event Categories in the MEETING Entity

The MEETING entity tracks 4 main event categories to capture all civic engagement opportunities:

1. Government Meetings (event_category: "government_meeting")

  • City council meetings, school board meetings, county commissions
  • Official business conducted by elected bodies
  • Fields: body_name, meeting_type (regular, special, emergency)
  • Example: "Tuscaloosa City Council Regular Meeting - 3rd Tuesday"

2. Public Hearings (event_category: "public_hearing")

  • Public comment sessions on specific issues
  • Budget hearings, zoning hearings, policy feedback
  • Fields: meeting_type (budget, zoning, policy)
  • Example: "Public Hearing on FY2026 Water System Fluoridation Budget"

3. Community Events (event_category: "community_event")

  • Town halls, community forums, listening sessions
  • Informal engagement between government and citizens
  • Fields: location_type (in-person, virtual, hybrid)
  • Example: "Town Hall on Community Health Priorities"

4. Trainings (event_category: "training") ⭐ NEW

  • Professional development workshops
  • Continuing education for healthcare workers, teachers, officials
  • Certification courses, skill-building sessions
  • Fields:
    • training_topic - Subject matter (e.g., "Pediatric Oral Health", "Water Fluoridation Safety")
    • target_audience - Who should attend (e.g., "Dental Hygienists", "School Nurses", "Water Operators")
    • presenter - Trainer/instructor name or organization
    • requires_registration - Boolean flag
    • registration_fee - Cost to attend (0 for free)
    • max_capacity - Attendance limit
    • end_date - Training end time (multi-day events)
  • Example: "Fluoride Varnish Application Training for School Nurses (3 CEU)"

Why Trainings Matter for Advocacy

Capacity Building:

  • βœ… Identify training gaps ("No fluoride varnish training in past 2 years")
  • βœ… Track professional development opportunities
  • βœ… Monitor continuing education credits (CEUs) offered

Stakeholder Engagement:

  • βœ… Find healthcare workers trained in specific skills
  • βœ… Identify champions (frequent training attendees)
  • βœ… Target outreach to trained professionals

Policy Implementation:

  • βœ… "City wants dental screenings but no trained staff" β†’ Show available trainings
  • βœ… Track certification status (who's qualified to implement policy)
  • βœ… Link training availability to policy feasibility

Example Questions Now Answerable:

  1. "What oral health trainings are offered in Alabama?" β†’ Filter by training_topic LIKE '%oral%'
  2. "Which jurisdictions offer free fluoride training?" β†’ registration_fee = 0 AND training_topic LIKE '%fluoride%'
  3. "How many school nurses attended varnish training last year?" β†’ Count attendees by target_audience
  4. "Are there upcoming water fluoridation operator trainings?" β†’ training_topic AND meeting_date > TODAY

Meeting Types Within Each Category

Government Meetings:

  • Regular sessions, special sessions, emergency meetings
  • Work sessions, committee meetings, executive sessions

Public Hearings:

  • Budget hearings, zoning hearings, policy feedback sessions
  • Environmental impact hearings, license applications

Community Events:

  • Town halls, listening sessions, community forums
  • Neighborhood meetings, stakeholder roundtables

Trainings:

  • Professional development workshops
  • Certification courses (CPR, fluoride application, etc.)
  • Continuing education (CEU/CME credits)
  • Skill-building sessions (motivational interviewing, cultural competency)

οΏ½πŸ’° Nonprofit Funding Source Tracking

Revenue Source Breakdown (Form 990 Data)

The NONPROFIT_FINANCES entity tracks 10 different revenue sources to understand how nonprofits are funded:

1. Grant Revenue (Institutional Funding)

  • government_grants - Federal, state, local government grants
  • foundation_grants - Private foundation grants (Gates, Ford, etc.)
  • Why it matters: Grant-dependent orgs may be less sustainable, more restrictive

2. Donation Revenue (Community Funding)

  • individual_donations - Direct donations from people
  • corporate_donations - Corporate giving programs
  • membership_dues - Member subscriptions/fees
  • Why it matters: Grassroots funding = community support, more flexible use

3. Earned Revenue (Self-Sufficiency)

  • program_service_revenue - Fees for services (clinic visits, classes, etc.)
  • special_events_revenue - Galas, fundraisers, events
  • rental_income - Property rentals
  • sale_of_assets - Asset sales
  • Why it matters: Self-generated revenue = sustainability, independence

4. Investment Revenue

  • investment_income - Interest, dividends, capital gains
  • Why it matters: Endowment size, financial health

5. Other Revenue

  • other_revenue - Miscellaneous sources
  • Why it matters: Unusual funding patterns

Calculated Metrics

  • overhead_ratio = (admin_expenses + fundraising_expenses) / total_expenses

    • Lower = more efficient (more goes to programs)
    • Industry benchmark: <25% overhead is "good"
  • fundraising_efficiency = contributions_received / fundraising_expenses

    • Higher = better (more money raised per dollar spent)
    • Industry benchmark: $4+ raised per $1 spent

Why This Matters for Advocacy

Find sustainable partners:

  • βœ… High individual donations = community trust
  • βœ… Diversified revenue = financial stability
  • ⚠️ Single-grant dependent = risky partnership

Evaluate efficiency:

  • βœ… Low overhead ratio = more program dollars
  • βœ… High fundraising efficiency = good stewardship
  • ⚠️ High admin costs = potential waste

Identify funding gaps:

  • Compare similar nonprofits' revenue mix
  • Find underutilized funding sources (e.g., membership programs)
  • Target corporate donation opportunities

Example Questions Now Answerable:

  1. "Which dental nonprofits have the most individual donors?" (community support)
  2. "What's the average overhead for oral health organizations?" (efficiency benchmark)
  3. "Are dental nonprofits more grant-dependent or self-sufficient?" (sustainability)
  4. "Which funders support oral health work?" (foundation grants analysis)

πŸ’΅ Grant Tracking System

Individual Grant Transactions (GRANT Entity)

The GRANT entity tracks individual grant awards beyond just aggregate 990 financials. This provides transaction-level detail for:

Grant Fields

  • Recipient Info: recipient_ein, recipient_name, recipient_type (nonprofit, government, etc.)
  • Funder Info: funder_name, funder_ein, funder_type (foundation, government, corporate)
  • Grant Details: grant_amount, grant_purpose, program_area
  • Timeline: award_date, start_date, end_date, grant_duration_months
  • Status: grant_status (active, completed, terminated)
  • Type: funding_source (federal, state, foundation, corporate)
  • Restrictions: multi_year, restrictions, reporting_requirements

Data Sources

IRS Form 990 Schedule I:

  • Grants PAID by nonprofits to other organizations
  • Required for organizations granting >$5,000/year
  • Shows foundation giving patterns

USASpending.gov API (FREE):

  • All federal grants to states, localities, nonprofits
  • Contract and grant transactions $25K+
  • Real-time data updated daily

Foundation Center/Candid:

  • Private foundation grants (990-PF data)
  • Grant descriptions, amounts, recipients

State Grant Databases:

  • State-level grant programs
  • Varies by state

Why Grant Tracking Matters

Follow the Money:

  • βœ… "Who funds oral health work in Alabama?" β†’ Track all grants by program_area
  • βœ… "Which foundations support fluoridation?" β†’ Search grant purposes
  • βœ… "How much federal money goes to dental access?" β†’ Sum funding_source = federal

Find Funding Opportunities:

  • βœ… Identify active grant programs (similar grants to similar orgs)
  • βœ… Discover new funders entering a program area
  • βœ… Track grant sizes and typical durations

Partnership Intelligence:

  • βœ… "Who else is this foundation funding?" β†’ Find collaborators
  • βœ… "What's this nonprofit's grant portfolio?" β†’ Assess stability
  • βœ… Multi-year grants = long-term commitment signal

Policy Implementation:

  • βœ… "Is there grant funding for this program?" β†’ Search active grants
  • βœ… "Which jurisdictions received similar grants?" β†’ Learn from others
  • βœ… Track grant requirements and restrictions

Example Questions Now Answerable:

  1. "What federal grants support dental health in Alabama schools?" β†’ funding_source = 'federal' AND program_area LIKE '%dental%' AND recipient_type = 'school_district'

  2. "Which foundations give the largest oral health grants?" β†’ GROUP BY funder_name WHERE program_area LIKE '%oral health%' ORDER BY SUM(grant_amount)

  3. "How long do typical dental access grants last?" β†’ AVG(grant_duration_months) WHERE program_area = 'dental access'

  4. "Which nonprofits receive multi-year fluoridation funding?" β†’ multi_year = true AND grant_purpose LIKE '%fluoride%'

  5. "What grants end in the next 6 months?" β†’ end_date BETWEEN NOW() AND NOW() + 6 MONTHS (renewal opportunities!)

Dataset Structure

grants/
β”œβ”€β”€ nonprofit_grants    # Grants TO nonprofits (Schedule I recipients)
β”œβ”€β”€ government_grants   # Federal/state grants to jurisdictions
β”œβ”€β”€ foundation_grants   # Private foundation giving (990-PF)
└── federal_grants      # USASpending.gov federal grants

⏰ Time Dimension Modeling

To enable robust time-series analysis, trend tracking, and temporal comparisons, we implement a comprehensive time dimension alongside our fact tables.

Time Dimension Table

DATE_DIMENSION {
    date date PK
    year int
    quarter int
    quarter_name string
    month int
    month_name string
    month_abbr string
    day_of_month int
    day_of_week int
    day_name string
    day_abbr string
    week_of_year int
    fiscal_year int
    fiscal_quarter int
    fiscal_month int
    is_weekend boolean
    is_holiday boolean
    holiday_name string
    is_business_day boolean
    days_in_month int
    year_month string
    year_quarter string
}

Temporal Relationships

All time-bound entities link to the date dimension for consistent temporal analysis:

erDiagram
    DATE_DIMENSION ||--o{ MEETING : "meeting_date"
    DATE_DIMENSION ||--o{ GOVERNMENT_BUDGET : "fiscal_year_start"
    DATE_DIMENSION ||--o{ BALLOT_MEASURE : "election_date"
    DATE_DIMENSION ||--o{ LEGISLATION : "introduced_date"
    DATE_DIMENSION ||--o{ GRANT : "start_date, end_date"
    DATE_DIMENSION ||--o{ NONPROFIT_FILING : "tax_period_end"
    DATE_DIMENSION ||--o{ POLICY_TRACKER : "tracked_date"
    DATE_DIMENSION ||--o{ SURVEY : "field_date_start, field_date_end"
    DATE_DIMENSION ||--o{ FACT_CHECK : "published_date"
    
    DATE_DIMENSION {
        date date PK
        year int
        quarter int
        month int
        fiscal_year int
        is_business_day boolean
    }

Temporal Analysis Patterns

Year-over-Year Comparisons:

SELECT 
    d.year,
    d.quarter_name,
    COUNT(m.meeting_id) as meeting_count,
    COUNT(m.meeting_id) - LAG(COUNT(m.meeting_id)) OVER (ORDER BY d.year, d.quarter) as yoy_change
FROM MEETING m
JOIN DATE_DIMENSION d ON m.meeting_date = d.date
WHERE d.year BETWEEN 2023 AND 2025
GROUP BY d.year, d.quarter, d.quarter_name
ORDER BY d.year, d.quarter;

Fiscal Period Aggregation:

SELECT 
    d.fiscal_year,
    d.fiscal_quarter,
    SUM(b.total_expenditures) as total_spending,
    AVG(b.total_expenditures) as avg_spending
FROM GOVERNMENT_BUDGET b
JOIN DATE_DIMENSION d ON b.fiscal_year = d.fiscal_year
WHERE b.jurisdiction_type = 'city'
GROUP BY d.fiscal_year, d.fiscal_quarter;

Trend Detection:

-- Identify growing advocacy momentum
SELECT 
    d.year_month,
    COUNT(DISTINCT pt.topic_id) as active_topics,
    COUNT(m.meeting_id) as related_meetings,
    COUNT(bm.measure_id) as ballot_initiatives
FROM DATE_DIMENSION d
LEFT JOIN MEETING m ON m.meeting_date = d.date AND m.oral_health_related = true
LEFT JOIN POLICY_TRACKER pt ON d.date BETWEEN pt.start_date AND COALESCE(pt.end_date, CURRENT_DATE)
LEFT JOIN BALLOT_MEASURE bm ON bm.election_date = d.date
WHERE d.date >= DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH)
GROUP BY d.year_month
ORDER BY d.year_month;

πŸ“Š Metric Views

Metric views provide pre-aggregated, analysis-ready datasets combining multiple source tables with built-in dimensions, measures, and filters.

Core Metric View Components

Component Description Example
Source Base table, view, or SQL query containing the data MEETING, GOVERNMENT_BUDGET, NONPROFIT_FILING
Dimensions Column attributes used to segment or group metrics jurisdiction_type, fiscal_year, policy_topic
Measures Column aggregations that produce metrics COUNT(meeting_id) as meeting_count, SUM(grant_amount) as total_funding
Filters Conditions applied to source data to define scope oral_health_related = true, fiscal_year > 2020
Joins Relationships between tables to enrich data JOIN JURISDICTION ON meeting.jurisdiction_id = jurisdiction.jurisdiction_id

Example Metric Views

1. Advocacy Activity Metrics

Purpose: Track oral health advocacy momentum across jurisdictions

CREATE VIEW metric_advocacy_activity AS
SELECT 
    -- Dimensions
    j.jurisdiction_id,
    j.jurisdiction_type,
    j.state_code,
    j.county_name,
    d.year,
    d.quarter_name,
    d.month_name,
    pt.topic_name,
    
    -- Measures
    COUNT(DISTINCT m.meeting_id) as meeting_count,
    COUNT(DISTINCT bm.measure_id) as ballot_measure_count,
    COUNT(DISTINCT l.bill_id) as legislation_count,
    COUNT(DISTINCT fc.claim_id) as fact_check_count,
    
    -- Calculated Metrics
    SUM(CASE WHEN m.oral_health_related THEN 1 ELSE 0 END) as oral_health_meeting_count,
    AVG(CASE WHEN bm.result = 'passed' THEN 1 ELSE 0 END) as ballot_success_rate,
    COUNT(DISTINCT n.nonprofit_id) as active_nonprofit_count

FROM JURISDICTION j
JOIN DATE_DIMENSION d ON d.date BETWEEN DATE_SUB(CURRENT_DATE, INTERVAL 365 DAY) AND CURRENT_DATE
LEFT JOIN MEETING m ON m.jurisdiction_id = j.jurisdiction_id AND m.meeting_date = d.date
LEFT JOIN POLICY_TRACKER pt ON pt.jurisdiction_id = j.jurisdiction_id
LEFT JOIN BALLOT_MEASURE bm ON bm.jurisdiction_id = j.jurisdiction_id AND bm.election_date = d.date
LEFT JOIN LEGISLATION l ON l.state_code = j.state_code AND l.introduced_date = d.date
LEFT JOIN FACT_CHECK fc ON fc.published_date = d.date
LEFT JOIN NONPROFIT n ON n.jurisdiction_id = j.jurisdiction_id

WHERE 
    -- Filters
    d.is_business_day = true
    AND (
        m.oral_health_related = true 
        OR pt.topic_area LIKE '%oral health%'
        OR pt.topic_area LIKE '%dental%'
        OR pt.topic_area LIKE '%fluoride%'
    )

GROUP BY 
    j.jurisdiction_id, j.jurisdiction_type, j.state_code, j.county_name,
    d.year, d.quarter_name, d.month_name, pt.topic_name;

Usage:

-- Find top 10 most active jurisdictions for oral health advocacy
SELECT 
    jurisdiction_id,
    state_code,
    jurisdiction_type,
    SUM(meeting_count) as total_meetings,
    SUM(ballot_measure_count) as total_ballot_measures,
    SUM(oral_health_meeting_count) as oral_health_meetings
FROM metric_advocacy_activity
WHERE year = 2025
GROUP BY jurisdiction_id, state_code, jurisdiction_type
ORDER BY total_meetings DESC
LIMIT 10;

2. Government Spending Metrics

Purpose: Analyze government budget allocations and trends

CREATE VIEW metric_government_spending AS
SELECT 
    -- Dimensions
    j.jurisdiction_id,
    j.jurisdiction_type,
    j.state_code,
    j.population,
    d.fiscal_year,
    d.fiscal_quarter,
    bc.category_name as budget_category,
    
    -- Measures
    SUM(gb.total_revenue) as total_revenue,
    SUM(gb.total_expenditures) as total_expenditures,
    SUM(gb.total_debt) as total_debt,
    SUM(gb.property_tax_revenue) as property_tax_revenue,
    SUM(gb.federal_grants) as federal_grants,
    SUM(gb.state_grants) as state_grants,
    
    -- Calculated Metrics
    SUM(gb.total_revenue) / NULLIF(j.population, 0) as revenue_per_capita,
    SUM(gb.total_expenditures) / NULLIF(j.population, 0) as spending_per_capita,
    SUM(gb.total_debt) / NULLIF(j.population, 0) as debt_per_capita,
    (SUM(gb.total_revenue) - SUM(gb.total_expenditures)) as budget_surplus_deficit,
    SUM(bc.amount) / NULLIF(SUM(gb.total_expenditures), 0) * 100 as category_pct_of_budget

FROM JURISDICTION j
JOIN DATE_DIMENSION d ON d.fiscal_year BETWEEN 2020 AND 2025
JOIN GOVERNMENT_BUDGET gb ON gb.jurisdiction_id = j.jurisdiction_id AND gb.fiscal_year = d.fiscal_year
LEFT JOIN BUDGET_CATEGORY bc ON bc.budget_id = gb.budget_id

WHERE 
    -- Filters
    gb.total_expenditures > 0
    AND j.population > 0

GROUP BY 
    j.jurisdiction_id, j.jurisdiction_type, j.state_code, j.population,
    d.fiscal_year, d.fiscal_quarter, bc.category_name;

Usage:

-- Compare spending per capita across jurisdiction types
SELECT 
    jurisdiction_type,
    fiscal_year,
    AVG(spending_per_capita) as avg_spending_per_capita,
    AVG(revenue_per_capita) as avg_revenue_per_capita,
    AVG(debt_per_capita) as avg_debt_per_capita
FROM metric_government_spending
GROUP BY jurisdiction_type, fiscal_year
ORDER BY fiscal_year DESC, avg_spending_per_capita DESC;

3. Nonprofit Impact Metrics

Purpose: Measure nonprofit activity, funding, and service delivery

CREATE VIEW metric_nonprofit_impact AS
SELECT 
    -- Dimensions
    n.nonprofit_id,
    n.organization_name,
    n.state,
    n.city,
    nc.ntee_code,
    nc.category_name,
    d.tax_year,
    
    -- Measures
    SUM(nf.total_revenue) as total_revenue,
    SUM(nf.total_expenses) as total_expenses,
    SUM(nf.total_assets) as total_assets,
    SUM(nf.program_service_expenses) as program_expenses,
    SUM(nf.fundraising_expenses) as fundraising_expenses,
    SUM(nf.management_expenses) as management_expenses,
    COUNT(DISTINCT g.grant_id) as grants_received_count,
    SUM(g.grant_amount) as total_grants_received,
    
    -- Calculated Metrics
    SUM(nf.program_service_expenses) / NULLIF(SUM(nf.total_expenses), 0) * 100 as program_expense_ratio,
    SUM(nf.fundraising_expenses) / NULLIF(SUM(nf.total_revenue), 0) * 100 as fundraising_efficiency,
    (SUM(nf.total_revenue) - SUM(nf.total_expenses)) as net_income,
    COUNT(DISTINCT nf.year) as years_active

FROM NONPROFIT n
JOIN DATE_DIMENSION d ON d.year BETWEEN 2020 AND 2025
JOIN NONPROFIT_FILING nf ON nf.ein = n.ein AND nf.tax_year = d.year
LEFT JOIN NONPROFIT_CAUSE nc ON nc.cause_id = n.primary_cause_id
LEFT JOIN GRANT g ON g.recipient_ein = n.ein AND g.tax_year = d.year

WHERE 
    -- Filters
    nf.total_revenue > 0
    AND (
        nc.category_name LIKE '%health%'
        OR nc.category_name LIKE '%dental%'
        OR n.mission_statement LIKE '%oral health%'
    )

GROUP BY 
    n.nonprofit_id, n.organization_name, n.state, n.city,
    nc.ntee_code, nc.category_name, d.tax_year;

Usage:

-- Identify high-performing health nonprofits by program efficiency
SELECT 
    organization_name,
    state,
    tax_year,
    total_revenue,
    program_expense_ratio,
    fundraising_efficiency
FROM metric_nonprofit_impact
WHERE 
    tax_year = 2024
    AND total_revenue > 1000000
    AND program_expense_ratio > 75  -- More than 75% goes to programs
ORDER BY program_expense_ratio DESC, total_revenue DESC
LIMIT 20;

Metric View Best Practices

  1. Grain Definition: Clearly define the granularity of each metric view (e.g., per jurisdiction per month)
  2. Performance: Pre-aggregate expensive calculations to improve query performance
  3. Incremental Updates: Design views to support incremental refresh rather than full rebuilds
  4. Documentation: Document all dimension values, measure calculations, and filter logic
  5. Naming Convention: Use metric_ prefix followed by descriptive name (e.g., metric_advocacy_activity)
  6. Testing: Validate measure calculations against source data to ensure accuracy

Query Optimization

For large-scale analytics, metric views can be materialized:

-- Materialize for fast querying
CREATE MATERIALIZED VIEW metric_advocacy_activity_mat AS
SELECT * FROM metric_advocacy_activity;

-- Refresh incrementally
REFRESH MATERIALIZED VIEW metric_advocacy_activity_mat;

-- Add indexes on common filter/join columns
CREATE INDEX idx_advocacy_state_year 
ON metric_advocacy_activity_mat(state_code, year);

CREATE INDEX idx_advocacy_jurisdiction 
ON metric_advocacy_activity_mat(jurisdiction_id);

πŸ”§ System-Internal Tables (OMOP-Inspired Vocabulary)

Purpose: Standardized terminology and concept management following OHDSI OMOP Common Data Model principles. These tables provide semantic interoperability and enable precise data linkage across all entities.

⚠️ Technical Note: These are internal reference tables used by data engineers and ETL pipelines. Non-technical users can ignore this section.

<ZoomableMermaid title="Vocabulary & Concept System (OMOP-Inspired)" initialScale={8} value={` erDiagram %% ======================================== %% VOCABULARY SYSTEM (OMOP-Inspired) %% Based on OHDSI Athena vocabulary structure %% ID range: 2,000,000,000+ for custom civic concepts %% ========================================

VOCABULARY ||--o{ CONCEPT : contains
CONCEPT_CLASS ||--o{ CONCEPT : categorizes
DOMAIN ||--o{ CONCEPT : groups
CONCEPT ||--o{ CONCEPT_RELATIONSHIP : source
CONCEPT ||--o{ CONCEPT_RELATIONSHIP : target

VOCABULARY {
    string vocabulary_id PK
    string vocabulary_name
    string vocabulary_reference
    string vocabulary_version
    string vocabulary_concept_id FK
}

DOMAIN {
    string domain_id PK
    string domain_name
    string domain_concept_id FK
}

CONCEPT_CLASS {
    string concept_class_id PK
    string concept_class_name
    string concept_class_concept_id FK
}

CONCEPT {
    int concept_id PK "Unique ID (2B+ for custom)"
    string concept_name "Display name"
    string domain_id FK "Jurisdiction, Nonprofit, Policy"
    string vocabulary_id FK "OCD_ID, IRS_NTEE, US_Census"
    string concept_class_id FK "City, County, 501c3, Mayor"
    string standard_concept "S=Standard, C=Classification"
    string concept_code "External identifier"
    datetime valid_start_date
    datetime valid_end_date
    string invalid_reason
}

CONCEPT_RELATIONSHIP {
    int concept_id_1 FK
    int concept_id_2 FK
    string relationship_id "Is part of, Regulates, etc"
    datetime valid_start_date
    datetime valid_end_date
    string invalid_reason
}

%% ========================================
%% CONCEPT MAPPING TO EXISTING TABLES
%% ========================================

CONCEPT ||--o{ JURISDICTION : city_concept_id
CONCEPT ||--o{ ORGANIZATION : organization_concept_id
CONCEPT ||--o{ LEADER : position_concept_id
CONCEPT ||--o{ POLICY_TOPIC : topic_concept_id

%% ========================================
%% DEMOGRAPHICS
%% ========================================
%% Comprehensive demographic data from U.S. Census Bureau
%% Separate entity for population characteristics

JURISDICTION ||--o| DEMOGRAPHICS : has_demographics
DEMOGRAPHICS {
    string demographics_id PK
    string jurisdiction_id FK
    string census_tract
    int total_population
    int population_2020
    int population_2024
    int median_age
    
    %% Race & Ethnicity
    int white_alone
    int black_alone
    int asian_alone
    int native_american_alone
    int pacific_islander_alone
    int other_race_alone
    int two_or_more_races
    int hispanic_latino
    int not_hispanic_latino
    
    %% Gender
    int male_population
    int female_population
    
    %% Age Distribution
    int under_18
    int age_18_to_64
    int age_65_plus
    
    %% Income & Economics
    float median_household_income
    float per_capita_income
    float poverty_rate
    int households_snap_benefits
    
    %% Education
    int high_school_graduate_pct
    int bachelors_degree_pct
    int graduate_degree_pct
    
    %% Housing
    int total_housing_units
    int owner_occupied_pct
    int renter_occupied_pct
    float median_home_value
    float median_rent
    
    %% Employment
    float unemployment_rate
    int labor_force_participation_pct
    
    %% Health Insurance
    int uninsured_pct
    int medicaid_pct
    int medicare_pct
    
    datetime census_year
    string acs_table_id "ACS Table reference"
}

`} />

Vocabulary Sources

Vocabulary ID Vocabulary Name Source Use Case
OCD_ID Open Civic Data Division IDs https://github.com/opencivicdata/ocd-division-ids Standard jurisdiction identifiers
IRS_NTEE IRS National Taxonomy of Exempt Entities IRS TEOS Nonprofit classification
US_Census U.S. Census Bureau Census Gazetteer, ACS Demographics, geography
NCES National Center for Education Statistics NCES CCD, F-33 School districts, education data
OHDSI_Gender OHDSI Gender Athena Standard gender concepts (interoperable with medical research)
OHDSI_Race OHDSI Race Athena Standard race concepts (OMB Classification)
OHDSI_Ethnicity OHDSI Ethnicity Athena Standard ethnicity concepts (Hispanic/Latino)
OpenNavigator Custom Civic Concepts Internal Cities, officials, topics (ID > 2,000,000,000)

Concept Classes for Civic Data

Concept Class Domain Examples
City Jurisdiction Incorporated places, consolidated city-counties
County Jurisdiction U.S. counties, county equivalents
State Jurisdiction U.S. states, territories, DC
School District Jurisdiction LEAs (Local Educational Agencies)
501c3 Nonprofit Tax-exempt charitable organizations
501c4 Nonprofit Social welfare organizations
Mayor Position Chief executive of city government
Council Member Position Legislative member
Superintendent Position School district chief administrator
Health Policy Topic Fluoridation, nutrition, dental care
Education Policy Topic School funding, curriculum, facilities

Example Concept Entries

-- Standard concept for a city
INSERT INTO CONCEPT VALUES (
    2000000001,                    -- concept_id (custom range)
    'Birmingham, Alabama',         -- concept_name
    'Jurisdiction',                -- domain_id
    'OCD_ID',                      -- vocabulary_id
    'City',                        -- concept_class_id
    'S',                           -- standard_concept (Standard)
    'ocd-division/country:us/state:al/place:birmingham',  -- concept_code
    '2020-01-01',                  -- valid_start_date
    '2099-12-31',                  -- valid_end_date
    NULL                           -- invalid_reason
);

-- Concept for a nonprofit cause
INSERT INTO CONCEPT VALUES (
    2000000101,                    -- concept_id
    'Animal Welfare',              -- concept_name
    'Nonprofit',                   -- domain_id
    'IRS_NTEE',                    -- vocabulary_id
    '501c3',                       -- concept_class_id
    'C',                           -- standard_concept (Classification)
    'D20',                         -- concept_code (NTEE code)
    '2020-01-01',
    '2099-12-31',
    NULL
);

-- Gender concept from OHDSI Athena (interoperable with medical research)
INSERT INTO CONCEPT VALUES (
    8507,                          -- concept_id (OHDSI standard)
    'MALE',                        -- concept_name
    'Gender',                      -- domain_id
    'OHDSI_Gender',                -- vocabulary_id
    'Gender',                      -- concept_class_id
    'S',
    'M',
    '1970-01-01',
    '2099-12-31',
    NULL
);

Concept Relationships

-- City is part of County
INSERT INTO CONCEPT_RELATIONSHIP VALUES (
    2000000001,                    -- Birmingham, AL (concept_id_1)
    2000000050,                    -- Jefferson County, AL (concept_id_2)
    'Is part of',                  -- relationship_id
    '2020-01-01',
    '2099-12-31',
    NULL
);

-- Topic regulates Legislation
INSERT INTO CONCEPT_RELATIONSHIP VALUES (
    2000000201,                    -- Water Fluoridation topic
    2000000305,                    -- Ordinance 101
    'Regulates',
    '2024-01-15',
    '2099-12-31',
    NULL
);

-- Organization addresses Topic
INSERT INTO CONCEPT_RELATIONSHIP VALUES (
    2000000401,                    -- Dental Health Foundation
    2000000201,                    -- Water Fluoridation
    'Addresses',
    '2023-06-01',
    '2099-12-31',
    NULL
);

Implementation in Existing Tables

Updated JURISDICTION Schema

CREATE TABLE JURISDICTION (
    jurisdiction_id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(255),
    jurisdiction_type VARCHAR(50),
    
    -- OMOP-style concept references
    city_concept_id INTEGER REFERENCES CONCEPT(concept_id),
    city_type_concept_id INTEGER REFERENCES CONCEPT(concept_id),  -- e.g., "Consolidated City-County"
    source_value VARCHAR(255),  -- Original text from data source (e.g., "SF, Calif")
    
    state_code VARCHAR(2),
    county_name VARCHAR(100),
    population INTEGER,
    -- ... other fields
);

Updated ORGANIZATION Schema

CREATE TABLE ORGANIZATION (
    org_id VARCHAR(255) PRIMARY KEY,
    ein VARCHAR(20),
    name VARCHAR(255),
    
    -- OMOP-style concept references
    organization_concept_id INTEGER REFERENCES CONCEPT(concept_id),
    cause_concept_id INTEGER REFERENCES CONCEPT(concept_id),  -- Links to NTEE concept
    org_type_concept_id INTEGER REFERENCES CONCEPT(concept_id),  -- 501c3, 501c4, etc.
    source_value VARCHAR(255),  -- Original organization name from IRS
    
    ntee_code VARCHAR(10),
    -- ... other fields
);

Updated DEMOGRAPHICS Schema

CREATE TABLE DEMOGRAPHICS (
    demographics_id VARCHAR(255) PRIMARY KEY,
    jurisdiction_id VARCHAR(255) REFERENCES JURISDICTION(jurisdiction_id),
    
    -- OMOP-style concept references (OHDSI Athena vocabularies)
    race_concept_id INTEGER REFERENCES CONCEPT(concept_id),       -- Standard OHDSI race codes
    ethnicity_concept_id INTEGER REFERENCES CONCEPT(concept_id),  -- Hispanic/Latino classification
    gender_concept_id INTEGER REFERENCES CONCEPT(concept_id),     -- MALE/FEMALE/OTHER
    
    race_source_value VARCHAR(100),      -- Census text (e.g., "White alone")
    ethnicity_source_value VARCHAR(100), -- Census text
    gender_source_value VARCHAR(50),     -- Census text
    
    -- ... other demographic fields
);

Benefits of OMOP-Style Vocabulary

  1. Semantic Interoperability: Civic data can be joined with healthcare research data using standard OHDSI demographic concepts
  2. Reproducible IDs: Deterministic hashing (uuid5) on strings like "JURISDICTION:CITY:NEW_YORK" generates consistent concept_id values
  3. Version Control: valid_start_date and valid_end_date track concept changes over time
  4. Relationship Tracking: CONCEPT_RELATIONSHIP table captures hierarchies (City β†’ County β†’ State) and associations (Topic β†’ Legislation)
  5. Source Traceability: source_value preserves original text while concept_id provides standardized reference

ETL Mapping Strategy

import uuid

def generate_concept_id(domain: str, type: str, identifier: str) -> int:
    """
    Generate deterministic concept_id using UUID5.
    Returns integer > 2,000,000,000 for custom civic concepts.
    """
    namespace = uuid.UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')  # OMOP namespace
    concept_string = f"{domain}:{type}:{identifier}".upper()
    concept_uuid = uuid.uuid5(namespace, concept_string)
    # Convert to integer in custom range
    return 2_000_000_000 + (int(concept_uuid.hex[:8], 16) % 1_000_000_000)

# Example usage
city_concept_id = generate_concept_id("JURISDICTION", "CITY", "NEW_YORK")
print(city_concept_id)  # e.g., 2045879021 (repeatable)

Downloading OHDSI Athena Vocabularies

  1. Visit https://athena.ohdsi.org/
  2. Select vocabularies:
    • βœ… Gender
    • βœ… Race
    • βœ… Ethnicity
    • βœ… Geography (US Counties, States)
  3. Download CSV files
  4. Import into vocabulary/ folder:
    • vocabulary_gender.parquet
    • vocabulary_race.parquet
    • vocabulary_ethnicity.parquet
    • vocabulary_geography.parquet

🎯 Missing Datasets to Add

High Priority

  • Ballot Measures - βœ… Added to data model! Fluoridation votes, bond measures
  • State Legislation - βœ… Added to data model! Open States API (FREE)
  • Policy Topics - βœ… Added to data model! Oral health advocacy tracking
  • Government Finances - βœ… Added to data model! City/county/state budgets, Census of Governments
  • School Finances - βœ… Added to data model! NCES F-33 per-pupil spending, revenues
  • Nonprofit Financials - βœ… Added to data model! Form 990 detailed financials (10M+ filings)
  • Census Demographics - Full census data per jurisdiction (beyond population)
  • Procurement Records - Government contracts
  • Election Results - Historical voting data
  • Health Outcomes - CDC PLACES data (oral health metrics!)
  • Environmental Data - EPA water quality (fluoridation levels)

Medium Priority

  • Property Records - Public assessment data
  • Crime Statistics - UCR/NIBRS data
  • Business Licenses - Local business registrations (dental clinics!)
  • Building Permits - Construction activity
  • Code Violations - Inspection records
  • Police Reports - Public safety incidents
  • Fire Department Data - Emergency response
  • Parks & Recreation - Facilities & programs
  • Transportation Data - Traffic & transit

Integration Improvements

  • Full Wikidata Sync - All civic entities
  • DBpedia Expansion - Complete local government coverage
  • Ballotpedia Data - (if budget allows) Electoral info & analysis
  • Court Records - Public dockets
  • Tax Records - Property tax data

πŸš€ Implementation Status

βœ… Completed

  • Jurisdiction discovery pipeline
  • YouTube channel discovery
  • Meeting platform detection
  • NCES school district ingestion
  • Open States API integration
  • Wikidata SPARQL queries
  • DBpedia Lookup API
  • Google Civic API (code ready)
  • Social media discovery
  • HuggingFace upload pipeline

πŸ”¨ In Progress

  • Meeting minutes extraction (Tuscaloosa pilot)
  • Video transcript processing
  • Document keyword detection
  • Nonprofit data enrichment

πŸ“‹ Planned

  • Automated meeting scraping at scale
  • Real-time meeting notifications
  • Budget document parsing
  • Full census integration
  • Health outcome correlation

πŸ“š Related Documentation

Data Standards & Specifications

  • Popolo Project - Open government data specification for people, organizations, and elected positions. Our LEADER, ORGANIZATION, and JURISDICTION entities follow Popolo schema conventions for maximum interoperability with civic tech platforms.

  • Schema.org - W3C structured data vocabulary for semantic web. Our entities map to Schema.org types (Event, Person, Organization, Legislation, ClaimReview, etc.) enabling SEO-optimized JSON-LD exports, Google Search rich results, and voice assistant compatibility.

  • Common Education Data Standards (CEDS) - U.S. Department of Education data standards for K-12, postsecondary, and workforce data. Our SCHOOL_DISTRICT entity aligns with CEDS Element IDs and NCES survey specifications (CCD, F-33 Finance).

  • Open Civic Data (OCD-IDs) - Standardized division identifiers for jurisdictions. Format: ocd-division/country:us/state:al/place:birmingham

Internal Documentation


Last Updated: {new Date().toISOString().split('T')[0]}

Data Model Version: 2.1 (Popolo-compatible)