---
displayed_sidebar: developersSidebar
sidebar_position: 1
---
import ZoomableMermaid from '@site/src/components/ZoomableMermaid';
# Data Model & Entity Relationship Diagram
Comprehensive overview of all data entities extracted, processed, and uploaded to HuggingFace datasets.
## οΏ½ HuggingFace Dataset Structure
### Current Datasets Being Uploaded
```
open-navigator-data/
βββ jurisdictions/ # ποΈ Core jurisdiction data
β βββ cities # 19,000+ incorporated places
β βββ counties # 3,144 U.S. counties
β βββ states # 50 states + DC, territories
β βββ school_districts # 13,000+ districts (NCES data)
β βββ census_data # Basic FIPS codes & census year reference
β
βββ demographics/ # π₯ Comprehensive demographic data (U.S. Census)
β βββ population # Total population, age distribution
β βββ race_ethnicity # Race and ethnicity breakdowns
β βββ income_economics # Income, poverty, SNAP benefits
β βββ education # Educational attainment levels
β βββ housing # Housing units, ownership, values
β βββ employment # Unemployment, labor force participation
β βββ health_insurance # Insurance coverage (uninsured, Medicaid, Medicare)
β
βββ social/ # π± Social media presence
β βββ twitter # Twitter/X accounts
β βββ facebook # Facebook pages
β βββ instagram # Instagram accounts
β βββ linkedin # LinkedIn pages
β
βββ videos/ # πΉ Video & streaming platforms
β βββ youtube_channels # Government YouTube channels
β βββ vimeo # Vimeo accounts
β βββ livestreams # Live meeting streams
β
βββ platforms/ # π₯οΈ Meeting management systems
β βββ legistar # Legistar URLs
β βββ granicus # Granicus links
β βββ suiteone # SuiteOne systems
β βββ civicplus # CivicPlus platforms
β
βββ domains/ # π Official government websites
β βββ gsa_domains # .gov domain registry
β βββ municipal_websites # City/county websites
β βββ state_portals # State government sites
β
βββ events/ # π Meetings, Hearings & Public Events
β βββ events # Government meetings, public hearings, community forums, town halls
β βββ event_participants # Officials and organizations participating in events
β βββ event_agenda_items # Individual agenda topics discussed
β βββ event_documents # Agendas, minutes, presentations, handouts
β βββ event_media # Video recordings, livestreams, audio files
β βββ event_bills # Bills discussed or considered at meetings
β
βββ contacts/ # π₯ All People - Officials, Candidates, Donors, Constituents
β βββ officials # Elected and appointed officials (mayors, council members, legislators)
β βββ official_roles # Current and historical positions held
β βββ official_contacts # Email, phone, office addresses
β βββ official_identifiers # External IDs (Twitter, OpenStates, Ballotpedia)
β βββ official_links # Websites, social media profiles
β βββ candidates # Political candidates (House, Senate, President - FEC data)
β βββ nonprofit_donors # Nonprofit leadership political giving (FEC analysis)
β βββ constituents # Donors, volunteers, members, beneficiaries (all people)
β
βββ nonprofits/ # π’ Nonprofit organizations & churches
β βββ irs_eobmf # IRS EO-BMF bulk data (1.9M+ organizations) - PRIMARY SOURCE
β βββ irs_nonprofits # Legacy IRS 990 data (deprecated - use irs_eobmf)
β βββ propublica_data # ProPublica API (financials, NTEE codes)
β βββ everyorg_data # Every.org API (missions, causes, logos)
β βββ nonprofit_990s # Detailed Form 990 financials (yearly filings)
β βββ congregations # π Church & congregation data (ARDA, HIFLD, NCS)
β βββ constituents # π€ Donors, volunteers, members, beneficiaries (Microsoft CDM)
β βββ donations # π Financial contributions and in-kind gifts (Microsoft CDM)
β βββ campaigns # π£ Fundraising campaigns and appeals (Microsoft CDM)
β βββ memberships # π« Member enrollment and renewals (Microsoft CDM)
β βββ volunteer_activities # π Volunteer hours and activities (Microsoft CDM)
β βββ program_delivery # π― Programs and services delivered (Microsoft CDM)
β βββ program_outcomes # π Impact metrics and outcome measurements (Microsoft CDM)
β
βββ grants/ # π΅ Grant funding transactions
β βββ nonprofit_grants # Grants to nonprofits (from 990 Schedule I)
β βββ government_grants # Government grants to orgs/jurisdictions
β βββ foundation_grants # Private foundation grants
β βββ federal_grants # Federal funding programs
β
βββ causes/ # π― Cause & category taxonomy
β βββ ntee_codes # IRS NTEE classification system
β βββ everyorg_causes # Every.org cause tags
β
βββ budgets/ # π° Government budgets & finances
β βββ city_budgets # City/municipal budgets & spending
β βββ county_budgets # County budgets & expenditures
β βββ state_budgets # State government finances
β βββ school_budgets # School district finances (NCES F-33)
β βββ bond_debt # Municipal bonds & debt obligations
β βββ budget_line_items # π Detailed budget categories & line items
β βββ budget_deltas # π Budget-to-Minutes Delta analysis (political economy)
β
βββ decisions/ # βοΈ Policy decisions & political economy analysis
β βββ policy_decisions # Extracted decisions from meetings
β βββ decision_frames # Frame analysis (rhetoric patterns)
β βββ decision_options # Options considered & rejected
β βββ decision_tradeoffs # Tradeoffs discussed (cost vs benefit, etc.)
β βββ stakeholder_positions # π₯ Who spoke for/against (Influence Radar)
β βββ decision_votes # Detailed vote records per decision
β βββ deferral_patterns # π
Stalling detection (same topic, multiple deferrals)
β βββ deferral_instances # Individual tabling events linked to patterns
β βββ keyword_density # Quantitative indicators (grant/taxpayer/emergency)
β βββ deferral_patterns # Tabled/delayed decisions (temporal analysis)
β
βββ elections/ # π
Election cycles & temporal analysis
β βββ election_cycles # Election dates & periods
β βββ election_influences # Pre/post-election decision patterns
β
βββ campaigns/ # π° Political campaign finance (FEC data)
β βββ committees # PACs, Super PACs, campaign committees
β βββ contributions # Individual political contributions $200+
β
βββ civic/ # π³οΈ Google Civic & Wikidata
β βββ civic_divisions # OCD divisions
β βββ representatives # From Google Civic API
β βββ wikidata_entities # Structured entities
β βββ dbpedia_resources # Wikipedia infobox data
β
βββ ballots/ # π³οΈ Ballot initiatives & referendums
β βββ state_measures # State propositions (fluoridation votes!)
β βββ local_measures # City/county ballot questions
β βββ election_results # Historical voting outcomes
β
βββ bills/ # π Legislation & Lawmaking
β βββ bills # Bills and resolutions (1.5M+ from all 50 states)
β βββ bill_actions # Bill history (introduced, committee, floor vote, signed)
β βββ bill_sponsorships # Primary sponsors and co-sponsors
β βββ bill_abstracts # Bill summaries and descriptions
β βββ bill_versions # Different versions of bill text (introduced, amended, enrolled)
β βββ bill_version_links # Links to PDF/HTML bill text
β βββ bill_documents # Supporting documents (fiscal notes, amendments, analysis)
β βββ bill_document_links # Links to supporting documents
β βββ bill_subjects # Subject/topic tags per bill
β βββ legislative_sessions # Session identifiers (2023 Regular, 2024 Special, etc.)
β βββ vote_events # Roll call votes on bills
β βββ vote_counts # Vote tallies (yes, no, abstain, absent)
β βββ individual_votes # Individual legislator votes (yes/no/abstain)
β
βββ topics/ # π― Advocacy causes & campaigns
β βββ topic_definitions # Validated survey questions from Roper Center
β βββ survey_questions # Public opinion question wording library
β βββ jurisdiction_topics # What each city is discussing
β βββ advocacy_alerts # Opportunities for engagement
β
βββ surveys/ # π Public opinion research & polling data
β βββ survey_providers # Polling organizations (Gallup, Pew, Roper, etc.)
β βββ survey_studies # Individual survey studies/waves
β βββ survey_variables # Questions/items asked in surveys
β βββ survey_responses # Aggregate and individual response data
β βββ ipoll_metadata # Roper iPoll catalog metadata
β βββ survey_crosstabs # Breakdowns by demographics, geography
β
βββ factchecks/ # β
Fact-checking & claim verification
β βββ claim_reviews # Google Fact Check API (ClaimReview schema)
β βββ politifact # PolitiFact Truth-O-Meter ratings
β βββ factcheck_org # FactCheck.org verified claims
β βββ verified_claims # Aggregated fact-check database
β
βββ civic_tech/ # π» Open source projects & hackathons
β βββ github_repositories # Civic tech projects (GitHub API)
β βββ project_metadata # Code for America, USDR, Civic Tech Field Guide
β βββ contributors # Maintainers and core contributors
β βββ project_issues # Good first issues, contribution opportunities
β βββ hackathons # Civic hackathon events
β βββ hackathon_projects # Projects built at hackathons
β βββ brigade_chapters # Code for America brigade locations
β βββ project_funding # GitHub Sponsors, grants, OpenCollective
β
βββ community_solutions/ # π Community engagement & use cases
β βββ engagement_spectrum # Spectrum of Community Engagement to Ownership
β βββ use_case_catalog # Harvard Data-Smart City Solutions examples
β βββ data_academies # Brookings Institution training programs
β βββ success_stories # Real-world outcomes (Providence, Portland, Tempe)
β βββ metric_templates # Pre-built analytics for common challenges
β βββ workflow_guides # Step-by-step community data workflows
β
βββ analytics/ # π Time dimensions & metric views
β βββ date_dimension # Date/time reference table (YYYY-MM-DD, day_of_week, fiscal_year)
β βββ temporal_relationships # Time-series joins for all entities
β βββ metric_views # Pre-computed analytics (advocacy, spending, nonprofit impact)
β βββ aggregated_stats # Monthly/quarterly/yearly rollups
β βββ dashboard_metrics # Real-time dashboard data feeds
β
βββ standards/ # π Schema.org, Popolo, CEDS, IATI exports
β βββ schema_org_jsonld # JSON-LD exports (Event, Person, Organization, Legislation, ClaimReview)
β βββ popolo_exports # Popolo-compliant JSON (Person, Organization, Membership, VoteEvent)
β βββ ceds_aligned # CEDS-compliant education data (Element IDs, Option Sets)
β βββ ocd_divisions # Open Civic Data division IDs
β βββ iati_activities # IATI Standard v2.03 XML (programs, grants, humanitarian aid)
β βββ rdf_triples # RDF/Turtle semantic web exports
β
βββ vocabulary/ # π§ OMOP-inspired concept & terminology (SYSTEM-INTERNAL)
β βββ concept # Master concept table (cities, causes, officials)
β βββ vocabulary # Vocabulary sources (OCD_ID, IRS_NTEE, US_Census)
β βββ concept_class # Concept classifications (City, County, 501c3, Mayor)
β βββ concept_relationship # Relationships (City β County, Topic β Legislation)
β βββ domain # Domain groupings (Jurisdiction, Nonprofit, Policy)
β
βββ exports/ # π€ API-ready formatted exports
βββ csv_bulk # CSV downloads for all datasets
βββ json_api # REST API JSON responses
βββ graphql_schema # GraphQL schema definitions
βββ parquet_optimized # Compressed Parquet (default format)
```
### Parquet File Naming Convention
**Rule:** Use underscores (`_`) consistently, NOT hyphens (`-`)
**Format:** `{category}_{subcategory}.parquet`
**Examples:**
```
β
CORRECT (using underscores):
jurisdictions_cities.parquet
jurisdictions_counties.parquet
jurisdictions_states.parquet
jurisdictions_school_districts.parquet
social_twitter.parquet
social_facebook.parquet
videos_youtube_channels.parquet
events_events.parquet
contacts_officials.parquet
bills_bills.parquet
nonprofits_organizations.parquet
nonprofits_financials.parquet
nonprofits_programs.parquet
nonprofits_locations.parquet
nonprofits_irs_eobmf.parquet
nonprofits_constituents.parquet
nonprofits_donations.parquet
nonprofits_campaigns.parquet # Nonprofit fundraising campaigns (NOT political)
contacts_candidates.parquet # Political candidates (FEC)
contacts_nonprofit_donors.parquet # Nonprofit leadership political giving (FEC analysis)
contacts_constituents.parquet # Donors, volunteers, members, beneficiaries
campaigns_committees.parquet # Political committees/PACs (FEC)
campaigns_contributions.parquet # Political contributions (FEC)
nonprofits_memberships.parquet
nonprofits_volunteer_activities.parquet
nonprofits_program_delivery.parquet
nonprofits_program_outcomes.parquet
grants_federal_grants.parquet
contacts_officials.parquet
contacts_official_roles.parquet
contacts_official_contacts.parquet
contacts_official_identifiers.parquet
contacts_official_links.parquet
contacts_candidates.parquet
contacts_nonprofit_donors.parquet
contacts_constituents.parquet
bills_bills.parquet
bills_bill_actions.parquet
bills_bill_sponsorships.parquet
bills_bill_abstracts.parquet
bills_bill_versions.parquet
bills_bill_version_links.parquet
bills_bill_documents.parquet
bills_bill_document_links.parquet
bills_bill_subjects.parquet
bills_legislative_sessions.parquet
bills_vote_events.parquet
bills_vote_counts.parquet
bills_individual_votes.parquet
events.parquet
event_participants.parquet
event_agenda_items.parquet
event_documents.parquet
event_media.parquet
event_bills.parquet
budgets_city_budgets.parquet
surveys_national_polls.parquet
surveys_roper_questions.parquet
surveys_survey_providers.parquet
surveys_survey_studies.parquet
surveys_survey_variables.parquet
surveys_survey_responses.parquet
surveys_ipoll_metadata.parquet
factchecks_claim_reviews.parquet
factchecks_politifact.parquet
analytics_date_dimension.parquet
analytics_metric_views.parquet
analytics_temporal_relationships.parquet
standards_schema_org_jsonld.parquet
standards_popolo_exports.parquet
standards_ceds_aligned.parquet
standards_iati_activities.parquet
vocabulary_concept.parquet
vocabulary_vocabulary.parquet
vocabulary_concept_class.parquet
vocabulary_concept_relationship.parquet
β INCORRECT (using hyphens):
jurisdictions-cities.parquet
social-twitter.parquet
meetings-government-meetings.parquet
surveys-national-polls.parquet
factchecks-claim-reviews.parquet
analytics-date-dimension.parquet
standards-schema-org.parquet
```
**Why Underscores?**
- β
Python-friendly variable names (can use `data.jurisdictions_cities`)
- β
SQL-compatible column names
- β
Consistent with folder structure (`school_districts`, not `school-districts`)
- β
Better for programmatic access
- β
Avoids shell escaping issues
**Repository Name Exception:**
- HuggingFace repo: `CommunityOne/open-navigator-data` (hyphen is fine for URLs)
- File names inside repo: Use underscores (`jurisdictions_cities.parquet`)
## π Data Extraction Pipeline
### Phase 1: Discovery (Bronze Layer)
1. **Census Data** β Jurisdictions list
2. **GSA Domains** β Government websites
3. **NCES** β School districts with financial data (F-33 forms)
4. **IRS EO-BMF** β ALL 1.9M+ U.S. tax-exempt organizations (PRIMARY SOURCE)
5. **IRS TEOS** β Legacy nonprofit EINs (deprecated - use IRS EO-BMF)
6. **Census of Governments** β Municipal budgets & finances
7. **URL Discovery** β Meeting platforms, YouTube, budget PDFs
8. **Social Media** β Twitter, Facebook accounts
### Phase 2: Enrichment (Silver Layer)
1. **IRS EO-BMF** β Complete nonprofit registry with 28 data fields per organization
2. **ProPublica Nonprofit Explorer** β Enhanced financial data, detailed 990 filings
3. **Every.org API** β Nonprofit causes, missions, logos
3. **ARDA (Association of Religion Data Archives)** β Congregation characteristics, health ministries
4. **HIFLD Places of Worship** β Geospatial church locations (350K+ congregations)
5. **National Congregations Study** β Social service provision patterns
6. **NCES F-33 Finance Survey** β School district budgets, per-pupil spending
7. **Census Annual Survey** β State/local government finances
8. **Municipal Securities Rulemaking Board (EMMA)** β Bond debt data
9. **YouTube API** β Channel statistics
10. **Open States PostgreSQL Database** β Complete legislative data (~10 GB monthly dump)
- **8,600+ people** (legislators, governors, mayors) across all 50 states + DC + Puerto Rico
- **1.5M+ bills** with full text and history
- **13M+ bill actions** (introduced, committee, amendments, floor votes, signed)
- **7.2M+ sponsorships** (primary sponsors and co-sponsors)
- **3.5M+ bill versions** (as introduced, committee substitute, enrolled, enacted)
- **180K+ events** (legislative meetings, hearings, committee sessions)
- **835K+ event participants** (who spoke, testified, or attended)
- **524K+ agenda items** from meetings
- **Vote events** with individual legislator positions
- **Organizations** (legislative bodies, committees)
- **Jurisdictions** (states, territories)
- Updated monthly from https://data.openstates.org/postgres/monthly/
11. **OpenStates People Repository** β Current legislator contact info
- GitHub repo: https://github.com/openstates/people
- YAML files with email, phone, district offices
- Social media profiles and website links
- Updated daily via automated scrapers
12. **Wikidata SPARQL** β Entity relationships
12. **DBpedia** β Wikipedia structured data
13. **Google Civic** β Representatives
14. **OpenFEC API** β Political contributions, candidates, committees (campaign finance)
15. **GitHub API** β Civic tech projects, contributors, issues
16. **Civic Tech Field Guide** β Curated project taxonomy
17. **Code for America** β Brigade projects and hackathons
18. **Digital Public Goods Alliance** β DPG-certified open source projects
### Phase 3: Processing (Gold Layer)
1. **Meeting Extraction** β Agenda/minutes text
2. **Video Transcripts** β YouTube captions
3. **Document Analysis** β Keyword detection
4. **Relationship Mapping** β Entity connections
5. **Oral Health Filtering** β Topic classification
6. **Temporal Indexing** β Date dimension table, time-series relationships
7. **Metric View Creation** β Pre-computed analytics (advocacy activity, government spending, nonprofit impact)
8. **Schema.org JSON-LD** β Structured data exports (Event, Person, Organization, Legislation, ClaimReview)
9. **Popolo Compliance** β Open government standard exports (Person, Organization, Membership, VoteEvent)
10. **CEDS Alignment** β Education data mapping to NCES Element IDs and Option Sets
### New Dataset Categories Explained
#### π Analytics Datasets
**Purpose:** Enable time-series analysis, trend detection, and dashboard metrics without complex SQL queries.
| Dataset | Description | Refresh Frequency |
|---------|-------------|-------------------|
| `analytics_date_dimension` | Calendar reference table with fiscal years, quarters, day-of-week, holidays | Static (updated annually) |
| `analytics_temporal_relationships` | Pre-joined date keys for all time-based entities (meetings, votes, budgets, filings) | Daily |
| `analytics_metric_views` | Pre-computed analytics like advocacy_activity, government_spending, nonprofit_impact | Hourly |
| `analytics_aggregated_stats` | Monthly/quarterly/yearly rollups (meeting counts, budget totals, grant sums) | Daily |
| `analytics_dashboard_metrics` | Real-time feeds for dashboards (active meetings today, trending topics, hot ballot measures) | Every 5 minutes |
**Example Use Case:**
```sql
-- Instead of complex joins, use metric view:
SELECT * FROM analytics_metric_views
WHERE metric_name = 'advocacy_activity'
AND jurisdiction_id = 'ocd-division/country:us/state:al/place:birmingham'
AND date_period = '2024-Q1';
```
#### π Standards-Compliant Exports
**Purpose:** Maximum interoperability with civic tech platforms, search engines, and semantic web tools.
| Dataset | Standard | Use Case | Consumers |
|---------|----------|----------|-----------|
| `standards_schema_org_jsonld` | Schema.org JSON-LD | Google Search rich results, voice assistants | Google, Bing, Alexa, Siri |
| `standards_popolo_exports` | Popolo Project | Civic tech platform integration | mySociety, OpenNorth, Sunlight Foundation |
| `standards_ceds_aligned` | Common Education Data Standards | Education data exchange, NCES reporting | State education depts, Ed-Fi, IMS Global |
| `standards_ocd_divisions` | Open Civic Data IDs | Cross-platform jurisdiction referencing | Google Civic, Ballotpedia, Vote Smart |
| `standards_rdf_triples` | RDF/Turtle | Linked open data, knowledge graphs | DBpedia, Wikidata, SPARQL endpoints |
**Example Schema.org Export:**
```json
{
"@context": "https://schema.org",
"@type": "GovernmentOrganization",
"name": "Birmingham City Council",
"address": {
"@type": "PostalAddress",
"addressLocality": "Birmingham",
"addressRegion": "AL"
},
"event": [{
"@type": "Event",
"name": "Regular City Council Meeting",
"startDate": "2024-01-15T18:00:00-06:00"
}]
}
```
#### β
Fact-Checking Datasets
**Purpose:** Verify claims made in meetings, legislation, and political speech.
| Dataset | Source | Fields | Update Frequency |
|---------|--------|--------|------------------|
| `factchecks_claim_reviews` | Google Fact Check API | claimReviewed, reviewRating, author, datePublished | Daily |
| `factchecks_politifact` | PolitiFact web scraping | claim, ruling, truth_o_meter, context | Daily |
| `factchecks_factcheck_org` | FactCheck.org API/scraping | claim, verdict, analysis, sources | Daily |
| `factchecks_verified_claims` | Aggregated + deduplicated | claim_text, consensus_rating, verification_sources | Daily |
**Integration with Meetings:**
- Cross-reference meeting transcripts with verified claims
- Flag unverified statements in legislative debates
- Track politician accuracy scores over time
## οΏ½π Complete Data Model (ERD)
## βοΈ Political Economy Analysis Framework
The ERD includes specialized entities for **political economy analysis** - understanding the "WHY" behind government decisions, not just the "WHAT". These entities support a 4-step advocacy framework to expose gaps between rhetoric and reality:
### The 4-Step Framework for Effective Change
#### Step 1: Rhetoric Gap - **Frame Analysis**
**Goal:** Establish they ALREADY agree it's important (stop the "need" debate)
**ERD Support:**
- **DECISION_FRAME**: Tracks how decisions are framed ("public health" vs "fiscal responsibility" vs "equity")
- **KEYWORD_DENSITY**: Measures rhetoric patterns ("priority", "essential", "critical" per 1000 words)
- **POLICY_DECISION.primary_frame**: Primary framing language used
**Analysis Output:**
```
Frame Distribution:
12x public health
8x fiscal responsibility
5x equity/access
3x economic development
β They SAY oral health is a "priority" - hold them to it!
```
#### Step 2: Displacement Matrix - **Budget-to-Minutes Delta**
**Goal:** Show they HAD the money (stop the "budget constraint" excuse)
**ERD Support:**
- **BUDGET_LINE_ITEM**: Detailed budget categories with year-over-year changes
- **BUDGET_DELTA**: Compares meeting rhetoric to actual funding changes
- **BUDGET_DELTA.delta_type**: Classifies as "Expansion", "Lip Service", or "Hidden Priority"
- **BUDGET_DELTA.delta_score**: Quantifies rhetoric vs reality gap (-1 to +1)
**Analysis Output:**
```
π Lip Service Detected:
β’ School dental program: -$50,000 decrease
Mentioned 12x in meetings as "critical for children"
Logic: PERFORMATIVE POLITICS
π° Hidden Priority (Where the money REALLY went):
β’ IT Infrastructure: +$200,000 increase
Only mentioned 1x in meetings
Logic: Bureaucratic inertia - avoiding scrutiny
```
#### Step 3: Influence Radar - **Stakeholder Analysis**
**Goal:** Name who's blocking it (force personal accountability)
**ERD Support:**
- **STAKEHOLDER_POSITION**: Who spoke for/against with their arguments
- **STAKEHOLDER_POSITION.speaking_order**: Track who speaks when (early speakers often most influential)
- **DECISION_VOTE**: Individual vote records with stated reasons
- **DECISION_VOTE.switched_position**: Flag when officials change their stance
**Analysis Output:**
```
π₯ Blocking Coalition:
β’ Taxpayer Association (opponent) - spoke 1st
Argument: "Cost concerns in tight budget"
Counter: School nurses budget UP $300K same meeting
β’ Council Member Smith (voted NO)
Stated reason: "Need more study"
Pattern: Has voted NO on 3 oral health items since 2022
```
#### Step 4: Deferral Pattern - **Temporal Voting Analysis**
**Goal:** Show they're stalling, not studying (expose the tactic)
**ERD Support:**
- **DEFERRAL_PATTERN**: Tracks same topic across multiple meetings
- `first_mentioned`: When decision was first introduced
- `last_discussed`: Most recent tabling date
- `total_deferrals`: How many times tabled/postponed
- `months_in_limbo`: Time elapsed since first mention
- `pattern_type`: "Rationale of Attrition" / "Sincere Analysis" / "Political Timing"
- `strategic_inference`: Inferred reason for delay
- `next_review_date`: When scheduled to revisit (if stated)
- **DEFERRAL_INSTANCE**: Individual tabling events with dates
- `deferral_date`: When it was tabled
- `stated_reason`: Official justification given
- `speaker`: Who gave the reason
- `months_since_first`: Time elapsed
- `reason_changed`: Flag for shifting justifications
- **POLICY_DECISION.outcome**: Captures "tabled/deferred/postponed"
- **DECISION_OPTION.rejection_reason**: Stated excuses for delay
- **ELECTION_CYCLE**: Links decisions to election timelines
- **ELECTION_CYCLE.pre_election_spike_detected**: Flag incumbent protection patterns
**Complete Date Tracking:**
```sql
-- All dates needed to track stalling:
MEETING.meeting_date -- When meeting occurred
POLICY_DECISION.meeting_date -- When decision was discussed
DEFERRAL_PATTERN.first_mentioned -- First introduction date
DEFERRAL_PATTERN.last_discussed -- Most recent tabling
DEFERRAL_INSTANCE.deferral_date -- Each individual deferral
ELECTION_CYCLE.election_date -- Election timing context
DECISION_VOTE.days_since_election -- Temporal political context
```
**Analysis Output:**
```
π
Stalling Pattern DETECTED:
β’ Topic: "Fluoridation proposal"
β’ First mentioned: 2020-03-15 (4 years, 1 month ago)
β’ Total deferrals: 4 times
β’ Pattern type: Rationale of Attrition
π Timeline of Shifting Justifications:
- 2020-03: "Need more study" (12mo before election)
- 2020-09: "Budget constraints" (6mo before election)
- 2022-01: "Need public input" (new council members)
- 2024-05: "Awaiting staff report" (still pending)
π Strategic Inference:
"They're NOT studying - they're AVOIDING! The board isn't
debating the merit; they're waiting for the advocate's
momentum to fade before the next election cycle."
π¨ Discomfort Score: 10/10 (Extremely politically sensitive)
```
### Quantitative "Why" Indicators
Additional metrics to infer governance logic:
| Entity | Metric | Reveals |
|--------|--------|---------|
| **POLICY_DECISION.contention_score** | Ratio of dissent (0-100) | Political sensitivity of topic |
| **KEYWORD_DENSITY** | "grant" vs "taxpayer" frequency | Decision driver: outside funding vs local demand |
| **KEYWORD_DENSITY** | "emergency" occurrence | Reactive vs planned governance |
| **ELECTION_CYCLE.avg_project_cost_before** | Spending spikes pre-election | Incumbency protection tactics |
| **BUDGET_DELTA.underlying_logic** | Genuine vs Performative vs Bureaucratic | Real priorities revealed |
### Implementation Files
These analyses are **fully implemented** in the codebase:
- `extraction/decision_analyzer.py` - Frame analysis, stakeholder extraction
- `extraction/budget_analyzer.py` - Budget delta calculation, opportunity cost mapping
- `extraction/temporal_analyzer.py` - Election cycle analysis, deferral patterns
- `examples/tuscaloosa_political_economy.py` - Complete end-to-end analysis
See **[Political Economy Analysis Guide](/docs/guides/political-economy)** for detailed implementation status and usage examples.
---
## π Data Standards & Interoperability
### Popolo Project Alignment
Our data model follows the [Popolo Project](https://www.popoloproject.com/) specification for representing people, organizations, and elected positions. Popolo is an international open government data standard adopted by 30+ civic tech organizations worldwide including mySociety, Sunlight Foundation, OpenNorth, and Civic Commons.
#### Popolo Class Mappings
| Popolo Class | Our Entity | Description | Key Fields |
|--------------|------------|-------------|------------|
| **Person** | LEADER | Elected officials, government employees | `full_name`, `email`, `phone`, `photo_url` |
| **Organization** | ORGANIZATION | Nonprofits, government agencies, companies | `name`, `org_type`, `address`, `website` |
| **Membership** | LEADER β ORGANIZATION | Relationship between people and organizations | `leader_id`, `jurisdiction_id`, `term_start`, `term_end` |
| **Post** | LEADER.position_type | Positions within organizations | `title`, `office`, `position_type` |
| **Contact Detail** | Embedded fields | Communication methods | `email`, `phone`, `website` in multiple entities |
| **Motion** | AGENDA, LEGISLATION | Formal proposals for decision | `title`, `description`, `status` |
| **Vote Event** | VOTE | Voting on motions/bills | `vote_date`, `item_description` |
| **Count** | VOTE, LEGISLATION | Vote tallies | `vote_yes`, `vote_no`, `vote_value` |
| **Area** | JURISDICTION | Geographic/political boundaries | `jurisdiction_type`, `state_code`, `county_name` |
| **Event** | MEETING | Gatherings with agendas | `meeting_date`, `meeting_type`, `body_name` |
| **Speech** | MINUTES, VIDEO | Spoken statements | `transcript_text`, `summary_text` |
#### Schema.org Type Mappings
Our entities map to [Schema.org](https://schema.org/) types for SEO-optimized structured data and semantic web compatibility:
| Our Entity | Schema.org Type | Properties | JSON-LD Export |
|------------|----------------|------------|----------------|
| JURISDICTION | [AdministrativeArea](https://schema.org/AdministrativeArea) | name, address, geo, telephone, url | β
City/county pages |
| MEETING | [Event](https://schema.org/Event) | name, startDate, endDate, location, organizer | β
Google Calendar rich results |
| LEADER | [Person](https://schema.org/Person) + [GovernmentOfficial](https://schema.org/GovernmentOfficial) | name, email, telephone, jobTitle | β
Official profiles |
| ORGANIZATION | [Organization](https://schema.org/Organization) + [NGO](https://schema.org/NGO) | name, address, telephone, foundingDate | β
Nonprofit listings |
| LEGISLATION | [Legislation](https://schema.org/Legislation) | name, legislationDate, legislationPassedBy | β
Bill tracking |
| BALLOT_MEASURE | [Legislation](https://schema.org/Legislation) | name, datePosted, legislationChanges | β
Ballot guides |
| VOTE | [VoteAction](https://schema.org/VoteAction) | agent, candidate, actionOption | β
Voting records |
| FACT_CHECK | [ClaimReview](https://schema.org/ClaimReview) | claimReviewed, reviewRating, author | β
Google Fact Check Explorer |
| SCHOOL_DISTRICT | [EducationalOrganization](https://schema.org/EducationalOrganization) | name, numberOfStudents, address | β
School district info |
| VIDEO | [VideoObject](https://schema.org/VideoObject) | name, description, uploadDate, duration | β
YouTube integration |
| DOCUMENT | [DigitalDocument](https://schema.org/DigitalDocument) | name, fileFormat, datePublished | β
Document library |
| **Microsoft CDM Nonprofit Entities** | | | |
| CONSTITUENT | [Person](https://schema.org/Person) | name, email, telephone, address | β
Donor/volunteer profiles |
| DONATION | [DonateAction](https://schema.org/DonateAction) | agent (Person), recipient (Organization), price | β
Donation receipts |
| CAMPAIGN | [FundingScheme](https://schema.org/FundingScheme) | name, startDate, endDate, url | β
Fundraising campaigns |
| MEMBERSHIP | [ProgramMembership](https://schema.org/ProgramMembership) | member (Person), hostingOrganization, membershipNumber | β
Member cards |
| VOLUNTEER_ACTIVITY | [VolunteerAction](https://schema.org/VolunteerAction) | agent (Person), startTime, endTime, location | β
Volunteer tracking |
| PROGRAM_DELIVERY | [Service](https://schema.org/Service) | name, provider, serviceType, areaServed | β
Program catalog |
| PROGRAM_OUTCOME | [Observation](https://schema.org/Observation) | measurementTechnique, measuredValue, observationDate | β
Impact reporting |
**Example: Meeting as Schema.org Event**
```json
{
"@context": "https://schema.org",
"@type": "Event",
"@id": "https://www.communityone.com/meetings/city-council-2024-01-15",
"name": "Birmingham City Council Regular Meeting",
"description": "Monthly city council meeting covering budget, zoning, and public health initiatives",
"startDate": "2024-01-15T18:00:00-06:00",
"endDate": "2024-01-15T20:30:00-06:00",
"eventStatus": "https://schema.org/EventScheduled",
"eventAttendanceMode": "https://schema.org/MixedEventAttendanceMode",
"location": {
"@type": "Place",
"name": "Birmingham City Hall",
"address": {
"@type": "PostalAddress",
"streetAddress": "710 N 20th St",
"addressLocality": "Birmingham",
"addressRegion": "AL",
"postalCode": "35203"
}
},
"organizer": {
"@type": "GovernmentOrganization",
"name": "Birmingham City Council",
"url": "https://www.birminghamal.gov/council/"
},
"recordedIn": {
"@type": "VideoObject",
"name": "City Council Meeting Recording",
"uploadDate": "2024-01-16",
"duration": "PT2H30M",
"thumbnailUrl": "https://example.com/thumbnail.jpg",
"contentUrl": "https://youtube.com/watch?v=example"
},
"subEvent": [
{
"@type": "Event",
"name": "Public Comment Period",
"startDate": "2024-01-15T18:15:00-06:00"
}
]
}
```
#### Common Education Data Standards (CEDS) Alignment
Our SCHOOL_DISTRICT entity follows [CEDS](https://ceds.ed.gov/) specifications for education data interoperability:
| Our Field | CEDS Element | Element ID | NCES Alignment |
|-----------|--------------|------------|----------------|
| `nces_id` | LEA Identifier (NCES) | 000827 | CCD LEA ID |
| `district_name` | Name of Institution | 000168 | Official district name |
| `district_type` | LEA Type | 000108 | Regular/Specialized/Service Agency |
| `total_students` | Student Count | 001475 | Fall enrollment |
| `total_schools` | Number of Schools | 000856 | Operational schools count |
| `total_revenue` | Total Revenue | 000612 | F-33 Survey Line A09 |
| `total_expenditures` | Total Expenditures | 000611 | F-33 Survey Line B13 |
| `per_pupil_spending` | Expenditure per Student | 000613 | Total exp / enrollment |
| `federal_revenue` | Federal Revenue | 000614 | ESEA Title I, IDEA |
| `state_revenue` | State Revenue | 000615 | State aid formulas |
| `local_revenue` | Local Revenue | 000616 | Property tax, bonds |
| `superintendent` | Chief Administrator | 000240 | District superintendent |
| `school_year` | School Year | 000243 | YYYY-YYYY format |
**CEDS Option Sets:**
- **LEA Type** (000108): Regular local school district, Specialized (charter/magnet), Supervisory Union, Service Agency, State/Federal Agency
- **Operational Status** (000533): Open, Closed, New, Changed Agency, Temporarily Closed
- **Locale Type** (001315): City (Large/Midsize/Small), Suburb, Town, Rural (NCES Urban-centric codes)
**Benefits:**
- β
Compatible with NCES Common Core of Data (CCD) and F-33 Finance Survey
- β
Aligns with Ed-Fi Alliance, IMS Global, and SIF Association standards
- β
Supports federal reporting for ESSA, Title I, IDEA compliance
#### Microsoft Common Data Model for Nonprofits
Our nonprofit constituent management entities follow [Microsoft's Common Data Model for Nonprofits](https://github.com/microsoft/Nonprofits/), enabling seamless integration with Dynamics 365 and Power Platform:
| Our Entity | Microsoft CDM Entity | Description | Key Relationships |
|------------|---------------------|-------------|------------------|
| CONSTITUENT | Constituent | Donors, volunteers, members, beneficiaries | β DONATION, MEMBERSHIP, VOLUNTEER_ACTIVITY |
| DONATION | Donation | Financial contributions and in-kind gifts | β CONSTITUENT, β CAMPAIGN, β DESIGNATION |
| CAMPAIGN | Campaign | Fundraising campaigns and appeals | β DONATION |
| DESIGNATION | Designation | Fund allocation (programs, unrestricted, endowment) | β DONATION |
| MEMBERSHIP | Membership | Member enrollment and renewals | β CONSTITUENT, β ORGANIZATION |
| VOLUNTEER_ACTIVITY | Volunteer Preference | Volunteer activities and hours | β CONSTITUENT |
| PROGRAM_DELIVERY | Delivery Framework | Programs and services delivered | β ORGANIZATION, β PROGRAM_OUTCOME |
| PROGRAM_OUTCOME | Objective | Measurable impact and KPIs | β PROGRAM_DELIVERY |
**Microsoft CDM Core Patterns:**
1. **Constituent-Centric Design**: All engagement activities (donations, volunteering, membership) link to CONSTITUENT
2. **Designation-Based Accounting**: Donations are allocated to specific designations (funds, programs, campaigns)
3. **Campaign Tracking**: Multi-channel fundraising campaigns track goals, raised amounts, donor counts
4. **Outcome Measurement**: Programs track objectives with target vs. actual metrics
5. **Temporal Tracking**: Start/end dates on memberships, campaigns, and programs for lifecycle management
**Integration Points:**
| Microsoft Product | Integration Type | Use Case |
|------------------|------------------|----------|
| **Dynamics 365 Nonprofit** | Native CDM compatibility | CRM for constituent relationship management |
| **Power BI** | Direct data connection | Fundraising dashboards, donor analytics |
| **Power Apps** | Low-code app builder | Volunteer management apps, event registration |
| **Power Automate** | Workflow automation | Donation receipts, membership renewals |
| **Azure Synapse** | Cloud analytics | Large-scale constituent analytics |
**Example: Constituent-Donation Relationship**
```sql
-- Find top 10 donors by lifetime giving
SELECT
c.constituent_id,
c.first_name,
c.last_name,
c.email,
c.lifetime_giving_total,
COUNT(d.donation_id) as donation_count,
AVG(d.amount) as avg_donation,
MAX(d.donation_date) as last_donation_date
FROM CONSTITUENT c
JOIN DONATION d ON c.constituent_id = d.constituent_id
WHERE c.constituent_type = 'Donor'
GROUP BY c.constituent_id, c.first_name, c.last_name, c.email, c.lifetime_giving_total
ORDER BY c.lifetime_giving_total DESC
LIMIT 10;
```
**Benefits:**
- β
**Microsoft Ecosystem**: Native compatibility with Dynamics 365, Power Platform, Azure
- β
**Industry Standard**: Used by large nonprofits (United Way, Boys & Girls Clubs, etc.)
- β
**Grant Reporting**: Built-in support for outcome tracking and funder reporting
- β
**LYBNT Analysis**: "Last Year But Not This Year" donor reactivation queries
- β
**Constituent 360**: Unified view of all engagement touchpoints
#### Underlying Standards
Popolo builds upon W3C, IETF, and DCMI specifications for maximum interoperability:
| Standard | Use Case | Example in Our Model |
|----------|----------|---------------------|
| **FOAF** (Friend of a Friend) | Social network, people relationships | LEADER connections, ORGANIZATION networks |
| **vCard** (IETF RFC 6350) | Contact information | email, phone, address fields across entities |
| **Schema.org** | Structured web data | Meeting metadata, organization profiles for SEO |
| **DCMI Terms** | Metadata, provenance | `created_at`, `updated_at`, `source_url` timestamps |
| **W3C Organization Ontology** | Hierarchical organizations | Government hierarchy, nonprofit structures |
| **ISA Location Core Vocabulary** | Address standardization | `address`, `city`, `state_code`, `latitude`, `longitude` |
| **GeoNames Ontology** | Geographic identifiers | Place names, jurisdiction boundaries |
| **SKOS** | Taxonomies and classification | NTEE codes, policy topic categories |
#### Benefits of Standards Compliance
1. **Interoperability**: Data can be easily shared with other civic tech platforms
2. **API Compatibility**: Standard field names work with existing tools (e.g., EveryPolitician, Open Civic Data)
3. **Semantic Web**: RDF/JSON-LD export capabilities for linked open data
4. **Tooling**: Existing libraries and validators (e.g., `pupa`, `everypolitician-popolo`)
5. **Documentation**: Well-documented schemas reduce onboarding time
#### Example: Popolo-Compatible JSON-LD Export
```json
{
"@context": "http://www.popoloproject.com/contexts/person.jsonld",
"@type": "Person",
"id": "ocd-person/12345678-90ab-cdef-1234-567890abcdef",
"name": "Jane Doe",
"email": "jane.doe@example.gov",
"links": [{
"note": "official website",
"url": "https://example.gov/mayor"
}],
"memberships": [{
"@type": "Membership",
"organization_id": "ocd-organization/jurisdiction/us/city/springfield",
"post_id": "mayor",
"role": "Mayor",
"start_date": "2022-01-01",
"end_date": "2026-12-31"
}],
"contact_details": [{
"type": "email",
"value": "jane.doe@example.gov",
"note": "official"
}, {
"type": "voice",
"value": "+1-555-123-4567"
}],
"sources": [{
"url": "https://example.gov/government/officials",
"note": "Official city website"
}]
}
```
### Open Civic Data (OCD-ID) Identifiers
We use OCD-IDs for jurisdiction identifiers following [OCDEP 2](https://open-civic-data.readthedocs.io/en/latest/proposals/0002.html):
```
Format: ocd-division/country:/:
Examples:
- State: ocd-division/country:us/state:al
- County: ocd-division/country:us/state:al/county:jefferson
- City: ocd-division/country:us/state:al/place:birmingham
- School District: ocd-division/country:us/state:al/school_district:birmingham_city
```
**Normalization Rules:**
- Lowercase ASCII characters (a-z)
- Numbers (0-9)
- Valid punctuation: `._~-`
- Spaces β underscores
- Remove special characters
## π Data Statistics
| Entity Type | Estimated Count | Source |
|------------|----------------|--------|
| Jurisdictions | 22,000+ | Census Gazetteer |
| Counties | 3,144 | FIPS codes |
| Cities | 19,000+ | Incorporated places |
| School Districts | 13,000+ | NCES CCD |
| School District Budgets | 13,000+ | NCES F-33 Finance Survey |
| Government Budgets | 22,000+ | Census of Governments |
| Municipal Bonds | TBD | EMMA (MSRB) |
| Nonprofits | 3,000,000+ | IRS TEOS |
| Nonprofit 990 Filings | 10,000,000+ | ProPublica (10+ years) |
| **Microsoft CDM: Nonprofit Engagement** | | |
| Constituents | TBD | Donors, volunteers, members, beneficiaries (Microsoft CDM) |
| Donations | TBD | Financial contributions and in-kind gifts (Microsoft CDM) |
| Campaigns | TBD | Fundraising campaigns and appeals (Microsoft CDM) |
| Memberships | TBD | Member enrollments and renewals (Microsoft CDM) |
| Volunteer Activities | TBD | Volunteer hours and service events (Microsoft CDM) |
| Program Delivery Records | TBD | Programs and services delivered (Microsoft CDM) |
| Program Outcomes | TBD | Impact metrics and KPIs (Microsoft CDM) |
| Grants (Individual Awards) | TBD | IRS 990-I, USASpending.gov, Foundation Center |
| Federal Grants | 100,000+ | USASpending.gov API |
| Nonprofit Causes | 600+ | NTEE + Every.org |
| YouTube Channels | 5,000+ | Discovery pipeline |
| Meeting Platforms | 10,000+ | URL detection |
| State Legislators | 7,300+ | Open States |
| Meetings & Events | 500,000+ | Scraped (govt, hearings, events, trainings) |
| Trainings | TBD | Professional development, workshops |
| Documents | 2,000,000+ | PDF extraction |
| Ballot Measures | TBD | State/local election sites |
| State Bills | 100,000+ | Open States API |
| Policy Topics | ~50 | Curated + extracted |
| **Analytics & Standards** | | |
| Date Dimension Records | ~7,300 | 20 years (2010-2030) |
| Metric Views | ~100 | Pre-computed analytics definitions |
| Temporal Relationships | ~1M+ | Date keys for all time-based entities |
| Schema.org JSON-LD Exports | ~500K+ | Event, Person, Organization, Legislation, ClaimReview |
| Popolo Exports | ~100K+ | Person, Organization, Membership, VoteEvent |
| CEDS-Aligned Records | 13,000+ | School districts with NCES Element IDs |
| OCD Division IDs | 22,000+ | All jurisdictions with standardized identifiers |
| IATI Activity Files | TBD | Programs, grants, humanitarian aid (v2.03 XML) |
| **Fact-Checking** | | |
| Verified Claims | ~50K+ | Google Fact Check API |
| PolitiFact Ratings | ~20K+ | Truth-O-Meter rulings |
| FactCheck.org Articles | ~10K+ | Verified fact-checks |
| **Vocabulary & Concepts (OMOP-Inspired)** | | |
| Concept Entries | ~2M+ | Cities, nonprofits, officials, topics, demographics |
| Vocabularies | 10+ | OCD_ID, IRS_NTEE, US_Census, NCES, OHDSI (Gender/Race/Ethnicity) |
| Concept Classes | 20+ | City, County, 501c3, Mayor, Health Policy, etc. |
| Concept Relationships | ~5M+ | Hierarchies (CityβCountyβState), Associations (TopicβLegislation) |
| OHDSI Gender Concepts | 3 | MALE, FEMALE, OTHER (Athena standard) |
| OHDSI Race Concepts | 20+ | Census OMB categories (Athena standard) |
| OHDSI Ethnicity Concepts | 2 | Hispanic or Latino, Not Hispanic or Latino (Athena standard) |
## οΏ½ See Also
:::tip[Complete Citations & Attributions]
For full citations, licenses, BibTeX references, and detailed attribution for all data sources, standards, and research:
π **[View Citations & Data Sources](./citations.md)**
Includes academic research, government data APIs, civic tech standards (OCD-ID, Popolo, Schema.org, CEDS, IATI), Microsoft CDM, OMOP CDM, fact-checking sources, and more.
:::
## οΏ½π Meeting & Event Types
### Event Categories in the MEETING Entity
The MEETING entity tracks **4 main event categories** to capture all civic engagement opportunities:
#### 1. **Government Meetings** (`event_category: "government_meeting"`)
- City council meetings, school board meetings, county commissions
- Official business conducted by elected bodies
- **Fields:** `body_name`, `meeting_type` (regular, special, emergency)
- **Example:** "Tuscaloosa City Council Regular Meeting - 3rd Tuesday"
#### 2. **Public Hearings** (`event_category: "public_hearing"`)
- Public comment sessions on specific issues
- Budget hearings, zoning hearings, policy feedback
- **Fields:** `meeting_type` (budget, zoning, policy)
- **Example:** "Public Hearing on FY2026 Water System Fluoridation Budget"
#### 3. **Community Events** (`event_category: "community_event"`)
- Town halls, community forums, listening sessions
- Informal engagement between government and citizens
- **Fields:** `location_type` (in-person, virtual, hybrid)
- **Example:** "Town Hall on Community Health Priorities"
#### 4. **Trainings** (`event_category: "training"`) β NEW
- Professional development workshops
- Continuing education for healthcare workers, teachers, officials
- Certification courses, skill-building sessions
- **Fields:**
- `training_topic` - Subject matter (e.g., "Pediatric Oral Health", "Water Fluoridation Safety")
- `target_audience` - Who should attend (e.g., "Dental Hygienists", "School Nurses", "Water Operators")
- `presenter` - Trainer/instructor name or organization
- `requires_registration` - Boolean flag
- `registration_fee` - Cost to attend (0 for free)
- `max_capacity` - Attendance limit
- `end_date` - Training end time (multi-day events)
- **Example:** "Fluoride Varnish Application Training for School Nurses (3 CEU)"
### Why Trainings Matter for Advocacy
**Capacity Building:**
- β
Identify training gaps ("No fluoride varnish training in past 2 years")
- β
Track professional development opportunities
- β
Monitor continuing education credits (CEUs) offered
**Stakeholder Engagement:**
- β
Find healthcare workers trained in specific skills
- β
Identify champions (frequent training attendees)
- β
Target outreach to trained professionals
**Policy Implementation:**
- β
"City wants dental screenings but no trained staff" β Show available trainings
- β
Track certification status (who's qualified to implement policy)
- β
Link training availability to policy feasibility
**Example Questions Now Answerable:**
1. "What oral health trainings are offered in Alabama?" β Filter by `training_topic` LIKE '%oral%'
2. "Which jurisdictions offer free fluoride training?" β `registration_fee = 0` AND `training_topic` LIKE '%fluoride%'
3. "How many school nurses attended varnish training last year?" β Count attendees by `target_audience`
4. "Are there upcoming water fluoridation operator trainings?" β `training_topic` AND `meeting_date` > TODAY
### Meeting Types Within Each Category
**Government Meetings:**
- Regular sessions, special sessions, emergency meetings
- Work sessions, committee meetings, executive sessions
**Public Hearings:**
- Budget hearings, zoning hearings, policy feedback sessions
- Environmental impact hearings, license applications
**Community Events:**
- Town halls, listening sessions, community forums
- Neighborhood meetings, stakeholder roundtables
**Trainings:**
- Professional development workshops
- Certification courses (CPR, fluoride application, etc.)
- Continuing education (CEU/CME credits)
- Skill-building sessions (motivational interviewing, cultural competency)
## οΏ½π° Nonprofit Funding Source Tracking
### Revenue Source Breakdown (Form 990 Data)
The NONPROFIT_FINANCES entity tracks **10 different revenue sources** to understand how nonprofits are funded:
#### 1. **Grant Revenue** (Institutional Funding)
- `government_grants` - Federal, state, local government grants
- `foundation_grants` - Private foundation grants (Gates, Ford, etc.)
- **Why it matters:** Grant-dependent orgs may be less sustainable, more restrictive
#### 2. **Donation Revenue** (Community Funding)
- `individual_donations` - Direct donations from people
- `corporate_donations` - Corporate giving programs
- `membership_dues` - Member subscriptions/fees
- **Why it matters:** Grassroots funding = community support, more flexible use
#### 3. **Earned Revenue** (Self-Sufficiency)
- `program_service_revenue` - Fees for services (clinic visits, classes, etc.)
- `special_events_revenue` - Galas, fundraisers, events
- `rental_income` - Property rentals
- `sale_of_assets` - Asset sales
- **Why it matters:** Self-generated revenue = sustainability, independence
#### 4. **Investment Revenue**
- `investment_income` - Interest, dividends, capital gains
- **Why it matters:** Endowment size, financial health
#### 5. **Other Revenue**
- `other_revenue` - Miscellaneous sources
- **Why it matters:** Unusual funding patterns
### Calculated Metrics
- **`overhead_ratio`** = (admin_expenses + fundraising_expenses) / total_expenses
- Lower = more efficient (more goes to programs)
- Industry benchmark: <25% overhead is "good"
- **`fundraising_efficiency`** = contributions_received / fundraising_expenses
- Higher = better (more money raised per dollar spent)
- Industry benchmark: $4+ raised per $1 spent
### Why This Matters for Advocacy
**Find sustainable partners:**
- β
High individual donations = community trust
- β
Diversified revenue = financial stability
- β οΈ Single-grant dependent = risky partnership
**Evaluate efficiency:**
- β
Low overhead ratio = more program dollars
- β
High fundraising efficiency = good stewardship
- β οΈ High admin costs = potential waste
**Identify funding gaps:**
- Compare similar nonprofits' revenue mix
- Find underutilized funding sources (e.g., membership programs)
- Target corporate donation opportunities
**Example Questions Now Answerable:**
1. "Which dental nonprofits have the most individual donors?" (community support)
2. "What's the average overhead for oral health organizations?" (efficiency benchmark)
3. "Are dental nonprofits more grant-dependent or self-sufficient?" (sustainability)
4. "Which funders support oral health work?" (foundation grants analysis)
## π΅ Grant Tracking System
### Individual Grant Transactions (GRANT Entity)
The GRANT entity tracks **individual grant awards** beyond just aggregate 990 financials. This provides transaction-level detail for:
#### Grant Fields
- **Recipient Info:** `recipient_ein`, `recipient_name`, `recipient_type` (nonprofit, government, etc.)
- **Funder Info:** `funder_name`, `funder_ein`, `funder_type` (foundation, government, corporate)
- **Grant Details:** `grant_amount`, `grant_purpose`, `program_area`
- **Timeline:** `award_date`, `start_date`, `end_date`, `grant_duration_months`
- **Status:** `grant_status` (active, completed, terminated)
- **Type:** `funding_source` (federal, state, foundation, corporate)
- **Restrictions:** `multi_year`, `restrictions`, `reporting_requirements`
#### Data Sources
**IRS Form 990 Schedule I:**
- Grants PAID by nonprofits to other organizations
- Required for organizations granting >$5,000/year
- Shows foundation giving patterns
**USASpending.gov API (FREE):**
- All federal grants to states, localities, nonprofits
- Contract and grant transactions $25K+
- Real-time data updated daily
**Foundation Center/Candid:**
- Private foundation grants (990-PF data)
- Grant descriptions, amounts, recipients
**State Grant Databases:**
- State-level grant programs
- Varies by state
### Why Grant Tracking Matters
**Follow the Money:**
- β
"Who funds oral health work in Alabama?" β Track all grants by `program_area`
- β
"Which foundations support fluoridation?" β Search grant purposes
- β
"How much federal money goes to dental access?" β Sum `funding_source = federal`
**Find Funding Opportunities:**
- β
Identify active grant programs (similar grants to similar orgs)
- β
Discover new funders entering a program area
- β
Track grant sizes and typical durations
**Partnership Intelligence:**
- β
"Who else is this foundation funding?" β Find collaborators
- β
"What's this nonprofit's grant portfolio?" β Assess stability
- β
Multi-year grants = long-term commitment signal
**Policy Implementation:**
- β
"Is there grant funding for this program?" β Search active grants
- β
"Which jurisdictions received similar grants?" β Learn from others
- β
Track grant requirements and restrictions
#### Example Questions Now Answerable:
1. **"What federal grants support dental health in Alabama schools?"**
β `funding_source = 'federal'` AND `program_area` LIKE '%dental%' AND `recipient_type = 'school_district'`
2. **"Which foundations give the largest oral health grants?"**
β GROUP BY `funder_name` WHERE `program_area` LIKE '%oral health%' ORDER BY SUM(`grant_amount`)
3. **"How long do typical dental access grants last?"**
β AVG(`grant_duration_months`) WHERE `program_area` = 'dental access'
4. **"Which nonprofits receive multi-year fluoridation funding?"**
β `multi_year = true` AND `grant_purpose` LIKE '%fluoride%'
5. **"What grants end in the next 6 months?"**
β `end_date` BETWEEN NOW() AND NOW() + 6 MONTHS (renewal opportunities!)
### Dataset Structure
```
grants/
βββ nonprofit_grants # Grants TO nonprofits (Schedule I recipients)
βββ government_grants # Federal/state grants to jurisdictions
βββ foundation_grants # Private foundation giving (990-PF)
βββ federal_grants # USASpending.gov federal grants
```
## β° Time Dimension Modeling
To enable robust time-series analysis, trend tracking, and temporal comparisons, we implement a comprehensive time dimension alongside our fact tables.
### Time Dimension Table
```sql
DATE_DIMENSION {
date date PK
year int
quarter int
quarter_name string
month int
month_name string
month_abbr string
day_of_month int
day_of_week int
day_name string
day_abbr string
week_of_year int
fiscal_year int
fiscal_quarter int
fiscal_month int
is_weekend boolean
is_holiday boolean
holiday_name string
is_business_day boolean
days_in_month int
year_month string
year_quarter string
}
```
### Temporal Relationships
All time-bound entities link to the date dimension for consistent temporal analysis:
```mermaid
erDiagram
DATE_DIMENSION ||--o{ MEETING : "meeting_date"
DATE_DIMENSION ||--o{ GOVERNMENT_BUDGET : "fiscal_year_start"
DATE_DIMENSION ||--o{ BALLOT_MEASURE : "election_date"
DATE_DIMENSION ||--o{ LEGISLATION : "introduced_date"
DATE_DIMENSION ||--o{ GRANT : "start_date, end_date"
DATE_DIMENSION ||--o{ NONPROFIT_FILING : "tax_period_end"
DATE_DIMENSION ||--o{ POLICY_TRACKER : "tracked_date"
DATE_DIMENSION ||--o{ SURVEY : "field_date_start, field_date_end"
DATE_DIMENSION ||--o{ FACT_CHECK : "published_date"
DATE_DIMENSION {
date date PK
year int
quarter int
month int
fiscal_year int
is_business_day boolean
}
```
### Temporal Analysis Patterns
**Year-over-Year Comparisons:**
```sql
SELECT
d.year,
d.quarter_name,
COUNT(m.meeting_id) as meeting_count,
COUNT(m.meeting_id) - LAG(COUNT(m.meeting_id)) OVER (ORDER BY d.year, d.quarter) as yoy_change
FROM MEETING m
JOIN DATE_DIMENSION d ON m.meeting_date = d.date
WHERE d.year BETWEEN 2023 AND 2025
GROUP BY d.year, d.quarter, d.quarter_name
ORDER BY d.year, d.quarter;
```
**Fiscal Period Aggregation:**
```sql
SELECT
d.fiscal_year,
d.fiscal_quarter,
SUM(b.total_expenditures) as total_spending,
AVG(b.total_expenditures) as avg_spending
FROM GOVERNMENT_BUDGET b
JOIN DATE_DIMENSION d ON b.fiscal_year = d.fiscal_year
WHERE b.jurisdiction_type = 'city'
GROUP BY d.fiscal_year, d.fiscal_quarter;
```
**Trend Detection:**
```sql
-- Identify growing advocacy momentum
SELECT
d.year_month,
COUNT(DISTINCT pt.topic_id) as active_topics,
COUNT(m.meeting_id) as related_meetings,
COUNT(bm.measure_id) as ballot_initiatives
FROM DATE_DIMENSION d
LEFT JOIN MEETING m ON m.meeting_date = d.date AND m.oral_health_related = true
LEFT JOIN POLICY_TRACKER pt ON d.date BETWEEN pt.start_date AND COALESCE(pt.end_date, CURRENT_DATE)
LEFT JOIN BALLOT_MEASURE bm ON bm.election_date = d.date
WHERE d.date >= DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH)
GROUP BY d.year_month
ORDER BY d.year_month;
```
## π Metric Views
Metric views provide pre-aggregated, analysis-ready datasets combining multiple source tables with built-in dimensions, measures, and filters.
### Core Metric View Components
| Component | Description | Example |
|-----------|-------------|---------|
| **Source** | Base table, view, or SQL query containing the data | `MEETING`, `GOVERNMENT_BUDGET`, `NONPROFIT_FILING` |
| **Dimensions** | Column attributes used to segment or group metrics | `jurisdiction_type`, `fiscal_year`, `policy_topic` |
| **Measures** | Column aggregations that produce metrics | `COUNT(meeting_id) as meeting_count`, `SUM(grant_amount) as total_funding` |
| **Filters** | Conditions applied to source data to define scope | `oral_health_related = true`, `fiscal_year > 2020` |
| **Joins** | Relationships between tables to enrich data | `JOIN JURISDICTION ON meeting.jurisdiction_id = jurisdiction.jurisdiction_id` |
### Example Metric Views
#### 1. Advocacy Activity Metrics
**Purpose:** Track oral health advocacy momentum across jurisdictions
```sql
CREATE VIEW metric_advocacy_activity AS
SELECT
-- Dimensions
j.jurisdiction_id,
j.jurisdiction_type,
j.state_code,
j.county_name,
d.year,
d.quarter_name,
d.month_name,
pt.topic_name,
-- Measures
COUNT(DISTINCT m.meeting_id) as meeting_count,
COUNT(DISTINCT bm.measure_id) as ballot_measure_count,
COUNT(DISTINCT l.bill_id) as legislation_count,
COUNT(DISTINCT fc.claim_id) as fact_check_count,
-- Calculated Metrics
SUM(CASE WHEN m.oral_health_related THEN 1 ELSE 0 END) as oral_health_meeting_count,
AVG(CASE WHEN bm.result = 'passed' THEN 1 ELSE 0 END) as ballot_success_rate,
COUNT(DISTINCT n.nonprofit_id) as active_nonprofit_count
FROM JURISDICTION j
JOIN DATE_DIMENSION d ON d.date BETWEEN DATE_SUB(CURRENT_DATE, INTERVAL 365 DAY) AND CURRENT_DATE
LEFT JOIN MEETING m ON m.jurisdiction_id = j.jurisdiction_id AND m.meeting_date = d.date
LEFT JOIN POLICY_TRACKER pt ON pt.jurisdiction_id = j.jurisdiction_id
LEFT JOIN BALLOT_MEASURE bm ON bm.jurisdiction_id = j.jurisdiction_id AND bm.election_date = d.date
LEFT JOIN LEGISLATION l ON l.state_code = j.state_code AND l.introduced_date = d.date
LEFT JOIN FACT_CHECK fc ON fc.published_date = d.date
LEFT JOIN NONPROFIT n ON n.jurisdiction_id = j.jurisdiction_id
WHERE
-- Filters
d.is_business_day = true
AND (
m.oral_health_related = true
OR pt.topic_area LIKE '%oral health%'
OR pt.topic_area LIKE '%dental%'
OR pt.topic_area LIKE '%fluoride%'
)
GROUP BY
j.jurisdiction_id, j.jurisdiction_type, j.state_code, j.county_name,
d.year, d.quarter_name, d.month_name, pt.topic_name;
```
**Usage:**
```sql
-- Find top 10 most active jurisdictions for oral health advocacy
SELECT
jurisdiction_id,
state_code,
jurisdiction_type,
SUM(meeting_count) as total_meetings,
SUM(ballot_measure_count) as total_ballot_measures,
SUM(oral_health_meeting_count) as oral_health_meetings
FROM metric_advocacy_activity
WHERE year = 2025
GROUP BY jurisdiction_id, state_code, jurisdiction_type
ORDER BY total_meetings DESC
LIMIT 10;
```
#### 2. Government Spending Metrics
**Purpose:** Analyze government budget allocations and trends
```sql
CREATE VIEW metric_government_spending AS
SELECT
-- Dimensions
j.jurisdiction_id,
j.jurisdiction_type,
j.state_code,
j.population,
d.fiscal_year,
d.fiscal_quarter,
bc.category_name as budget_category,
-- Measures
SUM(gb.total_revenue) as total_revenue,
SUM(gb.total_expenditures) as total_expenditures,
SUM(gb.total_debt) as total_debt,
SUM(gb.property_tax_revenue) as property_tax_revenue,
SUM(gb.federal_grants) as federal_grants,
SUM(gb.state_grants) as state_grants,
-- Calculated Metrics
SUM(gb.total_revenue) / NULLIF(j.population, 0) as revenue_per_capita,
SUM(gb.total_expenditures) / NULLIF(j.population, 0) as spending_per_capita,
SUM(gb.total_debt) / NULLIF(j.population, 0) as debt_per_capita,
(SUM(gb.total_revenue) - SUM(gb.total_expenditures)) as budget_surplus_deficit,
SUM(bc.amount) / NULLIF(SUM(gb.total_expenditures), 0) * 100 as category_pct_of_budget
FROM JURISDICTION j
JOIN DATE_DIMENSION d ON d.fiscal_year BETWEEN 2020 AND 2025
JOIN GOVERNMENT_BUDGET gb ON gb.jurisdiction_id = j.jurisdiction_id AND gb.fiscal_year = d.fiscal_year
LEFT JOIN BUDGET_CATEGORY bc ON bc.budget_id = gb.budget_id
WHERE
-- Filters
gb.total_expenditures > 0
AND j.population > 0
GROUP BY
j.jurisdiction_id, j.jurisdiction_type, j.state_code, j.population,
d.fiscal_year, d.fiscal_quarter, bc.category_name;
```
**Usage:**
```sql
-- Compare spending per capita across jurisdiction types
SELECT
jurisdiction_type,
fiscal_year,
AVG(spending_per_capita) as avg_spending_per_capita,
AVG(revenue_per_capita) as avg_revenue_per_capita,
AVG(debt_per_capita) as avg_debt_per_capita
FROM metric_government_spending
GROUP BY jurisdiction_type, fiscal_year
ORDER BY fiscal_year DESC, avg_spending_per_capita DESC;
```
#### 3. Nonprofit Impact Metrics
**Purpose:** Measure nonprofit activity, funding, and service delivery
```sql
CREATE VIEW metric_nonprofit_impact AS
SELECT
-- Dimensions
n.nonprofit_id,
n.organization_name,
n.state,
n.city,
nc.ntee_code,
nc.category_name,
d.tax_year,
-- Measures
SUM(nf.total_revenue) as total_revenue,
SUM(nf.total_expenses) as total_expenses,
SUM(nf.total_assets) as total_assets,
SUM(nf.program_service_expenses) as program_expenses,
SUM(nf.fundraising_expenses) as fundraising_expenses,
SUM(nf.management_expenses) as management_expenses,
COUNT(DISTINCT g.grant_id) as grants_received_count,
SUM(g.grant_amount) as total_grants_received,
-- Calculated Metrics
SUM(nf.program_service_expenses) / NULLIF(SUM(nf.total_expenses), 0) * 100 as program_expense_ratio,
SUM(nf.fundraising_expenses) / NULLIF(SUM(nf.total_revenue), 0) * 100 as fundraising_efficiency,
(SUM(nf.total_revenue) - SUM(nf.total_expenses)) as net_income,
COUNT(DISTINCT nf.year) as years_active
FROM NONPROFIT n
JOIN DATE_DIMENSION d ON d.year BETWEEN 2020 AND 2025
JOIN NONPROFIT_FILING nf ON nf.ein = n.ein AND nf.tax_year = d.year
LEFT JOIN NONPROFIT_CAUSE nc ON nc.cause_id = n.primary_cause_id
LEFT JOIN GRANT g ON g.recipient_ein = n.ein AND g.tax_year = d.year
WHERE
-- Filters
nf.total_revenue > 0
AND (
nc.category_name LIKE '%health%'
OR nc.category_name LIKE '%dental%'
OR n.mission_statement LIKE '%oral health%'
)
GROUP BY
n.nonprofit_id, n.organization_name, n.state, n.city,
nc.ntee_code, nc.category_name, d.tax_year;
```
**Usage:**
```sql
-- Identify high-performing health nonprofits by program efficiency
SELECT
organization_name,
state,
tax_year,
total_revenue,
program_expense_ratio,
fundraising_efficiency
FROM metric_nonprofit_impact
WHERE
tax_year = 2024
AND total_revenue > 1000000
AND program_expense_ratio > 75 -- More than 75% goes to programs
ORDER BY program_expense_ratio DESC, total_revenue DESC
LIMIT 20;
```
### Metric View Best Practices
1. **Grain Definition**: Clearly define the granularity of each metric view (e.g., per jurisdiction per month)
2. **Performance**: Pre-aggregate expensive calculations to improve query performance
3. **Incremental Updates**: Design views to support incremental refresh rather than full rebuilds
4. **Documentation**: Document all dimension values, measure calculations, and filter logic
5. **Naming Convention**: Use `metric_` prefix followed by descriptive name (e.g., `metric_advocacy_activity`)
6. **Testing**: Validate measure calculations against source data to ensure accuracy
### Query Optimization
For large-scale analytics, metric views can be materialized:
```sql
-- Materialize for fast querying
CREATE MATERIALIZED VIEW metric_advocacy_activity_mat AS
SELECT * FROM metric_advocacy_activity;
-- Refresh incrementally
REFRESH MATERIALIZED VIEW metric_advocacy_activity_mat;
-- Add indexes on common filter/join columns
CREATE INDEX idx_advocacy_state_year
ON metric_advocacy_activity_mat(state_code, year);
CREATE INDEX idx_advocacy_jurisdiction
ON metric_advocacy_activity_mat(jurisdiction_id);
```
## π§ System-Internal Tables (OMOP-Inspired Vocabulary)
**Purpose:** Standardized terminology and concept management following OHDSI OMOP Common Data Model principles. These tables provide semantic interoperability and enable precise data linkage across all entities.
**β οΈ Technical Note:** These are internal reference tables used by data engineers and ETL pipelines. Non-technical users can ignore this section.
### Vocabulary Sources
| Vocabulary ID | Vocabulary Name | Source | Use Case |
|---------------|-----------------|--------|----------|
| **OCD_ID** | Open Civic Data Division IDs | https://github.com/opencivicdata/ocd-division-ids | Standard jurisdiction identifiers |
| **IRS_NTEE** | IRS National Taxonomy of Exempt Entities | IRS TEOS | Nonprofit classification |
| **US_Census** | U.S. Census Bureau | Census Gazetteer, ACS | Demographics, geography |
| **NCES** | National Center for Education Statistics | NCES CCD, F-33 | School districts, education data |
| **OHDSI_Gender** | OHDSI Gender | Athena | Standard gender concepts (interoperable with medical research) |
| **OHDSI_Race** | OHDSI Race | Athena | Standard race concepts (OMB Classification) |
| **OHDSI_Ethnicity** | OHDSI Ethnicity | Athena | Standard ethnicity concepts (Hispanic/Latino) |
| **OpenNavigator** | Custom Civic Concepts | Internal | Cities, officials, topics (ID > 2,000,000,000) |
### Concept Classes for Civic Data
| Concept Class | Domain | Examples |
|---------------|--------|----------|
| **City** | Jurisdiction | Incorporated places, consolidated city-counties |
| **County** | Jurisdiction | U.S. counties, county equivalents |
| **State** | Jurisdiction | U.S. states, territories, DC |
| **School District** | Jurisdiction | LEAs (Local Educational Agencies) |
| **501c3** | Nonprofit | Tax-exempt charitable organizations |
| **501c4** | Nonprofit | Social welfare organizations |
| **Mayor** | Position | Chief executive of city government |
| **Council Member** | Position | Legislative member |
| **Superintendent** | Position | School district chief administrator |
| **Health Policy** | Topic | Fluoridation, nutrition, dental care |
| **Education Policy** | Topic | School funding, curriculum, facilities |
### Example Concept Entries
```sql
-- Standard concept for a city
INSERT INTO CONCEPT VALUES (
2000000001, -- concept_id (custom range)
'Birmingham, Alabama', -- concept_name
'Jurisdiction', -- domain_id
'OCD_ID', -- vocabulary_id
'City', -- concept_class_id
'S', -- standard_concept (Standard)
'ocd-division/country:us/state:al/place:birmingham', -- concept_code
'2020-01-01', -- valid_start_date
'2099-12-31', -- valid_end_date
NULL -- invalid_reason
);
-- Concept for a nonprofit cause
INSERT INTO CONCEPT VALUES (
2000000101, -- concept_id
'Animal Welfare', -- concept_name
'Nonprofit', -- domain_id
'IRS_NTEE', -- vocabulary_id
'501c3', -- concept_class_id
'C', -- standard_concept (Classification)
'D20', -- concept_code (NTEE code)
'2020-01-01',
'2099-12-31',
NULL
);
-- Gender concept from OHDSI Athena (interoperable with medical research)
INSERT INTO CONCEPT VALUES (
8507, -- concept_id (OHDSI standard)
'MALE', -- concept_name
'Gender', -- domain_id
'OHDSI_Gender', -- vocabulary_id
'Gender', -- concept_class_id
'S',
'M',
'1970-01-01',
'2099-12-31',
NULL
);
```
### Concept Relationships
```sql
-- City is part of County
INSERT INTO CONCEPT_RELATIONSHIP VALUES (
2000000001, -- Birmingham, AL (concept_id_1)
2000000050, -- Jefferson County, AL (concept_id_2)
'Is part of', -- relationship_id
'2020-01-01',
'2099-12-31',
NULL
);
-- Topic regulates Legislation
INSERT INTO CONCEPT_RELATIONSHIP VALUES (
2000000201, -- Water Fluoridation topic
2000000305, -- Ordinance 101
'Regulates',
'2024-01-15',
'2099-12-31',
NULL
);
-- Organization addresses Topic
INSERT INTO CONCEPT_RELATIONSHIP VALUES (
2000000401, -- Dental Health Foundation
2000000201, -- Water Fluoridation
'Addresses',
'2023-06-01',
'2099-12-31',
NULL
);
```
### Implementation in Existing Tables
#### Updated JURISDICTION Schema
```sql
CREATE TABLE JURISDICTION (
jurisdiction_id VARCHAR(255) PRIMARY KEY,
name VARCHAR(255),
jurisdiction_type VARCHAR(50),
-- OMOP-style concept references
city_concept_id INTEGER REFERENCES CONCEPT(concept_id),
city_type_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- e.g., "Consolidated City-County"
source_value VARCHAR(255), -- Original text from data source (e.g., "SF, Calif")
state_code VARCHAR(2),
county_name VARCHAR(100),
population INTEGER,
-- ... other fields
);
```
#### Updated ORGANIZATION Schema
```sql
CREATE TABLE ORGANIZATION (
org_id VARCHAR(255) PRIMARY KEY,
ein VARCHAR(20),
name VARCHAR(255),
-- OMOP-style concept references
organization_concept_id INTEGER REFERENCES CONCEPT(concept_id),
cause_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- Links to NTEE concept
org_type_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- 501c3, 501c4, etc.
source_value VARCHAR(255), -- Original organization name from IRS
ntee_code VARCHAR(10),
-- ... other fields
);
```
#### Updated DEMOGRAPHICS Schema
```sql
CREATE TABLE DEMOGRAPHICS (
demographics_id VARCHAR(255) PRIMARY KEY,
jurisdiction_id VARCHAR(255) REFERENCES JURISDICTION(jurisdiction_id),
-- OMOP-style concept references (OHDSI Athena vocabularies)
race_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- Standard OHDSI race codes
ethnicity_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- Hispanic/Latino classification
gender_concept_id INTEGER REFERENCES CONCEPT(concept_id), -- MALE/FEMALE/OTHER
race_source_value VARCHAR(100), -- Census text (e.g., "White alone")
ethnicity_source_value VARCHAR(100), -- Census text
gender_source_value VARCHAR(50), -- Census text
-- ... other demographic fields
);
```
### Benefits of OMOP-Style Vocabulary
1. **Semantic Interoperability**: Civic data can be joined with healthcare research data using standard OHDSI demographic concepts
2. **Reproducible IDs**: Deterministic hashing (uuid5) on strings like "JURISDICTION:CITY:NEW_YORK" generates consistent concept_id values
3. **Version Control**: `valid_start_date` and `valid_end_date` track concept changes over time
4. **Relationship Tracking**: `CONCEPT_RELATIONSHIP` table captures hierarchies (City β County β State) and associations (Topic β Legislation)
5. **Source Traceability**: `source_value` preserves original text while `concept_id` provides standardized reference
### ETL Mapping Strategy
```python
import uuid
def generate_concept_id(domain: str, type: str, identifier: str) -> int:
"""
Generate deterministic concept_id using UUID5.
Returns integer > 2,000,000,000 for custom civic concepts.
"""
namespace = uuid.UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8') # OMOP namespace
concept_string = f"{domain}:{type}:{identifier}".upper()
concept_uuid = uuid.uuid5(namespace, concept_string)
# Convert to integer in custom range
return 2_000_000_000 + (int(concept_uuid.hex[:8], 16) % 1_000_000_000)
# Example usage
city_concept_id = generate_concept_id("JURISDICTION", "CITY", "NEW_YORK")
print(city_concept_id) # e.g., 2045879021 (repeatable)
```
### Downloading OHDSI Athena Vocabularies
1. Visit https://athena.ohdsi.org/
2. Select vocabularies:
- β
Gender
- β
Race
- β
Ethnicity
- β
Geography (US Counties, States)
3. Download CSV files
4. Import into `vocabulary/` folder:
- `vocabulary_gender.parquet`
- `vocabulary_race.parquet`
- `vocabulary_ethnicity.parquet`
- `vocabulary_geography.parquet`
## π― Missing Datasets to Add
### High Priority
- [x] **Ballot Measures** - β
Added to data model! Fluoridation votes, bond measures
- [x] **State Legislation** - β
Added to data model! Open States API (FREE)
- [x] **Policy Topics** - β
Added to data model! Oral health advocacy tracking
- [x] **Government Finances** - β
Added to data model! City/county/state budgets, Census of Governments
- [x] **School Finances** - β
Added to data model! NCES F-33 per-pupil spending, revenues
- [x] **Nonprofit Financials** - β
Added to data model! Form 990 detailed financials (10M+ filings)
- [ ] **Census Demographics** - Full census data per jurisdiction (beyond population)
- [ ] **Procurement Records** - Government contracts
- [ ] **Election Results** - Historical voting data
- [ ] **Health Outcomes** - CDC PLACES data (oral health metrics!)
- [ ] **Environmental Data** - EPA water quality (fluoridation levels)
### Medium Priority
- [ ] **Property Records** - Public assessment data
- [ ] **Crime Statistics** - UCR/NIBRS data
- [ ] **Business Licenses** - Local business registrations (dental clinics!)
- [ ] **Building Permits** - Construction activity
- [ ] **Code Violations** - Inspection records
- [ ] **Police Reports** - Public safety incidents
- [ ] **Fire Department Data** - Emergency response
- [ ] **Parks & Recreation** - Facilities & programs
- [ ] **Transportation Data** - Traffic & transit
### Integration Improvements
- [ ] **Full Wikidata Sync** - All civic entities
- [ ] **DBpedia Expansion** - Complete local government coverage
- [ ] **Ballotpedia Data** - (if budget allows) Electoral info & analysis
- [ ] **Court Records** - Public dockets
- [ ] **Tax Records** - Property tax data
## π Implementation Status
### β
Completed
- Jurisdiction discovery pipeline
- YouTube channel discovery
- Meeting platform detection
- NCES school district ingestion
- Open States API integration
- Wikidata SPARQL queries
- DBpedia Lookup API
- Google Civic API (code ready)
- Social media discovery
- HuggingFace upload pipeline
### π¨ In Progress
- Meeting minutes extraction (Tuscaloosa pilot)
- Video transcript processing
- Document keyword detection
- Nonprofit data enrichment
### π Planned
- Automated meeting scraping at scale
- Real-time meeting notifications
- Budget document parsing
- Full census integration
- Health outcome correlation
## π Related Documentation
### Data Standards & Specifications
- **[Popolo Project](https://github.com/popolo-project/popolo-spec)** - Open government data specification for people, organizations, and elected positions. Our LEADER, ORGANIZATION, and JURISDICTION entities follow Popolo schema conventions for maximum interoperability with civic tech platforms.
- **[Schema.org](https://schema.org/)** - W3C structured data vocabulary for semantic web. Our entities map to Schema.org types (Event, Person, Organization, Legislation, ClaimReview, etc.) enabling SEO-optimized JSON-LD exports, Google Search rich results, and voice assistant compatibility.
- **[Common Education Data Standards (CEDS)](https://ceds.ed.gov/)** - U.S. Department of Education data standards for K-12, postsecondary, and workforce data. Our SCHOOL_DISTRICT entity aligns with CEDS Element IDs and NCES survey specifications (CCD, F-33 Finance).
- **[Open Civic Data (OCD-IDs)](https://open-civic-data.readthedocs.io/en/latest/proposals/0002.html)** - Standardized division identifiers for jurisdictions. Format: `ocd-division/country:us/state:al/place:birmingham`
### Internal Documentation
- [HuggingFace Publishing Guide](../guides/huggingface-publishing.md)
- [Data Sources Overview](./overview.md)
- [Discovery Pipeline](./jurisdiction-discovery.md)
---
**Last Updated:** {new Date().toISOString().split('T')[0]}
**Data Model Version:** 2.1 (Popolo-compatible)