---
sidebar_position: 2
sidebar_label: Data and Citations
---

# Data and Citations

:::tip[Why This Page Matters]

**All data used in Open Navigator is properly cited and attributed.** This page provides complete citations, licenses, BibTeX references, and links to original sources for academic research, government data, data sharing standards, and more.

**Use this page to:**

- ✅ Cite data sources in your research or publications
- ✅ Understand licensing and usage terms
- ✅ Find original dataset documentation
- ✅ Access API documentation and technical specs

:::

This page documents the data sources, standards, and research contributions used in **Open Navigator**, with the citation, license, and usage notes for each.

## 📑 Quick Navigation
πŸŽ“ Academic Research
MeetingBank, LocalView, CivicSearch, Datamuse API, Roper Center, CDP, City Scrapers
πŸ›οΈ Government Data
U.S. Census, IRS, Open States, LegiScan
🌐 Data Sharing Standards
OCD-ID, Popolo, Schema.org, CEDS, OMOP CDM
πŸ—³οΈ Election & Advocacy
Ballotpedia, MIT Election Lab, OpenElections
🏒 Nonprofit & Philanthropy
IRS EO-BMF (1.9M+ orgs), Google BigQuery (5M+ Form 990s), GivingTuesday Data Lake (5.4M+ raw XMLs), ProPublica (Nonprofits, Congress, Campaign Finance, Vital Signs), Every.org, Findhelp, 211, Microsoft CDM, ARDA, HIFLD, NCS
βœ… Fact-Checking
Google, PolitiFact, FactCheck.org
πŸ’» Civic Tech & Open Source
GitHub, Code for America, Hackathons, Microsoft, Google, AWS, Databricks, DPGA
🌟 Community Solutions & Use Cases
Spectrum of Engagement, Harvard, Brookings, Open Data Impact, IATI
πŸ™ Acknowledgments
Organizations & individuals
--- ## πŸŽ“ Academic Research **In this section:** - [MeetingBank Dataset](#meetingbank-dataset) - [LocalView Dataset (Harvard Dataverse)](#localview-dataset-harvard-dataverse) - [Council Data Project (CDP)](#council-data-project-cdp) - [City Scrapers / Documenters.org](#city-scrapers--documentersorg) - [Roper Center for Public Opinion Research](#roper-center-for-public-opinion-research) - [Harvard Dataverse](#harvard-dataverse) - [CivicSearch (School Board Meeting Platform)](#civicsearch-school-board-meeting-platform) - [Datamuse API (Word-Finding Engine)](#datamuse-api-word-finding-engine) ### MeetingBank Dataset **What we use:** 1,366 city council meetings from 6 U.S. cities with transcripts and summaries for meeting discovery, transcript analysis, and summarization benchmarking. **Citation:** > Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu. "MeetingBank: A Benchmark Dataset for Meeting Summarization" In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), July 2023, Toronto, Canada. **BibTeX:** ```bibtex @inproceedings{hu-etal-2023-meetingbank, title = "MeetingBank: A Benchmark Dataset for Meeting Summarization", author = "Yebowen Hu and Tim Ganter and Hanieh Deilamsalehy and Franck Dernoncourt and Hassan Foroosh and Fei Liu", booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)", month = July, year = "2023", address = "Toronto, Canada", publisher = "Association for Computational Linguistics", } ``` **Resources:** - [πŸ“„ Paper](https://arxiv.org/abs/2305.17529) - [πŸ’Ύ Dataset](https://huggingface.co/datasets/huuuyeah/meetingbank) - [πŸ“¦ Zenodo](https://zenodo.org/record/7989108) --- ### LocalView Dataset (Harvard Dataverse) **Organization:** Harvard University Mellon Urbanism Lab **What we use:** 1,000+ municipalities with meeting videos and automated transcripts for large-scale civic data analysis. 
- **Website:** https://www.localview.net/
- **Dataverse:** https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/NJTBEM
- **Project page:** https://mellonurbanism.harvard.edu/localview
- **Coverage:** Thousands of U.S. jurisdictions with continuous data collection
- **Data Included:**
  - Meeting videos (YouTube URLs)
  - Automated transcripts via speech-to-text
  - Metadata (meeting dates, agencies, agendas)
  - Quality tracking per jurisdiction
- **License:** Research use (Harvard Dataverse)
- **Research-grade:** Designed for large-scale quantitative analysis

**BibTeX:**

```bibtex
@dataset{localview_2024,
  author = {{Harvard Mellon Urbanism Lab}},
  title = {LocalView: Municipal Meeting Videos and Transcripts},
  year = {2024},
  publisher = {Harvard Dataverse},
  doi = {10.7910/DVN/NJTBEM},
  url = {https://www.localview.net/}
}
```

---

### Council Data Project (CDP)

**Organization:** Open-source civic tech collaboration

**What we use:** 20+ cities with complete data pipelines: meeting transcripts, videos, voting records, and legislation tracking.

- **Website:** https://councildataproject.org/
- **GitHub:** https://github.com/CouncilDataProject
- **Coverage:** 20+ major U.S. cities with full infrastructure
- **Data Included:**
  - Meeting transcripts (searchable, indexed)
  - Video recordings with timestamps
  - Voting records and roll calls
  - Legislation text and tracking
  - Councilmember information
- **Infrastructure:** Complete ETL pipelines for each city
- **License:** Open source (MIT)
- **API:** Per-city deployments (e.g., https://councildataproject.org/seattle)

**BibTeX:**

```bibtex
@software{council_data_project,
  title = {Council Data Project},
  author = {{Council Data Project Contributors}},
  year = {2024},
  url = {https://councildataproject.org/},
  license = {MIT}
}
```

---

### City Scrapers / Documenters.org

**Organization:** Documenters Network (civic journalism collaboration)

**What we use:** 100-500 validated government agency URLs across 5 major cities for automated meeting discovery.

- **Website:** https://cityscrapers.org/
- **Documenters:** https://www.documenters.org/
- **GitHub:** https://github.com/City-Bureau
- **Coverage:** Chicago, Pittsburgh, Detroit, Cleveland, Los Angeles
- **Data Included:**
  - Government agency URLs (`start_urls` from spider files)
  - Granicus video page URLs with YouTube embeds
  - Meeting event schemas
  - Scraper patterns for common platforms
- **Cities Covered:**
  - Chicago City Scrapers
  - Pittsburgh City Scrapers
  - Detroit Documenters
  - Cleveland City Scrapers
  - LA Metro Documenters
- **License:** Open source (MIT)
- **Use Case:** Pre-validated URLs for quality meeting discovery

**BibTeX:**

```bibtex
@software{city_scrapers,
  title = {City Scrapers},
  author = {{City Bureau and Documenters Network}},
  year = {2024},
  url = {https://cityscrapers.org/},
  license = {MIT}
}
```

---

### Roper Center for Public Opinion Research

**Organization:** Cornell University

**What we use:** Scientifically validated survey questions and public opinion baselines for topic definitions and messaging optimization.
- **Source:** https://ropercenter.cornell.edu/
- **iPoll Database:** https://ropercenter.cornell.edu/ipoll/
- **Coverage:** 500,000+ survey questions from the 1930s to the present, covering all major polling organizations
- **License:** Free public search (metadata and question wording); full data requires institutional membership
- **Citation:** "Roper Center for Public Opinion Research, Cornell University. iPoll Databank. https://ropercenter.cornell.edu/ipoll/"

---

### Harvard Dataverse

**What we use:** Meeting datasets and civic engagement research.

- **Source:** https://dataverse.harvard.edu/
- **License:** Varies by dataset
- **Coverage:** Academic research datasets on local government, public meetings, and civic participation

---

### CivicSearch (School Board Meeting Platform)

**Organization:** Datamuse, Inc.

**What we use:** Aggregated school board meeting transcripts, agendas, and videos for tracking education policy and local governance.

- **Website:** https://schools.civicsearch.org/
- **Platform:** Datamuse-powered civic search interface
- **Coverage:** School districts nationwide with meeting transcripts and videos
- **Data Included:**
  - School board meeting transcripts (AI-indexed)
  - Meeting agendas and minutes
  - Video recordings (when available)
  - Searchable text across multiple districts
  - Meeting dates and attendance
- **Example:** [Tuscaloosa City Schools](https://schools.civicsearch.org/tuscaloosa-city-alabama)
- **License:** Free public access for search; bulk/API access requires case-by-case approval
- **Use Case:** Education policy tracking, school board decision analysis, parent/community engagement

**Access Tiers:**

- **Public Search:** Free access via web interface
- **Bulk Data/API:** Contact Datamuse for research or civic organization partnerships
- **Commercial Use:** Licensing required for commercial applications

**Data Privacy:**

- Public meeting transcripts are public record
- Datamuse indexing and presentation are subject to their site terms
- No user-uploaded data is sold to third parties

**Attribution Requirements:**

```
Data source: CivicSearch (Datamuse, Inc.)
https://schools.civicsearch.org/
School board meeting transcripts and agendas
```

**Terms of Service:**

- ❌ **No automated scraping** - Use the official API when available
- ✅ **Attribution required** - Link back to CivicSearch for data used
- ✅ **Public record data** - Meeting transcripts are generally public domain
- ⚠️ **Bulk access** - Requires a partnership agreement for large-scale data extraction

**BibTeX:**

```bibtex
@misc{civicsearch_datamuse,
  author = {{Datamuse, Inc.}},
  title = {CivicSearch: School Board Meeting Platform},
  year = {2026},
  url = {https://schools.civicsearch.org/},
  note = {AI-indexed school board meeting transcripts and agendas}
}
```

**Contact for Data Partnerships:**

For bulk data access, API integration, or civic tech collaborations, contact Datamuse directly and describe your work as a civic technologist or research organization. There is no standard commercial checkout; partnerships are handled case by case.

---

### Datamuse API (Word-Finding Engine)

**Organization:** Datamuse, Inc.

**What we use:** Natural language processing tools for text analysis, word associations, rhyme detection, and semantic search in meeting transcripts and policy documents.

- **API Documentation:** https://www.datamuse.com/api/
- **Developer Site:** https://www.datamuse.com/
- **Use Cases:** Dictionary apps, RhymeZone, word associations, semantic search
- **Coverage:** English-language word relationships, definitions, pronunciations, usage frequency
- **License:** Free tier for most applications; paid tier for high-volume commercial use

**API Endpoints:**

- `/words` - Word finding based on constraints (rhymes, similar meaning, etc.)
- `/sug` - Word suggestions for autocomplete
- Query parameters for semantic relationships, phonetic matching, vocabulary

**Pricing Tiers:**

| Tier | Cost | Limits | Use Case |
|------|------|--------|----------|
| **Free** | $0 | 100,000 requests/day | Non-commercial, small commercial apps |
| **Professional** | Contact for pricing | Unlimited + support | High-volume commercial applications |

**Free Tier Details:**

- ✅ **100,000 requests per day** - Generous limit for most applications
- ✅ **Commercial use allowed** - Can be used in commercial apps under the daily limit
- ✅ **No API key required** - Simple HTTP GET requests
- ✅ **Fast response times** - Optimized for real-time applications

**Paid Tier (High-Volume):**

- Exceeding 100,000 requests/day requires the paid tier
- Contact Datamuse for custom pricing and SLA
- Dedicated support and guaranteed uptime

**Attribution Requirements:**

- ✅ **Link to Datamuse:** Requested of free-tier users, especially in publicly available apps
- ✅ **Credit in documentation:** Mention "Powered by Datamuse API"
- Example: `Powered by Datamuse API`

**Restrictions:**

- ❌ **No scraping of web interfaces** - Use the official API, not web scraping
- ❌ **Rate limiting enforced** - Usage above 100,000 requests/day may be throttled
- ✅ **Caching allowed** - Cache results to reduce API calls

**Terms of Service:**

- Free tier subject to daily quota
- No sale of user-uploaded data
- Commercial use allowed within free tier limits
- Bulk/enterprise usage requires paid license

**Example API Call:**

```bash
# Find words that mean "government" and sound like "regime"
curl "https://api.datamuse.com/words?ml=government&sl=regime"

# Find words that rhyme with "policy"
curl "https://api.datamuse.com/words?rel_rhy=policy"

# Word associations for "civic engagement"
curl "https://api.datamuse.com/words?ml=civic+engagement&max=10"
```

**BibTeX:**

```bibtex
@misc{datamuse_api,
  author = {{Datamuse, Inc.}},
  title = {Datamuse API: Word-Finding Query Engine},
  year = {2026},
  url = {https://www.datamuse.com/api/},
  note = {Free tier: 100,000 requests/day. Commercial use allowed.}
}
```

**Integration Use Cases:**

- **Meeting Transcript Analysis:** Identify policy-related terms and semantic relationships
- **Search Enhancement:** Improve search with synonym expansion and related terms
- **Topic Modeling:** Extract key themes from public comments and testimony
- **Accessibility:** Provide word suggestions for users with cognitive disabilities
- **Multilingual Support:** Word associations for translation assistance

**Datamuse.ai (Separate Product):**

Note: Datamuse.ai is a distinct SaaS product for natural language exploration:

- **Starter:** ~$29/month (100 queries/month)
- **Professional:** ~$99/month (unlimited queries + API access)
- **Free Trial:** Available for testing

This is separate from the word-finding API and has different pricing.

---

## 🏛️ Government Data

**In this section:**

- [U.S. Census Bureau](#us-census-bureau)
- [IRS Exempt Organizations Business Master File (EO-BMF)](#irs-exempt-organizations-business-master-file-eo-bmf)
- [Open States / Plural Policy](#open-states--plural-policy-)
- [LegiScan](#legiscan-)

### U.S. Census Bureau

**What we use:** Geographic boundaries, demographic data, population estimates, and economic indicators.

- **Source:** https://www.census.gov/
- **License:** Public Domain (U.S. Government)
- **Datasets:** Census Gazetteer, American Community Survey (ACS), Decennial Census
- **Coverage:** All 50 states, 3,144 counties, 19,000+ incorporated places

---

### IRS Exempt Organizations Business Master File (EO-BMF)

**Organization:** Internal Revenue Service (IRS), U.S. Department of the Treasury

**What we use:** **PRIMARY BULK DATA SOURCE** for comprehensive nonprofit data - ALL 1.9M+ U.S. tax-exempt organizations with EIN, NTEE codes, financial data, subsection classification, and geographic location.
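The sketch below shows how the EO-BMF extract can be filtered by state and NTEE code using only Python's standard `csv` module. It makes two assumptions. First, the column names (`EIN`, `NAME`, `STATE`, `NTEE_CD`) are taken from the IRS EO-BMF CSV header and should be checked against the file you actually download. Second, the sample rows are made up. The function names are hypothetical and are not part of the project's scripts.

```python
import csv
import io
from collections import Counter

# Column names assumed from the IRS EO-BMF CSV header (EIN, NAME, STATE,
# NTEE_CD, ...); check them against the regional file you download.
# The sample rows below are made up for illustration.
EOBMF_SAMPLE = """EIN,NAME,STATE,NTEE_CD
000000001,EXAMPLE CHURCH,AL,X20
000000002,EXAMPLE CLINIC,AL,E32
000000003,EXAMPLE FOOD BANK,GA,K31
000000004,EXAMPLE ASSOCIATION,AL,
"""


def filter_eobmf(rows, state=None, ntee_prefix=None):
    """Yield rows matching a two-letter state code and/or an NTEE code prefix."""
    for row in rows:
        if state is not None and row.get("STATE") != state:
            continue
        ntee = (row.get("NTEE_CD") or "").strip()
        if ntee_prefix is not None and not ntee.startswith(ntee_prefix):
            continue
        yield row


def count_by_ntee_major_group(rows):
    """Count rows by the first letter of NTEE_CD; '?' marks a missing code."""
    return Counter((row.get("NTEE_CD") or "").strip()[:1] or "?" for row in rows)


if __name__ == "__main__":
    # DictReader keeps EIN as a string, so leading zeros survive.
    rows = list(csv.DictReader(io.StringIO(EOBMF_SAMPLE)))
    religious_al = filter_eobmf(rows, state="AL", ntee_prefix="X")
    print([row["NAME"] for row in religious_al])  # ['EXAMPLE CHURCH']
    print(count_by_ntee_major_group(rows))
```

For a real extract, open a downloaded regional CSV with `newline=""` and pass its `csv.DictReader` to `filter_eobmf`. Because the filter is a generator, rows are streamed one at a time rather than loaded all at once.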
- **Source:** https://www.irs.gov/charities-non-profits/exempt-organizations-business-master-file-extract-eo-bmf
- **Search Tool:** https://www.irs.gov/charities-non-profits/tax-exempt-organization-search
- **Bulk Downloads:** https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-bulk-data-downloads
- **API Documentation:** See [IRS Bulk Data Integration](./irs-bulk-data.md)
- **Coverage:** 1,952,238 organizations (as of April 2026)
  - **Churches & Religious Organizations:** 300,000+ (NTEE codes X, X20, X21, X22, X30, X40)
  - **Health Organizations:** 80,000+ (NTEE codes E, E20-E99)
  - **Human Services:** 200,000+ (NTEE codes P, P20-P99)
  - **All Other Categories:** 1.3M+ (Education, Arts, Environment, etc.)
- **Update Frequency:** Monthly
- **License:** Public domain (U.S. government data)
- **Format:** CSV (regional files), convertible to Parquet
- **Record Count:** 1.9M+ total nonprofits across 4 regional files

**Data Fields (28 columns):**

- **Identification:** EIN, organization name, sort name
- **Location:** Street address, city, state, ZIP code
- **Classification:** NTEE code, subsection (501(c)(3), etc.), foundation code
- **Financial:** Asset amount, income amount, revenue amount
- **Status:** Tax-exempt status, deductibility status, ruling date
- **Organization:** Organization code, activity codes, group affiliation

**NTEE Codes for Religious Organizations:**

- **X** - Religion Related, Spiritual Development
- **X20** - Christian (churches, ministries)
- **X21** - Protestant
- **X22** - Roman Catholic
- **X30** - Jewish
- **X40** - Islamic

**Use Cases:**

- **Bulk Download:** Get ALL nonprofits in a state (e.g., 26,148 in Alabama vs. 25 from the ProPublica API)
- **Comprehensive Coverage:** 1,000x more data per request than API methods
- **Offline Analysis:** Download once and query locally (cached as Parquet), refreshing when the monthly extract updates
- **NTEE Filtering:** Filter by category code (health, education, religion, etc.)
- **Geographic Analysis:** Complete state/city/ZIP coverage for spatial mapping

**BibTeX Citation:**

```bibtex
@misc{irs_eobmf_2026,
  title = {Exempt Organizations Business Master File Extract (EO-BMF)},
  author = {{Internal Revenue Service}},
  year = {2026},
  month = {April},
  url = {https://www.irs.gov/charities-non-profits/exempt-organizations-business-master-file-extract-eo-bmf},
  note = {Record count: 1,952,238 organizations. Updated monthly.}
}
```

**Integration:**

- **ProPublica API** complements with detailed Form 990 financials and mission statements
- **Every.org** adds human-readable descriptions and cause tags
- **IRS EO-BMF** provides the complete foundation layer with all organizations

**Complements:**

- **ARDA** for congregation characteristics and health ministry programs
- **HIFLD** for geospatial location data
- **National Congregations Study** for social service provision patterns
- **ProPublica API** for detailed financial breakdowns and executive compensation

---

### Open States / Plural Policy ⭐

**Organization:** Plural Policy (formerly Open States Foundation)

**What we use:** State and local legislative information - bulk downloads of bills, votes, legislators, and legislative sessions for all 50 states.
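Alongside the bulk downloads, Open States offers a REST API (linked below) for targeted lookups. The sketch below builds a bill-search request. It assumes the v3 base URL `https://v3.openstates.org`, a `/bills` endpoint taking `jurisdiction`, `q`, `session`, and `page` parameters, and an `X-API-KEY` request header. Verify all of these against the API documentation before relying on them. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL for the Open States API v3; the parameter names below
# (jurisdiction, q, session, page) follow its /bills endpoint as we understand it.
OPENSTATES_V3 = "https://v3.openstates.org"


def openstates_bills_url(jurisdiction, query=None, session=None, page=None):
    """Build a /bills search URL; optional filters are omitted when unset."""
    params = {"jurisdiction": jurisdiction}
    if query:
        params["q"] = query
    if session:
        params["session"] = session
    if page is not None:
        params["page"] = page
    return f"{OPENSTATES_V3}/bills?{urlencode(params)}"


def openstates_request(url, api_key):
    """Wrap the URL in a Request that sends the key in a header, not the URL."""
    return Request(url, headers={"X-API-KEY": api_key})


if __name__ == "__main__":
    url = openstates_bills_url("Alabama", query="fluoridation")
    print(url)
    # https://v3.openstates.org/bills?jurisdiction=Alabama&q=fluoridation
    req = openstates_request(url, api_key="YOUR_API_KEY")
    # Sending it with urllib.request.urlopen(req) needs a real key and network access.
```

Sending the key in a header rather than the query string keeps it out of logged URLs. For full-state analysis, the bulk PostgreSQL dumps remain the recommended route because they are not rate-limited.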
- **Website:** https://openstates.org/
- **API Documentation:** https://openstates.org/api/
- **Bulk Downloads:** https://open.pluralpolicy.com/data/ ⭐ **Recommended approach**
- **Scrapers Repository:** https://github.com/openstates/openstates-scrapers
- **Local Database Setup:** https://docs.openstates.org/contributing/local-database/
- **Code of Conduct:** https://docs.openstates.org/code-of-conduct/
- **Schema Documentation:** https://github.com/openstates/people/blob/master/schema.md

**Coverage:**

- **All 50 states** + DC + Puerto Rico
- **7,300+ state legislators** with committee assignments
- **Millions of bills** with full text, votes, and sponsors
- **Monthly PostgreSQL dumps** (9.8GB+) for complete local analysis
- **Video sources** (YouTube channels, Granicus portals)

**License:**

- **Bulk data:** Public Domain (recommended access method)
- **API content:** Varies by state
- **API Key:** Free tier (50,000 requests/month)

**Bulk Data Formats:**

1. **CSV:** Complete legislative sessions per state
   - URL: https://data.openstates.org/session/csv/
   - Best for: Spreadsheet analysis, quick exploration
2. **JSON:** Bills with full text and metadata
   - URL: https://data.openstates.org/session/json/
   - Best for: Application integration, detailed parsing
3. **PostgreSQL:** Monthly database dumps
   - URL: https://data.openstates.org/postgres/monthly/
   - Best for: SQL analysis, local development, complete schema
   - Size: 9.8GB+ (complete legislative database)
   - No rate limits on bulk downloads

**What We Use:**

- PostgreSQL monthly dumps for the local database (see `scripts/bulk_legislative_download.py`)
- CSV/JSON session data for specific state analysis
- Video source discovery (YouTube channels, Granicus portals)
- Legislator contact information and committee assignments

**Potential Contributions:**

Our project could contribute back to the Open States ecosystem:

- **Scraper patterns** for video sources and meeting archives
- **Meeting video discovery** to enhance their data
- **Granicus/YouTube integrations** for automated tracking
- We follow their [Code of Conduct](https://docs.openstates.org/code-of-conduct/) for all contributions

**Local Database Setup:**

We use the PostgreSQL dumps following their [local database documentation](https://docs.openstates.org/contributing/local-database/):

```bash
# Download monthly dump
python scripts/bulk_legislative_download.py --postgres --month 2026-04

# Restore to PostgreSQL
./scripts/setup_openstates_db.sh
```

**BibTeX:**

```bibtex
@software{openstates,
  title = {Open States},
  author = {{Plural Policy}},
  year = {2024},
  url = {https://openstates.org/},
  note = {Comprehensive state legislative data for all 50 U.S. states}
}
```

---

### LegiScan ⭐

**Organization:** LegiScan

**What we use:** Comprehensive state legislative tracking with bill text, votes, people, and datasets for all 50 states.

- **Website:** https://legiscan.com/
- **API Documentation:** https://legiscan.com/legiscan
- **Dataset Downloads:** https://legiscan.com/datasets
- **People Database:** https://legiscan.com/legiscan/people
- **Bill Search:** https://legiscan.com/

**Coverage:**

- **All 50 states** + DC + U.S. Congress
- **Current and historical legislation** back to 2011
- **Bill text, sponsors, votes, amendments** with full tracking
- **370,000+ legislators** (current and historical)
- **Roll call votes** with individual legislator positions
- **Committee assignments** and hearing schedules
- **Fiscal notes** and impact statements

**Available Datasets:**

1. **National Dataset:** Complete legislative data for all states
   - All bills, resolutions, and legislative documents
   - Updated daily during legislative sessions
   - Includes bill text, sponsors, status tracking
2. **State-Specific Datasets:** Per-state downloads
   - Session-specific or multi-year data
   - Optimized for state-level analysis
3. **People Dataset:** Legislator information
   - Contact details, committee assignments
   - District information and party affiliation
   - Historical legislator records
4. **Roll Call Dataset:** Voting records
   - Individual votes on bills and amendments
   - Voting patterns and trends
   - Committee and floor votes

**API Access:**

- **Free Tier:** 30,000 requests per month
- **API Key:** Required (free registration)
- **Bulk Downloads:** Available for subscribers
- **Real-time Updates:** Daily synchronization during sessions

**Data Format:**

- **JSON API** for programmatic access
- **CSV/Excel** exports for datasets
- **SQL dumps** available for subscribers
- **RSS feeds** for bill monitoring

**What We Use:**

- Bill text and status for legislative tracking (see `scripts/legislative_tracker.py`)
- Legislator contact information for advocacy features
- Roll call votes for voting pattern analysis
- Dataset downloads for bulk legislative analysis

**Comparison with Open States:**

- **LegiScan:** More detailed bill tracking, commercial support, datasets for download
- **Open States:** Free bulk PostgreSQL dumps, open source scrapers, community-driven

The two are complementary: we use Open States for bulk data and LegiScan for detailed bill tracking and datasets.
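The sketch below builds LegiScan keyword searches for advocacy tracking (for example, "fluoridation" across several states). It assumes LegiScan's JSON API is reached at `https://api.legiscan.com/` with `key` and `op` parameters, and that the `getSearch` operation takes `state`, `query`, and an optional `year`. Confirm these against the LegiScan API manual. The helper functions are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter names for LegiScan's JSON API (key, op,
# state, query, year); confirm them in the LegiScan API manual.
LEGISCAN_API = "https://api.legiscan.com/"


def legiscan_search_url(api_key, query, state="ALL", year=None):
    """Build a getSearch URL for a keyword search in one state or all states."""
    params = {"key": api_key, "op": "getSearch", "state": state, "query": query}
    if year is not None:
        params["year"] = year
    return f"{LEGISCAN_API}?{urlencode(params)}"


def legiscan_searches(api_key, keywords, states):
    """Yield one search URL per (state, keyword) pair, e.g. for advocacy alerts."""
    for state in states:
        for keyword in keywords:
            yield legiscan_search_url(api_key, keyword, state=state)


if __name__ == "__main__":
    urls = list(legiscan_searches("YOUR_API_KEY", ["fluoridation", "oral health"], ["AL", "GA"]))
    print(len(urls))  # 4
    print(urls[0])
    # https://api.legiscan.com/?key=YOUR_API_KEY&op=getSearch&state=AL&query=fluoridation
```

Each URL costs one request. A daily sweep of these 4 searches therefore uses about 120 of the free tier's 30,000 monthly requests.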
**License:**

- API data: Terms of Service apply (https://legiscan.com/legiscan)
- Datasets: Subscription required for bulk downloads
- API Key: Free tier available

**Use Cases:**

- Track legislation by keyword (e.g., "fluoridation", "oral health")
- Monitor bill progress across multiple states
- Analyze legislator voting patterns
- Build advocacy alerts and notifications
- Research legislative trends over time

**BibTeX:**

```bibtex
@misc{legiscan,
  title = {LegiScan: State and Federal Legislative Tracking},
  author = {{LegiScan}},
  year = {2024},
  url = {https://legiscan.com/},
  note = {Comprehensive legislative data for all 50 U.S. states and Congress}
}
```

**Documentation:** https://legiscan.com/legiscan

**Support:** support@legiscan.com

**Terms:** https://legiscan.com/legiscan

---

## 🌐 Data Sharing Standards

**In this section:**

- [Open Civic Data (OCD) Standards](#open-civic-data-ocd-standards)
- [Popolo Project](#popolo-project)
- [Schema.org](#schemaorg)
- [Common Education Data Standards (CEDS)](#common-education-data-standards-ceds)
- [OMOP Common Data Model (OHDSI)](#omop-common-data-model-ohdsi)

### Open Civic Data (OCD) Standards

**What we use:** Standardized jurisdiction identifiers for cross-platform compatibility.

**Standard:** [OCDEP 2 - Division Identifiers](https://open-civic-data.readthedocs.io/en/latest/proposals/0002.html)

- **Repository:** https://github.com/opencivicdata/ocd-division-ids
- **License:** Open source
- **Format:** `ocd-division/country:us/state:al/place:birmingham`
- **Coverage:** All U.S. jurisdictions (cities, counties, states, school districts)

**Example Implementation:**

```
ocd-division/country:us/state:al                                   # State
ocd-division/country:us/state:al/county:jefferson                  # County
ocd-division/country:us/state:al/place:birmingham                  # City
ocd-division/country:us/state:al/school_district:birmingham_city   # School District
```

### Popolo Project

**What we use:** International open government data specification for people, organizations, and elected positions.

- **Specification:** https://www.popoloproject.com/
- **GitHub:** https://github.com/popolo-project/popolo-spec
- **License:** Creative Commons Attribution 4.0 International
- **Adoption:** Used by Civic Commons, OpenNorth, mySociety, Sunlight Foundation, and 30+ civic tech organizations worldwide

**Popolo Classes Implemented:**

| Popolo Class | Our Entity | Use Case |
|--------------|------------|----------|
| **Person** | LEADER | Elected officials, appointees |
| **Organization** | ORGANIZATION | Nonprofits, government agencies |
| **Membership** | LEADER ↔ ORGANIZATION | Relationships with roles and terms |
| **Post** | LEADER.position_type | Positions like "Mayor", "Council Member" |
| **VoteEvent** | VOTE | Voting records on motions/bills |
| **Motion** | AGENDA, LEGISLATION | Formal proposals |
| **Area** | JURISDICTION | Geographic/political boundaries |
| **Event** | MEETING | Public meetings with agendas |
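In the table above, the **Area** class maps to JURISDICTION records, and those are keyed by the OCD division IDs described in the previous section. Each ID nests its parent divisions, so a small parser can recover the hierarchy (for example, city to state). The functions below are a hypothetical sketch. They check only the `type:id` segment structure, not the full OCDEP 2 character rules.

```python
def parse_ocd_division(ocd_id):
    """Split an OCD division ID into (type, id) pairs, outermost first."""
    prefix, _, rest = ocd_id.partition("/")
    if prefix != "ocd-division" or not rest:
        raise ValueError(f"not an OCD division ID: {ocd_id!r}")
    pairs = []
    for segment in rest.split("/"):
        division_type, sep, value = segment.partition(":")
        if not (sep and division_type and value):
            raise ValueError(f"malformed segment {segment!r} in {ocd_id!r}")
        pairs.append((division_type, value))
    return pairs


def parent_division(ocd_id):
    """Return the enclosing division's ID, or None for a top-level (country) ID."""
    parse_ocd_division(ocd_id)  # validate before slicing
    head, _, _ = ocd_id.rpartition("/")
    return None if head == "ocd-division" else head


if __name__ == "__main__":
    city = "ocd-division/country:us/state:al/place:birmingham"
    print(parse_ocd_division(city))
    # [('country', 'us'), ('state', 'al'), ('place', 'birmingham')]
    print(parent_division(city))  # ocd-division/country:us/state:al
```

Walking `parent_division` repeatedly from a place or school district yields its state and country. This is enough to roll meeting or membership data up the jurisdiction hierarchy without a separate lookup table.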
**Popolo Dependencies (15 W3C/IETF Standards):**

| Standard | Prefix | Use Case |
|----------|--------|----------|
| [FOAF](http://xmlns.com/foaf/0.1/) | `foaf` | People, social networks |
| [vCard](https://www.rfc-editor.org/rfc/rfc6350.html) | `vcard` | Contact information (IETF RFC 6350) |
| [Schema.org](https://schema.org/) | `schema` | Structured web data |
| [DCMI Terms](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/) | `dcterms` | Metadata, provenance |
| [W3C Organization Ontology](https://www.w3.org/TR/vocab-org/) | `org` | Organizational structures |
| [ISA Location](https://www.w3.org/ns/locn) | `locn` | Addresses, geographic data |
| [GeoNames](http://www.geonames.org/ontology/) | `gn` | Geographic identifiers |
| [SKOS](https://www.w3.org/2004/02/skos/) | `skos` | Taxonomies, classification |
| [BIO](http://purl.org/vocab/bio/0.1/) | `bio` | Life events, relationships |
| [BIBFRAME](https://www.loc.gov/bibframe/) | `bf` | Bibliographic references |
| [W3C Contact](http://www.w3.org/2000/10/swap/pim/contact#) | `con` | Contact utility concepts |
| [NEPOMUK Calendar](http://www.semanticdesktop.org/ontologies/ncal/) | `ncal` | Events, meetings |
| [ISA Person](http://www.w3.org/ns/person) | `person` | Person attributes |
| [RDF Schema](https://www.w3.org/TR/rdf-schema/) | `rdfs` | Semantic web foundation |
| [ODRS](http://schema.theodi.org/odrs) | `odrs` | Data licensing |
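To illustrate how the Popolo classes above fit together, here is a sketch of a document linking one Person to one Post in one Organization through a Membership. The Post is tied to an Area identified by an OCD division ID. The property names (`persons`, `organizations`, `posts`, `memberships`, `areas`, `person_id`, `organization_id`, `post_id`, `area_id`, `start_date`, `end_date`) follow Popolo's JSON conventions as we understand them, so check the Popolo specification before depending on them. All names and IDs are hypothetical.

```python
import json

# Hypothetical IDs and names for illustration. Property names follow
# Popolo's JSON conventions as we understand them; verify against the spec.
AREA_ID = "ocd-division/country:us/state:al/place:birmingham"


def popolo_document(person_name, org_name, post_label, start_date, end_date=None):
    """Link one Person to one Post in one Organization via a Membership."""
    person = {"id": "person-1", "name": person_name}
    organization = {"id": "org-1", "name": org_name, "classification": "legislature"}
    area = {"id": AREA_ID, "name": "Birmingham"}
    post = {
        "id": "post-1",
        "label": post_label,
        "organization_id": organization["id"],
        "area_id": area["id"],
    }
    membership = {
        "person_id": person["id"],
        "organization_id": organization["id"],
        "post_id": post["id"],
        "role": post_label,
        "start_date": start_date,  # ISO 8601 date string
    }
    if end_date is not None:  # open-ended terms simply omit end_date
        membership["end_date"] = end_date
    return {
        "persons": [person],
        "organizations": [organization],
        "areas": [area],
        "posts": [post],
        "memberships": [membership],
    }


if __name__ == "__main__":
    doc = popolo_document("Jane Doe", "Example City Council", "Council Member", "2024-01-01")
    print(json.dumps(doc, indent=2))
```

In this design, the Membership carries the term dates while the Post carries the seat's jurisdiction. A change of officeholder then adds a new Membership without touching the Post or Area.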
### Schema.org

**Organization:** W3C Community Group (sponsors: Google, Microsoft, Yahoo, Yandex)

**What we use:** SEO-optimized structured data, JSON-LD exports, semantic web compatibility.

- **Source:** https://schema.org/
- **License:** Creative Commons Attribution-ShareAlike License (CC BY-SA 3.0)
- **Coverage:** 800+ types, 1,400+ properties

**Our Schema.org Type Mappings:**

| Our Entity | Schema.org Type | Use Case |
|------------|----------------|----------|
| JURISDICTION | [AdministrativeArea](https://schema.org/AdministrativeArea) | City/county pages |
| MEETING | [Event](https://schema.org/Event) | Google Search event rich results |
| LEADER | [Person](https://schema.org/Person) + [GovernmentOfficial](https://schema.org/GovernmentOfficial) | Official profiles |
| ORGANIZATION | [Organization](https://schema.org/Organization) + [NGO](https://schema.org/NGO) | Nonprofit listings |
| LEGISLATION | [Legislation](https://schema.org/Legislation) | Bill tracking |
| BALLOT_MEASURE | [Legislation](https://schema.org/Legislation) | Ballot guides |
| VOTE | [VoteAction](https://schema.org/VoteAction) | Voting records |
| FACT_CHECK | [ClaimReview](https://schema.org/ClaimReview) | Google Fact Check Explorer |
| SCHOOL_DISTRICT | [EducationalOrganization](https://schema.org/EducationalOrganization) | School district info |
| VIDEO | [VideoObject](https://schema.org/VideoObject) | YouTube integration |
| DOCUMENT | [DigitalDocument](https://schema.org/DigitalDocument) | Document library |
| CONSTITUENT | [Person](https://schema.org/Person) | Donor/volunteer profiles |
| DONATION | [DonateAction](https://schema.org/DonateAction) | Donation receipts |
| CAMPAIGN | [FundingScheme](https://schema.org/FundingScheme) | Fundraising campaigns |
| PROGRAM_DELIVERY | [Service](https://schema.org/Service) | Program catalog |

**Benefits:**

- ✅ Google Search rich results
- ✅ Voice assistant compatibility (Alexa, Google Assistant)
- ✅ Knowledge Graph integration
- ✅ Cross-platform (Apple, Bing, Yandex)

### Common Education Data Standards (CEDS)

**Organization:** U.S. Department of Education, National Center for Education Statistics (NCES)

**What we use:** School district data modeling, NCES interoperability, education finance tracking.

- **Source:** https://ceds.ed.gov/
- **GitHub:** https://github.com/CEDStandards
- **License:** Public Domain (U.S. Government)
- **Coverage:** 2,300+ data elements, 500+ option sets

**CEDS Alignment:**

| Our Field | CEDS Element ID | CEDS Element Name |
|-----------|----------------|-------------------|
| `nces_id` | 000827 | LEA Identifier (NCES) |
| `district_name` | 000168 | Name of Institution |
| `total_students` | 001475 | Student Count |
| `total_revenue` | 000612 | Total Revenue |
| `per_pupil_spending` | 000613 | Expenditure per Student |

**Benefits:**

- ✅ NCES Common Core of Data (CCD) compatibility
- ✅ F-33 Finance Survey alignment
- ✅ Federal grant reporting (ESSA, Title I, IDEA)

---

### OMOP Common Data Model (OHDSI)

**Organization:** Observational Health Data Sciences and Informatics (OHDSI)

**What we use:** Vocabulary and terminology standardization system - CONCEPT, VOCABULARY, CONCEPT_CLASS, and CONCEPT_RELATIONSHIP tables for consistent data classification.

- **Source:** https://ohdsi.github.io/CommonDataModel/
- **GitHub:** https://github.com/OHDSI/CommonDataModel
- **Vocabulary Documentation:** https://ohdsi.github.io/TheBookOfOhdsi/StandardizedVocabularies.html
- **License:** Apache License 2.0
- **Coverage:** Comprehensive vocabulary system for standardizing concepts across domains

**OMOP CDM Tables We Implement:**

| Table | Purpose | Our Use Case |
|-------|---------|-------------|
| **CONCEPT** | Master vocabulary list | Standardized codes for topics, demographics, classifications |
| **VOCABULARY** | Source vocabularies | Track origin of concepts (NTEE, FIPS, Schema.org, etc.) |
| **CONCEPT_CLASS** | Categorization | Group concepts by type (demographic, geographic, topic) |
| **CONCEPT_RELATIONSHIP** | Linkages | Map relationships between concepts (is-a, maps-to, subsumes) |

**Our OMOP-Inspired Vocabularies:**

| Vocabulary ID | Description | Concept Count |
|---------------|-------------|---------------|
| `NTEE` | National Taxonomy of Exempt Entities | 600+ |
| `FIPS` | Federal Information Processing Standards | 90,000+ |
| `Schema.org` | Structured data types | 800+ |
| `Popolo` | Open government data specs | 15+ |
| `OCD-ID` | Open Civic Data identifiers | 22,000+ |
| `CEDS` | Common Education Data Standards | 2,300+ |
| `CENSUS` | U.S. Census categories | 1,000+ |
| `INTERNAL` | Custom platform classifications | 500+ |

**Example Implementation:**

```text
-- CONCEPT table: standardized demographics
concept_id | concept_name              | vocabulary_id | concept_class_id
-----------|---------------------------|---------------|------------------
100001     | Race: White               | CENSUS        | Demographic
100002     | Race: Black/African Amer. | CENSUS        | Demographic
100003     | Hispanic/Latino Ethnicity | CENSUS        | Demographic
100004     | Gender: Male              | CENSUS        | Demographic

-- CONCEPT_RELATIONSHIP table: hierarchies
concept_id_1 | concept_id_2 | relationship_id
-------------|--------------|----------------
100002       | 100000       | Is a             -- Race category
100003       | 100010       | Is a             -- Ethnicity category
```

**Benefits:**

- ✅ Consistent terminology across all datasets
- ✅ Hierarchical concept relationships
- ✅ Traceable concept provenance (source vocabularies)
- ✅ Industry-standard approach used by healthcare and research institutions
- ✅ Supports multiple classification systems simultaneously

**BibTeX:**

```bibtex
@misc{ohdsi_omop_cdm,
  author = {{Observational Health Data Sciences and Informatics (OHDSI)}},
  title = {OMOP Common Data Model},
  year = {2024},
  url = {https://ohdsi.github.io/CommonDataModel/},
  license = {Apache-2.0}
}
```

---

## 🗳️ Election & Advocacy

**In this section:**

- [Ballotpedia](#ballotpedia)
- [MIT Election Data + Science Lab](#mit-election-data--science-lab)
- [OpenElections](#openelections)

### Ballotpedia

**Organization:** Lucy Burns Institute

**What we use:** Ballot measures, referendums, and propositions for fluoridation tracking and health policy analysis.

- **Source:** https://ballotpedia.org/
- **API:** https://ballotpedia.org/API-documentation
- **Coverage:** All 50 states, historical measures back to the 1990s
- **License:** API access limited at scale (paid tier available)

### MIT Election Data + Science Lab

**Organization:** Massachusetts Institute of Technology

**What we use:** County-level election results for political composition analysis.

- **Source:** https://electionlab.mit.edu/data
- **Repository:** https://github.com/MEDSL/official-returns
- **Coverage:** 1976 to present; presidential, congressional, and gubernatorial results
- **License:** Free for research and commercial use

### OpenElections

**What we use:** State-by-state certified election results in standardized CSV format.
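The following sketch shows what "standardized CSV" buys in practice: precinct-level rows can be summed per candidate with the standard `csv` module. It assumes the column layout OpenElections uses for its standardized files (`county`, `precinct`, `office`, `district`, `party`, `candidate`, `votes`). Individual state files may add columns or leave some blank. The rows and the function name are made up.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the column layout assumed for OpenElections' standardized
# CSVs (county, precinct, office, district, party, candidate, votes); real
# files may add columns or leave some blank, so check the file you download.
RESULTS_SAMPLE = """county,precinct,office,district,party,candidate,votes
Jefferson,1,Governor,,DEM,Candidate A,120
Jefferson,2,Governor,,DEM,Candidate A,95
Jefferson,1,Governor,,REP,Candidate B,140
Jefferson,1,State Senate,15,REP,Candidate C,80
Jefferson,3,Governor,,REP,Candidate B,
"""


def totals_by_candidate(csv_text, office, district=""):
    """Sum precinct-level votes per candidate for one office (and district)."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["office"] != office or (row.get("district") or "") != district:
            continue
        votes = (row.get("votes") or "").replace(",", "").strip()
        if votes.isdigit():  # skip blank or non-numeric vote counts
            totals[row["candidate"]] += int(votes)
    return dict(totals)


if __name__ == "__main__":
    print(totals_by_candidate(RESULTS_SAMPLE, "Governor"))
    # {'Candidate A': 215, 'Candidate B': 140}
    print(totals_by_candidate(RESULTS_SAMPLE, "State Senate", district="15"))
    # {'Candidate C': 80}
```

The `district` filter defaults to blank, which matches statewide offices. District races such as state legislative seats need the district passed explicitly.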
- **Source:** https://openelections.net/
- **GitHub:** https://github.com/openelections
- **Coverage:** All 50 states (completeness varies by state), precinct-level data
- **License:** Open source (varies by state)

---

## 🏢 Nonprofit & Philanthropy

**In this section:**
- [IRS Exempt Organizations Business Master File (EO-BMF)](#irs-exempt-organizations-business-master-file-eo-bmf) - **PRIMARY BULK DATA SOURCE (1.9M+ orgs)**
- [Google BigQuery IRS 990 Data](#google-bigquery-irs-990-data) - **RECOMMENDED FOR BULK FORM 990 ENRICHMENT (5M+ filings)**
- [GivingTuesday 990 Data Infrastructure](#givingtuesday-990-data-infrastructure) - **AWS S3 DATA LAKE (5.4M+ raw Form 990 XMLs)**
- [ProPublica Nonprofit Explorer](#propublica-nonprofit-explorer)
- [ProPublica Congress API](#propublica-congress-api)
- [ProPublica Campaign Finance API](#propublica-campaign-finance-api)
- [ProPublica Vital Signs API](#propublica-vital-signs-api)
- [Every.org Charity API](#everyorg-charity-api)
- [Findhelp.org (Aunt Bertha)](#findhelporg-aunt-bertha)
- [211 Regional Directories](#211-regional-directories)
- [Association of Religion Data Archives (ARDA)](#association-of-religion-data-archives-arda)
- [Homeland Infrastructure Foundation-Level Data (HIFLD): Places of Worship](#homeland-infrastructure-foundation-level-data-hifld-places-of-worship)
- [National Congregations Study (NCS)](#national-congregations-study-ncs)
- [Microsoft Common Data Model for Nonprofits](#microsoft-common-data-model-for-nonprofits)

### IRS Exempt Organizations Business Master File (EO-BMF)

**Organization:** Internal Revenue Service (IRS), U.S. Department of the Treasury

**What we use:** **PRIMARY BULK DATA SOURCE** for comprehensive nonprofit data: ALL 1.9M+ U.S. tax-exempt organizations, with EIN, NTEE codes, financial data, subsection classification, and geographic location.
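
The Python sketch below shows how NTEE filtering of the bulk file can work. It streams one regional CSV and keeps rows for a single state whose NTEE code starts with a given prefix, so `("X",)` also matches X20, X21, X22, X30 and X40. It assumes the EO-BMF header uses the `STATE` and `NTEE_CD` column names; confirm this against the file you download. The sample rows are made up and use only a subset of the columns.

```python
import csv
import io
from collections.abc import Iterable, Iterator


def filter_eo_bmf(
    lines: Iterable[str],
    state: str,
    ntee_prefixes: tuple[str, ...],
) -> Iterator[dict[str, str]]:
    """Stream EO-BMF rows for one state whose NTEE code starts with any prefix.

    Assumes the EO-BMF CSV header uses STATE and NTEE_CD column names.
    """
    for row in csv.DictReader(lines):
        ntee = (row.get("NTEE_CD") or "").strip().upper()
        # str.startswith accepts a tuple, so ("X",) also matches X20, X21, ...
        if row.get("STATE") == state and ntee.startswith(ntee_prefixes):
            yield row


# Made-up rows using a subset of the assumed EO-BMF columns
SAMPLE = (
    "EIN,NAME,STATE,NTEE_CD,REVENUE_AMT\n"
    "000000001,EXAMPLE CHURCH,AL,X20,50000\n"
    "000000002,EXAMPLE CLINIC,AL,E32,900000\n"
    "000000003,EXAMPLE TEMPLE,GA,X30,12000\n"
    "000000004,EXAMPLE SOCIETY,AL,,0\n"
)

religious_al = [r["NAME"] for r in filter_eo_bmf(io.StringIO(SAMPLE), "AL", ("X",))]
print(religious_al)  # ['EXAMPLE CHURCH']
```

For a downloaded regional file, pass the open file handle (opened with `newline=""`) instead of `io.StringIO`, so rows stream without loading the whole file. EINs stay strings, which preserves their leading zeros.
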
- **Source:** https://www.irs.gov/charities-non-profits/exempt-organizations-business-master-file-extract-eo-bmf
- **Search Tool:** https://www.irs.gov/charities-non-profits/tax-exempt-organization-search
- **Bulk Downloads:** https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-bulk-data-downloads
- **Integration Guide:** See [IRS Bulk Data Integration](./irs-bulk-data.md)
- **Coverage:** 1,952,238 organizations (as of April 2026)
  - **Churches & Religious Organizations:** 300,000+ (NTEE codes X, X20, X21, X22, X30, X40)
  - **Health Organizations:** 80,000+ (NTEE codes E, E20-E99)
  - **Human Services:** 200,000+ (NTEE codes P, P20-P99)
  - **All Other Categories:** 1.3M+ (Education, Arts, Environment, etc.)
- **Update Frequency:** Monthly
- **License:** Public domain (U.S. government data)
- **Format:** CSV (regional files), convertible to Parquet
- **Record Count:** 1.9M+ total nonprofits across 4 regional files

**Data Fields (28 columns):**
- **Identification:** EIN, organization name, sort name
- **Location:** Street address, city, state, ZIP code
- **Classification:** NTEE code, subsection (501(c)(3), etc.), foundation code
- **Financial:** Asset amount, income amount, revenue amount
- **Status:** Tax-exempt status, deductibility status, ruling date
- **Organization:** Organization code, activity codes, group affiliation

**NTEE Codes for Religious Organizations:**
- **X** - Religion-Related, Spiritual Development
- **X20** - Christian (churches, ministries)
- **X21** - Protestant
- **X22** - Roman Catholic
- **X30** - Jewish
- **X40** - Islamic

**Use Cases:**
- **Bulk Download:** Get ALL nonprofits in a state (e.g., 26,148 in Alabama vs. 25 per ProPublica API request)
- **Comprehensive Coverage:** Roughly 1,000x more records per download than a single API request (26,148 vs. 25)
- **Offline Analysis:** Download once and query locally (cached as Parquet) until the next monthly update
- **NTEE Filtering:** Filter by category code (health, education, religion, etc.)
- **Geographic Analysis:** Complete state/city/ZIP coverage for spatial mapping

**BibTeX Citation:**

```bibtex
@misc{irs_eobmf_2026,
  title = {Exempt Organizations Business Master File Extract (EO-BMF)},
  author = {{Internal Revenue Service}},
  year = {2026},
  month = {April},
  url = {https://www.irs.gov/charities-non-profits/exempt-organizations-business-master-file-extract-eo-bmf},
  note = {Record count: 1,952,238 organizations. Updated monthly.}
}
```

**Integration:**
- **ProPublica API** complements with detailed Form 990 financials and mission statements
- **Every.org** adds human-readable descriptions and cause tags
- **IRS EO-BMF** provides the complete foundation layer with all organizations

**Complements:**
- **ARDA** for congregation characteristics and health ministry programs
- **HIFLD** for geospatial location data
- **National Congregations Study** for social service provision patterns
- **ProPublica API** for detailed financial breakdowns and executive compensation

---

### Google BigQuery IRS 990 Data

**Organization:** Google Cloud Platform (IRS data mirrored by Google)

**What we use:** **RECOMMENDED FOR BULK FORM 990 ENRICHMENT**: SQL-queryable IRS Form 990 electronic filings with detailed financial data, mission statements, and program descriptions.

- **Source:** https://console.cloud.google.com/marketplace/product/internal-revenue-service/irs-990
- **Documentation:** https://cloud.google.com/bigquery/docs/irs-990-dataset
- **BigQuery Dataset:** `bigquery-public-data.irs_990`
- **Coverage:** 5,000,000+ Form 990 electronic filings (2011-present)
- **Update Frequency:** Annually (updated when IRS publishes new data)
- **License:** Public domain (U.S. government data, hosted by Google)
- **Format:** SQL-queryable tables in BigQuery
- **Cost:** Free tier includes 1 TB of queries per month

**Available Tables:**
- **`irs_990.irs_990_2013` through `irs_990.irs_990_2024`**: Individual years (2013-2024)
- **`irs_990.irs_990_ein`**: All filings aggregated by EIN
- **`irs_990.irs_990_pf_2013` through `irs_990.irs_990_pf_2024`**: Private foundation filings (Form 990-PF)

**Data Fields (100+ columns):**
- **Identification:** EIN, organization name, tax year
- **Financials:**
  - Total revenue, contributions, program service revenue, investment income
  - Total expenses, program expenses, management expenses, fundraising expenses
  - Total assets, total liabilities, net assets
  - Grants paid, grants received
- **Mission & Programs:**
  - Mission description (text field)
  - Program service accomplishments (up to 10 programs with descriptions)
  - Program service expenses per program
- **Governance:**
  - Number of voting members, independent members
  - Officer and director compensation
  - Key employee information
- **Activities:**
  - Legislative activities, political expenditures, lobbying
  - Foreign operations, foreign grants
  - Website URL
- **Compliance:**
  - Public inspection policies
  - Conflict of interest policies
  - Whistleblower policies

**Key Advantages:**
- **Serverless SQL:** Query 5M+ records without downloading files
- **Mission Extraction:** Get mission statements and program descriptions in bulk
- **Website URLs:** Extract organization websites (not in EO-BMF)
- **Historical Data:** 10+ years of financial trends per organization
- **Scalable:** Process thousands of nonprofits in a single query
- **No Result Cap:** Unlike the ProPublica API's limit of 25 results per request

**Example Use Cases:**
- **Bulk Mission Enrichment:** Extract mission statements for all health nonprofits in Alabama
- **Website Discovery:** Get organization websites for outreach campaigns
- **Financial Trend Analysis:** Track revenue/expense trends over 10 years
- **Program Service Analysis:** Identify nonprofits by specific program keywords
- **Grant Analysis:** Find organizations that award grants vs. receive grants

**Setup Requirements:**
1. Create a Google Cloud project
2. Enable the BigQuery API
3. Authenticate:

   ```bash
   # Option A: Application default credentials
   gcloud auth application-default login

   # Option B: Service account key
   export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
   ```

**Example Query (Extract Alabama Health Nonprofits with Missions):**

```sql
SELECT
  ein,
  organization_name,
  tax_year,
  mission_description,
  website,
  total_revenue,
  total_expenses,
  program_service_expenses
FROM `bigquery-public-data.irs_990.irs_990_2023`
WHERE state = 'AL'
  AND LOWER(mission_description) LIKE '%health%'  -- LIKE is case-sensitive in BigQuery
  AND total_revenue > 100000
ORDER BY total_revenue DESC
LIMIT 1000
```

**BibTeX Citation:**

```bibtex
@misc{google_bigquery_irs990,
  title = {IRS 990 Dataset},
  author = {{Google Cloud Platform} and {Internal Revenue Service}},
  year = {2024},
  url = {https://console.cloud.google.com/marketplace/product/internal-revenue-service/irs-990},
  note = {BigQuery public dataset: bigquery-public-data.irs\_990. Coverage: 5M+ Form 990 electronic filings (2011-present)}
}
```

**Integration:**
- **IRS EO-BMF** provides the complete organization registry (1.9M+ orgs)
- **Google BigQuery** enriches with mission statements, websites, and detailed financials
- **ProPublica API** adds executive compensation and recent filing details
- **GivingTuesday Data Lake** provides raw XML for custom field extraction

**Complements:**
- See [Form 990 XML Data (GivingTuesday Data Lake)](./form-990-xml.md) for an alternative bulk download approach
- See [IRS Bulk Data Integration](./irs-bulk-data.md) for the EO-BMF foundation layer

**Cost Estimates:**
- **Free tier:** 1 TB of queries/month (~2-4 million nonprofit records, depending on the columns selected)
- **Beyond free tier:** Billed per TB scanned at on-demand rates (currently $6.25 per TiB in the US multi-region; check current Google Cloud pricing)
- **Example:** A mission-enrichment query for 100,000 nonprofits scans ~20 GB, which stays within the free tier (**Free**)
- **Note:** BigQuery bills for the bytes read from the selected columns, not for the rows returned. On an unpartitioned table, `WHERE` and `LIMIT` do not reduce cost, so select only the columns you need.

---

### GivingTuesday 990 Data Infrastructure

**Organization:** GivingTuesday

**What we use:** Raw Form 990 XML filings from AWS S3 for detailed financial data extraction, custom field parsing, and comprehensive nonprofit analysis.
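
To go from a list of EINs to downloadable files, you can join the index CSV (columns listed under **Index Schema** in this section) to the `EfileData/XmlFiles/[OBJECT_ID]_public.xml` key pattern from the data lake layout. The Python sketch below keeps the most recent filing per EIN. It assumes `TaxPeriod` is a `YYYYMM` string and that the index may drop leading zeros from EINs. It only builds S3 keys; the download itself is left to the anonymous boto3 client shown in the **Python Access** example in this section. The sample index rows and object IDs are made up.

```python
import csv
import io
from collections.abc import Iterable

XML_PREFIX = "EfileData/XmlFiles/"


def latest_xml_keys(index_lines: Iterable[str], eins: Iterable[str]) -> dict[str, str]:
    """Map each requested EIN to the S3 key of its most recent filing.

    Assumes index columns EIN, TaxPeriod (YYYYMM) and ObjectId.
    """
    wanted = {e.strip().zfill(9) for e in eins}  # normalize to 9-digit EINs
    best: dict[str, tuple[str, str]] = {}  # EIN -> (TaxPeriod, ObjectId)
    for row in csv.DictReader(index_lines):
        ein = row["EIN"].strip().zfill(9)
        if ein not in wanted:
            continue
        period = row["TaxPeriod"].strip()
        # YYYYMM strings of equal length sort chronologically as plain strings
        if ein not in best or period > best[ein][0]:
            best[ein] = (period, row["ObjectId"].strip())
    return {ein: f"{XML_PREFIX}{obj}_public.xml" for ein, (_, obj) in best.items()}


# Made-up index rows following the documented column order
INDEX_SAMPLE = (
    "EIN,TaxPeriod,ObjectId,URL,FormType,OrganizationName,DLN,SubmittedOn\n"
    "12345678,202112,202213199349100001,,990,EXAMPLE ORG,,\n"
    "12345678,202212,202333189349300002,,990,EXAMPLE ORG,,\n"
    "987654321,202212,202303149349200003,,990EZ,OTHER ORG,,\n"
)

print(latest_xml_keys(io.StringIO(INDEX_SAMPLE), ["012345678"]))
# {'012345678': 'EfileData/XmlFiles/202333189349300002_public.xml'}
```

The ~925 MB index can be passed as an open file handle, so it is read row by row rather than loaded into memory.
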
- **Website:** https://990data.givingtuesday.org/
- **Data Lake:** `s3://gt990datalake-rawdata` (AWS S3, us-east-1 Virginia, Public Access)
- **Console:** https://us-east-1.console.aws.amazon.com/s3/buckets/gt990datalake-rawdata
- **Coverage:** 5.4M+ e-filed Form 990s (2011-present, ~300K new filings/year)
- **Scale:** ~10 TB of raw XML data
- **Update Frequency:** Ongoing (as IRS publishes new e-filings)
- **License:** Public domain (IRS data) + Open source tools
- **Access:** Free, no AWS credentials required (anonymous access via `--no-sign-request`)
- **Format:** XML files (1-2 MB each) + CSV/Parquet indices

**Data Lake Structure:**

```
s3://gt990datalake-rawdata/
├── EfileData/
│   ├── XmlFiles/          # Individual 990 XMLs (~5.4M files, ~10 TB)
│   │   └── [OBJECT_ID]_public.xml  (e.g., 202233259349300703_public.xml)
│   └── XmlZips/           # ZIP archives (97 files, ~38 GB → ~95 GB uncompressed)
│       └── YYYY_TEOS_XML_*.zip     (e.g., 2023_TEOS_XML_01A.zip ~400 MB)
└── Indices/
    └── 990xmls/           # CSV indices with metadata
        └── index_all_years_efiledata_xmls_created_on_2023-10-29.csv (~925 MB)
```

**Download Strategies:**

| Approach | Best For | Time | Bandwidth | Storage |
|----------|----------|------|-----------|---------|
| **Individual XMLs** | Single state or targeted | ~2 hrs (22K orgs) | 32 GB | 32 GB |
| **ZIP Archives** | All states / nationwide | ~6 hrs total | 38 GB | 95 GB |

**Choose Individual XMLs when:**
- You need data for 1-5 states only
- You want to download only specific EINs
- Storage space is limited
- You want incremental caching

**Choose ZIP Archives when:**
- You need all 50 states
- You're building a comprehensive database
- You have 100+ GB storage
- You want offline access to all filings

**What You Can Extract:**
- **Financials:** Revenue, expenses, assets, liabilities, net income, grants paid/received
- **Programs:** Detailed program descriptions, accomplishments, expenses per program (up to 10)
- **Governance:** Officer compensation, board members, key employees (with names and titles)
- **Activities:** Legislative activities, lobbying expenses, political contributions
- **Mission:** Organization mission statement and activity descriptions
- **Website:** Organization website URLs
- **Grants:** List of grant recipients with amounts (for grantmaking organizations)
- **Custom Fields:** Any field in the IRS Form 990 schema (990, 990-EZ, 990-PF)

**S3 Access Examples:**

**Individual XMLs (for single state or targeted download):**

```bash
# List index files (no credentials needed)
aws s3 ls s3://gt990datalake-rawdata/Indices/990xmls/ --no-sign-request

# Download index (~925 MB)
aws s3 cp s3://gt990datalake-rawdata/Indices/990xmls/index_all_years_efiledata_xmls_created_on_2023-10-29.csv . --no-sign-request

# Download specific XML
aws s3 cp s3://gt990datalake-rawdata/EfileData/XmlFiles/202233259349300703_public.xml . --no-sign-request

# Batch download for single state (using our script)
python scripts/batch_download_990s.py --state MA --health-only --concurrent 1000
```

**ZIP Archives (for all states / nationwide):**

```bash
# Download all 97 ZIPs (~38 GB) to local directory
./scripts/download_990_zips.sh

# Extract all ZIPs to get ~384K XMLs (~95 GB)
./scripts/extract_990_zips.sh

# Build local index for fast lookup
python scripts/build_990_local_index.py

# Now enrich from local files (no network needed!)
python scripts/enrich_all_states_990.py
```

**Python Access:**

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

BUCKET = "gt990datalake-rawdata"

# Anonymous (unsigned) S3 client: the bucket is public, so no credentials are needed
s3 = boto3.client("s3", region_name="us-east-1", config=Config(signature_version=UNSIGNED))

# Read one XML filing into memory (individual files are ~1-2 MB)
xml_obj = s3.get_object(
    Bucket=BUCKET,
    Key="EfileData/XmlFiles/202233259349300703_public.xml",
)
xml_content = xml_obj["Body"].read()

# Stream a ZIP archive (~400 MB) to disk rather than holding it in memory
s3.download_file(
    Bucket=BUCKET,
    Key="EfileData/XmlZips/2023_TEOS_XML_01A.zip",
    Filename="2023_TEOS_XML_01A.zip",
)
```

**Index Schema:** The CSV index contains: `EIN`, `TaxPeriod`, `ObjectId`, `URL`, `FormType`, `OrganizationName`, `DLN`, `SubmittedOn`

**Key Advantages:**
- **Raw XML Access:** Extract ANY field from Form 990, including custom/rare fields
- **No Query Costs:** Download once, parse locally (unlike BigQuery queries)
- **Offline Processing:** Process on your own infrastructure without rate limits
- **Complete Historical Data:** All e-filed 990s since 2011
- **Batch Downloads:** Download thousands of XMLs in parallel
- **No Authentication:** Public S3 bucket (no AWS account needed)

**Use Cases:**
- **Custom Field Extraction:** Parse fields not available in BigQuery (e.g., specific schedules)
- **Bulk Enrichment:** Download and process thousands of nonprofits locally
- **Offline Analysis:** Build your own database from raw XML
- **Historical Trends:** Analyze 10+ years of financial data
- **Grant Research:** Extract detailed grant recipient lists from Form 990 Schedule I

**BibTeX Citation:**

```bibtex
@misc{givingtuesday990data,
  title = {GivingTuesday 990 Data Infrastructure},
  author = {{GivingTuesday}},
  year = {2023},
  url = {https://990data.givingtuesday.org/},
  note = {AWS S3 data lake of IRS Form 990 XML filings. Bucket: s3://gt990datalake-rawdata. Coverage: 5.4M+ filings (2011-present)}
}
```

**Integration:**
- **IRS EO-BMF** provides the complete organization registry (1.9M+ orgs)
- **GivingTuesday Data Lake** enriches with raw XML for custom parsing
- **Google BigQuery** offers a SQL interface for standard fields
- **ProPublica API** adds web-friendly access for individual lookups

**Complements:**
- See [Form 990 XML Data (GivingTuesday Data Lake)](./form-990-xml.md) for a detailed integration guide
- See [Form 990 Enrichment Guide](../guides/form-990-enrichment.md) for usage examples
- See [IRS Bulk Data Integration](./irs-bulk-data.md) for the EO-BMF foundation layer

**Attribution:** When publishing analyses using this data, please cite:
1. GivingTuesday 990 Data Infrastructure: https://990data.givingtuesday.org/
2. Our enrichment tools: https://github.com/getcommunityone/open-navigator-for-engagement

---

### ProPublica Nonprofit Explorer

**Organization:** ProPublica, Inc.

**What we use:** Enhanced financial data and detailed Form 990 filings to complement IRS EO-BMF bulk data.

- **Source:** https://projects.propublica.org/nonprofits/
- **API Documentation:** https://projects.propublica.org/nonprofits/api
- **Coverage:** 3,000,000+ organizations, 10+ years of historical data
- **Data Included:**
  - Total revenue, expenses, assets, liabilities
  - Executive compensation (top 5 highest paid)
  - Program service expenses vs. administrative overhead
  - NTEE classification codes (National Taxonomy of Exempt Entities)
  - EIN (Employer Identification Number) for verification
- **Rate Limits:** Free, unlimited access (respectful use recommended: ~1 req/sec)
- **API Limitation:** Returns max 25 results per request, no pagination (use IRS EO-BMF for bulk downloads)
- **License:** Free for research and commercial use

**BibTeX:**

```bibtex
@misc{propublica_nonprofits,
  author = {{ProPublica}},
  title = {Nonprofit Explorer},
  year = {2024},
  url = {https://projects.propublica.org/nonprofits/},
  note = {Accessed: 2024}
}
```

---

### ProPublica Congress API

**Organization:** ProPublica, Inc.

**What we use:** Legislative data, including roll-call votes, member information, bills, and congressional activity, to link policy decisions to government meetings.

- **Source:** https://projects.propublica.org/api-docs/congress-api/
- **API Documentation:** https://projects.propublica.org/api-docs/congress-api/
- **Coverage:** U.S. Congress data from the 102nd Congress (1991) to present
- **Data Included:**
  - Roll-call votes by member and bill
  - Bill information, status, and amendments
  - Member biographical data and voting records
  - Committee assignments and leadership
  - Congressional statements and floor appearances
- **Access:** **API Key Required** (free; sign up at https://www.propublica.org/datastore/api/propublica-congress-api)
- **Authentication:** Include as HTTP header: `X-API-Key: YOUR_API_KEY`
- **Rate Limits:** 5,000 requests per day
- **License:** Free for non-commercial and commercial use with attribution

**Use Cases:**
- Link local government meetings to federal legislation
- Track how elected officials vote on issues discussed locally
- Correlate campaign contributions with voting patterns

**BibTeX:**

```bibtex
@misc{propublica_congress,
  author = {{ProPublica}},
  title = {Congress API},
  year = {2024},
  url = {https://projects.propublica.org/api-docs/congress-api/},
  note = {Accessed: 2024}
}
```

---

### Federal Election Commission (FEC) - Bulk Data & OpenFEC API

**Organization:** Federal Election Commission (FEC), U.S. Government

**What we use:** **PRIMARY SOURCE** for campaign finance data: individual contributions, candidate filings, committee data, and political expenditures for comprehensive campaign finance analysis.

- **OpenFEC API:** https://api.open.fec.gov/developers/
- **Bulk Data Portal:** https://www.fec.gov/data/browse-data/?tab=bulk-data
- **Documentation:** https://www.fec.gov/campaign-finance-data/
- **Coverage:** Complete FEC data from the 1980s to present (updated nightly)
- **Data Included:**
  - **Individual contributions** itemized at over $200 (Schedule A)
  - **Operating expenditures** (Schedule B)
  - **Candidate master files** (House, Senate, Presidential)
  - **Committee master files** (PACs, Super PACs, party committees)
  - **Campaign finance totals** by election cycle
  - **Independent expenditures** and electioneering communications
- **Access Methods:**
  - **Bulk Downloads:** Free, unlimited, no API key (CSV and FEC format)
  - **OpenFEC API:** Free with API key (1,000 requests/hour)
    - **Demo Key:** 30 requests/hour (no registration)
    - **API Key:** Free at https://api.data.gov/signup/
- **License:** Public Domain (U.S. Government)
- **Update Frequency:** Nightly (most datasets)

**Use Cases:**
- Map donor networks and political influence patterns
- Link nonprofit leadership donations to policy decisions
- Track campaign finance in health advocacy organizations
- Analyze funding sources for ballot initiatives
- Cross-reference contributions with government grant awards
- "Follow the money" from donor to policy outcome

**Critical Policy Restriction:**
- ⚠️ **Information copied from FEC reports may not be sold or used to solicit contributions or for commercial purposes** (52 U.S.C. § 30111(a)(4))
- FEC data is for transparency and research, not marketing

**BibTeX:**

```bibtex
@misc{fec_data_2024,
  author = {{Federal Election Commission}},
  title = {Campaign Finance Data and Bulk Downloads},
  year = {2024},
  url = {https://www.fec.gov/data/},
  note = {Updated nightly. Accessed: 2024}
}

@misc{openfec_api_2024,
  author = {{Federal Election Commission}},
  title = {OpenFEC API},
  year = {2024},
  url = {https://api.open.fec.gov/developers/},
  note = {RESTful API for campaign finance data. Accessed: 2024}
}
```

**Integration:** `discovery/fec_integration.py`

---

### ProPublica Campaign Finance API

**Organization:** ProPublica, Inc.

**What we use:** Simplified access to FEC data, with pre-aggregated summaries and top donor analysis (complements direct FEC data access).
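
The ProPublica Congress, Campaign Finance and Vital Signs APIs listed in this section all authenticate with an `X-API-Key` HTTP header. Below is a minimal Python sketch that uses only the standard library. The base URL, the example path and the `PROPUBLICA_API_KEY` environment variable name are assumptions for illustration. Confirm the routes against each API's documentation before relying on them.

```python
import json
import os
import urllib.request

# Assumed base URL for the Campaign Finance API; verify in the API docs.
BASE_URL = "https://api.propublica.org/campaign-finance/v1"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the key in the X-API-Key header."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def fetch_json(path: str) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    # PROPUBLICA_API_KEY is a hypothetical environment variable name
    req = build_request(path, os.environ["PROPUBLICA_API_KEY"])
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Hypothetical path, shown only to illustrate the URL and header layout
req = build_request("/2024/candidates/EXAMPLE.json", "YOUR_API_KEY")
print(req.full_url)
print(req.header_items())
```

`urllib` stores header names in capitalized form, so it reports this header as `X-api-key`. HTTP header names are case-insensitive, so the server still receives the same header.
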
- **Source:** https://projects.propublica.org/api-docs/campaign-finance/
- **API Documentation:** https://projects.propublica.org/api-docs/campaign-finance/
- **Coverage:** FEC data from the 2000 election cycle to present
- **Data Included:**
  - Candidate financial summaries and filings
  - Committee information and contributions
  - Individual and organizational donor data
  - Independent expenditures and disbursements
  - Top donors by industry and geography
- **Access:** **API Key Required** (free; sign up at https://www.propublica.org/datastore/api/campaign-finance-api)
- **Authentication:** Include as HTTP header: `X-API-Key: YOUR_API_KEY`
- **Rate Limits:** 5,000 requests per day
- **License:** Free for non-commercial and commercial use with attribution

**Note:** The ProPublica API provides easier-to-use summaries of FEC data. For bulk analysis, use the FEC bulk downloads directly.

**Use Cases:**
- Quick lookups of candidate finance summaries
- Pre-aggregated top donor analysis
- Industry contribution patterns
- Journalist-friendly data formatting

**BibTeX:**

```bibtex
@misc{propublica_campaign_finance,
  author = {{ProPublica}},
  title = {Campaign Finance API},
  year = {2024},
  url = {https://projects.propublica.org/api-docs/campaign-finance/},
  note = {Accessed: 2024}
}
```

---

### ProPublica Vital Signs API

**Organization:** ProPublica, Inc.

**What we use:** Healthcare provider data, including doctors, facilities, disciplinary actions, and Medicare participation, to support oral health policy analysis.
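
Several of this API's use cases (provider availability, healthcare deserts, provider density) come down to a population-to-provider ratio. The Python sketch below computes residents per provider and providers per 10,000 residents for each county, then flags counties above a cutoff. The county figures are made up. The 5,000:1 default is an illustrative assumption, not an official shortage-area rule. In practice, provider counts would come from provider records grouped by county and population figures from Census data.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class CountyAccess:
    county: str
    population: int  # assumed > 0
    providers: int

    @property
    def residents_per_provider(self) -> float:
        # No providers at all means the ratio is unbounded
        if self.providers == 0:
            return float("inf")
        return self.population / self.providers

    @property
    def providers_per_10k(self) -> float:
        return 10_000 * self.providers / self.population


def flag_shortage(counties: Iterable[CountyAccess], max_ratio: float = 5_000.0) -> list[str]:
    """Return names of counties whose residents-per-provider ratio exceeds max_ratio."""
    return [c.county for c in counties if c.residents_per_provider > max_ratio]


# Made-up counties for illustration
counties = [
    CountyAccess("County A", population=200_000, providers=80),  # 2,500:1
    CountyAccess("County B", population=30_000, providers=4),    # 7,500:1
    CountyAccess("County C", population=12_000, providers=0),    # no providers
]
print(flag_shortage(counties))  # ['County B', 'County C']
```

Keeping `max_ratio` as a parameter lets an analysis substitute whatever cutoff its methodology specifies without changing the density calculation.
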
- **Source:** https://projects.propublica.org/vital-signs/
- **API Documentation:** https://projects.propublica.org/api-docs/vital-signs/
- **Coverage:** 1,000,000+ healthcare providers across the United States
- **Data Included:**
  - Doctor biographical information and specialties
  - Medical school and residency training
  - Hospital affiliations and group practices
  - State medical board disciplinary actions
  - Medicare participation and payments
  - Malpractice claims and settlements
- **Access:** **API Key Required** (free; sign up at https://www.propublica.org/datastore/api/vital-signs-api)
- **Authentication:** Include as HTTP header: `X-API-Key: YOUR_API_KEY`
- **Rate Limits:** 5,000 requests per day
- **License:** Free for non-commercial and commercial use with attribution

**Use Cases:**
- Map dental care access and provider availability
- Link health policy discussions to provider networks
- Identify healthcare deserts and underserved areas
- Track quality metrics for oral health providers
- Correlate public health outcomes with provider density

**BibTeX:**

```bibtex
@misc{propublica_vital_signs,
  author = {{ProPublica}},
  title = {Vital Signs: Health Care Provider Data},
  year = {2024},
  url = {https://projects.propublica.org/vital-signs/},
  note = {Accessed: 2024}
}
```

---

### Every.org Charity API

**Organization:** Every.org (Public Benefit Corporation)

**What we use:** Human-readable mission statements, organization logos, and cause categories, with cleaner metadata than raw IRS filings.

- **API Documentation:** https://www.every.org/nonprofit-api
- **Coverage:** 1,000,000+ verified nonprofits
- **Data Included:**
  - Mission statements and descriptions
  - Organization logos and images
  - Cause tags (health, education, environment, etc.)
  - Social media links
- **Access:** API key required (free tier available)
- **License:** API Terms of Service

---

### Findhelp.org (Aunt Bertha)

**Organization:** Findhelp (formerly Aunt Bertha)

**What we use:** Comprehensive directory of local social services: specific programs, hours, eligibility requirements, and contact information.

- **Source:** https://www.findhelp.org/
- **Coverage:** 400,000+ community programs across the United States
- **Data Included:**
  - Program descriptions and services offered
  - Days/hours of operation
  - Eligibility requirements
  - Languages spoken
  - Insurance accepted
  - Contact information (phone, email, address)
- **Access:** Public search available, API access by request
- **Use Case:** Manual enrichment of ProPublica financial data with service delivery details

**Example:** https://www.findhelp.org/search?query=dental&location=Tuscaloosa,%20AL

---

### 211 Regional Directories

**What we use:** Regional social services directories with detailed program information, crisis hotlines, and local resources.

- **Source:** https://www.211.org/ (national network)
- **Example:** https://www.211connects.org (Alabama)
- **Coverage:** Local services in most U.S. cities and counties
- **Data Included:**
  - Specific services and programs
  - Hours of operation
  - Eligibility criteria
  - Languages and accessibility
- **Access:** Public search; some regions offer data partnerships
- **License:** Varies by region

---

### Association of Religion Data Archives (ARDA)

**Organization:** Pennsylvania State University

**What we use:** U.S. Congregational Life Survey and denominational data for understanding church characteristics, programs, and community services, including health ministries.

- **Source:** https://www.thearda.com/
- **Data Portal:** https://www.thearda.com/data-archive
- **U.S. Congregational Life Survey:** https://www.thearda.com/Archive/Files/Descriptions/USCONGLIFE.asp
- **Coverage:** 300,000+ congregations with detailed program data
- **License:** Free for research and non-commercial use

**Key Datasets:**

| Dataset | Coverage | Variables |
|---------|----------|-----------|
| **U.S. Congregations** | All denominations, 50 states | Congregation size, programs, community services |
| **Religious Congregations & Membership Study** | County-level data | Adherents, congregations by denomination |
| **National Congregations Study** | Representative sample of 1,200+ | Worship, programs, social services |

**What We Extract:**
- Congregation size and attendance
- Health ministry programs (dental, medical, mental health)
- Food programs and community meals
- Youth and senior programs
- Community outreach budget
- Social service partnerships

**Example Use Case:** Identify churches with active health ministries in Tuscaloosa, AL that provide free dental kits or health screenings, or that partner with mobile dental units.

**Citation:**

```bibtex
@misc{arda_congregations,
  author = {{Association of Religion Data Archives}},
  title = {U.S. Congregational Life Survey},
  year = {2024},
  publisher = {Pennsylvania State University},
  url = {https://www.thearda.com/},
  note = {Free for research use}
}
```

---

### Homeland Infrastructure Foundation-Level Data (HIFLD): Places of Worship

**Organization:** U.S. Department of Homeland Security (DHS)

**What we use:** Geospatial database of 350,000+ places of worship for mapping faith-based health service locations and identifying service gaps.

- **Source:** https://hifld-geoplatform.opendata.arcgis.com/
- **Dataset:** https://hifld-geoplatform.opendata.arcgis.com/datasets/places-of-worship
- **Format:** Shapefile, GeoJSON, CSV
- **Coverage:** 350,000+ churches, mosques, synagogues, temples
- **License:** Public Domain (U.S. Government)

**Fields Available:**
- Name of place of worship
- Address (street, city, state, ZIP)
- Latitude/Longitude (precise geolocation)
- Denomination
- Religious tradition
- Facility type

**Use Cases:**
1. **Map faith-based health providers** - Overlay churches with health ministries on city maps
2. **Identify service deserts** - Find areas underserved by both clinics and church programs
3. **Route mobile dental units** - Plan stops at large congregations
4. **Partnership outreach** - Locate churches near schools or clinics

**Citation:**
> "Homeland Infrastructure Foundation-Level Data (HIFLD): Places of Worship. U.S. Department of Homeland Security. https://hifld-geoplatform.opendata.arcgis.com/"

---

### National Congregations Study (NCS)

**Organization:** Duke University

**What we use:** Representative survey of U.S. congregations to understand social service provision, health programs, and civic engagement patterns.

- **Source:** https://sites.duke.edu/ncsweb/
- **Principal Investigator:** Mark Chaves, Duke University Divinity School
- **Coverage:** 1,200+ congregations (representative sample)
- **Waves:** 1998, 2006-07, 2012, 2018-19
- **License:** Free for academic and research use

**Key Findings:**
- **60% of congregations** provide social services (food, housing, health)
- **15% of congregations** have health-related programs
- **Large urban churches** (500+ attendees) are more likely to have formal health ministries
- **25% collaborate** with clinics, hospitals, or health departments

**Variables We Use:**

| Variable | Description | Relevance |
|----------|-------------|-----------|
| `HLTHPROG` | Has health-related program | Health ministry presence |
| `FOODPROG` | Operates food program | Nutrition education opportunity |
| `YOUTHPROG` | Youth programs | Reach children for dental education |
| `SENIORPROG` | Senior programs | Medicare enrollment help |
| `PARTNERSHIP` | Partners with nonprofits | Collaboration potential |

**Citation:**

```bibtex
@misc{ncs_2018,
  author = {Chaves, Mark and Anderson, Shawna},
  title = {National Congregations Study: Cumulative Dataset (1998, 2006-07, 2012, 2018-19)},
  year = {2020},
  publisher = {Duke University},
  url = {https://sites.duke.edu/ncsweb/},
  doi = {10.1093/soc/swaa029}
}
```

---

### Microsoft Common Data Model for Nonprofits

**Organization:** Microsoft Corporation

**What we use:** Nonprofit data standardization, constituent relationship management, donor tracking, and program outcome measurement.

- **Repository:** https://github.com/microsoft/Nonprofits/tree/master/CommonDataModelforNonprofits
- **ERD Documentation:** [common-data-model-for-nonprofits-erds.pdf](https://github.com/microsoft/Nonprofits/blob/master/CommonDataModelforNonprofits/Documents/common-data-model-for-nonprofits-erds.pdf)
- **License:** MIT License
- **Coverage:** Donor management, fundraising, program delivery, volunteer management, impact measurement

**Microsoft CDM Entities Implemented:**

| Microsoft CDM Entity | Our Entity | Description |
|---------------------|------------|-------------|
| Constituent | CONSTITUENT | Donors, volunteers, members, beneficiaries |
| Donation | DONATION | Financial contributions and in-kind gifts |
| Campaign | CAMPAIGN | Fundraising campaigns and appeals |
| Designation | DESIGNATION | Fund allocation (unrestricted, restricted, endowment) |
| Membership | MEMBERSHIP | Member enrollment and renewals |
| Volunteer Preference | VOLUNTEER_ACTIVITY | Volunteer hours and activities |
| Delivery Framework | PROGRAM_DELIVERY | Programs and services delivered |
| Objective | PROGRAM_OUTCOME | Measurable impact and KPIs |

**Integration Benefits:**
- ✅ Dynamics 365 Nonprofit compatibility
- ✅ Power Platform (Power BI, Power Apps, Power Automate)
- ✅ Azure Synapse Analytics
- ✅ Constituent 360 view

---

## ✅ Fact-Checking

**In this section:**
- [Google Fact Check Tools API](#google-fact-check-tools-api)
- [FactCheck.org](#factcheckorg)
- [PolitiFact](#politifact)

### Google Fact Check Tools API

**Organization:** Google LLC

**What we use:** Aggregated fact-checking data for verifying claims from meetings and legislation.

- **Source:** https://toolbox.google.com/factcheck/explorer
- **API:** https://developers.google.com/fact-check/tools/api
- **Schema:** https://developers.google.com/search/docs/appearance/structured-data/factcheck
- **Coverage:** 100+ fact-checking organizations worldwide
- **License:** Free API (10,000 queries/day quota)

### FactCheck.org

**Organization:** Annenberg Public Policy Center, University of Pennsylvania

**What we use:** Nonpartisan fact-checking of political claims and health policy verification.

- **Source:** https://www.factcheck.org/
- **Coverage:** National politics, health claims, science, viral content (2003-present)
- **License:** Free (web scraping allowed with rate limiting)

### PolitiFact

**Organization:** Poynter Institute (PolitiFact won the 2009 Pulitzer Prize for National Reporting)

**What we use:** State-level fact-checking and Truth-O-Meter ratings, including ratings of claims about ballot measures.

- **Source:** https://www.politifact.com/
- **Coverage:** All 50 states, federal politics (2007-present)
- **Rating Scale:** True, Mostly True, Half True, Mostly False, False, Pants on Fire
- **License:** Free (web scraping allowed with rate limiting)

---

## 💻 Civic Tech & Open Source

**In this section:**
- [Cloud & Data Platforms](#cloud--data-platforms)
- [Civic Tech Field Guide](#civic-tech-field-guide)
- [Code for America: Brigade Network](#code-for-america-brigade-network)
- [U.S. Digital Response (USDR)](#us-digital-response-usdr)
- [Digital Public Goods Alliance (DPGA)](#digital-public-goods-alliance-dpga)

### Cloud & Data Platforms

**Organization:** Microsoft Corporation / GitHub, Inc.

**What we use:** GitHub REST and GraphQL APIs for tracking civic tech projects, hackathons, contributors, and open source development.
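
To find repositories for the topics tracked in this section, you can use GitHub's repository search endpoint (`GET /search/repositories`), which accepts a `topic:` qualifier. The Python sketch below only builds the request URLs (one query per topic, most-starred first) and the headers; it sends nothing. The `GITHUB_TOKEN` environment variable name is an assumption. Search requests have their own per-minute rate limit, separate from the 5,000 requests/hour core limit. Each search also returns at most 1,000 results, so a broad topic may need narrower queries.

```python
import os
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"
CIVIC_TOPICS = [
    "civic-tech", "open-government", "government-transparency",
    "public-data", "open-data", "civic-engagement",
    "democracy", "accountability", "policy-analysis",
]


def topic_search_url(topic: str, per_page: int = 100) -> str:
    """Build a repository search URL for one topic, most-starred first."""
    params = {"q": f"topic:{topic}", "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{SEARCH_URL}?{urlencode(params)}"


def github_headers() -> dict[str, str]:
    """Headers recommended by the GitHub REST docs; a token raises rate limits."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    token = os.environ.get("GITHUB_TOKEN")  # hypothetical variable name
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


# First URL printed:
# https://api.github.com/search/repositories?q=topic%3Acivic-tech&sort=stars&order=desc&per_page=100
for topic in CIVIC_TOPICS:
    print(topic_search_url(topic))
```
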
- **Source:** https://docs.github.com/en/rest
- **GraphQL API:** https://docs.github.com/en/graphql
- **Rate Limits:** 5,000 requests/hour (authenticated)
- **License:** Free (API usage subject to GitHub Terms of Service)

**Data Extracted:**

| Dataset | Description | Fields Tracked |
|---------|-------------|----------------|
| `github_repositories` | Civic tech projects and repos | name, stars, forks, topics, language, license |
| `contributors` | Project maintainers and contributors | login, contributions, role, github_sponsor_enabled |
| `project_issues` | Good first issues, help wanted | labels, state, title, created_at |
| `project_funding` | GitHub Sponsors, OpenCollective | funding_type, sponsor_count, monthly_amount |
| `hackathon_projects` | Projects built at civic hackathons | hackathon_id, project_name, repo_url, demo_url |

**Civic Tech Topics Tracked:**

- `civic-tech`, `open-government`, `government-transparency`
- `public-data`, `open-data`, `civic-engagement`
- `democracy`, `accountability`, `policy-analysis`

**Why GitHub API:**

- **Discovery:** Find civic tech projects and open source tools
- **Collaboration:** Track contributors and maintainers
- **Opportunities:** Surface "good first issue" labels for new contributors
- **Funding:** Identify projects needing financial support
- **Hackathons:** Document projects built at civic hackathon events

**Implementation:**

```text
# Our platform uses:
- /civic_tech/github_repositories    # Project metadata
- /civic_tech/contributors           # Maintainer info
- /civic_tech/project_issues         # Contribution opportunities
- /civic_tech/project_funding        # Financial support
- /civic_tech/hackathon_projects     # Hackathon outputs
```

**Citation:**

```bibtex
@misc{github_api,
  author = {{GitHub, Inc.}},
  title = {GitHub REST API and GraphQL API},
  year = {2024},
  url = {https://docs.github.com/en/rest},
  note = {API for accessing repository data, issues, contributors, and project metadata}
}
```

---

### Civic Tech Field Guide
**Organization:** Compiler LA  
**What we use:** Curated directory of 1,000+ civic technology projects categorized by issue area and impact.

- **Source:** https://civictech.guide/
- **Dataset:** https://airtable.com/shr8yfQ5p3CJGMnCs/tblv0VlP8vVGIBYI6
- **Format:** CSV, Airtable API
- **License:** Open Database License (ODbL)

**Categories:**

- Democracy & Voting
- Environment & Climate
- Housing & Homelessness
- Criminal Justice
- Education
- Health & Safety
- Economic Justice
- Infrastructure

**Notable Projects Catalogued:**

- OpenBudget Oakland (Budget transparency)
- Food Oasis (Food access mapping)
- Health Equity Tracker (CDC data visualization)
- City Scrapers (Meeting minutes automation)
- Documenters Network (Public meeting coverage)

**Why Civic Tech Field Guide:**

- **Taxonomy:** Standardized categorization of civic tech projects
- **Discovery:** Find existing tools before building new ones
- **Inspiration:** Learn from successful civic tech implementations
- **Collaboration:** Connect with project maintainers

**Citation:**

```bibtex
@misc{civic_tech_field_guide,
  author = {{Compiler LA}},
  title = {Civic Tech Field Guide},
  year = {2024},
  url = {https://civictech.guide/},
  note = {Curated directory of 1,000+ civic technology projects}
}
```

---

### Code for America: Brigade Network

**Organization:** Code for America  
**What we use:** Brigade chapter locations, hackathon events, and civic tech projects built by local volunteer groups.
- **Source:** https://brigade.codeforamerica.org/
- **Brigades:** https://brigade.codeforamerica.org/brigades
- **Projects:** https://brigade.codeforamerica.org/projects
- **License:** Public information; project-specific licenses vary

**Brigade Network:**

- **80+ active brigades** across the United States
- Monthly civic hack nights and community meetups
- Annual **National Day of Civic Hacking**
- **CodeAcross** weekend hackathons

**Notable Brigade Projects:**

| Project | Brigade | Impact |
|---------|---------|--------|
| **OpenBudget Oakland** | Code for Oakland | Budget transparency & visualization |
| **Food Oasis** | Hack for LA | Map food resources (300+ locations) |
| **Health Equity Tracker** | Code for America | CDC health disparities data |
| **BallotNav** | National | Ballot drop-off location finder |
| **Documenters** | City Bureau (Chicago) | Public meeting coverage network |

**Hackathon Events Tracked:**

| Event | Frequency | Focus |
|-------|-----------|-------|
| **National Day of Civic Hacking** | Annual (June) | Nationwide simultaneous hackathons |
| **CodeAcross** | Annual (February) | Local government collaboration |
| **Monthly Hack Nights** | Monthly | Ongoing project development |

**Brigade Data in Our Platform:**

```text
# We track:
- /civic_tech/brigade_chapters          # 80+ locations with contact info
- /civic_tech/hackathons                # Events: CodeAcross, NDoCH
- /civic_tech/hackathon_projects        # Projects built at events
- /civic_tech/hackathon_participants    # Contributors and attendees
```

**Citation:**

```bibtex
@misc{code_for_america_brigade,
  author = {{Code for America}},
  title = {Brigade Network: Volunteer Civic Technology},
  year = {2024},
  url = {https://brigade.codeforamerica.org/},
  note = {80+ local volunteer groups building civic technology}
}
```

---

### U.S. Digital Response (USDR)

**Organization:** U.S. Digital Response  
**What we use:** Emergency civic tech projects and rapid-response open source tools for government needs.
- **Source:** https://www.usdigitalresponse.org/
- **Projects:** https://github.com/usdigitalresponse
- **License:** Varies by project (mostly MIT, Apache 2.0)

**Key Projects:**

| Project | Purpose | Tech Stack |
|---------|---------|------------|
| **grants-ingest** | Federal grant opportunity aggregation | Python, PostgreSQL |
| **usdr-gost** | Grant opportunity management system | TypeScript, React |
| **cpf-reporter** | Compliance reporting automation | Node.js |

**Focus Areas:**

- **COVID-19 Response:** Vaccine distribution, testing sites
- **Emergency Management:** Disaster response coordination
- **Grants & Funding:** Grant opportunity discovery
- **Government Modernization:** UI/UX improvements for government services

**Why USDR:**

- **Rapid Response:** Builds tools during emergencies
- **Open Source:** All code publicly available
- **Government Partnership:** Works directly with agencies
- **Reusable Tools:** Solutions applicable to multiple jurisdictions

**Citation:**

```bibtex
@misc{us_digital_response,
  author = {{U.S. Digital Response}},
  title = {Open Source Civic Technology for Emergency Response},
  year = {2024},
  url = {https://www.usdigitalresponse.org/},
  note = {Rapid-response civic tech projects for government needs}
}
```

---

### Digital Public Goods Alliance (DPGA)

**Organization:** United Nations Development Programme (UNDP), Norway, Sierra Leone, Germany  
**What we use:** Registry of 500+ Digital Public Goods (DPGs) certified as open source projects that advance the UN Sustainable Development Goals.

- **Source:** https://digitalpublicgoods.net/
- **Registry:** https://digitalpublicgoods.net/registry/
- **Standard:** https://digitalpublicgoods.net/standard/
- **License:** CC0 1.0 Universal (registry data)

**DPG Standard Requirements (selected indicators):**

1. ✅ **Open License:** OSI-approved, Creative Commons
2. ✅ **Open Source:** Public code repositories
3. ✅ **Documentation:** Clear usage instructions
4. ✅ **Privacy & Security:** Data protection mechanisms
5. ✅ **Standards:** Adheres to relevant standards
6. ✅ **SDG Alignment:** Supports UN Sustainable Development Goals

**Notable Digital Public Goods:**

| DPG | Category | Impact |
|-----|----------|--------|
| **OpenStreetMap** | Geographic data | Global collaborative mapping |
| **DHIS2** | Health information | Used in 100+ countries |
| **Open Food Network** | Food systems | Local food marketplace platform |
| **Ushahidi** | Crisis response | Crowdsourced incident reporting |
| **Khan Academy** | Education | Free online learning platform |

**Why DPGA:**

- **Certification:** Vetted open source projects
- **SDG Alignment:** Projects tied to development goals
- **Sustainability:** Focus on long-term viability
- **Global Impact:** International collaboration

**Our Use Case:**

```text
# We track DPG-certified civic tech projects:
- /civic_tech/github_repositories (dpg_certified = true)
- /civic_tech/project_metadata (sdg_goals = [...])
```

**Citation:**

```bibtex
@misc{digital_public_goods_alliance,
  author = {{Digital Public Goods Alliance}},
  title = {Digital Public Goods Registry},
  year = {2024},
  url = {https://digitalpublicgoods.net/},
  note = {500+ open source projects certified as Digital Public Goods}
}
```

---

## 🌟 Community Solutions & Use Cases

**In this section:**
- [Spectrum of Community Engagement to Ownership](#spectrum-of-community-engagement-to-ownership)
- [Harvard Ash Center: Data-Smart City Solutions (Archived)](#harvard-ash-center-data-smart-city-solutions-archived)
- [Brookings Institution: Data-Driven Policymaking](#brookings-institution-data-driven-policymaking)
- [Open Data Impact: Evidence-Based Research](#open-data-impact-evidence-based-research)
- [IATI Standard (International Aid Transparency Initiative)](#iati-standard-international-aid-transparency-initiative)

### Spectrum of Community Engagement to Ownership

**Organization:** Facilitating Power, Rosa González  
**What we use:** Framework for community-driven governance that maps to our data structure (nonprofits, jurisdictions, grants, officials).
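
Because the Spectrum's engagement levels (Inform, Consult, Involve, Collaborate, Defer to) are ordered, they can be stored as an ordinal scale, so that movement along the spectrum becomes a number a metric view can track over time. A minimal sketch; `EngagementLevel`, `parse_level`, and `levels_moved` are hypothetical names, not part of the framework or of our published schema, and the numbering follows the Engagement Levels list in this section.

```python
from enum import IntEnum


class EngagementLevel(IntEnum):
    """Ordered engagement levels: a higher value means more community ownership."""

    INFORM = 1
    CONSULT = 2
    INVOLVE = 3
    COLLABORATE = 4
    DEFER_TO = 5


def parse_level(label: str) -> EngagementLevel:
    """Map a label such as 'Defer to' or 'consult' to its enum member."""
    return EngagementLevel[label.strip().upper().replace(" ", "_")]


def levels_moved(before: str, after: str) -> int:
    """Positive when a process moved toward community ownership, negative when it moved away."""
    return int(parse_level(after) - parse_level(before))
```

For the Providence case study in this section, which moved from consulting to community-driven governance, `levels_moved("Consult", "Defer to")` returns `3`.
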
- **Source:** https://movementstrategy.org/
- **Framework:** https://movementstrategy.org/b/wp-content/uploads/2021/08/Spectrum-of-Community-Engagement-to-Ownership.pdf
- **Article:** "From Community Engagement to Ownership"
- **License:** Creative Commons (educational use)

**The Spectrum Framework: Four Key Sectors**

| Sector | Maps to Our Data | Community Role |
|--------|------------------|----------------|
| **Community-Based Organizations** | `/nonprofits` | Grassroots leadership, lived experience |
| **City/County Staff** | `/jurisdictions` | Government accountability, service delivery |
| **Philanthropic Partners** | `/grants` | Resource allocation, funding equity |
| **Facilitative Leaders** | `/officials` | Elected officials, decision makers |

**Engagement Levels:**

1. **Inform** → One-way communication
2. **Consult** → Gather input, government decides
3. **Involve** → Work together on solutions
4. **Collaborate** → Shared decision-making
5. **Defer to** → Community-driven governance

**Real-World Case Studies:**

**Providence, RI: Racial and Environmental Justice Committee**

- **Challenge:** Environmental hazards disproportionately affect communities of color
- **Our Data:** `/jurisdictions/demographics` + `/nonprofits/environmental_orgs` + `/meetings/public_hearings`
- **Outcome:** Moved from "consulting" to "community-driven"; residents now co-chair the committee
- **Metrics:** Tracked with `/analytics/metric_views`: meeting attendance, community proposals adopted

**Portland, OR: Equity Working Group**

- **Challenge:** Budget decisions lacked community input
- **Our Data:** `/budgets/city_budgets` + `/nonprofits/advocacy_orgs` + `/officials/city_council`
- **Outcome:** Participatory budgeting with community ownership
- **Metrics:** Tracked with `/analytics/dashboard_metrics`: community budget proposals, funding allocated

**How Our Platform Supports the Spectrum:**

- **Inform:** `/meetings/agendas` + `/documents` for transparency
- **Consult:** `/surveys` + `/factchecks` for informed input
- **Involve:** `/civic_tech/hackathons` + `/nonprofits/volunteer_activities`
- **Collaborate:** `/grants/participatory_budgeting` + `/legislation/co-creation`
- **Defer to:** `/analytics/community_impact_metrics`

**Citation:**

```bibtex
@misc{gonzalez_spectrum,
  author = {González, Rosa},
  title = {Spectrum of Community Engagement to Ownership},
  organization = {Facilitating Power},
  year = {2021},
  url = {https://movementstrategy.org/}
}
```

---

### Harvard Ash Center: Data-Smart City Solutions (Archived)

**Organization:** Harvard Kennedy School Ash Center for Democratic Governance and Innovation  
**What we use:** Research on how data engineering affects community outcomes; it informs our `/analytics/metric_views` templates.

- **Source:** https://ash.harvard.edu/
- **Note:** The Data-Smart City Solutions initiative is archived; these use cases are based on its historical civic data research
- **License:** Educational use

**Example Use Cases from Data-Smart Research:**

**Use Case 1: Youth Obesity Prevention (Austin, TX)**

**Problem:** Childhood obesity rates are 30% higher in low-income neighborhoods.

**Data Integration:**

```text
# Our platform combines:
- /jurisdictions/demographics    # BMI, income, age
- /nonprofits (NTEE K30)         # Food access programs
- /civic_tech/food_oasis         # Food desert mapping
- /meetings/school_board         # Nutrition policy discussions
```

**Outcome:**

- Identified 15 "food deserts" lacking fresh produce
- Partnered with 8 nonprofits to launch mobile markets
- School board approved healthier lunch standards

**Metrics We Track:**

- **Metric View:** `youth_nutrition_access`
- **KPIs:** Fresh food outlets per capita, school lunch quality scores, childhood obesity trends
- **Dashboard:** `/analytics/dashboard_metrics/health_equity`

---

**Use Case 2: College Readiness (Mesa Public Schools, AZ)**

**Problem:** 40% of students are off-track for college by 9th grade.

**Data Integration:**

```text
# Our platform combines:
- /school_districts/nces_data     # Enrollment, demographics
- /school_districts/budgets       # Per-pupil spending, program funding
- /analytics/date_dimension       # Time-series tracking
- /surveys/student_surveys        # Student engagement, aspirations
```

**Outcome:**

- Early warning system identifies at-risk students
- Targeted interventions (tutoring, mentorship)
- College enrollment increased 15%

**Metrics We Track:**

- **Metric View:** `college_readiness_pipeline`
- **KPIs:** On-track percentage, intervention effectiveness, college enrollment rates
- **Dashboard:** `/analytics/dashboard_metrics/education_outcomes`

---

**Our Use Case Template:**

For each community challenge, we provide:

1. **Problem Definition** → What data shows the issue
2. **Data Integration** → Which datasets to combine
3. **Analytics View** → Pre-built metric views
4. **Action Pathways** → Nonprofits, officials, meetings to engage
5. **Success Metrics** → How to measure impact

**Citation:**

```bibtex
@misc{harvard_datasmart_use_cases,
  author = {{Harvard Kennedy School Ash Center}},
  title = {Data-Smart City Solutions: Civic Data Use Cases},
  year = {2016},
  note = {Archived civic data research initiative},
  url = {https://ash.harvard.edu/}
}
```

---

### Brookings Institution: Data-Driven Policymaking

**Organization:** Brookings Institution, Center on Regulation and Markets  
**What we use:** The Data Academy model for turning "Open Data" into "Accessible Data"; it validates our `/domains` and `/standards` architecture.
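
Turning open data into accessible data starts with publishing datasets that tools and search engines can discover. Below is a minimal sketch of a schema.org `Dataset` description with a `DataDownload` distribution, the kind of machine-readable metadata `/standards/schema_org_jsonld` refers to. The function name and example URLs are hypothetical, and this is not necessarily how that export is generated.

```python
import json


def dataset_jsonld(
    name: str,
    description: str,
    landing_page: str,
    license_url: str,
    download_url: str,
    encoding_format: str = "text/csv",
) -> str:
    """Return a schema.org Dataset description serialized as JSON-LD."""
    document = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": landing_page,
        "license": license_url,
        # One DataDownload entry per file format the publisher offers.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    # Hypothetical example values for illustration only.
    print(dataset_jsonld(
        name="Monthly Water Usage (example)",
        description="Hypothetical monthly municipal water usage dataset.",
        landing_page="https://data.example.org/water-usage",
        license_url="https://creativecommons.org/publicdomain/zero/1.0/",
        download_url="https://data.example.org/water-usage.csv",
    ))
```

The resulting JSON can be embedded in a dataset's landing page inside a `<script type="application/ld+json">` element, which is how dataset search tools typically discover it.
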
- **Source:** https://www.brookings.edu/
- **Article:** "How Citizens and Local Governments Advance Data-Driven Policymaking"
- **License:** Public research

**The Data Academy Model:**

**Case Study: Tempe, AZ**

**Workflow:**

```
City Creates Dashboard  →  Residents Attend Data Academy  →  Data Informs Policy
(/standards)               (/meetings/trainings)             (/legislation)
```

**Our Platform Support:**

| Stage | City Action | Our Data | Resident Outcome |
|-------|-------------|----------|------------------|
| **1. Publish** | Open data portal | `/standards/schema_org_jsonld` | Machine-readable datasets |
| **2. Train** | Data Academy | `/meetings/trainings` | Residents learn SQL and Tableau |
| **3. Analyze** | Dashboard access | `/analytics/dashboard_metrics` | Community-driven insights |
| **4. Advocate** | Public testimony | `/meetings/public_hearings` | Data-backed proposals |
| **5. Legislate** | Policy adoption | `/legislation/local_ordinances` | Evidence-based laws |

**Example: Tempe Water Conservation Policy**

**Data Stack:**

- **Raw Data:** `/jurisdictions/budget_data` (water department spending)
- **Standards:** `/standards/ceds_aligned` (standardized metrics)
- **Training:** `/meetings/trainings` ("Water Data 101" workshop)
- **Analytics:** `/analytics/metric_views/water_usage_per_capita`
- **Outcome:** `/legislation` (new conservation ordinance passed)

**Residents Learned:**

- How to query public datasets
- How to create visualizations
- How to present findings to city council

---

**Case Study: Norfolk, VA - Flooding Resilience**

**Problem:** Sea level rise threatens low-income neighborhoods.

**Data Integration:**

```text
# Our platform combines:
- /jurisdictions/demographics     # Vulnerable populations
- /budgets/city_budgets           # Infrastructure spending
- /nonprofits (NTEE C)            # Environmental advocacy
- /meetings/public_hearings       # Community testimony
- /standards/schema_org_jsonld    # GeoJSON flood maps
```

**Data Academy Curriculum:**

1. **Week 1:** Understanding flood risk data
2. **Week 2:** Budget analysis (where does the money go?)
3. **Week 3:** Creating data visualizations
4. **Week 4:** Presenting to city council

**Outcome:**

- 50 residents trained
- Community-led flood resilience plan
- $10M infrastructure investment in vulnerable areas

---

**Why Data Academies Matter:**

**Traditional Model (Fails):**

- City: "Here's a 500-page PDF budget"
- Residents: *Can't understand it, disengage*

**Data Academy Model (Works):**

- City: "Here's open data + training"
- Residents: *Build skills, create analysis, influence policy*

**Our Role:**

1. **Standardize Data:** `/standards/popolo_exports` makes data interoperable
2. **Host Training Events:** `/meetings/trainings` tracks Data Academy schedules
3. **Provide Analytics:** `/analytics/metric_views` offers ready-to-use dashboards
4. **Connect Stakeholders:** `/nonprofits` + `/officials` + `/civic_tech` = collaboration

**Citation:**

```bibtex
@article{brookings_data_driven,
  author = {{Brookings Institution}},
  title = {How Citizens and Local Governments Advance Data-Driven Policymaking},
  journal = {Brookings Center on Regulation and Markets},
  year = {2023},
  url = {https://www.brookings.edu/}
}
```

---

### Open Data Impact: Evidence-Based Research

**Organization:** The GovLab at New York University (NYU Tandon School of Engineering)  
**What we use:** Evidence-based Open Data Impact (ODI) research, which validates our platform's approach and documents measurable outcomes from open data initiatives.
- **Source:** https://odimpact.org/
- **Key Findings Report:** https://odimpact.org/key-findings.html
- **Full Report:** https://odimpact.org/files/open-data-impact-key-findings.pdf
- **Funded by:** Omidyar Network
- **License:** Creative Commons Attribution-ShareAlike 4.0 International License

**Research Overview:**

**19 Global Case Studies** analyzing what works in open data:

- Sectoral and geographic representativeness
- First-hand interviews with stakeholders
- Measurable, tangible impact analysis
- Best practices and enabling conditions

**Economic Impact Estimates:**

- **McKinsey (2013):** $3 trillion per year in potential global value from open data
- **Omidyar Network Study:** $13 trillion in cumulative G20 output over 5 years

**Four Main Impact Dimensions:**

| Impact Type | Description | Our Platform Support |
|-------------|-------------|---------------------|
| **Improving Government** | Transparency, accountability, efficiency | `/jurisdictions/budgets` + `/meetings` + `/legislation` |
| **Empowering Citizens** | Informed decision-making, participation | `/analytics/dashboards` + `/surveys` + `/factchecks` |
| **Creating Opportunity** | Economic innovation, new businesses | `/civic_tech` + `/grants` + `/nonprofits` |
| **Solving Public Problems** | Data-driven solutions to complex issues | `/community_solutions` + `/metric_views` |

**Enabling Conditions for Success:**

**1. Supply-Side (Data Providers):**
- **Quality Data:** Accurate, timely, machine-readable
- **Our Implementation:** `/standards/schema_org_jsonld`, `/standards/popolo_exports`

**2. Demand-Side (Data Users):**
- **Capacity Building:** Skills to analyze and use data
- **Our Implementation:** `/meetings/trainings` (Data Academies), `/analytics/metric_views`

**3. Intermediaries:**
- **Data Translators:** Organizations bridging supply and demand
- **Our Implementation:** `/civic_tech/brigade_chapters`, `/nonprofits/advocacy_orgs`

**4. Ecosystem:**
- **Multi-Stakeholder Collaboration:** Government + Civic Tech + Nonprofits
- **Our Implementation:** `/community_solutions/stakeholder_mapping`

**Key Challenges Identified:**

| Challenge | ODI Findings | Our Mitigation Strategy |
|-----------|--------------|------------------------|
| **Data Quality** | Incomplete, outdated data | Automated ingestion + validation pipelines |
| **Technical Capacity** | Users lack analysis skills | Pre-built dashboards + metric views |
| **Sustainability** | Projects depend on grants | Open-source + reusable infrastructure |
| **Privacy Risks** | Potential for harm | Anonymization + ethical data standards |

**10 Recommendations for Next-Generation Open Data:**

1. **Focus on Demand, Not Just Supply** → We provide ready-to-use analytics
2. **Build User Capacity** → Data Academies tracked in `/meetings/trainings`
3. **Create Data Intermediaries** → Civic tech projects in `/civic_tech`
4. **Ensure Data Quality** → Standards compliance (`/standards`)
5. **Enable Interoperability** → OCD-ID, Popolo, Schema.org integration
6. **Measure Impact** → `/analytics/metric_views` + `/community_solutions/metrics`
7. **Sustain Engagement** → Open-source + HuggingFace hosting
8. **Mitigate Risks** → Privacy-first design, anonymization
9. **Foster Collaboration** → Multi-stakeholder `/community_solutions`
10. **Scale What Works** → Reusable templates + case studies

**How We Apply ODI Research:**

**Our Platform as Evidence-Based Open Data Infrastructure:**

- **Supply:** 90K+ jurisdictions, 3M+ nonprofits, 500K+ meetings → standardized datasets
- **Demand:** Pre-built dashboards, metric views, analytics → accessible to non-technical users
- **Intermediaries:** Civic tech projects, brigade chapters, nonprofits → data translators
- **Ecosystem:** Community solutions framework → multi-stakeholder collaboration

**Real-World Validation:**

ODI case studies demonstrate that open data works when:

1. ✅ **Data is standardized** → We use OCD-ID, Popolo, Schema.org
2. ✅ **Users have capacity** → We provide training + dashboards
3. ✅ **Intermediaries bridge gaps** → We integrate civic tech projects
4. ✅ **Impact is measured** → We track metrics + outcomes

**Example ODI Case Study Applied to Our Platform:**

**Chile's Budget Transparency (ODI Case Study):**

- **Problem:** Citizens couldn't understand government budgets
- **Solution:** Open budget data + visualization tools
- **Impact:** Increased public participation in the budget process

**Our Implementation:**

```text
# Replicating Chile's success:
- /jurisdictions/budget_data       # Open budget data (supply)
- /analytics/dashboard_metrics     # Budget visualizations (demand)
- /meetings/trainings              # Data literacy programs (capacity)
- /meetings/public_hearings        # Public participation (engagement)
- /community_solutions/metrics     # Budget impact tracking (measurement)
```

**Citation:**

```bibtex
@techreport{verhulst_open_data_impact,
  author = {Verhulst, Stefaan and Young, Andrew},
  title = {Open Data Impact: When Demand and Supply Meet - Key Findings of the Open Data Impact Case Studies},
  institution = {The GovLab, NYU Tandon School of Engineering},
  year = {2016},
  url = {https://odimpact.org/key-findings.html},
  note = {Supported by Omidyar Network. 19 global case studies.}
}
```

**Why This Matters for Our Platform:**

Open Data Impact provides **evidence-based validation** that our approach works:

- ✅ Combining **supply** (data) + **demand** (analytics) + **capacity** (training) = impact
- ✅ Multi-stakeholder collaboration drives success
- ✅ Standardization and quality are essential
- ✅ Impact must be measured and documented

Their research proves: **Open data alone isn't enough. You need the ecosystem we're building.**

---

### IATI Standard (International Aid Transparency Initiative)

**Organization:** IATI Secretariat  
**What we use:** An international development funding transparency framework that informs grant tracking, nonprofit program outcomes, and cross-sector collaboration metrics.

- **Source:** https://iatistandard.org/
- **Current Version:** IATI Standard v2.03
- **Specification:** https://iatistandard.org/en/iati-standard/203/
- **License:** Open Data Commons Attribution License (ODC-By)
- **Coverage:** 1,300+ publishers, $1+ trillion in development aid tracked
- **Used for:** Grant funding transparency, nonprofit program measurement, community solution tracking

**Why IATI in Community Solutions:**

IATI provides a proven framework for **tracking community impact across sectors**: government, nonprofits, foundations, and international partners.

**Citation:**

```bibtex
@misc{iati_standard,
  author = {{IATI Secretariat}},
  title = {IATI Standard Version 2.03},
  year = {2018},
  url = {https://iatistandard.org/},
  note = {Open Data Commons Attribution License (ODC-By)}
}
```

**Resources:**
- **Registry:** https://iatiregistry.org/
- **d-Portal:** https://d-portal.org/
- **Datastore:** https://iatidatastore.iatistandard.org/

---

## 🙏 Acknowledgments

We are grateful to the following organizations and individuals:

**Academic Institutions:**
- Association for Computational Linguistics (ACL) for MeetingBank
- Harvard University Mellon Urbanism Lab for LocalView
- Cornell University Roper Center for public opinion research
- MIT Election Data + Science Lab for election data
- University of Pennsylvania Annenberg Public Policy Center for FactCheck.org

**Civic Tech Community:**
- **GroundVue** - Partner organization inspiring community accountability work
- **Code for America** - Civic technology movement and brigade network
- **City Bureau** - Documenters Network and City Scrapers project
- **Council Data Project** - Open-source municipal data infrastructure
- **U.S. Digital Response** - Emergency civic technology support
- **Civic Tech Field Guide** - Community resource and project directory

**Standards Bodies:**
- W3C Community Group for Schema.org
- Open Civic Data for jurisdiction identifiers (OCDEP 2)
- Popolo Project for open government data standards
- IATI Secretariat for international aid transparency
- U.S. Department of Education for CEDS

**Enterprise Tech for Social Good:**
- **Microsoft** - Tech for Social Impact (Nonprofit CDM)
- **Google** - Data Commons (Knowledge Graph & Civic Data API)
- **AWS** - Open Data for Good (Registry best practices)
- **Databricks** - Databricks for Good (Unity Catalog, Delta Lake, MLflow, Agent Bricks)
- **Snowflake** - Snowflake for Good (Data Marketplace)
- **Oracle** - NetSuite Social Impact (Fund accounting models)
- **Salesforce** - Salesforce.org (Nonprofit Success Pack)
- **Cisco** - Crisis Response (Network resilience)
- **IBM** - Science for Social Good (AI use cases)
- **Meta** - Data for Good (Population mapping)

**Data Platforms & Organizations:**
- HuggingFace for dataset hosting
- ProPublica for nonprofit financial data (3M+ organizations), congressional voting records, campaign finance data, and healthcare provider information
- Open States for legislative data
- OHDSI for OMOP Common Data Model (vocabulary system)
- Every.org for charity metadata and mission statements
- Findhelp.org for local social services directory (400K+ programs)

**Government:**
- U.S. Census Bureau for demographic data
- National Center for Education Statistics (NCES)
- IRS for tax-exempt organization data
- CISA for .gov domain registry
- All municipal governments providing open access to meeting records

**Special Thanks:**
- All civic technologists building open government tools
- Municipal staff maintaining public meeting archives
- Journalists and community advocates holding power accountable

---

## 📖 How to Cite This Project

If you use **Open Navigator** in your research, please cite:

```
Open Navigator
GitHub: https://github.com/getcommunityone/open-navigator-for-engagement
License: MIT
```

**BibTeX:**

```bibtex
@software{open-navigator-2026,
  title = {Open Navigator},
  author = {{Community One}},
  year = {2026},
  url = {https://github.com/getcommunityone/open-navigator-for-engagement},
  license = {MIT}
}
```

---

## 📝 License Compliance

This project respects all dataset licenses and terms of use. See [LICENSE](https://github.com/getcommunityone/open-navigator-for-engagement/blob/main/LICENSE) for this project's MIT license. For dataset-specific licenses, please refer to the original sources listed above.