jcbowyer's picture
Clean HuggingFace deployment without binary files
61d29fc
---
sidebar_position: 2
sidebar_label: Data and Citations
---
# Data and Citations
:::tip[Why This Page Matters]
**All data used in Open Navigator is properly cited and attributed.** This page provides complete citations, licenses, BibTeX references, and links to original sources for academic research, government data, data sharing standards, and more.
**Use this page to:**
- βœ… Cite data sources in your research or publications
- βœ… Understand licensing and usage terms
- βœ… Find original dataset documentation
- βœ… Access API documentation and technical specs
:::
This page documents all data sources, standards, and research contributions used in **Open Navigator**. All datasets and specifications are properly attributed with citations, licenses, and usage notes.
## πŸ“‘ Quick Navigation
<div style={{display: 'grid', gridTemplateColumns: 'repeat(auto-fit, minmax(200px, 1fr))', gap: '15px', margin: '20px 0'}}>
<a href="#-academic-research" className="card" style={{textDecoration: 'none', padding: '15px', borderLeft: '4px solid #2196F3'}}>
<strong>πŸŽ“ Academic Research</strong><br/>
<span style={{fontSize: '0.9em', color: '#666'}}>MeetingBank, LocalView, CivicSearch, Datamuse API, Roper Center, CDP, City Scrapers</span>
</a>
<a href="#government-data" className="card" style={{textDecoration: 'none', padding: '15px', borderLeft: '4px solid #4CAF50'}}>
<strong>πŸ›οΈ Government Data</strong><br/>
<span style={{fontSize: '0.9em', color: '#666'}}>U.S. Census, IRS, Open States, LegiScan</span>
</a>
<a href="#data-sharing-standards" className="card" style={{textDecoration: 'none', padding: '15px', borderLeft: '4px solid #FF9800'}}>
<strong>🌐 Data Sharing Standards</strong><br/>
<span style={{fontSize: '0.9em', color: '#666'}}>OCD-ID, Popolo, Schema.org, CEDS, OMOP CDM</span>
</a>
<a href="#election--advocacy" className="card" style={{textDecoration: 'none', padding: '15px', borderLeft: '4px solid #9C27B0'}}>
<strong>πŸ—³οΈ Election & Advocacy</strong><br/>
<span style={{fontSize: '0.9em', color: '#666'}}>Ballotpedia, MIT Election Lab, OpenElections</span>
</a>
<a href="#nonprofit--philanthropy" className="card" style={{textDecoration: 'none', padding: '15px', borderLeft: '4px solid #F44336'}}>
<strong>🏒 Nonprofit & Philanthropy</strong><br/>
<span style={{fontSize: '0.9em', color: '#666'}}>IRS EO-BMF (1.9M+ orgs), Google BigQuery (5M+ Form 990s), GivingTuesday Data Lake (5.4M+ raw XMLs), ProPublica (Nonprofits, Congress, Campaign Finance, Vital Signs), Every.org, Findhelp, 211, Microsoft CDM, ARDA, HIFLD, NCS</span>
</a>
<a href="#-fact-checking" className="card" style={{textDecoration: 'none', padding: '15px', borderLeft: '4px solid #8BC34A'}}>
<strong>βœ… Fact-Checking</strong><br/>
<span style={{fontSize: '0.9em', color: '#666'}}>Google, PolitiFact, FactCheck.org</span>
</a>
<a href="#-civic-tech--open-source" className="card" style={{textDecoration: 'none', padding: '15px', borderLeft: '4px solid #673AB7'}}>
<strong>πŸ’» Civic Tech & Open Source</strong><br/>
<span style={{fontSize: '0.9em', color: '#666'}}>GitHub, Code for America, Hackathons, Microsoft, Google, AWS, Databricks, DPGA</span>
</a>
<a href="#-community-solutions--use-cases" className="card" style={{textDecoration: 'none', padding: '15px', borderLeft: '4px solid #FFEB3B'}}>
<strong>🌟 Community Solutions & Use Cases</strong><br/>
<span style={{fontSize: '0.9em', color: '#666'}}>Spectrum of Engagement, Harvard, Brookings, Open Data Impact, IATI</span>
</a>
<a href="#acknowledgments" className="card" style={{textDecoration: 'none', padding: '15px', borderLeft: '4px solid #607D8B'}}>
<strong>πŸ™ Acknowledgments</strong><br/>
<span style={{fontSize: '0.9em', color: '#666'}}>Organizations & individuals</span>
</a>
</div>
---
## πŸŽ“ Academic Research
**In this section:**
- [MeetingBank Dataset](#meetingbank-dataset)
- [LocalView Dataset (Harvard Dataverse)](#localview-dataset-harvard-dataverse)
- [Council Data Project (CDP)](#council-data-project-cdp)
- [City Scrapers / Documenters.org](#city-scrapers--documentersorg)
- [Roper Center for Public Opinion Research](#roper-center-for-public-opinion-research)
- [Harvard Dataverse](#harvard-dataverse)
- [CivicSearch (School Board Meeting Platform)](#civicsearch-school-board-meeting-platform)
- [Datamuse API (Word-Finding Engine)](#datamuse-api-word-finding-engine)
### MeetingBank Dataset
**What we use:** 1,366 city council meetings from 6 U.S. cities with transcripts and summaries for meeting discovery, transcript analysis, and summarization benchmarking.
**Citation:**
> Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu. "MeetingBank: A Benchmark Dataset for Meeting Summarization" In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), July 2023, Toronto, Canada.
**BibTeX:**
```bibtex
@inproceedings{hu-etal-2023-meetingbank,
title = "MeetingBank: A Benchmark Dataset for Meeting Summarization",
author = "Yebowen Hu and Tim Ganter and Hanieh Deilamsalehy and Franck Dernoncourt and Hassan Foroosh and Fei Liu",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)",
month = July,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
}
```
**Resources:**
- [πŸ“„ Paper](https://arxiv.org/abs/2305.17529)
- [πŸ’Ύ Dataset](https://huggingface.co/datasets/huuuyeah/meetingbank)
- [πŸ“¦ Zenodo](https://zenodo.org/record/7989108)
---
### LocalView Dataset (Harvard Dataverse)
**Organization:** Harvard University Mellon Urbanism Lab
**What we use:** 1,000+ municipalities with meeting videos and automated transcripts for large-scale civic data analysis.
- **Website:** https://www.localview.net/
- **Dataverse:** https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/NJTBEM
- **GitHub:** https://mellonurbanism.harvard.edu/localview
- **Coverage:** Thousands of U.S. jurisdictions with continuous data collection
- **Data Included:**
- Meeting videos (YouTube URLs)
- Automated transcripts via speech-to-text
- Metadata (meeting dates, agencies, agendas)
- Quality tracking per jurisdiction
- **License:** Research use (Harvard Dataverse)
- **Research-grade:** Designed for large-scale quantitative analysis
**BibTeX:**
```bibtex
@dataset{localview_2024,
author = {{Harvard Mellon Urbanism Lab}},
title = {LocalView: Municipal Meeting Videos and Transcripts},
year = {2024},
publisher = {Harvard Dataverse},
doi = {10.7910/DVN/NJTBEM},
url = {https://www.localview.net/}
}
```
---
### Council Data Project (CDP)
**Organization:** Open-source civic tech collaboration
**What we use:** 20+ cities with complete data pipelines - meeting transcripts, videos, voting records, and legislation tracking.
- **Website:** https://councildataproject.org/
- **GitHub:** https://github.com/CouncilDataProject
- **Coverage:** 20+ major U.S. cities with full infrastructure
- **Data Included:**
- Meeting transcripts (searchable, indexed)
- Video recordings with timestamps
- Voting records and roll calls
- Legislation text and tracking
- Councilmember information
- **Infrastructure:** Complete ETL pipelines for each city
- **License:** Open source (MIT)
- **API:** Per-city deployments (e.g., https://councildataproject.org/seattle)
**BibTeX:**
```bibtex
@software{council_data_project,
title = {Council Data Project},
author = {{Council Data Project Contributors}},
year = {2024},
url = {https://councildataproject.org/},
license = {MIT}
}
```
---
### City Scrapers / Documenters.org
**Organization:** Documenters Network (civic journalism collaboration)
**What we use:** 100-500 validated government agency URLs across 5 major cities for automated meeting discovery.
- **Website:** https://cityscrapers.org/
- **Documenters:** https://www.documenters.org/
- **GitHub:** https://github.com/City-Bureau
- **Coverage:** Chicago, Pittsburgh, Detroit, Cleveland, Los Angeles
- **Data Included:**
- Government agency URLs (start_urls from spider files)
- Granicus video page URLs with YouTube embeds
- Meeting event schemas
- Scraper patterns for common platforms
- **Cities Covered:**
- Chicago City Scrapers
- Pittsburgh City Scrapers
- Detroit Documenters
- Cleveland City Scrapers
- LA Metro Documenters
- **License:** Open source (MIT)
- **Use Case:** Pre-validated URLs for quality meeting discovery
**BibTeX:**
```bibtex
@software{city_scrapers,
title = {City Scrapers},
author = {{City Bureau and Documenters Network}},
year = {2024},
url = {https://cityscrapers.org/},
license = {MIT}
}
```
---
### Roper Center for Public Opinion Research
**Organization:** Cornell University
**What we use:** Scientifically validated survey questions and public opinion baselines for topic definitions and messaging optimization.
- **Source:** https://ropercenter.cornell.edu/
- **iPoll Database:** https://ropercenter.cornell.edu/ipoll/
- **Coverage:** 500,000+ survey questions from 1930s-present, all major polling organizations
- **License:** Free public search (metadata and question wording), full data requires institutional membership
- **Citation:** "Roper Center for Public Opinion Research, Cornell University. iPoll Databank. https://ropercenter.cornell.edu/ipoll/"
---
### Harvard Dataverse
**What we use:** Meeting datasets and civic engagement research.
- **Source:** https://dataverse.harvard.edu/
- **License:** Varies by dataset
- **Coverage:** Academic research datasets on local government, public meetings, civic participation
---
### CivicSearch (School Board Meeting Platform)
**Organization:** Datamuse, Inc.
**What we use:** Aggregated school board meeting transcripts, agendas, and videos for tracking education policy and local governance.
- **Website:** https://schools.civicsearch.org/
- **Platform:** Datamuse-powered civic search interface
- **Coverage:** School districts nationwide with meeting transcripts and videos
- **Data Included:**
- School board meeting transcripts (AI-indexed)
- Meeting agendas and minutes
- Video recordings (when available)
- Searchable text across multiple districts
- Meeting dates and attendance
- **Example:** [Tuscaloosa City Schools](https://schools.civicsearch.org/tuscaloosa-city-alabama)
- **License:** Free public access for search; bulk/API access requires case-by-case approval
- **Use Case:** Education policy tracking, school board decision analysis, parent/community engagement
**Access Tiers:**
- **Public Search:** Free access via web interface
- **Bulk Data/API:** Contact Datamuse for research or civic organization partnerships
- **Commercial Use:** Licensing required for commercial applications
**Data Privacy:**
- Public meeting transcripts are public record
- Datamuse indexing and presentation subject to their site terms
- No user-uploaded data sold to third parties
**Attribution Requirements:**
```
Data source: CivicSearch (Datamuse, Inc.)
https://schools.civicsearch.org/
School board meeting transcripts and agendas
```
**Terms of Service:**
- ❌ **No automated scraping** - Use official API when available
- βœ… **Attribution required** - Link back to CivicSearch for data used
- βœ… **Public record data** - Meeting transcripts are generally public domain
- ⚠️ **Bulk access** - Requires partnership agreement for large-scale data extraction
**BibTeX:**
```bibtex
@misc{civicsearch_datamuse,
author = {{Datamuse, Inc.}},
title = {CivicSearch: School Board Meeting Platform},
year = {2026},
url = {https://schools.civicsearch.org/},
note = {AI-indexed school board meeting transcripts and agendas}
}
```
**Contact for Data Partnerships:**
For bulk data access, API integration, or civic tech collaborations, reach out to Datamuse directly as a "civic technologist" or research organization. There is no standard commercial checkout - partnerships are handled case-by-case.
---
### Datamuse API (Word-Finding Engine)
**Organization:** Datamuse, Inc.
**What we use:** Natural language processing tools for text analysis, word associations, rhyme detection, and semantic search in meeting transcripts and policy documents.
- **API Documentation:** https://www.datamuse.com/api/
- **Developer Site:** https://www.datamuse.com/
- **Use Cases:** Dictionary apps, RhymeZone, word associations, semantic search
- **Coverage:** English language word relationships, definitions, pronunciations, usage frequency
- **License:** Free tier for most applications; paid tier for high-volume commercial use
**API Endpoints:**
- `/words` - Word finding based on constraints (rhymes, similar meaning, etc.)
- `/sug` - Word suggestions for autocomplete
- Query parameters for semantic relationships, phonetic matching, vocabulary
**Pricing Tiers:**
| Tier | Cost | Limits | Use Case |
|------|------|--------|----------|
| **Free** | $0 | 100,000 requests/day | Non-commercial, small commercial apps |
| **Professional** | Contact for pricing | Unlimited + support | High-volume commercial applications |
**Free Tier Details:**
- βœ… **100,000 requests per day** - Generous limit for most applications
- βœ… **Commercial use allowed** - Can use in commercial apps under daily limit
- βœ… **No API key required** - Simple HTTP GET requests
- βœ… **Fast response times** - Optimized for real-time applications
**Paid Tier (High-Volume):**
- Exceeding 100,000 requests/day requires paid tier
- Contact Datamuse for custom pricing and SLA
- Dedicated support and guaranteed uptime
**Attribution Requirements:**
- βœ… **Link to Datamuse:** Required (or strongly requested) for free tier users
- βœ… **Credit in documentation:** Mention "Powered by Datamuse API"
- Example: `<a href="https://www.datamuse.com/">Powered by Datamuse API</a>`
**Restrictions:**
- ❌ **No scraping of web interfaces** - Use official API, not web scraping
- ❌ **Rate limiting enforced** - Exceeding 100K/day will be throttled
- βœ… **Caching allowed** - Can cache results to reduce API calls
**Terms of Service:**
- Free tier subject to daily quota
- No sale of user-uploaded data
- Commercial use allowed within free tier limits
- Bulk/enterprise usage requires paid license
**Example API Call:**
```bash
# Find words that mean "government" and sound like "regime"
curl "https://api.datamuse.com/words?ml=government&sl=regime"
# Find words that rhyme with "policy"
curl "https://api.datamuse.com/words?rel_rhy=policy"
# Word associations for "civic engagement"
curl "https://api.datamuse.com/words?ml=civic+engagement&max=10"
```
**BibTeX:**
```bibtex
@misc{datamuse_api,
author = {{Datamuse, Inc.}},
title = {Datamuse API: Word-Finding Query Engine},
year = {2026},
url = {https://www.datamuse.com/api/},
note = {Free tier: 100,000 requests/day. Commercial use allowed.}
}
```
**Integration Use Cases:**
- **Meeting Transcript Analysis:** Identify policy-related terms and semantic relationships
- **Search Enhancement:** Improve search with synonym expansion and related terms
- **Topic Modeling:** Extract key themes from public comments and testimony
- **Accessibility:** Provide word suggestions for users with cognitive disabilities
- **Multilingual Support:** Word associations for translation assistance
**Datamuse.ai (Separate Product):**
Note: Datamuse.ai is a distinct SaaS product for natural language exploration:
- **Starter:** ~$29/month (100 queries/month)
- **Professional:** ~$99/month (unlimited queries + API access)
- **Free Trial:** Available for testing
This is separate from the word-finding API and has different pricing.
---
## πŸ›οΈ Government Data
**In this section:**
- [U.S. Census Bureau](#us-census-bureau)
- [IRS Tax-Exempt Organization Search (TEOS)](#irs-tax-exempt-organization-search-teos)
- [Open States API](#open-states-api)
- [LegiScan](#legiscan-)
### U.S. Census Bureau
**What we use:** Geographic boundaries, demographic data, population estimates, and economic indicators.
- **Source:** https://www.census.gov/
- **License:** Public Domain (U.S. Government)
- **Datasets:** Census Gazetteer, American Community Survey (ACS), Decennial Census
- **Coverage:** All 50 states, 3,144 counties, 19,000+ incorporated places
---
### IRS Exempt Organizations Business Master File (EO-BMF)
**Organization:** Internal Revenue Service (IRS), U.S. Department of Treasury
**What we use:** **PRIMARY BULK DATA SOURCE** for comprehensive nonprofit data - ALL 1.9M+ U.S. tax-exempt organizations with EIN, NTEE codes, financial data, subsection classification, and geographic location.
- **Source:** https://www.irs.gov/charities-non-profits/exempt-organizations-business-master-file-extract-eo-bmf
- **Search Tool:** https://www.irs.gov/charities-non-profits/tax-exempt-organization-search
- **Bulk Downloads:** https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-bulk-data-downloads
- **API Documentation:** See [IRS Bulk Data Integration](./irs-bulk-data.md)
- **Coverage:** 1,952,238 organizations (as of April 2026)
- **Churches & Religious Organizations:** 300,000+ (NTEE codes X, X20, X21, X22, X30, X40)
- **Health Organizations:** 80,000+ (NTEE codes E, E20-E99)
- **Human Services:** 200,000+ (NTEE codes P, P20-P99)
- **All Other Categories:** 1.3M+ (Education, Arts, Environment, etc.)
- **Update Frequency:** Monthly
- **License:** Public domain (U.S. government data)
- **Format:** CSV (regional files), convertible to Parquet
- **Record Count:** 1.9M+ total nonprofits across 4 regional files
**Data Fields (28 columns):**
- **Identification:** EIN, organization name, sort name
- **Location:** Street address, city, state, ZIP code
- **Classification:** NTEE code, subsection (501(c)(3), etc.), foundation code
- **Financial:** Asset amount, income amount, revenue amount
- **Status:** Tax-exempt status, deductibility status, ruling date
- **Organization:** Organization code, activity codes, group affiliation
**NTEE Codes for Churches:**
- **X** - Religion Related, Spiritual Development
- **X20** - Christian (churches, ministries)
- **X21** - Protestant
- **X22** - Roman Catholic
- **X30** - Jewish
- **X40** - Islamic
**Use Cases:**
- **Bulk Download:** Get ALL nonprofits in a state (e.g., 26,148 in Alabama vs 25 from ProPublica API)
- **Comprehensive Coverage:** 1,000x more data per request than API methods
- **Offline Analysis:** Download once, query locally forever (cached as Parquet)
- **NTEE Filtering:** Filter by category code (health, education, religion, etc.)
- **Geographic Analysis:** Complete state/city/ZIP coverage for spatial mapping
**BibTeX Citation:**
```bibtex
@misc{irs_eobmf_2026,
title = {Exempt Organizations Business Master File Extract (EO-BMF)},
author = {{Internal Revenue Service}},
year = {2026},
month = {April},
url = {https://www.irs.gov/charities-non-profits/exempt-organizations-business-master-file-extract-eo-bmf},
note = {Record count: 1,952,238 organizations. Updated monthly.}
}
```
**Integration:**
- **ProPublica API** complements with detailed Form 990 financials and mission statements
- **Every.org** adds human-readable descriptions and cause tags
- **IRS EO-BMF** provides the complete foundation layer with all organizations
**Complements:**
- **ARDA** for congregation characteristics and health ministry programs
- **HIFLD** for geospatial location data
- **National Congregations Study** for social service provision patterns
- **ProPublica API** for detailed financial breakdowns and executive compensation
---
### Open States / Plural Policy ⭐
**Organization:** Plural Policy (formerly Open States Foundation)
**What we use:** State and local legislative information - bulk downloads of bills, votes, legislators, and legislative sessions for all 50 states.
- **Website:** https://openstates.org/
- **API Documentation:** https://openstates.org/api/
- **Bulk Downloads:** https://open.pluralpolicy.com/data/ ⭐ **Recommended approach**
- **Scrapers Repository:** https://github.com/openstates/openstates-scrapers
- **Local Database Setup:** https://docs.openstates.org/contributing/local-database/
- **Code of Conduct:** https://docs.openstates.org/code-of-conduct/
- **Schema Documentation:** https://github.com/openstates/people/blob/master/schema.md
**Coverage:**
- **All 50 states** + DC + Puerto Rico
- **7,300+ state legislators** with committee assignments
- **Millions of bills** with full text, votes, and sponsors
- **Monthly PostgreSQL dumps** (9.8GB+) for complete local analysis
- **Video sources** (YouTube channels, Granicus portals)
**License:**
- **Bulk data:** Public Domain (preferred method)
- **API content:** Varies by state
- **API Key:** Free tier (50,000 requests/month)
**Bulk Data Formats:**
1. **CSV:** Complete legislative sessions per state
- URL: https://data.openstates.org/session/csv/
- Best for: Spreadsheet analysis, quick exploration
2. **JSON:** Bills with full text and metadata
- URL: https://data.openstates.org/session/json/
- Best for: Application integration, detailed parsing
3. **PostgreSQL:** Monthly database dumps
- URL: https://data.openstates.org/postgres/monthly/
- Best for: SQL analysis, local development, complete schema
- Size: 9.8GB+ (complete legislative database)
- No rate limits on bulk downloads
**What We Use:**
- PostgreSQL monthly dumps for local database (see `scripts/bulk_legislative_download.py`)
- CSV/JSON session data for specific state analysis
- Video source discovery (YouTube channels, Granicus portals)
- Legislator contact information and committee assignments
**Potential Contributions:**
Our project could contribute back to the OpenStates ecosystem:
- **Scraper patterns** for video sources and meeting archives
- **Meeting video discovery** to enhance their data
- **Granicus/YouTube integrations** for automated tracking
- We follow their [Code of Conduct](https://docs.openstates.org/code-of-conduct/) for all contributions
**Local Database Setup:**
We use the PostgreSQL dumps following their [local database documentation](https://docs.openstates.org/contributing/local-database/):
```bash
# Download monthly dump
python scripts/bulk_legislative_download.py --postgres --month 2026-04
# Restore to PostgreSQL
./scripts/setup_openstates_db.sh
```
**BibTeX:**
```bibtex
@software{openstates,
title = {Open States},
author = {{Plural Policy}},
year = {2024},
url = {https://openstates.org/},
note = {Comprehensive state legislative data for all 50 U.S. states}
}
```
### LegiScan ⭐
**Organization:** LegiScan
**What we use:** Comprehensive state legislative tracking with bill text, votes, people, and datasets for all 50 states.
- **Website:** https://legiscan.com/
- **API Documentation:** https://legiscan.com/legiscan
- **Dataset Downloads:** https://legiscan.com/datasets
- **People Database:** https://legiscan.com/legiscan/people
- **Bill Search:** https://legiscan.com/
**Coverage:**
- **All 50 states** + DC + U.S. Congress
- **Current and historical legislation** back to 2011
- **Bill text, sponsors, votes, amendments** with full tracking
- **370,000+ legislators** (current and historical)
- **Roll call votes** with individual legislator positions
- **Committee assignments** and hearing schedules
- **Fiscal notes** and impact statements
**Available Datasets:**
1. **National Dataset:** Complete legislative data for all states
- All bills, resolutions, and legislative documents
- Updated daily during legislative sessions
- Includes bill text, sponsors, status tracking
2. **State-Specific Datasets:** Per-state downloads
- Session-specific or multi-year data
- Optimized for state-level analysis
3. **People Dataset:** Legislator information
- Contact details, committee assignments
- District information and party affiliation
- Historical legislator records
4. **Roll Call Dataset:** Voting records
- Individual votes on bills and amendments
- Voting patterns and trends
- Committee and floor votes
**API Access:**
- **Free Tier:** 30,000 requests per month
- **API Key:** Required (free registration)
- **Bulk Downloads:** Available for subscribers
- **Real-time Updates:** Daily synchronization during sessions
**Data Format:**
- **JSON API** for programmatic access
- **CSV/Excel** exports for datasets
- **SQL dumps** available for subscribers
- **RSS feeds** for bill monitoring
**What We Use:**
- Bill text and status for legislative tracking (see `scripts/legislative_tracker.py`)
- Legislator contact information for advocacy features
- Roll call votes for voting pattern analysis
- Dataset downloads for bulk legislative analysis
**Comparison with Open States:**
- **LegiScan:** More detailed bill tracking, commercial support, datasets for download
- **Open States:** Free bulk PostgreSQL dumps, open source scrapers, community-driven
Both are complementary - we use Open States for bulk data and LegiScan for detailed bill tracking and datasets.
**License:**
- API data: Terms of Service apply (https://legiscan.com/legiscan)
- Datasets: Subscription required for bulk downloads
- API Key: Free tier available
**Use Cases:**
- Track legislation by keyword (e.g., "fluoridation", "oral health")
- Monitor bill progress across multiple states
- Analyze legislator voting patterns
- Build advocacy alerts and notifications
- Research legislative trends over time
**BibTeX:**
```bibtex
@misc{legiscan,
title = {LegiScan: State and Federal Legislative Tracking},
author = {{LegiScan}},
year = {2024},
url = {https://legiscan.com/},
note = {Comprehensive legislative data for all 50 U.S. states and Congress}
}
```
**Documentation:** https://legiscan.com/legiscan
**Support:** support@legiscan.com
**Terms:** https://legiscan.com/legiscan
---
## 🌐 Data Sharing Standards
**In this section:**
- [Open Civic Data (OCD) Standards](#open-civic-data-ocd-standards)
- [Popolo Project](#popolo-project)
- [Schema.org](#schemaorg)
- [Common Education Data Standards (CEDS)](#common-education-data-standards-ceds)
- [OMOP Common Data Model (OHDSI)](#omop-common-data-model-ohdsi)
### Open Civic Data (OCD) Standards
**What we use:** Standardized jurisdiction identifiers for cross-platform compatibility.
**Standard:** [OCDEP 2 - Division Identifiers](https://open-civic-data.readthedocs.io/en/latest/proposals/0002.html)
- **Repository:** https://github.com/opencivicdata/ocd-division-ids
- **License:** Open source
- **Format:** `ocd-division/country:us/state:al/place:birmingham`
- **Coverage:** All U.S. jurisdictions (cities, counties, states, school districts)
**Example Implementation:**
```
ocd-division/country:us/state:al # State
ocd-division/country:us/state:al/county:jefferson # County
ocd-division/country:us/state:al/place:birmingham # City
ocd-division/country:us/state:al/school_district:birmingham_city # School District
```
### Popolo Project
**What we use:** International open government data specification for people, organizations, and elected positions.
- **Specification:** https://www.popoloproject.com/
- **GitHub:** https://github.com/popolo-project/popolo-spec
- **License:** Creative Commons Attribution 4.0 International
- **Adoption:** Used by Civic Commons, OpenNorth, mySociety, Sunlight Foundation, and 30+ civic tech organizations worldwide
**Popolo Classes Implemented:**
| Popolo Class | Our Entity | Use Case |
|--------------|------------|----------|
| **Person** | LEADER | Elected officials, appointees |
| **Organization** | ORGANIZATION | Nonprofits, government agencies |
| **Membership** | LEADER ↔ ORGANIZATION | Relationships with roles and terms |
| **Post** | LEADER.position_type | Positions like "Mayor", "Council Member" |
| **VoteEvent** | VOTE | Voting records on motions/bills |
| **Motion** | AGENDA, LEGISLATION | Formal proposals |
| **Area** | JURISDICTION | Geographic/political boundaries |
| **Event** | MEETING | Public meetings with agendas |
<details>
<summary><strong>Popolo Dependencies (15 W3C/IETF Standards)</strong></summary>
| Standard | Prefix | Use Case |
|----------|--------|----------|
| [FOAF](http://xmlns.com/foaf/0.1/) | `foaf` | People, social networks |
| [vCard](https://www.rfc-editor.org/rfc/rfc6350.html) | `vcard` | Contact information (IETF RFC 6350) |
| [Schema.org](https://schema.org/) | `schema` | Structured web data |
| [DCMI Terms](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/) | `dcterms` | Metadata, provenance |
| [W3C Organization Ontology](https://www.w3.org/TR/vocab-org/) | `org` | Organizational structures |
| [ISA Location](https://www.w3.org/ns/locn) | `locn` | Addresses, geographic data |
| [GeoNames](http://www.geonames.org/ontology/) | `gn` | Geographic identifiers |
| [SKOS](https://www.w3.org/2004/02/skos/) | `skos` | Taxonomies, classification |
| [BIO](http://purl.org/vocab/bio/0.1/) | `bio` | Life events, relationships |
| [BIBFRAME](https://www.loc.gov/bibframe/) | `bf` | Bibliographic references |
| [W3C Contact](http://www.w3.org/2000/10/swap/pim/contact#) | `con` | Contact utility concepts |
| [NEPOMUK Calendar](http://www.semanticdesktop.org/ontologies/ncal/) | `ncal` | Events, meetings |
| [ISA Person](http://www.w3.org/ns/person) | `person` | Person attributes |
| [RDF Schema](https://www.w3.org/TR/rdf-schema/) | `rdfs` | Semantic web foundation |
| [ODRS](http://schema.theodi.org/odrs) | `odrs` | Data licensing |
</details>
### Schema.org
**Organization:** W3C Community Group (sponsors: Google, Microsoft, Yahoo, Yandex)
**What we use:** SEO-optimized structured data, JSON-LD exports, semantic web compatibility.
- **Source:** https://schema.org/
- **License:** Creative Commons Attribution-ShareAlike License (CC BY-SA 3.0)
- **Coverage:** 800+ types, 1,400+ properties
**Our Schema.org Type Mappings:**
| Our Entity | Schema.org Type | Use Case |
|------------|----------------|----------|
| JURISDICTION | [AdministrativeArea](https://schema.org/AdministrativeArea) | City/county pages |
| MEETING | [Event](https://schema.org/Event) | Google Calendar rich results |
| LEADER | [Person](https://schema.org/Person) + [GovernmentOfficial](https://schema.org/GovernmentOfficial) | Official profiles |
| ORGANIZATION | [Organization](https://schema.org/Organization) + [NGO](https://schema.org/NGO) | Nonprofit listings |
| LEGISLATION | [Legislation](https://schema.org/Legislation) | Bill tracking |
| BALLOT_MEASURE | [Legislation](https://schema.org/Legislation) | Ballot guides |
| VOTE | [VoteAction](https://schema.org/VoteAction) | Voting records |
| FACT_CHECK | [ClaimReview](https://schema.org/ClaimReview) | Google Fact Check Explorer |
| SCHOOL_DISTRICT | [EducationalOrganization](https://schema.org/EducationalOrganization) | School district info |
| VIDEO | [VideoObject](https://schema.org/VideoObject) | YouTube integration |
| DOCUMENT | [DigitalDocument](https://schema.org/DigitalDocument) | Document library |
| CONSTITUENT | [Person](https://schema.org/Person) | Donor/volunteer profiles |
| DONATION | [DonateAction](https://schema.org/DonateAction) | Donation receipts |
| CAMPAIGN | [FundingScheme](https://schema.org/FundingScheme) | Fundraising campaigns |
| PROGRAM_DELIVERY | [Service](https://schema.org/Service) | Program catalog |
**Benefits:**
- βœ… Google Search rich results
- βœ… Voice assistant compatibility (Alexa, Google Assistant)
- βœ… Knowledge Graph integration
- βœ… Cross-platform (Apple, Bing, Yandex)
### Common Education Data Standards (CEDS)
**Organization:** U.S. Department of Education, National Center for Education Statistics (NCES)
**What we use:** School district data modeling, NCES interoperability, education finance tracking.
- **Source:** https://ceds.ed.gov/
- **GitHub:** https://github.com/CEDStandards
- **License:** Public Domain (U.S. Government)
- **Coverage:** 2,300+ data elements, 500+ option sets
**CEDS Alignment:**
| Our Field | CEDS Element ID | CEDS Element Name |
|-----------|----------------|-------------------|
| `nces_id` | 000827 | LEA Identifier (NCES) |
| `district_name` | 000168 | Name of Institution |
| `total_students` | 001475 | Student Count |
| `total_revenue` | 000612 | Total Revenue |
| `per_pupil_spending` | 000613 | Expenditure per Student |
**Benefits:**
- βœ… NCES Common Core of Data (CCD) compatibility
- βœ… F-33 Finance Survey alignment
- βœ… Federal grant reporting (ESSA, Title I, IDEA)
---
### OMOP Common Data Model (OHDSI)
**Organization:** Observational Health Data Sciences and Informatics (OHDSI)
**What we use:** Vocabulary and terminology standardization system - CONCEPT, VOCABULARY, CONCEPT_CLASS, CONCEPT_RELATIONSHIP tables for consistent data classification.
- **Source:** https://ohdsi.github.io/CommonDataModel/
- **GitHub:** https://github.com/OHDSI/CommonDataModel
- **Vocabulary Documentation:** https://ohdsi.github.io/TheBookOfOhdsi/StandardizedVocabularies.html
- **License:** Apache License 2.0
- **Coverage:** Comprehensive vocabulary system for standardizing concepts across domains
**OMOP CDM Tables We Implement:**
| Table | Purpose | Our Use Case |
|-------|---------|-------------|
| **CONCEPT** | Master vocabulary list | Standardized codes for topics, demographics, classifications |
| **VOCABULARY** | Source vocabularies | Track origin of concepts (NTEE, FIPS, Schema.org, etc.) |
| **CONCEPT_CLASS** | Categorization | Group concepts by type (demographic, geographic, topic) |
| **CONCEPT_RELATIONSHIP** | Linkages | Map relationships between concepts (is-a, maps-to, subsumes) |
**Our OMOP-Inspired Vocabularies:**
| Vocabulary ID | Description | Concept Count |
|---------------|-------------|---------------|
| `NTEE` | National Taxonomy of Exempt Entities | 600+ |
| `FIPS` | Federal Information Processing Standards | 90,000+ |
| `Schema.org` | Structured data types | 800+ |
| `Popolo` | Open government data specs | 15+ |
| `OCD-ID` | Open Civic Data identifiers | 22,000+ |
| `CEDS` | Common Education Data Standards | 2,300+ |
| `CENSUS` | U.S. Census categories | 1,000+ |
| `INTERNAL` | Custom platform classifications | 500+ |
**Example Implementation:**
```sql
-- CONCEPT table: standardized demographics
concept_id | concept_name | vocabulary_id | concept_class_id
-----------|---------------------------|---------------|------------------
100001 | Race: White | CENSUS | Demographic
100002 | Race: Black/African Amer. | CENSUS | Demographic
100003 | Hispanic/Latino Ethnicity | CENSUS | Demographic
100004 | Gender: Male | CENSUS | Demographic
-- CONCEPT_RELATIONSHIP: hierarchies
concept_id_1 | concept_id_2 | relationship_id
-------------|--------------|----------------
100002 | 100000 | Is a # Race category
100003 | 100010 | Is a # Ethnicity category
```
**Benefits:**
- βœ… Consistent terminology across all datasets
- βœ… Hierarchical concept relationships
- βœ… Traceable concept provenance (source vocabularies)
- βœ… Industry-standard approach used by healthcare and research institutions
- βœ… Supports multiple classification systems simultaneously
**BibTeX:**
```bibtex
@misc{ohdsi_omop_cdm,
author = {{Observational Health Data Sciences and Informatics (OHDSI)}},
title = {OMOP Common Data Model},
year = {2024},
url = {https://ohdsi.github.io/CommonDataModel/},
license = {Apache-2.0}
}
```
---
## πŸ—³οΈ Election & Advocacy
**In this section:**
- [Ballotpedia](#ballotpedia)
- [MIT Election Data + Science Lab](#mit-election-data--science-lab)
- [OpenElections](#openelections)
### Ballotpedia
**Organization:** Lucy Burns Institute
**What we use:** Ballot measures, referendums, propositions for fluoridation tracking and health policy analysis.
- **Source:** https://ballotpedia.org/
- **API:** https://ballotpedia.org/API-documentation
- **Coverage:** All 50 states, historical measures back to 1990s
- **License:** API access limited at scale (paid tier available)
### MIT Election Data + Science Lab
**Organization:** Massachusetts Institute of Technology
**What we use:** County-level election results for political composition analysis.
- **Source:** https://electionlab.mit.edu/data
- **Repository:** https://github.com/MEDSL/official-returns
- **Coverage:** 1976-present, presidential/congressional/gubernatorial results
- **License:** Free for research and commercial use
### OpenElections
**What we use:** State-by-state certified election results in standardized CSV format.
- **Source:** https://openelections.net/
- **GitHub:** https://github.com/openelections
- **Coverage:** All 50 states (various completion levels), precinct-level data
- **License:** Open source (varies by state)
---
## 🏒 Nonprofit & Philanthropy
**In this section:**
- [IRS Exempt Organizations Business Master File (EO-BMF)](#irs-exempt-organizations-business-master-file-eo-bmf) - **PRIMARY BULK DATA SOURCE (1.9M+ orgs)**
- [Google BigQuery IRS 990 Data](#google-bigquery-irs-990-data) - **RECOMMENDED FOR BULK FORM 990 ENRICHMENT (5M+ filings)**
- [GivingTuesday 990 Data Infrastructure](#givingtuesday-990-data-infrastructure) - **AWS S3 DATA LAKE (5.4M+ raw Form 990 XMLs)**
- [ProPublica Nonprofit Explorer](#propublica-nonprofit-explorer)
- [ProPublica Congress API](#propublica-congress-api)
- [ProPublica Campaign Finance API](#propublica-campaign-finance-api)
- [ProPublica Vital Signs API](#propublica-vital-signs-api)
- [Every.org Charity API](#everyorg-charity-api)
- [Findhelp.org (Aunt Bertha)](#findhelporg-aunt-bertha)
- [211 Regional Directories](#211-regional-directories)
- [Association of Religion Data Archives (ARDA)](#association-of-religion-data-archives-arda)
- [Homeland Infrastructure Foundation-Level Data (HIFLD): Places of Worship](#homeland-infrastructure-foundation-level-data-hifld-places-of-worship)
- [National Congregations Study (NCS)](#national-congregations-study-ncs)
- [Microsoft Common Data Model for Nonprofits](#microsoft-common-data-model-for-nonprofits)
### IRS Exempt Organizations Business Master File (EO-BMF)
**Organization:** Internal Revenue Service (IRS), U.S. Department of Treasury
**What we use:** **PRIMARY BULK DATA SOURCE** for comprehensive nonprofit data - ALL 1.9M+ U.S. tax-exempt organizations with EIN, NTEE codes, financial data, subsection classification, and geographic location.
- **Source:** https://www.irs.gov/charities-non-profits/exempt-organizations-business-master-file-extract-eo-bmf
- **Search Tool:** https://www.irs.gov/charities-non-profits/tax-exempt-organization-search
- **Bulk Downloads:** https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-bulk-data-downloads
- **API Documentation:** See [IRS Bulk Data Integration](./irs-bulk-data.md)
- **Coverage:** 1,952,238 organizations (as of April 2026)
- **Churches & Religious Organizations:** 300,000+ (NTEE codes X, X20, X21, X22, X30, X40)
- **Health Organizations:** 80,000+ (NTEE codes E, E20-E99)
- **Human Services:** 200,000+ (NTEE codes P, P20-P99)
- **All Other Categories:** 1.3M+ (Education, Arts, Environment, etc.)
- **Update Frequency:** Monthly
- **License:** Public domain (U.S. government data)
- **Format:** CSV (regional files), convertible to Parquet
- **Record Count:** 1.9M+ total nonprofits across 4 regional files
**Data Fields (28 columns):**
- **Identification:** EIN, organization name, sort name
- **Location:** Street address, city, state, ZIP code
- **Classification:** NTEE code, subsection (501(c)(3), etc.), foundation code
- **Financial:** Asset amount, income amount, revenue amount
- **Status:** Tax-exempt status, deductibility status, ruling date
- **Organization:** Organization code, activity codes, group affiliation
**NTEE Codes for Churches:**
- **X** - Religion Related, Spiritual Development
- **X20** - Christian (churches, ministries)
- **X21** - Protestant
- **X22** - Roman Catholic
- **X30** - Jewish
- **X40** - Islamic
**Use Cases:**
- **Bulk Download:** Get ALL nonprofits in a state (e.g., 26,148 in Alabama vs 25 from ProPublica API)
- **Comprehensive Coverage:** 1,000x more data per request than API methods
- **Offline Analysis:** Download once, query locally forever (cached as Parquet)
- **NTEE Filtering:** Filter by category code (health, education, religion, etc.)
- **Geographic Analysis:** Complete state/city/ZIP coverage for spatial mapping
**BibTeX Citation:**
```bibtex
@misc{irs_eobmf_2026,
title = {Exempt Organizations Business Master File Extract (EO-BMF)},
author = {{Internal Revenue Service}},
year = {2026},
month = {April},
url = {https://www.irs.gov/charities-non-profits/exempt-organizations-business-master-file-extract-eo-bmf},
note = {Record count: 1,952,238 organizations. Updated monthly.}
}
```
**Integration:**
- **ProPublica API** complements with detailed Form 990 financials and mission statements
- **Every.org** adds human-readable descriptions and cause tags
- **IRS EO-BMF** provides the complete foundation layer with all organizations
**Complements:**
- **ARDA** for congregation characteristics and health ministry programs
- **HIFLD** for geospatial location data
- **National Congregations Study** for social service provision patterns
- **ProPublica API** for detailed financial breakdowns and executive compensation
---
### Google BigQuery IRS 990 Data
**Organization:** Google Cloud Platform (IRS data mirrored by Google)
**What we use:** **RECOMMENDED FOR BULK FORM 990 ENRICHMENT** - SQL-queryable IRS Form 990 electronic filings with detailed financial data, mission statements, and program descriptions.
- **Source:** https://console.cloud.google.com/marketplace/product/internal-revenue-service/irs-990
- **Documentation:** https://cloud.google.com/bigquery/docs/irs-990-dataset
- **BigQuery Dataset:** `bigquery-public-data.irs_990`
- **Coverage:** 5,000,000+ Form 990 electronic filings (2011-present)
- **Update Frequency:** Annually (updated when IRS publishes new data)
- **License:** Public domain (U.S. government data, hosted by Google)
- **Format:** SQL-queryable tables in BigQuery
- **Cost:** Free tier includes 1 TB of queries per month
**Available Tables:**
- **`irs_990.irs_990_2013` - `irs_990.irs_990_2024`** - Individual years (2013-2024)
- **`irs_990.irs_990_ein`** - All filings aggregated by EIN
- **`irs_990.irs_990_pf_2013` - `irs_990.irs_990_pf_2024`** - Private foundation filings (Form 990-PF)
**Data Fields (100+ columns):**
- **Identification:** EIN, organization name, tax year
- **Financials:**
- Total revenue, contributions, program service revenue, investment income
- Total expenses, program expenses, management expenses, fundraising expenses
- Total assets, total liabilities, net assets
- Grants paid, grants received
- **Mission & Programs:**
- Mission description (text field)
- Program service accomplishments (up to 10 programs with descriptions)
- Program service expenses per program
- **Governance:**
- Number of voting members, independent members
- Officer and director compensation
- Key employee information
- **Activities:**
- Legislative activities, political expenditures, lobbying
- Foreign operations, foreign grants
- Website URL
- **Compliance:**
- Public inspection policies
- Conflict of interest policies
- Whistleblower policies
**Key Advantages:**
- **Serverless SQL:** Query 5M+ records without downloading files
- **Mission Extraction:** Get mission statements and program descriptions in bulk
- **Website URLs:** Extract organization websites (not in EO-BMF)
- **Historical Data:** 10+ years of financial trends per organization
- **Scalable:** Process thousands of nonprofits in a single query
- **No API Rate Limits:** Unlike ProPublica's 25-record limit
**Example Use Cases:**
- **Bulk Mission Enrichment:** Extract mission statements for all health nonprofits in Alabama
- **Website Discovery:** Get organization websites for outreach campaigns
- **Financial Trend Analysis:** Track revenue/expense trends over 10 years
- **Program Service Analysis:** Identify nonprofits by specific program keywords
- **Grant Analysis:** Find organizations that award grants vs. receive grants
**Setup Requirements:**
1. Create a Google Cloud project
2. Enable BigQuery API
3. Authenticate:
```bash
# Option A: Application default credentials
gcloud auth application-default login
# Option B: Service account key
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
```
**Example Query (Extract Alabama Health Nonprofits with Missions):**
```sql
SELECT
ein,
organization_name,
tax_year,
mission_description,
website,
total_revenue,
total_expenses,
program_service_expenses
FROM `bigquery-public-data.irs_990.irs_990_2023`
WHERE state = 'AL'
AND mission_description LIKE '%health%'
AND total_revenue > 100000
ORDER BY total_revenue DESC
LIMIT 1000
```
**BibTeX Citation:**
```bibtex
@misc{google_bigquery_irs990,
title = {IRS 990 Dataset},
author = {{Google Cloud Platform} and {Internal Revenue Service}},
year = {2024},
url = {https://console.cloud.google.com/marketplace/product/internal-revenue-service/irs-990},
note = {BigQuery public dataset: bigquery-public-data.irs\_990. Coverage: 5M+ Form 990 electronic filings (2011-present)}
}
```
**Integration:**
- **IRS EO-BMF** provides the complete organization registry (1.9M+ orgs)
- **Google BigQuery** enriches with mission statements, websites, and detailed financials
- **ProPublica API** adds executive compensation and recent filing details
- **GivingTuesday Data Lake** provides raw XML for custom field extraction
**Complements:**
- See [Form 990 XML Data (GivingTuesday Data Lake)](./form-990-xml.md) for alternative bulk download approach
- See [IRS Bulk Data Integration](./irs-bulk-data.md) for EO-BMF foundation layer
**Cost Estimates:**
- **Free tier:** 1 TB queries/month = ~2-4 million nonprofit records
- **Beyond free tier:** $5 per TB after first 1 TB
- **Example:** Enriching 100,000 nonprofits with missions = ~20 GB = **Free**
---
### GivingTuesday 990 Data Infrastructure
**Organization:** GivingTuesday
**What we use:** Raw Form 990 XML filings from AWS S3 for detailed financial data extraction, custom field parsing, and comprehensive nonprofit analysis.
- **Website:** https://990data.givingtuesday.org/
- **Data Lake:** `s3://gt990datalake-rawdata` (AWS S3, us-east-1 Virginia, Public Access)
- **Console:** https://us-east-1.console.aws.amazon.com/s3/buckets/gt990datalake-rawdata
- **Coverage:** 5.4M+ e-filed Form 990s (2011-present, ~300K new filings/year)
- **Scale:** ~10 TB of raw XML data
- **Update Frequency:** Ongoing (as IRS publishes new e-filings)
- **License:** Public domain (IRS data) + Open source tools
- **Access:** Free, no AWS credentials required (anonymous access via `--no-sign-request`)
- **Format:** XML files (1-2 MB each) + CSV/Parquet indices
**Data Lake Structure:**
```
s3://gt990datalake-rawdata/
β”œβ”€β”€ EfileData/
β”‚ β”œβ”€β”€ XmlFiles/ # Individual 990 XMLs (~5.4M files, ~10 TB)
β”‚ β”‚ └── [OBJECT_ID]_public.xml (e.g., 202233259349300703_public.xml)
β”‚ └── XmlZips/ # ZIP archives (97 files, ~38 GB β†’ ~95 GB uncompressed)
β”‚ └── YYYY_TEOS_XML_*.zip (e.g., 2023_TEOS_XML_01A.zip ~400 MB)
└── Indices/
└── 990xmls/ # CSV indices with metadata
└── index_all_years_efiledata_xmls_created_on_2023-10-29.csv (~925 MB)
```
**Download Strategies:**
| Approach | Best For | Time | Bandwidth | Storage |
|----------|----------|------|-----------|---------|
| **Individual XMLs** | Single state or targeted | ~2 hrs (22K orgs) | 32 GB | 32 GB |
| **ZIP Archives** | All states / nationwide | ~6 hrs total | 38 GB | 95 GB |
**Choose Individual XMLs when:**
- You need data for 1-5 states only
- You want to download only specific EINs
- Storage space is limited
- You want incremental caching
**Choose ZIP Archives when:**
- You need all 50 states
- You're building a comprehensive database
- You have 100+ GB storage
- You want offline access to all filings
**What You Can Extract:**
- **Financials:** Revenue, expenses, assets, liabilities, net income, grants paid/received
- **Programs:** Detailed program descriptions, accomplishments, expenses per program (up to 10)
- **Governance:** Officer compensation, board members, key employees (with names and titles)
- **Activities:** Legislative activities, lobbying expenses, political contributions
- **Mission:** Organization mission statement and activity descriptions
- **Website:** Organization website URLs
- **Grants:** List of grant recipients with amounts (for grantmaking organizations)
- **Custom Fields:** Any field in the IRS Form 990 schema (990, 990-EZ, 990-PF)
**S3 Access Examples:**
**Individual XMLs (for single state or targeted download):**
```bash
# List index files (no credentials needed)
aws s3 ls s3://gt990datalake-rawdata/Indices/990xmls/ --no-sign-request
# Download index (~925 MB)
aws s3 cp s3://gt990datalake-rawdata/Indices/990xmls/index_all_years_efiledata_xmls_created_on_2023-10-29.csv . --no-sign-request
# Download specific XML
aws s3 cp s3://gt990datalake-rawdata/EfileData/XmlFiles/202233259349300703_public.xml . --no-sign-request
# Batch download for single state (using our script)
python scripts/batch_download_990s.py --state MA --health-only --concurrent 1000
```
**ZIP Archives (for all states / nationwide):**
```bash
# Download all 97 ZIPs (~38 GB) to local directory
./scripts/download_990_zips.sh
# Extract all ZIPs to get ~384K XMLs (~95 GB)
./scripts/extract_990_zips.sh
# Build local index for fast lookup
python scripts/build_990_local_index.py
# Now enrich from local files (no network needed!)
python scripts/enrich_all_states_990.py
```
**Python Access:**
```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config
# Configure anonymous S3 client
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
# Download individual XML
xml_obj = s3.get_object(
Bucket='gt990datalake-rawdata',
Key='EfileData/XmlFiles/202233259349300703_public.xml'
)
xml_content = xml_obj['Body'].read()
# Download ZIP
zip_obj = s3.get_object(
Bucket='gt990datalake-rawdata',
Key='EfileData/XmlZips/2023_TEOS_XML_01A.zip'
)
zip_content = zip_obj['Body'].read()
```
**Index Schema:**
The CSV index contains: `EIN`, `TaxPeriod`, `ObjectId`, `URL`, `FormType`, `OrganizationName`, `DLN`, `SubmittedOn`
**Key Advantages:**
- **Raw XML Access:** Extract ANY field from Form 990, including custom/rare fields
- **No Query Costs:** Download once, parse locally (unlike BigQuery queries)
- **Offline Processing:** Process on your own infrastructure without rate limits
- **Complete Historical Data:** All e-filed 990s since 2011
- **Batch Downloads:** Download thousands of XMLs in parallel
- **No Authentication:** Public S3 bucket (no AWS account needed)
**Use Cases:**
- **Custom Field Extraction:** Parse fields not available in BigQuery (e.g., specific schedules)
- **Bulk Enrichment:** Download and process thousands of nonprofits locally
- **Offline Analysis:** Build your own database from raw XML
- **Historical Trends:** Analyze 10+ years of financial data
- **Grant Research:** Extract detailed grant recipient lists from Form 990 Schedule I
**BibTeX Citation:**
```bibtex
@misc{givingtuesday990data,
title = {GivingTuesday 990 Data Infrastructure},
author = {{GivingTuesday}},
year = {2023},
url = {https://990data.givingtuesday.org/},
note = {AWS S3 data lake of IRS Form 990 XML filings. Bucket: s3://gt990datalake-rawdata. Coverage: 5.4M+ filings (2011-present)}
}
```
**Integration:**
- **IRS EO-BMF** provides the complete organization registry (1.9M+ orgs)
- **GivingTuesday Data Lake** enriches with raw XML for custom parsing
- **Google BigQuery** offers SQL interface for standard fields
- **ProPublica API** adds web-friendly access for individual lookups
**Complements:**
- See [Form 990 XML Data (GivingTuesday Data Lake)](./form-990-xml.md) for detailed integration guide
- See [Form 990 Enrichment Guide](../guides/form-990-enrichment.md) for usage examples
- See [IRS Bulk Data Integration](./irs-bulk-data.md) for EO-BMF foundation layer
**Attribution:**
When publishing analyses using this data, please cite:
1. GivingTuesday 990 Data Infrastructure: https://990data.givingtuesday.org/
2. Our enrichment tools: https://github.com/getcommunityone/open-navigator-for-engagement
---
### ProPublica Nonprofit Explorer
**Organization:** ProPublica, Inc.
**What we use:** Enhanced financial data and detailed Form 990 filings to complement IRS EO-BMF bulk data.
- **Source:** https://projects.propublica.org/nonprofits/
- **API Documentation:** https://projects.propublica.org/nonprofits/api
- **Coverage:** 3,000,000+ organizations, 10+ years of historical data
- **Data Included:**
- Total revenue, expenses, assets, liabilities
- Executive compensation (top 5 highest paid)
- Program service expenses vs. administrative overhead
- NTEE classification codes (National Taxonomy of Exempt Entities)
- EIN (Employer Identification Number) for verification
- **Rate Limits:** Free, unlimited access (respectful use recommended: ~1 req/sec)
- **API Limitation:** Returns max 25 results per request, no pagination (use IRS EO-BMF for bulk downloads)
- **License:** Free for research and commercial use
**BibTeX:**
```bibtex
@misc{propublica_nonprofits,
author = {{ProPublica}},
title = {Nonprofit Explorer},
year = {2024},
url = {https://projects.propublica.org/nonprofits/},
note = {Accessed: 2024}
}
```
---
### ProPublica Congress API
**Organization:** ProPublica, Inc.
**What we use:** Legislative data including roll-call votes, member information, bills, and congressional activity to link policy decisions to government meetings.
- **Source:** https://projects.propublica.org/api-docs/congress-api/
- **API Documentation:** https://projects.propublica.org/api-docs/congress-api/
- **Coverage:** U.S. Congress data from 102nd Congress (1991) to present
- **Data Included:**
- Roll-call votes by member and bill
- Bill information, status, and amendments
- Member biographical data and voting records
- Committee assignments and leadership
- Congressional statements and floor appearances
- **Access:** **API Key Required** (Free - sign up at https://www.propublica.org/datastore/api/propublica-congress-api)
- **Authentication:** Include as HTTP header: `X-API-Key: YOUR_API_KEY`
- **Rate Limits:** 5,000 requests per day
- **License:** Free for non-commercial and commercial use with attribution
**Use Cases:**
- Link local government meetings to federal legislation
- Track how elected officials vote on issues discussed locally
- Correlate campaign contributions with voting patterns
**BibTeX:**
```bibtex
@misc{propublica_congress,
author = {{ProPublica}},
title = {Congress API},
year = {2024},
url = {https://projects.propublica.org/api-docs/congress-api/},
note = {Accessed: 2024}
}
```
---
### Federal Election Commission (FEC) - Bulk Data & OpenFEC API
**Organization:** Federal Election Commission (FEC), U.S. Government
**What we use:** **PRIMARY SOURCE** for campaign finance data - individual contributions, candidate filings, committee data, and political expenditures for comprehensive campaign finance analysis.
- **OpenFEC API:** https://api.open.fec.gov/developers/
- **Bulk Data Portal:** https://www.fec.gov/data/browse-data/?tab=bulk-data
- **Documentation:** https://www.fec.gov/campaign-finance-data/
- **Coverage:** Complete FEC data from 1980s to present (updated nightly)
- **Data Included:**
- **Individual contributions** $200+ (Schedule A)
- **Operating expenditures** (Schedule B)
- **Candidate master files** (House, Senate, Presidential)
- **Committee master files** (PACs, Super PACs, party committees)
- **Campaign finance totals** by election cycle
- **Independent expenditures** and electioneering communications
- **Access Methods:**
- **Bulk Downloads:** Free, unlimited, no API key (CSV and FEC format)
- **OpenFEC API:** Free with API key (1,000 requests/hour)
- **Demo Key:** 30 requests/hour (no registration)
- **API Key:** Free at https://api.data.gov/signup/
- **License:** Public Domain (U.S. Government)
- **Update Frequency:** Nightly (most datasets)
**Use Cases:**
- Map donor networks and political influence patterns
- Link nonprofit leadership donations to policy decisions
- Track campaign finance in health advocacy organizations
- Analyze funding sources for ballot initiatives
- Cross-reference contributions with government grant awards
- "Follow the money" from donor to policy outcome
**Critical Policy Restriction:**
- ⚠️ **Cannot use contributor data for commercial solicitation or fundraising**
- FEC data is for transparency and research, not marketing
**BibTeX:**
```bibtex
@misc{fec_data_2024,
author = {{Federal Election Commission}},
title = {Campaign Finance Data and Bulk Downloads},
year = {2024},
url = {https://www.fec.gov/data/},
note = {Updated nightly. Accessed: 2024}
}
@misc{openfec_api_2024,
author = {{Federal Election Commission}},
title = {OpenFEC API},
year = {2024},
url = {https://api.open.fec.gov/developers/},
note = {RESTful API for campaign finance data. Accessed: 2024}
}
```
**Integration:** `discovery/fec_integration.py`
---
### ProPublica Campaign Finance API
**Organization:** ProPublica, Inc.
**What we use:** Simplified access to FEC data with pre-aggregated summaries and top donor analysis (complements direct FEC data access).
- **Source:** https://projects.propublica.org/api-docs/campaign-finance/
- **API Documentation:** https://projects.propublica.org/api-docs/campaign-finance/
- **Coverage:** FEC data from 2000 election cycle to present
- **Data Included:**
- Candidate financial summaries and filings
- Committee information and contributions
- Individual and organizational donor data
- Independent expenditures and disbursements
- Top donors by industry and geography
- **Access:** **API Key Required** (Free - sign up at https://www.propublica.org/datastore/api/campaign-finance-api)
- **Authentication:** Include as HTTP header: `X-API-Key: YOUR_API_KEY`
- **Rate Limits:** 5,000 requests per day
- **License:** Free for non-commercial and commercial use with attribution
**Note:** ProPublica API provides easier-to-use summaries of FEC data. For bulk analysis, use FEC Bulk Downloads directly.
**Use Cases:**
- Quick lookups of candidate finance summaries
- Pre-aggregated top donor analysis
- Industry contribution patterns
- Journalist-friendly data formatting
**BibTeX:**
```bibtex
@misc{propublica_campaign_finance,
author = {{ProPublica}},
title = {Campaign Finance API},
year = {2024},
url = {https://projects.propublica.org/api-docs/campaign-finance/},
note = {Accessed: 2024}
}
```
---
### ProPublica Vital Signs API
**Organization:** ProPublica, Inc.
**What we use:** Healthcare provider data including doctors, facilities, disciplinary actions, and Medicare participation to support oral health policy analysis.
- **Source:** https://projects.propublica.org/vital-signs/
- **API Documentation:** https://projects.propublica.org/api-docs/vital-signs/
- **Coverage:** 1,000,000+ healthcare providers across the United States
- **Data Included:**
- Doctor biographical information and specialties
- Medical school and residency training
- Hospital affiliations and group practices
- State medical board disciplinary actions
- Medicare participation and payments
- Malpractice claims and settlements
- **Access:** **API Key Required** (Free - sign up at https://www.propublica.org/datastore/api/vital-signs-api)
- **Authentication:** Include as HTTP header: `X-API-Key: YOUR_API_KEY`
- **Rate Limits:** 5,000 requests per day
- **License:** Free for non-commercial and commercial use with attribution
**Use Cases:**
- Map dental care access and provider availability
- Link health policy discussions to provider networks
- Identify healthcare deserts and underserved areas
- Track quality metrics for oral health providers
- Correlate public health outcomes with provider density
**BibTeX:**
```bibtex
@misc{propublica_vital_signs,
author = {{ProPublica}},
title = {Vital Signs: Health Care Provider Data},
year = {2024},
url = {https://projects.propublica.org/vital-signs/},
note = {Accessed: 2024}
}
```
---
### Every.org Charity API
**Organization:** Every.org (Public Benefit Corporation)
**What we use:** Human-readable mission statements, organization logos, cause categories, cleaner metadata than raw IRS filings.
- **API Documentation:** https://www.every.org/nonprofit-api
- **Coverage:** 1,000,000+ verified nonprofits
- **Data Included:**
- Mission statements and descriptions
- Organization logos and images
- Cause tags (health, education, environment, etc.)
- Social media links
- **Access:** API key required (free tier available)
- **License:** API Terms of Service
---
### Findhelp.org (Aunt Bertha)
**Organization:** Findhelp (formerly Aunt Bertha)
**What we use:** Comprehensive directory of local social services - specific programs, hours, eligibility requirements, contact information.
- **Source:** https://www.findhelp.org/
- **Coverage:** 400,000+ community programs across the United States
- **Data Included:**
- Program descriptions and services offered
- Days/hours of operation
- Eligibility requirements
- Languages spoken
- Insurance accepted
- Contact information (phone, email, address)
- **Access:** Public search available, API access by request
- **Use Case:** Manual enrichment of ProPublica financial data with service delivery details
**Example:** https://www.findhelp.org/search?query=dental&location=Tuscaloosa,%20AL
---
### 211 Regional Directories
**What we use:** Regional social services directories with detailed program information, crisis hotlines, local resources.
- **Source:** https://www.211.org/ (national network)
- **Example:** https://www.211connects.org (Alabama)
- **Coverage:** Local services in most U.S. cities and counties
- **Data Included:**
- Specific services and programs
- Hours of operation
- Eligibility criteria
- Languages and accessibility
- **Access:** Public search, some regions offer data partnerships
- **License:** Varies by region
---
### Association of Religion Data Archives (ARDA)
**Organization:** Pennsylvania State University
**What we use:** U.S. Congregational Life Survey and denominational data for understanding church characteristics, programs, and community services including health ministries.
- **Source:** https://www.thearda.com/
- **Data Portal:** https://www.thearda.com/data-archive
- **U.S. Congregational Life Survey:** https://www.thearda.com/Archive/Files/Descriptions/USCONGLIFE.asp
- **Coverage:** 300,000+ congregations with detailed program data
- **License:** Free for research and non-commercial use
**Key Datasets:**
| Dataset | Coverage | Variables |
|---------|----------|----------|
| **U.S. Congregations** | All denominations, 50 states | Congregation size, programs, community services |
| **Religious Congregations & Membership Study** | County-level data | Adherents, congregations by denomination |
| **National Congregations Study** | Representative sample of 1,200+ | Worship, programs, social services |
**What We Extract:**
- Congregation size and attendance
- Health ministry programs (dental, medical, mental health)
- Food programs and community meals
- Youth and senior programs
- Community outreach budget
- Social service partnerships
**Example Use Case:**
Identify churches with active health ministries in Tuscaloosa, AL that provide free dental kits, health screenings, or partner with mobile dental units.
**Citation:**
```bibtex
@misc{arda_congregations,
author = {{Association of Religion Data Archives}},
title = {U.S. Congregational Life Survey},
year = {2024},
publisher = {Pennsylvania State University},
url = {https://www.thearda.com/},
note = {Free for research use}
}
```
---
### Homeland Infrastructure Foundation-Level Data (HIFLD): Places of Worship
**Organization:** U.S. Department of Homeland Security (DHS)
**What we use:** Geospatial database of 350,000+ places of worship for mapping faith-based health service locations and identifying service gaps.
- **Source:** https://hifld-geoplatform.opendata.arcgis.com/
- **Dataset:** https://hifld-geoplatform.opendata.arcgis.com/datasets/places-of-worship
- **Format:** Shapefile, GeoJSON, CSV
- **Coverage:** 350,000+ churches, mosques, synagogues, temples
- **License:** Public Domain (U.S. Government)
**Fields Available:**
- Name of place of worship
- Address (street, city, state, ZIP)
- Latitude/Longitude (precise geolocation)
- Denomination
- Religious tradition
- Facility type
**Use Cases:**
1. **Map faith-based health providers** - Overlay churches with health ministries on city maps
2. **Identify service deserts** - Find areas underserved by both clinics and church programs
3. **Route mobile dental units** - Plan stops at large congregations
4. **Partnership outreach** - Locate churches near schools or clinics
**Citation:**
> "Homeland Infrastructure Foundation-Level Data (HIFLD): Places of Worship. U.S. Department of Homeland Security. https://hifld-geoplatform.opendata.arcgis.com/"
---
### National Congregations Study (NCS)
**Organization:** Duke University
**What we use:** Representative survey of U.S. congregations to understand social service provision, health programs, and civic engagement patterns.
- **Source:** https://sites.duke.edu/ncsweb/
- **Principal Investigator:** Mark Chaves, Duke University Divinity School
- **Coverage:** 1,200+ congregations (representative sample)
- **Waves:** 1998, 2006-07, 2012, 2018-19
- **License:** Free for academic and research use
**Key Findings:**
- **60% of congregations** provide social services (food, housing, health)
- **15% of congregations** have health-related programs
- **Large urban churches** (500+ attendees) more likely to have formal health ministries
- **25% collaborate** with clinics, hospitals, or health departments
**Variables We Use:**
| Variable | Description | Relevance |
|----------|-------------|-----------|
| `HLTHPROG` | Has health-related program | Health ministry presence |
| `FOODPROG` | Operates food program | Nutrition education opportunity |
| `YOUTHPROG` | Youth programs | Reach children for dental education |
| `SENIORPROG` | Senior programs | Medicare enrollment help |
| `PARTNERSHIP` | Partners with nonprofits | Collaboration potential |
**Citation:**
```bibtex
@misc{ncs_2018,
author = {Chaves, Mark and Anderson, Shawna},
title = {National Congregations Study: Cumulative Dataset (1998, 2006-07, 2012, 2018-19)},
year = {2020},
publisher = {Duke University},
url = {https://sites.duke.edu/ncsweb/},
doi = {10.1093/soc/swaa029}
}
```
---
### Microsoft Common Data Model for Nonprofits
**Organization:** Microsoft Corporation
**What we use:** Nonprofit data standardization, constituent relationship management, donor tracking, program outcome measurement.
- **Repository:** https://github.com/microsoft/Nonprofits/tree/master/CommonDataModelforNonprofits
- **ERD Documentation:** [common-data-model-for-nonprofits-erds.pdf](https://github.com/microsoft/Nonprofits/blob/master/CommonDataModelforNonprofits/Documents/common-data-model-for-nonprofits-erds.pdf)
- **License:** MIT License
- **Coverage:** Donor management, fundraising, program delivery, volunteer management, impact measurement
**Microsoft CDM Entities Implemented:**
| Microsoft CDM Entity | Our Entity | Description |
|---------------------|------------|-------------|
| Constituent | CONSTITUENT | Donors, volunteers, members, beneficiaries |
| Donation | DONATION | Financial contributions and in-kind gifts |
| Campaign | CAMPAIGN | Fundraising campaigns and appeals |
| Designation | DESIGNATION | Fund allocation (unrestricted, restricted, endowment) |
| Membership | MEMBERSHIP | Member enrollment and renewals |
| Volunteer Preference | VOLUNTEER_ACTIVITY | Volunteer hours and activities |
| Delivery Framework | PROGRAM_DELIVERY | Programs and services delivered |
| Objective | PROGRAM_OUTCOME | Measurable impact and KPIs |
**Integration Benefits:**
- βœ… Dynamics 365 Nonprofit compatibility
- βœ… Power Platform (Power BI, Power Apps, Power Automate)
- βœ… Azure Synapse analytics
- βœ… Constituent 360 view
---
## βœ… Fact-Checking
**In this section:**
- [Google Fact Check Tools API](#google-fact-check-tools-api)
- [FactCheck.org](#factcheckorg)
- [PolitiFact](#politifact)
### Google Fact Check Tools API
**Organization:** Google LLC
**What we use:** Aggregated fact-checking data for verifying claims from meetings and legislation.
- **Source:** https://toolbox.google.com/factcheck/explorer
- **API:** https://developers.google.com/fact-check/tools/api
- **Schema:** https://developers.google.com/search/docs/appearance/structured-data/factcheck
- **Coverage:** 100+ fact-checking organizations worldwide
- **License:** Free API (10,000 queries/day quota)
### FactCheck.org
**Organization:** Annenberg Public Policy Center, University of Pennsylvania
**What we use:** Nonpartisan fact-checking of political claims and health policy verification.
- **Source:** https://www.factcheck.org/
- **Coverage:** National politics, health claims, science, viral content (2003-present)
- **License:** Free (web scraping allowed with rate limiting)
### PolitiFact
**Organization:** Poynter Institute (Pulitzer Prize-winning)
**What we use:** State-level fact-checking, Truth-O-Meter ratings for ballot measures.
- **Source:** https://www.politifact.com/
- **Coverage:** All 50 states, federal politics (2007-present)
- **Rating Scale:** True, Mostly True, Half True, Mostly False, False, Pants on Fire
- **License:** Free (web scraping allowed with rate limiting)
---
## πŸ’» Civic Tech & Open Source
**In this section:**
- [Cloud & Data Platforms](#cloud--data-platforms)
- [Civic Tech Field Guide](#civic-tech-field-guide)
- [Code for America: Brigade Network](#code-for-america-brigade-network)
- [U.S. Digital Response (USDR)](#us-digital-response-usdr)
- [Digital Public Goods Alliance (DPGA)](#digital-public-goods-alliance-dpga)
### Cloud & Data Platforms
**Organization:** Microsoft Corporation / GitHub, Inc.
**What we use:** GitHub REST and GraphQL APIs for tracking civic tech projects, hackathons, contributors, and open source development.
- **Source:** https://docs.github.com/en/rest
- **GraphQL API:** https://docs.github.com/en/graphql
- **Rate Limits:** 5,000 requests/hour (authenticated)
- **License:** Free (API usage subject to GitHub Terms of Service)
**Data Extracted:**
| Dataset | Description | Fields Tracked |
|---------|-------------|----------------|
| `github_repositories` | Civic tech projects and repos | name, stars, forks, topics, language, license |
| `contributors` | Project maintainers and contributors | login, contributions, role, github_sponsor_enabled |
| `project_issues` | Good first issues, help wanted | labels, state, title, created_at |
| `project_funding` | GitHub Sponsors, OpenCollective | funding_type, sponsor_count, monthly_amount |
| `hackathon_projects` | Projects built at civic hackathons | hackathon_id, project_name, repo_url, demo_url |
**Civic Tech Topics Tracked:**
- `civic-tech`, `open-government`, `government-transparency`
- `public-data`, `open-data`, `civic-engagement`
- `democracy`, `accountability`, `policy-analysis`
**Why GitHub API:**
- **Discovery:** Find civic tech projects and open source tools
- **Collaboration:** Track contributors and maintainers
- **Opportunities:** Surface "good first issue" labels for new contributors
- **Funding:** Identify projects needing financial support
- **Hackathons:** Document projects built at civic hackathon events
**Implementation:**
```python
# Our platform uses:
- /civic_tech/github_repositories # Project metadata
- /civic_tech/contributors # Maintainer info
- /civic_tech/project_issues # Contribution opportunities
- /civic_tech/project_funding # Financial support
- /civic_tech/hackathon_projects # Hackathon outputs
```
**Citation:**
```bibtex
@misc{github_api,
author = {{GitHub, Inc.}},
title = {GitHub REST API and GraphQL API},
year = {2024},
url = {https://docs.github.com/en/rest},
note = {API for accessing repository data, issues, contributors, and project metadata}
}
```
---
### Civic Tech Field Guide
**Organization:** Compiler LA
**What we use:** Curated directory of 1,000+ civic technology projects categorized by issue area and impact.
- **Source:** https://civictech.guide/
- **Dataset:** https://airtable.com/shr8yfQ5p3CJGMnCs/tblv0VlP8vVGIBYI6
- **Format:** CSV, Airtable API
- **License:** Open Database License (ODbL)
**Categories:**
- Democracy & Voting
- Environment & Climate
- Housing & Homelessness
- Criminal Justice
- Education
- Health & Safety
- Economic Justice
- Infrastructure
**Notable Projects Catalogued:**
- OpenBudget Oakland (Budget transparency)
- Food Oasis (Food access mapping)
- Health Equity Tracker (CDC data visualization)
- City Scrapers (Meeting minutes automation)
- Documenters Network (Public meeting coverage)
**Why Civic Tech Field Guide:**
- **Taxonomy:** Standardized categorization of civic tech projects
- **Discovery:** Find existing tools before building new ones
- **Inspiration:** Learn from successful civic tech implementations
- **Collaboration:** Connect with project maintainers
**Citation:**
```bibtex
@misc{civic_tech_field_guide,
author = {{Compiler LA}},
title = {Civic Tech Field Guide},
year = {2024},
url = {https://civictech.guide/},
note = {Curated directory of 1,000+ civic technology projects}
}
```
---
### Code for America: Brigade Network
**Organization:** Code for America
**What we use:** Brigade chapter locations, hackathon events, and civic tech projects built by local volunteer groups.
- **Source:** https://brigade.codeforamerica.org/
- **Brigades:** https://brigade.codeforamerica.org/brigades
- **Projects:** https://brigade.codeforamerica.org/projects
- **License:** Public information, project-specific licenses vary
**Brigade Network:**
- **80+ active brigades** across the United States
- Monthly civic hack nights and community meetups
- Annual **National Day of Civic Hacking**
- **CodeAcross** weekend hackathons
**Notable Brigade Projects:**
| Project | Brigade | Impact |
|---------|---------|--------|
| **OpenBudget Oakland** | Code for Oakland | Budget transparency & visualization |
| **Food Oasis** | Hack for LA | Map food resources (300+ locations) |
| **Health Equity Tracker** | Code for America | CDC health disparities data |
| **BallotNav** | National | Ballot drop-off location finder |
| **Documenters** | City Bureau (Chicago) | Public meeting coverage network |
**Hackathon Events Tracked:**
| Event | Frequency | Focus |
|-------|-----------|-------|
| **National Day of Civic Hacking** | Annual (June) | Nationwide simultaneous hackathons |
| **CodeAcross** | Annual (February) | Local government collaboration |
| **Monthly Hack Nights** | Monthly | Ongoing project development |
**Brigade Data in Our Platform:**
```python
# We track:
- /civic_tech/brigade_chapters # 80+ locations with contact info
- /civic_tech/hackathons # Events: CodeAcross, NDoCH
- /civic_tech/hackathon_projects # Projects built at events
- /civic_tech/hackathon_participants # Contributors and attendees
```
**Citation:**
```bibtex
@misc{code_for_america_brigade,
author = {{Code for America}},
title = {Brigade Network: Volunteer Civic Technology},
year = {2024},
url = {https://brigade.codeforamerica.org/},
note = {80+ local volunteer groups building civic technology}
}
```
---
### U.S. Digital Response (USDR)
**Organization:** U.S. Digital Response
**What we use:** Emergency civic tech projects and rapid-response open source tools for government needs.
- **Source:** https://www.usdigitalresponse.org/
- **Projects:** https://github.com/usdigitalresponse
- **License:** Varies by project (mostly MIT, Apache 2.0)
**Key Projects:**
| Project | Purpose | Tech Stack |
|---------|---------|------------|
| **grants-ingest** | Federal grant opportunity aggregation | Python, PostgreSQL |
| **usdr-gost** | Grant opportunity management system | TypeScript, React |
| **cpf-reporter** | Compliance reporting automation | Node.js |
**Focus Areas:**
- **COVID-19 Response:** Vaccine distribution, testing sites
- **Emergency Management:** Disaster response coordination
- **Grants & Funding:** Grant opportunity discovery
- **Government Modernization:** UI/UX improvements for gov services
**Why USDR:**
- **Rapid Response:** Builds tools during emergencies
- **Open Source:** All code publicly available
- **Government Partnership:** Works directly with agencies
- **Reusable Tools:** Solutions applicable to multiple jurisdictions
**Citation:**
```bibtex
@misc{us_digital_response,
author = {{U.S. Digital Response}},
title = {Open Source Civic Technology for Emergency Response},
year = {2024},
url = {https://www.usdigitalresponse.org/},
note = {Rapid-response civic tech projects for government needs}
}
```
---
### Digital Public Goods Alliance (DPGA)
**Organization:** United Nations Development Programme (UNDP), Norway, Sierra Leone, Germany
**What we use:** Registry of 500+ Digital Public Goods (DPGs) certified as open source projects meeting UN Sustainable Development Goals.
- **Source:** https://digitalpublicgoods.net/
- **Registry:** https://digitalpublicgoods.net/registry/
- **Standard:** https://digitalpublicgoods.net/standard/
- **License:** CC0 1.0 Universal (registry data)
**DPG Standard Requirements:**
1. βœ… **Open License:** OSI-approved, Creative Commons
2. βœ… **Open Source:** Public code repositories
3. βœ… **Documentation:** Clear usage instructions
4. βœ… **Privacy & Security:** Data protection mechanisms
5. βœ… **Standards:** Adheres to relevant standards
6. βœ… **SDG Alignment:** Supports UN Sustainable Development Goals
**Notable Digital Public Goods:**
| DPG | Category | Impact |
|-----|----------|--------|
| **OpenStreetMap** | Geographic data | Global collaborative mapping |
| **DHIS2** | Health information | Used in 100+ countries |
| **Open Food Network** | Food systems | Local food marketplace platform |
| **Ushahidi** | Crisis response | Crowdsourced incident reporting |
| **Khan Academy** | Education | Free online learning platform |
**Why DPGA:**
- **Certification:** Vetted open source projects
- **SDG Alignment:** Projects tied to development goals
- **Sustainability:** Focus on long-term viability
- **Global Impact:** International collaboration
**Our Use Case:**
```python
# We track DPG-certified civic tech projects:
- /civic_tech/github_repositories (dpg_certified = true)
- /civic_tech/project_metadata (sdg_goals = [...])
```
**Citation:**
```bibtex
@misc{digital_public_goods_alliance,
author = {{Digital Public Goods Alliance}},
title = {Digital Public Goods Registry},
year = {2024},
url = {https://digitalpublicgoods.net/},
note = {500+ open source projects certified as Digital Public Goods}
}
```
---
## 🌟 Community Solutions & Use Cases
**In this section:**
- [Spectrum of Community Engagement to Ownership](#spectrum-of-community-engagement-to-ownership)
- [Harvard Ash Center: Data-Smart City Solutions (Archived)](#harvard-ash-center-data-smart-city-solutions-archived)
- [Brookings Institution: Data-Driven Policymaking](#brookings-institution-data-driven-policymaking)
- [Open Data Impact: Evidence-Based Research](#open-data-impact-evidence-based-research)
### Spectrum of Community Engagement to Ownership
**Organization:** Facilitating Power, Rosa GonzΓ‘lez
**What we use:** Framework for community-driven governance that maps to our data structure (nonprofits, jurisdictions, grants, officials).
- **Source:** https://movementstrategy.org/
- **Framework:** https://movementstrategy.org/b/wp-content/uploads/2021/08/Spectrum-of-Community-Engagement-to-Ownership.pdf
- **Article:** "From Community Engagement to Ownership"
- **License:** Creative Commons (educational use)
**The Spectrum Framework - Four Key Sectors:**
| Sector | Maps to Our Data | Community Role |
|--------|------------------|----------------|
| **Community-Based Organizations** | `/nonprofits` | Grassroots leadership, lived experience |
| **City/County Staff** | `/jurisdictions` | Government accountability, service delivery |
| **Philanthropic Partners** | `/grants` | Resource allocation, funding equity |
| **Facilitative Leaders** | `/officials` | Elected officials, decision makers |
**Engagement Levels:**
1. **Inform** β†’ One-way communication
2. **Consult** β†’ Gather input, government decides
3. **Involve** β†’ Work together on solutions
4. **Collaborate** β†’ Shared decision-making
5. **Defer to** β†’ Community-driven governance
**Real-World Case Studies:**
**Providence, RI: Racial and Environmental Justice Committee**
- **Challenge:** Environmental hazards disproportionately affect communities of color
- **Our Data:** `/jurisdictions/demographics` + `/nonprofits/environmental_orgs` + `/meetings/public_hearings`
- **Outcome:** Moved from "consulting" to "community-driven" - residents now co-chair committee
- **Metrics:** Track using `/analytics/metric_views` - meeting attendance, community proposals adopted
**Portland, OR: Equity Working Group**
- **Challenge:** Budget decisions lacked community input
- **Our Data:** `/budgets/city_budgets` + `/nonprofits/advocacy_orgs` + `/officials/city_council`
- **Outcome:** Participatory budgeting with community ownership
- **Metrics:** Track using `/analytics/dashboard_metrics` - community budget proposals, funding allocated
**How Our Platform Supports the Spectrum:**
- **Inform:** `/meetings/agendas` + `/documents` for transparency
- **Consult:** `/surveys` + `/factchecks` for informed input
- **Involve:** `/civic_tech/hackathons` + `/nonprofits/volunteer_activities`
- **Collaborate:** `/grants/participatory_budgeting` + `/legislation/co-creation`
- **Defer to:** `/analytics/community_impact_metrics`
**Citation:**
```bibtex
@article{gonzalez_spectrum,
author = {GonzΓ‘lez, Rosa},
title = {Spectrum of Community Engagement to Ownership},
organization = {Facilitating Power},
year = {2021},
url = {https://movementstrategy.org/}
}
```
---
### Harvard Ash Center: Data-Smart City Solutions (Archived)
**Organization:** Harvard Kennedy School Ash Center for Democratic Governance and Innovation
**What we use:** Research on how data engineering impacts community outcomes - informs our `/analytics/metric_views` templates.
- **Source:** https://ash.harvard.edu/
- **Note:** Data-Smart City Solutions initiative (archived) - use cases based on historical civic data research
- **License:** Educational use
**Example Use Cases from Data-Smart Research:**
**Use Case 1: Youth Obesity Prevention (Austin, TX)**
**Problem:** Childhood obesity rates 30% higher in low-income neighborhoods
**Data Integration:**
```python
# Our platform combines:
- /jurisdictions/demographics # BMI, income, age
- /nonprofits (NTEE K30) # Food access programs
- /civic_tech/food_oasis # Food desert mapping
- /meetings/school_board # Nutrition policy discussions
```
**Outcome:**
- Identified 15 "food deserts" lacking fresh produce
- Partnered with 8 nonprofits to launch mobile markets
- School board approved healthier lunch standards
**Metrics We Track:**
- **Metric View:** `youth_nutrition_access`
- **KPIs:** Fresh food outlets per capita, school lunch quality scores, childhood obesity trends
- **Dashboard:** `/analytics/dashboard_metrics/health_equity`
---
**Use Case 2: College Readiness (Mesa Public Schools, AZ)**
**Problem:** 40% of students off-track for college by 9th grade
**Data Integration:**
```python
# Our platform combines:
- /school_districts/nces_data # Enrollment, demographics
- /school_districts/budgets # Per-pupil spending, program funding
- /analytics/date_dimension # Time-series tracking
- /surveys/student_surveys # Student engagement, aspirations
```
**Outcome:**
- Early warning system identifies at-risk students
- Targeted interventions (tutoring, mentorship)
- College enrollment increased 15%
**Metrics We Track:**
- **Metric View:** `college_readiness_pipeline`
- **KPIs:** On-track percentage, intervention effectiveness, college enrollment rates
- **Dashboard:** `/analytics/dashboard_metrics/education_outcomes`
---
**Our Use Case Template:**
For each community challenge, we provide:
1. **Problem Definition** β†’ What data shows the issue
2. **Data Integration** β†’ Which datasets to combine
3. **Analytics View** β†’ Pre-built metric views
4. **Action Pathways** β†’ Nonprofits, officials, meetings to engage
5. **Success Metrics** β†’ How to measure impact
**Citation:**
```bibtex
@misc{harvard_datasmart_use_cases,
author = {{Harvard Kennedy School Ash Center}},
title = {Data-Smart City Solutions: Civic Data Use Cases},
year = {2016},
note = {Archived civic data research initiative},
url = {https://ash.harvard.edu/}
}
```
---
### Brookings Institution: Data-Driven Policymaking
**Organization:** Brookings Institution, Center on Regulation and Markets
**What we use:** Data Academy model for turning "Open Data" into "Accessible Data" - validates our `/domains` and `/standards` architecture.
- **Source:** https://www.brookings.edu/
- **Article:** "How Citizens and Local Governments Advance Data-Driven Policymaking"
- **License:** Public research
**The Data Academy Model:**
**Case Study: Tempe, AZ**
**Workflow:**
```
City Creates Dashboard β†’ Residents Attend Data Academy β†’ Data Informs Policy
(/standards) (/meetings/trainings) (/legislation)
```
**Our Platform Support:**
| Stage | City Action | Our Data | Resident Outcome |
|-------|-------------|----------|------------------|
| **1. Publish** | Open data portal | `/standards/schema_org_jsonld` | Machine-readable datasets |
| **2. Train** | Data Academy | `/meetings/trainings` | Residents learn SQL, Tableau |
| **3. Analyze** | Dashboard access | `/analytics/dashboard_metrics` | Community-driven insights |
| **4. Advocate** | Public testimony | `/meetings/public_hearings` | Data-backed proposals |
| **5. Legislate** | Policy adoption | `/legislation/local_ordinances` | Evidence-based laws |
**Example: Tempe Water Conservation Policy**
**Data Stack:**
- **Raw Data:** `/jurisdictions/budget_data` - Water department spending
- **Standards:** `/standards/ceds_aligned` - Standardized metrics
- **Training:** `/meetings/trainings` - "Water Data 101" workshop
- **Analytics:** `/analytics/metric_views/water_usage_per_capita`
- **Outcome:** `/legislation` - New conservation ordinance passed
**Residents Learned:**
- How to query public datasets
- How to create visualizations
- How to present findings to city council
---
**Case Study: Norfolk, VA - Flooding Resilience**
**Problem:** Sea level rise threatens low-income neighborhoods
**Data Integration:**
```python
# Our platform combines:
- /jurisdictions/demographics # Vulnerable populations
- /budgets/city_budgets # Infrastructure spending
- /nonprofits (NTEE W) # Environmental advocacy
- /meetings/public_hearings # Community testimony
- /standards/schema_org_jsonld # GeoJSON flood maps
```
**Data Academy Curriculum:**
1. **Week 1:** Understanding flood risk data
2. **Week 2:** Budget analysis (where does money go?)
3. **Week 3:** Creating data visualizations
4. **Week 4:** Presenting to city council
**Outcome:**
- 50 residents trained
- Community-led flood resilience plan
- $10M infrastructure investment in vulnerable areas
---
**Why Data Academies Matter:**
**Traditional Model (Fails):**
- City: "Here's a 500-page PDF budget"
- Residents: *Can't understand it, disengage*
**Data Academy Model (Works):**
- City: "Here's open data + training"
- Residents: *Build skills, create analysis, influence policy*
**Our Role:**
1. **Standardize Data:** `/standards/popolo_exports` makes data interoperable
2. **Host Training Events:** `/meetings/trainings` tracks Data Academy schedules
3. **Provide Analytics:** `/analytics/metric_views` offers ready-to-use dashboards
4. **Connect Stakeholders:** `/nonprofits` + `/officials` + `/civic_tech` = collaboration
**Citation:**
```bibtex
@article{brookings_data_driven,
author = {{Brookings Institution}},
title = {How Citizens and Local Governments Advance Data-Driven Policymaking},
journal = {Brookings Center on Regulation and Markets},
year = {2023},
url = {https://www.brookings.edu/}
}
```
---
### Open Data Impact: Evidence-Based Research
**Organization:** The GovLab at New York University (NYU Tandon School of Engineering)
**What we use:** Evidence-based research on open data impact - validates our platform's approach and demonstrates measurable outcomes from open data initiatives.
- **Source:** https://odimpact.org/
- **Key Findings Report:** https://odimpact.org/key-findings.html
- **Full Report:** https://odimpact.org/files/open-data-impact-key-findings.pdf
- **Funded by:** Omidyar Network
- **License:** Creative Commons Attribution-ShareAlike 4.0 International License
**Research Overview:**
**19 Global Case Studies** analyzing what works in open data:
- Sectoral and geographic representativeness
- First-hand interviews with stakeholders
- Measurable, tangible impact analysis
- Best practices and enabling conditions
**Economic Impact Estimates:**
- **McKinsey (2013):** $3 trillion per year global value of open data
- **Omidyar Network Study:** $13 trillion over 5 years in G20 nations
**Four Main Impact Dimensions:**
| Impact Type | Description | Our Platform Support |
|-------------|-------------|---------------------|
| **Improving Government** | Transparency, accountability, efficiency | `/jurisdictions/budgets` + `/meetings` + `/legislation` |
| **Empowering Citizens** | Informed decision-making, participation | `/analytics/dashboards` + `/surveys` + `/factchecks` |
| **Creating Opportunity** | Economic innovation, new businesses | `/civic_tech` + `/grants` + `/nonprofits` |
| **Solving Public Problems** | Data-driven solutions to complex issues | `/community_solutions` + `/metric_views` |
**Enabling Conditions for Success:**
**1. Supply-Side (Data Providers):**
- **Quality Data:** Accurate, timely, machine-readable
- **Our Implementation:** `/standards/schema_org_jsonld`, `/standards/popolo_exports`
**2. Demand-Side (Data Users):**
- **Capacity Building:** Skills to analyze and use data
- **Our Implementation:** `/meetings/trainings` (Data Academies), `/analytics/metric_views`
**3. Intermediaries:**
- **Data Translators:** Organizations bridging supply and demand
- **Our Implementation:** `/civic_tech/brigade_chapters`, `/nonprofits/advocacy_orgs`
**4. Ecosystem:**
- **Multi-Stakeholder Collaboration:** Government + Civic Tech + Nonprofits
- **Our Implementation:** `/community_solutions/stakeholder_mapping`
**Key Challenges Identified:**
| Challenge | ODI Findings | Our Mitigation Strategy |
|-----------|--------------|------------------------|
| **Data Quality** | Incomplete, outdated data | Automated ingestion + validation pipelines |
| **Technical Capacity** | Users lack skills to analyze | Pre-built dashboards + metric views |
| **Sustainability** | Projects depend on grants | Open-source + reusable infrastructure |
| **Privacy Risks** | Potential for harm | Anonymization + ethical data standards |
**10 Recommendations for Next-Generation Open Data:**
1. **Focus on Demand, Not Just Supply** β†’ We provide ready-to-use analytics
2. **Build User Capacity** β†’ Data Academies tracked in `/meetings/trainings`
3. **Create Data Intermediaries** β†’ Civic tech projects in `/civic_tech`
4. **Ensure Data Quality** β†’ Standards compliance (`/standards`)
5. **Enable Interoperability** β†’ OCD-ID, Popolo, Schema.org integration
6. **Measure Impact** β†’ `/analytics/metric_views` + `/community_solutions/metrics`
7. **Sustain Engagement** β†’ Open-source + HuggingFace hosting
8. **Mitigate Risks** β†’ Privacy-first design, anonymization
9. **Foster Collaboration** β†’ Multi-stakeholder `/community_solutions`
10. **Scale What Works** β†’ Reusable templates + case studies
**How We Apply ODI Research:**
**Our Platform as Evidence-Based Open Data Infrastructure:**
- **Supply:** 90K+ jurisdictions, 3M+ nonprofits, 500K+ meetings β†’ standardized datasets
- **Demand:** Pre-built dashboards, metric views, analytics β†’ accessible to non-technical users
- **Intermediaries:** Civic tech projects, brigade chapters, nonprofits β†’ data translators
- **Ecosystem:** Community solutions framework β†’ multi-stakeholder collaboration
**Real-World Validation:**
ODI case studies demonstrate that open data works when:
1. βœ… **Data is standardized** β†’ We use OCD-ID, Popolo, Schema.org
2. βœ… **Users have capacity** β†’ We provide training + dashboards
3. βœ… **Intermediaries bridge gaps** β†’ We integrate civic tech projects
4. βœ… **Impact is measured** β†’ We track metrics + outcomes
**Example ODI Case Study Applied to Our Platform:**
**Chile's Budget Transparency (ODI Case Study):**
- **Problem:** Citizens couldn't understand government budgets
- **Solution:** Open budget data + visualization tools
- **Impact:** Increased public participation in budget process
**Our Implementation:**
```python
# Replicating Chile's success:
- /jurisdictions/budget_data # Open budget data (supply)
- /analytics/dashboard_metrics # Budget visualizations (demand)
- /meetings/trainings # Data literacy programs (capacity)
- /meetings/public_hearings # Public participation (engagement)
- /community_solutions/metrics # Budget impact tracking (measurement)
```
**Citation:**
```bibtex
@techreport{verhulst_open_data_impact,
author = {Verhulst, Stefaan and Young, Andrew},
title = {Open Data Impact: When Demand and Supply Meet - Key Findings of the Open Data Impact Case Studies},
institution = {The GovLab, NYU Tandon School of Engineering},
year = {2016},
url = {https://odimpact.org/key-findings.html},
note = {Supported by Omidyar Network. 19 global case studies.}
}
```
**Why This Matters for Our Platform:**
Open Data Impact provides **evidence-based validation** that our approach works:
- βœ… Combining **supply** (data) + **demand** (analytics) + **capacity** (training) = impact
- βœ… Multi-stakeholder collaboration drives success
- βœ… Standardization and quality are essential
- βœ… Impact must be measured and documented
Their research proves: **Open data alone isn't enough. You need the ecosystem we're building.**
---
---\n\n### IATI Standard (International Aid Transparency Initiative)\n\n**Organization:** IATI Secretariat \n**What we use:** International development funding transparency framework - informs grant tracking, nonprofit program outcomes, and cross-sector collaboration metrics.\n\n- **Source:** https://iatistandard.org/\n- **Current Version:** IATI Standard v2.03\n- **Specification:** https://iatistandard.org/en/iati-standard/203/\n- **License:** Open Data Commons Attribution License (ODC-By)\n- **Coverage:** 1,300+ publishers, $1+ trillion in development aid tracked\n- **Used for:** Grant funding transparency, nonprofit program measurement, community solution tracking\n\n**Why IATI in Community Solutions:**\n\nIATI provides a proven framework for **tracking community impact across sectors** - government, nonprofits, foundations, and international partners.\n\n**Citation:**\n```bibtex\n@misc{iati_standard,\n author = {{IATI Secretariat}},\n title = {IATI Standard Version 2.03},\n year = {2018},\n url = {https://iatistandard.org/},\n note = {Open Data Commons Attribution License (ODC-By)}\n}\n```\n\n**Resources:**\n- **Registry:** https://iatiregistry.org/\n- **d-Portal:** https://d-portal.org/\n- **Datastore:** https://iatidatastore.iatistandard.org/
## οΏ½πŸ™ Acknowledgments
We are grateful to the following organizations and individuals:
**Academic Institutions:**
- Association for Computational Linguistics (ACL) for MeetingBank
- Harvard University Mellon Urbanism Lab for LocalView
- Cornell University Roper Center for public opinion research
- MIT Election Data + Science Lab for election data
- University of Pennsylvania Annenberg Center for fact-checking
**Civic Tech Community:**
- **GroundVue** - Partner organization inspiring community accountability work
- **Code for America** - Civic technology movement and brigade network
- **City Bureau** - Documenters Network and City Scrapers project
- **Council Data Project** - Open-source municipal data infrastructure
- **U.S. Digital Response** - Emergency civic technology support
- **Civic Tech Field Guide** - Community resource and project directory
**Standards Bodies:**
- W3C Community Group for Schema.org
- Open Civic Data for jurisdiction identifiers (OCDEP 2)
- Popolo Project for open government data standards
- IATI Secretariat for international aid transparency
- U.S. Department of Education for CEDS
**Enterprise Tech for Social Good:**
- **Microsoft** - Tech for Social Impact (Nonprofit CDM)
- **Google** - Data Commons (Knowledge Graph & Civic Data API)
- **AWS** - Open Data for Good (Registry best practices)
- **Databricks** - Databricks for Good (Unity Catalog, Delta Lake, MLflow, Agent Bricks)
- **Snowflake** - Snowflake for Good (Data Marketplace)
- **Oracle** - NetSuite Social Impact (Fund accounting models)
- **Salesforce** - Salesforce.org (Nonprofit Success Pack)
- **Cisco** - Crisis Response (Network resilience)
- **IBM** - Science for Social Good (AI use cases)
- **Meta** - Data for Good (Population mapping)
**Data Platforms & Organizations:**
- HuggingFace for dataset hosting
- ProPublica for nonprofit financial data (3M+ organizations), congressional voting records, campaign finance data, and healthcare provider information
- Open States for legislative data
- OHDSI for OMOP Common Data Model (vocabulary system)
- Every.org for charity metadata and mission statements
- Findhelp.org for local social services directory (400K+ programs)
**Government:**
- U.S. Census Bureau for demographic data
- National Center for Education Statistics (NCES)
- IRS for tax-exempt organization data
- CISA for .gov domain registry
- All municipal governments providing open access to meeting records
**Special Thanks:**
- All civic technologists building open government tools
- Municipal staff maintaining public meeting archives
- Journalists and community advocates holding power accountable
---
## πŸ“– How to Cite This Project
If you use **Open Navigator** in your research, please cite:
```
Open Navigator
GitHub: https://github.com/getcommunityone/open-navigator-for-engagement
License: MIT
```
**BibTeX:**
```bibtex
@software{open-navigator-2026,
title = {Open Navigator},
author = {Community One},
year = {2026},
url = {https://github.com/getcommunityone/open-navigator-for-engagement},
license = {MIT}
}
```
---
## πŸ“ License Compliance
This project respects all dataset licenses and terms of use. See [LICENSE](https://github.com/getcommunityone/open-navigator-for-engagement/blob/main/LICENSE) for this project's MIT license.
For dataset-specific licenses, please refer to the original sources listed above.