NCCS (National Center for Charitable Statistics) Data Scripts
Scripts for downloading and working with nonprofit data from the National Center for Charitable Statistics at the Urban Institute.
Data Source
- Organization: National Center for Charitable Statistics (NCCS), Urban Institute
- Website: https://nccs.urban.org/
- Catalog: https://urbaninstitute.github.io/nccs/catalogs/catalog-bmf.html
- Coverage: Tax-exempt organizations (1989-present)
- Data Types: Unified BMF, Transformed BMF, Raw BMF archives
Scripts
bulk_download_nccs.py
Downloads all NCCS BMF (Business Master File) datasets into an organized directory structure.
Features:
- Downloads Unified BMF (by state or full file)
- Downloads Transformed BMF (monthly cleaned data)
- Downloads Raw BMF archives (unmodified IRS files)
- Resume interrupted downloads
- Progress tracking and logging
- Filter by states or months
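This README does not describe how bulk_download_nccs.py implements resuming. As a minimal sketch of one possible approach, the function below assumes a hypothetical log format in which download_log.json maps each relative file path to a status string. The function name `pending_files` and the schema are illustrative, not taken from the script.

```python
import json
from pathlib import Path


def pending_files(wanted, log_path):
    """Return the entries of `wanted` that are not marked complete in the log.

    Assumed (hypothetical) log schema: {"relative/path.csv": "complete", ...}.
    A missing or unreadable log is treated as "nothing downloaded yet".
    """
    try:
        log = json.loads(Path(log_path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        log = {}
    # Anything without a "complete" status is queued again.
    return [name for name in wanted if log.get(name) != "complete"]
```

Under this assumption, a resumed run would call `pending_files` with the full list of targets and download only what it returns.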
Usage:
# Download everything to /mnt/d/nccs_data/
python bulk_download_nccs.py
# Download to custom directory
python bulk_download_nccs.py --base-dir /path/to/directory
# Download only Unified BMF
python bulk_download_nccs.py --dataset unified
# Download specific states only
python bulk_download_nccs.py --dataset unified --states CA,NY,TX,FL
# Download only recent transformed BMF
python bulk_download_nccs.py --dataset transformed --months 2025_12,2026_01
# Skip full unified file (only download state files)
python bulk_download_nccs.py --dataset unified --no-full --states CA,TX
# Resume interrupted download
python bulk_download_nccs.py --resume
# Dry run (show what would be downloaded)
python bulk_download_nccs.py --dry-run
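To illustrate how the comma-separated values in these commands could be interpreted, here is a rough argparse sketch. The flag names come from the usage examples above; the defaults, the types, and the helper `comma_list` are assumptions, and the actual script may parse its arguments differently.

```python
import argparse


def comma_list(value):
    """Split 'CA,NY,TX' into ['CA', 'NY', 'TX'], dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser():
    # Flag names mirror the usage examples; defaults and types are assumed.
    parser = argparse.ArgumentParser(prog="bulk_download_nccs.py")
    parser.add_argument("--base-dir", default="/mnt/d/nccs_data/")
    parser.add_argument("--dataset", default=None)  # e.g. unified, transformed
    parser.add_argument("--states", type=comma_list, default=None)
    parser.add_argument("--months", type=comma_list, default=None)
    parser.add_argument("--no-full", action="store_true")
    parser.add_argument("--resume", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--dataset", "unified", "--no-full", "--states", "CA,TX"]
)
print(args.states, args.no_full, args.dry_run)  # ['CA', 'TX'] True False
```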
Output Structure:
/mnt/d/nccs_data/
├── unified-bmf/
│   └── v1.2/
│       ├── full/
│       │   └── UNIFIED_BMF_V1.2.csv (All states combined)
│       ├── by-state/
│       │   ├── AL.csv
│       │   ├── CA.csv
│       │   ├── NY.csv
│       │   └── ... (56 files: 50 states + DC + 5 territories)
│       └── data-dictionary/
│           └── harmonized_data_dictionary.xlsx
├── transformed-bmf/
│   ├── 2023_06/
│   │   ├── bmf_2023_06_processed.csv
│   │   └── bmf_2023_06_data_dictionary.csv
│   ├── 2025_12/
│   │   ├── bmf_2025_12_processed.csv
│   │   └── bmf_2025_12_data_dictionary.csv
│   └── ... (Monthly from June 2023-Jan 2026)
├── raw-bmf/
│   ├── 2023-06-BMF.csv
│   ├── 2025-12-BMF.csv
│   └── ... (Monthly from June 2023-Jan 2026)
└── download_log.json
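When working with this layout from Python, small path helpers can keep file names consistent. The sketch below follows the directory and file naming shown in the tree; the function names are illustrative and are not part of bulk_download_nccs.py.

```python
from pathlib import Path

BASE_DIR = Path("/mnt/d/nccs_data")  # default location named in this README


def unified_state_path(state, base=BASE_DIR, version="v1.2"):
    """Path of a by-state Unified BMF file, e.g. .../by-state/CA.csv."""
    return base / "unified-bmf" / version / "by-state" / f"{state.upper()}.csv"


def transformed_path(year, month, base=BASE_DIR):
    """Path of the processed Transformed BMF file for one month."""
    key = f"{year:04d}_{month:02d}"
    return base / "transformed-bmf" / key / f"bmf_{key}_processed.csv"


def raw_path(year, month, base=BASE_DIR):
    """Path of the Raw BMF archive file for one month."""
    return base / "raw-bmf" / f"{year:04d}-{month:02d}-BMF.csv"


print(unified_state_path("ca").as_posix())
print(transformed_path(2025, 12).as_posix())
print(raw_path(2023, 6).as_posix())
```

Note that the transformed folders use an underscore (2025_12) while the raw files use hyphens (2025-12-BMF.csv).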
Datasets
Unified BMF (Recommended for Longitudinal Analysis)
What it is:
- Consolidates all historical BMF releases into a single file
- One row per organization that has ever held tax-exempt status
- Enables longitudinal analysis without merging multiple annual files
Key Features:
- ORG_YEAR_FIRST and ORG_YEAR_LAST variables tracking organizational lifecycle
- Most recent address geocoded to Census block
- FIPS codes at block, tract, county, and state levels
- Metropolitan area codes using current CBSA definitions
- AI/Lakehouse optimized format
Coverage: 1989 through mid-2025 (update pending)
Use When:
- You need to track organizations over time
- Building historical sampling frames
- Linking nonprofit data to Census geographies
- Analyzing organizational entry/exit patterns
- Metropolitan vs rural nonprofit analysis
File Sizes:
- Full file: ~1.5 GB (all states combined)
- By state: 0.1 MB (territories) to 149.5 MB (California)
- Note: 'ZZ' (Unmapped) is not available as a separate file from NCCS
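Because the full file is around 1.5 GB, it can be easier to stream it row by row than to load it at once. The sketch below filters rows for one state code using only the standard library. It assumes the full file carries a state column and that 'ZZ' rows are present there; the column name "STATE" is a placeholder, so check the data dictionary for the real one.

```python
import csv


def rows_for_state(lines, state, state_column):
    """Yield rows (as dicts) whose `state_column` equals `state`.

    `lines` is any iterable of CSV text lines, such as an open file, so the
    full file is never held in memory all at once.
    """
    for row in csv.DictReader(lines):
        if row.get(state_column) == state:
            yield row


# Tiny in-memory stand-in for the full file; EINs and the column name are made up.
sample = ["EIN,STATE", "010000001,CA", "010000002,ZZ", "010000003,CA"]
matches = list(rows_for_state(sample, "ZZ", "STATE"))
print([row["EIN"] for row in matches])  # ['010000002']
```

With the real file, pass an open file handle instead of the list, for example `open(path, newline="")`.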
Transformed BMF (Recommended for Current Analysis)
What it is:
- Monthly IRS releases with standardized cleaning and validation
- Consistent column names and quality flags
- Documented transformations
Key Features:
- Standardized field names
- Quality flags identifying potential data issues
- Documentation of all transformations applied
- Monthly updates
Coverage: June 2023 to present (monthly snapshots)
Use When:
- You need current BMF data with consistent formatting
- You want documented quality checks
- Working with monthly snapshots
File Sizes: ~50-150 MB per month
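Monthly snapshots are keyed by year and month (for example 2023_06). A small helper like the one below can enumerate the keys for a date range. It is only a sketch: it assumes every month in the range was published, which should be checked against the NCCS catalog.

```python
def month_keys(start, end, sep="_"):
    """List 'YYYY<sep>MM' keys from `start` to `end`, inclusive.

    `start` and `end` are (year, month) tuples.
    """
    year, month = start
    keys = []
    while (year, month) <= end:
        keys.append(f"{year:04d}{sep}{month:02d}")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


keys = month_keys((2023, 6), (2026, 1))
print(len(keys), keys[0], keys[-1])  # 32 2023_06 2026_01
```

Passing `sep="-"` gives the hyphenated form used in the raw archive file names.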
Raw BMF Archives (For Replication Studies)
What it is:
- Unmodified monthly BMF files as released by the IRS
- Original IRS schema and variable names
Coverage: June 2023 to present (monthly snapshots)
Use When:
- Replicating analysis built on raw IRS files
- Need data exactly as IRS published it
- Require specific point-in-time snapshot
File Sizes: ~100-200 MB per month
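One common use of point-in-time snapshots is to see which organizations appear or disappear between two months. The sketch below compares the EIN values of two snapshots; the EINs are made up, and reading them from the raw files is left out.

```python
def compare_snapshots(old_eins, new_eins):
    """Return (added, removed) EINs between two monthly snapshots, sorted."""
    old_set, new_set = set(old_eins), set(new_eins)
    return sorted(new_set - old_set), sorted(old_set - new_set)


# Made-up EINs standing in for the EIN column of two monthly files.
december = ["010000001", "010000002", "010000003"]
january = ["010000002", "010000003", "010000004"]
added, removed = compare_snapshots(december, january)
print(added, removed)  # ['010000004'] ['010000001']
```

EINs are kept as strings here so that leading zeros are preserved.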
Key Data Fields
Geographic Fields (Unified BMF)
- FIPS Codes: Block, Tract, County, State
- CBSA Codes: Core Based Statistical Area codes, which cover metropolitan and micropolitan areas; locations outside any CBSA are commonly treated as rural
- Geocoded Address: Census block level precision
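Census geographic identifiers nest inside one another: a 15-digit block GEOID is made of a 2-digit state code, a 3-digit county code, a 6-digit tract code, and a 4-digit block code. The sketch below derives the coarser levels from a block GEOID. Whether the Unified BMF stores its FIPS fields in exactly this concatenated form is an assumption to verify against the data dictionary, and the example GEOID is made up.

```python
def split_block_geoid(geoid):
    """Derive state, county, and tract GEOIDs from a 15-digit block GEOID."""
    geoid = str(geoid).zfill(15)  # restore leading zeros lost to numeric parsing
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"not a 15-digit block GEOID: {geoid!r}")
    return {
        "state": geoid[:2],    # 2-digit state
        "county": geoid[:5],   # state + 3-digit county
        "tract": geoid[:11],   # county + 6-digit tract
        "block": geoid,        # tract + 4-digit block
    }


parts = split_block_geoid(60371234561001)  # integer input, leading zero lost
print(parts["state"], parts["county"], parts["tract"])  # 06 06037 06037123456
```

The `zfill` step matters for states whose FIPS code starts with zero, such as California (06).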
Temporal Fields (Unified BMF)
- ORG_YEAR_FIRST: Year the organization first appeared in the BMF
- ORG_YEAR_LAST: Year the organization last appeared in the BMF (the most recent year covered if it is still active)
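These two fields are enough to sketch a simple entry/exit count per year. The example below assumes both fields hold integer years with no missing values, and it treats an organization whose last year equals the most recent year covered as still active. The rows are made up.

```python
from collections import Counter


def entry_exit_counts(records, last_observed_year):
    """Count organizations entering and exiting the BMF in each year.

    An organization whose ORG_YEAR_LAST equals `last_observed_year` is
    treated as still active rather than as an exit.
    """
    entries, exits = Counter(), Counter()
    for rec in records:
        entries[int(rec["ORG_YEAR_FIRST"])] += 1
        last = int(rec["ORG_YEAR_LAST"])
        if last < last_observed_year:
            exits[last] += 1
    return entries, exits


# Made-up rows; real ones could come from csv.DictReader over a state file.
rows = [
    {"EIN": "010000001", "ORG_YEAR_FIRST": "1995", "ORG_YEAR_LAST": "2010"},
    {"EIN": "010000002", "ORG_YEAR_FIRST": "1995", "ORG_YEAR_LAST": "2025"},
    {"EIN": "010000003", "ORG_YEAR_FIRST": "2010", "ORG_YEAR_LAST": "2025"},
]
entries, exits = entry_exit_counts(rows, last_observed_year=2025)
print(dict(entries), dict(exits))  # {1995: 2, 2010: 1} {2010: 1}
```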
Organization Fields
- EIN: Employer Identification Number (unique ID)
- NAME: Organization name
- NTEE_CODE: National Taxonomy of Exempt Entities classification
- SUBSECTION: IRS subsection (501(c)(3), 501(c)(4), etc.)
- FINANCIAL_DATA: Revenue, assets, expenses
- ADDRESS: Street, city, state, ZIP
Census Integration
The Unified BMF is specifically designed for Census data integration:
import pandas as pd
# Load Unified BMF for a state; read the tract code as text so leading zeros survive
bmf = pd.read_csv('/mnt/d/nccs_data/unified-bmf/v1.2/by-state/CA.csv', dtype={'FIPS_TRACT': str})
# Load Census data (example: ACS demographic data), also keeping GEOID as text
census = pd.read_csv('census_tract_data.csv', dtype={'GEOID': str})
# Merge on FIPS tract code
merged = bmf.merge(census, left_on='FIPS_TRACT', right_on='GEOID', how='left')
# Now analyze nonprofits by demographic characteristics
analysis = merged.groupby('NTEE_CODE').agg({
    'MEDIAN_INCOME': 'mean',
    'POPULATION': 'sum',  # summed per nonprofit row, so a tract is counted once per nonprofit in it
    'EIN': 'count'
})
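After a left merge like this, it is worth checking how many nonprofits actually found a matching tract. The sketch below uses the `indicator` option of pandas `merge` on made-up stand-in tables; the column names follow the example above and the values are invented.

```python
import pandas as pd

# Made-up stand-ins for the BMF and Census tables used above.
bmf = pd.DataFrame({
    "EIN": ["010000001", "010000002", "010000003"],
    "FIPS_TRACT": ["06037123456", "06037123456", "06075999999"],
})
census = pd.DataFrame({
    "GEOID": ["06037123456"],
    "MEDIAN_INCOME": [65000],
})

# indicator=True adds a _merge column: 'both' for matches, 'left_only' otherwise.
checked = bmf.merge(census, left_on="FIPS_TRACT", right_on="GEOID",
                    how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "EIN"].tolist()
match_rate = (checked["_merge"] == "both").mean()
print(unmatched, round(match_rate, 2))  # ['010000003'] 0.67
```

A low match rate often points to a key mismatch, such as tract codes parsed as integers on one side or the two files using different Census vintages.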
Related Resources
- NCCS Data Archive: https://nccs.urban.org/nccs-data-archive
- NCCS Census Crosswalk: For aggregating to additional geographic levels
- BMF Processing Guide: https://urbaninstitute.github.io/nccs-data-bmf/
- Source Code: https://github.com/UrbanInstitute/nccs-data-bmf
- IRS Data Dictionary: https://www.irs.gov/pub/irs-soi/eo-info.pdf
Attribution
When using NCCS data, please cite:
- National Center for Charitable Statistics, Urban Institute
- IRS Business Master File (original data source)
- Specify the data vintage/update date used
Example Citation:
National Center for Charitable Statistics (2026). Unified Business Master File (BMF), v1.2.
Retrieved from https://nccs.urban.org/. Original data: IRS Exempt Organizations Business Master File.
Support
- NCCS Contact: https://nccs.urban.org/nccs/contact/
- Documentation Issues: https://github.com/UrbanInstitute/nccs-data-bmf/issues