NCCS (National Center for Charitable Statistics) Data Scripts

Scripts for downloading and working with nonprofit data from the National Center for Charitable Statistics at the Urban Institute.

Data Source

Scripts

bulk_download_nccs.py

Download all NCCS BMF (Business Master File) datasets with organized directory structure.

Features:

  • Downloads Unified BMF (by state or full file)
  • Downloads Transformed BMF (monthly cleaned data)
  • Downloads Raw BMF archives (unmodified IRS files)
  • Resumes interrupted downloads
  • Tracks and logs download progress
  • Filters by state or month

Usage:

# Download everything to the default directory, /mnt/d/nccs_data/
python bulk_download_nccs.py

# Download to custom directory
python bulk_download_nccs.py --base-dir /path/to/directory

# Download only Unified BMF
python bulk_download_nccs.py --dataset unified

# Download specific states only
python bulk_download_nccs.py --dataset unified --states CA,NY,TX,FL

# Download only recent transformed BMF
python bulk_download_nccs.py --dataset transformed --months 2025_12,2026_01

# Skip full unified file (only download state files)
python bulk_download_nccs.py --dataset unified --no-full --states CA,TX

# Resume interrupted download
python bulk_download_nccs.py --resume

# Dry run (show what would be downloaded)
python bulk_download_nccs.py --dry-run
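
How --resume works internally is not documented here. One common approach, sketched below as an assumption rather than a description of bulk_download_nccs.py, is to check how many bytes of a file are already on disk and request only the remainder with an HTTP Range header. The helper name resume_headers is hypothetical.

```python
from pathlib import Path


def resume_headers(dest: Path) -> dict[str, str]:
    """Return HTTP headers that request only the bytes missing from dest.

    Hypothetical helper: an empty dict means "download from the start".
    """
    size = dest.stat().st_size if dest.exists() else 0
    # "bytes=N-" asks the server to start at byte offset N (the first byte we lack).
    return {"Range": f"bytes={size}-"} if size > 0 else {}
```

A server that honors the header answers with 206 Partial Content, and the response body can be appended to the partial file. A 200 response means the server sent the whole file, so the partial copy should be overwritten instead.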

Output Structure:

/mnt/d/nccs_data/
├── unified-bmf/
│   └── v1.2/
│       ├── full/
│       │   └── UNIFIED_BMF_V1.2.csv          (All states combined)
│       ├── by-state/
│       │   ├── AL.csv
│       │   ├── CA.csv
│       │   ├── NY.csv
│       │   └── ...                           (56 files: 50 states + DC + 5 territories)
│       └── data-dictionary/
│           └── harmonized_data_dictionary.xlsx
├── transformed-bmf/
│   ├── 2023_06/
│   │   ├── bmf_2023_06_processed.csv
│   │   └── bmf_2023_06_data_dictionary.csv
│   ├── 2025_12/
│   │   ├── bmf_2025_12_processed.csv
│   │   └── bmf_2025_12_data_dictionary.csv
│   └── ...                                   (Monthly from June 2023-Jan 2026)
├── raw-bmf/
│   ├── 2023-06-BMF.csv
│   ├── 2025-12-BMF.csv
│   └── ...                                   (Monthly from June 2023-Jan 2026)
└── download_log.json
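
The layout above maps onto a few small path helpers. This is a minimal sketch that only mirrors the directory tree shown; the function names are hypothetical and are not part of bulk_download_nccs.py.

```python
from pathlib import Path

BASE_DIR = Path("/mnt/d/nccs_data")  # default base directory from the usage examples


def unified_state_path(state: str, base: Path = BASE_DIR) -> Path:
    """Path to one state's Unified BMF v1.2 file, e.g. CA.csv."""
    return base / "unified-bmf" / "v1.2" / "by-state" / f"{state.upper()}.csv"


def transformed_path(month: str, base: Path = BASE_DIR) -> Path:
    """Path to a processed Transformed BMF file; month is formatted YYYY_MM."""
    return base / "transformed-bmf" / month / f"bmf_{month}_processed.csv"


def raw_path(month: str, base: Path = BASE_DIR) -> Path:
    """Path to a Raw BMF file; raw file names use YYYY-MM instead of YYYY_MM."""
    return base / "raw-bmf" / f"{month.replace('_', '-')}-BMF.csv"
```

Note that the transformed and raw datasets use different month separators (underscore versus hyphen), which the last helper accounts for.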

Datasets

Unified BMF (Recommended for Longitudinal Analysis)

What it is:

  • Consolidates all historical BMF releases into a single file
  • One row per organization that has ever held tax-exempt status
  • Enables longitudinal analysis without merging multiple annual files

Key Features:

  • ORG_YEAR_FIRST and ORG_YEAR_LAST variables tracking organizational lifecycle
  • Most recent address geocoded to Census block
  • FIPS codes at block, tract, county, and state levels
  • Metropolitan area codes using current CBSA definitions
  • AI/Lakehouse optimized format

Coverage: 1989 through mid-2025 (update pending)

Use When:

  • Tracking organizations over time
  • Building historical sampling frames
  • Linking nonprofit data to Census geographies
  • Analyzing organizational entry/exit patterns
  • Comparing metropolitan and rural nonprofits

File Sizes:

  • Full file: ~1.5 GB (all states combined)
  • By state: 0.1 MB (territories) to 149.5 MB (California)
  • Note: 'ZZ' (Unmapped) is not available as a separate file from NCCS
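
At roughly 1.5 GB, the full file may not fit comfortably in memory on every machine. A minimal sketch of one way around this is to stream the CSV in chunks with pandas and keep only the rows of interest. The function name is hypothetical, and the column passed in must exist in the file being read.

```python
import pandas as pd


def filter_large_csv(path, column, keep_values, chunksize=100_000):
    """Stream a large CSV and keep only rows whose column value is in keep_values."""
    keep = {str(v) for v in keep_values}
    parts = []
    # dtype=str keeps identifiers such as EIN and FIPS codes intact (leading zeros).
    for chunk in pd.read_csv(path, dtype=str, chunksize=chunksize):
        parts.append(chunk[chunk[column].isin(keep)])
    if not parts:
        return pd.DataFrame()
    return pd.concat(parts, ignore_index=True)
```

Only one chunk plus the accumulated matches is held in memory at a time, so peak usage depends on how selective the filter is rather than on the size of the file.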

Transformed BMF (Recommended for Current Analysis)

What it is:

  • Monthly IRS releases with standardized cleaning and validation
  • Consistent column names and quality flags
  • Documented transformations

Key Features:

  • Standardized field names
  • Quality flags identifying potential data issues
  • Documentation of all transformations applied
  • Monthly updates

Coverage: June 2023 to present (monthly snapshots)

Use When:

  • You need current BMF data with consistent formatting
  • You want documented quality checks
  • You are working with monthly snapshots

File Sizes: ~50-150 MB per month
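
Because each month is a separate snapshot, two months can be compared to see which organizations were added or dropped. The sketch below uses only the standard library and assumes each file has an EIN column; the exact column name in the Transformed BMF files is an assumption, so check the data dictionary shipped with each month. The function names are hypothetical.

```python
import csv


def read_keys(lines, key="EIN"):
    """Collect the key column from an iterable of CSV lines (an open file works)."""
    return {row[key] for row in csv.DictReader(lines)}


def compare_snapshots(older_lines, newer_lines, key="EIN"):
    """Return (added, removed): keys only in the newer and only in the older snapshot."""
    older = read_keys(older_lines, key)
    newer = read_keys(newer_lines, key)
    return newer - older, older - newer
```

In practice the two arguments would be file objects opened with newline="" on two bmf_YYYY_MM_processed.csv files.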

Raw BMF Archives (For Replication Studies)

What it is:

  • Unmodified monthly BMF files as released by the IRS
  • Original IRS schema and variable names

Coverage: June 2023 to present (monthly snapshots)

Use When:

  • Replicating an analysis built on raw IRS files
  • Needing the data exactly as the IRS published it
  • Requiring a specific point-in-time snapshot

File Sizes: ~100-200 MB per month

Key Data Fields

Geographic Fields (Unified BMF)

  • FIPS Codes: Block, Tract, County, State
  • CBSA Codes: Core Based Statistical Area (metropolitan and micropolitan areas)
  • Geocoded Address: Census block level precision
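
These FIPS levels nest: a 15-digit Census block code begins with the 2-digit state code, followed by the 3-digit county, 6-digit tract, and 4-digit block. The sketch below shows how the coarser levels can be recovered from a block code. The function name is hypothetical, and the Unified BMF may already provide each level as its own column.

```python
def split_block_fips(block_fips) -> dict[str, str]:
    """Split a 15-digit Census block code into its nested FIPS levels."""
    code = str(block_fips).zfill(15)  # restore a leading zero lost to numeric parsing
    if len(code) != 15 or not code.isdigit():
        raise ValueError(f"expected a 15-digit block code, got {block_fips!r}")
    return {
        "state": code[:2],    # 2 digits
        "county": code[:5],   # state + 3-digit county
        "tract": code[:11],   # county + 6-digit tract
        "block": code,        # tract + 4-digit block
    }
```

States with a FIPS code below 10, such as California (06), lose their leading zero when the code is parsed as an integer, which is why the helper pads the value back to 15 digits.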

Temporal Fields (Unified BMF)

  • ORG_YEAR_FIRST: When organization first appeared in BMF
  • ORG_YEAR_LAST: When organization last appeared (or current if still active)
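
With these two fields, questions such as "which organizations existed in a given year" reduce to a range check. A minimal sketch with pandas, assuming both columns hold four-digit years; the function names are hypothetical.

```python
import pandas as pd


def active_in_year(bmf: pd.DataFrame, year: int) -> pd.DataFrame:
    """Rows for organizations whose first and last BMF years bracket the given year."""
    first = pd.to_numeric(bmf["ORG_YEAR_FIRST"], errors="coerce")
    last = pd.to_numeric(bmf["ORG_YEAR_LAST"], errors="coerce")
    return bmf[(first <= year) & (last >= year)]


def entries_and_last_seen(bmf: pd.DataFrame) -> pd.DataFrame:
    """Count organizations by first year and by last year of appearance."""
    entries = bmf["ORG_YEAR_FIRST"].value_counts().rename("entries")
    last_seen = bmf["ORG_YEAR_LAST"].value_counts().rename("last_seen")
    return pd.concat([entries, last_seen], axis=1).fillna(0).astype(int).sort_index()
```

Because ORG_YEAR_LAST holds the current value for organizations that are still active, the last_seen count for the most recent year reflects survivors, not exits.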

Organization Fields

  • EIN: Employer Identification Number (unique ID)
  • NAME: Organization name
  • NTEE_CODE: National Taxonomy of Exempt Entities classification
  • SUBSECTION: IRS subsection (501(c)(3), 501(c)(4), etc.)
  • FINANCIAL_DATA: Revenue, assets, expenses
  • ADDRESS: Street, city, state, ZIP

Census Integration

The Unified BMF is specifically designed for Census data integration:

import pandas as pd

# Load Unified BMF for a state; read the tract code as a string so that
# leading zeros (California codes start with "06") are preserved
bmf = pd.read_csv(
    '/mnt/d/nccs_data/unified-bmf/v1.2/by-state/CA.csv',
    dtype={'FIPS_TRACT': str},
)

# Load Census data (example: ACS demographic data), keeping GEOID as a string
census = pd.read_csv('census_tract_data.csv', dtype={'GEOID': str})

# Merge on FIPS tract code (both keys are strings, so the dtypes match)
merged = bmf.merge(census, left_on='FIPS_TRACT', right_on='GEOID', how='left')

# Now analyze nonprofits by demographic characteristics
analysis = merged.groupby('NTEE_CODE').agg({
    'MEDIAN_INCOME': 'mean',
    'POPULATION': 'sum',
    'EIN': 'count'
})
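
A left join keeps every nonprofit even when no tract matches, so it is worth checking how many rows actually found Census data. The sketch below wraps the same merge with pandas' validate and indicator options. The function name is hypothetical, and the default column names are the ones used in the example above.

```python
import pandas as pd


def merge_with_report(bmf, census, left_key="FIPS_TRACT", right_key="GEOID"):
    """Left-join BMF rows to tract data and report how many rows found a match."""
    merged = bmf.merge(
        census,
        left_on=left_key,
        right_on=right_key,
        how="left",
        validate="many_to_one",  # raises MergeError if a tract repeats in census
        indicator=True,          # adds a _merge column: 'both' or 'left_only'
    )
    matched = int((merged["_merge"] == "both").sum())
    return merged, matched, len(merged) - matched
```

A large unmatched count usually points to a key problem, such as mismatched dtypes, lost leading zeros, or tract codes from different Census vintages.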

Related Resources

Attribution

When using NCCS data, please cite:

  1. National Center for Charitable Statistics, Urban Institute
  2. IRS Business Master File (original data source)
  3. Specify the data vintage/update date used

Example Citation:

National Center for Charitable Statistics (2026). Unified Business Master File (BMF), v1.2.
Retrieved from https://nccs.urban.org/. Original data: IRS Exempt Organizations Business Master File.

Support