# NCCS (National Center for Charitable Statistics) Data Scripts
Scripts for downloading and working with nonprofit data from the National Center for Charitable Statistics at the Urban Institute.
## Data Source
- **Organization**: National Center for Charitable Statistics (NCCS), Urban Institute
- **Website**: https://nccs.urban.org/
- **Catalog**: https://urbaninstitute.github.io/nccs/catalogs/catalog-bmf.html
- **Coverage**: Tax-exempt organizations (1989-present)
- **Data Types**: Unified BMF, Transformed BMF, Raw BMF archives
## Scripts
### `bulk_download_nccs.py`
Downloads all NCCS BMF (Business Master File) datasets into an organized directory structure.
**Features:**
- Downloads Unified BMF (by state or full file)
- Downloads Transformed BMF (monthly cleaned data)
- Downloads Raw BMF archives (unmodified IRS files)
- Resume interrupted downloads
- Progress tracking and logging
- Filter by states or months
**Usage:**
```bash
# Download everything to /mnt/d/nccs_data/
python bulk_download_nccs.py
# Download to custom directory
python bulk_download_nccs.py --base-dir /path/to/directory
# Download only Unified BMF
python bulk_download_nccs.py --dataset unified
# Download specific states only
python bulk_download_nccs.py --dataset unified --states CA,NY,TX,FL
# Download only recent transformed BMF
python bulk_download_nccs.py --dataset transformed --months 2025_12,2026_01
# Skip full unified file (only download state files)
python bulk_download_nccs.py --dataset unified --no-full --states CA,TX
# Resume interrupted download
python bulk_download_nccs.py --resume
# Dry run (show what would be downloaded)
python bulk_download_nccs.py --dry-run
```
**Output Structure:**
```
/mnt/d/nccs_data/
├── unified-bmf/
│   └── v1.2/
│       ├── full/
│       │   └── UNIFIED_BMF_V1.2.csv (All states combined)
│       ├── by-state/
│       │   ├── AL.csv
│       │   ├── CA.csv
│       │   ├── NY.csv
│       │   └── ... (56 files: 50 states + DC + 5 territories)
│       └── data-dictionary/
│           └── harmonized_data_dictionary.xlsx
├── transformed-bmf/
│   ├── 2023_06/
│   │   ├── bmf_2023_06_processed.csv
│   │   └── bmf_2023_06_data_dictionary.csv
│   ├── 2025_12/
│   │   ├── bmf_2025_12_processed.csv
│   │   └── bmf_2025_12_data_dictionary.csv
│   └── ... (Monthly from June 2023-Jan 2026)
├── raw-bmf/
│   ├── 2023-06-BMF.csv
│   ├── 2025-12-BMF.csv
│   └── ... (Monthly from June 2023-Jan 2026)
└── download_log.json
```
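The layout above maps each month to predictable file paths. As a minimal sketch, the helpers below build the `YYYY_MM` tokens in the format passed to `--months` and the files one would expect for a given month. The names `month_tokens` and `expected_paths` are hypothetical and are not part of `bulk_download_nccs.py`; the path patterns are inferred only from the tree shown above.

```python
from pathlib import PurePosixPath

def month_tokens(start, end):
    """Yield 'YYYY_MM' tokens from start to end inclusive; both are (year, month) tuples."""
    year, month = start
    while (year, month) <= end:
        yield f"{year}_{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1

def expected_paths(base_dir, token):
    """Return the transformed and raw BMF paths for one month, following the tree above."""
    base = PurePosixPath(base_dir)
    year, month = token.split("_")
    return [
        base / "transformed-bmf" / token / f"bmf_{token}_processed.csv",
        base / "transformed-bmf" / token / f"bmf_{token}_data_dictionary.csv",
        base / "raw-bmf" / f"{year}-{month}-BMF.csv",
    ]

# June 2023 through January 2026, matching the range noted in the tree.
tokens = list(month_tokens((2023, 6), (2026, 1)))
print(len(tokens), tokens[0], tokens[-1])
for path in expected_paths("/mnt/d/nccs_data", "2025_12"):
    print(path)
```

A list like this can be compared against the files actually on disk to spot months that are missing after an interrupted run.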
## Datasets
### Unified BMF (Recommended for Longitudinal Analysis)
**What it is:**
- Consolidates all historical BMF releases into a single file
- One row per organization that has ever held tax-exempt status
- Enables longitudinal analysis without merging multiple annual files
**Key Features:**
- `ORG_YEAR_FIRST` and `ORG_YEAR_LAST` variables tracking organizational lifecycle
- Most recent address geocoded to Census block
- FIPS codes at block, tract, county, and state levels
- Metropolitan area codes using current CBSA definitions
- AI/Lakehouse optimized format
**Coverage:** 1989 through mid-2025 (update pending)
**Use When:**
- You need to track organizations over time
- You are building historical sampling frames
- You are linking nonprofit data to Census geographies
- You are analyzing organizational entry/exit patterns
- You are comparing metropolitan and rural nonprofits
**File Sizes:**
- Full file: ~1.5 GB (all states combined)
- By state: 0.1 MB (territories) to 149.5 MB (California)
- **Note:** 'ZZ' (Unmapped) is not available as a separate file from NCCS
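Because the full file is roughly 1.5 GB, it can help to stream it row by row rather than load it all at once. The sketch below uses only the standard library and filters on `NTEE_CODE`, a column name taken from the field list in this README; the helper `filter_rows` is hypothetical, and the sample rows are made up so the example runs without the real file.

```python
import csv
import io

def filter_rows(lines, column, prefix):
    """Stream CSV rows and yield those whose `column` value starts with `prefix`."""
    reader = csv.DictReader(lines)
    for row in reader:
        # Treat a missing or empty value as a non-match.
        if (row.get(column) or "").startswith(prefix):
            yield row

# Tiny in-memory stand-in for the full file; the values are invented for illustration.
sample = io.StringIO(
    "EIN,NAME,NTEE_CODE\n"
    "000000001,Example Arts Org,A20\n"
    "000000002,Example Health Org,E30\n"
    "000000003,Example Museum,A50\n"
)
arts = list(filter_rows(sample, "NTEE_CODE", "A"))
print([row["EIN"] for row in arts])
```

With the real data, pass an open file handle (for example `open(path, newline="")`) in place of `sample`; only one row is held in memory at a time.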
### Transformed BMF (Recommended for Current Analysis)
**What it is:**
- Monthly IRS releases with standardized cleaning and validation
- Consistent column names and quality flags
- Documented transformations
**Key Features:**
- Standardized field names
- Quality flags identifying potential data issues
- Documentation of all transformations applied
- Monthly updates
**Coverage:** June 2023 to present (monthly snapshots)
**Use When:**
- You need current BMF data with consistent formatting
- You want documented quality checks
- You are working with monthly snapshots
**File Sizes:** ~50-150 MB per month
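One common use of monthly snapshots is to see which organizations appear or disappear between two months. The following is a rough sketch under the assumption that each file has an `EIN` column; the actual column name should be checked against that month's data dictionary. The helpers `read_eins` and `compare_snapshots` are hypothetical, and the rows are made up.

```python
import csv
import io

def read_eins(lines, column="EIN"):
    """Collect the set of EINs from one snapshot (any iterable of CSV lines)."""
    return {row[column] for row in csv.DictReader(lines)}

def compare_snapshots(earlier, later):
    """Return (added, dropped) EIN sets between two monthly snapshots."""
    return later - earlier, earlier - later

# Invented rows standing in for two consecutive monthly files.
dec = io.StringIO("EIN,NAME\n000000001,Org A\n000000002,Org B\n")
jan = io.StringIO("EIN,NAME\n000000002,Org B\n000000003,Org C\n")
added, dropped = compare_snapshots(read_eins(dec), read_eins(jan))
print(sorted(added), sorted(dropped))
```

EINs are read as strings here so that leading zeros are preserved.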
### Raw BMF Archives (For Replication Studies)
**What it is:**
- Unmodified monthly BMF files as released by the IRS
- Original IRS schema and variable names
**Coverage:** June 2023 to present (monthly snapshots)
**Use When:**
- You are replicating an analysis built on raw IRS files
- You need the data exactly as the IRS published it
- You require a specific point-in-time snapshot
**File Sizes:** ~100-200 MB per month
## Key Data Fields
### Geographic Fields (Unified BMF)
- **FIPS Codes**: Block, Tract, County, State
- **CBSA Codes**: Core Based Statistical Area codes (metropolitan and micropolitan statistical areas)
- **Geocoded Address**: Census block level precision
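The FIPS levels are nested: a 15-digit Census block code begins with the 2-digit state code, the first 5 digits identify the county, and the first 11 identify the tract. The sketch below illustrates that nesting and why these codes should be handled as strings. The function name `split_block_fips` is hypothetical, the example code is made up, and the padding step assumes the input really is a block-level code that lost a leading zero.

```python
def split_block_fips(block_fips):
    """Split a 15-digit Census block code into its nested geographic codes."""
    # Restore a leading zero that is lost when the code is parsed as a number.
    code = str(block_fips).zfill(15)
    if len(code) != 15 or not code.isdigit():
        raise ValueError(f"expected a 15-digit block code, got {block_fips!r}")
    return {
        "state": code[:2],    # 2-digit state FIPS
        "county": code[:5],   # state + 3-digit county
        "tract": code[:11],   # state + county + 6-digit tract
        "block": code,        # full block code
    }

# A made-up California-style code stored as an integer, so its leading 0 is missing.
parts = split_block_fips(60371234561001)
print(parts)
```

Reading FIPS columns as strings in the first place avoids the need for this kind of repair.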
### Temporal Fields (Unified BMF)
- **ORG_YEAR_FIRST**: Year the organization first appeared in the BMF
- **ORG_YEAR_LAST**: Year the organization last appeared in the BMF (the most recent year if still active)
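These two fields are enough for a simple entry/exit tally. As a sketch only, the function below counts first appearances per year and treats an organization as exited when its last year falls before the most recent year in the data. It assumes both fields hold four-digit years with no missing values, which may not hold for the real file; `entry_exit_counts` is a hypothetical helper and the records are made up.

```python
from collections import Counter

def entry_exit_counts(records, latest_year):
    """Count first appearances per year and last appearances per year.

    Organizations whose last year equals `latest_year` are treated as still
    active and are not counted as exits.
    """
    entries, exits = Counter(), Counter()
    for rec in records:
        first, last = int(rec["ORG_YEAR_FIRST"]), int(rec["ORG_YEAR_LAST"])
        entries[first] += 1
        if last < latest_year:
            exits[last] += 1
    return entries, exits

# Invented records for illustration only.
records = [
    {"EIN": "000000001", "ORG_YEAR_FIRST": "1995", "ORG_YEAR_LAST": "2010"},
    {"EIN": "000000002", "ORG_YEAR_FIRST": "1995", "ORG_YEAR_LAST": "2025"},
    {"EIN": "000000003", "ORG_YEAR_FIRST": "2010", "ORG_YEAR_LAST": "2025"},
]
entries, exits = entry_exit_counts(records, latest_year=2025)
print(dict(entries), dict(exits))
```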
### Organization Fields
- **EIN**: Employer Identification Number (unique ID)
- **NAME**: Organization name
- **NTEE_CODE**: National Taxonomy of Exempt Entities classification
- **SUBSECTION**: IRS subsection (501(c)(3), 501(c)(4), etc.)
- **FINANCIAL_DATA**: Revenue, assets, expenses
- **ADDRESS**: Street, city, state, ZIP
## Census Integration
The Unified BMF is specifically designed for Census data integration:
```python
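import pandas as pd

# Load Unified BMF for a state; read the tract code as a string so leading zeros survive
bmf = pd.read_csv(
    '/mnt/d/nccs_data/unified-bmf/v1.2/by-state/CA.csv',
    dtype={'FIPS_TRACT': str},
)

# Load Census data (example: ACS demographic data), keeping GEOID as a string too
census = pd.read_csv('census_tract_data.csv', dtype={'GEOID': str})

# Merge on FIPS tract code
merged = bmf.merge(census, left_on='FIPS_TRACT', right_on='GEOID', how='left')

# Now analyze nonprofits by demographic characteristics
# Note: POPULATION is summed once per nonprofit row, so a tract with
# several nonprofits contributes its population more than once
analysis = merged.groupby('NTEE_CODE').agg({
    'MEDIAN_INCOME': 'mean',
    'POPULATION': 'sum',
    'EIN': 'count'
})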
```
## Related Resources
- **NCCS Data Archive**: https://nccs.urban.org/nccs-data-archive
- **NCCS Census Crosswalk**: For aggregating to additional geographic levels
- **BMF Processing Guide**: https://urbaninstitute.github.io/nccs-data-bmf/
- **Source Code**: https://github.com/UrbanInstitute/nccs-data-bmf
- **IRS Data Dictionary**: https://www.irs.gov/pub/irs-soi/eo-info.pdf
## Attribution
When using NCCS data, please cite:
1. **National Center for Charitable Statistics**, Urban Institute
2. **IRS Business Master File** (original data source)
3. The **data vintage/update date** used
**Example Citation:**
```
National Center for Charitable Statistics (2026). Unified Business Master File (BMF), v1.2.
Retrieved from https://nccs.urban.org/. Original data: IRS Exempt Organizations Business Master File.
```
## Support
- **NCCS Contact**: https://nccs.urban.org/nccs/contact/
- **Documentation Issues**: https://github.com/UrbanInstitute/nccs-data-bmf/issues