
Nonprofit Discovery Module

Automated discovery and enrichment of nonprofits and churches using 100% FREE open data APIs.

Why This Matters

When government says "no" to a policy (e.g., "We can't do dental screenings - legal risk"), you can instantly show citizens the nonprofits already doing it. This:

  1. Bypasses the technocratic veto - Shows direct alternatives
  2. Creates social pressure - Exposes inefficiency ("$5K legal review vs $25 screening")
  3. Mobilizes citizens - Provides volunteer/donation pathways

Data Sources (All Free)

1. ProPublica Nonprofit Explorer API ⭐ PRIMARY SOURCE

What it provides:

  • Financial data (revenue, expenses, assets) from IRS Form 990
  • NTEE codes (standardized classification)
  • EIN (tax ID) for verification
  • 3+ million organizations, 10+ years of data

Coverage: All nonprofits with >$50K revenue or >$250K assets

API Docs: https://projects.propublica.org/nonprofits/api

Example Usage:

from discovery.nonprofit_discovery import NonprofitDiscovery

discovery = NonprofitDiscovery()

# Search all health organizations in Tuscaloosa
health_orgs = discovery.search_propublica(
    state="AL",
    city="Tuscaloosa",
    ntee_code="E"  # E = Health
)

# Get detailed financials for specific org
details = discovery.get_propublica_org_details("63-0123456")
print(f"Revenue: ${details['filings'][0]['total_revenue']:,}")

Rate Limits: Free, unlimited. Be respectful: ~1 request/second suggested.


2. IRS Tax-Exempt Organization Search (TEOS)

What it provides:

  • Official tax-exempt status
  • Pub 78 verification (deductibility)
  • Bulk download of all U.S. nonprofits

Source: https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-bulk-data-downloads

Note: The ProPublica API already includes this data, so direct IRS access is only needed for bulk downloads.


3. Every.org Charity API

What it provides:

  • Human-readable mission statements
  • Organization logos and images
  • Cause categories
  • Cleaner data than raw IRS filings

API Docs: https://www.every.org/nonprofit-api

Note: May require API key for full access. Free tier available.

Example Usage:

# Search by location and cause
orgs = discovery.search_everyorg(
    location="Tuscaloosa, AL",
    causes=["health", "education", "youth"]
)

4. Local Service Directories (Manual Enrichment)

Findhelp.org (Aunt Bertha):

211 Alabama:

  • Regional social services directory
  • More detailed than IRS data (days/hours, languages, insurance)
  • Search: https://www.211connects.org

Strategy: Use ProPublica for financial backbone, then manually enrich with Findhelp/211 for specific service details.


NTEE Code Classification

NTEE = National Taxonomy of Exempt Entities (the classification system the IRS uses for tax-exempt organizations)

Key Codes for Oral Health Policy

| Code | Category | Description | Example Orgs |
|------|----------|-------------|--------------|
| E | Health | General and rehabilitative health | Community health centers |
| E20 | Hospitals | Primary medical care facilities | County hospitals |
| E32 | School Health | School-based health care | Mobile dental clinics in schools |
| E40 | Health General | Community clinics | Free clinics |
| E80 | Health Other | Health N.E.C. | Health advocacy groups |
| F | Mental Health | Crisis intervention | Counseling centers |
| K | Food/Nutrition | Food, agriculture, nutrition | Food banks |
| K30 | Food Service | Free food distribution | School meal programs |
| K34 | Congregate Meals | Community dining programs | Senior nutrition sites |
| N | Recreation | Sports, leisure, athletics | Community rec centers |
| O | Youth Dev | Youth development programs | After-school programs |
| O50 | Youth Other | Youth development N.E.C. | Mentoring programs |
| P | Human Services | Multipurpose human services | Family support centers |
| X | Religion | Religious organizations | Churches, synagogues |
| X20 | Christian | Christian orgs | Church health ministries |
| W | Public Benefit | Society benefit programs | Water advocacy groups |

NTEE Hierarchy

E (Health)
├── E20 (Hospitals)
├── E30 (Ambulatory Health)
│   └── E32 (School-Based Health) ⭐ Mobile dental units
├── E40 (Reproductive Health)
└── E80 (Health N.E.C.)

X (Religion)
├── X20 (Christian) ⭐ Church health ministries
├── X30 (Jewish)
└── X40 (Islamic)
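
Because the hierarchy is encoded in the code itself, matching a nonprofit against a category can be done with string prefixes. The following is a minimal sketch, not part of the module: `ntee_major_group` and `ntee_matches` are hypothetical helper names, and the rule that a code ending in `0` (such as `E30`) covers its whole decile (`E31`, `E32`, ...) is an assumption drawn from the tree above.

```python
def ntee_major_group(code):
    """Return the single-letter major group of an NTEE code, or None if blank."""
    code = (code or "").strip().upper()
    return code[0] if code else None


def ntee_matches(code, wanted):
    """True if `code` equals `wanted` or sits beneath it in the hierarchy.

    "E"   matches every code starting with E (major group).
    "E30" matches E30..E39 (assumed decile rule, so E32 is a child of E30).
    "E32" matches only codes starting with E32.
    """
    code = (code or "").strip().upper()
    wanted = (wanted or "").strip().upper()
    if not code or not wanted:
        return False
    if len(wanted) == 3 and wanted.endswith("0"):
        wanted = wanted[:2]  # "E30" -> "E3"
    return code.startswith(wanted)
```

Missing or empty codes never match, which keeps records without an NTEE classification out of the results instead of raising an error.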

Quick Start

1. Discover All Tuscaloosa Nonprofits

source .venv/bin/activate
python scripts/discover_tuscaloosa_nonprofits.py

Output: frontend/policy-dashboards/src/data/tuscaloosa_nonprofits.json

2. Search Specific NTEE Codes

from discovery.nonprofit_discovery import NonprofitDiscovery

discovery = NonprofitDiscovery()

# Just dental/school health
dental = discovery.search_propublica(
    state="AL",
    city="Tuscaloosa",
    ntee_code="E32"
)

# Churches with health ministries
churches = discovery.search_propublica(
    state="AL",
    city="Tuscaloosa", 
    ntee_code="X20"
)

# Food/nutrition programs
food = discovery.search_propublica(
    state="AL",
    city="Tuscaloosa",
    ntee_code="K"
)

# Merge and export
all_orgs = discovery.merge_nonprofit_data(dental, churches)
all_orgs.extend(food)
discovery.export_to_frontend(all_orgs)

3. Get Detailed Financials

# Get 5 years of 990 data for a specific org
details = discovery.get_propublica_org_details("63-0123456")

print(f"Organization: {details['name']}")
print(f"NTEE: {details['ntee_code']} - {details['ntee_description']}")

print("\nRecent Filings:")
for filing in details['filings']:
    revenue = filing['total_revenue']
    expenses = filing['total_expenses']
    year = filing['tax_period']
    print(f"  {year}: ${revenue:,} revenue, ${expenses:,} expenses")

Data Model

Nonprofit Record (Frontend Format)

{
  "name": "Tuscaloosa County Interfaith Dental Initiative",
  "ein": "63-0345678",
  "ntee_code": "E32",
  "ntee_description": "School-Based Health Care",
  "mission": "Multi-faith collaboration providing free dental care",
  "services": [
    "Mobile dental unit serving Title I schools",
    "Free toothbrush and fluoride programs",
    "Parent education workshops"
  ],
  "annual_budget": 125000,
  "students_served": 2400,
  "families_served": 0,
  "youth_served": 0,
  "contact": {
    "website": "https://tuscaloosainterfaithdental.org",
    "email": "contact@tuscaloosainterfaithdental.org",
    "phone": "(205) 555-0300"
  },
  "logo_url": "https://...",
  "volunteer_opportunities": true,
  "accepting_board_members": true
}

ProPublica API Response

{
  "organizations": [
    {
      "ein": "630345678",
      "name": "TUSCALOOSA COUNTY INTERFAITH DENTAL INITIATIVE",
      "city": "TUSCALOOSA",
      "state": "AL",
      "ntee_code": "E32",
      "revenue_amount": 125000,
      "asset_amount": 45000,
      "income_amount": 125000
    }
  ]
}
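
The two formats above differ in small ways: the search response carries the EIN without a hyphen and the name in upper case, while the frontend record uses `NN-NNNNNNN` and title case. A rough sketch of that conversion follows. The helper names `format_ein` and `to_frontend_record` are hypothetical, and using `revenue_amount` as `annual_budget` is an assumption rather than the module's documented behavior.

```python
def format_ein(raw):
    """Format an EIN as NN-NNNNNNN; accepts ints or strings, with or without a dash."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit()).zfill(9)
    if len(digits) != 9:
        raise ValueError(f"EIN must have at most 9 digits, got {raw!r}")
    return f"{digits[:2]}-{digits[2:]}"


def to_frontend_record(org):
    """Build a partial frontend record from one entry of the search response."""
    return {
        "name": (org.get("name") or "").title(),
        "ein": format_ein(org["ein"]),
        "ntee_code": org.get("ntee_code"),
        # Assumption: reported revenue stands in for the annual budget.
        "annual_budget": org.get("revenue_amount") or 0,
    }
```

Fields such as `mission`, `services`, and `contact` are not present in the search response, so they would still need to come from Every.org or manual enrichment.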

Architecture

Discovery Pipeline

1. Search ProPublica API
   ↓ (by state, city, NTEE code)
2. Get Financial Data
   ↓ (revenue, expenses, assets)
3. Enrich with Every.org
   ↓ (mission, logo, causes)
4. Match to Government Decisions
   ↓ (by NTEE code)
5. Export to Frontend
   ↓
frontend/policy-dashboards/src/data/tuscaloosa_nonprofits.json

Caching Strategy

All API responses are cached in data/cache/nonprofits/:

data/cache/nonprofits/
├── propublica_AL_E_Tuscaloosa.json
├── propublica_AL_E32_Tuscaloosa.json
├── propublica_org_63-0345678.json
└── everyorg_Tuscaloosa_AL_health-education.json

Benefits:

  • Faster subsequent runs (no API calls)
  • Respectful to free APIs (no repeated requests)
  • Offline development possible
  • Manual review/editing of cached data

Cache Invalidation:

  • Delete cache files to force fresh download
  • Recommended refresh: Monthly (990 data updates annually)
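
The monthly refresh can also be automated instead of deleting files by hand. The sketch below is one way to do it, assuming the search cache follows the `propublica_{state}_{ntee}_{city}.json` naming shown in the listing. `search_cache_path` and `is_stale` are hypothetical helpers, not functions the module is known to provide.

```python
import time
from pathlib import Path

CACHE_DIR = Path("data/cache/nonprofits")


def search_cache_path(state, ntee_code, city, cache_dir=CACHE_DIR):
    """Build a path like data/cache/nonprofits/propublica_AL_E32_Tuscaloosa.json."""
    return Path(cache_dir) / f"propublica_{state}_{ntee_code}_{city}.json"


def is_stale(path, max_age_days=30, now=None):
    """True if the cache file is missing or older than max_age_days."""
    path = Path(path)
    if not path.exists():
        return True
    now = time.time() if now is None else now
    age_seconds = now - path.stat().st_mtime
    return age_seconds > max_age_days * 86400
```

A caller would check `is_stale(search_cache_path("AL", "E32", "Tuscaloosa"))` before deciding whether to hit the API again. The 30-day default mirrors the monthly recommendation above.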

Cost Comparison

Paid Services

| Service | Cost | Coverage |
|---------|------|----------|
| Candid/GuideStar Premium | $500-2,000/month | Deep services data |
| Charity Navigator API | $500+/month | Ratings + financials |
| GiveWell Data | Free (limited) | Top charities only |

Our Free Stack

| Service | Cost | Coverage |
|---------|------|----------|
| ProPublica API | $0 | 1.8M orgs, 10+ years |
| IRS TEOS | $0 | All U.S. nonprofits |
| Every.org API | $0 (basic) | Mission + logos |
| Total | $0/month | 95% of paid features |

What You Give Up:

  • Real-time "services provided" updates (need manual enrichment)
  • Phone numbers/emails (need scraping or manual entry)
  • Volunteer opportunities feed (need manual verification)

What You Keep:

  • All financial data (revenue, expenses, assets)
  • NTEE classification (interoperable with paid services)
  • Mission statements and descriptions
  • Scalability to all 50 states

Advanced Usage

Bulk Download for All Alabama

import time

# Collect health nonprofits from major Alabama cities
# (extend the city list for fuller statewide coverage)
alabama_health = []

for city in ["Birmingham", "Montgomery", "Mobile", "Tuscaloosa", "Huntsville"]:
    orgs = discovery.search_propublica(
        state="AL",
        city=city,
        ntee_code="E"
    )
    alabama_health.extend(orgs)
    time.sleep(1)  # Rate limiting

print(f"Found {len(alabama_health)} health nonprofits in these Alabama cities")

Find Nonprofits by Revenue

# Find large health orgs (>$1M revenue)
# `nonprofits` is a list of org records, e.g. the result of a search_propublica call
large_orgs = [
    org for org in nonprofits
    if (org.get('revenue_amount') or 0) > 1000000
]

print(f"Large organizations: {len(large_orgs)}")
for org in sorted(large_orgs, key=lambda x: x['revenue_amount'], reverse=True)[:10]:
    print(f"  {org['name']}: ${org['revenue_amount']:,}")

Match to Government Decisions

import json

# Load government decisions with NTEE codes
with open('frontend/policy-dashboards/src/data/tuscaloosa_policies.json') as f:
    decisions = json.load(f)

# Find nonprofits for each deferred decision
for decision in decisions:
    if decision.get('outcome') in ['Tabled', 'Deferred']:
        ntee = decision.get('ntee_code')
        if not ntee:
            continue  # No NTEE code to match against

        # Match any nonprofit in the same major group (first letter);
        # this also covers exact code matches
        matches = [
            org for org in nonprofits
            if (org.get('ntee_code') or '').startswith(ntee[0])
        ]

        if matches:
            print(f"\nDecision: {decision['decision_summary']}")
            print(f"Government said NO, but {len(matches)} nonprofits are doing it:")
            for org in matches[:3]:
                revenue = org.get('revenue_amount') or 0  # Treat missing/None as 0
                print(f"  • {org['name']}: ${revenue:,}/year")

Troubleshooting

ProPublica API Returns Empty Results

Possible causes:

  • City name spelling (try "Tuscaloosa" vs "TUSCALOOSA")
  • NTEE code doesn't exist in that location
  • No nonprofits in that category

Solutions:

# Try broader search (remove city filter)
orgs = discovery.search_propublica(state="AL", ntee_code="E32")

# Try major category only (E vs E32)
orgs = discovery.search_propublica(state="AL", city="Tuscaloosa", ntee_code="E")
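
The two workarounds above can be chained so that a narrow search broadens automatically when it comes back empty. This is only a sketch: `search_with_fallback` is a hypothetical helper, and it assumes the search callable (for example `discovery.search_propublica`) accepts `state`, `city`, and `ntee_code` keyword arguments and returns a list.

```python
def search_with_fallback(search, state, city, ntee_code):
    """Try progressively broader searches until one returns results.

    Returns (results, params_used), or ([], None) if every attempt is empty.
    """
    candidates = [
        {"state": state, "city": city, "ntee_code": ntee_code},      # as requested
        {"state": state, "ntee_code": ntee_code},                    # drop city filter
        {"state": state, "city": city, "ntee_code": ntee_code[:1]},  # major category only
    ]
    tried = []
    for params in candidates:
        if params in tried:
            continue  # e.g. ntee_code was already a single letter
        tried.append(params)
        results = search(**params)
        if results:
            return results, params
    return [], None
```

Returning the parameters alongside the results lets the caller report which fallback produced the matches, since a statewide or major-category hit is less specific than the original query.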

Every.org API Requires Authentication

Solution: Every.org is optional. ProPublica provides 90% of needed data.

# Skip Every.org if auth fails
try:
    everyorg_orgs = discovery.search_everyorg(...)
except Exception:  # A bare except would also swallow KeyboardInterrupt
    everyorg_orgs = []  # Continue with ProPublica data only

Rate Limiting

Built-in protection: Module automatically spaces requests 1 second apart.

If you hit rate limits:

discovery.min_request_interval = 2.0  # Increase to 2 seconds
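
For readers curious how this kind of request spacing can work, here is a minimal illustration. It is not the module's actual implementation: the `Throttle` class and its `wait` method are hypothetical, and only the idea of a minimum interval between requests comes from the setting above.

```python
import time


class Throttle:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable for testing
        self._sleep = sleep   # injectable for testing
        self._last = None     # time of the previous call, if any

    def wait(self):
        """Sleep just long enough to honor the interval, then record the call time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` immediately before each HTTP request keeps the pace steady without sleeping longer than needed, because time already spent processing the previous response counts toward the interval.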

Next Steps

  1. Run discovery: python scripts/discover_tuscaloosa_nonprofits.py
  2. Review output: Check frontend/policy-dashboards/src/data/tuscaloosa_nonprofits.json
  3. Manual enrichment: Add phone/email from Findhelp.org or 211
  4. Verify services: Cross-check "services provided" with org websites
  5. Launch frontend: cd frontend/policy-dashboards && npm start

Resources