open-navigator / website /docs /guides /api-troubleshooting.md
jcbowyer's picture
Clean HuggingFace deployment without binary files
61d29fc
---
sidebar_position: 5
---
# API Troubleshooting
Common issues when working with external APIs and their solutions.
## ProPublica Nonprofit Explorer API
### 500 Internal Server Error
**Symptom:**
```
ERROR | ProPublica API request failed: 500 Server Error: Internal Server Error
```
**Cause:**
The ProPublica API is experiencing server-side issues. This is not a problem with your code or configuration.
**Solution:**
The pipeline now includes **automatic retry logic** with exponential backoff:
1. **Automatic retries**: Up to 3 attempts per request
2. **Exponential backoff**: 2s, 4s, 8s delays between retries
3. **Graceful degradation**: Continues processing other states/NTEE codes if one fails
**What to do:**
1. **Wait and retry** - API issues are usually temporary:
```bash
# Try again in 5-10 minutes
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI
```
2. **Try different states** - Some states may work while others fail:
```bash
# Try California and Texas instead
python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX
```
3. **Use cached data** - If you've successfully discovered data before:
```bash
# Use existing bronze data
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
```
4. **Check API status** - Visit the ProPublica website to check for known issues
5. **Reduce request volume** - Try fewer NTEE codes at once by modifying the script
:::tip Success Rate
The pipeline shows a **discovery summary** with success/failure counts:
```
DISCOVERY SUMMARY
Total requests: 12
Successful: 8 (66.7%)
No results: 2
Failed: 2
Total nonprofits discovered: 1,247
```
Even with some failures, you'll still get useful data!
:::
### Rate Limiting
**Symptom:**
```
Too many requests
```
**Solution:**
The pipeline includes automatic rate limiting (1 request/second). If you still encounter issues, the built-in retry logic will handle it.
### Timeout Errors
**Symptom:**
```
Request timeout
```
**Solution:**
- Automatic retry with exponential backoff
- Timeout increased to 30 seconds per request
- If all retries fail, continues to next request
## Alternative Data Sources
If ProPublica API is consistently unavailable, you can use these alternative sources:
### 1. IRS Tax Exempt Organization Search
Direct download of IRS data:
- https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-bulk-data-downloads
### 2. Every.org API
Alternative nonprofit data source (requires registration):
- https://www.every.org/nonprofits
### 3. GuideStar/Candid
Comprehensive nonprofit database (some features require subscription):
- https://www.guidestar.org/
## Pipeline Best Practices
### Start Small
```bash
# Test with one state first
python scripts/create_all_gold_tables.py --nonprofits-only --states AL
```
### Check Cached Data
```bash
# See what's already been discovered
ls -lh data/cache/nonprofits/
ls -lh data/bronze/nonprofits/
```
### Monitor Progress
The pipeline provides detailed logging:
- ✅ Successful requests
- ⚠️ No results found
- ❌ Failed requests
- Progress counter (8/12)
### Use Skip Discovery
If you've already discovered data and just want to regenerate gold tables:
```bash
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
```
## Error Codes Reference
| Error Code | Meaning | Solution |
|------------|---------|----------|
| 500 | Server error | Retry later, API is down |
| 429 | Too many requests | Built-in rate limiting handles this |
| 404 | Not found | Check state/NTEE code validity |
| 403 | Forbidden | Check if API requires authentication |
| Timeout | Request took too long | Automatic retry with backoff |
## Getting Help
If issues persist:
1. **Check cache directory** - Data may have been partially downloaded:
```bash
ls -lh data/cache/nonprofits/
```
2. **Review logs** - Detailed error messages help diagnose issues
3. **Try different parameters**:
```bash
# Different states
--states NY CA FL
# Skip discovery (use cached)
--skip-discovery
```
4. **File an issue** - Include:
- Error messages
- States/NTEE codes attempted
- Timestamp
- Discovery summary output
## Success Stories
**Expected behavior:**
- Some requests may fail (API issues)
- Pipeline continues processing
- You get partial results from successful requests
- Summary shows what worked vs. what failed
**Example successful run:**
```
DISCOVERY SUMMARY
Total requests: 24 (4 states × 6 NTEE codes)
Successful: 18 (75%)
No results: 4
Failed: 2
Total nonprofits discovered: 3,421
✅ Created gold tables with 3,421 nonprofit records!
```
Even with 2 failed requests, you got 3,400+ nonprofits!
---
## Quick Reference
```bash
# Standard run (handles failures gracefully)
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI
# Use cached data (skip API calls)
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
# Try different states if some fail
python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX NY
# Run only meetings (no API calls)
python scripts/create_all_gold_tables.py --meetings-only
```