---
sidebar_position: 5
---
# API Troubleshooting
Common issues when working with external APIs and their solutions.
## ProPublica Nonprofit Explorer API
### 500 Internal Server Error
**Symptom:**
```
ERROR | ProPublica API request failed: 500 Server Error: Internal Server Error
```
**Cause:**
The ProPublica API is experiencing server-side issues. This is not a problem with your code or configuration.
**Solution:**
The pipeline now includes **automatic retry logic** with exponential backoff:
1. **Automatic retries**: Up to 3 attempts per request
2. **Exponential backoff**: 2s, 4s, 8s delays between retries
3. **Graceful degradation**: Continues processing other states/NTEE codes if one fails
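The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: `backoff_delays` and `call_with_retries` are hypothetical names, and the sketch assumes one initial attempt followed by up to three retries, which is one way to arrive at the 2s, 4s, 8s delays listed.

```python
import time


def backoff_delays(retries=3, base=2.0):
    """Return the wait before each retry: base * 2**i seconds."""
    return [base * (2 ** i) for i in range(retries)]


def call_with_retries(request_fn, retries=3, base=2.0, sleep=time.sleep):
    """Call request_fn; on an exception, wait and retry up to `retries` times.

    Assumption: request_fn raises an exception on failure (for example a
    500 response turned into an error by the HTTP client).
    """
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            return request_fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: let the caller decide what to do
            sleep(delays[attempt])
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting.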
**What to do:**
1. **Wait and retry** - API issues are usually temporary:
```bash
# Try again in 5-10 minutes
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI
```
2. **Try different states** - Some states may work while others fail:
```bash
# Try California and Texas instead
python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX
```
3. **Use cached data** - If you've successfully discovered data before:
```bash
# Use existing bronze data
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
```
4. **Check API status** - Visit the ProPublica website to check for known issues
5. **Reduce request volume** - Try fewer NTEE codes at once by modifying the script
:::tip Success Rate
The pipeline shows a **discovery summary** with success/failure counts:
```
DISCOVERY SUMMARY
Total requests: 12
Successful: 8 (66.7%)
No results: 2
Failed: 2
Total nonprofits discovered: 1,247
```
Even with some failures, you'll still get useful data!
:::
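The percentage in the summary is the number of successful requests divided by the total number of requests. A minimal sketch of that arithmetic, assuming a hypothetical `DiscoverySummary` class (the pipeline's own implementation may differ):

```python
from dataclasses import dataclass


@dataclass
class DiscoverySummary:
    """Hypothetical tally of discovery requests (not the pipeline's real class)."""
    successful: int = 0
    no_results: int = 0
    failed: int = 0
    nonprofits: int = 0

    @property
    def total(self) -> int:
        return self.successful + self.no_results + self.failed

    @property
    def success_rate(self) -> float:
        # Guard against division by zero when nothing was requested.
        return 100.0 * self.successful / self.total if self.total else 0.0

    def render(self) -> str:
        return "\n".join([
            "DISCOVERY SUMMARY",
            f"Total requests: {self.total}",
            f"Successful: {self.successful} ({self.success_rate:.1f}%)",
            f"No results: {self.no_results}",
            f"Failed: {self.failed}",
            f"Total nonprofits discovered: {self.nonprofits:,}",
        ])


print(DiscoverySummary(successful=8, no_results=2, failed=2, nonprofits=1247).render())
```

With the numbers from the tip above (8 successful out of 12), the rate works out to 66.7%.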
### Rate Limiting
**Symptom:**
```
Too many requests
```
**Solution:**
The pipeline includes automatic rate limiting (1 request/second). If the API still returns `Too many requests`, the built-in retry logic backs off and retries the request.
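One simple way to enforce a limit of 1 request/second is to record when the last request was sent and sleep for the remainder of the interval. The sketch below is illustrative only; `RateLimiter` is a hypothetical name, and the pipeline may implement its limit differently.

```python
import time


class RateLimiter:
    """Hypothetical limiter that spaces calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each API request is enough to keep the request rate under the limit.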
### Timeout Errors
**Symptom:**
```
Request timeout
```
**Solution:**
- Timed-out requests are retried automatically with exponential backoff
- The per-request timeout has been increased to 30 seconds
- If all retries fail, the pipeline continues with the next request
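The "continue to the next request" behavior can be sketched as a loop that records failures instead of stopping. This is an assumption-laden illustration: `fetch` and `fetch_all` are hypothetical helpers, the URLs are supplied by the caller, and only the 30-second timeout comes from the text above.

```python
import urllib.request

TIMEOUT_SECONDS = 30  # per-request timeout, as described above


def fetch(url, timeout=TIMEOUT_SECONDS):
    """Download one URL, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetch_fn=fetch):
    """Fetch every URL; a failure is recorded and the loop moves on."""
    results, failures = {}, {}
    for url in urls:
        try:
            results[url] = fetch_fn(url)
        except OSError as exc:
            # Timeouts and urllib's URLError are both OSError subclasses.
            failures[url] = exc
    return results, failures
```

Returning the failures alongside the results makes it straightforward to report what worked and what did not.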
## Alternative Data Sources
If the ProPublica API is consistently unavailable, you can use these alternative sources:
### 1. IRS Tax Exempt Organization Search
Direct download of IRS data:
- https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-bulk-data-downloads
### 2. Every.org API
Alternative nonprofit data source (requires registration):
- https://www.every.org/nonprofits
### 3. GuideStar/Candid
Comprehensive nonprofit database (some features require subscription):
- https://www.guidestar.org/
## Pipeline Best Practices
### Start Small
```bash
# Test with one state first
python scripts/create_all_gold_tables.py --nonprofits-only --states AL
```
### Check Cached Data
```bash
# See what's already been discovered
ls -lh data/cache/nonprofits/
ls -lh data/bronze/nonprofits/
```
### Monitor Progress
The pipeline provides detailed logging:
- ✅ Successful requests
- ⚠️ No results found
- ❌ Failed requests
- Progress counter (8/12)
### Use Skip Discovery
If you've already discovered data and just want to regenerate gold tables:
```bash
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
```
## Error Codes Reference
| Error Code | Meaning | Solution |
|------------|---------|----------|
| 500 | Server error | Retry later; the API is having server-side issues |
| 429 | Too many requests | Built-in rate limiting handles this |
| 404 | Not found | Check state/NTEE code validity |
| 403 | Forbidden | Check if API requires authentication |
| Timeout | Request took too long | Automatic retry with backoff |
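If you handle these errors in your own scripts, the table can be expressed as a small lookup. The sketch below is hypothetical: the `ERROR_GUIDE` mapping and `should_retry` helper are illustrative names, and treating 404 and 403 as non-retryable is an assumption based on the solutions listed, not confirmed pipeline behavior.

```python
# Hypothetical lookup mirroring the table above: error -> (meaning, worth retrying).
ERROR_GUIDE = {
    500: ("Server error", True),
    429: ("Too many requests", True),
    404: ("Not found", False),       # assumption: fix the state/NTEE code instead
    403: ("Forbidden", False),       # assumption: check authentication instead
    "timeout": ("Request took too long", True),
}


def should_retry(error) -> bool:
    """Return True if the error is one that a later retry might resolve."""
    _meaning, retry = ERROR_GUIDE.get(error, ("Unknown error", False))
    return retry
```

Unknown errors default to "do not retry" so that unexpected failures surface quickly rather than being retried silently.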
## Getting Help
If issues persist:
1. **Check cache directory** - Data may have been partially downloaded:
```bash
ls -lh data/cache/nonprofits/
```
2. **Review logs** - Detailed error messages help diagnose issues
3. **Try different parameters**:
```bash
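# Different states
python scripts/create_all_gold_tables.py --nonprofits-only --states NY CA FL
# Skip discovery (use cached)
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery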
```
4. **File an issue** - Include:
- Error messages
- States/NTEE codes attempted
- Timestamp
- Discovery summary output
## Success Stories
**Expected behavior:**
- Some requests may fail (API issues)
- Pipeline continues processing
- You get partial results from successful requests
- Summary shows what worked vs. what failed
**Example successful run:**
```
DISCOVERY SUMMARY
Total requests: 24 (4 states × 6 NTEE codes)
Successful: 18 (75%)
No results: 4
Failed: 2
Total nonprofits discovered: 3,421
✅ Created gold tables with 3,421 nonprofit records!
```
Even with 2 failed requests, you got 3,400+ nonprofits!
---
## Quick Reference
```bash
# Standard run (handles failures gracefully)
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI
# Use cached data (skip API calls)
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
# Try different states if some fail
python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX NY
# Run only meetings (no API calls)
python scripts/create_all_gold_tables.py --meetings-only
```