---
sidebar_position: 5
---

# API Troubleshooting

Common issues when working with external APIs and their solutions.

## ProPublica Nonprofit Explorer API

### 500 Internal Server Error

**Symptom:**
```
ERROR | ProPublica API request failed: 500 Server Error: Internal Server Error
```

**Cause:**
The ProPublica API is experiencing server-side issues. This is not a problem with your code or configuration.

**Solution:**

The pipeline now includes **automatic retry logic** with exponential backoff:

1. **Automatic retries**: Up to 3 attempts per request
2. **Exponential backoff**: 2s, 4s, 8s delays between retries
3. **Graceful degradation**: Continues processing other states/NTEE codes if one fails
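
The retry behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: `retry_with_backoff`, `fetch`, `max_retries`, and `base_delay` are hypothetical names, and the sketch assumes one initial attempt followed by up to three retries so that the waits come out as 2s, 4s, 8s.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    max_retries: int = 3,        # assumption: retries after the first attempt
    base_delay: float = 2.0,     # first wait in seconds; doubles each retry
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch(); on failure wait 2s, 4s, 8s, ... and try again.

    Returns None when every attempt fails, so the caller can carry on
    with the next state/NTEE code instead of aborting the whole run.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception as exc:  # real code would catch narrower errors
            if attempt == max_retries:
                print(f"Giving up after {attempt + 1} attempts: {exc}")
                return None
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
    return None
```

The `sleep` parameter exists only so the waits can be replaced in tests; in normal use the default `time.sleep` applies.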

**What to do:**

1. **Wait and retry** - API issues are usually temporary:
   ```bash
   # Try again in 5-10 minutes
   python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI
   ```

2. **Try different states** - Some states may work while others fail:
   ```bash
   # Try California and Texas instead
   python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX
   ```

3. **Use cached data** - If you've successfully discovered data before:
   ```bash
   # Use existing bronze data
   python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
   ```

4. **Check API status** - Visit the ProPublica website to check for known issues

5. **Reduce request volume** - Try fewer NTEE codes at once by modifying the script

:::tip Success Rate
The pipeline shows a **discovery summary** with success/failure counts:
```
DISCOVERY SUMMARY
Total requests: 12
Successful: 8 (66.7%)
No results: 2
Failed: 2
Total nonprofits discovered: 1,247
```

Even with some failures, you'll still get useful data!
:::
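
To make the percentages in the summary concrete, here is a small sketch of how such a report could be computed from per-request outcomes. The function name and the outcome labels (`"success"`, `"empty"`, `"failed"`) are assumptions made for illustration; the pipeline's internal representation may differ.

```python
from collections import Counter
from typing import Iterable


def format_discovery_summary(outcomes: Iterable[str], total_nonprofits: int) -> str:
    """Build a summary like the one above from a list of outcome labels.

    Each outcome is assumed to be "success", "empty", or "failed".
    """
    outcomes = list(outcomes)
    counts = Counter(outcomes)
    total = len(outcomes)
    ok = counts["success"]
    # Guard against division by zero when no requests were made.
    rate = (ok / total * 100) if total else 0.0
    return "\n".join([
        "DISCOVERY SUMMARY",
        f"Total requests: {total}",
        f"Successful: {ok} ({rate:.1f}%)",
        f"No results: {counts['empty']}",
        f"Failed: {counts['failed']}",
        f"Total nonprofits discovered: {total_nonprofits:,}",
    ])
```

The success rate is simply successful requests divided by total requests, so 8 of 12 gives 66.7%.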

### Rate Limiting

**Symptom:**
```
Too many requests
```

**Solution:**
The pipeline includes automatic rate limiting (1 request/second). If you still hit the API's rate limit, the built-in retry logic retries the request after a delay.
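
For readers curious how a 1 request/second limit can be enforced on the client side, here is a minimal sketch. `RateLimiter` and its parameters are hypothetical and not taken from the pipeline's source; the idea is only to wait until at least `min_interval` seconds have passed since the previous request.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Allow at most one call per `min_interval` seconds."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each API request keeps the request rate at or below one per second.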

### Timeout Errors

**Symptom:**
```
Request timeout
```

**Solution:**
- Automatic retry with exponential backoff
- Timeout increased to 30 seconds per request
- If all retries fail, continues to next request
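
A per-request timeout with a "skip and continue" fallback might look like the sketch below. It uses only the standard library; `fetch_or_skip` and the `opener` parameter are hypothetical names, and the retry step is left out here to keep the example short.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_or_skip(
    url: str,
    timeout: float = 30.0,  # seconds, matching the limit described above
    opener: Callable = urllib.request.urlopen,
) -> Optional[bytes]:
    """Fetch a URL; return None on timeout or network error.

    Returning None instead of raising lets the caller move on to the
    next request.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except (socket.timeout, TimeoutError, urllib.error.URLError) as exc:
        print(f"Skipping {url}: {exc}")
        return None
```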

## Alternative Data Sources

If the ProPublica API is consistently unavailable, you can use these alternative sources:

### 1. IRS Tax Exempt Organization Search

Direct download of IRS data:
- https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-bulk-data-downloads

### 2. Every.org API

Alternative nonprofit data source (requires registration):
- https://www.every.org/nonprofits

### 3. GuideStar/Candid

Comprehensive nonprofit database (some features require subscription):
- https://www.guidestar.org/

## Pipeline Best Practices

### Start Small

```bash
# Test with one state first
python scripts/create_all_gold_tables.py --nonprofits-only --states AL
```

### Check Cached Data

```bash
# See what's already been discovered
ls -lh data/cache/nonprofits/
ls -lh data/bronze/nonprofits/
```

### Monitor Progress

The pipeline provides detailed logging:
- ✅ Successful requests
- ⚠️  No results found
- ❌ Failed requests
- Progress counter (8/12)

### Use Skip Discovery

If you've already discovered data and just want to regenerate gold tables:

```bash
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery
```

## Error Codes Reference

| Error Code | Meaning | Solution |
|------------|---------|----------|
| 500 | Server error | Retry later; the problem is on the API side |
| 429 | Too many requests | Built-in rate limiting handles this |
| 404 | Not found | Check state/NTEE code validity |
| 403 | Forbidden | Check if API requires authentication |
| Timeout | Request took too long | Automatic retry with backoff |
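
If you want to act on these codes in your own scripts, the table can be expressed as a small lookup. This is only an illustrative sketch: the `advise` function, the `RETRYABLE` set, and the hint strings are assumptions, not part of the pipeline.

```python
from typing import Tuple, Union

# Errors that are usually transient and worth retrying (assumption
# based on the table above).
RETRYABLE = {500, 429, "timeout"}

HINTS = {
    500: "Server error: retry later",
    429: "Too many requests: slow down and retry",
    404: "Not found: check state/NTEE code validity",
    403: "Forbidden: check whether the API requires authentication",
    "timeout": "Request took too long: retry with backoff",
}


def advise(error: Union[int, str]) -> Tuple[bool, str]:
    """Return (should_retry, hint) for an HTTP status or "timeout"."""
    hint = HINTS.get(error, "Unknown error: review the logs")
    return error in RETRYABLE, hint
```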

## Getting Help

If issues persist:

1. **Check cache directory** - Data may have been partially downloaded:
   ```bash
   ls -lh data/cache/nonprofits/
   ```

2. **Review logs** - Detailed error messages help diagnose issues

3. **Try different parameters**:
   ```bash
   # Different states
   --states NY CA FL
   
   # Skip discovery (use cached)
   --skip-discovery
   ```

4. **File an issue** - Include:
   - Error messages
   - States/NTEE codes attempted
   - Timestamp
   - Discovery summary output

## Success Stories

**Expected behavior:**
- Some requests may fail (API issues)
- Pipeline continues processing
- You get partial results from successful requests
- Summary shows what worked vs. what failed

**Example successful run:**
```
DISCOVERY SUMMARY
Total requests: 24 (4 states × 6 NTEE codes)
Successful: 18 (75%)
No results: 4
Failed: 2
Total nonprofits discovered: 3,421

✅ Created gold tables with 3,421 nonprofit records!
```

Even with 2 failed requests, you got 3,400+ nonprofits!
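
The "continue on failure" behavior behind these partial results can be sketched as a loop over every state and NTEE code combination. `discover_all` and the `discover` callable are hypothetical stand-ins for whatever the pipeline uses internally, and the NTEE codes in the usage comment are placeholders.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[str, str]


def discover_all(
    states: Sequence[str],
    ntee_codes: Sequence[str],
    discover: Callable[[str, str], list],
) -> Tuple[Dict[Key, list], List[Key]]:
    """Run discover(state, code) for every combination.

    A failure is recorded and the loop moves on, so one bad request
    does not discard the results of the others.
    """
    results: Dict[Key, list] = {}
    failed: List[Key] = []
    for state, code in product(states, ntee_codes):
        try:
            results[(state, code)] = discover(state, code)
        except Exception as exc:  # real code would catch narrower errors
            failed.append((state, code))
            print(f"Failed {state}/{code}: {exc}")
    return results, failed


# Example: 4 states x 6 codes = 24 requests in total.
# results, failed = discover_all(["AL", "MI", "CA", "TX"], list("ABCDEF"), my_discover)
```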

---

## Quick Reference

```bash
# Standard run (handles failures gracefully)
python scripts/create_all_gold_tables.py --nonprofits-only --states AL MI

# Use cached data (skip API calls)
python scripts/create_all_gold_tables.py --nonprofits-only --skip-discovery

# Try different states if some fail
python scripts/create_all_gold_tables.py --nonprofits-only --states CA TX NY

# Run only meetings (no API calls)
python scripts/create_all_gold_tables.py --meetings-only
```