---
sidebar_position: 5
sidebar_label: Adding New Data Sources
---

# Adding New Data Sources - Compliance Checklist

:::tip[Use This Checklist]
**Before integrating any new data source**, work through this checklist to confirm legal compliance and proper attribution, and to follow project best practices.
:::

## ✅ Pre-Integration Checklist

### 1. Legal Review

- [ ] **Find and read the Terms of Service**
  - API Terms of Service URL: _________________
  - Data Usage Policy URL: _________________
  - Last reviewed: _________________

- [ ] **Verify the data is legally accessible**
  - [ ] Public domain (e.g., U.S. federal government works)
  - [ ] Open license (CC0, CC-BY, MIT, etc.)
  - [ ] Free API with terms of service
  - [ ] Paid API with commercial license

- [ ] **Check for usage restrictions**
  - [ ] No restrictions on commercial use
  - [ ] No restrictions on redistribution
  - [ ] No prohibition on caching/storage
  - [ ] No requirement for user consent/opt-in

- [ ] **Identify attribution requirements**
  - Required attribution text: _________________
  - Logo/trademark requirements: _________________
  - Link-back requirements: _________________

### 2. API Access & Rate Limits

- [ ] **API Key Requirements**
  - [ ] No API key required ✅
  - [ ] Free API key (document registration process)
  - [ ] Paid API key (not recommended for open-source projects)

- [ ] **Rate Limits**
  - Requests per second: _________________
  - Requests per day: _________________
  - Requests per month: _________________
  - Recommended delay between requests: _________________

- [ ] **User-Agent Requirements**
  - [ ] Custom User-Agent required
  - [ ] Contact email required
  - [ ] Project URL required
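
After you record the published limits, convert the strictest one into seconds per request to get the recommended delay. The sketch below makes two assumptions. It treats a month as 30 days. When no limit is published, it falls back to the 1-second default used in the rate-limiting example. `recommended_delay` is a hypothetical helper, not an existing project function:

```python
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # assumption: 30-day month


def recommended_delay(
    per_second: Optional[float] = None,
    per_day: Optional[float] = None,
    per_month: Optional[float] = None,
    default: float = 1.0,
) -> float:
    """Return the minimum seconds between requests that respects every published limit."""
    candidates = []
    if per_second:
        candidates.append(1 / per_second)
    if per_day:
        candidates.append(SECONDS_PER_DAY / per_day)
    if per_month:
        candidates.append(SECONDS_PER_MONTH / per_month)
    # The strictest limit wins; fall back to a conservative default
    return max(candidates) if candidates else default


# Example: 10 requests/second but only 1,000 requests/day, so the daily cap dominates
print(recommended_delay(per_second=10, per_day=1000))  # 86.4
```

Spreading a daily quota evenly like this is conservative. If the API permits bursts, a burst-aware limiter can use the quota more efficiently.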

### 3. Data Privacy & Personal Information

- [ ] **Data Type Classification**
  - [ ] Public records only (government data)
  - [ ] Aggregated statistics only (no individuals)
  - [ ] Individual-level data from public sources
  - [ ] Personal information requiring consent (AVOID)

- [ ] **Privacy Compliance**
  - [ ] Data is public record
  - [ ] No personal financial information
  - [ ] No health information (PHI)
  - [ ] No authentication required to access original data

- [ ] **GDPR Considerations**
  - [ ] Right to erasure ("right to be forgotten") process documented
  - [ ] Legal basis identified (public interest, legitimate interest)
  - [ ] Data minimization applied
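
One way to enforce data minimization mechanically is an explicit allowlist. Keep only the fields the integration actually needs and drop everything else at ingestion time. The field names below are hypothetical placeholders, not a real source's schema:

```python
from typing import Any, Dict, Iterable

# Hypothetical allowlist: only the fields this integration actually uses
ALLOWED_FIELDS = {"organization_name", "ein", "city", "state"}


def minimize(record: Dict[str, Any], allowed: Iterable[str] = ALLOWED_FIELDS) -> Dict[str, Any]:
    """Return a copy of `record` containing only allowlisted fields."""
    allowed = set(allowed)
    return {key: value for key, value in record.items() if key in allowed}


raw = {
    "organization_name": "Example Food Bank",
    "ein": "00-0000000",
    "city": "Springfield",
    "state": "IL",
    "officer_home_address": "123 Main St",  # not needed, so never stored
}
print(minimize(raw))
```

An allowlist is safer than a denylist here. Any new field the upstream API starts returning is excluded by default until someone decides it is needed.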

### 4. Technical Requirements

- [ ] **API Documentation**
  - API documentation URL: _________________
  - SDK/client library available: _________________
  - Code examples available: _________________

- [ ] **Data Format**
  - Response format (JSON, XML, CSV): _________________
  - Pagination supported: Yes / No
  - Batch operations supported: Yes / No

- [ ] **Error Handling**
  - [ ] Rate limit error codes documented
  - [ ] Retry strategy defined
  - [ ] Timeout handling planned
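
A common retry strategy is capped exponential backoff with "full jitter." Each retry waits a random time between zero and `base * 2**attempt` seconds, capped at a maximum, so that many clients do not retry in lockstep. In the sketch below, the base, cap, and attempt count are illustrative defaults, not values any particular API requires:

```python
import random
from typing import Callable, List


def backoff_delays(
    attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    rand: Callable[[], float] = random.random,
) -> List[float]:
    """Return the wait (seconds) before each retry: capped exponential backoff with full jitter."""
    return [rand() * min(cap, base * 2 ** attempt) for attempt in range(attempts)]


# Upper bounds grow 1, 2, 4, 8, 16 seconds; the actual waits are random below them
for delay in backoff_delays():
    print(f"{delay:.2f}s")
```

Passing a fixed `rand` makes the schedule deterministic in unit tests.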

---

## πŸ“ Implementation Checklist

### 1. Create Integration Module

Create file: `discovery/{source_name}_integration.py`

**Required docstring elements:**
```python
"""
[Source Name] Integration

[Brief description of what this source provides]

Data Source: [Official URL]
API Documentation: [API docs URL]
Terms of Use: [Terms of Service URL]
License: [Data license]

Key Features:
- Feature 1
- Feature 2
- Feature 3

Use Cases:
- Use case 1
- Use case 2

Author: Open Navigator
License: MIT
"""
```
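
The required docstring fields can be checked automatically, for example in CI. The sketch below uses the standard-library `ast` module and checks the labels from the template above. `missing_docstring_fields` and `check_files` are hypothetical helpers:

```python
import ast
from typing import Iterable, List

REQUIRED_LABELS = ("Data Source:", "API Documentation:", "Terms of Use:", "License:")


def missing_docstring_fields(source: str) -> List[str]:
    """Return the required labels absent from a module's top-level docstring."""
    docstring = ast.get_docstring(ast.parse(source)) or ""
    return [label for label in REQUIRED_LABELS if label not in docstring]


def check_files(paths: Iterable[str]) -> int:
    """Print missing labels per file; return a non-zero exit code if any file fails."""
    failed = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            missing = missing_docstring_fields(f.read())
        if missing:
            failed += 1
            print(f"{path}: missing {', '.join(missing)}")
    return 1 if failed else 0
```

A CI step could then run something like `sys.exit(check_files(glob.glob("discovery/*_integration.py")))`.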

### 2. Implement Rate Limiting

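```python
import asyncio
import time

class DataSourceClient:
    def __init__(self, request_delay: float = 1.0):
        self.request_delay = request_delay  # seconds between requests
        self.last_request_time = float("-inf")  # first request never waits
        # Serialize callers so concurrent tasks can't all skip the wait
        self._lock = asyncio.Lock()

    async def _rate_limit(self) -> None:
        """Enforce a minimum delay between consecutive requests."""
        async with self._lock:
            # monotonic() is unaffected by system clock changes, unlike time()
            elapsed = time.monotonic() - self.last_request_time
            if elapsed < self.request_delay:
                await asyncio.sleep(self.request_delay - elapsed)
            self.last_request_time = time.monotonic()
```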
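
The fixed-delay limiter above spaces every request evenly. That is simple, but slower than necessary when an API allows short bursts (for example, "10 requests per second"). A token bucket permits up to `capacity` back-to-back requests while holding the long-run average to `rate` per second. The sketch below assumes a single asyncio event loop with no threads. `TokenBucket` is an illustrative class, not an existing project module:

```python
import asyncio
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, averaging `rate` requests per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> float:
        """Take a token if available and return 0.0; otherwise return seconds to wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 0.0
        return (1 - self.tokens) / self.rate

    async def acquire(self) -> None:
        """Wait until a token is available (try_acquire has no await, so it is atomic per loop)."""
        while (wait := self.try_acquire()) > 0:
            await asyncio.sleep(wait)


async def main() -> None:
    bucket = TokenBucket(rate=10, capacity=10)  # e.g. "10 requests per second"
    start = time.monotonic()
    for _ in range(15):
        await bucket.acquire()  # first 10 pass immediately, the rest about 0.1s apart
    print(f"15 requests in {time.monotonic() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

The injectable `clock` lets tests drive the bucket with a fake time source instead of real sleeps.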

### 3. Set User-Agent Header

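```python
import httpx

# Create the shared client once (e.g. in __init__) so every request identifies the project
self.session = httpx.AsyncClient(headers={
    'User-Agent': 'CommunityOne/1.0 (Civic Engagement Platform; https://communityone.com/)',
    'Accept': 'application/json',
})
```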
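
Some sources ask for a contact email or project URL in the User-Agent (see the User-Agent Requirements checklist). The sketch below assembles one in the common `Product/Version (+URL; email)` convention. `build_user_agent` is hypothetical and the email address is a placeholder, so follow whatever format the source's documentation asks for:

```python
from typing import Optional


def build_user_agent(product: str, version: str, url: str, email: Optional[str] = None) -> str:
    """Assemble a User-Agent such as 'Product/1.0 (+https://example.org; contact@example.org)'."""
    # "+URL" is a widespread convention for pointing at a page describing the client
    contact = f"+{url}" if email is None else f"+{url}; {email}"
    return f"{product}/{version} ({contact})"


print(build_user_agent("CommunityOne", "1.0", "https://communityone.com/", "contact@example.org"))
# CommunityOne/1.0 (+https://communityone.com/; contact@example.org)
```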

### 4. Handle API Keys Securely

**Add to `.env.example`:**
```bash
# [Source Name] API Key
# Get your key at: [Registration URL]
# Free tier: [Quota details]
[SOURCE]_API_KEY=your-api-key-here
```

**Load from environment:**
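```python
import logging
import os

from dotenv import load_dotenv  # pip install python-dotenv

logger = logging.getLogger(__name__)

load_dotenv()  # loads a .env file into os.environ without overriding existing variables

api_key = os.getenv('[SOURCE]_API_KEY')
if not api_key:
    logger.warning("⚠️  [SOURCE]_API_KEY not found")
```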
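
To back up the "no API keys in logs" requirement, a logging filter can mask the key's value before any record is written. The sketch below uses the standard-library `logging.Filter`; `RedactSecretsFilter` is a hypothetical name. Attach it to handlers rather than loggers, because a logger's filters do not apply to records propagated from its child loggers:

```python
import logging
from typing import Iterable, Optional


class RedactSecretsFilter(logging.Filter):
    """Replace known secret values in log messages with a placeholder."""

    def __init__(self, secrets: Iterable[Optional[str]], placeholder: str = "***REDACTED***"):
        super().__init__()
        # Ignore unset values (e.g. os.getenv() returned None)
        self.secrets = [s for s in secrets if s]
        self.placeholder = placeholder

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg % args into one string
        for secret in self.secrets:
            message = message.replace(secret, self.placeholder)
        record.msg, record.args = message, None
        return True  # never drop the record, only rewrite it


if __name__ == "__main__":
    # The key would normally come from os.getenv('[SOURCE]_API_KEY')
    handler = logging.StreamHandler()
    handler.addFilter(RedactSecretsFilter(["example-secret-key"]))
    logging.getLogger().addHandler(handler)
    logging.getLogger(__name__).warning("Request failed: ?api_key=%s", "example-secret-key")
```

The filter rewrites only the message text. It does not scrub exception tracebacks attached with `exc_info`.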

### 5. Add Error Handling

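```python
import asyncio
import logging

import httpx

logger = logging.getLogger(__name__)

# Method of DataSourceClient; assumes self.session is an httpx.AsyncClient
async def _fetch(self, url: str, max_retries: int = 3):
    for attempt in range(max_retries + 1):
        await self._rate_limit()
        try:
            response = await self.session.get(url)
            response.raise_for_status()
            return response.json()
        except httpx.HTTPStatusError as e:
            status = e.response.status_code
            if status == 429 and attempt < max_retries:
                # Honor a numeric Retry-After header if present, else wait 60s
                retry_after = e.response.headers.get('Retry-After', '')
                delay = int(retry_after) if retry_after.isdigit() else 60
                logger.warning("Rate limited (429); retry %d in %ds", attempt + 1, delay)
                await asyncio.sleep(delay)
                continue
            # Log the status only: the URL may carry an API key
            logger.error("HTTP error %d", status)
            raise
        except httpx.RequestError as e:
            logger.error("Request failed: %s", type(e).__name__)
            raise
```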
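
Under RFC 9110, a `Retry-After` header may hold either a number of seconds or an HTTP-date. The sketch below handles both forms with the standard library and falls back to a default when the header is missing or malformed. `retry_after_seconds` is a hypothetical helper:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], default: float = 60.0,
                        now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into seconds to wait."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return default
    if when.tzinfo is None:  # treat a naive date as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```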

---

## 📚 Documentation Checklist

### 1. Update Legal Compliance Document

Add to: `website/docs/legal-compliance.md`

**Template:**
````markdown
### [Source Name]

**Data Type:** [Description]
**Source:** [Official URL]
**API Documentation:** [API docs URL]
**License:** [License type]
**Terms of Use:** [ToS URL]

**Compliance Status:** ✅ **COMPLIANT** / ⚠️ **NOT USED**
- [Key compliance point 1]
- [Key compliance point 2]
- API key requirement: Yes/No
- Rate limit: [Details]

**Implementation:** `discovery/[filename].py`

**Use Policy Key Points:**
- [Policy point 1]
- [Policy point 2]
- [Attribution requirements]

**Environment Variable:**
```bash
[SOURCE]_API_KEY=your-api-key-here
```
````

### 2. Update Citations Page

Add to: `website/docs/data-sources/citations.md`

**Template:**
````markdown
### [Source Name]

**Organization:** [Organization name]
**What we use:** [Description of how we use this data]

- **Source:** [Official URL]
- **API Documentation:** [API docs URL]
- **Coverage:** [Geographic/temporal coverage]
- **License:** [License details]
- **Access:** [API key requirements]

**BibTeX:**
```bibtex
@misc{[citation_key],
  author = {{[Organization Name]}},
  title = {[Dataset/API Name]},
  year = {[Year]},
  url = {[Official URL]},
  note = {Accessed: [YYYY-MM-DD]}
}
```
````

### 3. Update API Integration Status

Add to: `docs/API_INTEGRATION_STATUS.md`

Record the integration status, whether the API is free or paid, its API key requirements, and code examples.

### 4. Add Usage Examples

Create or update: `examples/demo_[source_name].py`

```python
#!/usr/bin/env python3
"""
Example: [Source Name] Integration

Demonstrates how to fetch data from [Source Name] API.
"""

import asyncio
import os

from discovery.[source_name]_integration import [ClassName]

async def main():
    """Example usage"""
    # Read the key from the environment; never hardcode credentials
    client = [ClassName](api_key=os.getenv("[SOURCE]_API_KEY"))
    
    # Example query
    results = await client.fetch_data(param="value")
    
    print(f"Found {len(results)} results")
    for item in results[:5]:
        print(f"  - {item}")

if __name__ == "__main__":
    asyncio.run(main())
```

---

## 🧪 Testing Checklist

### 1. Unit Tests

- [ ] Test API client initialization
- [ ] Test successful data fetch
- [ ] Test rate limiting
- [ ] Test error handling (404, 500, 429)
- [ ] Test API key validation

### 2. Integration Tests

- [ ] Test with real API (if free tier available)
- [ ] Test with demo/sandbox environment
- [ ] Verify data format matches schema
- [ ] Test pagination (if applicable)
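
A schema check can start as simply as asserting that required fields are present and have the right types. The sketch below uses only the standard library, and the `EXPECTED_SCHEMA` fields are hypothetical. Once schemas grow, a dedicated validator such as JSON Schema or pydantic may fit better:

```python
from typing import Any, Dict, List

# Hypothetical schema: field name -> expected Python type(s) after JSON decoding
EXPECTED_SCHEMA: Dict[str, Any] = {
    "id": str,
    "name": str,
    "updated_at": str,
    "population": (int, type(None)),  # nullable
}


def schema_errors(record: Dict[str, Any], schema: Dict[str, Any] = EXPECTED_SCHEMA) -> List[str]:
    """Return human-readable problems; an empty list means the record conforms."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected}, got {type(record[field]).__name__}")
    return errors


# In an integration test, assert that schema_errors(item) is empty for each fetched item
print(schema_errors({"id": "ocd-1", "name": "Example", "population": "n/a"}))
```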

### 3. Compliance Tests

- [ ] Verify User-Agent is set correctly
- [ ] Verify rate limiting is enforced
- [ ] Verify attribution is included in output
- [ ] Verify no API keys in logs or code
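
A simple pattern scan can partly automate the "no API keys in code" check. This is a heuristic sketch. Its regex is an assumption that catches quoted literals assigned to names containing `api_key`, `secret`, or `token`, and it will miss other patterns. A dedicated secret scanner such as gitleaks or detect-secrets is still worth running:

```python
import re
from typing import List

# Heuristic: a name containing api_key/secret/token assigned a quoted literal of 8+ non-space chars
HARDCODED_SECRET = re.compile(
    r"""(?i)\b\w*(?:api[_-]?key|secret|token)\w*\s*[:=]\s*['"][^'"\s]{8,}['"]"""
)


def find_hardcoded_secrets(source: str) -> List[int]:
    """Return 1-based line numbers that look like hardcoded credentials."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if HARDCODED_SECRET.search(line)
    ]


sample = 'import os\napi_key = os.getenv("SRC_API_KEY")\nAPI_KEY = "sk_live_abcdef123456"\n'
print(find_hardcoded_secrets(sample))  # [3]
```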

---

## 🚀 Pre-Deployment Checklist

### 1. Code Review

- [ ] Code follows project style guidelines
- [ ] Type hints added for all functions
- [ ] Docstrings complete and accurate
- [ ] No hardcoded credentials
- [ ] No debug print statements

### 2. Documentation Review

- [ ] Legal compliance doc updated
- [ ] Citations page updated
- [ ] API integration status updated
- [ ] Usage examples created
- [ ] README updated (if needed)

### 3. Security Review

- [ ] No API keys in code
- [ ] Environment variables documented in `.env.example`
- [ ] User-Agent identifies project
- [ ] Rate limiting prevents abuse
- [ ] Error messages don't leak sensitive info

### 4. License Review

- [ ] Data source license compatible with MIT
- [ ] Attribution requirements documented
- [ ] Terms of service compliance verified
- [ ] Commercial use permitted (or documented as reference only)

---

## 📋 Quick Reference: Data Source Types

### ✅ RECOMMENDED: Public Domain Government Data

**Examples:** IRS, Census Bureau, NCES, Grants.gov

**Characteristics:**
- No API key required (usually)
- Public domain: no copyright restrictions
- Free access (rate limits may still apply)
- No attribution required (but recommended)

**Best for:** Production use, open-source projects

---

### ✅ RECOMMENDED: Free Public APIs (Free API Key May Be Required)

**Examples:** Open States, Google Civic API, Wikidata, DBpedia

**Characteristics:**
- Free API key registration, where a key is required
- Generous free tier quotas
- Open license or public domain data
- Attribution required by some licenses (check each source)

**Best for:** Production use with proper attribution

---

### ⚠️ CAUTION: Free APIs with Restrictions

**Examples:** ProPublica, FEC (contributor restrictions)

**Characteristics:**
- Free access but with usage restrictions
- May prohibit commercial use of certain data
- May have low rate limits
- May require approval process

**Best for:** Research, education, limited production use

---

### ❌ AVOID: Paid Commercial APIs

**Examples:** Ballotpedia API, Cicero API

**Characteristics:**
- Requires paid subscription
- Not suitable for open-source projects
- May have restrictive terms

**Best for:** Reference implementations only, enterprise deployments

---

## 🔗 Resources

- [Legal Compliance Documentation](../legal-compliance.md)
- [Citations & Data Sources](../data-sources/citations.md)
- [API Integration Status](https://github.com/getcommunityone/open-navigator-for-engagement/blob/main/docs/API_INTEGRATION_STATUS.md)
- [Project License (MIT)](https://github.com/getcommunityone/open-navigator-for-engagement/blob/main/LICENSE)

---

## 📞 Questions?

If you're unsure about legal compliance for a data source:

1. **Check the Terms of Service** - Always start here
2. **Look for similar integrations** - See how other open-source projects use it
3. **Ask the community** - Open a GitHub Discussion
4. **Consult legal counsel** - When in doubt, especially for commercial use

---

:::warning[When in Doubt, Don't Integrate]
If you cannot clearly verify that a data source:
- Is legally accessible
- Permits commercial use and redistribution
- Has acceptable rate limits and API quotas
- Doesn't violate privacy laws

**DO NOT INTEGRATE IT.** Mark it as "reference only" or find a free alternative.
:::