Spaces:
Running on CPU Upgrade
Running on CPU Upgrade
File size: 10,926 Bytes
61d29fc | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 | ---
sidebar_position: 5
sidebar_label: Adding New Data Sources
---
# Adding New Data Sources - Compliance Checklist
:::tip[Use This Checklist]
**Before integrating any new data source**, work through this checklist to ensure legal compliance, proper attribution, and best practices.
:::
## β
Pre-Integration Checklist
### 1. Legal Review
- [ ] **Find and read the Terms of Service**
- API Terms of Service URL: _________________
- Data Usage Policy URL: _________________
- Last reviewed: _________________
- [ ] **Verify the data is legally accessible**
- [ ] Public domain (U.S. Government data)
- [ ] Open license (CC0, CC-BY, MIT, etc.)
- [ ] Free API with terms of service
- [ ] Paid API with commercial license
- [ ] **Check for usage restrictions**
- [ ] No restrictions on commercial use
- [ ] No restrictions on redistribution
- [ ] No prohibition on caching/storage
- [ ] No requirement for user consent/opt-in
- [ ] **Identify attribution requirements**
- Required attribution text: _________________
- Logo/trademark requirements: _________________
- Link-back requirements: _________________
### 2. API Access & Rate Limits
- [ ] **API Key Requirements**
- [ ] No API key required β
- [ ] Free API key (document registration process)
- [ ] Paid API key (not recommended for open-source project)
- [ ] **Rate Limits**
- Requests per second: _________________
- Requests per day: _________________
- Requests per month: _________________
- Recommended delay between requests: _________________
- [ ] **User-Agent Requirements**
- [ ] Custom User-Agent required
- [ ] Contact email required
- [ ] Project URL required
### 3. Data Privacy & Personal Information
- [ ] **Data Type Classification**
- [ ] Public records only (government data)
- [ ] Aggregated statistics only (no individuals)
- [ ] Individual-level data from public sources
- [ ] Personal information requiring consent (AVOID)
- [ ] **Privacy Compliance**
- [ ] Data is public record
- [ ] No personal financial information
- [ ] No health information (PHI)
- [ ] No authentication required to access original data
- [ ] **GDPR Considerations**
- [ ] Right to be forgotten process documented
- [ ] Legal basis identified (public interest, legitimate interest)
- [ ] Data minimization applied
### 4. Technical Requirements
- [ ] **API Documentation**
- API documentation URL: _________________
- SDK/client library available: _________________
- Code examples available: _________________
- [ ] **Data Format**
- Response format (JSON, XML, CSV): _________________
- Pagination supported: Yes / No
- Batch operations supported: Yes / No
- [ ] **Error Handling**
- [ ] Rate limit error codes documented
- [ ] Retry strategy defined
- [ ] Timeout handling planned
---
## π Implementation Checklist
### 1. Create Integration Module
Create file: `discovery/{source_name}_integration.py`
**Required docstring elements:**
```python
"""
[Source Name] Integration
[Brief description of what this source provides]
Data Source: [Official URL]
API Documentation: [API docs URL]
Terms of Use: [Terms of Service URL]
License: [Data license]
Key Features:
- Feature 1
- Feature 2
- Feature 3
Use Cases:
- Use case 1
- Use case 2
Author: Open Navigator
License: MIT
"""
```
### 2. Implement Rate Limiting
```python
import time
import asyncio
class DataSourceClient:
def __init__(self):
self.request_delay = 1.0 # seconds between requests
self.last_request_time = 0
async def _rate_limit(self):
"""Enforce rate limiting"""
elapsed = time.time() - self.last_request_time
if elapsed < self.request_delay:
await asyncio.sleep(self.request_delay - elapsed)
self.last_request_time = time.time()
```
### 3. Set User-Agent Header
```python
self.session.headers.update({
'User-Agent': 'CommunityOne/1.0 (Civic Engagement Platform; https://communityone.com/)',
'Accept': 'application/json',
})
```
### 4. Handle API Keys Securely
**Add to `.env.example`:**
```bash
# [Source Name] API Key
# Get your key at: [Registration URL]
# Free tier: [Quota details]
[SOURCE]_API_KEY=your-api-key-here
```
**Load from environment:**
```python
import os
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv('[SOURCE]_API_KEY')
if not api_key:
logger.warning("β οΈ [SOURCE]_API_KEY not found")
```
### 5. Add Error Handling
```python
try:
response = await self.session.get(url)
response.raise_for_status()
return response.json()
except httpx.HTTPStatusError as e:
if e.response.status_code == 429: # Rate limited
logger.warning(f"Rate limited, waiting...")
await asyncio.sleep(60)
return await self._fetch(url) # Retry
else:
logger.error(f"HTTP error: {e}")
raise
except Exception as e:
logger.error(f"Failed to fetch data: {e}")
raise
```
---
## π Documentation Checklist
### 1. Update Legal Compliance Document
Add to: `website/docs/legal-compliance.md`
**Template:**
```markdown
### [Source Name]
**Data Type:** [Description]
**Source:** [Official URL]
**API Documentation:** [API docs URL]
**License:** [License type]
**Terms of Use:** [ToS URL]
**Compliance Status:** β
**COMPLIANT** / β οΈ **NOT USED**
- [Key compliance point 1]
- [Key compliance point 2]
- API key requirement: Yes/No
- Rate limit: [Details]
**Implementation:** `discovery/[filename].py`
**Use Policy Key Points:**
- [Policy point 1]
- [Policy point 2]
- [Attribution requirements]
**Environment Variable:**
```bash
[SOURCE]_API_KEY=your-api-key-here
```
```
### 2. Update Citations Page
Add to: `website/docs/data-sources/citations.md`
**Template:**
```markdown
### [Source Name]
**Organization:** [Organization name]
**What we use:** [Description of how we use this data]
- **Source:** [Official URL]
- **API Documentation:** [API docs URL]
- **Coverage:** [Geographic/temporal coverage]
- **License:** [License details]
- **Access:** [API key requirements]
**BibTeX:**
```bibtex
@misc{[citation_key],
author = {{[Organization Name]}},
title = {[Dataset/API Name]},
year = {2026},
url = {[Official URL]},
note = {Accessed: 2026}
}
```
```
### 3. Update API Integration Status
Add to: `docs/API_INTEGRATION_STATUS.md`
Document integration status, free vs paid, key requirements, and code examples.
### 4. Add Usage Examples
Create or update: `examples/demo_[source_name].py`
```python
#!/usr/bin/env python3
"""
Example: [Source Name] Integration
Demonstrates how to fetch data from [Source Name] API.
"""
import asyncio
from discovery.[source_name]_integration import [ClassName]
async def main():
"""Example usage"""
client = [ClassName](api_key="your-key-here")
# Example query
results = await client.fetch_data(param="value")
print(f"Found {len(results)} results")
for item in results[:5]:
print(f" - {item}")
if __name__ == "__main__":
asyncio.run(main())
```
---
## π§ͺ Testing Checklist
### 1. Unit Tests
- [ ] Test API client initialization
- [ ] Test successful data fetch
- [ ] Test rate limiting
- [ ] Test error handling (404, 500, 429)
- [ ] Test API key validation
### 2. Integration Tests
- [ ] Test with real API (if free tier available)
- [ ] Test with demo/sandbox environment
- [ ] Verify data format matches schema
- [ ] Test pagination (if applicable)
### 3. Compliance Tests
- [ ] Verify User-Agent is set correctly
- [ ] Verify rate limiting is enforced
- [ ] Verify attribution is included in output
- [ ] Verify no API keys in logs or code
---
## π Pre-Deployment Checklist
### 1. Code Review
- [ ] Code follows project style guidelines
- [ ] Type hints added for all functions
- [ ] Docstrings complete and accurate
- [ ] No hardcoded credentials
- [ ] No debug print statements
### 2. Documentation Review
- [ ] Legal compliance doc updated
- [ ] Citations page updated
- [ ] API integration status updated
- [ ] Usage examples created
- [ ] README updated (if needed)
### 3. Security Review
- [ ] No API keys in code
- [ ] Environment variables documented in `.env.example`
- [ ] User-Agent identifies project
- [ ] Rate limiting prevents abuse
- [ ] Error messages don't leak sensitive info
### 4. License Review
- [ ] Data source license compatible with MIT
- [ ] Attribution requirements documented
- [ ] Terms of service compliance verified
- [ ] Commercial use permitted (or documented as reference only)
---
## π Quick Reference: Data Source Types
### β
RECOMMENDED: Public Domain Government Data
**Examples:** IRS, Census Bureau, NCES, Grants.gov
**Characteristics:**
- No API key required (usually)
- Public domain - no restrictions
- Free unlimited access
- No attribution required (but recommended)
**Best for:** Production use, open-source projects
---
### β
RECOMMENDED: Free Public APIs (API Key Required)
**Examples:** Open States, Google Civic API, Wikidata, DBpedia
**Characteristics:**
- Free API key registration
- Generous free tier quotas
- Open license or public domain data
- Attribution required
**Best for:** Production use with proper attribution
---
### β οΈ CAUTION: Free APIs with Restrictions
**Examples:** ProPublica, FEC (contributor restrictions)
**Characteristics:**
- Free access but with usage restrictions
- May prohibit commercial use of certain data
- May have low rate limits
- May require approval process
**Best for:** Research, education, limited production use
---
### β AVOID: Paid Commercial APIs
**Examples:** Ballotpedia API, Cicero API
**Characteristics:**
- Requires paid subscription
- Not suitable for open-source projects
- May have restrictive terms
**Best for:** Reference implementations only, enterprise deployments
---
## π Resources
- [Legal Compliance Documentation](../legal-compliance.md)
- [Citations & Data Sources](../data-sources/citations.md)
- [API Integration Status](https://github.com/getcommunityone/open-navigator-for-engagement/blob/main/docs/API_INTEGRATION_STATUS.md)
- [Project License (MIT)](https://github.com/getcommunityone/open-navigator-for-engagement/blob/main/LICENSE)
---
## π Questions?
If you're unsure about legal compliance for a data source:
1. **Check the Terms of Service** - Start here always
2. **Look for similar integrations** - See how other open-source projects use it
3. **Ask the community** - Open a GitHub Discussion
4. **Consult legal counsel** - When in doubt, especially for commercial use
---
:::warning[When in Doubt, Don't Integrate]
If you cannot clearly verify that a data source:
- Is legally accessible
- Permits commercial use and redistribution
- Has acceptable rate limits and API quotas
- Doesn't violate privacy laws
**DO NOT INTEGRATE IT.** Mark it as "reference only" or find a free alternative.
:::
|