# Bulk Downloads vs API: Which to Use?

## TL;DR

**Use Bulk Downloads** for:
- ✅ Historical analysis (analyzing past sessions)
- ✅ Map generation (need all states at once)
- ✅ Research projects (large datasets)
- ✅ Offline processing
- ✅ Multi-issue tracking across all states

**Use API** for:
- ✅ Real-time bill status (same-day updates)
- ✅ Search by specific keywords
- ✅ Individual bill lookups
- ✅ Automated alerts for bill changes

---

## Comparison Table

| Feature | Bulk Download | API |
|---------|--------------|-----|
| **Speed (50 states)** | ⚡ 5-10 minutes | 🐌 2-4 hours |
| **API Key Required** | ❌ No | ✅ Yes |
| **Rate Limits** | ❌ None | ⚠️ 50K requests/month |
| **Internet Required** | Download once | Always |
| **Data Freshness** | Monthly updates | Real-time |
| **Bill Text** | ✅ Full text (JSON) | ✅ Via API |
| **Complete Sessions** | ✅ All bills | Paginated |
| **Cost** | 💰 Free | 💰 Free (50K limit) |
| **Redistribution** | ✅ Allowed | ⚠️ Varies by state |

---

## Real-World Example

### Task: Create fluoridation legislation map for all 50 states (2024)

#### Method 1: Bulk Download

```bash
# Download all 50 states
python scripts/bulk_legislative_download.py --year 2024 --format csv --merge

# Time: ~5 minutes
# API calls: 0
# Result: 1 CSV file with ALL bills
```

**Result:** One 500MB file with ~100,000 bills from all states

#### Method 2: API

```bash
# Search each state individually
python scripts/legislative_tracker.py --issue fluoridation --year 2024

# Time: ~2-4 hours
# API calls: ~10,000 (search + pagination)
# Result: Filtered bills matching "fluoridation"
```

**Result:** Filtered dataset with ~500 matching bills
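
Once the merged bulk file is on disk, the keyword filtering that Method 2 delegates to the API can also be done locally. A minimal sketch, assuming the merged CSV has `state`, `bill_id`, and `title` columns (hypothetical names; check the header of the actual file):

```python
import csv
import io


def filter_bills(csv_file, keyword, column="title"):
    """Yield rows whose `column` contains `keyword` (case-insensitive).

    `csv_file` is any open text file or file-like object. The column
    name is an assumption; adjust it to match the bulk file's header.
    """
    needle = keyword.lower()
    for row in csv.DictReader(csv_file):
        if needle in (row.get(column) or "").lower():
            yield row


# Tiny in-memory stand-in for the merged bulk CSV.
sample = io.StringIO(
    "state,bill_id,title\n"
    "AL,HB123,Community water fluoridation standards\n"
    "AK,SB9,Highway maintenance funding\n"
    "UT,HB81,Fluoridation of public water systems\n"
)
matches = list(filter_bills(sample, "fluoridation"))
print([m["bill_id"] for m in matches])
```

Because the rows are streamed one at a time, memory use stays flat even for a file in the 500MB range.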

---

## When API is Better

### Use Case 1: Real-Time Bill Tracking

**Need:** Alert when a specific bill status changes

```python
# API can check latest status
async def check_bill_status(bill_id):
    response = await client.get(f"{base_url}/bills/{bill_id}")
    return response.json()['latest_action']

# Bulk: Would need to wait for next monthly dump
```
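
Turning that status check into an alert requires remembering the last value seen for each bill. One way to sketch the comparison step, assuming `latest_action` is a plain string and using a hypothetical helper `has_changed` (not part of any script in this repo):

```python
def has_changed(last_seen, bill_id, latest_action):
    """Record `latest_action` for `bill_id` and report whether it changed.

    `last_seen` is a dict mapping bill IDs to the last action observed.
    A bill seen for the first time is recorded but not reported as a change.
    """
    known = bill_id in last_seen
    previous = last_seen.get(bill_id)
    last_seen[bill_id] = latest_action
    return known and previous != latest_action


last_seen = {}
first = has_changed(last_seen, "HB123", "Introduced")     # first sighting
second = has_changed(last_seen, "HB123", "Introduced")    # nothing new
third = has_changed(last_seen, "HB123", "Passed House")   # status moved
print(first, second, third)
```

In a real poller, `latest_action` would come from `check_bill_status`, and `last_seen` would be persisted between runs.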

### Use Case 2: Keyword Search

**Need:** Find all bills mentioning "oral health"

```python
# API can search full text
params = {"q": "oral health", "jurisdiction": "AL"}
response = await client.get(f"{base_url}/bills", params=params)

# Bulk: Would need to download all bills, then search locally
```

### Use Case 3: Single Bill Lookup

**Need:** Get details for one specific bill

```python
# API is instant
response = await client.get(f"{base_url}/bills/AL/2024/HB123")

# Bulk: Download entire session just for one bill
```

---

## When Bulk Downloads are Better

### Use Case 1: All-State Analysis

**Need:** Map legislation across all 50 states

**API Approach:**
```python
# 50 states × 100 requests per state = 5,000 API calls
# Time: ~2 hours (with rate limiting)
# Risk: Hit API quota limit
```

**Bulk Approach:**
```python
# Download all 50 state CSV files
# Time: ~5 minutes
# API calls: 0
# No quota concerns
```

**Winner:** Bulk (roughly 24x faster by these estimates: ~2 hours vs ~5 minutes)

### Use Case 2: Historical Trends

**Need:** Analyze fluoridation bills from 2010-2024

**API Approach:**
```python
# 50 states × 15 years × 100 requests = 75,000 API calls
# Quota: 75,000 calls exceeds the 50,000/month free tier
# Cost: Need paid plan
```

**Bulk Approach:**
```python
# Download 50 states × 15 years = 750 CSV files
# Time: ~30 minutes
# Cost: Free, no limits
```

**Winner:** Bulk (only viable option)
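
The quota arithmetic above is easy to rerun for other project sizes. A small sketch, using the 50,000 requests/month free tier and the 100 requests per state-session figure from this guide (both are estimates, and `quota_estimate` is a hypothetical helper):

```python
import math


def quota_estimate(states, years, requests_per_session, monthly_quota=50_000):
    """Return (total_requests, months_of_quota_needed) for a full pull."""
    total = states * years * requests_per_session
    months = math.ceil(total / monthly_quota)
    return total, months


total_calls, months_needed = quota_estimate(states=50, years=15, requests_per_session=100)
print(f"{total_calls:,} calls, {months_needed} month(s) of free quota")
```

By these figures, the 15-year pull does not fit into a single month of free quota, while a single-year pull (5,000 calls) uses about a tenth of it.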

### Use Case 3: Offline Processing

**Need:** Process data without internet

**API Approach:**
```python
# Must cache all API responses locally
# Complex caching logic needed
# Cache invalidation issues
```

**Bulk Approach:**
```python
# Download once, process forever
# No internet needed after download
# Simple file-based workflow
```

**Winner:** Bulk (simpler)
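
To make the "complex caching logic" concrete, here is roughly the minimum an API-side cache needs: storage keyed by request, plus an expiry check. This is a sketch under stated assumptions (`FileCache` is a hypothetical class, and a one-hour TTL is an arbitrary choice), not code from this repo:

```python
import json
import tempfile
import time
from pathlib import Path


class FileCache:
    """Tiny JSON file cache with a time-to-live, keyed by string."""

    def __init__(self, directory, ttl_seconds):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.ttl_seconds = ttl_seconds

    def _path(self, key):
        # Replace path separators so a key like "bills/AL" maps to one file.
        return self.directory / (key.replace("/", "_") + ".json")

    def put(self, key, payload, now=None):
        now = time.time() if now is None else now
        entry = {"stored_at": now, "payload": payload}
        self._path(key).write_text(json.dumps(entry))

    def get(self, key, now=None):
        path = self._path(key)
        if not path.exists():
            return None
        entry = json.loads(path.read_text())
        now = time.time() if now is None else now
        if now - entry["stored_at"] > self.ttl_seconds:
            return None  # stale: the caller should refetch
        return entry["payload"]


cache = FileCache(tempfile.mkdtemp(), ttl_seconds=3600)
cache.put("bills/AL/2024/HB123", {"latest_action": "Introduced"}, now=1000.0)
fresh = cache.get("bills/AL/2024/HB123", now=2000.0)   # within the TTL
stale = cache.get("bills/AL/2024/HB123", now=4601.0)   # past the TTL
print(fresh, stale)
```

Even this small version has to decide what "stale" means for each kind of record, which is the invalidation problem the bulk workflow avoids.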

---

## Hybrid Approach (Best of Both Worlds)

### Strategy: Bulk for foundation, API for updates

```python
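import pandas as pd
from datetime import datetime, timedelta

# 1. Download complete 2024 session (bulk). The leading "!" is notebook
#    shell syntax; in a terminal, run the command without it.
!python scripts/bulk_legislative_download.py --year 2024 --merge

# 2. Load bulk data
df = pd.read_csv('data/cache/legislation_bulk/all_states_2024.csv')
print(f"Loaded {len(df)} bills from bulk download")

# 3. Use API for recent updates (last 7 days)
recent_cutoff = datetime.now() - timedelta(days=7)

# API search for bills updated in last week
async def get_recent_updates():
    params = {
        "updated_since": recent_cutoff.isoformat(),
        "jurisdiction": "all"
    }
    return await api_client.get("/bills", params=params)

# Top-level await works in a notebook; elsewhere, wrap this in asyncio.run().
response = await get_recent_updates()

# 4. Merge bulk + recent updates. pd.concat needs DataFrames, so convert the
#    response first. Assumption: the payload is a list of bill records under
#    a "results" key; adjust to the actual response shape.
recent = pd.DataFrame(response.json()["results"])
combined = pd.concat([df, recent], ignore_index=True)
# If both frames share an identifier column, follow up with
# drop_duplicates(subset=[...], keep="last") so the newer record wins.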
```

**Benefits:**
- Complete historical data (bulk)
- Real-time updates (API)
- Minimal API calls (only recent changes)
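
The merge step matters more than it looks: a bill that appears in both the bulk file and the recent API results should end up as one record with the newer status. A dependency-free sketch of that overlay, assuming each record carries `state` and `bill_id` fields that together identify a bill (hypothetical field names):

```python
def merge_updates(bulk_rows, recent_rows, key_fields=("state", "bill_id")):
    """Overlay recent API records on bulk records; recent values win.

    Fields missing from the recent record keep their bulk value.
    """
    def key(row):
        return tuple(row[field] for field in key_fields)

    merged = {key(row): dict(row) for row in bulk_rows}
    for row in recent_rows:
        merged.setdefault(key(row), {}).update(row)
    return list(merged.values())


bulk = [
    {"state": "AL", "bill_id": "HB123", "status": "Introduced", "title": "Water fluoridation"},
    {"state": "UT", "bill_id": "HB81", "status": "Passed", "title": "Fluoride ban"},
]
recent = [
    {"state": "AL", "bill_id": "HB123", "status": "Passed House"},  # update
    {"state": "OH", "bill_id": "SB5", "status": "Introduced"},      # new bill
]
combined_rows = merge_updates(bulk, recent)
print(len(combined_rows), combined_rows[0]["status"])
```

Keying on the state as well as the bill number avoids collisions, since identifiers like HB123 repeat across states.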

---

## Recommendations by Project Type

### Academic Research
→ **Use Bulk Downloads**
- Need complete datasets
- Historical analysis
- No real-time requirements
- May publish/redistribute

### News/Journalism
→ **Use API**
- Need latest bill status
- Breaking news coverage
- Specific bill tracking
- Real-time alerts

### Advocacy Campaigns
→ **Use Hybrid**
- Bulk for initial analysis
- API for monitoring active bills
- Alerts when bills advance
- Historical context + real-time

### Government Dashboards
→ **Use Hybrid**
- Bulk for historical trends
- API for current session
- Daily/weekly refresh
- Public redistribution

---

## Cost Analysis

### Free Tier Limits

**API:**
- 50,000 requests/month free
- ~100 bills per request (pagination)
- = ~5M bill records/month max

**Bulk:**
- Unlimited downloads
- ~100K bills per download
- = Unlimited bill records/month

### Time to Download All States (2024)

**API (50 states):**
```
50 states × 100 API calls = 5,000 requests
5,000 requests × 0.5s rate limit = 2,500 seconds = ~42 minutes
(Lower bound: excludes network latency, retries, and processing time, so end-to-end runs take longer)
```

**Bulk (50 states):**
```
50 CSV downloads × 5s each = 250 seconds = ~4 minutes
(Includes all data, no processing needed)
```

**Time Saved:** at least ~38 minutes (about 10x faster)
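
The 42-minute figure counts only the rate-limit delay, which is why it sits below the multi-hour estimates given elsewhere in this guide. The sketch below reruns the arithmetic with a second, purely hypothetical per-request cost of 1.5 seconds (delay plus latency and parsing) to show how sensitive the total is:

```python
def api_minutes(states, requests_per_state, seconds_per_request):
    """Total API wall-clock time in minutes for a sequential pull."""
    return states * requests_per_state * seconds_per_request / 60


def bulk_minutes(files, seconds_per_file):
    """Total bulk download time in minutes."""
    return files * seconds_per_file / 60


best_case = api_minutes(50, 100, 0.5)   # rate-limit delay only
slower = api_minutes(50, 100, 1.5)      # hypothetical: delay + latency + parsing
bulk = bulk_minutes(50, 5)

print(f"API best case: {best_case:.0f} min")
print(f"API at 1.5 s/request: {slower:.0f} min")
print(f"Bulk: {bulk:.0f} min")
```

Under that assumption the API pull takes about two hours, so the speedup ranges from roughly 10x to 30x depending on the real per-request cost.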

### Data Completeness

**API:**
- Must paginate through all results
- Risk of missing data if pagination fails
- Requires careful error handling

**Bulk:**
- Complete session in one file
- Complete as of the most recent monthly dump
- No pagination errors
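
The "careful error handling" on the API side mostly comes down to never skipping a page silently. A minimal sketch, assuming a hypothetical `fetch_page(page)` callable that returns `(items, max_page)` and raises `ConnectionError` on failure:

```python
def fetch_all_pages(fetch_page, max_retries=3):
    """Collect items from every page, retrying pages that fail.

    If a page still fails after `max_retries` attempts, the error is
    re-raised so the caller knows the result set would be incomplete.
    """
    results, page, max_page = [], 1, 1
    while page <= max_page:
        for attempt in range(1, max_retries + 1):
            try:
                items, max_page = fetch_page(page)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # surface the gap instead of dropping the page
        results.extend(items)
        page += 1
    return results


# Fake two-page API that fails once on page 2, then succeeds.
calls = {"count": 0}

def fake_fetch(page):
    calls["count"] += 1
    if page == 2 and calls["count"] == 2:
        raise ConnectionError("transient failure")
    pages = {1: ["HB1", "HB2"], 2: ["HB3"]}
    return pages[page], 2


bills = fetch_all_pages(fake_fetch)
print(bills, calls["count"])
```

A production client would also sleep between retries (ideally with backoff) so that it stays within the rate limit.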

---

## PostgreSQL Dump Option

**For power users:**

```bash
# Download complete Open States database
python scripts/bulk_legislative_download.py --postgres --month 2026-04

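# Create the target database, then restore into it
# (pg_restore -d expects the database to exist already)
createdb openstates
pg_restore -d openstates 2026-04-public.pgdump

# Now use SQL for analysis. The table and column names below are
# illustrative; list the restored tables with \dt and adjust the query.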
psql openstates -c "
  SELECT state, COUNT(*) as bill_count
  FROM bills
  WHERE session_year = 2024
  GROUP BY state
  ORDER BY bill_count DESC;
"
```

**Benefits:**
- Complete database with relationships
- SQL queries for complex analysis
- No need for Python/pandas
- Can use PostgreSQL extensions
- Best for large-scale research

**Drawbacks:**
- Large file size (~5GB compressed)
- Requires PostgreSQL installation
- More complex setup
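
To try the shape of that query before committing to a multi-gigabyte restore, the same aggregation can be run against a throwaway in-memory database. This sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, with an invented `bills` table and made-up rows that only mirror the illustrative query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (state TEXT, session_year INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?, ?)",
    [
        ("AL", 2024, "Water fluoridation standards"),
        ("AL", 2024, "School lunch funding"),
        ("UT", 2024, "Fluoride in public water"),
        ("UT", 2023, "Road repair"),
    ],
)

# Same aggregation as the psql example: bills per state for one session year.
rows = conn.execute(
    """
    SELECT state, COUNT(*) AS bill_count
    FROM bills
    WHERE session_year = 2024
    GROUP BY state
    ORDER BY bill_count DESC
    """
).fetchall()
conn.close()
print(rows)
```

The real dump's schema will differ, so treat this only as a way to rehearse the query logic.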

---

## Final Recommendation

**Default choice: Bulk Downloads**

Reasons:
1. Faster (roughly 10x or more for all-state pulls)
2. No API key setup
3. No rate limits
4. Work offline
5. Complete sessions guaranteed

**Switch to API when:**
- Need real-time status
- Tracking specific bills
- Keyword search required
- Small subset of data

**Use Both when:**
- Initial bulk download
- Periodic API updates
- Best of both worlds