open-navigator / website /docs /guides /county-aggregation.md
jcbowyer's picture
Clean HuggingFace deployment without binary files
61d29fc
---
sidebar_position: 5
---
# County-Level Data Aggregation
This guide explains how to aggregate Open Navigator statistics by county.
## Overview
County-level aggregation allows you to:
- Filter search results by county
- Analyze legislation impact at the county level
- Track nonprofit activity within counties
- Monitor civic engagement metrics by county
## Current Status
### Data Available
**ZIP Code to County Mapping** - We have a complete mapping of ZIP codes (ZCTAs) to counties:
- File: `data/gold/reference/zip_county_mapping.parquet`
- Coverage: 33,791 ZIP codes mapped to counties
- Source: Census Bureau 2020 ZCTA-to-County relationship file
**Township to County Mapping** - Townships encode their county in the GEOID:
- Coverage: 36,421 townships
- Method: First 5 digits of township GEOID = state FIPS + county FIPS
**All Counties** - Complete list of U.S. counties:
- File: `data/gold/reference/jurisdictions_counties.parquet`
- Coverage: 3,222 counties
- Includes: Population, area, coordinates
### Data Not Yet Available
⚠️ **City to County Mapping** - Cities don't include county in Census gazetteer files:
- Cities: 32,333 records
- Current county data: None
- Needed: Census place-to-county relationship files or geocoding
⚠️ **School District to County** - School districts often span multiple counties:
- School districts: 13,326 records
- Current county data: None
- Needed: Census school district-to-county relationship files
## Using County Data
### Query by County
```sql
-- Find all jurisdictions in a county
SELECT name, type, state, county, population
FROM jurisdictions_search
WHERE county = 'Los Angeles County'
AND state = 'CA';
-- Find all townships in a county
SELECT name, type, county, area_sq_miles
FROM jurisdictions_search
WHERE type = 'township'
AND county = 'Cook County'
AND state = 'IL';
```
### API Filtering
The `/api/search/` endpoint supports county filtering:
```typescript
// Search for jurisdictions in a county
const response = await api.get('/search/', {
params: {
q: 'city council',
types: 'jurisdictions',
state: 'CA',
county: 'Los Angeles County',
limit: 20
}
});
```
### ZIP Code to County Lookup
```python
import pandas as pd
# Load ZIP to county mapping
zip_county = pd.read_parquet('data/gold/reference/zip_county_mapping.parquet')
# Look up county for a ZIP code
zip_code = '90210'
county = zip_county[zip_county['zcta'] == zip_code]['county_name'].values[0]
print(f"ZIP {zip_code} is in {county}")
# Output: ZIP 90210 is in Los Angeles County
```
### County Statistics
```sql
-- Count jurisdictions per county
SELECT county, state, COUNT(*) as jurisdiction_count
FROM jurisdictions_search
WHERE county IS NOT NULL
GROUP BY county, state
ORDER BY jurisdiction_count DESC
LIMIT 20;
-- Aggregate nonprofits by county (when county data is available)
-- This requires joining with nonprofit location data
SELECT
z.county_name,
z.state_fips,
COUNT(DISTINCT n.ein) as nonprofit_count,
SUM(n.revenue) as total_revenue
FROM nonprofits_search n
JOIN zip_county_mapping z ON n.zip_code = z.zcta
GROUP BY z.county_name, z.state_fips
ORDER BY nonprofit_count DESC;
```
## Adding County Data to Cities
To add county information for cities, you have several options:
### Option 1: Geocoding API
Use a geocoding service to look up county from city coordinates:
```python
import pandas as pd
from geopy.geocoders import Nominatim
cities_df = pd.read_parquet('data/gold/reference/jurisdictions_cities.parquet')
geolocator = Nominatim(user_agent="open-navigator")
for _, row in cities_df.iterrows():
lat = row['INTPTLAT']
lon = row['INTPTLONG']
location = geolocator.reverse(f"{lat}, {lon}")
county = location.raw['address'].get('county', '')
# Update database with county
```
### Option 2: Census Relationship Files
Download state-specific place-to-county crosswalk files:
1. Visit: https://www2.census.gov/geo/docs/maps-data/data/rel2020/place/
2. Download state files (e.g., `tab20_place20_county20_01.txt` for Alabama)
3. Process each state file to extract place-to-county mappings
### Option 3: OpenStreetMap
Use OpenStreetMap data which includes county (administrative level) information:
```python
from OSMPythonTools.nominatim import Nominatim
nominatim = Nominatim()
result = nominatim.query('Los Angeles, CA', params={'addressdetails': 1})
county = result.toJSON()[0]['address']['county']
```
## County-Based Aggregation Examples
### Legislative Activity by County
```python
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres:password@localhost:5433/open_navigator')
# Aggregate bills by county (requires joining with sponsor locations)
query = """
SELECT
s.county,
s.state,
COUNT(DISTINCT b.id) as bill_count,
COUNT(DISTINCT CASE WHEN b.classification = 'bill' THEN b.id END) as regular_bills,
COUNT(DISTINCT CASE WHEN b.classification = 'resolution' THEN b.id END) as resolutions
FROM bills b
JOIN sponsors sp ON b.id = sp.bill_id
JOIN legislators l ON sp.person_id = l.id
JOIN jurisdictions_search s ON l.district_id = s.geoid
WHERE s.county IS NOT NULL
GROUP BY s.county, s.state
ORDER BY bill_count DESC
LIMIT 20;
"""
df = pd.read_sql(query, engine)
print(df)
```
### Nonprofit Density by County
```python
# Requires ZIP code to county mapping
query = """
SELECT
z.county_name,
COUNT(DISTINCT n.ein) as nonprofit_count,
ROUND(COUNT(DISTINCT n.ein)::numeric / c.population * 100000, 2) as nonprofits_per_100k
FROM nonprofits_search n
JOIN zip_county_mapping z ON n.zip_code = z.zcta
JOIN (
SELECT county, state, SUM(population) as population
FROM jurisdictions_search
WHERE type = 'county'
GROUP BY county, state
) c ON z.county_name = c.county
GROUP BY z.county_name, c.population
HAVING c.population > 100000
ORDER BY nonprofits_per_100k DESC
LIMIT 20;
"""
```
## Future Enhancements
1. **Automated Geocoding**: Add a pipeline to geocode all cities and assign counties
2. **County Profiles**: Create dedicated county profile pages showing:
- All jurisdictions within the county
- Legislative activity
- Nonprofit statistics
- Meeting calendar
3. **County Comparison Tool**: Side-by-side comparison of county metrics
4. **County-Level Maps**: Interactive maps showing county-level heatmaps
## Data Files
All county-related mapping files are stored in `data/gold/reference/`:
| File | Description | Records |
|------|-------------|---------|
| `jurisdictions_counties.parquet` | All U.S. counties | 3,222 |
| `zip_county_mapping.parquet` | ZIP/ZCTA to county | 33,791 |
| `jurisdictions_cities.parquet` | All cities (no county yet) | 32,333 |
| `jurisdictions_townships.parquet` | Townships (county in GEOID) | 36,421 |
| `jurisdictions_school_districts.parquet` | School districts | 13,326 |
## Scripts
### Download County Mappings
```bash
# Download Census relationship files and create mappings
python scripts/data/download_county_mappings.py
```
This script:
- Downloads ZCTA-to-county relationship file from Census Bureau
- Processes it into a clean parquet file
- Shows instructions for additional manual downloads
### Update Database
```bash
# Update jurisdictions_search table with county data
python scripts/data/update_jurisdiction_counties.py
```
This script:
- Updates townships with county information (from GEOID)
- Reports on coverage statistics
- Identifies gaps in county data
## Troubleshooting
### County Field is NULL
If the `county` field is NULL for jurisdictions:
1. **Check if data exists**:
```sql
SELECT type, COUNT(*), COUNT(county)
FROM jurisdictions_search
GROUP BY type;
```
2. **Run the update script**:
```bash
python scripts/data/update_jurisdiction_counties.py
```
3. **For cities**: County data requires additional Census files or geocoding
### ZIP Code Not Found
If a ZIP code isn't in the mapping:
1. Check if it's a valid ZIP: Some ZIP codes are for PO boxes or specific buildings
2. Use the ZCTA (ZIP Code Tabulation Area) instead - it's the Census approximation
3. Fall back to city/state lookup
### Search Filtering Not Working
If county filtering isn't working in the search API:
1. Verify the API endpoint supports the `county` parameter
2. Check that the county name is exact (include "County" suffix)
3. Use URL encoding for county names with spaces
## Related Documentation
- [Data Sources](../data-sources/census.md) - Census Bureau data sources
- [Search API](../api/search.md) - Search API documentation
- [Database Schema](../development/database-schema.md) - Database structure