Spaces:
Running on CPU Upgrade
Running on CPU Upgrade
File size: 8,697 Bytes
61d29fc | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 | ---
sidebar_position: 5
---
# County-Level Data Aggregation
This guide explains how to aggregate Open Navigator statistics by county.
## Overview
County-level aggregation allows you to:
- Filter search results by county
- Analyze legislation impact at the county level
- Track nonprofit activity within counties
- Monitor civic engagement metrics by county
## Current Status
### Data Available
✅ **ZIP Code to County Mapping** - We have a complete mapping of ZIP codes (ZCTAs) to counties:
- File: `data/gold/reference/zip_county_mapping.parquet`
- Coverage: 33,791 ZIP codes mapped to counties
- Source: Census Bureau 2020 ZCTA-to-County relationship file
✅ **Township to County Mapping** - Townships encode their county in the GEOID:
- Coverage: 36,421 townships
- Method: First 5 digits of township GEOID = state FIPS + county FIPS
✅ **All Counties** - Complete list of U.S. counties:
- File: `data/gold/reference/jurisdictions_counties.parquet`
- Coverage: 3,222 counties
- Includes: Population, area, coordinates
### Data Not Yet Available
⚠️ **City to County Mapping** - Cities don't include county in Census gazetteer files:
- Cities: 32,333 records
- Current county data: None
- Needed: Census place-to-county relationship files or geocoding
⚠️ **School District to County** - School districts often span multiple counties:
- School districts: 13,326 records
- Current county data: None
- Needed: Census school district-to-county relationship files
## Using County Data
### Query by County
```sql
-- Find all jurisdictions in a county
SELECT name, type, state, county, population
FROM jurisdictions_search
WHERE county = 'Los Angeles County'
AND state = 'CA';
-- Find all townships in a county
SELECT name, type, county, area_sq_miles
FROM jurisdictions_search
WHERE type = 'township'
AND county = 'Cook County'
AND state = 'IL';
```
### API Filtering
The `/api/search/` endpoint supports county filtering:
```typescript
// Search for jurisdictions in a county
const response = await api.get('/search/', {
params: {
q: 'city council',
types: 'jurisdictions',
state: 'CA',
county: 'Los Angeles County',
limit: 20
}
});
```
### ZIP Code to County Lookup
```python
import pandas as pd
# Load ZIP to county mapping
zip_county = pd.read_parquet('data/gold/reference/zip_county_mapping.parquet')
# Look up county for a ZIP code
zip_code = '90210'
county = zip_county[zip_county['zcta'] == zip_code]['county_name'].values[0]
print(f"ZIP {zip_code} is in {county}")
# Output: ZIP 90210 is in Los Angeles County
```
### County Statistics
```sql
-- Count jurisdictions per county
SELECT county, state, COUNT(*) as jurisdiction_count
FROM jurisdictions_search
WHERE county IS NOT NULL
GROUP BY county, state
ORDER BY jurisdiction_count DESC
LIMIT 20;
-- Aggregate nonprofits by county (when county data is available)
-- This requires joining with nonprofit location data
SELECT
z.county_name,
z.state_fips,
COUNT(DISTINCT n.ein) as nonprofit_count,
SUM(n.revenue) as total_revenue
FROM nonprofits_search n
JOIN zip_county_mapping z ON n.zip_code = z.zcta
GROUP BY z.county_name, z.state_fips
ORDER BY nonprofit_count DESC;
```
## Adding County Data to Cities
To add county information for cities, you have several options:
### Option 1: Geocoding API
Use a geocoding service to look up county from city coordinates:
```python
import pandas as pd
from geopy.geocoders import Nominatim
cities_df = pd.read_parquet('data/gold/reference/jurisdictions_cities.parquet')
geolocator = Nominatim(user_agent="open-navigator")
for _, row in cities_df.iterrows():
lat = row['INTPTLAT']
lon = row['INTPTLONG']
location = geolocator.reverse(f"{lat}, {lon}")
county = location.raw['address'].get('county', '')
# Update database with county
```
### Option 2: Census Relationship Files
Download state-specific place-to-county crosswalk files:
1. Visit: https://www2.census.gov/geo/docs/maps-data/data/rel2020/place/
2. Download state files (e.g., `tab20_place20_county20_01.txt` for Alabama)
3. Process each state file to extract place-to-county mappings
### Option 3: OpenStreetMap
Use OpenStreetMap data which includes county (administrative level) information:
```python
from OSMPythonTools.nominatim import Nominatim
nominatim = Nominatim()
result = nominatim.query('Los Angeles, CA', params={'addressdetails': 1})
county = result.toJSON()[0]['address']['county']
```
## County-Based Aggregation Examples
### Legislative Activity by County
```python
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres:password@localhost:5433/open_navigator')
# Aggregate bills by county (requires joining with sponsor locations)
query = """
SELECT
s.county,
s.state,
COUNT(DISTINCT b.id) as bill_count,
COUNT(DISTINCT CASE WHEN b.classification = 'bill' THEN b.id END) as regular_bills,
COUNT(DISTINCT CASE WHEN b.classification = 'resolution' THEN b.id END) as resolutions
FROM bills b
JOIN sponsors sp ON b.id = sp.bill_id
JOIN legislators l ON sp.person_id = l.id
JOIN jurisdictions_search s ON l.district_id = s.geoid
WHERE s.county IS NOT NULL
GROUP BY s.county, s.state
ORDER BY bill_count DESC
LIMIT 20;
"""
df = pd.read_sql(query, engine)
print(df)
```
### Nonprofit Density by County
```python
# Requires ZIP code to county mapping
query = """
SELECT
z.county_name,
COUNT(DISTINCT n.ein) as nonprofit_count,
ROUND(COUNT(DISTINCT n.ein)::numeric / c.population * 100000, 2) as nonprofits_per_100k
FROM nonprofits_search n
JOIN zip_county_mapping z ON n.zip_code = z.zcta
JOIN (
SELECT county, state, SUM(population) as population
FROM jurisdictions_search
WHERE type = 'county'
GROUP BY county, state
) c ON z.county_name = c.county
GROUP BY z.county_name, c.population
HAVING c.population > 100000
ORDER BY nonprofits_per_100k DESC
LIMIT 20;
"""
```
## Future Enhancements
1. **Automated Geocoding**: Add a pipeline to geocode all cities and assign counties
2. **County Profiles**: Create dedicated county profile pages showing:
- All jurisdictions within the county
- Legislative activity
- Nonprofit statistics
- Meeting calendar
3. **County Comparison Tool**: Side-by-side comparison of county metrics
4. **County-Level Maps**: Interactive maps showing county-level heatmaps
## Data Files
All county-related mapping files are stored in `data/gold/reference/`:
| File | Description | Records |
|------|-------------|---------|
| `jurisdictions_counties.parquet` | All U.S. counties | 3,222 |
| `zip_county_mapping.parquet` | ZIP/ZCTA to county | 33,791 |
| `jurisdictions_cities.parquet` | All cities (no county yet) | 32,333 |
| `jurisdictions_townships.parquet` | Townships (county in GEOID) | 36,421 |
| `jurisdictions_school_districts.parquet` | School districts | 13,326 |
## Scripts
### Download County Mappings
```bash
# Download Census relationship files and create mappings
python scripts/data/download_county_mappings.py
```
This script:
- Downloads ZCTA-to-county relationship file from Census Bureau
- Processes it into a clean parquet file
- Shows instructions for additional manual downloads
### Update Database
```bash
# Update jurisdictions_search table with county data
python scripts/data/update_jurisdiction_counties.py
```
This script:
- Updates townships with county information (from GEOID)
- Reports on coverage statistics
- Identifies gaps in county data
## Troubleshooting
### County Field is NULL
If the `county` field is NULL for jurisdictions:
1. **Check if data exists**:
```sql
SELECT type, COUNT(*), COUNT(county)
FROM jurisdictions_search
GROUP BY type;
```
2. **Run the update script**:
```bash
python scripts/data/update_jurisdiction_counties.py
```
3. **For cities**: County data requires additional Census files or geocoding
### ZIP Code Not Found
If a ZIP code isn't in the mapping:
1. Check if it's a valid ZIP: Some ZIP codes are for PO boxes or specific buildings
2. Use the ZCTA (ZIP Code Tabulation Area) instead - it's the Census approximation
3. Fall back to city/state lookup
### Search Filtering Not Working
If county filtering isn't working in the search API:
1. Verify the API endpoint supports the `county` parameter
2. Check that the county name is exact (include "County" suffix)
3. Use URL encoding for county names with spaces
## Related Documentation
- [Data Sources](../data-sources/census.md) - Census Bureau data sources
- [Search API](../api/search.md) - Search API documentation
- [Database Schema](../development/database-schema.md) - Database structure
|