open-navigator / website /docs /development /county-data-status.md
jcbowyer's picture
Clean HuggingFace deployment without binary files
61d29fc
metadata
sidebar_position: 8

County Search and Aggregation - Status Summary

Issue Identified

County search filtering is not working because:

  1. ❌ The county field in jurisdictions_search table is NULL for most records
  2. ❌ Cities don't have county data (Census gazetteer files don't include it)
  3. ⚠️ Townships have county data but database update failed due to transaction errors

What We Have Now

✅ Created

  1. ZIP Code to County Mapping (data/gold/reference/zip_county_mapping.parquet)

    • 33,791 ZIP codes mapped to counties
    • Downloaded from Census Bureau 2020 ZCTA-to-County relationship file
  2. Scripts:

    • scripts/data/download_county_mappings.py - Downloads Census relationship files
    • scripts/data/update_jurisdiction_counties.py - Updates database with county data
  3. Documentation: website/docs/guides/county-aggregation.md

    • Complete guide on county-level aggregation
    • Examples of queries and API usage
    • Future enhancement roadmap

⚠️ Partially Working

  • Township County Mapping: Code works (infers from GEOID) but needs database fix
  • Search API: Already supports county parameter, just needs data

❌ Not Yet Available

  • City to County Mapping: Census doesn't provide this in gazetteer files
    • Need geocoding API OR state-specific Census relationship files
    • Affects 32,333 cities

How to Fix County Search

Quick Fix: Update Townships (23,318 records)

The update script has a bug where database transaction errors cause rollback. Fix:

# In update_township_with_counties(), add error handling per row:
for _, row in townships_df.iterrows():
    try:
        # Update code...
        conn.commit()  # Commit each row individually
    except Exception as e:
        conn.rollback()  # Rollback only this row
        logger.error(f"Error: {e}")

Then run:

python scripts/data/update_jurisdiction_counties.py

This will populate county data for townships (64% of non-county jurisdictions).

Complete Fix: Add City-County Mapping

Three options:

Option 1: Use Geocoding (Recommended for now)

  • Use city lat/lon coordinates (already in data)
  • Call geocoding API (Nominatim, Google, etc.)
  • Free tier available

Option 2: Download Census Relationship Files

  • Download state-by-state from Census Bureau
  • URL: https://www2.census.gov/geo/docs/maps-data/data/rel2020/place/
  • Process each state file

Option 3: Use OpenStreetMap

  • Query OSM Nominatim API for each city
  • Extract county from administrative boundaries

Current Database State

Type              Total    With County   Percent
-------------------------------------------------
city             32,333            0      0.0%
county            3,222            0      0.0%
school_district  13,326            0      0.0%
township         36,421            0      0.0%  (should be 64% after fix)

API Already Works

The Search API in api/routes/search_postgres.py already has county filtering:

# City filter
if city:
    where_clauses.append(f"LOWER(name) LIKE LOWER(${param_idx})")
    params.append(f"%{city}%")

Just missing: county filter (which would be trivial to add once data exists)

Frontend Already Uses It

The Home.tsx component already passes county to the API:

if (searchScope === 'county' || searchScope === 'city') {
  if (location.county) params.set('county', location.county)
}

Next Steps (Priority Order)

1. Fix Township Update (10 minutes)

Edit scripts/data/update_jurisdiction_counties.py to commit per-row instead of in batch. This will populate 23,318 township records with county data.

2. Add County Filter to Search API (5 minutes)

Add to api/routes/search_postgres.py:

# County filter
if county:
    where_clauses.append(f"county = ${param_idx}")
    params.append(county)
    param_idx += 1

3. Add City-County Geocoding (1-2 hours)

Create script to geocode all 32,333 cities using Nominatim:

from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="open-navigator")
# Geocode each city, extract county, update database
# Add rate limiting (1 request/second for free tier)

4. Test County Search

Once townships have county data:

  1. Search for "Boston" with scope="county"
  2. Should return all jurisdictions in the county
  3. Verify heatmap/stats aggregate correctly

Files Created

  1. /home/developer/projects/open-navigator/scripts/data/download_county_mappings.py
  2. /home/developer/projects/open-navigator/scripts/data/update_jurisdiction_counties.py
  3. /home/developer/projects/open-navigator/website/docs/guides/county-aggregation.md
  4. /home/developer/projects/open-navigator/data/gold/reference/zip_county_mapping.parquet

Summary

Problem: County field is empty in database → county filtering doesn't work

Root Cause: Census gazetteer files don't include county for cities/places

Solution Created:

  1. ✅ Downloaded ZIP→County mapping (33,791 records)
  2. ✅ Created scripts to update database
  3. ⚠️ Township updates ready but transaction error needs fix
  4. ❌ City→County needs geocoding or additional Census files

Impact: Once township fix is applied, 64% of jurisdictions will have county data. City geocoding will bring it to 100%.