jcbowyer's picture
Clean HuggingFace deployment without binary files
61d29fc
metadata
sidebar_position: 8

Census American Community Survey (ACS)

Add demographic, economic, housing, and social data from the U.S. Census Bureau's American Community Survey to enrich your civic engagement analysis.

Overview

The American Community Survey (ACS) is the premier source for detailed population and housing information about America. It provides data for communities across the United States, Puerto Rico, and Island Areas.

What's Included

  • Demographics: Age, race, ethnicity, language, citizenship
  • Economics: Income, poverty, employment, occupation
  • Housing: Occupancy, value, rent, housing costs
  • Education: School enrollment, educational attainment
  • Health: Health insurance coverage by age and type
  • Social: Disability status, veteran status, commuting

ACS vs. Census of Governments

Dataset Purpose What it Measures
Census of Governments Jurisdiction discovery Lists all government entities (cities, counties, districts)
American Community Survey (ACS) Community demographics Population characteristics, economics, housing

Use both together: Census of Governments tells you which jurisdictions exist, ACS tells you about the people who live there.

๐Ÿš€ Quick Start

1. Get a Census API Key (Recommended)

While optional, an API key increases your rate limit from 500 to 5,000 requests per day.

  1. Visit: https://api.census.gov/data/key_signup.html
  2. Enter your email and organization
  3. Check email for API key
  4. Add to .env file:
CENSUS_API_KEY=your_key_here

2. Run the ACS Ingestion Script

# Activate virtual environment
source .venv/bin/activate

# Navigate to script directory
cd scripts/datasources/census

# Run the example (downloads sample data)
python acs_ingestion.py

This will:

  • Download median household income for all U.S. counties
  • Download health insurance data for California
  • Cache data to data/cache/acs/

๐Ÿ“Š Available Data Tables

Demographics

Table Code Description Use Case
B01001 Sex by Age Identify communities with children (dental screening priority)
B02001 Race Analyze health equity across racial groups
B03002 Hispanic or Latino Origin by Race Understand demographic composition
B05001 Nativity and Citizenship Status Language access planning
B16001 Language Spoken at Home Multilingual outreach needs

Economics

Table Code Description Use Case
B19013 Median Household Income Target low-income communities for programs
B17001 Poverty Status Medicaid eligibility analysis
B23025 Employment Status Economic health assessment
C24010 Sex by Occupation Workforce composition

Health Insurance โญ Critical for Oral Health Policy

Table Code Description Use Case
B27001 Health Insurance Coverage Status by Age Overall insurance coverage rates
B27010 Health Insurance Coverage (Under 19) Child dental insurance coverage
C27007 Medicaid/Means-Tested Public Coverage Medicaid enrollment by community

Education

Table Code Description Use Case
B15003 Educational Attainment Community education levels
B14001 School Enrollment by Age Number of school-aged children

๐Ÿ’ป Usage Examples

Example 1: Download Data for All Counties

import asyncio
from pathlib import Path
from scripts.datasources.census.acs_ingestion import ACSDataIngestion

async def download_county_data():
    # Initialize with default cache directory
    acs = ACSDataIngestion()
    
    # Download median household income for all U.S. counties
    income_df = await acs.download_acs_data_api(
        table="B19013",        # Median household income
        geography="county",    # County level
        state="*"             # All states
    )
    
    print(f"Downloaded {len(income_df)} counties")
    print(income_df.head())

asyncio.run(download_county_data())

Example 2: Child Health Insurance Coverage

Critical for oral health policy analysis!

async def analyze_child_insurance():
    acs = ACSDataIngestion()
    
    # Download health insurance for children under 19
    child_insurance_df = await acs.download_acs_data_api(
        table="B27010",        # Health insurance (Under 19)
        geography="county",
        state="*"
    )
    
    # This table includes:
    # - With health insurance
    # - With public coverage (Medicaid/CHIP)
    # - With private coverage
    # - No health insurance
    
    return child_insurance_df

df = asyncio.run(analyze_child_insurance())

Example 3: Download Multiple Tables at Once

async def download_comprehensive_data():
    acs = ACSDataIngestion()
    
    # Download all key demographic tables for California
    ca_data = await acs.download_all_demographics(
        geography="county",
        state="06"  # California FIPS code
    )
    
    # Returns dictionary with multiple DataFrames
    for table_code, df in ca_data.items():
        print(f"{table_code}: {len(df)} counties")

asyncio.run(download_comprehensive_data())

Example 4: Use Cached Data

acs = ACSDataIngestion()

# First call downloads from API
df1 = await acs.download_acs_data_api("B19013", "county", "*")

# Subsequent calls use cached Parquet file (instant!)
df2 = acs.get_cached_data("B19013", "county", "*")

print(f"Same data: {df1.equals(df2)}")  # True

๐Ÿ—„๏ธ Data Storage Options

Option 1: Default Cache (Recommended for Development)

# Uses data/cache/acs/ in project directory
acs = ACSDataIngestion()

Location: /home/developer/projects/open-navigator/data/cache/acs/

Option 2: D Drive (Windows)

from pathlib import Path

# Store all ACS data on D drive
acs = ACSDataIngestion(data_dir=Path("D:/open-navigator-data/acs"))

Location: D:\open-navigator-data\acs\

Option 3: External Drive (Linux/Mac)

# Mount external drive first, then:
acs = ACSDataIngestion(data_dir=Path("/mnt/external/acs-data"))

Location: /mnt/external/acs-data/

Option 4: Network Storage

# For shared team access
acs = ACSDataIngestion(data_dir=Path("//server/shared/acs"))

๐Ÿ“ Data File Format

Downloaded data is cached as Parquet files for fast loading:

data/cache/acs/
โ”œโ”€โ”€ B19013_county_*_2022.parquet   # Median income, all counties
โ”œโ”€โ”€ B27010_county_06_2022.parquet  # Child insurance, CA only
โ”œโ”€โ”€ B01001_place_*_2022.parquet    # Age/sex, all cities
โ””โ”€โ”€ acs_2022_ALL/                  # Bulk download (if used)

Parquet advantages:

  • 10x smaller than CSV
  • 100x faster to load
  • Preserves data types
  • Columnar storage (efficient queries)

๐ŸŒ Geography Levels

ACS data is available at multiple geographic levels:

Level Code Example Records (approx.)
National us United States 1
State state California, Texas 50
County county Los Angeles County 3,200
Place place San Francisco city 19,500
Tract tract Neighborhood-level 85,000
County Subdivision cousub Townships 36,000

Choose based on your analysis needs:

  • State-level: Policy comparison across states
  • County-level: Regional analysis
  • Place-level: City-specific programs
  • Tract-level: Neighborhood targeting (large datasets!)

๐Ÿ”— Integration with Open Navigator

Enriching Jurisdiction Data

Combine ACS demographics with jurisdiction discovery:

from discovery.census_ingestion import CensusGovernmentIngestion
from scripts.datasources.census.acs_ingestion import ACSDataIngestion

# Step 1: Get list of all counties
census = CensusGovernmentIngestion()
counties_df = await census.download_census_data("counties")

# Step 2: Add demographic data from ACS
acs = ACSDataIngestion()
demographics = await acs.download_acs_data_api("B19013", "county", "*")

# Step 3: Join on FIPS code
enriched = counties_df.merge(demographics, on="fips", how="left")

# Now you have: county name, URL, population, AND median income!

Targeting High-Need Communities

Identify counties for oral health program targeting:

async def find_high_need_counties():
    acs = ACSDataIngestion()
    
    # Get poverty data
    poverty_df = await acs.download_acs_data_api("B17001", "county", "*")
    
    # Get child health insurance
    child_insurance_df = await acs.download_acs_data_api("B27010", "county", "*")
    
    # Combine datasets
    combined = poverty_df.merge(child_insurance_df, on=["state", "county"])
    
    # Filter for high poverty + low insurance coverage
    high_need = combined[
        (combined["poverty_rate"] > 0.15) &  # > 15% poverty
        (combined["uninsured_children"] > 100)  # > 100 uninsured kids
    ]
    
    return high_need

โšก Performance Tips

1. Use State Filters

# โŒ Slow: Downloads all 3,200 counties
all_counties = await acs.download_acs_data_api("B19013", "county", "*")

# โœ… Fast: Downloads only California's 58 counties
ca_counties = await acs.download_acs_data_api("B19013", "county", "06")

2. Leverage Caching

# First run: Downloads from API (slow)
df1 = await acs.download_acs_data_api("B19013", "county", "*")

# Second run: Loads from Parquet cache (instant!)
df2 = acs.get_cached_data("B19013", "county", "*")

3. Download Multiple Tables in Parallel

async def parallel_download():
    acs = ACSDataIngestion()
    
    # Download 3 tables simultaneously
    results = await asyncio.gather(
        acs.download_acs_data_api("B19013", "county", "*"),
        acs.download_acs_data_api("B27010", "county", "*"),
        acs.download_acs_data_api("B17001", "county", "*"),
    )
    
    income_df, insurance_df, poverty_df = results

4. Avoid Bulk Downloads (Unless Necessary)

The Census Bureau offers bulk downloads of ALL ACS data:

# โš ๏ธ WARNING: This downloads 15 GB!
await acs.download_bulk_files(state="ALL")

Use bulk downloads only if:

  • You need 100+ tables
  • You need tract-level data for entire U.S.
  • You're doing large-scale research

Otherwise: Use targeted API downloads (much faster!)

๐Ÿ“š Resources

Official Documentation

Understanding ACS Data

State FIPS Codes

Common state codes for API queries:

State FIPS State FIPS
Alabama 01 Montana 30
Alaska 02 Nebraska 31
Arizona 04 Nevada 32
Arkansas 05 New Hampshire 33
California 06 New Jersey 34
Colorado 08 New Mexico 35
Connecticut 09 New York 36
Delaware 10 North Carolina 37
Florida 12 Ohio 39
Georgia 13 Oklahoma 40
Hawaii 15 Oregon 41
Illinois 17 Pennsylvania 42
Indiana 18 Texas 48
Iowa 19 Utah 49
Kansas 20 Virginia 51
Louisiana 22 Washington 53
Massachusetts 25 Wisconsin 55
Michigan 26

Full list: https://www.census.gov/library/reference/code-lists/ansi/ansi-codes-for-states.html

๐Ÿ†˜ Troubleshooting

"API request failed: 403"

Cause: Rate limit exceeded (500 requests/day without API key)

Fix: Get a Census API key (see Quick Start above)

"Module 'config.settings' has no attribute 'CENSUS_API_KEY'"

Cause: API key not set in configuration

Fix: Add to .env file:

CENSUS_API_KEY=your_key_here

"No data returned for this geography"

Cause: Not all tables are available at all geography levels

Fix: Check Census API documentation for table availability by geography

Downloads are slow

Solutions:

  1. Use state filters instead of "*"
  2. Use cached data for repeated queries
  3. Download during off-peak hours (late night/early morning EST)
  4. Consider bulk downloads if you need many tables

๐Ÿ”ฎ Next Steps

  1. Explore Available Tables: Run acs.list_available_tables()
  2. Download Sample Data: Try the examples in this guide
  3. Join with Jurisdictions: Combine ACS demographics with jurisdiction URLs
  4. Build Dashboards: Create visualizations of demographic data
  5. Target Programs: Use poverty/insurance data to prioritize outreach

Related Documentation