Spaces:
Running on CPU Upgrade
Running on CPU Upgrade
File size: 2,599 Bytes
896453f | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 | # Census Bureau Data URL Fix
## Problem
The original Census Bureau data URLs were returning 404 errors because the data structure changed.
## Solution
### Updated URLs (2022 Census of Governments)
The Census Bureau publishes data as **ZIP files containing Excel spreadsheets**, not direct CSV files.
**New URLs:**
- **Counties**: https://www2.census.gov/programs-surveys/gus/tables/2022/cog2022_cg2200org05.zip
- **Municipalities**: https://www2.census.gov/programs-surveys/gus/tables/2022/cog2022_cg2200org06.zip
- **School Districts**: https://www2.census.gov/programs-surveys/gus/tables/2022/cog2022_cg2200org09.zip
- **Special Districts**: https://www2.census.gov/programs-surveys/gus/tables/2022/cog2022_cg2200org08.zip
### Required Dependencies
To process Excel files from Census Bureau:
```bash
pip install openpyxl
```
### How It Works
1. **Downloads ZIP file** from Census Bureau
2. **Extracts Excel file** (.xlsx) from ZIP
3. **Converts to CSV** using pandas
4. **Caches locally** (7-day cache)
### Installation
```bash
source venv/bin/activate
pip install pyspark delta-spark openpyxl
```
### Usage
```bash
python main.py discover-jurisdictions --limit 10
```
The system will:
- Download Census ZIP files automatically
- Extract and convert Excel → CSV
- Cache for 7 days to avoid re-downloading
- Process jurisdiction data into Delta Lake
---
## Data Source Reference
**Official Page**: https://www.census.gov/data/tables/2022/econ/gus/2022-governments.html
**Available Tables:**
- Table 2: Local Governments by Type and State
- Table 5: County Governments by Population-Size Group
- Table 6: Subcounty General-Purpose Governments
- Table 8: Special District Governments by Function
- Table 9: Public School Systems by Type
**Update Frequency**: Census of Governments runs every 5 years (2017, 2022, 2027...)
**Next Update**: 2027 Census of Governments
---
## Troubleshooting
### Missing openpyxl
```
ModuleNotFoundError: No module named 'openpyxl'
```
**Fix**: `pip install openpyxl`
### ZIP Extraction Fails
Check disk space in `data/cache/census/` directory
### Still Getting 404
The Census Bureau may have moved files. Check:
https://www.census.gov/programs-surveys/gus/data/datasets.html
---
## Alternative: Manual Download
If automated download fails:
1. Visit: https://www.census.gov/data/tables/2022/econ/gus/2022-governments.html
2. Download ZIP files manually
3. Extract Excel files
4. Place in `data/cache/census/` as:
- `counties_20260421.csv`
- `municipalities_20260421.csv`
- etc.
The system will use cached files automatically.
|