# Census Bureau Data URL Fix

## Problem
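The original Census Bureau data URLs returned 404 errors because the Bureau changed how it publishes the data: the files are distributed as ZIP archives of Excel spreadsheets rather than as direct CSV downloads.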

## Solution

### Updated URLs (2022 Census of Governments)

The Census Bureau publishes data as **ZIP files containing Excel spreadsheets**, not direct CSV files.

**New URLs:**
- **Counties**: https://www2.census.gov/programs-surveys/gus/tables/2022/cog2022_cg2200org05.zip
- **Municipalities**: https://www2.census.gov/programs-surveys/gus/tables/2022/cog2022_cg2200org06.zip
- **School Districts**: https://www2.census.gov/programs-surveys/gus/tables/2022/cog2022_cg2200org09.zip
- **Special Districts**: https://www2.census.gov/programs-surveys/gus/tables/2022/cog2022_cg2200org08.zip
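
The four links differ only in the two-digit table number at the end of the file name. A minimal sketch of building them from that number follows; `census_zip_url` and the `TABLES` keys are hypothetical names, and the pattern is inferred only from the URLs listed above, so it may not hold for other tables or years.

```python
# Hypothetical helper: the URL pattern is inferred from the four links above.
BASE_URL = "https://www2.census.gov/programs-surveys/gus/tables/2022"

# Table numbers taken from the file names in the URLs listed above.
TABLES = {
    "counties": 5,
    "municipalities": 6,
    "special_districts": 8,
    "school_districts": 9,
}


def census_zip_url(kind: str) -> str:
    """Return the ZIP URL for one jurisdiction type (raises KeyError if unknown)."""
    table = TABLES[kind]
    return f"{BASE_URL}/cog2022_cg2200org{table:02d}.zip"
```

Keeping the table numbers in one mapping means a future URL change only needs to be fixed in one place.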

### Required Dependencies

pandas relies on `openpyxl` to read the `.xlsx` files published by the Census Bureau:

```bash
pip install openpyxl
```

### How It Works

1. **Downloads ZIP file** from Census Bureau
2. **Extracts Excel file** (.xlsx) from ZIP
3. **Converts to CSV** using pandas
4. **Caches locally** (7-day cache)
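
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, the cache age is judged by file modification time, and the first `.xlsx` entry in the archive is assumed to be the one wanted.

```python
import io
import time
import zipfile
from pathlib import Path
from urllib.request import urlopen

CACHE_MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # 7-day cache, as described above


def is_cache_fresh(path: Path, max_age: float = CACHE_MAX_AGE_SECONDS) -> bool:
    """True if the cached file exists and was modified less than max_age seconds ago."""
    return path.exists() and (time.time() - path.stat().st_mtime) < max_age


def extract_xlsx(zip_bytes: bytes) -> bytes:
    """Return the contents of the first .xlsx entry found in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xlsx"):
                return archive.read(name)
    raise FileNotFoundError("no .xlsx file found in the ZIP archive")


def fetch_as_csv(url: str, csv_path: Path) -> Path:
    """Download, extract, convert, and cache; reuse the cached CSV while it is fresh."""
    if is_cache_fresh(csv_path):
        return csv_path
    import pandas as pd  # imported lazily; reading .xlsx also requires openpyxl

    with urlopen(url, timeout=60) as response:  # step 1: download the ZIP
        zip_bytes = response.read()
    xlsx_bytes = extract_xlsx(zip_bytes)  # step 2: extract the Excel file
    frame = pd.read_excel(io.BytesIO(xlsx_bytes), engine="openpyxl")
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(csv_path, index=False)  # steps 3 and 4: convert and cache
    return csv_path
```

Working on in-memory bytes avoids leaving half-extracted files in the cache directory if a download is interrupted.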

### Installation

```bash
source venv/bin/activate
pip install pyspark delta-spark openpyxl
```

### Usage

```bash
python main.py discover-jurisdictions --limit 10
```

The system will:
- Download Census ZIP files automatically
- Extract and convert Excel → CSV
- Cache for 7 days to avoid re-downloading
- Process jurisdiction data into Delta Lake

---

## Data Source Reference

**Official Page**: https://www.census.gov/data/tables/2022/econ/gus/2022-governments.html

**Available Tables:**
- Table 2: Local Governments by Type and State
- Table 5: County Governments by Population-Size Group
- Table 6: Subcounty General-Purpose Governments
- Table 8: Special District Governments by Function
- Table 9: Public School Systems by Type

**Update Frequency**: Census of Governments runs every 5 years (2017, 2022, 2027...)

**Next Update**: 2027 Census of Governments

---

## Troubleshooting

### Missing openpyxl
```
ModuleNotFoundError: No module named 'openpyxl'
```
**Fix**: `pip install openpyxl`
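
To catch this before a run starts, a small pre-flight check can report which modules are missing. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper name, not part of the project.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["pandas", "openpyxl"])` returns an empty list when both packages are installed in the active virtual environment.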

### ZIP Extraction Fails
Check the available disk space for the `data/cache/census/` directory.
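
One way to check free space from Python is `shutil.disk_usage`. A minimal sketch, in which `free_megabytes` is a hypothetical helper and the default path is the cache directory named above:

```python
import shutil
from pathlib import Path


def free_megabytes(directory: str = "data/cache/census") -> float:
    """Return free space, in MiB, on the filesystem that holds the directory."""
    path = Path(directory)
    # Walk up to the nearest existing parent so the check also works
    # before the cache directory has been created.
    while not path.exists() and path != path.parent:
        path = path.parent
    return shutil.disk_usage(path).free / (1024 * 1024)
```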

### Still Getting 404
The Census Bureau may have moved files. Check:
https://www.census.gov/programs-surveys/gus/data/datasets.html
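
To confirm which URL is failing, the HTTP status can be checked without downloading the whole file. The sketch below sends a `HEAD` request with the standard library; `url_status` is a hypothetical helper, and it assumes the server answers `HEAD` requests. Network-level failures (`urllib.error.URLError`) are deliberately not caught.

```python
import urllib.error
import urllib.request


def url_status(url, opener=urllib.request.urlopen, timeout=30):
    """Return the HTTP status code for a URL, e.g. 200 or 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; report the code instead.
        return error.code
```

The `opener` parameter exists only so the function can be exercised without network access.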

---

## Alternative: Manual Download

If automated download fails:

1. Visit: https://www.census.gov/data/tables/2022/econ/gus/2022-governments.html
2. Download ZIP files manually
3. Extract the Excel files and convert each one to CSV
4. Place the CSV files in `data/cache/census/` as:
   - `counties_20260421.csv`
   - `municipalities_20260421.csv`
   - etc.

The system will use cached files automatically.
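
For the manual route, the Excel-to-CSV conversion can be done with pandas. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the `YYYYMMDD` suffix is inferred from the example file names above, so which date the system actually expects should be confirmed against the project code.

```python
from datetime import date
from pathlib import Path

CACHE_DIR = Path("data/cache/census")  # cache directory named above


def cache_csv_path(kind: str, on: date) -> Path:
    """Build a cache file name such as counties_20260421.csv."""
    return CACHE_DIR / f"{kind}_{on:%Y%m%d}.csv"


def convert_to_cache(xlsx_path: str, kind: str, on: date) -> Path:
    """Convert a manually downloaded .xlsx file into a CSV in the cache directory."""
    import pandas as pd  # imported lazily; reading .xlsx also requires openpyxl

    target = cache_csv_path(kind, on)
    target.parent.mkdir(parents=True, exist_ok=True)
    pd.read_excel(xlsx_path, engine="openpyxl").to_csv(target, index=False)
    return target
```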