# OpenStates Data Scripts

Scripts for working with [OpenStates](https://openstates.org/) legislative data.

## Data Source

- **Website**: https://openstates.org/
- **API Docs**: https://docs.openstates.org/api-v3/
- **Bulk Data**: https://openstates.org/data/
- **Coverage**: 52 US jurisdictions (the 50 states, DC, and Puerto Rico)
- **Data Types**: Bills, legislators, votes, committees

## Scripts

### Download & Import
- `bulk_legislative_download.py` - Download OpenStates bulk data dumps (PostgreSQL or CSV)
- `download_documents.py` - Download 4.5M+ bill documents from state legislature websites, driven by the records in the PostgreSQL database
- `load_openstates_csv.sh` - Load CSV exports into database
- `load_openstates_people.py` - Load legislator data from GitHub repo

### Schema & Export
- `create_openstates_schema.py` - Create PostgreSQL schema for OpenStates data
- `export_openstates_to_gold.py` - Export from PostgreSQL to Gold Parquet files

### Processing
- `aggregate_bills_from_postgres.py` - Aggregate bill statistics by state/topic
- `legislative_tracker.py` - Track legislative activity

## Usage Examples

```bash
# Download April 2026 bulk data to PostgreSQL
python bulk_legislative_download.py --postgres --month 2026-04

# Download bill documents from database (2025 only)
python download_documents.py --years 2025 --type documents

# Download all documents for recent years
python download_documents.py --years 2024,2025 --resume

# Test document download with limit
python download_documents.py --limit 100 --dry-run

# Create database schema
python create_openstates_schema.py

# Load legislator data
python load_openstates_people.py

# Export to Gold format
python export_openstates_to_gold.py

# Aggregate bill statistics
python aggregate_bills_from_postgres.py
```

## Document Downloader Details

The `download_documents.py` script downloads actual bill documents (PDFs, Word docs, etc.) from state legislature websites.

**Database Requirements:**
- PostgreSQL with OpenStates data (from `bulk_legislative_download.py`)
- Default: `postgresql://postgres:password@localhost:5433/openstates`
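
As a quick sanity check on the default connection string, the sketch below splits it into its parts using only the standard library. Note that the port is 5433, not PostgreSQL's usual 5432. The `settings` dictionary and its key names are illustrative assumptions, not something the scripts are known to use.

```python
from urllib.parse import urlsplit

# Default connection URL quoted in this README.
DEFAULT_DB_URL = "postgresql://postgres:password@localhost:5433/openstates"

parts = urlsplit(DEFAULT_DB_URL)

# Hypothetical settings dict; the key names are chosen for illustration only.
settings = {
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,                # 5433 here, not the PostgreSQL default 5432
    "dbname": parts.path.lstrip("/"),  # path is "/openstates"
}

print(settings)
```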

**Document Types:**
- **Bill Versions** (3.5M): Text of bills as introduced, amended, enrolled, etc.
- **Bill Documents** (1M): Fiscal notes, committee statements, amendments, etc.

**Features:**
- Organizes by year: `/mnt/d/openstates_documents/documents/2025/`
- Snake_case filenames: `hb_1234_fiscal_note_introduced.pdf`
- Resume capability with JSON logging
- Progress updates every 1,000 files
- Rate limiting (0.1s between requests)
- Handles 403/404 errors gracefully
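
The snake_case naming above can be sketched as follows. This is a minimal illustration, not the code in `download_documents.py`: the function name `snake_case_filename` and its parameters are hypothetical, and the real script may build names from different fields.

```python
import re


def snake_case_filename(bill_identifier: str, note: str, extension: str) -> str:
    """Build a snake_case filename from a bill identifier and a document note.

    Hypothetical helper: the actual script may combine different fields.
    """
    raw = f"{bill_identifier} {note}".lower()
    # Collapse every run of non-alphanumeric characters into one underscore.
    stem = re.sub(r"[^a-z0-9]+", "_", raw).strip("_")
    return f"{stem}.{extension.lstrip('.').lower()}"


print(snake_case_filename("HB 1234", "Fiscal Note (Introduced)", ".PDF"))
# prints: hb_1234_fiscal_note_introduced.pdf
```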

**Output Structure:**
```
/mnt/d/openstates_documents/
├── versions/           # Bill version texts
│   ├── 2017/
│   ├── 2025/
│   └── ...
├── documents/          # Supporting documents
│   ├── 2025/
│   └── ...
└── download_log.json   # Progress tracking
```
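
One way a JSON progress log such as `download_log.json` can support resuming is sketched below. The log layout (a single `"completed"` list of URLs) and the helper names are assumptions made for illustration; the real log format may differ.

```python
import json
from pathlib import Path


def load_completed(log_path: Path) -> set:
    """Return the set of URLs already downloaded, or an empty set if no log exists."""
    if not log_path.exists():
        return set()
    with log_path.open() as fh:
        # Assumed layout: {"completed": ["https://...", ...]}
        return set(json.load(fh).get("completed", []))


def save_completed(log_path: Path, completed: set) -> None:
    """Write the log via a temporary file so an interruption cannot truncate it."""
    tmp_path = log_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps({"completed": sorted(completed)}))
    tmp_path.replace(log_path)


def pending(urls: list, completed: set) -> list:
    """Keep only the URLs that still need downloading, preserving order."""
    return [url for url in urls if url not in completed]
```

On restart, a downloader built this way would call `load_completed`, filter its work list with `pending`, and call `save_completed` periodically as files finish.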

**Recommended Usage:**
```bash
# Start with recent year (testing)
python download_documents.py --years 2025 --type documents --limit 1000

# Download all 2024-2025 documents
python download_documents.py --years 2024,2025 --resume

# Full download (WARNING: 4.5M files, ~2TB, takes days!)
python download_documents.py --resume
```

**Important Notes:**
- 4.5 million documents will take several days to download
- Estimated size: ~2TB (average 500KB per document)
- Use `--resume` to continue interrupted downloads
- Progress saved every 100 downloads
- Failed downloads logged separately
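
The time and size estimates follow from figures stated in this README: 4.5 million documents, a 0.1 s delay between requests, and an average of 500 KB per document. The short calculation below shows that the rate-limit delay alone gives a lower bound of about five days, before any actual transfer time. Decimal units are assumed (1 TB = 10^9 KB).

```python
TOTAL_DOCUMENTS = 4_500_000   # bill versions + bill documents
DELAY_SECONDS = 0.1           # rate limit between requests
AVG_SIZE_KB = 500             # average document size

SECONDS_PER_DAY = 86_400
KB_PER_TB = 1_000_000_000     # decimal units: 1 TB = 10^9 KB

# Lower bound: time spent sleeping between requests, ignoring transfer time.
min_days = TOTAL_DOCUMENTS * DELAY_SECONDS / SECONDS_PER_DAY
total_tb = TOTAL_DOCUMENTS * AVG_SIZE_KB / KB_PER_TB

print(f"At least {min_days:.1f} days of rate-limit delay, about {total_tb:.2f} TB")
# prints: At least 5.2 days of rate-limit delay, about 2.25 TB
```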

## Data License

OpenStates data is available under various open licenses. See https://openstates.org/data/ for details.