OpenStates Data Scripts
Scripts for working with OpenStates legislative data.
Data Source
- Website: https://openstates.org/
- API Docs: https://docs.openstates.org/api-v3/
- Bulk Data: https://openstates.org/data/
- Coverage: 52 jurisdictions (all 50 US states, Washington DC, and Puerto Rico)
- Data Types: Bills, legislators, votes, committees
Scripts
Download & Import
- bulk_legislative_download.py: Download OpenStates bulk data dumps (PostgreSQL or CSV)
- download_documents.py: Download 4.5M+ bill documents listed in the PostgreSQL database
- load_openstates_csv.sh: Load CSV exports into the database
- load_openstates_people.py: Load legislator data from the GitHub repository
Schema & Export
- create_openstates_schema.py: Create the PostgreSQL schema for OpenStates data
- export_openstates_to_gold.py: Export from PostgreSQL to Gold Parquet files
Processing
- aggregate_bills_from_postgres.py: Aggregate bill statistics by state/topic
- legislative_tracker.py: Track legislative activity
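The README does not describe how aggregate_bills_from_postgres.py groups bills. As a rough in-memory sketch of a state/topic aggregation, assuming each bill row carries a `state` code and a list of `subjects` (both hypothetical field names; the real script presumably works against PostgreSQL), a per-pair count could look like this:

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def aggregate_bills(rows: Iterable[dict]) -> dict[tuple[str, str], int]:
    """Count bills per (state, topic) pair.

    Each row is a hypothetical dict with a 'state' code and a 'subjects' list;
    a bill tagged with several subjects is counted once under each of them.
    """
    counts: Counter[tuple[str, str]] = Counter()
    for row in rows:
        # Bills with no subjects are kept visible under a placeholder topic.
        topics = row.get("subjects") or ["(untagged)"]
        for topic in topics:
            counts[(row["state"], topic)] += 1
    return dict(counts)


rows = [
    {"state": "tx", "subjects": ["Education", "Taxation"]},
    {"state": "tx", "subjects": ["Education"]},
    {"state": "ca", "subjects": []},
]
print(aggregate_bills(rows))
# {('tx', 'Education'): 2, ('tx', 'Taxation'): 1, ('ca', '(untagged)'): 1}
```

Counting a multi-subject bill under every subject means topic totals can exceed the number of bills; in SQL the equivalent is a GROUP BY on state and subject.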
Usage Examples
# Download April 2026 bulk data to PostgreSQL
python bulk_legislative_download.py --postgres --month 2026-04
# Download bill documents from database (2025 only)
python download_documents.py --years 2025 --type documents
# Download all documents for recent years
python download_documents.py --years 2024,2025 --resume
# Test document download with limit
python download_documents.py --limit 100 --dry-run
# Create database schema
python create_openstates_schema.py
# Load legislator data
python load_openstates_people.py
# Export to Gold format
python export_openstates_to_gold.py
# Aggregate bill statistics
python aggregate_bills_from_postgres.py
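The download_documents.py examples above use five options: --years, --type, --limit, --dry-run and --resume. The script's actual parser is not shown here, so the following argparse sketch only mirrors those flags. The defaults (None meaning "all years" and "both types") and the `versions`/`documents` choices for --type are assumptions inferred from the output layout described below.

```python
from __future__ import annotations

import argparse


def parse_years(value: str) -> list[int]:
    """Turn a comma-separated string such as '2024,2025' into [2024, 2025]."""
    try:
        return [int(part) for part in value.split(",") if part.strip()]
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid year list: {value!r}") from None


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names come from the README examples, defaults are assumed.
    parser = argparse.ArgumentParser(description="Download OpenStates bill documents (sketch)")
    parser.add_argument("--years", type=parse_years, default=None,
                        help="comma-separated years, e.g. 2024,2025 (default: all years)")
    parser.add_argument("--type", dest="doc_type", choices=["versions", "documents"],
                        default=None, help="restrict to one document type (default: both)")
    parser.add_argument("--limit", type=int, default=None,
                        help="stop after this many files")
    parser.add_argument("--dry-run", action="store_true",
                        help="list what would be downloaded without fetching anything")
    parser.add_argument("--resume", action="store_true",
                        help="skip files already recorded in the progress log")
    return parser


args = build_parser().parse_args(["--years", "2024,2025", "--resume"])
print(args.years, args.resume)
```

Storing --type under `doc_type` avoids shadowing Python's built-in `type` in the rest of the script.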
Document Downloader Details
The download_documents.py script downloads actual bill documents (PDFs, Word docs, etc.) from state legislature websites.
Database Requirements:
- PostgreSQL with OpenStates data (from bulk_legislative_download.py)
- Default: postgresql://postgres:password@localhost:5433/openstates
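A minimal sketch of resolving the connection settings from the default URL, using only the standard library. The `OPENSTATES_DSN` environment variable is a hypothetical override (not documented for these scripts) that keeps real credentials out of the code; note the non-standard port 5433.

```python
import os
from urllib.parse import urlsplit

DEFAULT_DSN = "postgresql://postgres:password@localhost:5433/openstates"


def connection_settings(env_var: str = "OPENSTATES_DSN") -> dict:
    """Split the database URL into components, preferring a hypothetical env override."""
    dsn = os.environ.get(env_var, DEFAULT_DSN)
    parts = urlsplit(dsn)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # fall back to PostgreSQL's standard port if omitted
        "user": parts.username,
        "password": parts.password,
        "dbname": parts.path.lstrip("/"),
    }


print(connection_settings()["port"])
```

The keys match libpq keyword names, so the dict can be passed to a driver that accepts them as keyword arguments, for example `psycopg2.connect(**connection_settings())`.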
Document Types:
- Bill Versions (3.5M): Text of bills as introduced, amended, enrolled, etc.
- Bill Documents (1M): Fiscal notes, committee statements, amendments, etc.
Features:
- Organizes by year: /mnt/d/openstates_documents/documents/2025/
- snake_case filenames: hb_1234_fiscal_note_introduced.pdf
- Resume capability with JSON logging
- Progress updates every 1,000 files
- Rate limiting (0.1s between requests)
- Handles 403/404 errors gracefully
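One way to produce snake_case names like hb_1234_fiscal_note_introduced.pdf is to lowercase the bill identifier and document note and collapse every run of non-alphanumeric characters into a single underscore. The helper names and the assumed inputs ("HB 1234", "Fiscal Note (Introduced)") are hypothetical; the README does not say which fields the script actually uses.

```python
import re


def snake_case(text: str) -> str:
    """Lowercase text and replace each run of non-alphanumerics with one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def document_filename(bill_identifier: str, note: str, extension: str) -> str:
    """Build a name like 'hb_1234_fiscal_note_introduced.pdf' (hypothetical helper)."""
    # Skip empty parts so a missing note does not leave a trailing underscore.
    stem = "_".join(p for p in (snake_case(bill_identifier), snake_case(note)) if p)
    return f"{stem}.{extension.lstrip('.').lower()}"


print(document_filename("HB 1234", "Fiscal Note (Introduced)", ".pdf"))
# hb_1234_fiscal_note_introduced.pdf
```

Non-ASCII letters are also replaced by underscores here, and two documents with the same note on the same bill would map to the same name, so a real implementation may need a disambiguating suffix.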
Output Structure:
/mnt/d/openstates_documents/
├── versions/            # Bill version texts
│   ├── 2017/
│   ├── 2025/
│   └── ...
├── documents/           # Supporting documents
│   ├── 2025/
│   └── ...
└── download_log.json    # Progress tracking
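The layout above maps each file to `<root>/<versions|documents>/<year>/<filename>`. A small sketch of that mapping with pathlib, assuming the caller already knows the year; which date the script derives the year from is not stated in the README. The function names are hypothetical.

```python
from pathlib import Path

OUTPUT_ROOT = Path("/mnt/d/openstates_documents")  # default root shown in the README
DOC_KINDS = ("versions", "documents")


def output_path(doc_kind: str, year: int, filename: str, root: Path = OUTPUT_ROOT) -> Path:
    """Map a file to <root>/<versions|documents>/<year>/<filename>."""
    if doc_kind not in DOC_KINDS:
        raise ValueError(f"unknown document kind: {doc_kind!r}")
    return root / doc_kind / str(year) / filename


def prepare_output(path: Path) -> Path:
    """Create the year directory on first use so the caller can write straight to path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


print(output_path("documents", 2025, "hb_1234_fiscal_note_introduced.pdf"))
```

Creating directories lazily with `exist_ok=True` means years that have no documents never get an empty folder.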
Recommended Usage:
# Start with recent year (testing)
python download_documents.py --years 2025 --type documents --limit 1000
# Download all 2024-2025 documents
python download_documents.py --years 2024,2025 --resume
# Full download (WARNING: 4.5M files, ~2TB, takes days!)
python download_documents.py --resume
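The size and duration warnings can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes 500KB means 500,000 bytes and that requests run one at a time with the 0.1s delay listed under Features. Transfer time is ignored, so the day count is a floor, not a prediction.

```python
def estimate(num_files: int, avg_bytes: int, delay_s: float) -> tuple:
    """Return (total terabytes, minimum days) for a sequential, rate-limited download."""
    total_tb = num_files * avg_bytes / 1e12
    # Only the fixed pause between requests is counted, so this is a lower bound.
    min_days = num_files * delay_s / 86_400
    return total_tb, min_days


size_tb, days = estimate(4_500_000, 500_000, 0.1)
print(f"~{size_tb:.2f} TB, at least {days:.1f} days of rate-limit delay")
# ~2.25 TB, at least 5.2 days of rate-limit delay
```

Roughly 2.25 TB is in line with the ~2TB estimate, and the 0.1s pause alone accounts for more than five days across 4.5M requests.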
Important Notes:
- 4.5 million documents will take several days to download
- Estimated size: ~2TB (average 500KB per document)
- Use --resume to continue interrupted downloads
- Progress saved every 100 downloads
- Failed downloads logged separately
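The resume behaviour above can be sketched as a small JSON-backed log that records completed URLs, keeps failures in a separate map, and flushes to disk every 100 successes. The file layout (`completed` list plus `failed` map) and the class name are hypothetical; the README does not document the actual format of download_log.json.

```python
from __future__ import annotations

import json
from pathlib import Path

SAVE_EVERY = 100  # the README says progress is saved every 100 downloads


class DownloadLog:
    """Minimal resume log with 'completed' and 'failed' entries (hypothetical layout)."""

    def __init__(self, path: Path):
        self.path = path
        self.completed: set[str] = set()
        self.failed: dict[str, str] = {}
        self._unsaved = 0
        if path.exists():
            data = json.loads(path.read_text(encoding="utf-8"))
            self.completed = set(data.get("completed", []))
            self.failed = dict(data.get("failed", {}))

    def should_skip(self, url: str) -> bool:
        # Only successes are skipped, so failed URLs are retried on the next --resume run.
        return url in self.completed

    def mark_done(self, url: str) -> None:
        self.completed.add(url)
        self.failed.pop(url, None)  # a successful retry clears the old failure
        self._unsaved += 1
        if self._unsaved >= SAVE_EVERY:
            self.save()

    def mark_failed(self, url: str, reason: str) -> None:
        self.failed[url] = reason  # e.g. "HTTP 404", kept apart from completed entries

    def save(self) -> None:
        payload = {"completed": sorted(self.completed), "failed": self.failed}
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        tmp.replace(self.path)  # rename over the old log so a crash mid-write leaves it intact
        self._unsaved = 0
```

Writing to a temporary file and renaming it means an interrupted save leaves the previous log readable rather than truncated.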
Data License
OpenStates data is available under various open licenses. See https://openstates.org/data/ for details.