---
sidebar_position: 5
---
# OpenStates Integration & Contribution Opportunities
This document outlines our integration with OpenStates/Plural Policy and potential opportunities to contribute code back to the open-source community.
## 📚 References Added to Citations
We have properly cited and referenced the following OpenStates resources:
### In Root Citations (CITATIONS.md)
- ✅ **OpenStates/Plural Policy** main site
- ✅ **Bulk data downloads** (CSV, JSON, PostgreSQL)
- ✅ **Scrapers repository**: https://github.com/openstates/openstates-scrapers
- ✅ **Local database documentation**: https://docs.openstates.org/contributing/local-database/
- ✅ **Code of Conduct**: https://docs.openstates.org/code-of-conduct/
- ✅ **Schema documentation**: https://github.com/openstates/people/blob/master/schema.md
### In Website Documentation (website/docs/data-sources/citations.md)
- ✅ Comprehensive OpenStates/Plural Policy section
- ✅ PostgreSQL dump setup instructions
- ✅ Contribution guidelines
- ✅ BibTeX citation
### In Contributing Guide (CONTRIBUTING.md)
- ✅ Code of Conduct alignment with OpenStates
- ✅ Upstream contribution guidelines
- ✅ Testing requirements for scraper contributions
---
## 🔄 Our Current OpenStates Integration
### Data We Use
1. **PostgreSQL Monthly Dumps** (9.8GB+)
   - Complete legislative database for all 50 states
   - Script: `scripts/bulk_legislative_download.py --postgres --month 2026-04`
   - Setup: `scripts/setup_openstates_db.sh`
   - Use: Local SQL queries, no API rate limits
2. **CSV/JSON Session Data**
   - Per-state legislative sessions
   - Bill text, votes, sponsors
   - Committee assignments
3. **Video Sources**
   - YouTube channel URLs from `sources` field
   - Granicus video portal links
   - Meeting archive locations
### Our Implementation
**File:** `discovery/openstates_sources.py`
- Fetches jurisdiction data via API
- Extracts video sources (YouTube, Vimeo, Granicus)
- Maps to our jurisdiction database
**File:** `scripts/bulk_legislative_download.py`
- Downloads PostgreSQL dumps
- Downloads CSV/JSON session data
- Handles all 50 states + DC + PR
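The download script takes `--postgres` and `--month`, and covers 52 jurisdictions. A hypothetical sketch of that argument handling (not the script's actual source, which may accept other flags): the jurisdiction list is the 50 states plus DC and PR, and `month_arg` is an illustrative validator that rejects a malformed month before any download starts.

```python
import argparse
import re

# 50 states plus DC and PR, as lowercase postal codes (52 jurisdictions).
STATES = (
    "al ak az ar ca co ct de fl ga hi id il in ia ks ky la me md "
    "ma mi mn ms mo mt ne nv nh nj nm ny nc nd oh ok or pa ri sc "
    "sd tn tx ut vt va wa wv wi wy"
).split()
JURISDICTIONS = STATES + ["dc", "pr"]


def month_arg(value: str) -> str:
    """argparse type: accept YYYY-MM (e.g. 2026-04), reject anything else."""
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", value):
        raise argparse.ArgumentTypeError(f"expected YYYY-MM, got {value!r}")
    return value


# Hypothetical parser mirroring the two flags shown above; the real script may differ.
parser = argparse.ArgumentParser(description="Bulk OpenStates download (sketch)")
parser.add_argument("--postgres", action="store_true", help="download the monthly dump")
parser.add_argument("--month", type=month_arg, help="dump month as YYYY-MM")

args = parser.parse_args(["--postgres", "--month", "2026-04"])
print(args.postgres, args.month, len(JURISDICTIONS))  # True 2026-04 52
```

Validating in the `type=` callable means argparse reports a bad month as a normal usage error instead of failing partway through a multi-gigabyte download.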
---
## 🤝 Code We Could Contribute to OpenStates Scrapers
The [openstates-scrapers](https://github.com/openstates/openstates-scrapers) repository uses **Scrapy** to collect legislative data. We have complementary code that could enhance their project:
### 1. Video Source Discovery Patterns
**Our Code:** `discovery/youtube_channel_discovery.py`
**What it does:**
- Finds **all** YouTube channels for a jurisdiction (not just first match)
- Scrapes homepages for YouTube links
- Uses YouTube Data API for verification
- Discovers Vimeo and Granicus portals
**Potential Contribution:**
- Add video source extraction to OpenStates scrapers
- Enhance `sources` field with verified YouTube channels
- Automate Granicus portal discovery
**Example Pattern:**
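```python
import re

# Our code finds these patterns; each capture group holds the matched identifier
patterns = {
    "youtube_channel": r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)",
    "vimeo_channel": r"vimeo\.com/([\w-]+)",
    # Captures the first path segment after granicus.com; the portal name itself is the subdomain
    "granicus": r"granicus\.com/([^/]+)",
}
compiled = {name: re.compile(pattern) for name, pattern in patterns.items()}
```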
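Finding **all** channels rather than only the first match comes down to iterating with `re.finditer` instead of stopping at `re.search`. A minimal sketch using the YouTube pattern above on a homepage's HTML; the sample handles are made up, and order-preserving deduplication is our assumption about the desired output. Verifying each channel through the YouTube Data API would be a separate step and is not shown.

```python
import re

YOUTUBE_CHANNEL = re.compile(r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)")


def find_youtube_channels(html: str) -> list[str]:
    """Return every distinct channel identifier in the page, in first-seen order."""
    seen: dict[str, None] = {}  # dict keys keep insertion order and drop duplicates
    for match in YOUTUBE_CHANNEL.finditer(html):
        seen.setdefault(match.group(1), None)
    return list(seen)


# Made-up homepage snippet with two handles, one channel ID, and a repeat link.
html = """
<a href="https://www.youtube.com/@ExampleHouse">House</a>
<a href="https://www.youtube.com/@ExampleSenate">Senate</a>
<a href="https://youtube.com/channel/UC123abc">Archive</a>
<a href="https://www.youtube.com/@ExampleHouse/videos">House videos</a>
"""
print(find_youtube_channels(html))  # ['ExampleHouse', 'ExampleSenate', 'UC123abc']
```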
### 2. Legistar/Granicus Platform Detection
**Our Code:** `discovery/url_discovery_agent.py`
**What it does:**
- Identifies Legistar instances across cities
- Maps Granicus video portals
- Extracts meeting URLs and agendas
**Potential Contribution:**
- Enhance OpenStates scrapers with meeting video links
- Add Legistar meeting agenda extraction
- Contribute URL validation patterns
**Platform Patterns We Use:**
```python
platforms = {
"granicus": ["granicus.com", "legistar.com"],
"youtube": ["youtube.com", "youtu.be"],
"vimeo": ["vimeo.com"],
}
```
### 3. Meeting Archive Scraping
**Our Code:** `agents/scraper.py`
**What it does:**
- Scrapes PDF meeting minutes
- Extracts text from scanned documents (OCR)
- Parses meeting dates and types
- Handles multiple document formats
**Potential Contribution:**
- Add meeting minutes text extraction to OpenStates
- Enhance bill analysis with meeting context
- Link bills to meeting discussions
---
## 📝 How to Contribute to OpenStates Scrapers
Following their [local database documentation](https://docs.openstates.org/contributing/local-database/):
### 1. Set Up the OpenStates Development Environment
```bash
# Clone the scrapers repository
git clone https://github.com/openstates/openstates-scrapers.git
cd openstates-scrapers
# Install dependencies
pip install -r requirements.txt
# Set up a local PostgreSQL database
createdb openstates
# Import schema (if needed)
psql -d openstates -f schema/openstates.sql
```
### 2. Test Your Scraper Locally
```bash
# Run a specific state scraper
os-update al --scrape --rpm 10
# Validate data
os-update al --scrape --validate
```
### 3. Follow Their Code of Conduct
All contributions must follow the [OpenStates Code of Conduct](https://docs.openstates.org/code-of-conduct/):
- Be respectful and professional
- Welcome diverse perspectives
- Focus on what's best for the community
- Show empathy towards other contributors
### 4. Submit Pull Request
```bash
# Create feature branch
git checkout -b feature/video-sources
# Make changes (add video discovery to a state scraper)
# Example: scrapers/al/videos.py
# Test thoroughly
os-update al --scrape --rpm 10
# Stage, commit, and push
git add scrapers/al/
git commit -m "Add video source discovery for Alabama legislature"
git push origin feature/video-sources
# Open PR on GitHub
```
---
## 🎯 Specific Contribution Ideas
### Priority 1: Add Video Sources to Scrapers
**Goal:** Enhance the `sources` field with verified video links
**States to Start With:**
- **Alabama** - Has YouTube channel, needs verification
- **California** - @CALegislature (well-documented)
- **Texas** - Multiple chambers on YouTube
- **New York** - Both Assembly and Senate channels
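**Implementation (sketch):**
```python
# Hypothetical location: scrapers/al/__init__.py
# `BaseScraper` and `scrape_sources` are illustrative names, not confirmed parts of
# the openstates-scrapers API; the stand-in base class only makes this runnable.
class BaseScraper:
    """Placeholder for whatever base class the real scraper extends."""


class AlabamaScraper(BaseScraper):
    def scrape_sources(self) -> dict[str, str]:
        """Return video sources for the Alabama legislature."""
        return {
            "youtube": "https://www.youtube.com/@AlabamaLegislature",
            "granicus": "https://alabama.granicus.com/ViewPublisher.php?view_id=6",
        }
```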
### Priority 2: Meeting Minutes Integration
**Goal:** Link bills to meeting discussions
**Use Case:**
- When bill HB123 is discussed in committee
- Link to YouTube timestamp of discussion
- Extract quotes from meeting minutes
- Connect legislators' comments to votes
**Implementation:**
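```python
# Add meeting metadata to bill objects.
# `bill` is the Bill object a state bill scraper builds before yielding it;
# `xyz` is a placeholder video ID, and t=1234s is 20 min 34 s into the video.
bill.add_source(
    url="https://www.youtube.com/watch?v=xyz&t=1234s",
    note="Committee discussion at 20:34",
)
```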
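The URL and note in the snippet above both come from one offset in seconds: `t=1234s` is 20 × 60 + 34 seconds, which is the 20:34 in the note. A sketch of hypothetical helpers that derive both from the offset so they cannot drift apart; the resulting `(url, note)` pair could then be passed to `bill.add_source`. The note wording is copied from the snippet, and the helpers assume standard `youtube.com/watch?v=…` URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def format_offset(seconds: int) -> str:
    """Format an offset as M:SS, or H:MM:SS once it reaches an hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes}:{secs:02d}"


def timestamped_source(video_url: str, seconds: int) -> tuple[str, str]:
    """Return (url with t=<seconds>s, note) for a YouTube watch URL."""
    parts = urlparse(video_url)
    # Drop any existing t= parameter, keep the rest (e.g. v=) in order.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "t"]
    query.append(("t", f"{seconds}s"))
    url = urlunparse(parts._replace(query=urlencode(query)))
    return url, f"Committee discussion at {format_offset(seconds)}"


print(timestamped_source("https://www.youtube.com/watch?v=xyz", 1234))
# ('https://www.youtube.com/watch?v=xyz&t=1234s', 'Committee discussion at 20:34')
```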
### Priority 3: Granicus Portal Scraping
**Goal:** Automate discovery of Granicus video portals
**Pattern:**
- Many jurisdictions use Granicus for meeting videos
- URLs follow pattern: `{jurisdiction}.granicus.com/ViewPublisher.php?view_id={id}`
- Could automate discovery and link to OpenStates jurisdictions
---
## 🔒 License Compatibility
### Our License
- **Code:** Open source (check root LICENSE file)
- **Data:** Citations required (see CITATIONS.md)
### OpenStates License
- **Code:** BSD-style license (permissive)
- **Data:** Public domain (bulk downloads)
- **Content:** Varies by state (some restrictions)
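✅ **Compatible:** Code we contribute upstream would be released under their permissive BSD-style license; before porting existing code from this project, confirm that our root LICENSE allows it.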
---
## 📚 Required Reading Before Contributing
Before submitting any code to OpenStates, review:
1. **Local Database Setup**: https://docs.openstates.org/contributing/local-database/
   - How to set up PostgreSQL locally
   - How to run scrapers in development
   - How to test data quality
2. **Scraper Development Guide**: https://docs.openstates.org/contributing/scrapers/
   - Scrapy patterns used
   - Data validation requirements
   - Testing procedures
3. **Code of Conduct**: https://docs.openstates.org/code-of-conduct/
   - Community standards
   - Communication guidelines
   - Enforcement policies
4. **Schema Documentation**: https://github.com/openstates/people/blob/master/schema.md
   - Data model structure
   - Required vs optional fields
   - Relationship patterns
---
## 🚀 Next Steps
### For This Project
1. ✅ **Citations Added** - OpenStates properly credited
2. ✅ **Code of Conduct** - Aligned with their standards
3. ✅ **Local Database** - PostgreSQL dumps integrated
4. ⏳ **Test Contributions** - Validate our code works with their schema
### For Community Contribution
1. **Identify Target State** - Choose state needing video sources
2. **Test Locally** - Set up OpenStates dev environment
3. **Develop Scraper** - Add video discovery code
4. **Submit PR** - Follow their contribution guidelines
5. **Iterate** - Respond to code review feedback
---
## 💡 Benefits of Contributing
**For OpenStates:**
- Enhanced video source coverage
- Better meeting-to-bill linkage
- More comprehensive legislative tracking
**For Our Project:**
- Upstream improvements benefit us
- Community recognition
- Better data quality for all users
**For Civic Tech:**
- Shared infrastructure improvements
- Reduced duplication of effort
- Stronger open-source ecosystem
---
## 📞 Questions?
- **OpenStates Discord**: https://discord.gg/openstates
- **GitHub Discussions**: https://github.com/openstates/openstates-scrapers/discussions
- **Email**: Open States team (check repository for contact info)
---
**Last Updated:** April 29, 2026
**Maintained By:** Open Navigator team