| sidebar_position: 5 | |
| # OpenStates Integration & Contribution Opportunities | |
| This document outlines our integration with OpenStates/Plural Policy and potential opportunities to contribute code back to the open-source community. | |
| ## References Added to Citations | |
| We cite the following OpenStates resources: | |
| ### In Root Citations (CITATIONS.md) | |
| - ✅ **OpenStates/Plural Policy** main site | |
| - ✅ **Bulk data downloads** (CSV, JSON, PostgreSQL) | |
| - ✅ **Scrapers repository**: https://github.com/openstates/openstates-scrapers | |
| - ✅ **Local database documentation**: https://docs.openstates.org/contributing/local-database/ | |
| - ✅ **Code of Conduct**: https://docs.openstates.org/code-of-conduct/ | |
| - ✅ **Schema documentation**: https://github.com/openstates/people/blob/master/schema.md | |
| ### In Website Documentation (website/docs/data-sources/citations.md) | |
| - ✅ Comprehensive OpenStates/Plural Policy section | |
| - ✅ PostgreSQL dump setup instructions | |
| - ✅ Contribution guidelines | |
| - ✅ BibTeX citation | |
| ### In Contributing Guide (CONTRIBUTING.md) | |
| - ✅ Code of Conduct alignment with OpenStates | |
| - ✅ Upstream contribution guidelines | |
| - ✅ Testing requirements for scraper contributions | |
| --- | |
| ## Our Current OpenStates Integration | |
| ### Data We Use | |
| 1. **PostgreSQL Monthly Dumps** (9.8 GB+) | |
| - Complete legislative database for all 50 states | |
| - Script: `scripts/bulk_legislative_download.py --postgres --month 2026-04` | |
| - Setup: `scripts/setup_openstates_db.sh` | |
| - Use: Local SQL queries, no API rate limits | |
| 2. **CSV/JSON Session Data** | |
| - Per-state legislative sessions | |
| - Bill text, votes, sponsors | |
| - Committee assignments | |
| 3. **Video Sources** | |
| - YouTube channel URLs from `sources` field | |
| - Granicus video portal links | |
| - Meeting archive locations | |
| ### Our Implementation | |
| **File:** `discovery/openstates_sources.py` | |
| - Fetches jurisdiction data via API | |
| - Extracts video sources (YouTube, Vimeo, Granicus) | |
| - Maps to our jurisdiction database | |
| **File:** `scripts/bulk_legislative_download.py` | |
| - Downloads PostgreSQL dumps | |
| - Downloads CSV/JSON session data | |
| - Handles all 50 states + DC + PR | |
| --- | |
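| ## Code We Could Contribute to OpenStates Scrapers | |
| The [openstates-scrapers](https://github.com/openstates/openstates-scrapers) repository collects legislative data with Python scrapers built on `openstates-core`, typically using `scrapelib`, `lxml`, and `spatula` rather than Scrapy (check the repository for the current tooling). We have complementary code that could enhance their project: | |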
| ### 1. Video Source Discovery Patterns | |
| **Our Code:** `discovery/youtube_channel_discovery.py` | |
| **What it does:** | |
| - Finds **all** YouTube channels for a jurisdiction (not just first match) | |
| - Scrapes homepages for YouTube links | |
| - Uses YouTube Data API for verification | |
| - Discovers Vimeo and Granicus portals | |
| **Potential Contribution:** | |
| - Add video source extraction to OpenStates scrapers | |
| - Enhance `sources` field with verified YouTube channels | |
| - Automate Granicus portal discovery | |
| **Example Pattern:** | |
| ```python | |
| # Our code finds these patterns | |
| patterns = { | |
|     "youtube_channel": r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)", | |
|     "vimeo_channel": r"vimeo\.com/([\w-]+)", | |
|     "granicus": r"granicus\.com/([^/]+)", | |
| } | |
| ``` | |
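As a minimal sketch of how patterns like these might be applied, the snippet below scans a page's HTML and collects every distinct match per platform instead of stopping at the first one. The function name `find_video_sources`, the sample HTML, and the deduplication step are illustrative assumptions, not the actual code in `discovery/youtube_channel_discovery.py`.

```python
import re

# Same patterns as above, precompiled. Names here are illustrative only.
PATTERNS = {
    "youtube_channel": re.compile(r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)"),
    "vimeo_channel": re.compile(r"vimeo\.com/([\w-]+)"),
    "granicus": re.compile(r"granicus\.com/([^/]+)"),
}


def find_video_sources(html: str) -> dict:
    """Return every distinct match per platform, in order of first appearance."""
    found = {}
    for name, pattern in PATTERNS.items():
        # dict.fromkeys drops duplicates while preserving order
        found[name] = list(dict.fromkeys(pattern.findall(html)))
    return found


# Hypothetical homepage fragment with a repeated channel link
sample_html = """
<a href="https://www.youtube.com/@ExampleSenate">Senate</a>
<a href="https://www.youtube.com/channel/UC123-abc">House</a>
<a href="https://www.youtube.com/@ExampleSenate">Senate (footer)</a>
<a href="https://vimeo.com/examplecity">Vimeo</a>
"""
sources = find_video_sources(sample_html)
print(sources["youtube_channel"])
```

Note that the `granicus` pattern captures the first path segment after the domain. If the jurisdiction name is wanted instead, it sits in the subdomain and would need a different pattern.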
| ### 2. Legistar/Granicus Platform Detection | |
| **Our Code:** `discovery/url_discovery_agent.py` | |
| **What it does:** | |
| - Identifies Legistar instances across cities | |
| - Maps Granicus video portals | |
| - Extracts meeting URLs and agendas | |
| **Potential Contribution:** | |
| - Enhance OpenStates scrapers with meeting video links | |
| - Add Legistar meeting agenda extraction | |
| - Contribute URL validation patterns | |
| **Platform Patterns We Use:** | |
| ```python | |
| platforms = { | |
|     "granicus": ["granicus.com", "legistar.com"], | |
|     "youtube": ["youtube.com", "youtu.be"], | |
|     "vimeo": ["vimeo.com"], | |
| } | |
| ``` | |
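One way a mapping like this could be used is to classify a URL by its host name. The sketch below assumes matching on the exact domain or any subdomain; `detect_platform` is a hypothetical helper, not a function from `discovery/url_discovery_agent.py`.

```python
from typing import Optional
from urllib.parse import urlparse

PLATFORMS = {
    "granicus": ["granicus.com", "legistar.com"],
    "youtube": ["youtube.com", "youtu.be"],
    "vimeo": ["vimeo.com"],
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform whose domain matches the URL's host, else None."""
    host = (urlparse(url).hostname or "").lower()
    for platform, domains in PLATFORMS.items():
        for domain in domains:
            # Accept the domain itself or any subdomain, but reject
            # look-alike hosts such as "notyoutube.com".
            if host == domain or host.endswith("." + domain):
                return platform
    return None


print(detect_platform("https://alabama.granicus.com/ViewPublisher.php?view_id=6"))
```

Comparing against the parsed host instead of searching the whole URL string avoids false positives when a platform name appears only in a path or query parameter.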
| ### 3. Meeting Archive Scraping | |
| **Our Code:** `agents/scraper.py` | |
| **What it does:** | |
| - Scrapes PDF meeting minutes | |
| - Extracts text from scanned documents (OCR) | |
| - Parses meeting dates and types | |
| - Handles multiple document formats | |
| **Potential Contribution:** | |
| - Add meeting minutes text extraction to OpenStates | |
| - Enhance bill analysis with meeting context | |
| - Link bills to meeting discussions | |
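The date and meeting-type parsing step can be sketched as follows, assuming the PDF or OCR text has already been extracted into a string. The date format handled (for example "April 7, 2025"), the `MEETING_TYPES` list, and both function names are assumptions for illustration; `agents/scraper.py` may work differently.

```python
import re
from datetime import date
from typing import Optional

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Matches dates written like "April 7, 2025" (case-insensitive)
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})\b", re.IGNORECASE
)
# Hypothetical list of meeting types, checked in this order
MEETING_TYPES = ["work session", "public hearing", "special", "regular"]


def parse_meeting_date(text: str) -> Optional[date]:
    """Return the first written-out date found in the text, else None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    month_name, day, year = match.groups()
    month = MONTHS.index(month_name.capitalize()) + 1
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # Impossible dates such as "February 30" are treated as unparsed
        return None


def parse_meeting_type(text: str) -> Optional[str]:
    """Return the first known meeting type mentioned in the text, else None."""
    lowered = text.lower()
    for meeting_type in MEETING_TYPES:
        if meeting_type in lowered:
            return meeting_type
    return None


minutes_text = "MINUTES OF THE REGULAR MEETING OF THE CITY COUNCIL\nApril 7, 2025, 6:00 PM"
print(parse_meeting_date(minutes_text), parse_meeting_type(minutes_text))
```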
| --- | |
| ## How to Contribute to OpenStates Scrapers | |
| The steps below follow their [local database documentation](https://docs.openstates.org/contributing/local-database/): | |
| ### 1. Set Up the OpenStates Development Environment | |
| ```bash | |
| # Clone the scrapers repository | |
| git clone https://github.com/openstates/openstates-scrapers.git | |
| cd openstates-scrapers | |
| # Install dependencies (the repository is managed with Poetry at the time of writing; check its README) | |
| poetry install | |
| # Set up a local PostgreSQL database | |
| createdb openstates | |
| # Import schema if needed (the path below is a placeholder; follow the | |
| # local database documentation for the actual initialization step) | |
| psql -d openstates -f schema/openstates.sql | |
| ``` | |
| ### 2. Test Your Scraper Locally | |
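| ```bash | |
| # Run a specific state scraper | |
| os-update al --scrape --rpm 10 | |
| # Validate data: scraped objects are checked against the schema as they are saved, | |
| # so a clean scrape run is the validation step (--fastmode reuses cached responses) | |
| os-update al --scrape --fastmode | |
| ``` | |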
| ### 3. Follow Their Code of Conduct | |
| All contributions must follow the [OpenStates Code of Conduct](https://docs.openstates.org/code-of-conduct/): | |
| - Be respectful and professional | |
| - Welcome diverse perspectives | |
| - Focus on what's best for the community | |
| - Show empathy towards other contributors | |
| ### 4. Submit Pull Request | |
| ```bash | |
| # Create feature branch | |
| git checkout -b feature/video-sources | |
| # Make changes (add video discovery to a state scraper) | |
| # Example: scrapers/al/videos.py | |
| # Test thoroughly | |
| os-update al --scrape --rpm 10 | |
| # Stage, commit, and push | |
| git add scrapers/al/videos.py | |
| git commit -m "Add video source discovery for Alabama legislature" | |
| git push origin feature/video-sources | |
| # Open PR on GitHub | |
| ``` | |
| --- | |
| ## Specific Contribution Ideas | |
| ### Priority 1: Add Video Sources to Scrapers | |
| **Goal:** Enhance the `sources` field with verified video links | |
| **States to Start With:** | |
| - **Alabama** - Has YouTube channel, needs verification | |
| - **California** - @CALegislature (well-documented) | |
| - **Texas** - Multiple chambers on YouTube | |
| - **New York** - Both Assembly and Senate channels | |
| **Implementation:** | |
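| ```python | |
| # In scrapers/al/__init__.py | |
| # Illustrative sketch only: BaseScraper and scrape_sources are placeholder | |
| # names, not part of the existing OpenStates scraper API. | |
| class BaseScraper: | |
|     """Stand-in base class so the sketch runs on its own.""" | |
| class AlabamaScraper(BaseScraper): | |
|     def scrape_sources(self): | |
|         """Add video sources for Alabama legislature.""" | |
|         # Example URLs; verify them before submitting upstream | |
|         return { | |
|             "youtube": "https://www.youtube.com/@AlabamaLegislature", | |
|             "granicus": "https://alabama.granicus.com/ViewPublisher.php?view_id=6", | |
|         } | |
| ``` | |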
| ### Priority 2: Meeting Minutes Integration | |
| **Goal:** Link bills to meeting discussions | |
| **Use Case:** | |
| - When bill HB123 is discussed in committee | |
| - Link to YouTube timestamp of discussion | |
| - Extract quotes from meeting minutes | |
| - Connect legislators' comments to votes | |
| **Implementation:** | |
| ```python | |
| # Add meeting metadata to bill objects | |
| bill.add_source( | |
| url="https://www.youtube.com/watch?v=xyz&t=1234s", | |
| note="Committee discussion at 20:34" | |
| ) | |
| ``` | |
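The URL and note in the call above can be derived from a video ID and an offset in seconds. The helpers below are a small sketch under that assumption; `format_timestamp` and `youtube_timestamp_source` are hypothetical names, and the note wording simply mirrors the example.

```python
def format_timestamp(seconds: int) -> str:
    """Format an offset in seconds as M:SS, or H:MM:SS past one hour."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def youtube_timestamp_source(video_id: str, seconds: int) -> dict:
    """Build the url/note pair for a discussion starting at the given offset."""
    return {
        "url": f"https://www.youtube.com/watch?v={video_id}&t={seconds}s",
        "note": f"Committee discussion at {format_timestamp(seconds)}",
    }


source = youtube_timestamp_source("xyz", 1234)
print(source["url"], source["note"])
```

The resulting dictionary has the same `url` and `note` keys as the keyword arguments in the example call, so it could be unpacked into that call once a `bill` object is available.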
| ### Priority 3: Granicus Portal Scraping | |
| **Goal:** Automate discovery of Granicus video portals | |
| **Pattern:** | |
| - Many jurisdictions use Granicus for meeting videos | |
| - URLs follow pattern: `{jurisdiction}.granicus.com/ViewPublisher.php?view_id={id}` | |
| - Could automate discovery and link to OpenStates jurisdictions | |
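The URL pattern above can be parsed and rebuilt with a short regular expression. This is a sketch that assumes portals follow exactly that `ViewPublisher.php?view_id=` shape; the helper names are hypothetical, and real portals may use other paths.

```python
import re
from typing import Optional, Tuple

# {jurisdiction}.granicus.com/ViewPublisher.php?view_id={id}
GRANICUS_URL_RE = re.compile(
    r"^https?://([\w-]+)\.granicus\.com/ViewPublisher\.php\?view_id=(\d+)",
    re.IGNORECASE,
)


def parse_granicus_url(url: str) -> Optional[Tuple[str, int]]:
    """Return (jurisdiction, view_id) for a matching portal URL, else None."""
    match = GRANICUS_URL_RE.match(url)
    if match is None:
        return None
    return match.group(1).lower(), int(match.group(2))


def build_granicus_url(jurisdiction: str, view_id: int) -> str:
    """Build a portal URL from its two components."""
    return f"https://{jurisdiction}.granicus.com/ViewPublisher.php?view_id={view_id}"


print(parse_granicus_url("https://alabama.granicus.com/ViewPublisher.php?view_id=6"))
```

The extracted subdomain could then serve as a lookup key when linking a portal to an OpenStates jurisdiction, although subdomains do not always match jurisdiction names and would need a manual mapping in those cases.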
| --- | |
| ## License Compatibility | |
| ### Our License | |
| - **Code:** Open source (check root LICENSE file) | |
| - **Data:** Citations required (see CITATIONS.md) | |
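| ### OpenStates License | |
| - **Code:** Check each repository's LICENSE file; the scrapers repository is published under GPL-3.0 at the time of writing, which is copyleft and not BSD-style | |
| - **Data:** Public domain (bulk downloads) | |
| - **Content:** Varies by state (some restrictions) | |
| ✅ **Compatible:** As authors of our code, we can contribute it under their license. Confirm against our root LICENSE before reusing their code in this project. | |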
| --- | |
| ## Required Reading Before Contributing | |
| Before submitting any code to OpenStates, review: | |
| 1. **Local Database Setup**: https://docs.openstates.org/contributing/local-database/ | |
| - How to set up PostgreSQL locally | |
| - How to run scrapers in development | |
| - How to test data quality | |
| 2. **Scraper Development Guide**: https://docs.openstates.org/contributing/scrapers/ | |
| - Scraping patterns used | |
| - Data validation requirements | |
| - Testing procedures | |
| 3. **Code of Conduct**: https://docs.openstates.org/code-of-conduct/ | |
| - Community standards | |
| - Communication guidelines | |
| - Enforcement policies | |
| 4. **Schema Documentation**: https://github.com/openstates/people/blob/master/schema.md | |
| - Data model structure | |
| - Required vs optional fields | |
| - Relationship patterns | |
| --- | |
| ## Next Steps | |
| ### For This Project | |
| 1. ✅ **Citations Added** - OpenStates properly credited | |
| 2. ✅ **Code of Conduct** - Aligned with their standards | |
| 3. ✅ **Local Database** - PostgreSQL dumps integrated | |
| 4. ⏳ **Test Contributions** - Validate that our code works with their schema | |
| ### For Community Contribution | |
| 1. **Identify Target State** - Choose state needing video sources | |
| 2. **Test Locally** - Set up OpenStates dev environment | |
| 3. **Develop Scraper** - Add video discovery code | |
| 4. **Submit PR** - Follow their contribution guidelines | |
| 5. **Iterate** - Respond to code review feedback | |
| --- | |
| ## Benefits of Contributing | |
| **For OpenStates:** | |
| - Enhanced video source coverage | |
| - Better meeting-to-bill linkage | |
| - More comprehensive legislative tracking | |
| **For Our Project:** | |
| - Upstream improvements benefit us | |
| - Community recognition | |
| - Better data quality for all users | |
| **For Civic Tech:** | |
| - Shared infrastructure improvements | |
| - Reduced duplication of effort | |
| - Stronger open-source ecosystem | |
| --- | |
| ## Questions? | |
| - **OpenStates Discord**: https://discord.gg/openstates | |
| - **GitHub Discussions**: https://github.com/openstates/openstates-scrapers/discussions | |
| - **Email**: Open States team (check repository for contact info) | |
| --- | |
| **Last Updated:** April 29, 2026 | |
| **Maintained By:** Open Navigator team | |