---
sidebar_position: 5
---
# OpenStates Integration & Contribution Opportunities
This document outlines our integration with OpenStates/Plural Policy and potential opportunities to contribute code back to the open-source community.
## References Added to Citations
We have properly cited and referenced the following OpenStates resources:
### In Root Citations (CITATIONS.md)
- ✅ **OpenStates/Plural Policy** main site
- ✅ **Bulk data downloads** (CSV, JSON, PostgreSQL)
- ✅ **Scrapers repository**: https://github.com/openstates/openstates-scrapers
- ✅ **Local database documentation**: https://docs.openstates.org/contributing/local-database/
- ✅ **Code of Conduct**: https://docs.openstates.org/code-of-conduct/
- ✅ **Schema documentation**: https://github.com/openstates/people/blob/master/schema.md
### In Website Documentation (website/docs/data-sources/citations.md)
- ✅ Comprehensive OpenStates/Plural Policy section
- ✅ PostgreSQL dump setup instructions
- ✅ Contribution guidelines
- ✅ BibTeX citation
### In Contributing Guide (CONTRIBUTING.md)
- ✅ Code of Conduct alignment with OpenStates
- ✅ Upstream contribution guidelines
- ✅ Testing requirements for scraper contributions
---
## Our Current OpenStates Integration
### Data We Use
1. **PostgreSQL Monthly Dumps** (9.8 GB+)
   - Complete legislative database for all 50 states
   - Script: `scripts/bulk_legislative_download.py --postgres --month 2026-04`
   - Setup: `scripts/setup_openstates_db.sh`
   - Use: Local SQL queries, no API rate limits
2. **CSV/JSON Session Data**
   - Per-state legislative sessions
   - Bill text, votes, sponsors
   - Committee assignments
3. **Video Sources**
   - YouTube channel URLs from `sources` field
   - Granicus video portal links
   - Meeting archive locations
### Our Implementation
**File:** `discovery/openstates_sources.py`
- Fetches jurisdiction data via API
- Extracts video sources (YouTube, Vimeo, Granicus)
- Maps to our jurisdiction database
**File:** `scripts/bulk_legislative_download.py`
- Downloads PostgreSQL dumps
- Downloads CSV/JSON session data
- Handles all 50 states + DC + PR
---
## 🤝 Code We Could Contribute to OpenStates Scrapers
The [openstates-scrapers](https://github.com/openstates/openstates-scrapers) repository collects legislative data with the Open States scraping framework from `openstates-core` (not Scrapy). We have complementary code that could enhance their project:
### 1. Video Source Discovery Patterns
**Our Code:** `discovery/youtube_channel_discovery.py`
**What it does:**
- Finds **all** YouTube channels for a jurisdiction (not just the first match)
- Scrapes homepages for YouTube links
- Uses YouTube Data API for verification
- Discovers Vimeo and Granicus portals
**Potential Contribution:**
- Add video source extraction to OpenStates scrapers
- Enhance `sources` field with verified YouTube channels
- Automate Granicus portal discovery
**Example Pattern:**
```python
# Our code finds these patterns
patterns = {
    "youtube_channel": r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)",
    "vimeo_channel": r"vimeo\.com/([\w-]+)",
    # Captures the first path segment after the domain
    "granicus": r"granicus\.com/([^/]+)",
}
```
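As a minimal sketch, patterns like these could be applied to a page's HTML so that every distinct match is collected rather than only the first. The function name `find_video_sources` and the sample HTML are hypothetical and are not taken from `discovery/youtube_channel_discovery.py`.

```python
import re

PATTERNS = {
    "youtube_channel": r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)",
    "vimeo_channel": r"vimeo\.com/([\w-]+)",
    "granicus": r"granicus\.com/([^/]+)",
}


def find_video_sources(html: str) -> dict[str, list[str]]:
    """Return every distinct match per platform, in first-seen order."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = re.findall(pattern, html, flags=re.IGNORECASE)
        # dict.fromkeys removes duplicates while preserving order
        found[name] = list(dict.fromkeys(matches))
    return found


# Hypothetical homepage snippet with a repeated channel link
sample_html = (
    '<a href="https://www.youtube.com/@CALegislature">Watch</a>'
    '<a href="https://www.youtube.com/channel/UC123-abc">Archive</a>'
    '<a href="https://www.youtube.com/@CALegislature">Watch again</a>'
)
sources = find_video_sources(sample_html)
```

Note that the `granicus` pattern captures the path segment after the domain, so a caller that needs the jurisdiction subdomain would have to parse the host separately.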
### 2. Legistar/Granicus Platform Detection
**Our Code:** `discovery/url_discovery_agent.py`
**What it does:**
- Identifies Legistar instances across cities
- Maps Granicus video portals
- Extracts meeting URLs and agendas
**Potential Contribution:**
- Enhance OpenStates scrapers with meeting video links
- Add Legistar meeting agenda extraction
- Contribute URL validation patterns
**Platform Patterns We Use:**
```python
platforms = {
    "granicus": ["granicus.com", "legistar.com"],
    "youtube": ["youtube.com", "youtu.be"],
    "vimeo": ["vimeo.com"],
}
```
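One way a mapping like this might be used is to classify a URL by its hostname. The sketch below assumes a hypothetical helper named `detect_platform`; it is an illustration, not the logic in `discovery/url_discovery_agent.py`. Matching on the parsed hostname (rather than a substring of the whole URL) avoids false positives such as `notyoutube.com`.

```python
from typing import Optional
from urllib.parse import urlparse

PLATFORMS = {
    "granicus": ["granicus.com", "legistar.com"],
    "youtube": ["youtube.com", "youtu.be"],
    "vimeo": ["vimeo.com"],
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform whose domain matches the URL's host, else None."""
    host = (urlparse(url).hostname or "").lower()
    for platform, domains in PLATFORMS.items():
        for domain in domains:
            # Accept the domain itself or any subdomain of it
            if host == domain or host.endswith("." + domain):
                return platform
    return None
```

For example, `detect_platform("https://alabama.granicus.com/ViewPublisher.php?view_id=6")` returns `"granicus"`, while a URL on an unlisted domain returns `None`.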
### 3. Meeting Archive Scraping
**Our Code:** `agents/scraper.py`
**What it does:**
- Scrapes PDF meeting minutes
- Extracts text from scanned documents (OCR)
- Parses meeting dates and types
- Handles multiple document formats
**Potential Contribution:**
- Add meeting minutes text extraction to OpenStates
- Enhance bill analysis with meeting context
- Link bills to meeting discussions
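The date and meeting-type parsing listed above could look roughly like the following. This is a sketch under stated assumptions: the function name `parse_meeting_header`, the single supported date format ("April 29, 2026"), and the keyword list are all hypothetical, and the real `agents/scraper.py` may handle many more formats.

```python
import re
from datetime import date, datetime
from typing import Optional

# Matches dates written like "April 29, 2026"
DATE_RE = re.compile(r"\b([A-Z][a-z]+ \d{1,2}, \d{4})\b")

# Hypothetical keyword list; real minutes use many more labels
MEETING_TYPES = ("special", "regular", "committee", "work session")


def parse_meeting_header(text: str) -> tuple[Optional[date], Optional[str]]:
    """Extract the first valid date and the first known meeting type."""
    meeting_date = None
    for candidate in DATE_RE.findall(text):
        try:
            meeting_date = datetime.strptime(candidate, "%B %d, %Y").date()
            break
        except ValueError:
            # Looked like a date but was not one (e.g. "Room 12, 2026")
            continue
    lowered = text.lower()
    meeting_type = next((t for t in MEETING_TYPES if t in lowered), None)
    return meeting_date, meeting_type
```

Text extracted by OCR is often noisy, so a production version would likely need fuzzy matching and additional date formats.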
---
## How to Contribute to OpenStates Scrapers
These steps follow their [local database documentation](https://docs.openstates.org/contributing/local-database/):
### 1. Set Up the OpenStates Development Environment
```bash
# Clone the scrapers repository
git clone https://github.com/openstates/openstates-scrapers.git
cd openstates-scrapers

# Install dependencies
pip install -r requirements.txt

# Set up local PostgreSQL database
createdb openstates

# Import schema (if needed)
psql -d openstates -f schema/openstates.sql
```
### 2. Test Your Scraper Locally
```bash
# Run a specific state scraper
os-update al --scrape --rpm 10
# Validate data
os-update al --scrape --validate
```
### 3. Follow Their Code of Conduct
All contributions must follow the [OpenStates Code of Conduct](https://docs.openstates.org/code-of-conduct/):
- Be respectful and professional
- Welcome diverse perspectives
- Focus on what's best for the community
- Show empathy towards other contributors
### 4. Submit a Pull Request
```bash
# Create feature branch
git checkout -b feature/video-sources

# Make changes (add video discovery to a state scraper)
# Example: scrapers/al/videos.py

# Test thoroughly
os-update al --scrape --rpm 10

# Stage, commit, and push
git add scrapers/al/videos.py
git commit -m "Add video source discovery for Alabama legislature"
git push origin feature/video-sources

# Open PR on GitHub
```
---
## 🎯 Specific Contribution Ideas
### Priority 1: Add Video Sources to Scrapers
**Goal:** Enhance the `sources` field with verified video links
**States to Start With:**
- **Alabama** - Has YouTube channel, needs verification
- **California** - @CALegislature (well-documented)
- **Texas** - Multiple chambers on YouTube
- **New York** - Both Assembly and Senate channels
**Implementation:**
```python
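# In scrapers/al/__init__.py
# NOTE: BaseScraper and scrape_sources are illustrative names,
# not part of the OpenStates scraper API.
class BaseScraper:
    """Hypothetical stand-in for the real scraper base class."""


class AlabamaScraper(BaseScraper):
    def scrape_sources(self):
        """Add video sources for Alabama legislature."""
        return {
            "youtube": "https://www.youtube.com/@AlabamaLegislature",
            "granicus": "https://alabama.granicus.com/ViewPublisher.php?view_id=6",
        }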
```
### Priority 2: Meeting Minutes Integration
**Goal:** Link bills to meeting discussions
**Use Case:** When a bill such as HB123 is discussed in committee:
- Link to the YouTube timestamp of the discussion
- Extract quotes from the meeting minutes
- Connect legislators' comments to their votes
**Implementation:**
```python
# Add meeting metadata to bill objects
# (`bill` is assumed to be an existing bill object inside a scraper)
bill.add_source(
    url="https://www.youtube.com/watch?v=xyz&t=1234s",
    note="Committee discussion at 20:34",
)
```
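The URL and note in the example above can be derived from a single offset in seconds, which keeps the two from drifting apart. The helper name `youtube_timestamp_link` is hypothetical; the video ID `xyz` is the placeholder used above.

```python
from urllib.parse import urlencode


def youtube_timestamp_link(video_id: str, seconds: int) -> tuple[str, str]:
    """Build a timestamped watch URL and a matching human-readable note."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    query = urlencode({"v": video_id, "t": f"{seconds}s"})
    url = f"https://www.youtube.com/watch?{query}"
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    # Show hours only when the offset is an hour or more
    if hours:
        stamp = f"{hours}:{minutes:02d}:{secs:02d}"
    else:
        stamp = f"{minutes}:{secs:02d}"
    return url, f"Committee discussion at {stamp}"


url, note = youtube_timestamp_link("xyz", 1234)
```

Here 1234 seconds is 20 minutes and 34 seconds, so the note reads "Committee discussion at 20:34" and the URL ends in `t=1234s`.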
### Priority 3: Granicus Portal Scraping
**Goal:** Automate discovery of Granicus video portals
**Pattern:**
- Many jurisdictions use Granicus for meeting videos
- URLs follow pattern: `{jurisdiction}.granicus.com/ViewPublisher.php?view_id={id}`
- Could automate discovery and link to OpenStates jurisdictions
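
A minimal sketch of working with this URL pattern, assuming hypothetical helper names `build_granicus_url` and `parse_granicus_url`. It only covers the `ViewPublisher.php?view_id=` form shown above; other Granicus URL shapes are out of scope here.

```python
import re
from typing import Optional

# Matches {jurisdiction}.granicus.com/ViewPublisher.php?view_id={id}
GRANICUS_URL_RE = re.compile(
    r"^https?://([a-z0-9-]+)\.granicus\.com/ViewPublisher\.php\?view_id=(\d+)$",
    re.IGNORECASE,
)


def build_granicus_url(jurisdiction: str, view_id: int) -> str:
    """Build a portal URL from a jurisdiction subdomain and view ID."""
    return f"https://{jurisdiction}.granicus.com/ViewPublisher.php?view_id={view_id}"


def parse_granicus_url(url: str) -> Optional[tuple[str, int]]:
    """Return (jurisdiction, view_id) if the URL fits the pattern, else None."""
    match = GRANICUS_URL_RE.match(url.strip())
    if match is None:
        return None
    return match.group(1).lower(), int(match.group(2))
```

The parsed jurisdiction subdomain could then be matched against OpenStates jurisdiction names, though that mapping would need manual verification.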
---
## License Compatibility
### Our License
- **Code:** Open source (check root LICENSE file)
- **Data:** Citations required (see CITATIONS.md)
### OpenStates License
- **Code:** BSD-style license (permissive)
- **Data:** Public domain (bulk downloads)
- **Content:** Varies by state (some restrictions)
✅ **Compatible:** Our code contributions would be compatible with their license.
---
## Required Reading Before Contributing
Before submitting any code to OpenStates, review:
1. **Local Database Setup**: https://docs.openstates.org/contributing/local-database/
   - How to set up PostgreSQL locally
   - How to run scrapers in development
   - How to test data quality
2. **Scraper Development Guide**: https://docs.openstates.org/contributing/scrapers/
   - Scraper patterns used
   - Data validation requirements
   - Testing procedures
3. **Code of Conduct**: https://docs.openstates.org/code-of-conduct/
   - Community standards
   - Communication guidelines
   - Enforcement policies
4. **Schema Documentation**: https://github.com/openstates/people/blob/master/schema.md
   - Data model structure
   - Required vs optional fields
   - Relationship patterns
---
## Next Steps
### For This Project
1. ✅ **Citations Added** - OpenStates properly credited
2. ✅ **Code of Conduct** - Aligned with their standards
3. ✅ **Local Database** - PostgreSQL dumps integrated
4. ⏳ **Test Contributions** - Validate our code works with their schema
### For Community Contribution
1. **Identify Target State** - Choose state needing video sources
2. **Test Locally** - Set up OpenStates dev environment
3. **Develop Scraper** - Add video discovery code
4. **Submit PR** - Follow their contribution guidelines
5. **Iterate** - Respond to code review feedback
---
## 💡 Benefits of Contributing
**For OpenStates:**
- Enhanced video source coverage
- Better meeting-to-bill linkage
- More comprehensive legislative tracking
**For Our Project:**
- Upstream improvements benefit us
- Community recognition
- Better data quality for all users
**For Civic Tech:**
- Shared infrastructure improvements
- Reduced duplication of effort
- Stronger open-source ecosystem
---
## Questions?
- **OpenStates Discord**: https://discord.gg/openstates
- **GitHub Discussions**: https://github.com/openstates/openstates-scrapers/discussions
- **Email**: Open States team (check repository for contact info)
---
**Last Updated:** April 29, 2026
**Maintained By:** Open Navigator team