---
sidebar_position: 5
---

# OpenStates Integration & Contribution Opportunities

This document outlines our integration with OpenStates/Plural Policy and potential opportunities to contribute code back to the open-source community.

## 📚 References Added to Citations

We have properly cited and referenced the following OpenStates resources:

### In Root Citations (CITATIONS.md)
- ✅ **OpenStates/Plural Policy** main site
- ✅ **Bulk data downloads** (CSV, JSON, PostgreSQL)
- ✅ **Scrapers repository**: https://github.com/openstates/openstates-scrapers
- ✅ **Local database documentation**: https://docs.openstates.org/contributing/local-database/
- ✅ **Code of Conduct**: https://docs.openstates.org/code-of-conduct/
- ✅ **Schema documentation**: https://github.com/openstates/people/blob/master/schema.md

### In Website Documentation (website/docs/data-sources/citations.md)
- ✅ Comprehensive OpenStates/Plural Policy section
- ✅ PostgreSQL dump setup instructions
- ✅ Contribution guidelines
- ✅ BibTeX citation

### In Contributing Guide (CONTRIBUTING.md)
- ✅ Code of Conduct alignment with OpenStates
- ✅ Upstream contribution guidelines
- ✅ Testing requirements for scraper contributions

---

## 🔄 Our Current OpenStates Integration

### Data We Use

1. **PostgreSQL Monthly Dumps** (9.8 GB+)
   - Complete legislative database for all 50 states
   - Script: `scripts/bulk_legislative_download.py --postgres --month 2026-04`
   - Setup: `scripts/setup_openstates_db.sh`
   - Use: Local SQL queries, no API rate limits

2. **CSV/JSON Session Data**
   - Per-state legislative sessions
   - Bill text, votes, sponsors
   - Committee assignments

3. **Video Sources**
   - YouTube channel URLs from `sources` field
   - Granicus video portal links
   - Meeting archive locations

### Our Implementation

**File:** `discovery/openstates_sources.py`
- Fetches jurisdiction data via API
- Extracts video sources (YouTube, Vimeo, Granicus)
- Maps to our jurisdiction database

**File:** `scripts/bulk_legislative_download.py`
- Downloads PostgreSQL dumps
- Downloads CSV/JSON session data
- Handles all 50 states + DC + PR

---

## 🀝 Code We Could Contribute to OpenStates Scrapers

The [openstates-scrapers](https://github.com/openstates/openstates-scrapers) repository collects legislative data using Open States' own scraper tooling, run through the `os-update` command. We have complementary code that could enhance their project:

### 1. Video Source Discovery Patterns

**Our Code:** `discovery/youtube_channel_discovery.py`

**What it does:**
- Finds **all** YouTube channels for a jurisdiction (not just first match)
- Scrapes homepages for YouTube links
- Uses YouTube Data API for verification
- Discovers Vimeo and Granicus portals

**Potential Contribution:**
- Add video source extraction to OpenStates scrapers
- Enhance `sources` field with verified YouTube channels
- Automate Granicus portal discovery

**Example Pattern:**
```python
# Our code finds these patterns
patterns = {
    "youtube_channel": r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)",
    "vimeo_channel": r"vimeo\.com/([\w-]+)",
    "granicus": r"granicus\.com/([^/]+)",
}
```
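
As a minimal sketch of how patterns like these might be applied, the function below scans a page's HTML and keeps every distinct match per platform instead of stopping at the first one. The name `find_video_sources` and the sample HTML are hypothetical and are not taken from `discovery/youtube_channel_discovery.py`.

```python
import re

# The same patterns as above, compiled once for reuse.
PATTERNS = {
    "youtube_channel": re.compile(r"youtube\.com/(?:c/|channel/|user/|@)([\w-]+)"),
    "vimeo_channel": re.compile(r"vimeo\.com/([\w-]+)"),
    "granicus": re.compile(r"granicus\.com/([^/]+)"),
}


def find_video_sources(html: str) -> dict[str, list[str]]:
    """Return every distinct identifier matched per platform, in page order."""
    found: dict[str, list[str]] = {}
    for name, pattern in PATTERNS.items():
        # dict.fromkeys removes duplicates while preserving first-seen order.
        matches = list(dict.fromkeys(pattern.findall(html)))
        if matches:
            found[name] = matches
    return found


# Hypothetical homepage fragment with a repeated link in the footer.
sample_html = """
<a href="https://www.youtube.com/@ExampleSenate">Senate</a>
<a href="https://www.youtube.com/channel/UC123-abc">House</a>
<a href="https://www.youtube.com/@ExampleSenate">Senate (footer)</a>
<a href="https://vimeo.com/examplecity">Archive</a>
"""
print(find_video_sources(sample_html))
```

Matches found this way are only candidates. Verifying them (for example through the YouTube Data API, as described above) remains a separate step.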

### 2. Legistar/Granicus Platform Detection

**Our Code:** `discovery/url_discovery_agent.py`

**What it does:**
- Identifies Legistar instances across cities
- Maps Granicus video portals
- Extracts meeting URLs and agendas

**Potential Contribution:**
- Enhance OpenStates scrapers with meeting video links
- Add Legistar meeting agenda extraction
- Contribute URL validation patterns

**Platform Patterns We Use:**
```python
platforms = {
    "granicus": ["granicus.com", "legistar.com"],
    "youtube": ["youtube.com", "youtu.be"],
    "vimeo": ["vimeo.com"],
}
```
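
One way a domain table like this could drive detection is sketched below. It compares the URL's host against each domain and accepts subdomains, so `alabama.granicus.com` matches while a lookalike host does not. The function name `detect_platform` is an assumption for illustration and may differ from what `discovery/url_discovery_agent.py` actually does.

```python
from typing import Optional
from urllib.parse import urlparse

PLATFORMS = {
    "granicus": ["granicus.com", "legistar.com"],
    "youtube": ["youtube.com", "youtu.be"],
    "vimeo": ["vimeo.com"],
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform whose domain matches the URL's host, else None."""
    host = (urlparse(url).hostname or "").lower()
    for platform, domains in PLATFORMS.items():
        for domain in domains:
            # Accept the bare domain or any subdomain of it, but not
            # unrelated hosts that merely end with the same letters.
            if host == domain or host.endswith("." + domain):
                return platform
    return None


print(detect_platform("https://alabama.granicus.com/ViewPublisher.php?view_id=6"))
print(detect_platform("https://notyoutube.com/watch"))
```

Matching on the parsed host rather than on a substring of the whole URL avoids false positives from query strings or paths that happen to contain a platform name.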

### 3. Meeting Archive Scraping

**Our Code:** `agents/scraper.py`

**What it does:**
- Scrapes PDF meeting minutes
- Extracts text from scanned documents (OCR)
- Parses meeting dates and types
- Handles multiple document formats

**Potential Contribution:**
- Add meeting minutes text extraction to OpenStates
- Enhance bill analysis with meeting context
- Link bills to meeting discussions
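
The date and type parsing step could look roughly like the sketch below, which assumes the text has already been extracted from the PDF (OCR is out of scope here). The function name `parse_meeting_header`, the list of meeting types, and the "Month D, YYYY" date format are illustrative assumptions, not the behaviour of `agents/scraper.py`.

```python
import re
from datetime import date
from typing import Optional

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]
MONTH_NUMBERS = {name: i for i, name in enumerate(MONTH_NAMES, start=1)}

# Matches dates written like "March 4, 2025" (comma optional).
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTH_NAMES) + r")\s+(\d{1,2}),?\s+(\d{4})\b",
    re.IGNORECASE,
)

# Hypothetical meeting types, checked in this order.
MEETING_TYPES = ("special", "regular", "work session", "public hearing")


def parse_meeting_header(text: str) -> tuple[Optional[date], Optional[str]]:
    """Return (meeting date, meeting type); either may be None if not found."""
    meeting_date = None
    match = DATE_RE.search(text)
    if match:
        month, day, year = match.groups()
        try:
            meeting_date = date(int(year), MONTH_NUMBERS[month.capitalize()], int(day))
        except ValueError:
            # Impossible calendar dates (e.g. February 30) are treated as missing.
            meeting_date = None
    lowered = text.lower()
    meeting_type = next((t for t in MEETING_TYPES if t in lowered), None)
    return meeting_date, meeting_type


print(parse_meeting_header("Minutes of the Regular Meeting\nMarch 4, 2025"))
```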

---

## 📝 How to Contribute to OpenStates Scrapers

The steps below follow their [local database documentation](https://docs.openstates.org/contributing/local-database/):

### 1. Set Up OpenStates Development Environment

```bash
# Clone the scrapers repository
git clone https://github.com/openstates/openstates-scrapers.git
cd openstates-scrapers

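# Install dependencies (assumes a requirements.txt is present; check the
# repository README for the current install method)
pip install -r requirements.txt

# Set up a local PostgreSQL database
createdb openstates

# Import schema if needed (path is illustrative; confirm it against the repository)
psql -d openstates -f schema/openstates.sql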
```

### 2. Test Your Scraper Locally

```bash
# Run a specific state scraper
os-update al --scrape --rpm 10

# Validate data
os-update al --scrape --validate
```

### 3. Follow Their Code of Conduct

All contributions must follow the [OpenStates Code of Conduct](https://docs.openstates.org/code-of-conduct/):
- Be respectful and professional
- Welcome diverse perspectives
- Focus on what's best for the community
- Show empathy towards other contributors

### 4. Submit Pull Request

```bash
# Create feature branch
git checkout -b feature/video-sources

# Make changes (add video discovery to a state scraper)
# Example: scrapers/al/videos.py

# Test thoroughly
os-update al --scrape --rpm 10

# Commit and push
git commit -m "Add video source discovery for Alabama legislature"
git push origin feature/video-sources

# Open PR on GitHub
```

---

## 🎯 Specific Contribution Ideas

### Priority 1: Add Video Sources to Scrapers

**Goal:** Enhance the `sources` field with verified video links

**States to Start With:**
- **Alabama** - Has YouTube channel, needs verification
- **California** - @CALegislature (well-documented)
- **Texas** - Multiple chambers on YouTube
- **New York** - Both Assembly and Senate channels

**Implementation:**
```python
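# In scrapers/al/__init__.py
# Illustrative sketch: BaseScraper stands in for the scraper base class,
# and the URLs below still need to be verified before submission.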
class AlabamaScraper(BaseScraper):
    def scrape_sources(self):
        """Add video sources for Alabama legislature."""
        return {
            "youtube": "https://www.youtube.com/@AlabamaLegislature",
            "granicus": "https://alabama.granicus.com/ViewPublisher.php?view_id=6",
        }
```

### Priority 2: Meeting Minutes Integration

**Goal:** Link bills to meeting discussions

**Use Case:**
- When bill HB123 is discussed in committee
- Link to YouTube timestamp of discussion
- Extract quotes from meeting minutes
- Connect legislators' comments to votes

**Implementation:**
```python
# Add meeting metadata to bill objects
bill.add_source(
    url="https://www.youtube.com/watch?v=xyz&t=1234s",
    note="Committee discussion at 20:34"
)
```
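
The URL and the human-readable note need to agree on the timestamp, so it can help to derive both from a single number of seconds. The helper below is a small sketch of that idea. `youtube_timestamp_source` is a hypothetical name, and the returned dictionary simply mirrors the `url` and `note` arguments used above.

```python
def youtube_timestamp_source(video_id: str, seconds: int) -> dict[str, str]:
    """Build a timestamped YouTube URL and a matching note from one offset."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    # Show hours only when the offset is at least one hour long.
    if hours:
        stamp = f"{hours}:{minutes:02d}:{secs:02d}"
    else:
        stamp = f"{minutes}:{secs:02d}"
    return {
        "url": f"https://www.youtube.com/watch?v={video_id}&t={seconds}s",
        "note": f"Committee discussion at {stamp}",
    }


source = youtube_timestamp_source("xyz", 1234)
print(source["url"])   # https://www.youtube.com/watch?v=xyz&t=1234s
print(source["note"])  # Committee discussion at 20:34
```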

### Priority 3: Granicus Portal Scraping

**Goal:** Automate discovery of Granicus video portals

**Pattern:**
- Many jurisdictions use Granicus for meeting videos
- URLs follow pattern: `{jurisdiction}.granicus.com/ViewPublisher.php?view_id={id}`
- Could automate discovery and link to OpenStates jurisdictions
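
A rough sketch of working with this URL pattern follows: one helper builds a portal URL from a jurisdiction slug and view ID, and another parses a URL back into those two parts. Both function names are hypothetical, and the sketch assumes portals follow the pattern exactly, which real sites may not.

```python
import re
from typing import Optional

# Mirrors the pattern {jurisdiction}.granicus.com/ViewPublisher.php?view_id={id}
GRANICUS_URL_RE = re.compile(
    r"^https?://([a-z0-9-]+)\.granicus\.com/ViewPublisher\.php\?view_id=(\d+)$",
    re.IGNORECASE,
)


def build_granicus_url(jurisdiction: str, view_id: int) -> str:
    """Build a portal URL from a jurisdiction slug and a view ID."""
    return f"https://{jurisdiction}.granicus.com/ViewPublisher.php?view_id={view_id}"


def parse_granicus_url(url: str) -> Optional[tuple[str, int]]:
    """Return (jurisdiction, view_id) if the URL fits the pattern, else None."""
    match = GRANICUS_URL_RE.match(url.strip())
    if match is None:
        return None
    return match.group(1).lower(), int(match.group(2))


url = build_granicus_url("alabama", 6)
print(url)
print(parse_granicus_url(url))
```

Parsed jurisdiction slugs could then be matched against OpenStates jurisdictions, although that mapping would still need manual review since slugs are chosen by each Granicus customer.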

---

## 🔒 License Compatibility

### Our License
- **Code:** Open source (check root LICENSE file)
- **Data:** Citations required (see CITATIONS.md)

### OpenStates License
- **Code:** GPL-3.0 for openstates-scrapers at the time of writing (confirm against the repository's LICENSE file)
- **Data:** Public domain (bulk downloads)
- **Content:** Varies by state (some restrictions)

✅ **Compatible:** Our code contributions should be compatible with their license, subject to checking both LICENSE files before submitting.

---

## 📚 Required Reading Before Contributing

Before submitting any code to OpenStates, review:

1. **Local Database Setup**: https://docs.openstates.org/contributing/local-database/
   - How to set up PostgreSQL locally
   - How to run scrapers in development
   - How to test data quality

2. **Scraper Development Guide**: https://docs.openstates.org/contributing/scrapers/
   - Scraper patterns used
   - Data validation requirements
   - Testing procedures

3. **Code of Conduct**: https://docs.openstates.org/code-of-conduct/
   - Community standards
   - Communication guidelines
   - Enforcement policies

4. **Schema Documentation**: https://github.com/openstates/people/blob/master/schema.md
   - Data model structure
   - Required vs optional fields
   - Relationship patterns

---

## 🚀 Next Steps

### For This Project

1. ✅ **Citations Added** - OpenStates properly credited
2. ✅ **Code of Conduct** - Aligned with their standards
3. ✅ **Local Database** - PostgreSQL dumps integrated
4. ⏳ **Test Contributions** - Validate our code works with their schema

### For Community Contribution

1. **Identify Target State** - Choose state needing video sources
2. **Test Locally** - Set up OpenStates dev environment
3. **Develop Scraper** - Add video discovery code
4. **Submit PR** - Follow their contribution guidelines
5. **Iterate** - Respond to code review feedback

---

## 💡 Benefits of Contributing

**For OpenStates:**
- Enhanced video source coverage
- Better meeting-to-bill linkage
- More comprehensive legislative tracking

**For Our Project:**
- Upstream improvements benefit us
- Community recognition
- Better data quality for all users

**For Civic Tech:**
- Shared infrastructure improvements
- Reduced duplication of effort
- Stronger open-source ecosystem

---

## 📞 Questions?

- **OpenStates Discord**: https://discord.gg/openstates
- **GitHub Discussions**: https://github.com/openstates/openstates-scrapers/discussions
- **Email**: Open States team (check repository for contact info)

---

**Last Updated:** April 29, 2026  
**Maintained By:** Open Navigator team