# Scale and Search Patterns: End-to-End Civic Tech Projects

This guide analyzes **6 civic tech projects** (five new, plus the already-integrated Council Data Project) focused on full-stack deployments, large-scale data aggregation, and public search portals. The new projects complement our existing integration (Civic Scraper, City Scrapers, CDP, Engagic, Councilmatic) with patterns for:

- 🤖 **AI summarization** (OpenTowns, MeetingBank)
- 🔍 **Multi-jurisdiction search** (CivicBand, LocalView)
- 🔔 **Keyword alerting** (OpenTowns)
- 📊 **Research-grade pipelines** (LocalView, MeetingBank)
- 🌍 **International adaptability** (OpenCouncil)

---

## 🎯 What's NEW vs. Our Existing Integration

| Pattern | Already Have | NEW from These Projects |
|---------|--------------|-------------------------|
| Platform detection | βœ… Civic Scraper | - |
| Event schema | βœ… City Scrapers | - |
| Video ingestion | βœ… CDP | βœ… LocalView scale patterns |
| Matter tracking | βœ… Engagic | - |
| Search UX | βœ… Councilmatic | βœ… CivicBand cross-jurisdiction |
| **AI Summarization** | ❌ | βœ… **OpenTowns, MeetingBank** |
| **Keyword Alerts** | ❌ | βœ… **OpenTowns** |
| **Scale (1,000+ jurisdictions)** | ⚠️ Partial | βœ… **CivicBand, LocalView** |
| **International patterns** | ❌ | βœ… **OpenCouncil** |

---

## 📚 Project Analysis

### 1. Council Data Project (CDP) ⭐ Already Integrated

**Status**: Already documented in `INTEGRATION_GUIDE.md`

**Key patterns we already use**:
- Video transcript ingestion
- Searchable transcript storage
- Event indexing pipeline

**See**: `docs/INTEGRATION_GUIDE.md` Section 4

---

### 2. OpenTowns 🆕 AI Summarization Pioneer

**Website**: https://opentowns.org  
**License**: Open civic-tech (check specific repo)  
**Focus**: Small towns, AI-generated summaries, keyword alerts

#### 🔥 What to Adopt

**A. AI Summarization Pattern**
```python
# They generate readable summaries from raw transcripts/PDFs
# Pattern: transcript → summary → key decisions

from openai import AsyncOpenAI
from models.meeting_event import MeetingEvent

async def generate_meeting_summary(event: MeetingEvent, transcript: str) -> dict:
    """
    OpenTowns pattern: Generate human-readable meeting summaries.

    Returns:
        {
            'executive_summary': str,      # 2-3 sentences
            'key_decisions': list[str],     # Bullet points
            'health_policy_items': list[str],  # Filtered for oral health
            'next_actions': list[str],      # Follow-up items
            'raw_summary': str              # Unparsed model output
        }
    """
    # Async client, so awaiting the request does not block the event loop
    client = AsyncOpenAI()

    # Only the first 10k characters of the transcript are sent,
    # which bounds prompt size and cost
    prompt = f"""
    Summarize this local government meeting for public understanding.

    Meeting: {event.title}
    Date: {event.start.strftime('%B %d, %Y')}
    Transcript: {transcript[:10000]}

    Provide:
    1. Executive summary (2-3 sentences)
    2. Key decisions made (bullet points)
    3. Health policy items (if any)
    4. Next actions/follow-ups

    Focus on: What decisions were made? What happens next?
    """

    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # Cost-effective for summaries
        messages=[
            {"role": "system", "content": "You are a civic engagement assistant helping residents understand local government."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3  # Low temperature for more consistent output
    )

    # Parse response into structured format
    # (extract_section / extract_bullets are parsing helpers that must be supplied)
    summary_text = response.choices[0].message.content
    
    return {
        'executive_summary': extract_section(summary_text, 'Executive summary'),
        'key_decisions': extract_bullets(summary_text, 'Key decisions'),
        'health_policy_items': extract_bullets(summary_text, 'Health policy'),
        'next_actions': extract_bullets(summary_text, 'Next actions'),
        'raw_summary': summary_text
    }
```
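
The summarization block above calls `extract_section` and `extract_bullets` without defining them. The following is a minimal sketch of what such helpers might look like, assuming the model echoes the four headings requested in the prompt. The `KNOWN_HEADINGS` tuple and the underscore-prefixed helpers are hypothetical names introduced here, and real model output varies enough that a stricter approach (for example, asking for JSON) may be more reliable.

```python
import re
from typing import List

# Headings requested in the prompt (assumption: the model repeats them)
KNOWN_HEADINGS = ("executive summary", "key decisions", "health policy", "next actions")

# Leading list marker such as "1.", "2)", "-", "*", or "•"
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")

def _normalize(line: str) -> str:
    """Strip list markers, markdown emphasis, and a trailing colon; lowercase."""
    line = _MARKER.sub("", line.strip())
    return line.strip("#* ").rstrip(":").strip().lower()

def _is_heading(line: str) -> bool:
    return _normalize(line).startswith(KNOWN_HEADINGS)

def _section_lines(text: str, heading: str) -> List[str]:
    """Collect the non-empty lines that follow `heading` up to the next known heading."""
    wanted = heading.lower()
    collected: List[str] = []
    inside = False
    for line in text.splitlines():
        if _is_heading(line):
            inside = _normalize(line).startswith(wanted)
            # Keep any text that follows "Heading:" on the same line
            rest = line.partition(":")[2].strip("* ").strip()
            if inside and rest:
                collected.append(rest)
            continue
        if inside and line.strip():
            collected.append(line.strip())
    return collected

def extract_section(text: str, heading: str) -> str:
    """Return the section body as a single string ('' if the heading is absent)."""
    return " ".join(_MARKER.sub("", line) for line in _section_lines(text, heading))

def extract_bullets(text: str, heading: str) -> List[str]:
    """Return the section body as a list of bullet strings without their markers."""
    return [_MARKER.sub("", line).strip() for line in _section_lines(text, heading)]
```

One known limitation of this sketch: a bullet whose text happens to begin with one of the heading phrases would be mistaken for a heading.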

**B. Keyword Alert System**
```python
# OpenTowns sends alerts when keywords appear in meetings
# Pattern: Watch list → match detection → user notification

import re
from typing import List, Dict

from models.meeting_event import MeetingEvent

class KeywordAlertSystem:
    """
    OpenTowns pattern: Alert users when keywords appear in meetings.
    """
    
    # Oral health keyword categories
    KEYWORD_CATEGORIES = {
        'fluoridation': [
            'fluoride', 'fluoridation', 'water treatment',
            'community water fluoridation', 'CWF'
        ],
        'dental_access': [
            'dental', 'dentist', 'oral health', 'teeth',
            'medicaid dental', 'dental clinic'
        ],
        'public_health': [
            'health department', 'public health', 'CDC',
            'preventive care', 'health equity'
        ]
    }
    
    def detect_keywords(self, text: str) -> Dict[str, List[str]]:
        """
        Find all matching keywords in text.
        
        Returns: {'fluoridation': ['fluoride', 'CWF'], ...}
        """
        text_lower = text.lower()
        matches = {}
        
        for category, keywords in self.KEYWORD_CATEGORIES.items():
            found = []
            for keyword in keywords:
                # Word boundary matching
                pattern = r'\b' + re.escape(keyword.lower()) + r'\b'
                if re.search(pattern, text_lower):
                    found.append(keyword)
            
            if found:
                matches[category] = found
        
        return matches
    
    def generate_alert(self, event: MeetingEvent, matches: Dict[str, List[str]]) -> dict:
        """
        Create alert notification for users.
        """
        return {
            'alert_type': 'keyword_match',
            'jurisdiction': f"{event.jurisdiction_name}, {event.state_code}",
            'meeting_title': event.title,
            'meeting_date': event.start.isoformat(),
            'categories_matched': list(matches.keys()),
            'keywords_found': [kw for kws in matches.values() for kw in kws],
            'meeting_url': event.source,
            'priority': 'high' if 'fluoridation' in matches else 'medium'
        }
```
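
The class above matches one global keyword list; the "watch list → user notification" step still needs a way to decide who receives each alert. The sketch below shows one possible routing step. `Subscription` and `recipients_for` are hypothetical names (not taken from OpenTowns or this codebase), and the alert dictionary is assumed to have the shape returned by `generate_alert`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Subscription:
    """Hypothetical per-user watch list."""
    email: str
    categories: Set[str]                            # e.g. {'fluoridation'}
    states: Set[str] = field(default_factory=set)   # empty set means all states

def recipients_for(alert: Dict, subscriptions: List[Subscription]) -> List[str]:
    """Return the emails whose watch list overlaps the alert's categories and state."""
    matched = set(alert["categories_matched"])
    # generate_alert formats jurisdiction as "Name, ST"; take the trailing state code
    state = alert["jurisdiction"].rsplit(", ", 1)[-1]
    return [
        sub.email
        for sub in subscriptions
        if sub.categories & matched and (not sub.states or state in sub.states)
    ]
```

Delivery (email or webhook) would sit behind this function, so the matching logic can be tested without sending anything.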

**Implementation Priority**: 🔥 **HIGH** - Summaries make data usable for advocates

---

### 3. LocalView 🆕 Research-Grade Scale

**Website**: https://www.localview.net  
**Project page**: https://mellonurbanism.harvard.edu/localview  
**License**: Open-source data pipeline  
**Scale**: Nationwide coverage, largest public dataset

#### 🔥 What to Adopt

**A. Scale Architecture Patterns**

LocalView handles **thousands of jurisdictions** with:
1. **Batch processing** (not real-time)
2. **Distributed storage** (videos + transcripts)
3. **Quality metrics** (completeness scoring)

```python
# LocalView pattern: Process jurisdictions in batches with quality tracking

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class JurisdictionQuality:
    """
    LocalView pattern: Track data quality per jurisdiction.
    """
    jurisdiction_name: str
    state_code: str
    
    # Completeness metrics
    total_meetings_expected: int  # Based on calendar
    total_meetings_found: int
    meetings_with_agendas: int
    meetings_with_minutes: int
    meetings_with_videos: int
    meetings_with_transcripts: int
    
    # Freshness
    last_scraped: datetime
    last_meeting_found: Optional[datetime]
    scraping_frequency: str  # 'daily', 'weekly', 'monthly'
    
    # Health metrics
    consecutive_failures: int
    last_success: Optional[datetime]
    
    @property
    def completeness_score(self) -> float:
        """
        Overall data quality score (0-100).
        """
        if self.total_meetings_expected == 0:
            return 0.0
        
        # Cap at 1.0 in case more meetings are found than the calendar predicted
        found_rate = min(self.total_meetings_found / self.total_meetings_expected, 1.0)
        agenda_rate = self.meetings_with_agendas / max(self.total_meetings_found, 1)
        minutes_rate = self.meetings_with_minutes / max(self.total_meetings_found, 1)
        
        # Weighted average: rates are 0-1 and the weights sum to 100,
        # so the result is already on a 0-100 scale
        score = (
            found_rate * 40 +      # 40%: Finding meetings
            agenda_rate * 30 +      # 30%: Having agendas
            minutes_rate * 30       # 30%: Having minutes
        )
        
        return min(score, 100.0)
    
    @property
    def health_status(self) -> str:
        """
        Scraper health: healthy, degraded, failed
        """
        if self.consecutive_failures >= 5:
            return 'failed'
        elif self.consecutive_failures >= 2:
            return 'degraded'
        else:
            return 'healthy'
```
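
As a worked example of the 40/30/30 weighting described above, the sketch below repeats the calculation as a standalone function (the function name and the sample counts are illustrative, not LocalView data).

```python
def completeness_score(expected: int, found: int, with_agendas: int, with_minutes: int) -> float:
    """Standalone version of the 40/30/30 weighting, returning a 0-100 score."""
    if expected == 0:
        return 0.0
    found_rate = min(found / expected, 1.0)      # share of expected meetings located
    agenda_rate = with_agendas / max(found, 1)   # share of found meetings with an agenda
    minutes_rate = with_minutes / max(found, 1)  # share of found meetings with minutes
    return found_rate * 40 + agenda_rate * 30 + minutes_rate * 30

# 24 meetings expected, 18 found, 15 with agendas, 9 with minutes:
# 0.75 * 40 + (15/18) * 30 + 0.5 * 30 = 30 + 25 + 15 = 70
score = completeness_score(24, 18, 15, 9)
```

Because video and transcript counts carry no weight here, a jurisdiction with agendas and minutes but no recordings can still score 100.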

**B. Batch Processing Strategy**
```python
# LocalView processes in batches, not all-at-once

from pyspark.sql import SparkSession
from typing import Iterator

def process_jurisdictions_in_batches(
    spark: SparkSession,
    batch_size: int = 100,
    priority_filter: str = 'high'
) -> Iterator[dict]:
    """
    LocalView pattern: Process large numbers of jurisdictions efficiently.
    
    Strategy:
    1. Load high-priority jurisdictions first
    2. Process in batches to manage memory
    3. Track quality metrics per batch
    4. Resume from failures
    """
    # Load targets from Gold layer
    targets_df = spark.read.format("delta").load("data/delta/gold/scraping_targets")
    
    # Filter and sort (a column expression avoids interpolating the value into SQL text)
    priority_targets = targets_df \
        .filter(targets_df["priority_tier"] == priority_filter) \
        .orderBy("priority_score", ascending=False)
    
    total_targets = priority_targets.count()
    
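    # Process in batches: skip the rows already handled, then take the next batch.
    # offset() must come before limit(); DataFrame.offset requires PySpark 3.4+
    for offset in range(0, total_targets, batch_size):
        batch_df = priority_targets.offset(offset).limit(batch_size)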
        
        batch_results = {
            'batch_number': offset // batch_size + 1,
            'batch_size': batch_size,
            'jurisdictions_processed': 0,
            'meetings_found': 0,
            'errors': []
        }
        
        for row in batch_df.collect():
            try:
                # Scrape jurisdiction
                meetings = scrape_jurisdiction(row['url'], row['platform'])
                batch_results['jurisdictions_processed'] += 1
                batch_results['meetings_found'] += len(meetings)
                
            except Exception as e:
                batch_results['errors'].append({
                    'jurisdiction': row['jurisdiction_name'],
                    'error': str(e)
                })
        
        yield batch_results
```
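
Step 4 of the strategy ("Resume from failures") is not shown in the block above. One simple way to approximate it, sketched here under the assumption that each jurisdiction has a stable string ID, is a checkpoint file listing the IDs already processed. The file format and function names are hypothetical and not taken from LocalView.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List, Set

def load_completed(checkpoint: Path) -> Set[str]:
    """Read the set of jurisdiction IDs already processed (empty if no checkpoint yet)."""
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text()))

def pending_batches(ids: Iterable[str], checkpoint: Path, batch_size: int = 100) -> Iterator[List[str]]:
    """Yield batches of IDs that are not yet recorded in the checkpoint."""
    done = load_completed(checkpoint)
    todo = [i for i in ids if i not in done]
    for start in range(0, len(todo), batch_size):
        yield todo[start:start + batch_size]

def mark_completed(checkpoint: Path, batch: Iterable[str]) -> None:
    """Add a finished batch to the checkpoint so a restart skips it."""
    done = load_completed(checkpoint) | set(batch)
    checkpoint.write_text(json.dumps(sorted(done)))
```

A run would call `mark_completed` after each batch finishes; after a crash, `pending_batches` picks up with whatever is left.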

**Implementation Priority**: 🔥 **HIGH** - Essential for scaling to 32,333 municipalities

---

### 4. MeetingBank 🆕 Summarization Research

**Website**: https://meetingbank.github.io  
**GitHub**: Linked from site  
**License**: Open dataset  
**Focus**: 6 cities, high-quality summarization benchmark

#### 🔥 What to Adopt

**A. Summarization Quality Benchmarks**

MeetingBank is a benchmark dataset used in academic summarization research. It provides:
- **Gold-standard human summaries** (for validation)
- **Multiple summary lengths** (short, medium, long)
- **Evaluation metrics** (ROUGE, BERTScore)

```python
# MeetingBank pattern: Validate AI summaries against quality benchmarks

from typing import Any, Dict

class SummaryQualityValidator:
    """
    MeetingBank pattern: Ensure AI summaries meet quality standards.
    """
    
    # Quality thresholds from academic research
    MIN_ROUGE_L = 0.25  # ROUGE-L F1 score (needs a reference summary; not used by validate_summary)
    MIN_LENGTH_RATIO = 0.05  # Summary should be 5-20% of original
    MAX_LENGTH_RATIO = 0.20
    
    def validate_summary(self, original: str, summary: str) -> Dict[str, Any]:
        """
        Check if summary meets quality standards.
        """
        # Length checks
        orig_words = len(original.split())
        summ_words = len(summary.split())
        length_ratio = summ_words / orig_words if orig_words > 0 else 0
        
        # Basic quality checks
        checks = {
            'length_appropriate': self.MIN_LENGTH_RATIO <= length_ratio <= self.MAX_LENGTH_RATIO,
            'has_key_terms': self._check_key_terms(original, summary),
            'no_repetition': self._check_repetition(summary),
            'proper_structure': self._check_structure(summary),
        }
        
        return {
            'passes_validation': all(checks.values()),
            'checks': checks,
            'length_ratio': length_ratio,
            'word_count': summ_words,
            'quality_score': sum(checks.values()) / len(checks)
        }
    
    def _check_key_terms(self, original: str, summary: str) -> bool:
        """
        Ensure summary includes key terms from original.
        """
        # Extract important terms (simplified - use TF-IDF in production)
        orig_words = set(original.lower().split())
        summ_words = set(summary.lower().split())
        if not orig_words:
            return False  # Nothing to compare against
        
        # At least 30% of the original's unique terms must appear in the summary
        overlap = len(orig_words & summ_words) / len(orig_words)
        return overlap >= 0.30
    
    def _check_repetition(self, summary: str) -> bool:
        """
        Check for excessive repetition (indicates poor quality).
        """
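        # Drop empty fragments (such as the one after the final period) before comparing
        sentences = [s.strip() for s in summary.split('.') if s.strip()]
        unique_ratio = len(set(sentences)) / len(sentences) if sentences else 0
        return unique_ratio >= 0.80  # At least 80% unique sentences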
    
    def _check_structure(self, summary: str) -> bool:
        """
        Check for proper summary structure.
        """
        # Should have multiple sentences
        sentences = [s.strip() for s in summary.split('.') if s.strip()]
        return len(sentences) >= 2 and len(sentences) <= 10
```
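
The validator defines `MIN_ROUGE_L` but never computes ROUGE-L. When a human-written reference summary is available, the metric can be approximated as below. This is a simplified sketch: it tokenizes on whitespace and applies no stemming, so its scores will not exactly match those reported by standard ROUGE tooling or in published results.

```python
from typing import List

def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)   # share of candidate tokens in the LCS
    recall = lcs / len(ref)       # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

For example, with reference "the council approved the budget" and candidate "council approved budget", the LCS has 3 tokens, so precision is 3/3, recall is 3/5, and F1 is 0.75.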

**Implementation Priority**: 🟡 **MEDIUM** - Important for quality, but MVP can use basic summaries

---

### 5. CivicBand 🆕 Multi-Jurisdiction Search

**Website**: https://civic.band  
**GitHub**: Linked from site (Raft Foundation)  
**Scale**: 1,000+ municipalities  
**Focus**: Google-like search across jurisdictions

#### 🔥 What to Adopt

**A. Cross-Jurisdiction Search Architecture**

CivicBand lets users search "fluoridation" and get results from **all municipalities** at once.

```python
# CivicBand pattern: Federated search across jurisdictions

from elasticsearch import Elasticsearch  # Or Meilisearch for open-source
from typing import List, Dict
from models.meeting_event import MeetingEvent

class CrossJurisdictionSearch:
    """
    CivicBand pattern: Search meetings across all jurisdictions.
    """
    
    def __init__(self):
        # Use Meilisearch (open-source) or Elasticsearch
        self.es = Elasticsearch(['http://localhost:9200'])
        self.index_name = 'meeting_events'
    
    def index_meeting(self, event: MeetingEvent):
        """
        Add meeting to search index.
        """
        doc = {
            'id': event.id,
            'title': event.title,
            'description': event.description,
            'jurisdiction': event.jurisdiction_name,
            'state': event.state_code,
            'date': event.start.isoformat(),
            'full_text': self._build_searchable_text(event),
            'agenda_url': next((link.href for link in event.links if 'agenda' in link.title.lower()), None),
            'oral_health_relevant': event.oral_health_relevant,
            'keywords': event.keywords_found
        }
        
        self.es.index(index=self.index_name, id=event.id, document=doc)
    
    def search(
        self,
        query: str,
        states: List[str] = None,
        date_range: tuple = None,
        oral_health_only: bool = False
    ) -> List[Dict]:
        """
        Search across all jurisdictions.
        
        Example:
            search("fluoridation", states=['AL', 'GA'], oral_health_only=True)
        """
        must_clauses = [
            {"multi_match": {
                "query": query,
                "fields": ["title^3", "description^2", "full_text"],  # Boost title matches
                "type": "best_fields"
            }}
        ]
        
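        # Exact-match restrictions go in "filter": they narrow results without
        # affecting relevance scores
        filter_clauses = []
        
        # Filter by state (keyword sub-field, matching the facet query;
        # assumes the default dynamic mapping)
        if states:
            filter_clauses.append({"terms": {"state.keyword": states}})
        
        # Filter by date range
        if date_range:
            filter_clauses.append({
                "range": {"date": {"gte": date_range[0], "lte": date_range[1]}}
            })
        
        # Filter oral health only
        if oral_health_only:
            filter_clauses.append({"term": {"oral_health_relevant": True}})
        
        search_query = {
            "query": {"bool": {"must": must_clauses, "filter": filter_clauses}},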
            "size": 100,
            "highlight": {
                "fields": {
                    "title": {},
                    "description": {},
                    "full_text": {"fragment_size": 150}
                }
            },
            "sort": [
                {"_score": "desc"},
                {"date": "desc"}
            ]
        }
        
        results = self.es.search(index=self.index_name, body=search_query)
        
        return [{
            'jurisdiction': hit['_source']['jurisdiction'],
            'state': hit['_source']['state'],
            'title': hit['_source']['title'],
            'date': hit['_source']['date'],
            'snippet': hit.get('highlight', {}).get('full_text', [''])[0],
            'url': hit['_source']['agenda_url'],
            'relevance_score': hit['_score']
        } for hit in results['hits']['hits']]
    
    def _build_searchable_text(self, event: MeetingEvent) -> str:
        """
        Combine all text fields for indexing.
        """
        parts = [
            event.title or '',
            event.description or '',
            ' '.join(event.keywords_found),
            ' '.join(link.title for link in event.links)
        ]
        return ' '.join(parts)
```

**B. Jurisdiction Faceting**
```python
# CivicBand shows result counts by jurisdiction

# Method of CrossJurisdictionSearch: it relies on self.es and self.index_name
def get_search_facets(self, query: str) -> Dict[str, int]:
    """
    Show how many results per jurisdiction.
    
    Example output:
        {
            'Birmingham, AL': 12,
            'Atlanta, GA': 8,
            'Montgomery, AL': 5
        }
    """
    search_query = {
        "query": {"multi_match": {"query": query, "fields": ["title", "full_text"]}},
        "size": 0,  # We only want aggregations
        "aggs": {
            "by_jurisdiction": {
                "terms": {
                    "field": "jurisdiction.keyword",
                    "size": 50  # Top 50 jurisdictions
                },
                "aggs": {
                    "by_state": {
                        "terms": {"field": "state.keyword"}
                    }
                }
            }
        }
    }
    
    results = self.es.search(index=self.index_name, body=search_query)
    
    facets = {}
    for bucket in results['aggregations']['by_jurisdiction']['buckets']:
        jurisdiction = bucket['key']
        # One entry per state, so same-named places in different states stay separate
        for state_bucket in bucket['by_state']['buckets']:
            facets[f"{jurisdiction}, {state_bucket['key']}"] = state_bucket['doc_count']
    
    return facets
```
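
For small result sets, or for unit tests that run without a search cluster, the same per-jurisdiction counts can be derived client-side from the list that `search()` returns. This is a sketch with a hypothetical helper name (`facet_hits`), and it only counts the hits actually returned (at most 100 with the `size` used above), so the server-side aggregation remains the accurate source for large queries.

```python
from collections import Counter
from typing import Dict, List

def facet_hits(hits: List[Dict]) -> Dict[str, int]:
    """Count hits per "Jurisdiction, ST" label, most frequent first."""
    counts = Counter(f"{hit['jurisdiction']}, {hit['state']}" for hit in hits)
    return dict(counts.most_common())
```

Keying on both name and state keeps, for instance, a Springfield in one state separate from a Springfield in another.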

**Implementation Priority**: 🟡 **MEDIUM** - Valuable for end-users, but scraping comes first

---

### 6. OpenCouncil 🆕 International Adaptability

**Website**: https://opencouncil.gr  
**GitHub**: https://github.com/schemalabz/opencouncil  
**License**: Open-source  
**Focus**: Greek councils, but adaptable to U.S.

#### 🔥 What to Adopt

**A. Internationalization Patterns**

OpenCouncil works in Greece (different government structure). This teaches us:
- **Flexible schema** (not hardcoded to U.S. structures)
- **Configurable jurisdiction types** (councils, boards, commissions)
- **Multi-language support** (not needed now, but good architecture)

```python
# OpenCouncil pattern: Flexible jurisdiction configuration

from enum import Enum
from dataclasses import dataclass
from typing import List, Optional

class GovernmentLevel(Enum):
    """
    OpenCouncil pattern: Support multiple government structures.
    """
    MUNICIPAL = "municipal"          # City/town councils
    COUNTY = "county"                # County boards
    TOWNSHIP = "township"            # Township boards
    SCHOOL_DISTRICT = "school"       # School boards
    SPECIAL_DISTRICT = "special"     # Water, fire, etc.
    STATE = "state"                  # State agencies (future)

@dataclass
class JurisdictionConfig:
    """
    OpenCouncil pattern: Configure each jurisdiction's unique structure.
    """
    jurisdiction_name: str
    government_level: GovernmentLevel
    
    # Meeting schedule
    typical_meeting_frequency: str  # 'weekly', 'biweekly', 'monthly'
    typical_meeting_days: List[str]  # ['Monday', 'Thursday']
    typical_meeting_time: str  # '18:00'
    
    # Website structure
    calendar_url: Optional[str]
    agenda_url_pattern: Optional[str]  # Template: "https://example.gov/agenda-{date}"
    minutes_url_pattern: Optional[str]
    
    # Legislative bodies
    bodies: List[str]  # ['City Council', 'Planning Commission', 'Board of Health']
    
    # Custom fields
    metadata: dict  # For jurisdiction-specific data

# Example: Configure Birmingham, AL
BIRMINGHAM_CONFIG = JurisdictionConfig(
    jurisdiction_name="Birmingham",
    government_level=GovernmentLevel.MUNICIPAL,
    typical_meeting_frequency='biweekly',
    typical_meeting_days=['Tuesday'],
    typical_meeting_time='18:00',
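    calendar_url="https://birminghamal.gov/council/meetings",
    agenda_url_pattern=None,   # Required field with no default; pattern not yet known
    minutes_url_pattern=None,  # Required field with no default; pattern not yet known
    bodies=['City Council', 'Board of Health', 'Planning Commission'],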
    metadata={'population': 200733, 'oral_health_priority': 'high'}
)
```
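
One use for the schedule fields is estimating how many meetings a jurisdiction should have produced, which is the `total_meetings_expected` input to the LocalView-style completeness score. The sketch below is a rough estimate under stated assumptions: the `MEETINGS_PER_YEAR` table and the function name are hypothetical, every body is assumed to meet at the same frequency, and holidays and recesses are ignored.

```python
from datetime import date

# Approximate meetings per year for each frequency label used in JurisdictionConfig
MEETINGS_PER_YEAR = {"weekly": 52, "biweekly": 26, "monthly": 12}

def expected_meetings(frequency: str, start: date, end: date, bodies: int = 1) -> int:
    """Estimate how many meetings should exist between start and end (inclusive)."""
    if end < start:
        return 0
    days = (end - start).days + 1
    per_year = MEETINGS_PER_YEAR[frequency]  # KeyError for an unknown label
    return round(days / 365 * per_year) * bodies

# A biweekly council over the first half of 2024 (182 days):
# 182 / 365 * 26 is about 12.96, which rounds to 13
first_half = expected_meetings("biweekly", date(2024, 1, 1), date(2024, 6, 30))
```

A jurisdiction whose calendar is actually scraped should use the real count instead; this estimate is only a fallback.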

**Implementation Priority**: 🟢 **LOW** - Good architecture, but not urgent

---

## 🎯 Implementation Roadmap

### Phase 1: AI Summarization (OpenTowns pattern) πŸ”₯
**Priority**: HIGH  
**Timeline**: 1-2 weeks  
**Depends on**: Existing OpenAI integration

```python
# TODO: Implement in extraction/summarizer.py
- [ ] Generate executive summaries from meeting transcripts
- [ ] Extract key decisions as bullet points
- [ ] Identify health policy items
- [ ] Add quality validation (MeetingBank patterns)
```

### Phase 2: Keyword Alerts (OpenTowns pattern) 🔥
**Priority**: HIGH  
**Timeline**: 1 week  
**Depends on**: Meeting data ingestion

**TODO**: Implement in `alerts/keyword_monitor.py`

- [ ] Define oral health keyword categories
- [ ] Pattern matching with word boundaries
- [ ] Generate alerts for users
- [ ] Email/webhook notification system

### Phase 3: Scale Architecture (LocalView pattern) 🔥
**Priority**: HIGH  
**Timeline**: 2 weeks  
**Depends on**: Platform scrapers

**TODO**: Implement in `discovery/batch_processor.py`

- [ ] Quality metrics per jurisdiction
- [ ] Batch processing (100 at a time)
- [ ] Failure tracking and retry
- [ ] Completeness scoring

### Phase 4: Multi-Jurisdiction Search (CivicBand pattern) 🟡
**Priority**: MEDIUM  
**Timeline**: 2-3 weeks  
**Depends on**: Significant meeting data

**TODO**: Implement in `search/federated_search.py`

- [ ] Set up Elasticsearch or Meilisearch
- [ ] Index all meetings
- [ ] Cross-jurisdiction search API
- [ ] Jurisdiction faceting

### Phase 5: Quality Validation (MeetingBank pattern) 🟡
**Priority**: MEDIUM  
**Timeline**: 1 week  
**Depends on**: AI summarization

**TODO**: Implement in `extraction/quality_validator.py`

- [ ] Summary length validation
- [ ] Key term extraction
- [ ] Repetition detection
- [ ] Structure checking

### Phase 6: Flexible Config (OpenCouncil pattern) 🟢
**Priority**: LOW  
**Timeline**: 1 week  
**Depends on**: None

**TODO**: Implement in `config/jurisdiction_configs.py`

- [ ] Per-jurisdiction configuration
- [ ] Meeting schedule patterns
- [ ] Legislative body tracking

---

## 📊 Comparison with Existing Integration

| Capability | Original 5 Projects | New Projects | Status |
|------------|-------------------|---------------|--------|
| Platform detection | ✅ Civic Scraper | - | **Complete** |
| Event schema | ✅ City Scrapers | - | **Complete** |
| Video ingestion | ✅ CDP | ✅ LocalView (scale) | **Need scale patterns** |
| Matter tracking | ✅ Engagic | - | **Complete** |
| Person/vote tracking | ✅ Councilmatic | - | Roadmapped |
| **AI Summarization** | ❌ | ✅ OpenTowns, MeetingBank | **TODO: High priority** |
| **Keyword Alerts** | ❌ | ✅ OpenTowns | **TODO: High priority** |
| **Cross-jurisdiction search** | ⚠️ Basic | ✅ CivicBand | **TODO: Medium priority** |
| **Quality metrics** | ❌ | ✅ LocalView, MeetingBank | **TODO: Medium priority** |
| **Batch processing** | ⚠️ Basic | ✅ LocalView | **TODO: High priority** |

---

## 💻 Quick Start: Integrate Summarization

Here's how to add OpenTowns-style summarization **right now**:

```python
# File: extraction/summarizer.py

from openai import OpenAI
from models.meeting_event import MeetingEvent
from config.settings import settings

client = OpenAI(api_key=settings.openai_api_key)

def summarize_meeting(event: MeetingEvent, full_text: str) -> dict:
    """
    Generate OpenTowns-style summary with oral health focus.
    """
    prompt = f"""
    You are summarizing a local government meeting for public health advocates.
    
    Meeting: {event.title}
    Jurisdiction: {event.jurisdiction_name}, {event.state_code}
    Date: {event.start.strftime('%B %d, %Y')}
    
    Full text (first 8000 chars):
    {full_text[:8000]}
    
    Provide:
    1. Executive Summary (2-3 sentences)
    2. Key Decisions (bullet list)
    3. Oral Health Items (if any - fluoridation, dental access, etc.)
    4. Next Actions (follow-ups, future meetings)
    
    Focus on: What was decided? What's happening next?
    """
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You summarize local government meetings for public understanding."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3
    )
    
    return {
        'summary': response.choices[0].message.content,
        'model': 'gpt-4o-mini',
        'tokens_used': response.usage.total_tokens
    }

# Usage:
# summary = summarize_meeting(event, full_transcript)
# event.description = summary['summary']
```
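
Both summarization snippets in this guide truncate the input (to 10,000 and 8,000 characters), so anything discussed late in a long meeting never reaches the model. One common workaround is to split the transcript into overlapping chunks, summarize each chunk, and then summarize the combined chunk summaries. The splitter below is a minimal sketch of the first step only; the function name and the default sizes are assumptions, not values taken from OpenTowns.

```python
from typing import List

def chunk_words(text: str, max_words: int = 1500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most max_words words; neighbours share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # The last chunk already reaches the end of the text
    return chunks
```

Each chunk could then be passed to `summarize_meeting` in place of `full_text`. The overlap reduces the chance that a decision is split across a chunk boundary, at the cost of some duplicated tokens.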

---

## 🎬 Next Steps

1. **Implement AI summarization** (OpenTowns pattern) β†’ Makes data usable
2. **Add keyword alerts** (OpenTowns pattern) β†’ Engage advocates
3. **Add batch processing** (LocalView pattern) β†’ Scale to 1,000+ jurisdictions
4. **Build search interface** (CivicBand pattern) β†’ User discovery
5. **Add quality metrics** (LocalView + MeetingBank) β†’ Monitor data health

---

## 📖 References

- **OpenTowns**: https://opentowns.org
- **LocalView**: https://www.localview.net
- **MeetingBank**: https://meetingbank.github.io
- **CivicBand**: https://civic.band
- **OpenCouncil**: https://github.com/schemalabz/opencouncil
- **Council Data Project**: https://councildataproject.org (see INTEGRATION_GUIDE.md)

---

## 📝 License & Attribution

All patterns documented here are derived from open-source projects:
- OpenTowns: Open civic-tech project
- LocalView: Open-source (Harvard Mellon Urbanism)
- MeetingBank: Open dataset
- CivicBand: Open-source (Raft Foundation)
- OpenCouncil: Open-source (MIT)
- CDP: MIT License

When using code patterns, maintain attribution per each project's license.