Spaces:
Running on CPU Upgrade
Running on CPU Upgrade
| # Scale and Search Patterns: End-to-End Civic Tech Projects | |
| This guide analyzes **6 additional civic tech projects** focused on full-stack deployments, large-scale data aggregation, and public search portals. These complement our existing integration (Civic Scraper, City Scrapers, CDP, Engagic, Councilmatic) with new patterns for: | |
| - π€ **AI summarization** (OpenTowns, MeetingBank) | |
| - π **Multi-jurisdiction search** (CivicBand, LocalView) | |
| - π **Keyword alerting** (OpenTowns) | |
| - π **Research-grade pipelines** (LocalView, MeetingBank) | |
| - π **International adaptability** (OpenCouncil) | |
| --- | |
| ## π― What's NEW vs. Our Existing Integration | |
| | Pattern | Already Have | NEW from These Projects | | |
| |---------|--------------|-------------------------| | |
| | Platform detection | β Civic Scraper | - | | |
| | Event schema | β City Scrapers | - | | |
| | Video ingestion | β CDP | β LocalView scale patterns | | |
| | Matter tracking | β Engagic | - | | |
| | Search UX | β Councilmatic | β CivicBand cross-jurisdiction | | |
| | **AI Summarization** | β | β **OpenTowns, MeetingBank** | | |
| | **Keyword Alerts** | β | β **OpenTowns** | | |
| | **Scale (1,000+ jurisdictions)** | β οΈ Partial | β **CivicBand, LocalView** | | |
| | **International patterns** | β | β **OpenCouncil** | | |
| --- | |
| ## π Project Analysis | |
| ### 1. Council Data Project (CDP) β Already Integrated | |
| **Status**: Already documented in `INTEGRATION_GUIDE.md` | |
| **Key patterns we already use**: | |
| - Video transcript ingestion | |
| - Searchable transcript storage | |
| - Event indexing pipeline | |
| **See**: `docs/INTEGRATION_GUIDE.md` Section 4 | |
| --- | |
| ### 2. OpenTowns π AI Summarization Pioneer | |
| **GitHub**: https://opentowns.org | |
| **License**: Open civic-tech (check specific repo) | |
| **Focus**: Small towns, AI-generated summaries, keyword alerts | |
| #### π₯ What to Adopt | |
| **A. AI Summarization Pattern** | |
| ```python | |
| # They generate readable summaries from raw transcripts/PDFs | |
| # Pattern: transcript β summary β key decisions | |
| from openai import OpenAI | |
| from models.meeting_event import MeetingEvent | |
| async def generate_meeting_summary(event: MeetingEvent, transcript: str) -> dict: | |
| """ | |
| OpenTowns pattern: Generate human-readable meeting summaries. | |
| Returns: | |
| { | |
| 'executive_summary': str, # 2-3 sentences | |
| 'key_decisions': list[str], # Bullet points | |
| 'health_policy_items': list[str], # Filtered for oral health | |
| 'next_actions': list[str] # Follow-up items | |
| } | |
| """ | |
| client = OpenAI() | |
| prompt = f""" | |
| Summarize this local government meeting for public understanding. | |
| Meeting: {event.title} | |
| Date: {event.start.strftime('%B %d, %Y')} | |
| Transcript: {transcript[:10000]} # First 10k chars | |
| Provide: | |
| 1. Executive summary (2-3 sentences) | |
| 2. Key decisions made (bullet points) | |
| 3. Health policy items (if any) | |
| 4. Next actions/follow-ups | |
| Focus on: What decisions were made? What happens next? | |
| """ | |
| response = client.chat.completions.create( | |
| model="gpt-4o-mini", # Cost-effective for summaries | |
| messages=[ | |
| {"role": "system", "content": "You are a civic engagement assistant helping residents understand local government."}, | |
| {"role": "user", "content": prompt} | |
| ], | |
| temperature=0.3 # Lower for factual accuracy | |
| ) | |
| # Parse response into structured format | |
| summary_text = response.choices[0].message.content | |
| return { | |
| 'executive_summary': extract_section(summary_text, 'Executive summary'), | |
| 'key_decisions': extract_bullets(summary_text, 'Key decisions'), | |
| 'health_policy_items': extract_bullets(summary_text, 'Health policy'), | |
| 'next_actions': extract_bullets(summary_text, 'Next actions'), | |
| 'raw_summary': summary_text | |
| } | |
| ``` | |
| **B. Keyword Alert System** | |
| ```python | |
| # OpenTowns sends alerts when keywords appear in meetings | |
| # Pattern: Watch list β match detection β user notification | |
| from typing import List, Dict | |
| import re | |
| class KeywordAlertSystem: | |
| """ | |
| OpenTowns pattern: Alert users when keywords appear in meetings. | |
| """ | |
| # Oral health keyword categories | |
| KEYWORD_CATEGORIES = { | |
| 'fluoridation': [ | |
| 'fluoride', 'fluoridation', 'water treatment', | |
| 'community water fluoridation', 'CWF' | |
| ], | |
| 'dental_access': [ | |
| 'dental', 'dentist', 'oral health', 'teeth', | |
| 'medicaid dental', 'dental clinic' | |
| ], | |
| 'public_health': [ | |
| 'health department', 'public health', 'CDC', | |
| 'preventive care', 'health equity' | |
| ] | |
| } | |
| def detect_keywords(self, text: str) -> Dict[str, List[str]]: | |
| """ | |
| Find all matching keywords in text. | |
| Returns: {'fluoridation': ['fluoride', 'CWF'], ...} | |
| """ | |
| text_lower = text.lower() | |
| matches = {} | |
| for category, keywords in self.KEYWORD_CATEGORIES.items(): | |
| found = [] | |
| for keyword in keywords: | |
| # Word boundary matching | |
| pattern = r'\b' + re.escape(keyword.lower()) + r'\b' | |
| if re.search(pattern, text_lower): | |
| found.append(keyword) | |
| if found: | |
| matches[category] = found | |
| return matches | |
| def generate_alert(self, event: MeetingEvent, matches: Dict[str, List[str]]) -> dict: | |
| """ | |
| Create alert notification for users. | |
| """ | |
| return { | |
| 'alert_type': 'keyword_match', | |
| 'jurisdiction': f"{event.jurisdiction_name}, {event.state_code}", | |
| 'meeting_title': event.title, | |
| 'meeting_date': event.start.isoformat(), | |
| 'categories_matched': list(matches.keys()), | |
| 'keywords_found': [kw for kws in matches.values() for kw in kws], | |
| 'meeting_url': event.source, | |
| 'priority': 'high' if 'fluoridation' in matches else 'medium' | |
| } | |
| ``` | |
| **Implementation Priority**: π₯ **HIGH** - Summaries make data usable for advocates | |
| --- | |
| ### 3. LocalView π Research-Grade Scale | |
| **Website**: https://www.localview.net | |
| **GitHub**: https://mellonurbanism.harvard.edu/localview | |
| **License**: Open-source data pipeline | |
| **Scale**: Nationwide coverage, largest public dataset | |
| #### π₯ What to Adopt | |
| **A. Scale Architecture Patterns** | |
| LocalView handles **thousands of jurisdictions** with: | |
| 1. **Batch processing** (not real-time) | |
| 2. **Distributed storage** (videos + transcripts) | |
| 3. **Quality metrics** (completeness scoring) | |
| ```python | |
| # LocalView pattern: Process jurisdictions in batches with quality tracking | |
| from dataclasses import dataclass | |
| from datetime import datetime | |
| from typing import Optional | |
| @dataclass | |
| class JurisdictionQuality: | |
| """ | |
| LocalView pattern: Track data quality per jurisdiction. | |
| """ | |
| jurisdiction_name: str | |
| state_code: str | |
| # Completeness metrics | |
| total_meetings_expected: int # Based on calendar | |
| total_meetings_found: int | |
| meetings_with_agendas: int | |
| meetings_with_minutes: int | |
| meetings_with_videos: int | |
| meetings_with_transcripts: int | |
| # Freshness | |
| last_scraped: datetime | |
| last_meeting_found: Optional[datetime] | |
| scraping_frequency: str # 'daily', 'weekly', 'monthly' | |
| # Health metrics | |
| consecutive_failures: int | |
| last_success: Optional[datetime] | |
| @property | |
| def completeness_score(self) -> float: | |
| """ | |
| Overall data quality score (0-100). | |
| """ | |
| if self.total_meetings_expected == 0: | |
| return 0.0 | |
| found_rate = self.total_meetings_found / self.total_meetings_expected | |
| agenda_rate = self.meetings_with_agendas / max(self.total_meetings_found, 1) | |
| minutes_rate = self.meetings_with_minutes / max(self.total_meetings_found, 1) | |
| # Weighted average | |
| score = ( | |
| found_rate * 40 + # 40%: Finding meetings | |
| agenda_rate * 30 + # 30%: Having agendas | |
| minutes_rate * 30 # 30%: Having minutes | |
| ) | |
| return min(score * 100, 100.0) | |
| @property | |
| def health_status(self) -> str: | |
| """ | |
| Scraper health: healthy, degraded, failed | |
| """ | |
| if self.consecutive_failures >= 5: | |
| return 'failed' | |
| elif self.consecutive_failures >= 2: | |
| return 'degraded' | |
| else: | |
| return 'healthy' | |
| ``` | |
| **B. Batch Processing Strategy** | |
| ```python | |
| # LocalView processes in batches, not all-at-once | |
| from pyspark.sql import SparkSession | |
| from typing import Iterator | |
| def process_jurisdictions_in_batches( | |
| spark: SparkSession, | |
| batch_size: int = 100, | |
| priority_filter: str = 'high' | |
| ) -> Iterator[dict]: | |
| """ | |
| LocalView pattern: Process large numbers of jurisdictions efficiently. | |
| Strategy: | |
| 1. Load high-priority jurisdictions first | |
| 2. Process in batches to manage memory | |
| 3. Track quality metrics per batch | |
| 4. Resume from failures | |
| """ | |
| # Load targets from Gold layer | |
| targets_df = spark.read.format("delta").load("data/delta/gold/scraping_targets") | |
| # Filter and sort | |
| priority_targets = targets_df \ | |
| .filter(f"priority_tier = '{priority_filter}'") \ | |
| .orderBy("priority_score", ascending=False) | |
| total_targets = priority_targets.count() | |
| # Process in batches | |
| for offset in range(0, total_targets, batch_size): | |
| batch_df = priority_targets.limit(batch_size).offset(offset) | |
| batch_results = { | |
| 'batch_number': offset // batch_size + 1, | |
| 'batch_size': batch_size, | |
| 'jurisdictions_processed': 0, | |
| 'meetings_found': 0, | |
| 'errors': [] | |
| } | |
| for row in batch_df.collect(): | |
| try: | |
| # Scrape jurisdiction | |
| meetings = scrape_jurisdiction(row['url'], row['platform']) | |
| batch_results['jurisdictions_processed'] += 1 | |
| batch_results['meetings_found'] += len(meetings) | |
| except Exception as e: | |
| batch_results['errors'].append({ | |
| 'jurisdiction': row['jurisdiction_name'], | |
| 'error': str(e) | |
| }) | |
| yield batch_results | |
| ``` | |
| **Implementation Priority**: π₯ **HIGH** - Essential for scaling to 32,333 municipalities | |
| --- | |
| ### 4. MeetingBank π Summarization Research | |
| **Website**: https://meetingbank.github.io | |
| **GitHub**: Linked from site | |
| **License**: Open dataset | |
| **Focus**: 6 cities, high-quality summarization benchmark | |
| #### π₯ What to Adopt | |
| **A. Summarization Quality Benchmarks** | |
| MeetingBank is used in academic research for summarization. They have: | |
| - **Gold-standard human summaries** (for validation) | |
| - **Multiple summary lengths** (short, medium, long) | |
| - **Evaluation metrics** (ROUGE, BERTScore) | |
| ```python | |
| # MeetingBank pattern: Validate AI summaries against quality benchmarks | |
| from typing import Dict | |
| import numpy as np | |
| class SummaryQualityValidator: | |
| """ | |
| MeetingBank pattern: Ensure AI summaries meet quality standards. | |
| """ | |
| # Quality thresholds from academic research | |
| MIN_ROUGE_L = 0.25 # ROUGE-L F1 score | |
| MIN_LENGTH_RATIO = 0.05 # Summary should be 5-20% of original | |
| MAX_LENGTH_RATIO = 0.20 | |
| def validate_summary(self, original: str, summary: str) -> Dict[str, any]: | |
| """ | |
| Check if summary meets quality standards. | |
| """ | |
| # Length checks | |
| orig_words = len(original.split()) | |
| summ_words = len(summary.split()) | |
| length_ratio = summ_words / orig_words if orig_words > 0 else 0 | |
| # Basic quality checks | |
| checks = { | |
| 'length_appropriate': self.MIN_LENGTH_RATIO <= length_ratio <= self.MAX_LENGTH_RATIO, | |
| 'has_key_terms': self._check_key_terms(original, summary), | |
| 'no_repetition': self._check_repetition(summary), | |
| 'proper_structure': self._check_structure(summary), | |
| } | |
| return { | |
| 'passes_validation': all(checks.values()), | |
| 'checks': checks, | |
| 'length_ratio': length_ratio, | |
| 'word_count': summ_words, | |
| 'quality_score': sum(checks.values()) / len(checks) | |
| } | |
| def _check_key_terms(self, original: str, summary: str) -> bool: | |
| """ | |
| Ensure summary includes key terms from original. | |
| """ | |
| # Extract important terms (simplified - use TF-IDF in production) | |
| orig_words = set(original.lower().split()) | |
| summ_words = set(summary.lower().split()) | |
| # At least 30% overlap of unique terms | |
| overlap = len(orig_words & summ_words) / len(orig_words) | |
| return overlap >= 0.30 | |
| def _check_repetition(self, summary: str) -> bool: | |
| """ | |
| Check for excessive repetition (indicates poor quality). | |
| """ | |
| sentences = summary.split('.') | |
| unique_ratio = len(set(sentences)) / len(sentences) if sentences else 0 | |
| return unique_ratio >= 0.80 # At least 80% unique sentences | |
| def _check_structure(self, summary: str) -> bool: | |
| """ | |
| Check for proper summary structure. | |
| """ | |
| # Should have multiple sentences | |
| sentences = [s.strip() for s in summary.split('.') if s.strip()] | |
| return len(sentences) >= 2 and len(sentences) <= 10 | |
| ``` | |
| **Implementation Priority**: π‘ **MEDIUM** - Important for quality, but MVP can use basic summaries | |
| --- | |
| ### 5. CivicBand π Multi-Jurisdiction Search | |
| **Website**: https://civic.band | |
| **GitHub**: Linked from site (Raft Foundation) | |
| **Scale**: 1,000+ municipalities | |
| **Focus**: Google-like search across jurisdictions | |
| #### π₯ What to Adopt | |
| **A. Cross-Jurisdiction Search Architecture** | |
| CivicBand lets users search "fluoridation" and get results from **all municipalities** at once. | |
| ```python | |
| # CivicBand pattern: Federated search across jurisdictions | |
| from elasticsearch import Elasticsearch # Or Meilisearch for open-source | |
| from typing import List, Dict | |
| from models.meeting_event import MeetingEvent | |
| class CrossJurisdictionSearch: | |
| """ | |
| CivicBand pattern: Search meetings across all jurisdictions. | |
| """ | |
| def __init__(self): | |
| # Use Meilisearch (open-source) or Elasticsearch | |
| self.es = Elasticsearch(['http://localhost:9200']) | |
| self.index_name = 'meeting_events' | |
| def index_meeting(self, event: MeetingEvent): | |
| """ | |
| Add meeting to search index. | |
| """ | |
| doc = { | |
| 'id': event.id, | |
| 'title': event.title, | |
| 'description': event.description, | |
| 'jurisdiction': event.jurisdiction_name, | |
| 'state': event.state_code, | |
| 'date': event.start.isoformat(), | |
| 'full_text': self._build_searchable_text(event), | |
| 'agenda_url': next((link.href for link in event.links if 'agenda' in link.title.lower()), None), | |
| 'oral_health_relevant': event.oral_health_relevant, | |
| 'keywords': event.keywords_found | |
| } | |
| self.es.index(index=self.index_name, id=event.id, document=doc) | |
| def search( | |
| self, | |
| query: str, | |
| states: List[str] = None, | |
| date_range: tuple = None, | |
| oral_health_only: bool = False | |
| ) -> List[Dict]: | |
| """ | |
| Search across all jurisdictions. | |
| Example: | |
| search("fluoridation", states=['AL', 'GA'], oral_health_only=True) | |
| """ | |
| must_clauses = [ | |
| {"multi_match": { | |
| "query": query, | |
| "fields": ["title^3", "description^2", "full_text"], # Boost title matches | |
| "type": "best_fields" | |
| }} | |
| ] | |
| # Filter by state | |
| if states: | |
| must_clauses.append({"terms": {"state": states}}) | |
| # Filter by date range | |
| if date_range: | |
| must_clauses.append({ | |
| "range": {"date": {"gte": date_range[0], "lte": date_range[1]}} | |
| }) | |
| # Filter oral health only | |
| if oral_health_only: | |
| must_clauses.append({"term": {"oral_health_relevant": True}}) | |
| search_query = { | |
| "query": {"bool": {"must": must_clauses}}, | |
| "size": 100, | |
| "highlight": { | |
| "fields": { | |
| "title": {}, | |
| "description": {}, | |
| "full_text": {"fragment_size": 150} | |
| } | |
| }, | |
| "sort": [ | |
| {"_score": "desc"}, | |
| {"date": "desc"} | |
| ] | |
| } | |
| results = self.es.search(index=self.index_name, body=search_query) | |
| return [{ | |
| 'jurisdiction': hit['_source']['jurisdiction'], | |
| 'state': hit['_source']['state'], | |
| 'title': hit['_source']['title'], | |
| 'date': hit['_source']['date'], | |
| 'snippet': hit.get('highlight', {}).get('full_text', [''])[0], | |
| 'url': hit['_source']['agenda_url'], | |
| 'relevance_score': hit['_score'] | |
| } for hit in results['hits']['hits']] | |
| def _build_searchable_text(self, event: MeetingEvent) -> str: | |
| """ | |
| Combine all text fields for indexing. | |
| """ | |
| parts = [ | |
| event.title or '', | |
| event.description or '', | |
| ' '.join(event.keywords_found), | |
| ' '.join(link.title for link in event.links) | |
| ] | |
| return ' '.join(parts) | |
| ``` | |
| **B. Jurisdiction Faceting** | |
| ```python | |
| # CivicBand shows result counts by jurisdiction | |
| def get_search_facets(query: str) -> Dict[str, int]: | |
| """ | |
| Show how many results per jurisdiction. | |
| Example output: | |
| { | |
| 'Birmingham, AL': 12, | |
| 'Atlanta, GA': 8, | |
| 'Montgomery, AL': 5 | |
| } | |
| """ | |
| search_query = { | |
| "query": {"multi_match": {"query": query, "fields": ["title", "full_text"]}}, | |
| "size": 0, # We only want aggregations | |
| "aggs": { | |
| "by_jurisdiction": { | |
| "terms": { | |
| "field": "jurisdiction.keyword", | |
| "size": 50 # Top 50 jurisdictions | |
| }, | |
| "aggs": { | |
| "by_state": { | |
| "terms": {"field": "state.keyword"} | |
| } | |
| } | |
| } | |
| } | |
| } | |
| results = self.es.search(index=self.index_name, body=search_query) | |
| facets = {} | |
| for bucket in results['aggregations']['by_jurisdiction']['buckets']: | |
| jurisdiction = bucket['key'] | |
| count = bucket['doc_count'] | |
| state = bucket['by_state']['buckets'][0]['key'] | |
| facets[f"{jurisdiction}, {state}"] = count | |
| return facets | |
| ``` | |
| **Implementation Priority**: π‘ **MEDIUM** - Valuable for end-users, but scraping comes first | |
| --- | |
| ### 6. OpenCouncil π International Adaptability | |
| **Website**: https://opencouncil.gr | |
| **GitHub**: https://github.com/schemalabz/opencouncil | |
| **License**: Open-source | |
| **Focus**: Greek councils, but adaptable to U.S. | |
| #### π₯ What to Adopt | |
| **A. Internationalization Patterns** | |
| OpenCouncil works in Greece (different government structure). This teaches us: | |
| - **Flexible schema** (not hardcoded to U.S. structures) | |
| - **Configurable jurisdiction types** (councils, boards, commissions) | |
| - **Multi-language support** (not needed now, but good architecture) | |
| ```python | |
| # OpenCouncil pattern: Flexible jurisdiction configuration | |
| from enum import Enum | |
| from dataclasses import dataclass | |
| from typing import List, Optional | |
| class GovernmentLevel(Enum): | |
| """ | |
| OpenCouncil pattern: Support multiple government structures. | |
| """ | |
| MUNICIPAL = "municipal" # City/town councils | |
| COUNTY = "county" # County boards | |
| TOWNSHIP = "township" # Township boards | |
| SCHOOL_DISTRICT = "school" # School boards | |
| SPECIAL_DISTRICT = "special" # Water, fire, etc. | |
| STATE = "state" # State agencies (future) | |
| @dataclass | |
| class JurisdictionConfig: | |
| """ | |
| OpenCouncil pattern: Configure each jurisdiction's unique structure. | |
| """ | |
| jurisdiction_name: str | |
| government_level: GovernmentLevel | |
| # Meeting schedule | |
| typical_meeting_frequency: str # 'weekly', 'biweekly', 'monthly' | |
| typical_meeting_days: List[str] # ['Monday', 'Thursday'] | |
| typical_meeting_time: str # '18:00' | |
| # Website structure | |
| calendar_url: Optional[str] | |
| agenda_url_pattern: Optional[str] # Template: "https://example.gov/agenda-{date}" | |
| minutes_url_pattern: Optional[str] | |
| # Legislative bodies | |
| bodies: List[str] # ['City Council', 'Planning Commission', 'Board of Health'] | |
| # Custom fields | |
| metadata: dict # For jurisdiction-specific data | |
| # Example: Configure Birmingham, AL | |
| BIRMINGHAM_CONFIG = JurisdictionConfig( | |
| jurisdiction_name="Birmingham", | |
| government_level=GovernmentLevel.MUNICIPAL, | |
| typical_meeting_frequency='biweekly', | |
| typical_meeting_days=['Tuesday'], | |
| typical_meeting_time='18:00', | |
| calendar_url="https://birminghamal.gov/council/meetings", | |
| bodies=['City Council', 'Board of Health', 'Planning Commission'], | |
| metadata={'population': 200733, 'oral_health_priority': 'high'} | |
| ) | |
| ``` | |
| **Implementation Priority**: π’ **LOW** - Good architecture, but not urgent | |
| --- | |
| ## π― Implementation Roadmap | |
| ### Phase 1: AI Summarization (OpenTowns pattern) π₯ | |
| **Priority**: HIGH | |
| **Timeline**: 1-2 weeks | |
| **Depends on**: Existing OpenAI integration | |
| ```python | |
| # TODO: Implement in extraction/summarizer.py | |
| - [ ] Generate executive summaries from meeting transcripts | |
| - [ ] Extract key decisions as bullet points | |
| - [ ] Identify health policy items | |
| - [ ] Add quality validation (MeetingBank patterns) | |
| ``` | |
| ### Phase 2: Keyword Alerts (OpenTowns pattern) π₯ | |
| **Priority**: HIGH | |
| **Timeline**: 1 week | |
| **Depends on**: Meeting data ingestion | |
| ```python | |
| # TODO: Implement in alerts/keyword_monitor.py | |
| - [ ] Define oral health keyword categories | |
| - [ ] Pattern matching with word boundaries | |
| - [ ] Generate alerts for users | |
| - [ ] Email/webhook notification system | |
| ``` | |
| ### Phase 3: Scale Architecture (LocalView pattern) π₯ | |
| **Priority**: HIGH | |
| **Timeline**: 2 weeks | |
| **Depends on**: Platform scrapers | |
| ```python | |
| # TODO: Implement in discovery/batch_processor.py | |
| - [ ] Quality metrics per jurisdiction | |
| - [ ] Batch processing (100 at a time) | |
| - [ ] Failure tracking and retry | |
| - [ ] Completeness scoring | |
| ``` | |
| ### Phase 4: Multi-Jurisdiction Search (CivicBand pattern) π‘ | |
| **Priority**: MEDIUM | |
| **Timeline**: 2-3 weeks | |
| **Depends on**: Significant meeting data | |
| ```python | |
| # TODO: Implement in search/federated_search.py | |
| - [ ] Set up Elasticsearch or Meilisearch | |
| - [ ] Index all meetings | |
| - [ ] Cross-jurisdiction search API | |
| - [ ] Jurisdiction faceting | |
| ``` | |
| ### Phase 5: Quality Validation (MeetingBank pattern) π‘ | |
| **Priority**: MEDIUM | |
| **Timeline**: 1 week | |
| **Depends on**: AI summarization | |
| ```python | |
| # TODO: Implement in extraction/quality_validator.py | |
| - [ ] Summary length validation | |
| - [ ] Key term extraction | |
| - [ ] Repetition detection | |
| - [ ] Structure checking | |
| ``` | |
| ### Phase 6: Flexible Config (OpenCouncil pattern) π’ | |
| **Priority**: LOW | |
| **Timeline**: 1 week | |
| **Depends on**: None | |
| ```python | |
| # TODO: Implement in config/jurisdiction_configs.py | |
| - [ ] Per-jurisdiction configuration | |
| - [ ] Meeting schedule patterns | |
| - [ ] Legislative body tracking | |
| ``` | |
| --- | |
| ## π Comparison with Existing Integration | |
| | Capability | Original 5 Projects | New 6 Projects | Status | | |
| |------------|-------------------|---------------|--------| | |
| | Platform detection | β Civic Scraper | - | **Complete** | | |
| | Event schema | β City Scrapers | - | **Complete** | | |
| | Video ingestion | β CDP | β LocalView (scale) | **Need scale patterns** | | |
| | Matter tracking | β Engagic | - | **Complete** | | |
| | Person/vote tracking | β Councilmatic | - | Roadmapped | | |
| | **AI Summarization** | β | β OpenTowns, MeetingBank | **TODO: High priority** | | |
| | **Keyword Alerts** | β | β OpenTowns | **TODO: High priority** | | |
| | **Cross-jurisdiction search** | β οΈ Basic | β CivicBand | **TODO: Medium priority** | | |
| | **Quality metrics** | β | β LocalView, MeetingBank | **TODO: Medium priority** | | |
| | **Batch processing** | β οΈ Basic | β LocalView | **TODO: High priority** | | |
| --- | |
| ## π» Quick Start: Integrate Summarization | |
| Here's how to add OpenTowns-style summarization **right now**: | |
| ```python | |
| # File: extraction/summarizer.py | |
| from openai import OpenAI | |
| from models.meeting_event import MeetingEvent | |
| from config.settings import settings | |
| client = OpenAI(api_key=settings.openai_api_key) | |
| def summarize_meeting(event: MeetingEvent, full_text: str) -> dict: | |
| """ | |
| Generate OpenTowns-style summary with oral health focus. | |
| """ | |
| prompt = f""" | |
| You are summarizing a local government meeting for public health advocates. | |
| Meeting: {event.title} | |
| Jurisdiction: {event.jurisdiction_name}, {event.state_code} | |
| Date: {event.start.strftime('%B %d, %Y')} | |
| Full text (first 8000 chars): | |
| {full_text[:8000]} | |
| Provide: | |
| 1. Executive Summary (2-3 sentences) | |
| 2. Key Decisions (bullet list) | |
| 3. Oral Health Items (if any - fluoridation, dental access, etc.) | |
| 4. Next Actions (follow-ups, future meetings) | |
| Focus on: What was decided? What's happening next? | |
| """ | |
| response = client.chat.completions.create( | |
| model="gpt-4o-mini", | |
| messages=[ | |
| {"role": "system", "content": "You summarize local government meetings for public understanding."}, | |
| {"role": "user", "content": prompt} | |
| ], | |
| temperature=0.3 | |
| ) | |
| return { | |
| 'summary': response.choices[0].message.content, | |
| 'model': 'gpt-4o-mini', | |
| 'tokens_used': response.usage.total_tokens | |
| } | |
| # Usage: | |
| # summary = summarize_meeting(event, full_transcript) | |
| # event.description = summary['summary'] | |
| ``` | |
| --- | |
| ## π¬ Next Steps | |
| 1. **Implement AI summarization** (OpenTowns pattern) β Makes data usable | |
| 2. **Add keyword alerts** (OpenTowns pattern) β Engage advocates | |
| 3. **Add batch processing** (LocalView pattern) β Scale to 1,000+ jurisdictions | |
| 4. **Build search interface** (CivicBand pattern) β User discovery | |
| 5. **Add quality metrics** (LocalView + MeetingBank) β Monitor data health | |
| --- | |
| ## π References | |
| - **OpenTowns**: https://opentowns.org | |
| - **LocalView**: https://www.localview.net | |
| - **MeetingBank**: https://meetingbank.github.io | |
| - **CivicBand**: https://civic.band | |
| - **OpenCouncil**: https://github.com/schemalabz/opencouncil | |
| - **Council Data Project**: https://councildataproject.org (see INTEGRATION_GUIDE.md) | |
| --- | |
| ## π License & Attribution | |
| All patterns documented here are derived from open-source projects: | |
| - OpenTowns: Open civic-tech project | |
| - LocalView: Open-source (Harvard Mellon Urbanism) | |
| - MeetingBank: Open dataset | |
| - CivicBand: Open-source (Raft Foundation) | |
| - OpenCouncil: Open-source (MIT) | |
| - CDP: MIT License | |
| When using code patterns, maintain attribution per each project's license. | |