# Slack URL Summarizer Bot - Technical Specification

## 1. Project Overview

### 1.1 Purpose

Develop a Slack bot that automatically detects URLs in messages, extracts content from those URLs, generates summaries, translates them to Traditional Chinese, and posts the results back to the channel.

### 1.2 Key Features

- Automatic URL detection in Slack messages
- Web content extraction and parsing
- Content summarization using AI
- Translation to Traditional Chinese
- Automated response posting to Slack channels

## 2. Functional Requirements

### 2.1 Core Functionality

| Feature | Description | Priority |
|---------|-------------|----------|
| URL Detection | Detect and extract URLs from Slack messages | High |
| Content Extraction | Extract main content from web pages | High |
| Content Summarization | Generate concise summaries of extracted content | High |
| Translation | Translate summaries to Traditional Chinese | High |
| Slack Integration | Post results back to originating channel | High |

### 2.2 User Stories

- **As a Slack user**, I want to paste a URL and automatically receive a Chinese summary, so I can quickly understand the content without reading the full article.
- **As a team member**, I want summaries posted in the same channel, so everyone can benefit from the content.
- **As a user**, I want the bot to handle multiple URLs in one message, so I can share multiple resources efficiently.

## 3. Technical Architecture

### 3.1 System Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Slack App     │    │   Web Server    │    │   AI Services   │
│                 │    │   (FastAPI)     │    │                 │
│ • Events API    │◄──►│ • URL Extract   │◄──►│ • Azure OpenAI  │
│ • Bot Token     │    │ • Content Parse │    │ • Summarization │
│ • Webhooks      │    │ • Response Send │    │ • Translation   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### 3.2 Technology Stack

- **Backend Framework**: Python + FastAPI
- **Slack Integration**: Slack Bolt SDK for Python
- **Content Extraction**: newspaper3k / readability-lxml
- **AI Services**: Azure OpenAI API
- **HTTP Client**: httpx
- **Deployment**: Docker + Cloud hosting (AWS/GCP/Azure)
- **Database** (optional): Redis, for caching

## 4. Implementation Details

### 4.1 Slack Bot Setup

#### Required Scopes

- `chat:write` - Post messages to channels
- `channels:read` - Read channel information
- `app_mentions:read` - Read mentions
- `channels:history` - Read channel messages

#### Event Subscriptions

- `message.channels` - Listen to channel messages
- `app_mention` - Listen to bot mentions

### 4.2 Core Components

#### 4.2.1 URL Detection Module

```python
import re
from typing import List

def extract_urls(text: str) -> List[str]:
    """Extract all URLs from message text"""
    # Stop at whitespace and at the characters Slack uses to wrap links, e.g. <url|label>
    pattern = r'https?://[^\s<>"{\[\]|\\^`]+'
    return re.findall(pattern, text)
```

#### 4.2.2 Content Extraction Module

```python
from newspaper import Article

def extract_content(url: str) -> dict:
    """Extract main content from URL"""
    try:
        article = Article(url)
        article.download()
        article.parse()

        return {
            'title': article.title,
            'text': article.text,
            'authors': article.authors,
            'publish_date': article.publish_date
        }
    except Exception as e:
        # Callers must check for the 'error' key before reading 'text' or 'title'
        return {'error': str(e)}
```

#### 4.2.3 AI Processing Module

```python
import os

import httpx
from dotenv import load_dotenv

load_dotenv()

def summarize_and_translate(text: str) -> str:
    """Summarize content and translate to Traditional Chinese using Azure OpenAI"""
    url = (
        f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/"
        f"{os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}/chat/completions"
        f"?api-version={os.getenv('AZURE_OPENAI_API_VERSION')}"
    )
    headers = {
        "Content-Type": "application/json",
        "api-key": os.getenv("AZURE_OPENAI_API_KEY"),
    }
    body = {
        "messages": [
            {
                "role": "user",
                # Prompt (Traditional Chinese): "Summarize the following article in
                # 3-5 sentences and translate it into Traditional Chinese"
                "content": f"請將以下文章摘要成 3–5 句,並翻譯為繁體中文:\n\n{text}"
            }
        ],
        "temperature": 0.7
    }
    # Completions often take longer than httpx's 5-second default timeout
    resp = httpx.post(url, headers=headers, json=body, timeout=60.0)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```

## 5. API Specifications

### 5.1 Slack Event Handler

```python
# `app` is assumed to be a slack_bolt.async_app.AsyncApp instance,
# so that the async pipeline in Section 5.2 can be awaited.
@app.event("message")
async def handle_message(event, say):
    """Handle incoming Slack messages"""
    # Extract URLs from message
    urls = extract_urls(event.get('text', ''))

    if not urls:
        return

    # Process each URL
    for url in urls:
        await process_url_async(url, event['channel'], say)
```

### 5.2 URL Processing Pipeline

```python
import asyncio
import logging

async def process_url_async(url: str, channel: str, say):
    """Asynchronous URL processing pipeline"""
    try:
        # Step 1: Extract content (blocking call, so run it in a worker thread)
        content = await asyncio.to_thread(extract_content, url)
        if 'error' in content:
            raise RuntimeError(content['error'])

        # Step 2: Summarize and translate
        summary = await asyncio.to_thread(summarize_and_translate, content['text'])

        # Step 3: Format and send response (format_response renders the Section 7.1 template)
        response = format_response(url, content['title'], summary)
        await say(channel=channel, text=response)

    except Exception:
        logging.exception("Failed to process %s", url)
        # User-facing text: "An error occurred while processing the URL"
        error_message = f"❌ 處理網址時發生錯誤: {url}"
        await say(channel=channel, text=error_message)
```

## 6. Error Handling & Edge Cases

### 6.1 URL Validation

- Invalid URLs (malformed, unreachable)
- Protected content (login required, paywalls)
- Unsupported content types (PDFs, images, videos)
- Rate limiting from target websites

### 6.2 Content Processing

- Empty or insufficient content
- Non-text content (images, videos)
- Multiple languages in source content
- Extremely long articles (token limits)

### 6.3 API Failures

- Azure OpenAI API rate limits
- Network timeouts
- Slack API failures
- Service unavailability

## 7. Response Format

### 7.1 Successful Response Template

```
🔗 **原始網址**: {url}
📰 **標題**: {title}

📝 **中文摘要**:
{summary}

---
⏰ 處理時間: {timestamp}
```

The labels, in order, mean: original URL, title, Chinese summary, processing time.

### 7.2 Error Response Template

```
❌ **處理失敗**: {url}
🔍 **錯誤原因**: {error_message}
💡 **建議**: 請檢查網址是否正確或稍後再試
```

The labels, in order, mean: processing failed, error reason, suggestion ("Please check that the URL is correct or try again later").

## 8. Performance Requirements

### 8.1 Response Time

- URL processing: < 30 seconds
- Simple pages: < 10 seconds
- Complex pages: < 20 seconds

### 8.2 Throughput

- Support 100 concurrent requests
- Handle 1000 URLs per hour
- Rate limiting: 5 requests per user per minute

## 9. Security Considerations

### 9.1 Input Validation

- URL sanitization and validation
- Content length limits
- Malicious URL detection

### 9.2 API Security

- Secure storage of API keys
- Rate limiting implementation
- Request logging and monitoring

## 10. Deployment Strategy

### 10.1 Environment Setup

- Development: Local Docker containers
- Staging: Cloud-based testing environment
- Production: Container orchestration (Kubernetes/ECS)

### 10.2 Configuration Management

```python
import os

# Environment variables
SLACK_BOT_TOKEN = os.getenv('SLACK_BOT_TOKEN')
SLACK_SIGNING_SECRET = os.getenv('SLACK_SIGNING_SECRET')
AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_API_KEY = os.getenv('AZURE_OPENAI_API_KEY')
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
AZURE_OPENAI_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')
```

## 11. Testing Strategy

### 11.1 Unit Tests

- URL extraction logic
- Content parsing functions
- AI API integration
- Response formatting

### 11.2 Integration Tests

- End-to-end Slack workflow
- External API interactions
- Error handling scenarios

## 12. Monitoring & Logging

### 12.1 Metrics to Track

- Response times
- Success/failure rates
- API usage costs
- User engagement

### 12.2 Logging Requirements

- All URL processing attempts
- API call results
- Error occurrences
- Performance metrics

## 13. Future Enhancements

### 13.1 Phase 2 Features

- Multiple language support
- Custom summary length options
- Content caching for repeated URLs
- User preference settings

### 13.2 Advanced Features

- Batch URL processing
- Scheduled summary delivery
- Content categorization
- Analytics dashboard

## 14. Acceptance Criteria

### 14.1 MVP Success Criteria

- [x] Bot responds to URLs in Slack messages
- [x] Successfully extracts content from common websites
- [x] Generates coherent Chinese summaries
- [x] Posts formatted responses to correct channels
- [x] Handles basic error scenarios gracefully

### 14.2 Quality Gates

- 95% uptime requirement
- <20 second average response time
- <5% error rate for valid URLs
- Positive user feedback (>4.0/5.0)

## 15. Timeline & Milestones

| Phase | Duration | Deliverables |
|-------|----------|--------------|
| Setup & Planning | 1 week | Project setup, Slack app creation |
| Core Development | 2 weeks | Basic URL processing pipeline |
| AI Integration | 1 week | Summarization and translation |
| Testing & Debugging | 1 week | Unit tests, integration tests |
| Deployment | 1 week | Production deployment, monitoring |
| **Total** | **6 weeks** | **Production-ready MVP** |
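
## 16. Appendix: Response Formatting Sketch

The pipeline in Section 5.2 calls `format_response`, which this specification does not define. The block below is a minimal sketch of how it might look, assuming the Section 7.1 and 7.2 templates are used verbatim. The `format_error` helper, the `now` parameter, the UTC timestamp format, and the exact line breaks inside the templates are illustrative assumptions, not part of the original design.

```python
from datetime import datetime, timezone
from typing import Optional

# Template text taken from Sections 7.1 and 7.2; labels stay in Traditional Chinese.
# The placement of line breaks is an assumption.
SUCCESS_TEMPLATE = (
    "🔗 **原始網址**: {url}\n"
    "📰 **標題**: {title}\n"
    "\n"
    "📝 **中文摘要**:\n"
    "{summary}\n"
    "\n"
    "---\n"
    "⏰ 處理時間: {timestamp}"
)

ERROR_TEMPLATE = (
    "❌ **處理失敗**: {url}\n"
    "🔍 **錯誤原因**: {error_message}\n"
    "💡 **建議**: 請檢查網址是否正確或稍後再試"
)


def format_response(url: str, title: str, summary: str,
                    now: Optional[datetime] = None) -> str:
    """Render the success template.

    `now` is a hypothetical hook that lets tests pin the timestamp;
    when omitted, the current UTC time is used.
    """
    moment = now or datetime.now(timezone.utc)
    return SUCCESS_TEMPLATE.format(
        url=url,
        title=title,
        summary=summary.strip(),
        timestamp=moment.strftime("%Y-%m-%d %H:%M:%S %Z"),
    )


def format_error(url: str, error_message: str) -> str:
    """Render the error template (hypothetical helper, not named in the spec)."""
    return ERROR_TEMPLATE.format(url=url, error_message=error_message)


if __name__ == "__main__":
    print(format_response("https://example.com/post", "Example title", "範例摘要"))
    print(format_error("https://example.com/missing", "HTTP 404"))
```

If this shape is adopted, the `except` branch of the pipeline could call `format_error` so that failures follow the Section 7.2 template rather than a one-line message.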