Spaces:
Sleeping
Sleeping
| # Slack URL Summarizer Bot - Technical Specification | |
| ## 1. Project Overview | |
| ### 1.1 Purpose | |
| Develop a Slack bot that automatically detects URLs in messages, extracts content from those URLs, generates summaries, translates them to Traditional Chinese, and posts the results back to the channel. | |
| ### 1.2 Key Features | |
| - Automatic URL detection in Slack messages | |
| - Web content extraction and parsing | |
| - Content summarization using AI | |
| - Translation to Traditional Chinese | |
| - Automated response posting to Slack channels | |
| ## 2. Functional Requirements | |
| ### 2.1 Core Functionality | |
| | Feature | Description | Priority | | |
| |---------|-------------|----------| | |
| | URL Detection | Detect and extract URLs from Slack messages | High | | |
| | Content Extraction | Extract main content from web pages | High | | |
| | Content Summarization | Generate concise summaries of extracted content | High | | |
| | Translation | Translate summaries to Traditional Chinese | High | | |
| | Slack Integration | Post results back to originating channel | High | | |
| ### 2.2 User Stories | |
| - **As a Slack user**, I want to paste a URL and automatically receive a Chinese summary, so I can quickly understand the content without reading the full article | |
| - **As a team member**, I want summaries posted in the same channel, so everyone can benefit from the content | |
| - **As a user**, I want the bot to handle multiple URLs in one message, so I can share multiple resources efficiently | |
| ## 3. Technical Architecture | |
| ### 3.1 System Architecture | |
| ``` | |
| βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ | |
| β Slack App β β Web Server β β AI Services β | |
| β β β (FastAPI) β β β | |
| β β’ Events API βββββΊβ β’ URL Extract βββββΊβ β’ OpenAI API β | |
| β β’ Bot Token β β β’ Content Parse β β β’ Summarization β | |
| β β’ Webhooks β β β’ Response Send β β β’ Translation β | |
| βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ | |
| ``` | |
| ### 3.2 Technology Stack | |
| - **Backend Framework**: Python + FastAPI | |
| - **Slack Integration**: Slack Bolt SDK for Python | |
| - **Content Extraction**: newspaper3k / readability-lxml | |
| - **AI Services**: Azure OpenAI API | |
| - **HTTP Client**: httpx | |
| - **Deployment**: Docker + Cloud hosting (AWS/GCP/Azure) | |
| - **Database**: Redis (for caching) - Optional | |
| ## 4. Implementation Details | |
| ### 4.1 Slack Bot Setup | |
| #### Required Scopes | |
| - `chat:write` - Post messages to channels | |
| - `channels:read` - Read channel information | |
| - `app_mentions:read` - Read mentions | |
| - `channels:history` - Read channel messages | |
| #### Event Subscriptions | |
| - `message.channels` - Listen to channel messages | |
| - `app_mention` - Listen to bot mentions | |
| ### 4.2 Core Components | |
| #### 4.2.1 URL Detection Module | |
| ```python | |
| import re | |
| def extract_urls(text: str) -> List[str]: | |
| """Extract all URLs from message text""" | |
| pattern = r'https?://[^\s<>"{\[\]|\\^`]+' | |
| return re.findall(pattern, text) | |
| ``` | |
| #### 4.2.2 Content Extraction Module | |
| ```python | |
| from newspaper import Article | |
| def extract_content(url: str) -> dict: | |
| """Extract main content from URL""" | |
| try: | |
| article = Article(url) | |
| article.download() | |
| article.parse() | |
| return { | |
| 'title': article.title, | |
| 'text': article.text, | |
| 'authors': article.authors, | |
| 'publish_date': article.publish_date | |
| } | |
| except Exception as e: | |
| return {'error': str(e)} | |
| ``` | |
| #### 4.2.3 AI Processing Module | |
| ```python | |
| import httpx | |
| import os | |
| from dotenv import load_dotenv | |
| load_dotenv() | |
| def summarize_and_translate(text: str) -> str: | |
| """Summarize content and translate to Traditional Chinese using Azure OpenAI""" | |
| url = f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/{os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}/chat/completions?api-version={os.getenv('AZURE_OPENAI_API_VERSION')}" | |
| headers = { | |
| "Content-Type": "application/json", | |
| "api-key": os.getenv("AZURE_OPENAI_API_KEY"), | |
| } | |
| body = { | |
| "messages": [ | |
| { | |
| "role": "user", | |
| "content": f"θ«ε°δ»₯δΈζη« ζθ¦ζ 3β5 ε₯οΌδΈ¦ηΏ»θ―ηΊηΉι«δΈζοΌ\n\n{text}" | |
| } | |
| ], | |
| "temperature": 0.7 | |
| } | |
| resp = httpx.post(url, headers=headers, json=body) | |
| resp.raise_for_status() | |
| return resp.json()["choices"][0]["message"]["content"].strip() | |
| ``` | |
| ## 5. API Specifications | |
| ### 5.1 Slack Event Handler | |
| ```python | |
| @app.event("message") | |
| def handle_message(event, say): | |
| """Handle incoming Slack messages""" | |
| # Extract URLs from message | |
| urls = extract_urls(event.get('text', '')) | |
| if not urls: | |
| return | |
| # Process each URL | |
| for url in urls: | |
| process_url_async(url, event['channel'], say) | |
| ``` | |
| ### 5.2 URL Processing Pipeline | |
| ```python | |
| async def process_url_async(url: str, channel: str, say): | |
| """Asynchronous URL processing pipeline""" | |
| try: | |
| # Step 1: Extract content | |
| content = extract_content(url) | |
| # Step 2: Summarize and translate | |
| summary = summarize_and_translate(content['text']) | |
| # Step 3: Format and send response | |
| response = format_response(url, content['title'], summary) | |
| say(channel=channel, text=response) | |
| except Exception as e: | |
| error_message = f"β θηηΆ²εζηΌηι―θͺ€: {url}" | |
| say(channel=channel, text=error_message) | |
| ``` | |
| ## 6. Error Handling & Edge Cases | |
| ### 6.1 URL Validation | |
| - Invalid URLs (malformed, unreachable) | |
| - Protected content (login required, paywalls) | |
| - Unsupported content types (PDFs, images, videos) | |
| - Rate limiting from target websites | |
| ### 6.2 Content Processing | |
| - Empty or insufficient content | |
| - Non-text content (images, videos) | |
| - Multiple languages in source content | |
| - Extremely long articles (token limits) | |
| ### 6.3 API Failures | |
| - Azure OpenAI API rate limits | |
| - Network timeouts | |
| - Slack API failures | |
| - Service unavailability | |
| ## 7. Response Format | |
| ### 7.1 Successful Response Template | |
| ``` | |
| π **εε§ηΆ²ε**: {url} | |
| π° **ζ¨ι‘**: {title} | |
| π **δΈζζθ¦**: | |
| {summary} | |
| --- | |
| β° θηζι: {timestamp} | |
| ``` | |
| ### 7.2 Error Response Template | |
| ``` | |
| β **θηε€±ζ**: {url} | |
| π **ι―θͺ€εε **: {error_message} | |
| π‘ **ε»Ίθ°**: θ«ζͺ’ζ₯ηΆ²εζ―ε¦ζ£η’Ίζη¨εΎε試 | |
| ``` | |
| ## 8. Performance Requirements | |
| ### 8.1 Response Time | |
| - URL processing: < 30 seconds | |
| - Simple pages: < 10 seconds | |
| - Complex pages: < 20 seconds | |
| ### 8.2 Throughput | |
| - Support 100 concurrent requests | |
| - Handle 1000 URLs per hour | |
| - Rate limiting: 5 requests per user per minute | |
| ## 9. Security Considerations | |
| ### 9.1 Input Validation | |
| - URL sanitization and validation | |
| - Content length limits | |
| - Malicious URL detection | |
| ### 9.2 API Security | |
| - Secure storage of API keys | |
| - Rate limiting implementation | |
| - Request logging and monitoring | |
| ## 10. Deployment Strategy | |
| ### 10.1 Environment Setup | |
| - Development: Local Docker containers | |
| - Staging: Cloud-based testing environment | |
| - Production: Container orchestration (Kubernetes/ECS) | |
| ### 10.2 Configuration Management | |
| ```python | |
| # Environment variables | |
| SLACK_BOT_TOKEN = os.getenv('SLACK_BOT_TOKEN') | |
| SLACK_SIGNING_SECRET = os.getenv('SLACK_SIGNING_SECRET') | |
| AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT') | |
| AZURE_OPENAI_API_KEY = os.getenv('AZURE_OPENAI_API_KEY') | |
| AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME') | |
| AZURE_OPENAI_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION') | |
| ``` | |
| ## 11. Testing Strategy | |
| ### 11.1 Unit Tests | |
| - URL extraction logic | |
| - Content parsing functions | |
| - AI API integration | |
| - Response formatting | |
| ### 11.2 Integration Tests | |
| - End-to-end Slack workflow | |
| - External API interactions | |
| - Error handling scenarios | |
| ## 12. Monitoring & Logging | |
| ### 12.1 Metrics to Track | |
| - Response times | |
| - Success/failure rates | |
| - API usage costs | |
| - User engagement | |
| ### 12.2 Logging Requirements | |
| - All URL processing attempts | |
| - API call results | |
| - Error occurrences | |
| - Performance metrics | |
| ## 13. Future Enhancements | |
| ### 13.1 Phase 2 Features | |
| - Multiple language support | |
| - Custom summary length options | |
| - Content caching for repeated URLs | |
| - User preference settings | |
| ### 13.2 Advanced Features | |
| - Batch URL processing | |
| - Scheduled summary delivery | |
| - Content categorization | |
| - Analytics dashboard | |
| ## 14. Acceptance Criteria | |
| ### 14.1 MVP Success Criteria | |
| - [x] Bot responds to URLs in Slack messages | |
| - [x] Successfully extracts content from common websites | |
| - [x] Generates coherent Chinese summaries | |
| - [x] Posts formatted responses to correct channels | |
| - [x] Handles basic error scenarios gracefully | |
| ### 14.2 Quality Gates | |
| - 95% uptime requirement | |
| - <20 second average response time | |
| - <5% error rate for valid URLs | |
| - Positive user feedback (>4.0/5.0) | |
| ## 15. Timeline & Milestones | |
| | Phase | Duration | Deliverables | | |
| |-------|----------|-------------| | |
| | Setup & Planning | 1 week | Project setup, Slack app creation | | |
| | Core Development | 2 weeks | Basic URL processing pipeline | | |
| | AI Integration | 1 week | Summarization and translation | | |
| | Testing & Debugging | 1 week | Unit tests, integration tests | | |
| | Deployment | 1 week | Production deployment, monitoring | | |
| | **Total** | **6 weeks** | **Production-ready MVP** | |