slack_url_bot / PRD.md
tonebeta's picture
feat: Set up initial project structure for slack_url_bot including core application, dependencies, documentation, and build automation.
6468574

Slack URL Summarizer Bot - Technical Specification

1. Project Overview

1.1 Purpose

Develop a Slack bot that automatically detects URLs in messages, extracts content from those URLs, generates summaries, translates them to Traditional Chinese, and posts the results back to the channel.

1.2 Key Features

  • Automatic URL detection in Slack messages
  • Web content extraction and parsing
  • Content summarization using AI
  • Translation to Traditional Chinese
  • Automated response posting to Slack channels

2. Functional Requirements

2.1 Core Functionality

Feature Description Priority
URL Detection Detect and extract URLs from Slack messages High
Content Extraction Extract main content from web pages High
Content Summarization Generate concise summaries of extracted content High
Translation Translate summaries to Traditional Chinese High
Slack Integration Post results back to originating channel High

2.2 User Stories

  • As a Slack user, I want to paste a URL and automatically receive a Chinese summary, so I can quickly understand the content without reading the full article
  • As a team member, I want summaries posted in the same channel, so everyone can benefit from the content
  • As a user, I want the bot to handle multiple URLs in one message, so I can share multiple resources efficiently

3. Technical Architecture

3.1 System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Slack App     β”‚    β”‚   Web Server    β”‚    β”‚   AI Services   β”‚
β”‚                 β”‚    β”‚   (FastAPI)     β”‚    β”‚                 β”‚
β”‚ β€’ Events API    │◄──►│ β€’ URL Extract   │◄──►│ β€’ OpenAI API    β”‚
β”‚ β€’ Bot Token     β”‚    β”‚ β€’ Content Parse β”‚    β”‚ β€’ Summarization β”‚
β”‚ β€’ Webhooks      β”‚    β”‚ β€’ Response Send β”‚    β”‚ β€’ Translation   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3.2 Technology Stack

  • Backend Framework: Python + FastAPI
  • Slack Integration: Slack Bolt SDK for Python
  • Content Extraction: newspaper3k / readability-lxml
  • AI Services: Azure OpenAI API
  • HTTP Client: httpx
  • Deployment: Docker + Cloud hosting (AWS/GCP/Azure)
  • Database: Redis (for caching) - Optional

4. Implementation Details

4.1 Slack Bot Setup

Required Scopes

  • chat:write - Post messages to channels
  • channels:read - Read channel information
  • app_mentions:read - Read mentions
  • channels:history - Read channel messages

Event Subscriptions

  • message.channels - Listen to channel messages
  • app_mention - Listen to bot mentions

4.2 Core Components

4.2.1 URL Detection Module

import re

def extract_urls(text: str) -> List[str]:
    """Extract all URLs from message text"""
    pattern = r'https?://[^\s<>"{\[\]|\\^`]+'
    return re.findall(pattern, text)

4.2.2 Content Extraction Module

from newspaper import Article

def extract_content(url: str) -> dict:
    """Extract main content from URL"""
    try:
        article = Article(url)
        article.download()
        article.parse()
        return {
            'title': article.title,
            'text': article.text,
            'authors': article.authors,
            'publish_date': article.publish_date
        }
    except Exception as e:
        return {'error': str(e)}

4.2.3 AI Processing Module

import httpx
import os
from dotenv import load_dotenv

load_dotenv()

def summarize_and_translate(text: str) -> str:
    """Summarize content and translate to Traditional Chinese using Azure OpenAI"""
    url = f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/{os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}/chat/completions?api-version={os.getenv('AZURE_OPENAI_API_VERSION')}"
    headers = {
        "Content-Type": "application/json",
        "api-key": os.getenv("AZURE_OPENAI_API_KEY"),
    }
    body = {
        "messages": [
            {
                "role": "user",
                "content": f"θ«‹ε°‡δ»₯δΈ‹ζ–‡η« ζ‘˜θ¦ζˆ 3–5 ε₯οΌŒδΈ¦ηΏ»θ­―η‚ΊηΉι«”δΈ­ζ–‡οΌš\n\n{text}"
            }
        ],
        "temperature": 0.7
    }

    resp = httpx.post(url, headers=headers, json=body)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()

5. API Specifications

5.1 Slack Event Handler

@app.event("message")
def handle_message(event, say):
    """Handle incoming Slack messages"""
    # Extract URLs from message
    urls = extract_urls(event.get('text', ''))
    
    if not urls:
        return
    
    # Process each URL
    for url in urls:
        process_url_async(url, event['channel'], say)

5.2 URL Processing Pipeline

async def process_url_async(url: str, channel: str, say):
    """Asynchronous URL processing pipeline"""
    try:
        # Step 1: Extract content
        content = extract_content(url)
        
        # Step 2: Summarize and translate
        summary = summarize_and_translate(content['text'])
        
        # Step 3: Format and send response
        response = format_response(url, content['title'], summary)
        say(channel=channel, text=response)
        
    except Exception as e:
        error_message = f"❌ θ™•η†ηΆ²ε€ζ™‚η™Όη”ŸιŒ―θͺ€: {url}"
        say(channel=channel, text=error_message)

6. Error Handling & Edge Cases

6.1 URL Validation

  • Invalid URLs (malformed, unreachable)
  • Protected content (login required, paywalls)
  • Unsupported content types (PDFs, images, videos)
  • Rate limiting from target websites

6.2 Content Processing

  • Empty or insufficient content
  • Non-text content (images, videos)
  • Multiple languages in source content
  • Extremely long articles (token limits)

6.3 API Failures

  • Azure OpenAI API rate limits
  • Network timeouts
  • Slack API failures
  • Service unavailability

7. Response Format

7.1 Successful Response Template

πŸ”— **εŽŸε§‹ηΆ²ε€**: {url}
πŸ“° **ζ¨™ι‘Œ**: {title}
πŸ“ **δΈ­ζ–‡ζ‘˜θ¦**:
{summary}

---
⏰ 處理時間: {timestamp}

7.2 Error Response Template

❌ **處理倱敗**: {url}
πŸ” **錯θͺ€εŽŸε› **: {error_message}
πŸ’‘ **ε»Ίθ­°**: θ«‹ζͺ’ζŸ₯ηΆ²ε€ζ˜―ε¦ζ­£η’Ίζˆ–η¨εΎŒε†θ©¦

8. Performance Requirements

8.1 Response Time

  • URL processing: < 30 seconds
  • Simple pages: < 10 seconds
  • Complex pages: < 20 seconds

8.2 Throughput

  • Support 100 concurrent requests
  • Handle 1000 URLs per hour
  • Rate limiting: 5 requests per user per minute

9. Security Considerations

9.1 Input Validation

  • URL sanitization and validation
  • Content length limits
  • Malicious URL detection

9.2 API Security

  • Secure storage of API keys
  • Rate limiting implementation
  • Request logging and monitoring

10. Deployment Strategy

10.1 Environment Setup

  • Development: Local Docker containers
  • Staging: Cloud-based testing environment
  • Production: Container orchestration (Kubernetes/ECS)

10.2 Configuration Management

# Environment variables
SLACK_BOT_TOKEN = os.getenv('SLACK_BOT_TOKEN')
SLACK_SIGNING_SECRET = os.getenv('SLACK_SIGNING_SECRET')
AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_API_KEY = os.getenv('AZURE_OPENAI_API_KEY')
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
AZURE_OPENAI_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')

11. Testing Strategy

11.1 Unit Tests

  • URL extraction logic
  • Content parsing functions
  • AI API integration
  • Response formatting

11.2 Integration Tests

  • End-to-end Slack workflow
  • External API interactions
  • Error handling scenarios

12. Monitoring & Logging

12.1 Metrics to Track

  • Response times
  • Success/failure rates
  • API usage costs
  • User engagement

12.2 Logging Requirements

  • All URL processing attempts
  • API call results
  • Error occurrences
  • Performance metrics

13. Future Enhancements

13.1 Phase 2 Features

  • Multiple language support
  • Custom summary length options
  • Content caching for repeated URLs
  • User preference settings

13.2 Advanced Features

  • Batch URL processing
  • Scheduled summary delivery
  • Content categorization
  • Analytics dashboard

14. Acceptance Criteria

14.1 MVP Success Criteria

  • Bot responds to URLs in Slack messages
  • Successfully extracts content from common websites
  • Generates coherent Chinese summaries
  • Posts formatted responses to correct channels
  • Handles basic error scenarios gracefully

14.2 Quality Gates

  • 95% uptime requirement
  • <20 second average response time
  • <5% error rate for valid URLs
  • Positive user feedback (>4.0/5.0)

15. Timeline & Milestones

Phase Duration Deliverables
Setup & Planning 1 week Project setup, Slack app creation
Core Development 2 weeks Basic URL processing pipeline
AI Integration 1 week Summarization and translation
Testing & Debugging 1 week Unit tests, integration tests
Deployment 1 week Production deployment, monitoring
Total 6 weeks Production-ready MVP