Spaces:

fongci
/

slack_url_bot

Running

feat: Set up initial project structure for slack_url_bot including core application, dependencies, documentation, and build automation.

6468574 19 days ago

preview code

raw

history blame contribute delete

9.81 kB

Slack URL Summarizer Bot - Technical Specification

1. Project Overview

1.1 Purpose

Develop a Slack bot that automatically detects URLs in messages, extracts content from those URLs, generates summaries, translates them to Traditional Chinese, and posts the results back to the channel.

1.2 Key Features

Automatic URL detection in Slack messages
Web content extraction and parsing
Content summarization using AI
Translation to Traditional Chinese
Automated response posting to Slack channels

2. Functional Requirements

2.1 Core Functionality

Feature	Description	Priority
URL Detection	Detect and extract URLs from Slack messages	High
Content Extraction	Extract main content from web pages	High
Content Summarization	Generate concise summaries of extracted content	High
Translation	Translate summaries to Traditional Chinese	High
Slack Integration	Post results back to originating channel	High

2.2 User Stories

As a Slack user, I want to paste a URL and automatically receive a Chinese summary, so I can quickly understand the content without reading the full article
As a team member, I want summaries posted in the same channel, so everyone can benefit from the content
As a user, I want the bot to handle multiple URLs in one message, so I can share multiple resources efficiently

3. Technical Architecture

3.1 System Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Slack App     │    │   Web Server    │    │   AI Services   │
│                 │    │   (FastAPI)     │    │                 │
│ • Events API    │◄──►│ • URL Extract   │◄──►│ • OpenAI API    │
│ • Bot Token     │    │ • Content Parse │    │ • Summarization │
│ • Webhooks      │    │ • Response Send │    │ • Translation   │
└─────────────────┘    └─────────────────┘    └─────────────────┘

3.2 Technology Stack

Backend Framework: Python + FastAPI
Slack Integration: Slack Bolt SDK for Python
Content Extraction: newspaper3k / readability-lxml
AI Services: Azure OpenAI API
HTTP Client: httpx
Deployment: Docker + Cloud hosting (AWS/GCP/Azure)
Database: Redis (for caching) - Optional

4. Implementation Details

4.1 Slack Bot Setup

Required Scopes

chat:write - Post messages to channels
channels:read - Read channel information
app_mentions:read - Read mentions
channels:history - Read channel messages

Event Subscriptions

message.channels - Listen to channel messages
app_mention - Listen to bot mentions

4.2 Core Components

4.2.1 URL Detection Module

import re

def extract_urls(text: str) -> List[str]:
    """Extract all URLs from message text"""
    pattern = r'https?://[^\s<>"{\[\]|\\^`]+'
    return re.findall(pattern, text)

4.2.2 Content Extraction Module

from newspaper import Article

def extract_content(url: str) -> dict:
    """Extract main content from URL"""
    try:
        article = Article(url)
        article.download()
        article.parse()
        return {
            'title': article.title,
            'text': article.text,
            'authors': article.authors,
            'publish_date': article.publish_date
        }
    except Exception as e:
        return {'error': str(e)}

4.2.3 AI Processing Module

import httpx
import os
from dotenv import load_dotenv

load_dotenv()

def summarize_and_translate(text: str) -> str:
    """Summarize content and translate to Traditional Chinese using Azure OpenAI"""
    url = f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/{os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}/chat/completions?api-version={os.getenv('AZURE_OPENAI_API_VERSION')}"
    headers = {
        "Content-Type": "application/json",
        "api-key": os.getenv("AZURE_OPENAI_API_KEY"),
    }
    body = {
        "messages": [
            {
                "role": "user",
                "content": f"請將以下文章摘要成 3–5 句，並翻譯為繁體中文：\n\n{text}"
            }
        ],
        "temperature": 0.7
    }

    resp = httpx.post(url, headers=headers, json=body)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()

5. API Specifications

5.1 Slack Event Handler

@app.event("message")
def handle_message(event, say):
    """Handle incoming Slack messages"""
    # Extract URLs from message
    urls = extract_urls(event.get('text', ''))
    
    if not urls:
        return
    
    # Process each URL
    for url in urls:
        process_url_async(url, event['channel'], say)

5.2 URL Processing Pipeline

async def process_url_async(url: str, channel: str, say):
    """Asynchronous URL processing pipeline"""
    try:
        # Step 1: Extract content
        content = extract_content(url)
        
        # Step 2: Summarize and translate
        summary = summarize_and_translate(content['text'])
        
        # Step 3: Format and send response
        response = format_response(url, content['title'], summary)
        say(channel=channel, text=response)
        
    except Exception as e:
        error_message = f"❌ 處理網址時發生錯誤: {url}"
        say(channel=channel, text=error_message)

6. Error Handling & Edge Cases

6.1 URL Validation

Invalid URLs (malformed, unreachable)
Protected content (login required, paywalls)
Unsupported content types (PDFs, images, videos)
Rate limiting from target websites

6.2 Content Processing

Empty or insufficient content
Non-text content (images, videos)
Multiple languages in source content
Extremely long articles (token limits)

6.3 API Failures

Azure OpenAI API rate limits
Network timeouts
Slack API failures
Service unavailability

7. Response Format

7.1 Successful Response Template

🔗 **原始網址**: {url}
📰 **標題**: {title}
📝 **中文摘要**:
{summary}

---
⏰ 處理時間: {timestamp}

7.2 Error Response Template

❌ **處理失敗**: {url}
🔍 **錯誤原因**: {error_message}
💡 **建議**: 請檢查網址是否正確或稍後再試

8. Performance Requirements

8.1 Response Time

URL processing: < 30 seconds
Simple pages: < 10 seconds
Complex pages: < 20 seconds

8.2 Throughput

Support 100 concurrent requests
Handle 1000 URLs per hour
Rate limiting: 5 requests per user per minute

9. Security Considerations

9.1 Input Validation

URL sanitization and validation
Content length limits
Malicious URL detection

9.2 API Security

Secure storage of API keys
Rate limiting implementation
Request logging and monitoring

10. Deployment Strategy

10.1 Environment Setup

Development: Local Docker containers
Staging: Cloud-based testing environment
Production: Container orchestration (Kubernetes/ECS)

10.2 Configuration Management

# Environment variables
SLACK_BOT_TOKEN = os.getenv('SLACK_BOT_TOKEN')
SLACK_SIGNING_SECRET = os.getenv('SLACK_SIGNING_SECRET')
AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_API_KEY = os.getenv('AZURE_OPENAI_API_KEY')
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
AZURE_OPENAI_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')

11. Testing Strategy

11.1 Unit Tests

URL extraction logic
Content parsing functions
AI API integration
Response formatting

11.2 Integration Tests

End-to-end Slack workflow
External API interactions
Error handling scenarios

12. Monitoring & Logging

12.1 Metrics to Track

Response times
Success/failure rates
API usage costs
User engagement

12.2 Logging Requirements

All URL processing attempts
API call results
Error occurrences
Performance metrics

13. Future Enhancements

13.1 Phase 2 Features

Multiple language support
Custom summary length options
Content caching for repeated URLs
User preference settings

13.2 Advanced Features

Batch URL processing
Scheduled summary delivery
Content categorization
Analytics dashboard

14. Acceptance Criteria

14.1 MVP Success Criteria

Bot responds to URLs in Slack messages
Successfully extracts content from common websites
Generates coherent Chinese summaries
Posts formatted responses to correct channels
Handles basic error scenarios gracefully

14.2 Quality Gates

95% uptime requirement
<20 second average response time
<5% error rate for valid URLs
Positive user feedback (>4.0/5.0)

15. Timeline & Milestones

Phase	Duration	Deliverables
Setup & Planning	1 week	Project setup, Slack app creation
Core Development	2 weeks	Basic URL processing pipeline
AI Integration	1 week	Summarization and translation
Testing & Debugging	1 week	Unit tests, integration tests
Deployment	1 week	Production deployment, monitoring
Total	6 weeks	Production-ready MVP