# Slack URL Summarizer Bot - Technical Specification

## 1. Project Overview

### 1.1 Purpose
Develop a Slack bot that automatically detects URLs in messages, extracts content from those URLs, generates summaries, translates them to Traditional Chinese, and posts the results back to the channel.

### 1.2 Key Features
- Automatic URL detection in Slack messages
- Web content extraction and parsing
- Content summarization using AI
- Translation to Traditional Chinese
- Automated response posting to Slack channels

## 2. Functional Requirements

### 2.1 Core Functionality
| Feature | Description | Priority |
|---------|-------------|----------|
| URL Detection | Detect and extract URLs from Slack messages | High |
| Content Extraction | Extract main content from web pages | High |
| Content Summarization | Generate concise summaries of extracted content | High |
| Translation | Translate summaries to Traditional Chinese | High |
| Slack Integration | Post results back to originating channel | High |

### 2.2 User Stories
- **As a Slack user**, I want to paste a URL and automatically receive a Chinese summary, so I can quickly understand the content without reading the full article
- **As a team member**, I want summaries posted in the same channel, so everyone can benefit from the content
- **As a user**, I want the bot to handle multiple URLs in one message, so I can share multiple resources efficiently

## 3. Technical Architecture

### 3.1 System Architecture
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Slack App     │    │   Web Server    │    │   AI Services   │
│                 │    │   (FastAPI)     │    │                 │
│ • Events API    │◄──►│ • URL Extract   │◄──►│ • OpenAI API    │
│ • Bot Token     │    │ • Content Parse │    │ • Summarization │
│ • Webhooks      │    │ • Response Send │    │ • Translation   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### 3.2 Technology Stack
- **Backend Framework**: Python + FastAPI
- **Slack Integration**: Slack Bolt SDK for Python
- **Content Extraction**: newspaper3k / readability-lxml
- **AI Services**: Azure OpenAI API
- **HTTP Client**: httpx
- **Deployment**: Docker + Cloud hosting (AWS/GCP/Azure)
- **Database**: Redis (for caching) - Optional

## 4. Implementation Details

### 4.1 Slack Bot Setup
#### Required Scopes
- `chat:write` - Post messages to channels
- `channels:read` - Read channel information
- `app_mentions:read` - Read mentions
- `channels:history` - Read channel messages

#### Event Subscriptions
- `message.channels` - Listen to channel messages
- `app_mention` - Listen to bot mentions

### 4.2 Core Components

#### 4.2.1 URL Detection Module
```python
import re

def extract_urls(text: str) -> List[str]:
    """Extract all URLs from message text"""
    pattern = r'https?://[^\s<>"{\[\]|\\^`]+'
    return re.findall(pattern, text)
```

#### 4.2.2 Content Extraction Module
```python
from newspaper import Article

def extract_content(url: str) -> dict:
    """Extract main content from URL"""
    try:
        article = Article(url)
        article.download()
        article.parse()
        return {
            'title': article.title,
            'text': article.text,
            'authors': article.authors,
            'publish_date': article.publish_date
        }
    except Exception as e:
        return {'error': str(e)}
```

#### 4.2.3 AI Processing Module
```python
import httpx
import os
from dotenv import load_dotenv

load_dotenv()

def summarize_and_translate(text: str) -> str:
    """Summarize content and translate to Traditional Chinese using Azure OpenAI"""
    url = f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/{os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}/chat/completions?api-version={os.getenv('AZURE_OPENAI_API_VERSION')}"
    headers = {
        "Content-Type": "application/json",
        "api-key": os.getenv("AZURE_OPENAI_API_KEY"),
    }
    body = {
        "messages": [
            {
                "role": "user",
                "content": f"請將以下文章摘要成 3–5 句，並翻譯為繁體中文：\n\n{text}"
            }
        ],
        "temperature": 0.7
    }

    resp = httpx.post(url, headers=headers, json=body)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```

## 5. API Specifications

### 5.1 Slack Event Handler
```python
@app.event("message")
def handle_message(event, say):
    """Handle incoming Slack messages"""
    # Extract URLs from message
    urls = extract_urls(event.get('text', ''))
    
    if not urls:
        return
    
    # Process each URL
    for url in urls:
        process_url_async(url, event['channel'], say)
```

### 5.2 URL Processing Pipeline
```python
async def process_url_async(url: str, channel: str, say):
    """Asynchronous URL processing pipeline"""
    try:
        # Step 1: Extract content
        content = extract_content(url)
        
        # Step 2: Summarize and translate
        summary = summarize_and_translate(content['text'])
        
        # Step 3: Format and send response
        response = format_response(url, content['title'], summary)
        say(channel=channel, text=response)
        
    except Exception as e:
        error_message = f"❌ 處理網址時發生錯誤: {url}"
        say(channel=channel, text=error_message)
```

## 6. Error Handling & Edge Cases

### 6.1 URL Validation
- Invalid URLs (malformed, unreachable)
- Protected content (login required, paywalls)
- Unsupported content types (PDFs, images, videos)
- Rate limiting from target websites

### 6.2 Content Processing
- Empty or insufficient content
- Non-text content (images, videos)
- Multiple languages in source content
- Extremely long articles (token limits)

### 6.3 API Failures
- Azure OpenAI API rate limits
- Network timeouts
- Slack API failures
- Service unavailability

## 7. Response Format

### 7.1 Successful Response Template
```
🔗 **原始網址**: {url}
📰 **標題**: {title}
📝 **中文摘要**:
{summary}

---
⏰ 處理時間: {timestamp}
```

### 7.2 Error Response Template
```
❌ **處理失敗**: {url}
🔍 **錯誤原因**: {error_message}
💡 **建議**: 請檢查網址是否正確或稍後再試
```

## 8. Performance Requirements

### 8.1 Response Time
- URL processing: < 30 seconds
- Simple pages: < 10 seconds
- Complex pages: < 20 seconds

### 8.2 Throughput
- Support 100 concurrent requests
- Handle 1000 URLs per hour
- Rate limiting: 5 requests per user per minute

## 9. Security Considerations

### 9.1 Input Validation
- URL sanitization and validation
- Content length limits
- Malicious URL detection

### 9.2 API Security
- Secure storage of API keys
- Rate limiting implementation
- Request logging and monitoring

## 10. Deployment Strategy

### 10.1 Environment Setup
- Development: Local Docker containers
- Staging: Cloud-based testing environment
- Production: Container orchestration (Kubernetes/ECS)

### 10.2 Configuration Management
```python
# Environment variables
SLACK_BOT_TOKEN = os.getenv('SLACK_BOT_TOKEN')
SLACK_SIGNING_SECRET = os.getenv('SLACK_SIGNING_SECRET')
AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_API_KEY = os.getenv('AZURE_OPENAI_API_KEY')
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
AZURE_OPENAI_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')
```

## 11. Testing Strategy

### 11.1 Unit Tests
- URL extraction logic
- Content parsing functions
- AI API integration
- Response formatting

### 11.2 Integration Tests
- End-to-end Slack workflow
- External API interactions
- Error handling scenarios

## 12. Monitoring & Logging

### 12.1 Metrics to Track
- Response times
- Success/failure rates
- API usage costs
- User engagement

### 12.2 Logging Requirements
- All URL processing attempts
- API call results
- Error occurrences
- Performance metrics

## 13. Future Enhancements

### 13.1 Phase 2 Features
- Multiple language support
- Custom summary length options
- Content caching for repeated URLs
- User preference settings

### 13.2 Advanced Features
- Batch URL processing
- Scheduled summary delivery
- Content categorization
- Analytics dashboard

## 14. Acceptance Criteria

### 14.1 MVP Success Criteria
- [x] Bot responds to URLs in Slack messages
- [x] Successfully extracts content from common websites
- [x] Generates coherent Chinese summaries
- [x] Posts formatted responses to correct channels
- [x] Handles basic error scenarios gracefully

### 14.2 Quality Gates
- 95% uptime requirement
- <20 second average response time
- <5% error rate for valid URLs
- Positive user feedback (>4.0/5.0)

## 15. Timeline & Milestones

| Phase | Duration | Deliverables |
|-------|----------|-------------|
| Setup & Planning | 1 week | Project setup, Slack app creation |
| Core Development | 2 weeks | Basic URL processing pipeline |
| AI Integration | 1 week | Summarization and translation |
| Testing & Debugging | 1 week | Unit tests, integration tests |
| Deployment | 1 week | Production deployment, monitoring |
| **Total** | **6 weeks** | **Production-ready MVP** |