# Slack URL Summarizer Bot - Technical Specification
## 1. Project Overview
### 1.1 Purpose
Develop a Slack bot that automatically detects URLs in messages, extracts content from those URLs, generates summaries, translates them to Traditional Chinese, and posts the results back to the channel.
### 1.2 Key Features
- Automatic URL detection in Slack messages
- Web content extraction and parsing
- Content summarization using AI
- Translation to Traditional Chinese
- Automated response posting to Slack channels
## 2. Functional Requirements
### 2.1 Core Functionality
| Feature | Description | Priority |
|---------|-------------|----------|
| URL Detection | Detect and extract URLs from Slack messages | High |
| Content Extraction | Extract main content from web pages | High |
| Content Summarization | Generate concise summaries of extracted content | High |
| Translation | Translate summaries to Traditional Chinese | High |
| Slack Integration | Post results back to originating channel | High |
### 2.2 User Stories
- **As a Slack user**, I want to paste a URL and automatically receive a Chinese summary, so I can quickly understand the content without reading the full article
- **As a team member**, I want summaries posted in the same channel, so everyone can benefit from the content
- **As a user**, I want the bot to handle multiple URLs in one message, so I can share multiple resources efficiently
## 3. Technical Architecture
### 3.1 System Architecture
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Slack App     │    │   Web Server    │    │   AI Services   │
│                 │    │   (FastAPI)     │    │                 │
│ • Events API    │◄──►│ • URL Extract   │◄──►│ • OpenAI API    │
│ • Bot Token     │    │ • Content Parse │    │ • Summarization │
│ • Webhooks      │    │ • Response Send │    │ • Translation   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
### 3.2 Technology Stack
- **Backend Framework**: Python + FastAPI
- **Slack Integration**: Slack Bolt SDK for Python
- **Content Extraction**: newspaper3k / readability-lxml
- **AI Services**: Azure OpenAI API
- **HTTP Client**: httpx
- **Deployment**: Docker + Cloud hosting (AWS/GCP/Azure)
- **Database**: Redis (for caching) - Optional
## 4. Implementation Details
### 4.1 Slack Bot Setup
#### Required Scopes
- `chat:write` - Post messages to channels
- `channels:read` - Read channel information
- `app_mentions:read` - Read mentions
- `channels:history` - Read channel messages
#### Event Subscriptions
- `message.channels` - Listen to channel messages
- `app_mention` - Listen to bot mentions
### 4.2 Core Components
#### 4.2.1 URL Detection Module
```python
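import re
from typing import List


def extract_urls(text: str) -> List[str]:
    """Extract all URLs from message text"""
    # Stop at whitespace and URL-unsafe characters, including Slack's <, >, | link delimiters
    pattern = r'https?://[^\s<>"{}\[\]|\\^`]+'
    return re.findall(pattern, text)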
```
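Slack delivers links in its own markup, for example `<https://example.com|example.com>`, and the same link can appear more than once in a message. The following is a minimal sketch of how duplicates and trailing punctuation might be handled, using the same regular-expression approach as above. `extract_unique_urls` and `URL_PATTERN` are hypothetical names, not part of the project.

```python
import re
from typing import List

# Same idea as the pattern above: stop at whitespace and at the characters
# Slack uses to wrap links (<, >, |).
URL_PATTERN = re.compile(r'https?://[^\s<>"{}\[\]|\\^`]+')


def extract_unique_urls(text: str) -> List[str]:
    """Return URLs in order of first appearance, without duplicates."""
    seen = set()
    unique = []
    for url in URL_PATTERN.findall(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = url.rstrip('.,;:!?)')
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

Stripping a trailing `)` is a trade-off: it cleans up links written inside parentheses but would also clip a URL that legitimately ends with one.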
#### 4.2.2 Content Extraction Module
```python
from newspaper import Article


def extract_content(url: str) -> dict:
    """Extract main content from URL"""
    try:
        article = Article(url)
        article.download()  # Fetch the raw HTML
        article.parse()     # Pull out title, body text, and metadata
        return {
            'title': article.title,
            'text': article.text,
            'authors': article.authors,
            'publish_date': article.publish_date
        }
    except Exception as e:
        # Callers must check for the 'error' key before reading 'text'
        return {'error': str(e)}
```
#### 4.2.3 AI Processing Module
```python
import os

import httpx
from dotenv import load_dotenv

load_dotenv()  # Read the AZURE_OPENAI_* settings from a local .env file


def summarize_and_translate(text: str) -> str:
    """Summarize content and translate to Traditional Chinese using Azure OpenAI"""
    url = (
        f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/"
        f"{os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}/chat/completions"
        f"?api-version={os.getenv('AZURE_OPENAI_API_VERSION')}"
    )
    headers = {
        "Content-Type": "application/json",
        "api-key": os.getenv("AZURE_OPENAI_API_KEY"),
    }
    body = {
        "messages": [
            {
                "role": "user",
                # Prompt: "Summarize the following article in 3–5 sentences
                # and translate it into Traditional Chinese:"
                "content": f"請將以下文章摘要成 3–5 句,並翻譯為繁體中文:\n\n{text}"
            }
        ],
        "temperature": 0.7
    }
    # Model calls often take longer than httpx's 5-second default timeout
    resp = httpx.post(url, headers=headers, json=body, timeout=30.0)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```
## 5. API Specifications
### 5.1 Slack Event Handler
```python
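# Assumes `app` is a slack_bolt.async_app.AsyncApp, so handlers and `say` are awaitable
@app.event("message")
async def handle_message(event, say):
    """Handle incoming Slack messages"""
    # Ignore bot posts so a summary (which repeats the URL) cannot trigger another run
    if event.get('bot_id'):
        return

    # Extract URLs from message
    urls = extract_urls(event.get('text', ''))
    if not urls:
        return

    # Process each URL in turn
    for url in urls:
        await process_url_async(url, event['channel'], say)
```
### 5.2 URL Processing Pipeline
```python
import asyncio


async def process_url_async(url: str, channel: str, say):
    """Asynchronous URL processing pipeline"""
    try:
        # Step 1: Extract content in a worker thread (newspaper3k is blocking)
        content = await asyncio.to_thread(extract_content, url)
        if 'error' in content:
            raise RuntimeError(content['error'])

        # Step 2: Summarize and translate (blocking HTTP call, also off the event loop)
        summary = await asyncio.to_thread(summarize_and_translate, content['text'])

        # Step 3: Format and send response
        response = format_response(url, content['title'], summary)
        await say(channel=channel, text=response)
    except Exception:
        # Message: "An error occurred while processing the URL"
        error_message = f"❌ 處理網址時發生錯誤: {url}"
        await say(channel=channel, text=error_message)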
```
## 6. Error Handling & Edge Cases
### 6.1 URL Validation
- Invalid URLs (malformed, unreachable)
- Protected content (login required, paywalls)
- Unsupported content types (PDFs, images, videos)
- Rate limiting from target websites
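One way to screen out unsupported content types before parsing is to inspect the `Content-Type` response header, for instance from a HEAD request made with httpx. The sketch below covers only the header check. `SUPPORTED_TYPES` and `is_supported_content_type` are hypothetical names, and the set of accepted types is an assumption.

```python
# Media types treated as parseable web pages (an assumed list).
SUPPORTED_TYPES = {"text/html", "application/xhtml+xml"}


def is_supported_content_type(header_value: str) -> bool:
    """Return True when a Content-Type header describes an HTML page."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in SUPPORTED_TYPES
```

A missing or empty header is treated as unsupported here; a more lenient policy could fall back to attempting extraction.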
### 6.2 Content Processing
- Empty or insufficient content
- Non-text content (images, videos)
- Multiple languages in source content
- Extremely long articles (token limits)
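For extremely long articles, a simple mitigation is to trim the text before it is sent to the model. The sketch below works on characters as a rough stand-in for tokens. The function name `truncate_for_model` and the 12,000-character default are assumptions, not values from this specification.

```python
def truncate_for_model(text: str, max_chars: int = 12000) -> str:
    """Trim text to a character budget, cutting at a paragraph or sentence end if possible."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Prefer the last paragraph break, then the last sentence end,
    # but never give up more than half of the budget to find one.
    for separator in ("\n\n", ". "):
        cut = clipped.rfind(separator)
        if cut > max_chars // 2:
            return clipped[:cut + len(separator)].rstrip()
    return clipped
```

An exact token count would need the tokenizer of the deployed model; a character budget is only an approximation.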
### 6.3 API Failures
- Azure OpenAI API rate limits
- Network timeouts
- Slack API failures
- Service unavailability
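Transient failures such as rate limits and network timeouts are commonly handled by retrying with exponential backoff. The following is a minimal sketch of that pattern, not the project's implementation. `call_with_retries` and all of its defaults (three attempts, one-second base delay, 10% jitter) are assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff and jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # 1x, 2x, 4x ... the base delay, plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing a narrow `retry_on` tuple (for example only timeout errors) keeps permanent failures, such as an invalid API key, from being retried pointlessly.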
## 7. Response Format
### 7.1 Successful Response Template
The labels are in Traditional Chinese and read, in order: "Original URL", "Title", "Chinese summary", and "Processed at".
```
🔗 **原始網址**: {url}
📰 **標題**: {title}
📝 **中文摘要**:
{summary}
---
⏰ 處理時間: {timestamp}
```
### 7.2 Error Response Template
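The labels read "Processing failed", "Cause of error", and "Suggestion"; the suggestion text means "Please check that the URL is correct, or try again later".
```
❌ **處理失敗**: {url}
🔍 **錯誤原因**: {error_message}
💡 **建議**: 請檢查網址是否正確或稍後再試
```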
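The two templates in this section could be filled in by small helpers like the ones below. This is a sketch under assumptions: `format_response` matches the call made in the processing pipeline, but its body, the `format_error` helper, the optional `now` parameter, and the timestamp format are all hypothetical.

```python
from datetime import datetime
from typing import Optional


def format_response(url: str, title: str, summary: str,
                    now: Optional[datetime] = None) -> str:
    """Fill in the successful-response template."""
    # Timestamp format is an assumption; the spec does not define one.
    timestamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"🔗 **原始網址**: {url}\n"
        f"📰 **標題**: {title}\n"
        f"📝 **中文摘要**:\n"
        f"{summary}\n"
        f"---\n"
        f"⏰ 處理時間: {timestamp}"
    )


def format_error(url: str, error_message: str) -> str:
    """Fill in the error-response template."""
    return (
        f"❌ **處理失敗**: {url}\n"
        f"🔍 **錯誤原因**: {error_message}\n"
        f"💡 **建議**: 請檢查網址是否正確或稍後再試"
    )
```

Slack's mrkdwn marks bold text with single asterisks, so the `**` markers may need adjusting before posting; the sketch keeps the templates as written.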
## 8. Performance Requirements
### 8.1 Response Time
- End-to-end processing per URL: < 30 seconds (upper bound)
- Simple pages: < 10 seconds
- Complex pages: < 20 seconds
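A time budget like the 30-second ceiling can be enforced with `asyncio.wait_for`. The sketch below is one possible shape; `run_with_deadline` is a hypothetical helper, and returning `None` on timeout is a design assumption.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def run_with_deadline(work: Awaitable[T], seconds: float = 30.0) -> Optional[T]:
    """Await work, returning None if it does not finish within the deadline."""
    try:
        return await asyncio.wait_for(work, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the awaited task at this point.
        return None
```

Work that has been handed to a thread keeps running after the deadline; only the wait is abandoned, so blocking calls still need their own timeouts.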
### 8.2 Throughput
- Support 100 concurrent requests
- Handle 1000 URLs per hour
- Rate limiting: 5 requests per user per minute
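The per-user limit of 5 requests per minute can be illustrated with a sliding-window counter. This is an in-process sketch only; `UserRateLimiter` is a hypothetical class, and a deployment with several replicas would need a shared store such as the optional Redis instance.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class UserRateLimiter:
    """Allow at most `limit` requests per user within a sliding window."""

    def __init__(self, limit: int = 5, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Forget requests that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

The `now` parameter exists so the behavior can be tested without waiting; production callers would omit it.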
## 9. Security Considerations
### 9.1 Input Validation
- URL sanitization and validation
- Content length limits
- Malicious URL detection
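Because the bot fetches whatever URL a user posts, input validation should at least reject non-HTTP schemes and addresses that point at internal hosts. The sketch below uses only the standard library. `is_safe_url` and `MAX_URL_LENGTH` are hypothetical, and the check does not resolve DNS, so a hostname that resolves to a private address would still pass.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048  # Assumed limit; the spec does not give a number.


def is_safe_url(url: str) -> bool:
    """Reject URLs that are malformed or point at obviously internal hosts."""
    if len(url) > MAX_URL_LENGTH:
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # Malformed, e.g. an unclosed IPv6 bracket.
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # A hostname, not an IP literal.
    # Loopback, private, and link-local literals are not globally routable.
    return address.is_global
```

Detecting genuinely malicious destinations (phishing, malware) would require an external reputation service, which is outside the scope of this sketch.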
### 9.2 API Security
- Secure storage of API keys
- Rate limiting implementation
- Request logging and monitoring
## 10. Deployment Strategy
### 10.1 Environment Setup
- Development: Local Docker containers
- Staging: Cloud-based testing environment
- Production: Container orchestration (Kubernetes/ECS)
### 10.2 Configuration Management
```python
import os

# Environment variables
SLACK_BOT_TOKEN = os.getenv('SLACK_BOT_TOKEN')
SLACK_SIGNING_SECRET = os.getenv('SLACK_SIGNING_SECRET')
AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_API_KEY = os.getenv('AZURE_OPENAI_API_KEY')
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
AZURE_OPENAI_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')
```
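Since `os.getenv` returns `None` for an unset variable, a missing setting would otherwise surface only when the first request fails. A fail-fast check at startup might look like the sketch below. `load_config` and `REQUIRED_VARS` are hypothetical names; the variable names themselves are the ones listed above.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = (
    "SLACK_BOT_TOKEN",
    "SLACK_SIGNING_SECRET",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_VERSION",
)


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise listing every missing one."""
    # Unset and empty values are both treated as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Accepting any mapping rather than reading `os.environ` directly makes the function easy to exercise in unit tests.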
## 11. Testing Strategy
### 11.1 Unit Tests
- URL extraction logic
- Content parsing functions
- AI API integration
- Response formatting
### 11.2 Integration Tests
- End-to-end Slack workflow
- External API interactions
- Error handling scenarios
## 12. Monitoring & Logging
### 12.1 Metrics to Track
- Response times
- Success/failure rates
- API usage costs
- User engagement
### 12.2 Logging Requirements
- All URL processing attempts
- API call results
- Error occurrences
- Performance metrics
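The logging requirements above can be met in one place by wrapping each processing attempt in a context manager that records outcome and duration. This is a sketch; the logger name `slack_url_bot`, the helper `log_url_processing`, and the `key=value` message layout are assumptions.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("slack_url_bot")


@contextmanager
def log_url_processing(url: str):
    """Log one processing attempt with its outcome and duration."""
    start = time.perf_counter()
    try:
        yield
    except Exception:
        elapsed = time.perf_counter() - start
        # logger.exception also records the traceback.
        logger.exception("url=%s status=error duration=%.2fs", url, elapsed)
        raise
    else:
        elapsed = time.perf_counter() - start
        logger.info("url=%s status=ok duration=%.2fs", url, elapsed)
```

A caller would wrap the pipeline body in `with log_url_processing(url):`, so every attempt produces exactly one log line whether it succeeds or fails.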
## 13. Future Enhancements
### 13.1 Phase 2 Features
- Multiple language support
- Custom summary length options
- Content caching for repeated URLs
- User preference settings
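Content caching for repeated URLs, listed above, can be prototyped in memory before moving to Redis. The sketch below stores each summary with a time-to-live; `SummaryCache` and the one-hour default are assumptions.

```python
import time
from typing import Dict, Optional, Tuple


class SummaryCache:
    """In-memory cache mapping a URL to its summary for a limited time."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, now: Optional[float] = None) -> Optional[str]:
        """Return the cached summary, or None if absent or expired."""
        now = time.monotonic() if now is None else now
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, summary = entry
        if now - stored_at >= self.ttl_seconds:
            del self._entries[url]  # Expired: drop it and report a miss.
            return None
        return summary

    def put(self, url: str, summary: str, now: Optional[float] = None) -> None:
        """Store or replace the summary for a URL."""
        now = time.monotonic() if now is None else now
        self._entries[url] = (now, summary)
```

The cache is unbounded and local to one process; a shared Redis cache with key expiry would cover both limits.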
### 13.2 Advanced Features
- Batch URL processing
- Scheduled summary delivery
- Content categorization
- Analytics dashboard
## 14. Acceptance Criteria
### 14.1 MVP Success Criteria
- [x] Bot responds to URLs in Slack messages
- [x] Successfully extracts content from common websites
- [x] Generates coherent Chinese summaries
- [x] Posts formatted responses to correct channels
- [x] Handles basic error scenarios gracefully
### 14.2 Quality Gates
- 95% uptime requirement
- <20 second average response time
- <5% error rate for valid URLs
- Positive user feedback (>4.0/5.0)
## 15. Timeline & Milestones
| Phase | Duration | Deliverables |
|-------|----------|-------------|
| Setup & Planning | 1 week | Project setup, Slack app creation |
| Core Development | 2 weeks | Basic URL processing pipeline |
| AI Integration | 1 week | Summarization and translation |
| Testing & Debugging | 1 week | Unit tests, integration tests |
| Deployment | 1 week | Production deployment, monitoring |
| **Total** | **6 weeks** | **Production-ready MVP** |