Spaces:
Running
Running
Slack URL Summarizer Bot - Technical Specification
1. Project Overview
1.1 Purpose
Develop a Slack bot that automatically detects URLs in messages, extracts content from those URLs, generates summaries, translates them to Traditional Chinese, and posts the results back to the channel.
1.2 Key Features
- Automatic URL detection in Slack messages
- Web content extraction and parsing
- Content summarization using AI
- Translation to Traditional Chinese
- Automated response posting to Slack channels
2. Functional Requirements
2.1 Core Functionality
| Feature | Description | Priority |
|---|---|---|
| URL Detection | Detect and extract URLs from Slack messages | High |
| Content Extraction | Extract main content from web pages | High |
| Content Summarization | Generate concise summaries of extracted content | High |
| Translation | Translate summaries to Traditional Chinese | High |
| Slack Integration | Post results back to originating channel | High |
2.2 User Stories
- As a Slack user, I want to paste a URL and automatically receive a Chinese summary, so I can quickly understand the content without reading the full article
- As a team member, I want summaries posted in the same channel, so everyone can benefit from the content
- As a user, I want the bot to handle multiple URLs in one message, so I can share multiple resources efficiently
3. Technical Architecture
3.1 System Architecture
βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
β Slack App β β Web Server β β AI Services β
β β β (FastAPI) β β β
β β’ Events API βββββΊβ β’ URL Extract βββββΊβ β’ OpenAI API β
β β’ Bot Token β β β’ Content Parse β β β’ Summarization β
β β’ Webhooks β β β’ Response Send β β β’ Translation β
βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
3.2 Technology Stack
- Backend Framework: Python + FastAPI
- Slack Integration: Slack Bolt SDK for Python
- Content Extraction: newspaper3k / readability-lxml
- AI Services: Azure OpenAI API
- HTTP Client: httpx
- Deployment: Docker + Cloud hosting (AWS/GCP/Azure)
- Database: Redis (for caching) - Optional
4. Implementation Details
4.1 Slack Bot Setup
Required Scopes
chat:write- Post messages to channelschannels:read- Read channel informationapp_mentions:read- Read mentionschannels:history- Read channel messages
Event Subscriptions
message.channels- Listen to channel messagesapp_mention- Listen to bot mentions
4.2 Core Components
4.2.1 URL Detection Module
import re
def extract_urls(text: str) -> List[str]:
"""Extract all URLs from message text"""
pattern = r'https?://[^\s<>"{\[\]|\\^`]+'
return re.findall(pattern, text)
4.2.2 Content Extraction Module
from newspaper import Article
def extract_content(url: str) -> dict:
"""Extract main content from URL"""
try:
article = Article(url)
article.download()
article.parse()
return {
'title': article.title,
'text': article.text,
'authors': article.authors,
'publish_date': article.publish_date
}
except Exception as e:
return {'error': str(e)}
4.2.3 AI Processing Module
import httpx
import os
from dotenv import load_dotenv
load_dotenv()
def summarize_and_translate(text: str) -> str:
"""Summarize content and translate to Traditional Chinese using Azure OpenAI"""
url = f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/{os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}/chat/completions?api-version={os.getenv('AZURE_OPENAI_API_VERSION')}"
headers = {
"Content-Type": "application/json",
"api-key": os.getenv("AZURE_OPENAI_API_KEY"),
}
body = {
"messages": [
{
"role": "user",
"content": f"θ«ε°δ»₯δΈζη« ζθ¦ζ 3β5 ε₯οΌδΈ¦ηΏ»θ―ηΊηΉι«δΈζοΌ\n\n{text}"
}
],
"temperature": 0.7
}
resp = httpx.post(url, headers=headers, json=body)
resp.raise_for_status()
return resp.json()["choices"][0]["message"]["content"].strip()
5. API Specifications
5.1 Slack Event Handler
@app.event("message")
def handle_message(event, say):
"""Handle incoming Slack messages"""
# Extract URLs from message
urls = extract_urls(event.get('text', ''))
if not urls:
return
# Process each URL
for url in urls:
process_url_async(url, event['channel'], say)
5.2 URL Processing Pipeline
async def process_url_async(url: str, channel: str, say):
"""Asynchronous URL processing pipeline"""
try:
# Step 1: Extract content
content = extract_content(url)
# Step 2: Summarize and translate
summary = summarize_and_translate(content['text'])
# Step 3: Format and send response
response = format_response(url, content['title'], summary)
say(channel=channel, text=response)
except Exception as e:
error_message = f"β θηηΆ²εζηΌηι―θͺ€: {url}"
say(channel=channel, text=error_message)
6. Error Handling & Edge Cases
6.1 URL Validation
- Invalid URLs (malformed, unreachable)
- Protected content (login required, paywalls)
- Unsupported content types (PDFs, images, videos)
- Rate limiting from target websites
6.2 Content Processing
- Empty or insufficient content
- Non-text content (images, videos)
- Multiple languages in source content
- Extremely long articles (token limits)
6.3 API Failures
- Azure OpenAI API rate limits
- Network timeouts
- Slack API failures
- Service unavailability
7. Response Format
7.1 Successful Response Template
π **εε§ηΆ²ε**: {url}
π° **ζ¨ι‘**: {title}
π **δΈζζθ¦**:
{summary}
---
β° θηζι: {timestamp}
7.2 Error Response Template
β **θηε€±ζ**: {url}
π **ι―θͺ€εε **: {error_message}
π‘ **ε»Ίθ°**: θ«ζͺ’ζ₯ηΆ²εζ―ε¦ζ£η’Ίζη¨εΎε試
8. Performance Requirements
8.1 Response Time
- URL processing: < 30 seconds
- Simple pages: < 10 seconds
- Complex pages: < 20 seconds
8.2 Throughput
- Support 100 concurrent requests
- Handle 1000 URLs per hour
- Rate limiting: 5 requests per user per minute
9. Security Considerations
9.1 Input Validation
- URL sanitization and validation
- Content length limits
- Malicious URL detection
9.2 API Security
- Secure storage of API keys
- Rate limiting implementation
- Request logging and monitoring
10. Deployment Strategy
10.1 Environment Setup
- Development: Local Docker containers
- Staging: Cloud-based testing environment
- Production: Container orchestration (Kubernetes/ECS)
10.2 Configuration Management
# Environment variables
SLACK_BOT_TOKEN = os.getenv('SLACK_BOT_TOKEN')
SLACK_SIGNING_SECRET = os.getenv('SLACK_SIGNING_SECRET')
AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_API_KEY = os.getenv('AZURE_OPENAI_API_KEY')
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
AZURE_OPENAI_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')
11. Testing Strategy
11.1 Unit Tests
- URL extraction logic
- Content parsing functions
- AI API integration
- Response formatting
11.2 Integration Tests
- End-to-end Slack workflow
- External API interactions
- Error handling scenarios
12. Monitoring & Logging
12.1 Metrics to Track
- Response times
- Success/failure rates
- API usage costs
- User engagement
12.2 Logging Requirements
- All URL processing attempts
- API call results
- Error occurrences
- Performance metrics
13. Future Enhancements
13.1 Phase 2 Features
- Multiple language support
- Custom summary length options
- Content caching for repeated URLs
- User preference settings
13.2 Advanced Features
- Batch URL processing
- Scheduled summary delivery
- Content categorization
- Analytics dashboard
14. Acceptance Criteria
14.1 MVP Success Criteria
- Bot responds to URLs in Slack messages
- Successfully extracts content from common websites
- Generates coherent Chinese summaries
- Posts formatted responses to correct channels
- Handles basic error scenarios gracefully
14.2 Quality Gates
- 95% uptime requirement
- <20 second average response time
- <5% error rate for valid URLs
- Positive user feedback (>4.0/5.0)
15. Timeline & Milestones
| Phase | Duration | Deliverables |
|---|---|---|
| Setup & Planning | 1 week | Project setup, Slack app creation |
| Core Development | 2 weeks | Basic URL processing pipeline |
| AI Integration | 1 week | Summarization and translation |
| Testing & Debugging | 1 week | Unit tests, integration tests |
| Deployment | 1 week | Production deployment, monitoring |
| Total | 6 weeks | Production-ready MVP |