Spaces:

fongci
/

slack_url_bot

Sleeping

feat: Set up initial project structure for slack_url_bot including core application, dependencies, documentation, and build automation.

6468574 19 days ago

preview code

raw

history blame contribute delete

9.81 kB

	# Slack URL Summarizer Bot - Technical Specification

	## 1. Project Overview

	### 1.1 Purpose
	Develop a Slack bot that automatically detects URLs in messages, extracts content from those URLs, generates summaries, translates them to Traditional Chinese, and posts the results back to the channel.

	### 1.2 Key Features
	- Automatic URL detection in Slack messages
	- Web content extraction and parsing
	- Content summarization using AI
	- Translation to Traditional Chinese
	- Automated response posting to Slack channels

	## 2. Functional Requirements

	### 2.1 Core Functionality
	\| Feature \| Description \| Priority \|
	\|---------\|-------------\|----------\|
	\| URL Detection \| Detect and extract URLs from Slack messages \| High \|
	\| Content Extraction \| Extract main content from web pages \| High \|
	\| Content Summarization \| Generate concise summaries of extracted content \| High \|
	\| Translation \| Translate summaries to Traditional Chinese \| High \|
	\| Slack Integration \| Post results back to originating channel \| High \|

	### 2.2 User Stories
	- As a Slack user, I want to paste a URL and automatically receive a Chinese summary, so I can quickly understand the content without reading the full article
	- As a team member, I want summaries posted in the same channel, so everyone can benefit from the content
	- As a user, I want the bot to handle multiple URLs in one message, so I can share multiple resources efficiently

	## 3. Technical Architecture

	### 3.1 System Architecture
	```
	┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
	│ Slack App │ │ Web Server │ │ AI Services │
	│ │ │ (FastAPI) │ │ │
	│ • Events API │◄──►│ • URL Extract │◄──►│ • OpenAI API │
	│ • Bot Token │ │ • Content Parse │ │ • Summarization │
	│ • Webhooks │ │ • Response Send │ │ • Translation │
	└─────────────────┘ └─────────────────┘ └─────────────────┘
	```

	### 3.2 Technology Stack
	- Backend Framework: Python + FastAPI
	- Slack Integration: Slack Bolt SDK for Python
	- Content Extraction: newspaper3k / readability-lxml
	- AI Services: Azure OpenAI API
	- HTTP Client: httpx
	- Deployment: Docker + Cloud hosting (AWS/GCP/Azure)
	- Database: Redis (for caching) - Optional

	## 4. Implementation Details

	### 4.1 Slack Bot Setup
	#### Required Scopes
	- `chat:write` - Post messages to channels
	- `channels:read` - Read channel information
	- `app_mentions:read` - Read mentions
	- `channels:history` - Read channel messages

	#### Event Subscriptions
	- `message.channels` - Listen to channel messages
	- `app_mention` - Listen to bot mentions

	### 4.2 Core Components

	#### 4.2.1 URL Detection Module
	```python
	import re

	def extract_urls(text: str) -> List[str]:
	"""Extract all URLs from message text"""
	pattern = r'https?://[^\s<>"{\[\]\|\\^`]+'
	return re.findall(pattern, text)
	```

	#### 4.2.2 Content Extraction Module
	```python
	from newspaper import Article

	def extract_content(url: str) -> dict:
	"""Extract main content from URL"""
	try:
	article = Article(url)
	article.download()
	article.parse()
	return {
	'title': article.title,
	'text': article.text,
	'authors': article.authors,
	'publish_date': article.publish_date
	}
	except Exception as e:
	return {'error': str(e)}
	```

	#### 4.2.3 AI Processing Module
	```python
	import httpx
	import os
	from dotenv import load_dotenv

	load_dotenv()

	def summarize_and_translate(text: str) -> str:
	"""Summarize content and translate to Traditional Chinese using Azure OpenAI"""
	url = f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/{os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}/chat/completions?api-version={os.getenv('AZURE_OPENAI_API_VERSION')}"
	headers = {
	"Content-Type": "application/json",
	"api-key": os.getenv("AZURE_OPENAI_API_KEY"),
	}
	body = {
	"messages": [
	{
	"role": "user",
	"content": f"請將以下文章摘要成 3–5 句，並翻譯為繁體中文：\n\n{text}"
	}
	],
	"temperature": 0.7
	}

	resp = httpx.post(url, headers=headers, json=body)
	resp.raise_for_status()
	return resp.json()["choices"][0]["message"]["content"].strip()
	```

	## 5. API Specifications

	### 5.1 Slack Event Handler
	```python
	@app.event("message")
	def handle_message(event, say):
	"""Handle incoming Slack messages"""
	# Extract URLs from message
	urls = extract_urls(event.get('text', ''))

	if not urls:
	return

	# Process each URL
	for url in urls:
	process_url_async(url, event['channel'], say)
	```

	### 5.2 URL Processing Pipeline
	```python
	async def process_url_async(url: str, channel: str, say):
	"""Asynchronous URL processing pipeline"""
	try:
	# Step 1: Extract content
	content = extract_content(url)

	# Step 2: Summarize and translate
	summary = summarize_and_translate(content['text'])

	# Step 3: Format and send response
	response = format_response(url, content['title'], summary)
	say(channel=channel, text=response)

	except Exception as e:
	error_message = f"❌ 處理網址時發生錯誤: {url}"
	say(channel=channel, text=error_message)
	```

	## 6. Error Handling & Edge Cases

	### 6.1 URL Validation
	- Invalid URLs (malformed, unreachable)
	- Protected content (login required, paywalls)
	- Unsupported content types (PDFs, images, videos)
	- Rate limiting from target websites

	### 6.2 Content Processing
	- Empty or insufficient content
	- Non-text content (images, videos)
	- Multiple languages in source content
	- Extremely long articles (token limits)

	### 6.3 API Failures
	- Azure OpenAI API rate limits
	- Network timeouts
	- Slack API failures
	- Service unavailability

	## 7. Response Format

	### 7.1 Successful Response Template
	```
	🔗 原始網址: {url}
	📰 標題: {title}
	📝 中文摘要:
	{summary}

	---
	⏰ 處理時間: {timestamp}
	```

	### 7.2 Error Response Template
	```
	❌ 處理失敗: {url}
	🔍 錯誤原因: {error_message}
	💡 建議: 請檢查網址是否正確或稍後再試
	```

	## 8. Performance Requirements

	### 8.1 Response Time
	- URL processing: < 30 seconds
	- Simple pages: < 10 seconds
	- Complex pages: < 20 seconds

	### 8.2 Throughput
	- Support 100 concurrent requests
	- Handle 1000 URLs per hour
	- Rate limiting: 5 requests per user per minute

	## 9. Security Considerations

	### 9.1 Input Validation
	- URL sanitization and validation
	- Content length limits
	- Malicious URL detection

	### 9.2 API Security
	- Secure storage of API keys
	- Rate limiting implementation
	- Request logging and monitoring

	## 10. Deployment Strategy

	### 10.1 Environment Setup
	- Development: Local Docker containers
	- Staging: Cloud-based testing environment
	- Production: Container orchestration (Kubernetes/ECS)

	### 10.2 Configuration Management
	```python
	# Environment variables
	SLACK_BOT_TOKEN = os.getenv('SLACK_BOT_TOKEN')
	SLACK_SIGNING_SECRET = os.getenv('SLACK_SIGNING_SECRET')
	AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT')
	AZURE_OPENAI_API_KEY = os.getenv('AZURE_OPENAI_API_KEY')
	AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
	AZURE_OPENAI_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')
	```

	## 11. Testing Strategy

	### 11.1 Unit Tests
	- URL extraction logic
	- Content parsing functions
	- AI API integration
	- Response formatting

	### 11.2 Integration Tests
	- End-to-end Slack workflow
	- External API interactions
	- Error handling scenarios

	## 12. Monitoring & Logging

	### 12.1 Metrics to Track
	- Response times
	- Success/failure rates
	- API usage costs
	- User engagement

	### 12.2 Logging Requirements
	- All URL processing attempts
	- API call results
	- Error occurrences
	- Performance metrics

	## 13. Future Enhancements

	### 13.1 Phase 2 Features
	- Multiple language support
	- Custom summary length options
	- Content caching for repeated URLs
	- User preference settings

	### 13.2 Advanced Features
	- Batch URL processing
	- Scheduled summary delivery
	- Content categorization
	- Analytics dashboard

	## 14. Acceptance Criteria

	### 14.1 MVP Success Criteria
	- [x] Bot responds to URLs in Slack messages
	- [x] Successfully extracts content from common websites
	- [x] Generates coherent Chinese summaries
	- [x] Posts formatted responses to correct channels
	- [x] Handles basic error scenarios gracefully

	### 14.2 Quality Gates
	- 95% uptime requirement
	- <20 second average response time
	- <5% error rate for valid URLs
	- Positive user feedback (>4.0/5.0)

	## 15. Timeline & Milestones

	\| Phase \| Duration \| Deliverables \|
	\|-------\|----------\|-------------\|
	\| Setup & Planning \| 1 week \| Project setup, Slack app creation \|
	\| Core Development \| 2 weeks \| Basic URL processing pipeline \|
	\| AI Integration \| 1 week \| Summarization and translation \|
	\| Testing & Debugging \| 1 week \| Unit tests, integration tests \|
	\| Deployment \| 1 week \| Production deployment, monitoring \|
	\| Total \| 6 weeks \| Production-ready MVP \|