Deployment Guide
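This guide covers local development, Docker-based production setups, cloud platforms, and Kubernetes deployment for the Social Media Topic Modeling System, along with notes on performance, monitoring, security, and troubleshooting.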
Local Development
Quick Start
# Install dependencies
pip install -r requirements.txt
# Run the application
streamlit run streamlit_app.py
Development with Docker
# Build and run with Docker Compose
docker-compose up --build
# Or build and run manually
docker build -t topic-modeling-app .
docker run -p 8501:8501 topic-modeling-app
Production Deployment
Docker Production Setup
- Build the production image:
docker build -t topic-modeling-app:latest .
- Run with production settings:
docker run -d \
--name topic-modeling-prod \
-p 8501:8501 \
--memory=4g \
--cpus=2 \
--restart=unless-stopped \
topic-modeling-app:latest
- Using Docker Compose for production:
version: '3.8'
services:
  topic-modeling-app:
    build: .
    ports:
      - "8501:8501"
    environment:
      - STREAMLIT_SERVER_PORT=8501
      - STREAMLIT_SERVER_ADDRESS=0.0.0.0
    volumes:
      - ./data:/app/data
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '2'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8501/_stcore/health"]
      interval: 30s
      timeout: 10s
      retries: 3
Cloud Deployment Options
1. AWS ECS/Fargate
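# Authenticate Docker to ECR (replace region and your-account with your own values)
aws ecr get-login-password --region region | docker login --username AWS --password-stdin your-account.dkr.ecr.region.amazonaws.com
# Tag for ECR
docker tag topic-modeling-app:latest your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest
# Push to ECR
docker push your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest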
2. Google Cloud Run
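# Deploy a previously pushed image to Cloud Run
# (--port routes requests to Streamlit's port 8501 instead of Cloud Run's default 8080)
gcloud run deploy topic-modeling-app \
--image gcr.io/your-project/topic-modeling-app \
--platform managed \
--region us-central1 \
--memory 4Gi \
--cpu 2 \
--port 8501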
3. Azure Container Instances
# Deploy to Azure
az container create \
--resource-group myResourceGroup \
--name topic-modeling-app \
--image your-registry.azurecr.io/topic-modeling-app:latest \
--cpu 2 \
--memory 4 \
--ports 8501
4. Heroku
# Login to Heroku Container Registry
heroku container:login
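# Build and push (Heroku assigns the port at runtime through the PORT environment variable,
# so the container must start Streamlit on $PORT rather than a fixed 8501)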
heroku container:push web --app your-app-name
# Release
heroku container:release web --app your-app-name
Kubernetes Deployment
Deployment YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topic-modeling-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: topic-modeling-app
  template:
    metadata:
      labels:
        app: topic-modeling-app
    spec:
      containers:
        - name: topic-modeling-app
          image: topic-modeling-app:latest
          ports:
            - containerPort: 8501
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
          env:
            - name: STREAMLIT_SERVER_PORT
              value: "8501"
            - name: STREAMLIT_SERVER_ADDRESS
              value: "0.0.0.0"
---
apiVersion: v1
kind: Service
metadata:
  name: topic-modeling-service
spec:
  selector:
    app: topic-modeling-app
  ports:
    - port: 80
      targetPort: 8501
  type: LoadBalancer
Performance Optimization
Memory Management
- Minimum RAM: 4GB for small datasets (< 1000 documents)
- Recommended RAM: 8GB+ for larger datasets
- Large datasets: Consider processing in batches
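The batching suggestion above can be sketched as follows. This is a minimal illustration, not the application's actual code: the helper name iter_batches and the default batch size of 500 are hypothetical, and how each batch is passed to the topic model depends on the pipeline in use.

```python
from typing import Iterable, Iterator, List


def iter_batches(documents: Iterable[str], batch_size: int = 500) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch
```

Because this is a generator, only one batch is held in memory at a time when the input itself is read lazily (for example, line by line from a file).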
CPU Optimization
- Minimum: 2 CPU cores
- Recommended: 4+ CPU cores for faster processing
- GPU: Optional, can speed up transformer models
Storage Considerations
- Docker image: ~2GB
- Temporary files: Varies with dataset size
- Persistent storage: Optional for saving results
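If results should survive container restarts, one option is to write them into the directory mounted as a volume (/app/data in the Docker Compose example). The sketch below assumes results are a JSON-serializable dict; save_results and the file naming scheme are hypothetical and not part of the application.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def save_results(results: dict, data_dir: str = "/app/data") -> Path:
    """Write results as JSON into data_dir and return the path of the new file."""
    target_dir = Path(data_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"results-{int(time.time())}.json"
    # Write to a temporary file first, then rename it, so a crash mid-write
    # never leaves a truncated results file behind.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(results, handle, indent=2)
    os.replace(tmp_name, target)
    return target
```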
Monitoring and Logging
Health Checks
Streamlit serves a built-in health endpoint at /_stcore/health, the same endpoint used by the Docker Compose health check above:
# Check application health
curl http://localhost:8501/_stcore/health
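The same check can be scripted, for example from a deployment or smoke-test script. This is a minimal sketch using only the Python standard library; the function name is_healthy and the 5-second default timeout are assumptions.

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:8501/_stcore/health", timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False
```

A caller would typically retry a few times with a short pause, since the app may still be loading models right after the container starts.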
Logging
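Streamlit writes its logs to the container's standard output, so they are available through Docker. Use the name the container was started with, for example topic-modeling-prod from the production run above:
# View logs
docker logs topic-modeling-prod
# Follow logs
docker logs -f topic-modeling-prod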
Monitoring with Prometheus
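Collect basic system metrics inside the app for production monitoring. This snippet only gathers the values; exposing them to Prometheus still requires a separate exporter or endpoint:
# Add to streamlit_app.py for monitoring
import time

import psutil
import streamlit as st

# Cache for a few seconds only; without a ttl the first reading would be returned forever
@st.cache_data(ttl=5)
def get_system_metrics():
    """Return a snapshot of CPU and memory usage."""
    return {
        'cpu_percent': psutil.cpu_percent(),
        'memory_percent': psutil.virtual_memory().percent,
        'timestamp': time.time()
    }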
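Prometheus scrapes metrics in a plain-text exposition format. As a rough sketch of the missing step, a metrics dictionary like the one above could be rendered into that format as shown below. The function name to_prometheus_text and the topic_modeling prefix are hypothetical, every value is treated as a gauge, and how the resulting text is served to Prometheus is left open.

```python
def to_prometheus_text(metrics: dict, prefix: str = "topic_modeling") -> str:
    """Render a flat dict of numeric metrics as Prometheus gauge lines."""
    lines = []
    for name, value in sorted(metrics.items()):
        metric_name = f"{prefix}_{name}"
        lines.append(f"# TYPE {metric_name} gauge")
        lines.append(f"{metric_name} {float(value)}")
    # Each line, including the last, ends with a newline
    return "".join(line + "\n" for line in lines)
```

In a real deployment the prometheus_client library is the more common choice; the sketch only shows what the scraped text looks like.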
Security Considerations
Container Security
- Run as non-root user (included in Dockerfile)
- Use minimal base images
- Regularly update dependencies
Network Security
- Use HTTPS in production
- Implement proper firewall rules
- Consider VPN for internal access
Data Security
- Encrypt data at rest and in transit
- Implement proper access controls
- Regular security audits
Troubleshooting
Common Issues
Out of Memory Errors
- Increase container memory limits
- Process smaller datasets
- Use batch processing
Slow Performance
- Increase CPU allocation
- Use SSD storage
- Optimize dataset size
Container Won't Start
- Check logs:
docker logs container-name
- Verify port availability
- Check resource limits
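Port availability can be verified on the Docker host before starting the container. The sketch below tries to bind the port and reports whether that worked; port_is_free is a hypothetical helper, and it also returns False when binding fails for other reasons, such as missing permissions.

```python
import socket


def port_is_free(port: int = 8501, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use"
            return False
    return True
```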
Model Loading Issues
- Ensure internet connectivity for model downloads
- Pre-download models in Docker build
- Check disk space
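The disk space check can be done from Python with the standard library. This is a small sketch; the function names and the 2 GiB default threshold are assumptions, so set the threshold to the size of the models you actually download.

```python
import shutil


def free_gib(path: str = "/") -> float:
    """Return the free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def has_free_space(path: str = "/", required_gib: float = 2.0) -> bool:
    """Return True if at least required_gib GiB are free at path."""
    return free_gib(path) >= required_gib
```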
Support
For deployment issues:
- Check the logs first
- Verify system requirements
- Test with sample data
- Check network connectivity