# Deployment Guide

This guide covers deployment options for the Social Media Topic Modeling System: local development, Docker, managed cloud platforms, and Kubernetes.

## Local Development

### Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run streamlit_app.py
```

### Development with Docker

```bash
# Build and run with Docker Compose
docker-compose up --build

# Or build and run manually
docker build -t topic-modeling-app .
docker run -p 8501:8501 topic-modeling-app
```

## Production Deployment

### Docker Production Setup

1. **Build the production image:**

   ```bash
   docker build -t topic-modeling-app:latest .
   ```

2. **Run with production settings:**

   ```bash
   docker run -d \
     --name topic-modeling-prod \
     -p 8501:8501 \
     --memory=4g \
     --cpus=2 \
     --restart=unless-stopped \
     topic-modeling-app:latest
   ```

3. **Using Docker Compose for production:**

   ```yaml
   version: '3.8'
   services:
     topic-modeling-app:
       build: .
       ports:
         - "8501:8501"
       environment:
         - STREAMLIT_SERVER_PORT=8501
         - STREAMLIT_SERVER_ADDRESS=0.0.0.0
       volumes:
         - ./data:/app/data
       restart: unless-stopped
       deploy:
         resources:
           limits:
             memory: 4G
             cpus: '2'
       healthcheck:
         test: ["CMD", "curl", "-f", "http://localhost:8501/_stcore/health"]
         interval: 30s
         timeout: 10s
         retries: 3
   ```

### Cloud Deployment Options

#### 1. AWS ECS/Fargate

```bash
# Authenticate Docker with ECR
aws ecr get-login-password --region region | \
  docker login --username AWS --password-stdin your-account.dkr.ecr.region.amazonaws.com

# Tag for ECR
docker tag topic-modeling-app:latest your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest

# Push to ECR
docker push your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest
```

#### 2. Google Cloud Run

```bash
# Deploy a previously pushed image to Cloud Run
# (--port is needed because Streamlit listens on 8501, not Cloud Run's default 8080)
gcloud run deploy topic-modeling-app \
  --image gcr.io/your-project/topic-modeling-app \
  --platform managed \
  --region us-central1 \
  --memory 4Gi \
  --cpu 2 \
  --port 8501
```

#### 3. Azure Container Instances

```bash
# Deploy to Azure
az container create \
  --resource-group myResourceGroup \
  --name topic-modeling-app \
  --image your-registry.azurecr.io/topic-modeling-app:latest \
  --cpu 2 \
  --memory 4 \
  --ports 8501
```

#### 4. Heroku

```bash
# Login to Heroku Container Registry
heroku container:login

# Build and push
heroku container:push web --app your-app-name

# Release
heroku container:release web --app your-app-name
```

### Kubernetes Deployment

#### Deployment YAML

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topic-modeling-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: topic-modeling-app
  template:
    metadata:
      labels:
        app: topic-modeling-app
    spec:
      containers:
        - name: topic-modeling-app
          image: topic-modeling-app:latest
          ports:
            - containerPort: 8501
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
          env:
            - name: STREAMLIT_SERVER_PORT
              value: "8501"
            - name: STREAMLIT_SERVER_ADDRESS
              value: "0.0.0.0"
---
apiVersion: v1
kind: Service
metadata:
  name: topic-modeling-service
spec:
  selector:
    app: topic-modeling-app
  ports:
    - port: 80
      targetPort: 8501
  type: LoadBalancer
```

## Performance Optimization

### Memory Management

- **Minimum RAM**: 4 GB for small datasets (< 1000 documents)
- **Recommended RAM**: 8 GB or more for larger datasets
- **Large datasets**: Consider processing in batches

### CPU Optimization

- **Minimum**: 2 CPU cores
- **Recommended**: 4+ CPU cores for faster processing
- **GPU**: Optional; can speed up transformer models

### Storage Considerations

- **Docker image**: ~2 GB
- **Temporary files**: Varies with dataset size
- **Persistent storage**: Optional, for saving results

## Monitoring and Logging

### Health Checks

Streamlit exposes a built-in health endpoint:

```bash
# Check application health
curl http://localhost:8501/_stcore/health
```

### Logging

Streamlit logs are available through Docker. Use the container name you started the app with (`topic-modeling-prod` in the production example above):

```bash
# View logs
docker logs topic-modeling-prod

# Follow logs
docker logs -f topic-modeling-prod
```

### Monitoring with Prometheus

As a starting point for production monitoring, collect basic system metrics inside the app (requires `psutil`):

```python
# Add to streamlit_app.py for monitoring
import time

import psutil
import streamlit as st


# Cache briefly so values refresh instead of being frozen at the first call
@st.cache_data(ttl=10)
def get_system_metrics():
    return {
        'cpu_percent': psutil.cpu_percent(),
        'memory_percent': psutil.virtual_memory().percent,
        'timestamp': time.time()
    }
```

## Security Considerations

### Container Security

- Run as non-root user (included in Dockerfile)
- Use minimal base images
- Regularly update dependencies

### Network Security

- Use HTTPS in production
- Implement proper firewall rules
- Consider a VPN for internal access

### Data Security

- Encrypt data at rest and in transit
- Implement proper access controls
- Run regular security audits

## Troubleshooting

### Common Issues

1. **Out of Memory Errors**
   - Increase container memory limits
   - Process smaller datasets
   - Use batch processing

2. **Slow Performance**
   - Increase CPU allocation
   - Use SSD storage
   - Reduce dataset size

3. **Container Won't Start**
   - Check logs: `docker logs container-name`
   - Verify port availability
   - Check resource limits

4. **Model Loading Issues**
   - Ensure internet connectivity for model downloads
   - Pre-download models in Docker build
   - Check disk space

### Support

For deployment issues:

1. Check the logs first
2. Verify system requirements
3. Test with sample data
4. Check network connectivity
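
The "verify system requirements" and connectivity steps can be partly scripted. The following is a minimal sketch, not part of the project: the function names (`requirement_problems`, `detect_ram_gb`, `app_is_healthy`) are hypothetical, the thresholds are the minimums stated in this guide (2 CPU cores, 4 GB RAM), and the URL assumes the app runs locally on port 8501. It uses only the Python standard library.

```python
import os
import urllib.error
import urllib.request

MIN_CPU_CORES = 2   # minimum CPU cores stated in this guide
MIN_RAM_GB = 4.0    # minimum RAM stated in this guide
HEALTH_URL = "http://localhost:8501/_stcore/health"  # assumes a local deployment


def requirement_problems(cpu_cores, ram_gb):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"only {cpu_cores} CPU core(s); minimum is {MIN_CPU_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"only {ram_gb:.1f} GB RAM; minimum is {MIN_RAM_GB:.0f} GB")
    return problems


def detect_ram_gb():
    """Best-effort total RAM in GB; None where os.sysconf is unavailable."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def app_is_healthy(url=HEALTH_URL, timeout=5.0):
    """True if the health endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    ram = detect_ram_gb()
    cores = os.cpu_count() or 0
    # If RAM cannot be detected, skip that check rather than report a false alarm
    for problem in requirement_problems(cores, ram if ram is not None else MIN_RAM_GB):
        print("WARNING:", problem)
    print("healthy" if app_is_healthy() else "health check failed")
```

Run the script on the host (or inside the container) before digging into the logs; a failed health check combined with passing resource checks usually points at startup or networking rather than capacity.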