Deployment Guide

This guide covers various deployment options for the Social Media Topic Modeling System.

Local Development

Quick Start

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run streamlit_app.py

Development with Docker

# Build and run with Docker Compose
docker-compose up --build

# Or build and run manually
docker build -t topic-modeling-app .
docker run -p 8501:8501 topic-modeling-app

Production Deployment

Docker Production Setup

Build the production image:

docker build -t topic-modeling-app:latest .

Run with production settings:

docker run -d \
  --name topic-modeling-prod \
  -p 8501:8501 \
  --memory=4g \
  --cpus=2 \
  --restart=unless-stopped \
  topic-modeling-app:latest

Using Docker Compose for production:

version: '3.8'
services:
  topic-modeling-app:
    build: .
    ports:
      - "8501:8501"
    environment:
      - STREAMLIT_SERVER_PORT=8501
      - STREAMLIT_SERVER_ADDRESS=0.0.0.0
    volumes:
      - ./data:/app/data
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '2'
    healthcheck:
      # curl must be installed in the image for this check to work
      test: ["CMD", "curl", "-f", "http://localhost:8501/_stcore/health"]
      interval: 30s
      timeout: 10s
      retries: 3

Cloud Deployment Options

1. AWS ECS/Fargate

# Tag for ECR
docker tag topic-modeling-app:latest your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest

# Push to ECR
docker push your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest

2. Google Cloud Run

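# Deploy a previously pushed image to Cloud Run
# --port routes requests to Streamlit's port instead of Cloud Run's default 8080
gcloud run deploy topic-modeling-app \
  --image gcr.io/your-project/topic-modeling-app \
  --platform managed \
  --region us-central1 \
  --memory 4Gi \
  --cpu 2 \
  --port 8501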

3. Azure Container Instances

# Deploy to Azure
az container create \
  --resource-group myResourceGroup \
  --name topic-modeling-app \
  --image your-registry.azurecr.io/topic-modeling-app:latest \
  --cpu 2 \
  --memory 4 \
  --ports 8501

4. Heroku

# Login to Heroku Container Registry
heroku container:login

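# Build and push (Heroku assigns the listening port through $PORT at runtime,
# so the image must start Streamlit on $PORT rather than a fixed 8501)
heroku container:push web --app your-app-name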

# Release
heroku container:release web --app your-app-name

Kubernetes Deployment

Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: topic-modeling-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: topic-modeling-app
  template:
    metadata:
      labels:
        app: topic-modeling-app
    spec:
      containers:
      - name: topic-modeling-app
        image: topic-modeling-app:latest  # replace with the full registry path the cluster can pull from
        ports:
        - containerPort: 8501
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "2"
        env:
        - name: STREAMLIT_SERVER_PORT
          value: "8501"
        - name: STREAMLIT_SERVER_ADDRESS
          value: "0.0.0.0"
---
apiVersion: v1
kind: Service
metadata:
  name: topic-modeling-service
spec:
  selector:
    app: topic-modeling-app
  ports:
  - port: 80
    targetPort: 8501
  type: LoadBalancer

Performance Optimization

Memory Management

Minimum RAM: 4GB for small datasets (< 1000 documents)
Recommended RAM: 8GB+ for larger datasets
Large datasets: Consider processing in batches
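
The batching suggestion can be sketched as a small generator. This is a minimal illustration rather than the app's actual code: `iter_batches` and the batch size of 500 are hypothetical, and how each batch is passed to the topic model depends on the application.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int = 500) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls the next batch_size items without loading everything at once
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in documents; a real run would read them from the uploaded dataset
documents = [f"post {i}" for i in range(1200)]
for batch in iter_batches(documents, batch_size=500):
    print(len(batch))  # prints 500, 500, 200
```

Because the helper accepts any iterable, it also works on a file reader or database cursor, so the full dataset never has to sit in memory.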

CPU Optimization

Minimum: 2 CPU cores
Recommended: 4+ CPU cores for faster processing
GPU: Optional, can speed up transformer models
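
A startup check against the minimums listed above could look like the following sketch. The function names are hypothetical, the thresholds are copied from this guide, and RAM detection assumes `psutil` is installed.

```python
import os
from typing import List, Optional

MIN_CPU_CORES = 2   # minimum CPU cores from this guide
MIN_RAM_GB = 4.0    # minimum RAM from this guide


def check_resources(cpu_cores: Optional[int], ram_gb: float) -> List[str]:
    """Return human-readable warnings; an empty list means the minimums are met."""
    warnings = []
    if cpu_cores is None:
        warnings.append("Could not determine the CPU core count")
    elif cpu_cores < MIN_CPU_CORES:
        warnings.append(
            f"Only {cpu_cores} CPU core(s) available; at least {MIN_CPU_CORES} expected"
        )
    if ram_gb < MIN_RAM_GB:
        warnings.append(
            f"Only {ram_gb:.1f} GB RAM available; at least {MIN_RAM_GB:.0f} GB expected"
        )
    return warnings


def detect_ram_gb() -> float:
    """Total RAM in GiB, read via psutil (assumed to be installed)."""
    import psutil  # imported lazily so check_resources works without it

    return psutil.virtual_memory().total / 1024 ** 3


print(check_resources(cpu_cores=1, ram_gb=2.0))  # two warnings
print(check_resources(cpu_cores=4, ram_gb=8.0))  # prints []
```

At startup this could be called as `check_resources(os.cpu_count(), detect_ram_gb())`. Note that `os.cpu_count()` reports the host's cores and may not reflect a container's `--cpus` limit.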

Storage Considerations

Docker image: ~2GB
Temporary files: Varies with dataset size
Persistent storage: Optional for saving results

Monitoring and Logging

Health Checks

Streamlit exposes a built-in health endpoint, the same one the Docker Compose healthcheck polls:

# Check application health
curl http://localhost:8501/_stcore/health
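
The same check can be scripted, for example in a deploy script that waits for the app to come up. The sketch below mirrors the Compose healthcheck settings (30s interval, 10s timeout, 3 retries). The function names are hypothetical and only the standard library is used.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8501/_stcore/health"


def is_healthy(url: str = HEALTH_URL, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response
        return False


def wait_until_healthy(
    check: Callable[[], bool] = is_healthy,
    retries: int = 3,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check up to `retries` times, sleeping `interval` seconds between attempts."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            sleep(interval)
    return False
```

The `check` and `sleep` parameters are injectable so the retry logic can be exercised without a running container. In a script, `wait_until_healthy()` with the defaults would poll the local app.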

Logging

Streamlit logs are available through Docker:

# View logs (use the name passed to --name, e.g. topic-modeling-prod from the production run)
docker logs topic-modeling-prod

# Follow logs
docker logs -f topic-modeling-prod
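
Docker only captures what the container writes to stdout and stderr, so application messages need a handler on one of those streams to show up in `docker logs`. A minimal sketch, assuming a hypothetical logger name `topic_modeling`:

```python
import logging
import sys


def get_logger(name: str = "topic_modeling", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to stdout so `docker logs` can capture it."""
    logger = logging.getLogger(name)
    # Streamlit reruns the script on every interaction; only attach the
    # handler once so messages are not duplicated.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # avoid double output through the root logger
    return logger


log = get_logger()
log.info("Topic modeling run started")
```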

Monitoring with Prometheus

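Collect basic system metrics inside the app as a starting point for production monitoring. This function only gathers values; on its own it does not expose an endpoint that Prometheus can scrape:

# Add to streamlit_app.py for monitoring
import time

import psutil
import streamlit as st

# Cache for a few seconds so frequent reruns stay cheap; without a TTL,
# @st.cache_data would return the first measurement forever
@st.cache_data(ttl=5)
def get_system_metrics():
    return {
        # interval=None is non-blocking; the very first call returns 0.0
        'cpu_percent': psutil.cpu_percent(interval=None),
        'memory_percent': psutil.virtual_memory().percent,
        'timestamp': time.time()
    }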
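
To make such a metrics dictionary consumable by Prometheus, the values have to be rendered in its text exposition format. The sketch below shows only that formatting step. The function name and the `topic_modeling_` prefix are hypothetical, and serving the text over HTTP (for example from a small sidecar process) is left out.

```python
from typing import Mapping


def to_prometheus_text(metrics: Mapping[str, float], prefix: str = "topic_modeling_") -> str:
    """Render a flat dict of numeric values as Prometheus gauge samples."""
    lines = []
    for name, value in metrics.items():
        metric_name = f"{prefix}{name}"
        lines.append(f"# TYPE {metric_name} gauge")
        lines.append(f"{metric_name} {float(value)}")
    # Each line in the exposition format ends with a newline
    return "".join(line + "\n" for line in lines)


sample = {"cpu_percent": 12.5, "memory_percent": 48.0}
print(to_prometheus_text(sample), end="")
# Output:
# # TYPE topic_modeling_cpu_percent gauge
# topic_modeling_cpu_percent 12.5
# # TYPE topic_modeling_memory_percent gauge
# topic_modeling_memory_percent 48.0
```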

Security Considerations

Container Security

Run as non-root user (included in Dockerfile)
Use minimal base images
Regularly update dependencies

Network Security

Use HTTPS in production
Implement proper firewall rules
Consider VPN for internal access

Data Security

Encrypt data at rest and in transit
Implement proper access controls
Regular security audits

Troubleshooting

Common Issues

Out of Memory Errors
- Increase container memory limits
- Process smaller datasets
- Use batch processing

Slow Performance
- Increase CPU allocation
- Use SSD storage
- Optimize dataset size

Container Won't Start
- Check logs: docker logs container-name
- Verify port availability
- Check resource limits

Model Loading Issues
- Ensure internet connectivity for model downloads
- Pre-download models in Docker build
- Check disk space
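
Pre-downloading a model during the image build could look like the sketch below. It assumes the app loads its embeddings through `sentence-transformers`, and that the model is `all-MiniLM-L6-v2`. Both are assumptions, since this guide does not name the model. The function name is hypothetical.

```python
from typing import Callable, Optional


def predownload_model(
    model_name: str = "all-MiniLM-L6-v2",  # assumed model; replace with the one the app uses
    loader: Optional[Callable[[str], object]] = None,
) -> str:
    """Load the model once so its files land in the local cache."""
    if loader is None:
        # Imported lazily: only needed when the real download is performed
        from sentence_transformers import SentenceTransformer

        loader = SentenceTransformer
    loader(model_name)  # instantiating the model downloads and caches its files
    return model_name
```

Saved as, say, `download_model.py` with a final `predownload_model()` call and run from a `RUN python download_model.py` step in the Dockerfile, this would bake the model files into the image so the container does not need internet access at startup. The `loader` parameter exists only so the function can be tried out without downloading anything.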

Support

For deployment issues:

Check the logs first
Verify system requirements
Test with sample data
Check network connectivity