# Deployment Guide

This guide covers local, Docker, cloud, and Kubernetes deployment options for the Social Media Topic Modeling System.

## Local Development

### Quick Start
```bash
# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run streamlit_app.py
```

### Development with Docker
```bash
# Build and run with Docker Compose
docker-compose up --build

# Or build and run manually
docker build -t topic-modeling-app .
docker run -p 8501:8501 topic-modeling-app
```

## Production Deployment

### Docker Production Setup

1. **Build the production image:**
```bash
docker build -t topic-modeling-app:latest .
```

2. **Run with production settings:**
```bash
docker run -d \
  --name topic-modeling-prod \
  -p 8501:8501 \
  --memory=4g \
  --cpus=2 \
  --restart=unless-stopped \
  topic-modeling-app:latest
```

3. **Using Docker Compose for production:**
```yaml
version: '3.8'
services:
  topic-modeling-app:
    build: .
    ports:
      - "8501:8501"
    environment:
      - STREAMLIT_SERVER_PORT=8501
      - STREAMLIT_SERVER_ADDRESS=0.0.0.0
    volumes:
      - ./data:/app/data
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '2'
    healthcheck:
      # Requires curl to be installed in the image
      test: ["CMD", "curl", "-f", "http://localhost:8501/_stcore/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

### Cloud Deployment Options

#### 1. AWS ECS/Fargate
```bash
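# Authenticate Docker to ECR (the push fails without this step)
aws ecr get-login-password --region region | \
  docker login --username AWS --password-stdin your-account.dkr.ecr.region.amazonaws.com

# Tag for ECR
docker tag topic-modeling-app:latest your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest

# Push to ECR
docker push your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest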
```

#### 2. Google Cloud Run
```bash
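# Deploy a previously pushed image to Cloud Run
# (--port is needed because Cloud Run sends requests to 8080 by default)
gcloud run deploy topic-modeling-app \
  --image gcr.io/your-project/topic-modeling-app \
  --platform managed \
  --region us-central1 \
  --memory 4Gi \
  --cpu 2 \
  --port 8501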
```

#### 3. Azure Container Instances
```bash
# Deploy to Azure
az container create \
  --resource-group myResourceGroup \
  --name topic-modeling-app \
  --image your-registry.azurecr.io/topic-modeling-app:latest \
  --cpu 2 \
  --memory 4 \
  --ports 8501
```

#### 4. Heroku
```bash
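# Heroku assigns the listening port through $PORT at runtime, so the
# container's start command must pass it to Streamlit (--server.port=$PORT)

# Log in to Heroku Container Registry
heroku container:login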

# Build and push
heroku container:push web --app your-app-name

# Release
heroku container:release web --app your-app-name
```

### Kubernetes Deployment

#### Deployment YAML
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topic-modeling-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: topic-modeling-app
  template:
    metadata:
      labels:
        app: topic-modeling-app
    spec:
      containers:
      - name: topic-modeling-app
        image: topic-modeling-app:latest
        ports:
        - containerPort: 8501
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "2"
        env:
        - name: STREAMLIT_SERVER_PORT
          value: "8501"
        - name: STREAMLIT_SERVER_ADDRESS
          value: "0.0.0.0"
---
apiVersion: v1
kind: Service
metadata:
  name: topic-modeling-service
spec:
  selector:
    app: topic-modeling-app
  ports:
  - port: 80
    targetPort: 8501
  type: LoadBalancer
```

## Performance Optimization

### Memory Management
- **Minimum RAM**: 4GB for small datasets (< 1000 documents)
- **Recommended RAM**: 8GB+ for larger datasets
- **Large datasets**: Consider processing in batches
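
The batching suggestion above can be sketched as follows. This is a minimal illustration, not the application's actual code: `iter_batches` and the `batch_size` value are hypothetical, and how each batch is passed to the topic model (and whether per-batch results can be merged) is left to the caller.

```python
from typing import Iterable, Iterator, List


def iter_batches(documents: Iterable[str], batch_size: int = 500) -> Iterator[List[str]]:
    """Yield successive lists of at most `batch_size` documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


if __name__ == "__main__":
    docs = [f"post {i}" for i in range(1200)]
    for n, batch in enumerate(iter_batches(docs, batch_size=500), start=1):
        # Replace this print with the per-batch processing step.
        print(f"batch {n}: {len(batch)} documents")
```

Because the function is a generator, only one batch is held in memory at a time when `documents` is itself a lazy iterable, such as a file read line by line.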

### CPU Optimization
- **Minimum**: 2 CPU cores
- **Recommended**: 4+ CPU cores for faster processing
- **GPU**: Optional, can speed up transformer models
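
The minimums in this section can be checked before startup. The following is a rough sketch using only the standard library; `check_resources` and `detect_ram_gb` are hypothetical helpers, and inside a container these calls may report the host's totals rather than the `--cpus` and `--memory` limits set by Docker.

```python
import os
from typing import List, Optional


def detect_ram_gb() -> Optional[float]:
    """Return total physical RAM in GiB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, where os.sysconf is unavailable
    return pages * page_size / 1024 ** 3


def check_resources(cores: Optional[int], ram_gb: Optional[float],
                    min_cores: int = 2, min_ram_gb: float = 4.0) -> List[str]:
    """Return human-readable warnings; an empty list means the minimums are met."""
    warnings = []
    if cores is not None and cores < min_cores:
        warnings.append(f"{cores} CPU core(s) found, at least {min_cores} expected")
    if ram_gb is not None and ram_gb < min_ram_gb:
        warnings.append(f"{ram_gb:.1f} GiB RAM found, at least {min_ram_gb:g} GiB expected")
    return warnings


if __name__ == "__main__":
    for message in check_resources(os.cpu_count(), detect_ram_gb()):
        print("WARNING:", message)
```

Unknown values (`None`) are skipped rather than treated as failures, so the check stays quiet on platforms where detection is not possible.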

### Storage Considerations
- **Docker image**: ~2GB
- **Temporary files**: Varies with dataset size
- **Persistent storage**: Optional for saving results

## Monitoring and Logging

### Health Checks
Streamlit exposes a built-in health endpoint at `/_stcore/health`:
```bash
# Check application health
curl http://localhost:8501/_stcore/health
```
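
The same check can be done from Python, which helps when the image does not ship `curl`. This is a minimal sketch using only the standard library; `is_healthy` is a hypothetical helper, and the default URL simply mirrors the command above.

```python
import http.client
import urllib.request


def is_healthy(url: str = "http://localhost:8501/_stcore/health",
               timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, refused connections, and timeouts.
        return False
```

Ending a small script with `sys.exit(0 if is_healthy() else 1)` gives it the same exit-code behavior as `curl -f`, so it could stand in for `curl` in a container health check.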

### Logging
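Streamlit writes its logs to the container's standard output, so they are available through Docker. The commands below use the container name from the production `docker run` example:
```bash
# View logs
docker logs topic-modeling-prod

# Follow logs
docker logs -f topic-modeling-prod
```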
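
`docker logs` shows whatever the container process writes to stdout and stderr. Here is a minimal sketch, assuming the application wants its own messages to appear alongside Streamlit's output; the logger name `topic_modeling` and the format string are illustrative choices, not part of the original app.

```python
import logging
import sys


def configure_logging(level: int = logging.INFO, stream=None) -> logging.Logger:
    """Return an app logger that writes timestamped lines to stdout (or `stream`)."""
    logger = logging.getLogger("topic_modeling")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger
    # Streamlit re-runs the script on every interaction, so drop any handler
    # left over from a previous run instead of stacking a new one each time.
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_logging()
    log.info("application started")
```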

### Monitoring with Prometheus
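The snippet below collects basic system metrics inside the app. It does not expose a Prometheus scrape endpoint on its own; the values still have to be exported separately:
```python
# Add to streamlit_app.py for monitoring
import time

import psutil
import streamlit as st


# Cache for a few seconds only: without a ttl, st.cache_data would keep
# returning the values from the very first call.
@st.cache_data(ttl=5)
def get_system_metrics():
    return {
        'cpu_percent': psutil.cpu_percent(),
        'memory_percent': psutil.virtual_memory().percent,
        'timestamp': time.time()
    }
```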
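
The function above only gathers values; Prometheus collects metrics by scraping an HTTP endpoint that returns its text exposition format. The following is a minimal standard-library sketch, assuming the metrics are served from a separate port rather than through Streamlit itself. `render_prometheus`, `start_metrics_server`, the `app_` prefix, and port 9100 are all hypothetical choices.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict


def render_prometheus(metrics: Dict[str, float], prefix: str = "app_") -> str:
    """Render a flat dict of numeric values as Prometheus gauge lines."""
    lines = []
    for name, value in sorted(metrics.items()):
        metric = f"{prefix}{name}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {float(value)}")
    return "\n".join(lines) + "\n"


def start_metrics_server(collect: Callable[[], Dict[str, float]],
                         port: int = 9100) -> HTTPServer:
    """Serve `collect()` at /metrics in a background thread and return the server."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_prometheus(collect()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep scrape requests out of the application log

    server = HTTPServer(("0.0.0.0", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because Streamlit re-runs the script on each interaction, the server should be started only once per process, for example from a function decorated with `st.cache_resource`. The extra port also has to be published by the container for Prometheus to reach it.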

## Security Considerations

### Container Security
- Run as non-root user (included in Dockerfile)
- Use minimal base images
- Regularly update dependencies

### Network Security
- Use HTTPS in production
- Implement proper firewall rules
- Consider VPN for internal access

### Data Security
- Encrypt data at rest and in transit
- Implement proper access controls
- Run regular security audits

## Troubleshooting

### Common Issues

1. **Out of Memory Errors**
   - Increase container memory limits
   - Process smaller datasets
   - Use batch processing

2. **Slow Performance**
   - Increase CPU allocation
   - Use SSD storage
   - Optimize dataset size

3. **Container Won't Start**
   - Check logs: `docker logs container-name`
   - Verify port availability
   - Check resource limits

4. **Model Loading Issues**
   - Ensure internet connectivity for model downloads
   - Pre-download models in Docker build
   - Check disk space
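
Two of the checks in the list above, port availability and disk space, can be scripted. This is a small sketch using only the standard library; `port_is_free` and `free_disk_gb` are hypothetical helper names, and 8501 is the port used throughout this guide.

```python
import shutil
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def free_disk_gb(path: str = ".") -> float:
    """Return free space in GiB on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


if __name__ == "__main__":
    print("port 8501 free:", port_is_free(8501))
    print(f"free disk: {free_disk_gb():.1f} GiB")
```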

### Support
For deployment issues:
1. Check the logs first
2. Verify system requirements
3. Test with sample data
4. Check network connectivity