	# Deployment Guide

	This guide covers local, Docker, cloud, and Kubernetes deployment of the Social Media Topic Modeling System, along with notes on performance, monitoring, security, and troubleshooting.

	## Local Development

	### Quick Start
	```bash
	# Install dependencies
	pip install -r requirements.txt

	# Run the application
	streamlit run streamlit_app.py
	```

	### Development with Docker
	```bash
	# Build and run with Docker Compose
	docker-compose up --build

	# Or build and run manually
	docker build -t topic-modeling-app .
	docker run -p 8501:8501 topic-modeling-app
	```

	## Production Deployment

	### Docker Production Setup

	1. Build the production image:
	```bash
	docker build -t topic-modeling-app:latest .
	```

	2. Run with production settings:
	```bash
	docker run -d \
	  --name topic-modeling-prod \
	  -p 8501:8501 \
	  --memory=4g \
	  --cpus=2 \
	  --restart=unless-stopped \
	  topic-modeling-app:latest
	```

	3. Using Docker Compose for production:
	```yaml
	version: '3.8'
	services:
	  topic-modeling-app:
	    build: .
	    ports:
	      - "8501:8501"
	    environment:
	      - STREAMLIT_SERVER_PORT=8501
	      - STREAMLIT_SERVER_ADDRESS=0.0.0.0
	    volumes:
	      - ./data:/app/data
	    restart: unless-stopped
	    deploy:
	      resources:
	        limits:
	          memory: 4G
	          cpus: '2'
	    healthcheck:
	      test: ["CMD", "curl", "-f", "http://localhost:8501/_stcore/health"]
	      interval: 30s
	      timeout: 10s
	      retries: 3
	```

	### Cloud Deployment Options

	#### 1. AWS ECS/Fargate
	```bash
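	# Authenticate Docker with ECR (replace region and your-account)
	aws ecr get-login-password --region region | \
	  docker login --username AWS --password-stdin your-account.dkr.ecr.region.amazonaws.com

	# Tag for ECR
	docker tag topic-modeling-app:latest your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest

	# Push to ECR
	docker push your-account.dkr.ecr.region.amazonaws.com/topic-modeling-app:latest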
	```

	#### 2. Google Cloud Run
	```bash
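	# Deploy an image already pushed to gcr.io/your-project to Cloud Run.
	# Cloud Run sends requests to port 8080 by default, so point it at Streamlit's port.
	gcloud run deploy topic-modeling-app \
	  --image gcr.io/your-project/topic-modeling-app \
	  --platform managed \
	  --region us-central1 \
	  --memory 4Gi \
	  --cpu 2 \
	  --port 8501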
	```

	#### 3. Azure Container Instances
	```bash
	# Deploy to Azure
	az container create \
	  --resource-group myResourceGroup \
	  --name topic-modeling-app \
	  --image your-registry.azurecr.io/topic-modeling-app:latest \
	  --cpu 2 \
	  --memory 4 \
	  --ports 8501
	```

	#### 4. Heroku
	```bash
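	# Heroku assigns the listening port through $PORT, so the container
	# must start Streamlit on that port rather than a fixed 8501.

	# Login to Heroku Container Registry
	heroku container:login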

	# Build and push
	heroku container:push web --app your-app-name

	# Release
	heroku container:release web --app your-app-name
	```

	### Kubernetes Deployment

	#### Deployment YAML
	```yaml
	apiVersion: apps/v1
	kind: Deployment
	metadata:
	  name: topic-modeling-app
	spec:
	  replicas: 3
	  selector:
	    matchLabels:
	      app: topic-modeling-app
	  template:
	    metadata:
	      labels:
	        app: topic-modeling-app
	    spec:
	      containers:
	        - name: topic-modeling-app
	          image: topic-modeling-app:latest
	          ports:
	            - containerPort: 8501
	          resources:
	            requests:
	              memory: "2Gi"
	              cpu: "1"
	            limits:
	              memory: "4Gi"
	              cpu: "2"
	          env:
	            - name: STREAMLIT_SERVER_PORT
	              value: "8501"
	            - name: STREAMLIT_SERVER_ADDRESS
	              value: "0.0.0.0"
	---
	apiVersion: v1
	kind: Service
	metadata:
	  name: topic-modeling-service
	spec:
	  selector:
	    app: topic-modeling-app
	  ports:
	    - port: 80
	      targetPort: 8501
	  type: LoadBalancer
	```

	## Performance Optimization

	### Memory Management
	- Minimum RAM: 4GB for small datasets (< 1000 documents)
	- Recommended RAM: 8GB+ for larger datasets
	- Large datasets: Consider processing in batches
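
The batching suggestion above can be sketched as follows. This is a minimal illustration rather than the application's actual code: `iter_batches` and the batch size of 500 are hypothetical, and how each batch is passed to the topic model depends on the model's own API.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def iter_batches(documents: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `batch_size` documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while True:
        # islice pulls the next chunk without loading the whole input at once.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Hypothetical corpus; in practice this would be the uploaded dataset.
    corpus = [f"post {i}" for i in range(1200)]
    for number, batch in enumerate(iter_batches(corpus, batch_size=500), start=1):
        print(f"batch {number}: {len(batch)} documents")
```

Because the helper accepts any iterable, it can also wrap a file reader or a database cursor, so only one batch needs to be held in memory at a time.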

	### CPU Optimization
	- Minimum: 2 CPU cores
	- Recommended: 4+ CPU cores for faster processing
	- GPU: Optional, can speed up transformer models

	### Storage Considerations
	- Docker image: ~2GB
	- Temporary files: Varies with dataset size
	- Persistent storage: Optional for saving results

	## Monitoring and Logging

	### Health Checks
	Streamlit exposes a built-in health endpoint at `/_stcore/health`, which the application inherits:
	```bash
	# Check application health
	curl http://localhost:8501/_stcore/health
	```

	### Logging
	Streamlit logs are available through Docker:
	```bash
	# View logs (use the name passed to `docker run --name`)
	docker logs topic-modeling-prod

	# Follow logs
	docker logs -f topic-modeling-prod
	```

	### Monitoring with Prometheus
	For production, collect basic system metrics inside the app so they can be displayed or exported:
	```python
	# Add to streamlit_app.py for monitoring
	import time

	import psutil
	import streamlit as st


	# Cache for a few seconds only; without a TTL, st.cache_data
	# would keep returning the first reading.
	@st.cache_data(ttl=5)
	def get_system_metrics():
	    return {
	        'cpu_percent': psutil.cpu_percent(),
	        'memory_percent': psutil.virtual_memory().percent,
	        'timestamp': time.time(),
	    }
	```
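
Prometheus scrapes metrics in a plain-text exposition format, so a dictionary like the one returned by `get_system_metrics` still has to be rendered before it can be scraped. The sketch below shows one way to do that. It makes several assumptions: `render_prometheus` and the `topic_app_` prefix are hypothetical names, every value is treated as a gauge, and the resulting text would be served by a separate exporter process rather than by Streamlit itself.

```python
import re

# Metric names must match the Prometheus naming rules.
_METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def render_prometheus(metrics: dict[str, float], prefix: str = "topic_app_") -> str:
    """Render a flat metrics dict as Prometheus text exposition lines."""
    lines = []
    for name, value in sorted(metrics.items()):
        full_name = f"{prefix}{name}"
        if not _METRIC_NAME.fullmatch(full_name):
            raise ValueError(f"invalid metric name: {full_name!r}")
        lines.append(f"# TYPE {full_name} gauge")
        lines.append(f"{full_name} {float(value)}")
    # Each line, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical readings standing in for get_system_metrics().
    print(render_prometheus({"cpu_percent": 12.5, "memory_percent": 40}), end="")
```

For anything beyond a quick check, the official `prometheus_client` package is likely a better fit than hand-rolled formatting.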

	## Security Considerations

	### Container Security
	- Run as non-root user (included in Dockerfile)
	- Use minimal base images
	- Regularly update dependencies

	### Network Security
	- Use HTTPS in production
	- Implement proper firewall rules
	- Consider VPN for internal access

	### Data Security
	- Encrypt data at rest and in transit
	- Implement proper access controls
	- Conduct regular security audits

	## Troubleshooting

	### Common Issues

	1. Out of Memory Errors
	   - Increase container memory limits
	   - Process smaller datasets
	   - Use batch processing

	2. Slow Performance
	   - Increase CPU allocation
	   - Use SSD storage
	   - Optimize dataset size

	3. Container Won't Start
	   - Check logs: `docker logs container-name`
	   - Verify port availability
	   - Check resource limits

	4. Model Loading Issues
	   - Ensure internet connectivity for model downloads
	   - Pre-download models in Docker build
	   - Check disk space
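
The disk space check for model loading issues can be automated before the app starts. The following is a minimal sketch using the standard library; `has_free_space` is a hypothetical helper, and the 2 GB threshold is only an assumption borrowed from the approximate image size listed under Storage Considerations, since the actual model size is not stated here.

```python
import shutil

# Assumed threshold; adjust to the size of the models actually downloaded.
REQUIRED_BYTES = 2 * 1024**3


def has_free_space(path: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # "." stands in for the model cache directory, which depends on the setup.
    if has_free_space("."):
        print("Enough disk space for model downloads")
    else:
        print("Low disk space: free some space before loading models")
```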

	### Support
	For deployment issues:
	1. Check the logs first
	2. Verify system requirements
	3. Test with sample data
	4. Check network connectivity