# Deployment Guide
A comprehensive guide to deploying the MCP Orchestration Platform across local, containerized, and cloud environments.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Environment Setup](#environment-setup)
3. [Local Development](#local-development)
4. [Docker Deployment](#docker-deployment)
5. [Kubernetes Deployment](#kubernetes-deployment)
6. [Cloud Platform Deployment](#cloud-platform-deployment)
7. [Production Configuration](#production-configuration)
8. [Monitoring and Logging](#monitoring-and-logging)
9. [Security Configuration](#security-configuration)
10. [Troubleshooting](#troubleshooting)
## Prerequisites
### System Requirements
**Minimum Requirements:**
- CPU: 2 cores
- RAM: 4GB
- Storage: 20GB SSD
- Network: 100 Mbps
**Recommended Production Requirements:**
- CPU: 4+ cores
- RAM: 8GB+
- Storage: 50GB+ NVMe SSD
- Network: 1 Gbps
### Software Dependencies
**Required:**
- Python 3.8+
- pip (Python package manager)
- git (for cloning repository)
**Optional (depending on deployment):**
- Docker 20.10+
- Docker Compose 2.0+
- kubectl (for Kubernetes)
- Terraform (for infrastructure as code)
### Infrastructure Dependencies
**Database:**
- PostgreSQL 12+ (recommended)
- Redis 6.0+ (for caching)
- Optional: MongoDB (for audit logs)
**Monitoring:**
- Prometheus (metrics collection)
- Grafana (dashboard visualization)
- ELK Stack (log aggregation)
**Security:**
- HashiCorp Vault (enterprise secrets management)
- AWS Secrets Manager (cloud deployment)
- TLS certificates
## Environment Setup
### Development Environment
1. **Clone the repository**
```bash
git clone https://github.com/your-org/mcp-orchestration-platform.git
cd mcp-orchestration-platform/orchestration_platform
```
2. **Create virtual environment**
```bash
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
```
3. **Install dependencies**
```bash
pip install -r requirements.txt
pip install -r requirements-dev.txt # For development
```
4. **Set up environment variables**
```bash
cp .env.example .env
# Edit .env with your configuration
```
5. **Initialize database**
```bash
python -c "from orchestration_platform.mcp_orchestrator import MCPOrchestrator; import asyncio; asyncio.run(MCPOrchestrator().initialize())"
```
### Testing the Setup
```bash
# Run tests
python -m pytest test_orchestrator.py
# Run demo application
python demo.py
```
## Local Development
### Quick Start
1. **Start required services**
```bash
# Start PostgreSQL and Redis
docker-compose up -d postgres redis
# Or use local installations
sudo service postgresql start
sudo service redis-server start
```
2. **Run the orchestrator**
```bash
python demo.py
```
3. **Start sample servers (separate terminals)**
```bash
# Terminal 1: Weather server
python sample_servers/weather_server.py
# Terminal 2: CRM server
python sample_servers/crm_server.py
```
### Development Configuration
Create `.env` file:
```bash
# Core Configuration
ORCHESTRATOR_HOST=localhost
ORCHESTRATOR_PORT=7860
LOG_LEVEL=DEBUG
# Database
DATABASE_URL=postgresql://postgres:password@localhost:5432/orchestrator_dev
CACHE_URL=redis://localhost:6379
# Security
JWT_SECRET=your-development-secret-key
ENCRYPTION_KEY=your-development-encryption-key
# Secrets (Development)
SECRETS_BACKEND=local
SECRETS_ENCRYPTION_KEY=dev-encryption-key
# Monitoring
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
```
### Hot Reloading
For development with auto-reload:
```bash
pip install watchdog
watchmedo auto-restart --patterns="*.py" --recursive -- python demo.py
```
## Docker Deployment
### Single Container Deployment
1. **Build image**
```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
gcc \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create non-root user
RUN useradd -m -u 1000 orchestrator
USER orchestrator
# Expose port
EXPOSE 7860
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:7860/health/ready || exit 1
# Run application
CMD ["python", "demo.py"]
```
2. **Build and run**
```bash
docker build -t mcp-orchestrator:latest .
docker run -p 7860:7860 --env-file .env mcp-orchestrator:latest
```
### Docker Compose Deployment
1. **Create docker-compose.yml**
```yaml
version: '3.8'

services:
  orchestrator:
    build: .
    ports:
      - "7860:7860"
    environment:
      - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@postgres:5432/orchestrator
      - CACHE_URL=redis://redis:6379
      - SECRETS_BACKEND=vault
      - VAULT_ADDR=http://vault:8200
    depends_on:
      - postgres
      - redis
      - vault
    volumes:
      - ./logs:/app/logs
      - ./config:/app/config
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7860/health/ready"]
      interval: 30s
      timeout: 10s
      retries: 3

  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=orchestrator
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    restart: unless-stopped

  vault:
    image: hashicorp/vault:latest
    cap_add:
      - IPC_LOCK
    environment:
      - VAULT_DEV_ROOT_TOKEN_ID=dev-root-token
      - VAULT_DEV_LISTEN_ADDRESS=0.0.0.0:8200
    ports:
      - "8200:8200"
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana/datasources:/etc/grafana/provisioning/datasources
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:
  prometheus_data:
  grafana_data:

networks:
  default:
    driver: bridge
```
2. **Create environment file**
```bash
# .env
POSTGRES_PASSWORD=secure-password-here
GRAFANA_PASSWORD=admin-password-here
VAULT_TOKEN=dev-root-token
```
3. **Deploy with Docker Compose**
```bash
docker-compose up -d
```
4. **Verify deployment**
```bash
docker-compose ps
curl http://localhost:7860/health/ready
curl http://localhost:3000 # Grafana
```
### Production Docker Configuration
1. **Use multi-stage build for optimization**
```dockerfile
# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Install into a standalone prefix so the runtime stage can copy it wholesale
RUN pip install --prefix=/install --no-cache-dir -r requirements.txt
# Runtime stage
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# Copy installed packages into the system prefix so the non-root user can import them
COPY --from=builder /install /usr/local
COPY --chown=1000:1000 . .
USER 1000
EXPOSE 7860
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:7860/health/ready || exit 1
CMD ["python", "demo.py"]
```
2. **Security optimizations**
```dockerfile
# Run as non-root user
USER 1000
# Clean apt caches to reduce image size
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
# Use read-only filesystem where possible
VOLUME ["/app/logs", "/app/config"]
```
## Kubernetes Deployment
### Basic Deployment
1. **Create namespace**
```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mcp-orchestrator
```
2. **Create ConfigMap**
```yaml
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: orchestrator-config
  namespace: mcp-orchestrator
data:
  ORCHESTRATOR_HOST: "0.0.0.0"
  ORCHESTRATOR_PORT: "7860"
  LOG_LEVEL: "INFO"
  PROMETHEUS_ENABLED: "true"
  METRICS_PORT: "9090"
```
3. **Create Secret**
```yaml
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: orchestrator-secrets
  namespace: mcp-orchestrator
type: Opaque
data:
  DATABASE_URL: cG9zdGdyZXNxbDovL3VzZXI6cGFzc3dvcmRAcG9zdGdyZXM6NTQzMi9vcmNoZXN0cmF0b3I= # base64 encoded
  JWT_SECRET: eW91ci1qd3Qtc2VjcmV0LWtleQ== # base64 encoded
  ENCRYPTION_KEY: eW91ci1lbmNyeXB0aW9uLWtleQ== # base64 encoded
```
4. **Create Deployment**
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orchestrator
  namespace: mcp-orchestrator
  labels:
    app: mcp-orchestrator
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: mcp-orchestrator
  template:
    metadata:
      labels:
        app: mcp-orchestrator
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: orchestrator
          image: mcp-orchestrator:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 7860
              name: http
            - containerPort: 9090
              name: metrics
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: orchestrator-secrets
                  key: DATABASE_URL
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: orchestrator-secrets
                  key: JWT_SECRET
            - name: ENCRYPTION_KEY
              valueFrom:
                secretKeyRef:
                  name: orchestrator-secrets
                  key: ENCRYPTION_KEY
          envFrom:
            - configMapRef:
                name: orchestrator-config
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 7860
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 7860
            initialDelaySeconds: 5
            periodSeconds: 5
          volumeMounts:
            - name: config-volume
              mountPath: /app/config
            - name: logs-volume
              mountPath: /app/logs
      volumes:
        - name: config-volume
          configMap:
            name: orchestrator-config
        - name: logs-volume
          emptyDir: {}
```
5. **Create Service**
```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: orchestrator-service
  namespace: mcp-orchestrator
  labels:
    app: mcp-orchestrator
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 7860
      protocol: TCP
      name: http
    - port: 9090
      targetPort: 9090
      protocol: TCP
      name: metrics
  selector:
    app: mcp-orchestrator
```
6. **Create Ingress**
```yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orchestrator-ingress
  namespace: mcp-orchestrator
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - orchestrator.yourdomain.com
      secretName: orchestrator-tls
  rules:
    - host: orchestrator.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orchestrator-service
                port:
                  number: 80
```
### Deploy to Kubernetes
```bash
# Apply all resources
kubectl apply -f namespace.yaml
kubectl apply -f configmap.yaml
kubectl apply -f secret.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml
# Verify deployment
kubectl get pods -n mcp-orchestrator
kubectl get services -n mcp-orchestrator
kubectl get ingress -n mcp-orchestrator
```
### Helm Chart Deployment
1. **Create Helm chart structure**
```bash
helm create mcp-orchestrator
```
2. **Configure values.yaml**
```yaml
# values.yaml
replicaCount: 3

image:
  repository: mcp-orchestrator
  tag: latest
  pullPolicy: Always

service:
  type: ClusterIP
  port: 80
  targetPort: 7860

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: orchestrator.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: orchestrator-tls
      hosts:
        - orchestrator.yourdomain.com

resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 250m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

nodeSelector: {}
tolerations: []
affinity: {}

config:
  ORCHESTRATOR_HOST: "0.0.0.0"
  ORCHESTRATOR_PORT: "7860"
  LOG_LEVEL: "INFO"
  PROMETHEUS_ENABLED: "true"
  METRICS_PORT: "9090"
```
3. **Deploy with Helm**
```bash
# Install
helm install orchestrator ./mcp-orchestrator -n mcp-orchestrator
# Upgrade
helm upgrade orchestrator ./mcp-orchestrator -n mcp-orchestrator
# Uninstall
helm uninstall orchestrator -n mcp-orchestrator
```
## Cloud Platform Deployment
### AWS Deployment
#### ECS with Fargate
1. **Create task definition**
```json
{
  "family": "mcp-orchestrator",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "orchestrator",
      "image": "ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest",
      "portMappings": [
        {
          "containerPort": 7860,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "ORCHESTRATOR_HOST",
          "value": "0.0.0.0"
        },
        {
          "name": "ORCHESTRATOR_PORT",
          "value": "7860"
        }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:ssm:REGION:ACCOUNT:parameter/orchestrator/database-url"
        },
        {
          "name": "JWT_SECRET",
          "valueFrom": "arn:aws:ssm:REGION:ACCOUNT:parameter/orchestrator/jwt-secret"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/mcp-orchestrator",
          "awslogs-region": "REGION",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
```
2. **Deploy with CloudFormation**
```yaml
# cloudformation-template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'MCP Orchestrator Platform'

Parameters:
  DatabasePassword:
    Type: String
    NoEcho: true
    Description: 'Database password'

Resources:
  # ECR Repository
  ECRRepository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: mcp-orchestrator

  # ECS Cluster
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: mcp-orchestrator-cluster

  # Task Definition
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: mcp-orchestrator
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      Cpu: 512
      Memory: 1024
      ExecutionRoleArn: !Ref ECSExecutionRole
      TaskRoleArn: !Ref ECSTaskRole
      ContainerDefinitions:
        - Name: orchestrator
          Image: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/mcp-orchestrator:latest'
          PortMappings:
            - ContainerPort: 7860
          Environment:
            - Name: ORCHESTRATOR_HOST
              Value: 0.0.0.0
            - Name: ORCHESTRATOR_PORT
              Value: '7860'
          Secrets:
            - Name: DATABASE_URL
              ValueFrom: !Ref DatabaseSecret
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref CloudWatchLogsGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs

  # Service
  ECSService:
    Type: AWS::ECS::Service
    Properties:
      ServiceName: mcp-orchestrator-service
      Cluster: !Ref ECSCluster
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: 2
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroups:
            - !Ref ECSSecurityGroup
          Subnets:
            - !Ref PublicSubnet1
            - !Ref PublicSubnet2
      LoadBalancers:
        - ContainerName: orchestrator
          ContainerPort: 7860
          TargetGroupArn: !Ref TargetGroup

  # Load Balancer
  ApplicationLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: mcp-orchestrator-alb
      Scheme: internet-facing
      Type: application
      SecurityGroups:
        - !Ref ALBSecurityGroup
      Subnets:
        - !Ref PublicSubnet1
        - !Ref PublicSubnet2

  # Target Group (TargetType ip is required for Fargate tasks in awsvpc mode)
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: mcp-orchestrator-tg
      Port: 7860
      Protocol: HTTP
      TargetType: ip
      VpcId: !Ref VPC
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 30

  # Listener
  Listener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Port: 80
      Protocol: HTTP

Outputs:
  ServiceURL:
    Value: !GetAtt ApplicationLoadBalancer.DNSName
    Description: URL for the MCP Orchestrator service
```
3. **Deploy**
```bash
# Build and push image
aws ecr get-login-password --region REGION | docker login --username AWS --password-stdin ACCOUNT.dkr.ecr.REGION.amazonaws.com
docker build -t mcp-orchestrator .
docker tag mcp-orchestrator:latest ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest
docker push ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest
# Deploy with CloudFormation
aws cloudformation deploy \
--template-file cloudformation-template.yaml \
--stack-name mcp-orchestrator \
--parameter-overrides DatabasePassword=your-secure-password \
--capabilities CAPABILITY_IAM
```
#### AWS EKS Deployment
```yaml
# eks-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-orchestrator
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-orchestrator
  template:
    metadata:
      labels:
        app: mcp-orchestrator
    spec:
      containers:
        - name: orchestrator
          image: ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest
          ports:
            - containerPort: 7860
          env:
            - name: ORCHESTRATOR_HOST
              value: "0.0.0.0"
            - name: ORCHESTRATOR_PORT
              value: "7860"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: orchestrator-secrets
                  key: database-url
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-orchestrator-service
spec:
  selector:
    app: mcp-orchestrator
  ports:
    - port: 80
      targetPort: 7860
  type: LoadBalancer
```
### Azure Container Instances
1. **Create resource group**
```bash
az group create --name mcp-orchestrator-rg --location eastus
```
2. **Deploy container**
```bash
az container create \
--resource-group mcp-orchestrator-rg \
--name mcp-orchestrator \
--image mcp-orchestrator:latest \
--cpu 2 \
--memory 4 \
--ports 7860 \
--environment-variables \
ORCHESTRATOR_HOST=0.0.0.0 \
ORCHESTRATOR_PORT=7860 \
LOG_LEVEL=INFO \
--secure-environment-variables \
DATABASE_URL=postgresql://user:pass@server:5432/db \
JWT_SECRET=your-jwt-secret \
--restart-policy Always
```
3. **Create Azure Database for PostgreSQL**
```bash
az postgres flexible-server create \
  --resource-group mcp-orchestrator-rg \
  --name mcp-orchestrator-db \
  --location eastus \
  --admin-user orchestrator \
  --admin-password secure-password \
  --tier Burstable \
  --sku-name Standard_B1ms
```
### Google Cloud Run Deployment
1. **Build and push image**
```bash
gcloud builds submit --tag gcr.io/PROJECT-ID/mcp-orchestrator
```
2. **Deploy to Cloud Run**
```bash
gcloud run deploy mcp-orchestrator \
--image gcr.io/PROJECT-ID/mcp-orchestrator \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--port 7860 \
--memory 1Gi \
--cpu 2 \
--set-env-vars ORCHESTRATOR_HOST=0.0.0.0,ORCHESTRATOR_PORT=7860,LOG_LEVEL=INFO \
--set-secrets DATABASE_URL=mcp-orchestrator-db-url:latest \
--set-secrets JWT_SECRET=mcp-orchestrator-jwt-secret:latest
```
## Production Configuration
### Environment Variables
```bash
# Core Application
ORCHESTRATOR_HOST=0.0.0.0
ORCHESTRATOR_PORT=7860
LOG_LEVEL=INFO
DEBUG=false
# Database Configuration
DATABASE_URL=postgresql://user:password@host:5432/database
DATABASE_POOL_SIZE=20
DATABASE_MAX_OVERFLOW=30
DATABASE_POOL_TIMEOUT=30
# Cache Configuration
CACHE_URL=redis://redis:6379/0
CACHE_POOL_SIZE=20
CACHE_TTL=3600
# Security
JWT_SECRET=your-super-secure-jwt-secret-key
ENCRYPTION_KEY=your-32-byte-encryption-key
SECRET_KEY_ROTATION_DAYS=90
SESSION_TTL=3600
MAX_SESSIONS=10000
# Secrets Management
SECRETS_BACKEND=vault # local, vault, aws, environment
VAULT_ADDR=http://vault:8200
VAULT_TOKEN=your-vault-token
AWS_REGION=us-east-1
# Rate Limiting
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_WINDOW=3600
RATE_LIMIT_STORAGE=redis
# Monitoring
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
HEALTH_CHECK_INTERVAL=30
METRICS_RETENTION_DAYS=30
# Performance
MAX_CONNECTIONS=200
CONNECTION_TIMEOUT=30
REQUEST_TIMEOUT=60
MAX_RETRIES=3
CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
CIRCUIT_BREAKER_RECOVERY_TIMEOUT=60
# SSL/TLS
SSL_ENABLED=true
SSL_CERT_PATH=/app/certs/orchestrator.crt
SSL_KEY_PATH=/app/certs/orchestrator.key
SSL_VERIFY=true
# CORS
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
CORS_METHODS=GET,POST,PUT,DELETE,OPTIONS
CORS_HEADERS=Content-Type,Authorization,X-Requested-With
# Feature Flags
FEATURE_REAL_TIME_UPDATES=true
FEATURE_ADVANCED_ANALYTICS=true
FEATURE_PLUGIN_SYSTEM=true
```
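The `CIRCUIT_BREAKER_FAILURE_THRESHOLD` and `CIRCUIT_BREAKER_RECOVERY_TIMEOUT` settings above follow the standard closed/open/half-open breaker pattern. A minimal stdlib sketch of that pattern — a hypothetical helper, not the platform's actual implementation:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker honoring a failure threshold and recovery timeout."""

    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        # While open, reject calls until the recovery timeout has elapsed
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                raise RuntimeError("circuit open")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success resets the failure count
        return result
```

After the configured number of consecutive failures the breaker opens and fails fast; once the recovery timeout elapses, a single trial call is allowed to probe whether the backend has recovered.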
### Database Configuration
#### PostgreSQL Optimization
```sql
-- postgresql.conf
shared_buffers = 256MB
effective_cache_size = 1GB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
```
#### Redis Configuration
```bash
# redis.conf
maxmemory 512mb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
```
### Nginx Reverse Proxy
```nginx
# /etc/nginx/sites-available/mcp-orchestrator
upstream orchestrator_backend {
    server orchestrator1:7860 weight=3 max_fails=3 fail_timeout=30s;
    server orchestrator2:7860 weight=3 max_fails=3 fail_timeout=30s;
    server orchestrator3:7860 max_fails=3 fail_timeout=30s backup;
}

server {
    listen 80;
    server_name orchestrator.yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name orchestrator.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/orchestrator.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/orchestrator.yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;

    client_max_body_size 50M;
    client_body_timeout 60s;
    client_header_timeout 60s;

    location / {
        proxy_pass http://orchestrator_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }

    location /metrics {
        # An upstream name cannot take a port suffix; point at a specific
        # backend (or define a dedicated upstream) for the metrics port.
        proxy_pass http://orchestrator1:9090/metrics;
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        allow 172.16.0.0/12;
        allow 192.168.0.0/16;
        deny all;
    }

    location /health {
        proxy_pass http://orchestrator_backend/health;
        access_log off;
    }
}
```
## Monitoring and Logging
### Prometheus Configuration
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "orchestrator_alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'mcp-orchestrator'
    static_configs:
      - targets: ['orchestrator:9090']
    metrics_path: /metrics
    scrape_interval: 10s
    scrape_timeout: 5s

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```
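The scrape job above pulls from the orchestrator's `/metrics` endpoint, which must speak Prometheus's text exposition format. In practice the `prometheus_client` library renders this output for you; a stdlib-only sketch shows what a counter sample looks like on the wire (the metric name mirrors the `orchestrator_requests_total` queries used elsewhere in this guide):

```python
# Stdlib-only sketch of Prometheus's text exposition format -- in practice
# the prometheus_client library generates this output for you.
def render_counter(name, help_text, samples):
    """samples maps label tuples to values, e.g. {(("method", "GET"),): 42}."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

print(render_counter(
    "orchestrator_requests_total",
    "Total requests handled by the orchestrator.",
    {(("method", "GET"), ("status", "200")): 42},
))
```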
### Grafana Dashboards
1. **Orchestrator Overview Dashboard**
```json
{
  "dashboard": {
    "title": "MCP Orchestrator Overview",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(orchestrator_requests_total[5m])",
            "legendFormat": "{{method}} {{status}}"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(orchestrator_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.50, rate(orchestrator_request_duration_seconds_bucket[5m]))",
            "legendFormat": "50th percentile"
          }
        ]
      },
      {
        "title": "Active Connections",
        "type": "singlestat",
        "targets": [
          {
            "expr": "orchestrator_active_connections"
          }
        ]
      }
    ]
  }
}
```
### Structured Logging
```python
import structlog
# Configure structured logging
structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.UnicodeDecoder(),
        structlog.processors.JSONRenderer()
    ],
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)
```
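If structlog is not available in a given environment, comparable one-object-per-line JSON output can be approximated with the standard library. A rough sketch — the field names here are illustrative, not the platform's actual log schema:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object, similar to structlog's JSONRenderer."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orchestrator")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("server_registered")
```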
## Security Configuration
### TLS/SSL Setup
1. **Generate self-signed certificates (development)**
```bash
openssl req -x509 -newkey rsa:4096 -keyout orchestrator.key -out orchestrator.crt -days 365 -nodes
```
2. **Let's Encrypt certificates (production)**
```bash
certbot certonly --standalone -d orchestrator.yourdomain.com
```
### Security Headers
```python
# security_headers.py
from starlette.middleware.cors import CORSMiddleware

# `app` is assumed to be an existing FastAPI/Starlette application instance
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["Authorization", "Content-Type"],
)

# Add security headers
@app.middleware("http")
async def add_security_headers(request, call_next):
    response = await call_next(request)
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-XSS-Protection"] = "1; mode=block"
    response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
    return response
```
### Authentication
```python
# auth.py
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

import jwt  # PyJWT

SECRET_KEY = os.environ["JWT_SECRET"]
ALGORITHM = "HS256"

def create_access_token(data: dict, expires_delta: Optional[timedelta] = None):
    to_encode = data.copy()
    expire = datetime.now(timezone.utc) + (expires_delta or timedelta(minutes=15))
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)

def verify_token(token: str):
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    except jwt.PyJWTError:
        return None
```
## Troubleshooting
### Common Deployment Issues
#### 1. Pod CrashLoopBackOff
```bash
# Check pod logs
kubectl logs -f pod-name -n mcp-orchestrator
# Check events
kubectl get events -n mcp-orchestrator --sort-by='.lastTimestamp'
# Debug pod
kubectl debug -it pod-name -n mcp-orchestrator --image=busybox
```
#### 2. Database Connection Issues
```bash
# Test database connectivity
kubectl exec -it pod-name -n mcp-orchestrator -- python -c "
import asyncpg
import asyncio

async def test():
    try:
        conn = await asyncpg.connect('postgresql://user:pass@host:5432/db')
        await conn.execute('SELECT 1')
        print('Database connection successful')
        await conn.close()
    except Exception as e:
        print(f'Database connection failed: {e}')

asyncio.run(test())
"
```
#### 3. Memory Issues
```bash
# Check resource usage
kubectl top pods -n mcp-orchestrator
# Check node resources
kubectl top nodes
# Increase memory limits
kubectl patch deployment orchestrator -n mcp-orchestrator -p '{"spec":{"template":{"spec":{"containers":[{"name":"orchestrator","resources":{"limits":{"memory":"2Gi"}}}]}}}}'
```
### Performance Tuning
#### 1. Connection Pool Optimization
```bash
# Tune connection pool settings
DATABASE_POOL_SIZE=20 # Increase for high load
DATABASE_MAX_OVERFLOW=30 # Allow overflow connections
DATABASE_POOL_TIMEOUT=30 # Timeout for acquiring connection
```
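What these knobs mean in practice: up to `DATABASE_POOL_SIZE` persistent connections plus `DATABASE_MAX_OVERFLOW` temporary ones may be checked out at once, and acquisition fails if no slot frees up within `DATABASE_POOL_TIMEOUT` seconds. A stdlib sketch of that acquisition logic — hypothetical; real drivers such as asyncpg implement pooling internally:

```python
import asyncio

class BoundedPool:
    """Sketch of pool-size / overflow / timeout semantics using a semaphore."""

    def __init__(self, pool_size=20, max_overflow=30, pool_timeout=30):
        # At most pool_size + max_overflow connections checked out at once
        # (the persistent/overflow distinction is elided in this sketch)
        self._slots = asyncio.Semaphore(pool_size + max_overflow)
        self._timeout = pool_timeout

    async def acquire(self):
        try:
            # Wait up to pool_timeout seconds for a free slot
            await asyncio.wait_for(self._slots.acquire(), timeout=self._timeout)
        except asyncio.TimeoutError:
            raise TimeoutError("no connection available within pool_timeout")

    def release(self):
        self._slots.release()
```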
#### 2. Cache Optimization
```bash
# Redis configuration
CACHE_TTL=3600 # Adjust based on use case
CACHE_COMPRESSION=true # Enable for large responses
```
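`CACHE_TTL` bounds how long an entry may be served before it is considered stale. A minimal sketch of time-based expiry — a hypothetical in-process helper; in production Redis enforces expiry server-side:

```python
import time

class TTLCache:
    """Sketch of time-based expiry as configured by CACHE_TTL."""

    def __init__(self, ttl=3600):
        self.ttl = ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires = entry
        if time.monotonic() >= expires:
            del self._store[key]  # lazily evict expired entries on read
            return default
        return value
```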
#### 3. Horizontal Pod Autoscaling
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orchestrator-hpa
  namespace: mcp-orchestrator
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orchestrator
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
### Health Checks
#### Application Health Check
```python
# health_check.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/health/live")
async def liveness_check():
    return {"status": "alive"}

@app.get("/health/ready")
async def readiness_check():
    # Check database connectivity
    # Check cache connectivity
    # Check external services
    return {"status": "ready"}

@app.get("/health/detailed")
async def detailed_health():
    return {
        "status": "healthy",
        "checks": {
            "database": await check_database(),
            "cache": await check_cache(),
            "external_services": await check_external_services()
        }
    }
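The `check_database`, `check_cache`, and `check_external_services` helpers referenced above are left to the application. A sketch of how such checks could be aggregated, bounding each (hypothetical) check with a timeout so one hung dependency cannot stall the probe:

```python
import asyncio

async def run_checks(checks, timeout=5.0):
    """Run named async health checks concurrently; a slow or failing check reports 'unhealthy'."""

    async def guarded(coro):
        try:
            await asyncio.wait_for(coro, timeout=timeout)
            return "healthy"
        except Exception:
            return "unhealthy"

    results = await asyncio.gather(*(guarded(c()) for c in checks.values()))
    return dict(zip(checks.keys(), results))

# Example with stand-in checks
async def ok():
    return True

async def broken():
    raise ConnectionError("refused")

print(asyncio.run(run_checks({"database": ok, "cache": broken})))
# {'database': 'healthy', 'cache': 'unhealthy'}
```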
This completes the comprehensive deployment guide. The platform can now be deployed across various environments with proper configuration, monitoring, and security measures in place.