# Deployment Guide

Comprehensive deployment guide for the MCP Orchestration Platform across different environments and platforms.

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Environment Setup](#environment-setup)
3. [Local Development](#local-development)
4. [Docker Deployment](#docker-deployment)
5. [Kubernetes Deployment](#kubernetes-deployment)
6. [Cloud Platform Deployment](#cloud-platform-deployment)
7. [Production Configuration](#production-configuration)
8. [Monitoring and Logging](#monitoring-and-logging)
9. [Security Configuration](#security-configuration)
10. [Troubleshooting](#troubleshooting)

## Prerequisites

### System Requirements

**Minimum Requirements:**

- CPU: 2 cores
- RAM: 4GB
- Storage: 20GB SSD
- Network: 100 Mbps

**Recommended Production Requirements:**

- CPU: 4+ cores
- RAM: 8GB+
- Storage: 50GB+ NVMe SSD
- Network: 1 Gbps

### Software Dependencies

**Required:**

- Python 3.8+
- pip (Python package manager)
- git (for cloning the repository)

**Optional (depending on deployment):**

- Docker 20.10+
- Docker Compose 2.0+
- kubectl (for Kubernetes)
- Terraform (for infrastructure as code)

### Infrastructure Dependencies

**Database:**

- PostgreSQL 12+ (recommended)
- Redis 6.0+ (for caching)
- Optional: MongoDB (for audit logs)

**Monitoring:**

- Prometheus (metrics collection)
- Grafana (dashboard visualization)
- ELK Stack (log aggregation)

**Security:**

- HashiCorp Vault (enterprise secrets management)
- AWS Secrets Manager (cloud deployment)
- TLS certificates

## Environment Setup

### Development Environment

1. **Clone the repository**

   ```bash
   git clone https://github.com/your-org/mcp-orchestration-platform.git
   cd mcp-orchestration-platform/orchestration_platform
   ```

2. **Create a virtual environment**

   ```bash
   python -m venv venv
   source venv/bin/activate  # Linux/Mac
   # or
   venv\Scripts\activate     # Windows
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   pip install -r requirements-dev.txt  # For development
   ```

4. **Set up environment variables**

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

5. **Initialize the database**

   ```bash
   python -c "from orchestration_platform.mcp_orchestrator import MCPOrchestrator; import asyncio; asyncio.run(MCPOrchestrator().initialize())"
   ```

### Testing the Setup

```bash
# Run tests
python -m pytest test_orchestrator.py

# Run demo application
python demo.py
```

## Local Development

### Quick Start

1. **Start required services**

   ```bash
   # Start PostgreSQL and Redis
   docker-compose up -d postgres redis

   # Or use local installations
   sudo service postgresql start
   sudo service redis-server start
   ```

2. **Run the orchestrator**

   ```bash
   python demo.py
   ```

3. **Start sample servers (separate terminals)**

   ```bash
   # Terminal 1: Weather server
   python sample_servers/weather_server.py

   # Terminal 2: CRM server
   python sample_servers/crm_server.py
   ```

### Development Configuration

Create a `.env` file:

```bash
# Core Configuration
ORCHESTRATOR_HOST=localhost
ORCHESTRATOR_PORT=7860
LOG_LEVEL=DEBUG

# Database
DATABASE_URL=postgresql://postgres:password@localhost:5432/orchestrator_dev
CACHE_URL=redis://localhost:6379

# Security
JWT_SECRET=your-development-secret-key
ENCRYPTION_KEY=your-development-encryption-key

# Secrets (Development)
SECRETS_BACKEND=local
SECRETS_ENCRYPTION_KEY=dev-encryption-key

# Monitoring
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
```

### Hot Reloading

For development with auto-reload:

```bash
pip install watchdog
watchmedo auto-restart --patterns="*.py" --recursive -- python demo.py
```

## Docker Deployment

### Single Container Deployment

1. **Create the Dockerfile**

   ```dockerfile
   FROM python:3.11-slim

   WORKDIR /app

   # Install system dependencies
   RUN apt-get update && apt-get install -y \
       gcc \
       curl \
       && rm -rf /var/lib/apt/lists/*

   # Copy requirements and install Python dependencies
   COPY requirements.txt .
   RUN pip install --no-cache-dir -r requirements.txt

   # Copy application code
   COPY . .

   # Create non-root user
   RUN useradd -m -u 1000 orchestrator
   USER orchestrator

   # Expose port
   EXPOSE 7860

   # Health check
   HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
       CMD curl -f http://localhost:7860/health/ready || exit 1

   # Run application
   CMD ["python", "demo.py"]
   ```

2. **Build and run**

   ```bash
   docker build -t mcp-orchestrator:latest .
   docker run -p 7860:7860 --env-file .env mcp-orchestrator:latest
   ```

### Docker Compose Deployment

1. **Create docker-compose.yml**

   ```yaml
   version: '3.8'

   services:
     orchestrator:
       build: .
       ports:
         - "7860:7860"
       environment:
         - DATABASE_URL=postgresql://postgres:${POSTGRES_PASSWORD}@postgres:5432/orchestrator
         - CACHE_URL=redis://redis:6379
         - SECRETS_BACKEND=vault
         - VAULT_ADDR=http://vault:8200
       depends_on:
         - postgres
         - redis
         - vault
       volumes:
         - ./logs:/app/logs
         - ./config:/app/config
       restart: unless-stopped
       healthcheck:
         test: ["CMD", "curl", "-f", "http://localhost:7860/health/ready"]
         interval: 30s
         timeout: 10s
         retries: 3

     postgres:
       image: postgres:15-alpine
       environment:
         - POSTGRES_DB=orchestrator
         - POSTGRES_USER=postgres
         - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
       volumes:
         - postgres_data:/var/lib/postgresql/data
         - ./init.sql:/docker-entrypoint-initdb.d/init.sql
       restart: unless-stopped

     redis:
       image: redis:7-alpine
       command: redis-server --appendonly yes
       volumes:
         - redis_data:/data
       restart: unless-stopped

     vault:
       image: hashicorp/vault:latest
       cap_add:
         - IPC_LOCK
       environment:
         - VAULT_DEV_ROOT_TOKEN_ID=dev-root-token
         - VAULT_DEV_LISTEN_ADDRESS=0.0.0.0:8200
       ports:
         - "8200:8200"
       restart: unless-stopped

     prometheus:
       image: prom/prometheus:latest
       ports:
         - "9090:9090"
       volumes:
         - ./prometheus.yml:/etc/prometheus/prometheus.yml
         - prometheus_data:/prometheus
       command:
         - '--config.file=/etc/prometheus/prometheus.yml'
         - '--storage.tsdb.path=/prometheus'
         - '--web.console.libraries=/etc/prometheus/console_libraries'
         - '--web.console.templates=/etc/prometheus/consoles'
         - '--web.enable-lifecycle'
       restart: unless-stopped

     grafana:
       image: grafana/grafana:latest
       ports:
         - "3000:3000"
       environment:
         - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
       volumes:
         - grafana_data:/var/lib/grafana
         - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
         - ./grafana/datasources:/etc/grafana/provisioning/datasources
       restart: unless-stopped

   volumes:
     postgres_data:
     redis_data:
     prometheus_data:
     grafana_data:

   networks:
     default:
       driver: bridge
   ```

2. **Create environment file**

   ```bash
   # .env
   POSTGRES_PASSWORD=secure-password-here
   GRAFANA_PASSWORD=admin-password-here
   VAULT_TOKEN=dev-root-token
   ```

3. **Deploy with Docker Compose**

   ```bash
   docker-compose up -d
   ```

4. **Verify deployment**

   ```bash
   docker-compose ps
   curl http://localhost:7860/health/ready
   curl http://localhost:3000  # Grafana
   ```

### Production Docker Configuration

1. **Use a multi-stage build for optimization**

   ```dockerfile
   # Build stage
   FROM python:3.11-slim as builder

   WORKDIR /app
   COPY requirements.txt .
   RUN pip install --user --no-cache-dir -r requirements.txt

   # Runtime stage
   FROM python:3.11-slim

   WORKDIR /app
   RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/* \
       && useradd -m -u 1000 orchestrator

   # Copy installed packages into the runtime user's home; pip placed them
   # under /root/.local in the build stage, which is unreadable after USER 1000
   COPY --from=builder --chown=1000:1000 /root/.local /home/orchestrator/.local
   COPY --chown=1000:1000 . .
   ENV PATH=/home/orchestrator/.local/bin:$PATH

   USER 1000
   EXPOSE 7860

   HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
       CMD curl -f http://localhost:7860/health/ready || exit 1

   CMD ["python", "demo.py"]
   ```

2. **Security optimizations**

   ```dockerfile
   # Run as non-root user
   USER 1000

   # Remove unnecessary packages
   RUN apt-get clean && rm -rf /var/lib/apt/lists/*

   # Use read-only filesystem where possible
   VOLUME ["/app/logs", "/app/config"]
   ```

## Kubernetes Deployment

### Basic Deployment

1. **Create a namespace**

   ```yaml
   # namespace.yaml
   apiVersion: v1
   kind: Namespace
   metadata:
     name: mcp-orchestrator
   ```
2. **Create a ConfigMap**

   ```yaml
   # configmap.yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: orchestrator-config
     namespace: mcp-orchestrator
   data:
     ORCHESTRATOR_HOST: "0.0.0.0"
     ORCHESTRATOR_PORT: "7860"
     LOG_LEVEL: "INFO"
     PROMETHEUS_ENABLED: "true"
     METRICS_PORT: "9090"
   ```

3. **Create a Secret**

   ```yaml
   # secret.yaml
   apiVersion: v1
   kind: Secret
   metadata:
     name: orchestrator-secrets
     namespace: mcp-orchestrator
   type: Opaque
   data:
     DATABASE_URL: cG9zdGdyZXNxbDovL3VzZXI6cGFzc3dvcmRAcG9zdGdyZXM6NTQzMi9vcmNoZXN0cmF0b3I=  # base64 encoded
     JWT_SECRET: eW91ci1qd3Qtc2VjcmV0LWtleQ==  # base64 encoded
     ENCRYPTION_KEY: eW91ci1lbmNyeXB0aW9uLWtleQ==  # base64 encoded
   ```

4. **Create a Deployment**

   ```yaml
   # deployment.yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: orchestrator
     namespace: mcp-orchestrator
     labels:
       app: mcp-orchestrator
   spec:
     replicas: 3
     strategy:
       type: RollingUpdate
       rollingUpdate:
         maxSurge: 1
         maxUnavailable: 0
     selector:
       matchLabels:
         app: mcp-orchestrator
     template:
       metadata:
         labels:
           app: mcp-orchestrator
       spec:
         containers:
           - name: orchestrator
             image: mcp-orchestrator:latest
             imagePullPolicy: Always
             ports:
               - containerPort: 7860
                 name: http
               - containerPort: 9090
                 name: metrics
             env:
               - name: DATABASE_URL
                 valueFrom:
                   secretKeyRef:
                     name: orchestrator-secrets
                     key: DATABASE_URL
               - name: JWT_SECRET
                 valueFrom:
                   secretKeyRef:
                     name: orchestrator-secrets
                     key: JWT_SECRET
               - name: ENCRYPTION_KEY
                 valueFrom:
                   secretKeyRef:
                     name: orchestrator-secrets
                     key: ENCRYPTION_KEY
             envFrom:
               - configMapRef:
                   name: orchestrator-config
             resources:
               requests:
                 memory: "512Mi"
                 cpu: "250m"
               limits:
                 memory: "1Gi"
                 cpu: "500m"
             livenessProbe:
               httpGet:
                 path: /health/live
                 port: 7860
               initialDelaySeconds: 30
               periodSeconds: 10
             readinessProbe:
               httpGet:
                 path: /health/ready
                 port: 7860
               initialDelaySeconds: 5
               periodSeconds: 5
             volumeMounts:
               - name: config-volume
                 mountPath: /app/config
               - name: logs-volume
                 mountPath: /app/logs
         volumes:
           - name: config-volume
             configMap:
               name: orchestrator-config
           - name: logs-volume
             emptyDir: {}
         securityContext:
           runAsNonRoot: true
           runAsUser: 1000
           fsGroup: 1000
   ```

5. **Create a Service**

   ```yaml
   # service.yaml
   apiVersion: v1
   kind: Service
   metadata:
     name: orchestrator-service
     namespace: mcp-orchestrator
     labels:
       app: mcp-orchestrator
   spec:
     type: ClusterIP
     ports:
       - port: 80
         targetPort: 7860
         protocol: TCP
         name: http
       - port: 9090
         targetPort: 9090
         protocol: TCP
         name: metrics
     selector:
       app: mcp-orchestrator
   ```

6. **Create an Ingress**

   ```yaml
   # ingress.yaml
   apiVersion: networking.k8s.io/v1
   kind: Ingress
   metadata:
     name: orchestrator-ingress
     namespace: mcp-orchestrator
     annotations:
       kubernetes.io/ingress.class: nginx
       cert-manager.io/cluster-issuer: letsencrypt-prod
       nginx.ingress.kubernetes.io/ssl-redirect: "true"
       nginx.ingress.kubernetes.io/proxy-body-size: "10m"
   spec:
     tls:
       - hosts:
           - orchestrator.yourdomain.com
         secretName: orchestrator-tls
     rules:
       - host: orchestrator.yourdomain.com
         http:
           paths:
             - path: /
               pathType: Prefix
               backend:
                 service:
                   name: orchestrator-service
                   port:
                     number: 80
   ```

### Deploy to Kubernetes

```bash
# Apply all resources
kubectl apply -f namespace.yaml
kubectl apply -f configmap.yaml
kubectl apply -f secret.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml

# Verify deployment
kubectl get pods -n mcp-orchestrator
kubectl get services -n mcp-orchestrator
kubectl get ingress -n mcp-orchestrator
```

### Helm Chart Deployment

1. **Create the Helm chart structure**

   ```bash
   helm create mcp-orchestrator
   ```
2. **Configure values.yaml**

   ```yaml
   # values.yaml
   replicaCount: 3

   image:
     repository: mcp-orchestrator
     tag: latest
     pullPolicy: Always

   service:
     type: ClusterIP
     port: 80
     targetPort: 7860

   ingress:
     enabled: true
     className: nginx
     annotations:
       kubernetes.io/ingress.class: nginx
       cert-manager.io/cluster-issuer: letsencrypt-prod
     hosts:
       - host: orchestrator.yourdomain.com
         paths:
           - path: /
             pathType: Prefix
     tls:
       - secretName: orchestrator-tls
         hosts:
           - orchestrator.yourdomain.com

   resources:
     limits:
       cpu: 500m
       memory: 1Gi
     requests:
       cpu: 250m
       memory: 512Mi

   autoscaling:
     enabled: true
     minReplicas: 3
     maxReplicas: 10
     targetCPUUtilizationPercentage: 70

   nodeSelector: {}
   tolerations: []
   affinity: {}

   config:
     ORCHESTRATOR_HOST: "0.0.0.0"
     ORCHESTRATOR_PORT: "7860"
     LOG_LEVEL: "INFO"
     PROMETHEUS_ENABLED: "true"
     METRICS_PORT: "9090"
   ```

3. **Deploy with Helm**

   ```bash
   # Install
   helm install orchestrator ./mcp-orchestrator -n mcp-orchestrator

   # Upgrade
   helm upgrade orchestrator ./mcp-orchestrator -n mcp-orchestrator

   # Uninstall
   helm uninstall orchestrator -n mcp-orchestrator
   ```

## Cloud Platform Deployment

### AWS Deployment

#### ECS with Fargate
1. **Create the task definition**

   ```json
   {
     "family": "mcp-orchestrator",
     "networkMode": "awsvpc",
     "requiresCompatibilities": ["FARGATE"],
     "cpu": "512",
     "memory": "1024",
     "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
     "taskRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskRole",
     "containerDefinitions": [
       {
         "name": "orchestrator",
         "image": "ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest",
         "portMappings": [
           {
             "containerPort": 7860,
             "protocol": "tcp"
           }
         ],
         "environment": [
           { "name": "ORCHESTRATOR_HOST", "value": "0.0.0.0" },
           { "name": "ORCHESTRATOR_PORT", "value": "7860" }
         ],
         "secrets": [
           {
             "name": "DATABASE_URL",
             "valueFrom": "arn:aws:ssm:REGION:ACCOUNT:parameter/orchestrator/database-url"
           },
           {
             "name": "JWT_SECRET",
             "valueFrom": "arn:aws:ssm:REGION:ACCOUNT:parameter/orchestrator/jwt-secret"
           }
         ],
         "logConfiguration": {
           "logDriver": "awslogs",
           "options": {
             "awslogs-group": "/ecs/mcp-orchestrator",
             "awslogs-region": "REGION",
             "awslogs-stream-prefix": "ecs"
           }
         }
       }
     ]
   }
   ```
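Fargate only accepts specific CPU/memory pairings, and a task definition with an invalid pair is rejected at registration time. A small, hypothetical pre-flight check is sketched below; the combination table is a partial reproduction from memory of AWS's published Fargate sizes, so verify it against current AWS documentation before relying on it:

```python
# fargate_check.py — sanity-check task size before registering (sketch)
FARGATE_SIZES = {  # CPU units -> allowed memory values (MiB); partial table
    256: [512, 1024, 2048],
    512: [1024, 2048, 3072, 4096],
    1024: [2048, 3072, 4096, 5120, 6144, 7168, 8192],
}

def is_valid_fargate_size(cpu: int, memory: int) -> bool:
    """Return True if the cpu/memory pair is a supported Fargate combination."""
    return memory in FARGATE_SIZES.get(cpu, [])

# The task definition above requests cpu=512, memory=1024
assert is_valid_fargate_size(512, 1024)
assert not is_valid_fargate_size(512, 512)  # too little memory for 512 CPU units
```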
2. **Deploy with CloudFormation**

   ```yaml
   # cloudformation-template.yaml
   AWSTemplateFormatVersion: '2010-09-09'
   Description: 'MCP Orchestrator Platform'

   Parameters:
     DatabasePassword:
       Type: String
       NoEcho: true
       Description: 'Database password'

   Resources:
     # ECR Repository
     ECRRepository:
       Type: AWS::ECR::Repository
       Properties:
         RepositoryName: mcp-orchestrator

     # ECS Cluster
     ECSCluster:
       Type: AWS::ECS::Cluster
       Properties:
         ClusterName: mcp-orchestrator-cluster

     # Task Definition
     TaskDefinition:
       Type: AWS::ECS::TaskDefinition
       Properties:
         Family: mcp-orchestrator
         NetworkMode: awsvpc
         RequiresCompatibilities:
           - FARGATE
         Cpu: 512
         Memory: 1024
         ExecutionRoleArn: !Ref ECSExecutionRole
         TaskRoleArn: !Ref ECSTaskRole
         ContainerDefinitions:
           - Name: orchestrator
             Image: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/mcp-orchestrator:latest'
             PortMappings:
               - ContainerPort: 7860
             Environment:
               - Name: ORCHESTRATOR_HOST
                 Value: 0.0.0.0
               - Name: ORCHESTRATOR_PORT
                 Value: '7860'
             Secrets:
               - Name: DATABASE_URL
                 ValueFrom: !Ref DatabaseSecret
             LogConfiguration:
               LogDriver: awslogs
               Options:
                 awslogs-group: !Ref CloudWatchLogsGroup
                 awslogs-region: !Ref AWS::Region
                 awslogs-stream-prefix: ecs

     # Service
     ECSService:
       Type: AWS::ECS::Service
       Properties:
         ServiceName: mcp-orchestrator-service
         Cluster: !Ref ECSCluster
         TaskDefinition: !Ref TaskDefinition
         DesiredCount: 2
         LaunchType: FARGATE
         NetworkConfiguration:
           AwsvpcConfiguration:
             AssignPublicIp: ENABLED
             SecurityGroups:
               - !Ref ECSSecurityGroup
             Subnets:
               - !Ref PublicSubnet1
               - !Ref PublicSubnet2
         LoadBalancers:
           - ContainerName: orchestrator
             ContainerPort: 7860
             TargetGroupArn: !Ref TargetGroup

     # Load Balancer
     ApplicationLoadBalancer:
       Type: AWS::ElasticLoadBalancingV2::LoadBalancer
       Properties:
         Name: mcp-orchestrator-alb
         Scheme: internet-facing
         Type: application
         SecurityGroups:
           - !Ref ALBSecurityGroup
         Subnets:
           - !Ref PublicSubnet1
           - !Ref PublicSubnet2

     # Target Group
     TargetGroup:
       Type: AWS::ElasticLoadBalancingV2::TargetGroup
       Properties:
         Name: mcp-orchestrator-tg
         Port: 7860
         Protocol: HTTP
         VpcId: !Ref VPC
         TargetGroupAttributes:
           - Key: deregistration_delay.timeout_seconds
             Value: 30

     # Listener
     Listener:
       Type: AWS::ElasticLoadBalancingV2::Listener
       Properties:
         DefaultActions:
           - Type: forward
             TargetGroupArn: !Ref TargetGroup
         LoadBalancerArn: !Ref ApplicationLoadBalancer
         Port: 80
         Protocol: HTTP

   # NOTE: The IAM roles, security groups, subnets, VPC, database secret, and
   # log group referenced above are not defined in this excerpt and must be
   # added or imported before the template will deploy.

   Outputs:
     ServiceURL:
       Value: !GetAtt ApplicationLoadBalancer.DNSName
       Description: URL for the MCP Orchestrator service
   ```

3. **Deploy**

   ```bash
   # Build and push the image
   aws ecr get-login-password --region REGION | docker login --username AWS --password-stdin ACCOUNT.dkr.ecr.REGION.amazonaws.com
   docker build -t mcp-orchestrator .
   docker tag mcp-orchestrator:latest ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest
   docker push ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest

   # Deploy with CloudFormation
   aws cloudformation deploy \
     --template-file cloudformation-template.yaml \
     --stack-name mcp-orchestrator \
     --parameter-overrides DatabasePassword=your-secure-password \
     --capabilities CAPABILITY_IAM
   ```

#### AWS EKS Deployment

```yaml
# eks-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-orchestrator
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-orchestrator
  template:
    metadata:
      labels:
        app: mcp-orchestrator
    spec:
      containers:
        - name: orchestrator
          image: ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-orchestrator:latest
          ports:
            - containerPort: 7860
          env:
            - name: ORCHESTRATOR_HOST
              value: "0.0.0.0"
            - name: ORCHESTRATOR_PORT
              value: "7860"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: orchestrator-secrets
                  key: database-url
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-orchestrator-service
spec:
  selector:
    app: mcp-orchestrator
  ports:
    - port: 80
      targetPort: 7860
  type: LoadBalancer
```

### Azure Container Instances
1. **Create a resource group**

   ```bash
   az group create --name mcp-orchestrator-rg --location eastus
   ```

2. **Deploy the container**

   ```bash
   az container create \
     --resource-group mcp-orchestrator-rg \
     --name mcp-orchestrator \
     --image mcp-orchestrator:latest \
     --cpu 2 \
     --memory 4 \
     --ports 7860 \
     --environment-variables \
       ORCHESTRATOR_HOST=0.0.0.0 \
       ORCHESTRATOR_PORT=7860 \
       LOG_LEVEL=INFO \
     --secure-environment-variables \
       DATABASE_URL=postgresql://user:pass@server:5432/db \
       JWT_SECRET=your-jwt-secret \
     --restart-policy Always
   ```

3. **Create Azure Database for PostgreSQL**

   ```bash
   az postgres server create \
     --resource-group mcp-orchestrator-rg \
     --name mcp-orchestrator-db \
     --location eastus \
     --admin-user orchestrator \
     --admin-password secure-password \
     --sku-name B_Gen5_1
   ```

### Google Cloud Run Deployment

1. **Build and push the image**

   ```bash
   gcloud builds submit --tag gcr.io/PROJECT-ID/mcp-orchestrator
   ```

2. **Deploy to Cloud Run**

   ```bash
   gcloud run deploy mcp-orchestrator \
     --image gcr.io/PROJECT-ID/mcp-orchestrator \
     --platform managed \
     --region us-central1 \
     --allow-unauthenticated \
     --port 7860 \
     --memory 1Gi \
     --cpu 2 \
     --set-env-vars ORCHESTRATOR_HOST=0.0.0.0,ORCHESTRATOR_PORT=7860,LOG_LEVEL=INFO \
     --set-secrets DATABASE_URL=mcp-orchestrator-db-url:latest \
     --set-secrets JWT_SECRET=mcp-orchestrator-jwt-secret:latest
   ```

## Production Configuration

### Environment Variables

```bash
# Core Application
ORCHESTRATOR_HOST=0.0.0.0
ORCHESTRATOR_PORT=7860
LOG_LEVEL=INFO
DEBUG=false

# Database Configuration
DATABASE_URL=postgresql://user:password@host:5432/database
DATABASE_POOL_SIZE=20
DATABASE_MAX_OVERFLOW=30
DATABASE_POOL_TIMEOUT=30

# Cache Configuration
CACHE_URL=redis://redis:6379/0
CACHE_POOL_SIZE=20
CACHE_TTL=3600

# Security
JWT_SECRET=your-super-secure-jwt-secret-key
ENCRYPTION_KEY=your-32-byte-encryption-key
SECRET_KEY_ROTATION_DAYS=90
SESSION_TTL=3600
MAX_SESSIONS=10000

# Secrets Management
SECRETS_BACKEND=vault  # local, vault, aws, environment
VAULT_ADDR=http://vault:8200
VAULT_TOKEN=your-vault-token
AWS_REGION=us-east-1

# Rate Limiting
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_WINDOW=3600
RATE_LIMIT_STORAGE=redis

# Monitoring
PROMETHEUS_ENABLED=true
METRICS_PORT=9090
HEALTH_CHECK_INTERVAL=30
METRICS_RETENTION_DAYS=30

# Performance
MAX_CONNECTIONS=200
CONNECTION_TIMEOUT=30
REQUEST_TIMEOUT=60
MAX_RETRIES=3
CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
CIRCUIT_BREAKER_RECOVERY_TIMEOUT=60

# SSL/TLS
SSL_ENABLED=true
SSL_CERT_PATH=/app/certs/orchestrator.crt
SSL_KEY_PATH=/app/certs/orchestrator.key
SSL_VERIFY=true

# CORS
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
CORS_METHODS=GET,POST,PUT,DELETE,OPTIONS
CORS_HEADERS=Content-Type,Authorization,X-Requested-With

# Feature Flags
FEATURE_REAL_TIME_UPDATES=true
FEATURE_ADVANCED_ANALYTICS=true
FEATURE_PLUGIN_SYSTEM=true
```

### Database Configuration

#### PostgreSQL Optimization

```ini
# postgresql.conf
shared_buffers = 256MB
effective_cache_size = 1GB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
```

#### Redis Configuration

```bash
# redis.conf
maxmemory 512mb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
```

### Nginx Reverse Proxy

```nginx
# /etc/nginx/sites-available/mcp-orchestrator
upstream orchestrator_backend {
    server orchestrator1:7860 weight=3 max_fails=3 fail_timeout=30s;
    server orchestrator2:7860 weight=3 max_fails=3 fail_timeout=30s;
    server orchestrator3:7860 weight=3 max_fails=3 fail_timeout=30s backup;
}

# Metrics are served on a separate port, so they need their own upstream;
# appending a port to an upstream name (orchestrator_backend:9090) is invalid.
upstream orchestrator_metrics {
    server orchestrator1:9090;
    server orchestrator2:9090;
    server orchestrator3:9090;
}

server {
    listen 80;
    server_name orchestrator.yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name orchestrator.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/orchestrator.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/orchestrator.yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    client_max_body_size 50M;
    client_body_timeout 60s;
    client_header_timeout 60s;

    location / {
        proxy_pass http://orchestrator_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }

    location /metrics {
        proxy_pass http://orchestrator_metrics/metrics;
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        allow 172.16.0.0/12;
        allow 192.168.0.0/16;
        deny all;
    }

    location /health {
        proxy_pass http://orchestrator_backend/health;
        access_log off;
    }
}
```

## Monitoring and Logging

### Prometheus Configuration

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "orchestrator_alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'mcp-orchestrator'
    static_configs:
      - targets: ['orchestrator:9090']
    metrics_path: /metrics
    scrape_interval: 10s
    scrape_timeout: 5s

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```

### Grafana Dashboards
1. **Orchestrator Overview Dashboard**

   ```json
   {
     "dashboard": {
       "title": "MCP Orchestrator Overview",
       "panels": [
         {
           "title": "Request Rate",
           "type": "graph",
           "targets": [
             {
               "expr": "rate(orchestrator_requests_total[5m])",
               "legendFormat": "{{method}} {{status}}"
             }
           ]
         },
         {
           "title": "Response Time",
           "type": "graph",
           "targets": [
             {
               "expr": "histogram_quantile(0.95, rate(orchestrator_request_duration_seconds_bucket[5m]))",
               "legendFormat": "95th percentile"
             },
             {
               "expr": "histogram_quantile(0.50, rate(orchestrator_request_duration_seconds_bucket[5m]))",
               "legendFormat": "50th percentile"
             }
           ]
         },
         {
           "title": "Active Connections",
           "type": "singlestat",
           "targets": [
             {
               "expr": "orchestrator_active_connections"
             }
           ]
         }
       ]
     }
   }
   ```

### Structured Logging

```python
import structlog

# Configure structured logging
structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.UnicodeDecoder(),
        structlog.processors.JSONRenderer()
    ],
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)
```

## Security Configuration

### TLS/SSL Setup

1. **Generate self-signed certificates (development)**

   ```bash
   openssl req -x509 -newkey rsa:4096 -keyout orchestrator.key -out orchestrator.crt -days 365 -nodes
   ```
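If the orchestrator terminates TLS itself (the `SSL_CERT_PATH`/`SSL_KEY_PATH` settings in Production Configuration), the certificate pair produced above can be loaded into a standard-library SSL context. A minimal sketch — the helper name is illustrative, and the `load_cert_chain` call is commented out so the snippet runs without certificate files present:

```python
import ssl

def build_ssl_context(cert_path: str, key_path: str) -> ssl.SSLContext:
    """Build a server-side TLS context enforcing TLS 1.2+ (sketch)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)  # enable with real files
    return ctx

ctx = build_ssl_context("/app/certs/orchestrator.crt", "/app/certs/orchestrator.key")
assert ctx.minimum_version == ssl.TLSVersion.TLSv1_2
```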
2. **Let's Encrypt certificates (production)**

   ```bash
   certbot certonly --standalone -d orchestrator.yourdomain.com
   ```

### Security Headers

```python
# security_headers.py
from fastapi import FastAPI
from starlette.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["Authorization", "Content-Type"],
)

# Add security headers
@app.middleware("http")
async def add_security_headers(request, call_next):
    response = await call_next(request)
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-XSS-Protection"] = "1; mode=block"
    response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
    return response
```

### Authentication

```python
# auth.py
import jwt  # PyJWT
from datetime import datetime, timedelta, timezone

SECRET_KEY = "change-me"  # load from the JWT_SECRET environment variable in production
ALGORITHM = "HS256"

def create_access_token(data: dict, expires_delta: timedelta = None):
    to_encode = data.copy()
    if expires_delta:
        expire = datetime.now(timezone.utc) + expires_delta
    else:
        expire = datetime.now(timezone.utc) + timedelta(minutes=15)
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)

def verify_token(token: str):
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    except jwt.PyJWTError:
        return None
```

## Troubleshooting

### Common Deployment Issues

#### 1. Pod CrashLoopBackOff

```bash
# Check pod logs
kubectl logs -f pod-name -n mcp-orchestrator

# Check events
kubectl get events -n mcp-orchestrator --sort-by='.lastTimestamp'

# Debug pod
kubectl debug -it pod-name -n mcp-orchestrator --image=busybox
```
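A frequent cause of CrashLoopBackOff is the orchestrator starting before its database accepts connections. A hedged sketch of a startup retry loop with exponential backoff — the `connect` callable stands in for whatever the application actually uses (e.g. `asyncpg.connect`), and the flaky connection below is a stand-in for demonstration:

```python
import asyncio

async def connect_with_retry(connect, attempts: int = 5, base_delay: float = 0.5):
    """Retry an async connect callable with exponential backoff (sketch)."""
    for attempt in range(attempts):
        try:
            return await connect()
        except ConnectionError as exc:
            if attempt == attempts - 1:
                raise  # give up; Kubernetes will restart the pod
            delay = base_delay * (2 ** attempt)
            print(f"connect failed ({exc}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)

# Demo: a fake connection that fails twice, then succeeds
state = {"calls": 0}

async def flaky_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("database not ready")
    return "connection"

result = asyncio.run(connect_with_retry(flaky_connect))
assert result == "connection" and state["calls"] == 3
```

With this pattern the pod survives a slow database instead of crashing immediately, while still failing (and being restarted) if the dependency never comes up.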
#### 2. Database Connection Issues

```bash
# Test database connectivity
kubectl exec -it pod-name -n mcp-orchestrator -- python -c "
import asyncpg
import asyncio

async def test():
    try:
        conn = await asyncpg.connect('postgresql://user:pass@host:5432/db')
        await conn.execute('SELECT 1')
        print('Database connection successful')
        await conn.close()
    except Exception as e:
        print(f'Database connection failed: {e}')

asyncio.run(test())
"
```

#### 3. Memory Issues

```bash
# Check resource usage
kubectl top pods -n mcp-orchestrator

# Check node resources
kubectl top nodes

# Increase memory limits
kubectl patch deployment orchestrator -n mcp-orchestrator -p '{"spec":{"template":{"spec":{"containers":[{"name":"orchestrator","resources":{"limits":{"memory":"2Gi"}}}]}}}}'
```

### Performance Tuning

#### 1. Connection Pool Optimization

```bash
# Tune connection pool settings
DATABASE_POOL_SIZE=20       # Increase for high load
DATABASE_MAX_OVERFLOW=30    # Allow overflow connections
DATABASE_POOL_TIMEOUT=30    # Timeout for acquiring a connection
```

#### 2. Cache Optimization

```bash
# Redis-backed cache settings
CACHE_TTL=3600           # Adjust based on use case
CACHE_COMPRESSION=true   # Enable for large responses
```
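The effect of `CACHE_TTL` can be illustrated with a minimal in-process TTL cache — a sketch of the expiry semantics only (the platform itself stores cached entries in Redis, which handles TTLs natively):

```python
import time

class TTLCache:
    """Minimal TTL cache illustrating CACHE_TTL semantics (sketch)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiry, like a Redis key whose TTL elapsed
            return default
        return value

cache = TTLCache(ttl_seconds=0.1)
cache.set("greeting", "hello")
assert cache.get("greeting") == "hello"
time.sleep(0.15)
assert cache.get("greeting") is None  # expired after the TTL
```

A short TTL keeps data fresh at the cost of more backend hits; a long TTL does the opposite, which is the trade-off behind tuning `CACHE_TTL` per use case.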
#### 3. Horizontal Pod Autoscaling

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orchestrator-hpa
  namespace: mcp-orchestrator
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orchestrator
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Health Checks

#### Application Health Check

```python
# health_check.py
from fastapi import FastAPI
from fastapi.responses import Response
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST

app = FastAPI()

@app.get("/health/live")
async def liveness_check():
    return {"status": "alive"}

@app.get("/health/ready")
async def readiness_check():
    # Check database connectivity
    # Check cache connectivity
    # Check external services
    return {"status": "ready"}

@app.get("/health/detailed")
async def detailed_health():
    # check_database / check_cache / check_external_services are
    # application-defined async helpers
    return {
        "status": "healthy",
        "checks": {
            "database": await check_database(),
            "cache": await check_cache(),
            "external_services": await check_external_services(),
        },
    }

@app.get("/metrics")
async def metrics():
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```

This completes the comprehensive deployment guide. The platform can now be deployed across various environments with proper configuration, monitoring, and security measures in place.