# AegisLM Container Security Hardening Guide

## Overview

This document defines the container security hardening requirements for AegisLM. All containers must meet these standards before they are deployed to production.

## Container Security Checklist

### ✅ Required Security Controls

| Control | Description | Priority | Status |
|---------|-------------|----------|--------|
| Non-root user | Containers run as a non-root user | Required | ✅ Implemented |
| Read-only root filesystem | The root filesystem is mounted read-only | Required | 🔄 To Implement |
| Drop capabilities | Drop all unnecessary Linux capabilities | Required | 🔄 To Implement |
| Seccomp profile | Apply the `RuntimeDefault` seccomp profile | Required | 🔄 To Implement |
| No privileged mode | Never run containers in privileged mode | Required | ✅ Implemented |
| Resource limits | Set CPU and memory requests and limits | Required | ✅ Implemented |
| Health checks | Liveness and readiness probes | Required | ✅ Implemented |
| Minimal base image | Use minimal (distroless) base images | Required | 🔄 To Implement |

---

## Kubernetes Pod Security

### Pod Security Standards

All AegisLM pods must use the following security context. Only the security-relevant fields of the deployment are shown:

```yaml
# deployment/k8s/api-deployment.yaml (updated; excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aegislm-api
  namespace: aegislm
spec:
  template:
    spec:
      # Pod-level defaults, inherited by every container in the pod
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: api
          # Container-level settings override pod-level values where both are set
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
            runAsNonRoot: true
            runAsUser: 1000
          # Writable scratch space, since the root filesystem is read-only
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir:
            medium: Memory
            sizeLimit: 64Mi
```
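The pod-level `securityContext` acts as a default for every container, and container-level fields such as `runAsNonRoot`, `runAsUser`, `runAsGroup` and `seccompProfile` override it. A minimal Python sketch of that merge, assuming the manifest has already been parsed into dictionaries; `effective_context` and `violations` are illustrative names, not part of the AegisLM codebase, and only a subset of the shared fields is modelled:

```python
from typing import Any

# Fields present in both the pod-level and container-level securityContext
# (a subset); when both set a value, the container-level value wins.
SHARED_FIELDS = ("runAsNonRoot", "runAsUser", "runAsGroup", "seccompProfile")


def effective_context(pod_spec: dict[str, Any], container: dict[str, Any]) -> dict[str, Any]:
    """Merge pod-level defaults with a container's own securityContext."""
    pod_ctx = pod_spec.get("securityContext") or {}
    ctr_ctx = container.get("securityContext") or {}
    merged = {field: pod_ctx[field] for field in SHARED_FIELDS if field in pod_ctx}
    merged.update(ctr_ctx)  # container-level settings take precedence
    return merged


def violations(pod_spec: dict[str, Any]) -> list[str]:
    """List every container that misses one of the required controls."""
    problems: list[str] = []
    for container in pod_spec.get("containers", []):
        ctx = effective_context(pod_spec, container)
        name = container.get("name", "<unnamed>")
        if ctx.get("runAsNonRoot") is not True:
            problems.append(f"{name}: runAsNonRoot is not true")
        if ctx.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation is not false")
        if ctx.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: root filesystem is writable")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities are not all dropped")
        if ctx.get("seccompProfile", {}).get("type") != "RuntimeDefault":
            problems.append(f"{name}: seccomp profile is not RuntimeDefault")
    return problems
```

Checking the effective context per container, rather than only the pod level, catches a container that sets `runAsNonRoot: false` even though the pod-level default is `true`.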
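### Network Policies

Create `deployment/k8s/network-policy.yaml`:

```yaml
---
# AegisLM Network Policies
# Kubernetes NetworkPolicy definitions for micro-segmentation.
# Once a pod is selected for a policy type, any traffic of that type that
# no rule explicitly allows is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
  namespace: aegislm
spec:
  podSelector:
    matchLabels:
      app: aegislm
      component: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow ingress from the ingress controller
    # (Kubernetes 1.21+ sets the kubernetes.io/metadata.name label on every namespace)
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8000
    # Allow ingress from the dashboard
    - from:
        - podSelector:
            matchLabels:
              app: aegislm
              component: dashboard
      ports:
        - protocol: TCP
          port: 8000
  egress:
    # Allow egress to PostgreSQL
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    # Allow egress to Redis
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    # Allow egress to worker pods
    - to:
        - podSelector:
            matchLabels:
              app: aegislm
              component: worker
      ports:
        - protocol: TCP
          port: 8001
    # Allow DNS (TCP is needed for responses too large for UDP)
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow egress to the model service (internal only).
    # Note: model-service-network-policy only admits worker pods, so this
    # connection stays blocked unless that policy also admits the API.
    - to:
        - podSelector:
            matchLabels:
              app: aegislm
              component: model-service
      ports:
        - protocol: TCP
          port: 8002
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: worker-network-policy
  namespace: aegislm
spec:
  podSelector:
    matchLabels:
      app: aegislm
      component: worker
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow ingress from the API
    - from:
        - podSelector:
            matchLabels:
              app: aegislm
              component: api
      ports:
        - protocol: TCP
          port: 8001
  egress:
    # Allow egress to the model service
    - to:
        - podSelector:
            matchLabels:
              app: aegislm
              component: model-service
      ports:
        - protocol: TCP
          port: 8002
    # Allow egress to object storage
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              app: minio
      ports:
        - protocol: TCP
          port: 9000
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: model-service-network-policy
  namespace: aegislm
spec:
  podSelector:
    matchLabels:
      app: aegislm
      component: model-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only allow ingress from worker pods (internal only)
    - from:
        - podSelector:
            matchLabels:
              app: aegislm
              component: worker
      ports:
        - protocol: TCP
          port: 8002
  # No egress: the model service is internal only
  egress: []
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dashboard-network-policy
  namespace: aegislm
spec:
  podSelector:
    matchLabels:
      app: aegislm
      component: dashboard
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow ingress from the ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3000
  egress:
    # Allow egress to the API
    - to:
        - podSelector:
            matchLabels:
              app: aegislm
              component: api
      ports:
        - protocol: TCP
          port: 8000
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```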
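NetworkPolicy rules are additive allow-lists: once a policy with `Egress` in `policyTypes` selects a pod, only traffic matched by some rule may leave it, and `egress: []` (as for the model service) denies all egress. Within one `to` entry, a `namespaceSelector` and a `podSelector` are combined with AND. The sketch below models the egress side of a single policy so these rules can be unit-tested. It is a simplification under stated assumptions: only `matchLabels` selectors, numeric ports, no `ipBlock`, and no union across several policies or the destination's own ingress policy, all of which a real cluster also evaluates.

```python
from typing import Any, Optional

Labels = dict[str, str]


def selector_matches(selector: Optional[dict[str, Any]], labels: Labels) -> bool:
    """A missing or empty selector matches everything; otherwise all matchLabels must match."""
    wanted = (selector or {}).get("matchLabels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


def peer_matches(peer: dict[str, Any], policy_ns: str, dst_ns: str,
                 dst_ns_labels: Labels, dst_pod_labels: Labels) -> bool:
    """Evaluate one entry of a rule's `to` list against a destination pod."""
    if "ipBlock" in peer:
        return False  # CIDR-based peers are out of scope for this sketch
    if "namespaceSelector" in peer:
        ns_ok = selector_matches(peer["namespaceSelector"], dst_ns_labels)
    else:
        ns_ok = dst_ns == policy_ns  # a bare podSelector means "same namespace"
    # Both selectors in one entry are ANDed; a missing podSelector selects all pods
    return ns_ok and selector_matches(peer.get("podSelector"), dst_pod_labels)


def egress_allowed(policy: dict[str, Any], dst_ns: str, dst_ns_labels: Labels,
                   dst_pod_labels: Labels, port: int, protocol: str = "TCP") -> bool:
    """Return True if this single policy permits egress to the destination."""
    spec = policy["spec"]
    # Without explicit policyTypes, Egress applies only if an egress section exists
    types = spec.get("policyTypes") or (["Ingress", "Egress"] if "egress" in spec else ["Ingress"])
    if "Egress" not in types:
        return True  # the policy does not restrict egress
    policy_ns = policy["metadata"].get("namespace", "default")
    for rule in spec.get("egress") or []:
        peers, ports = rule.get("to"), rule.get("ports")
        # An empty or missing `to` / `ports` list matches every destination / port
        to_ok = not peers or any(
            peer_matches(p, policy_ns, dst_ns, dst_ns_labels, dst_pod_labels) for p in peers)
        port_ok = not ports or any(
            p.get("protocol", "TCP") == protocol and p.get("port") in (None, port) for p in ports)
        if to_ok and port_ok:
            return True
    return False  # selected for Egress but no rule matched: denied
```

Feeding the parsed `api-network-policy` into `egress_allowed` with PostgreSQL's labels and port 5432 should return `True`, while the same call against `model-service-network-policy` returns `False` for any destination.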
---

## Docker Security Enhancements

### Enhanced Base Dockerfile

Update `services/base/Dockerfile` to use a minimal distroless base image. Distroless images contain no shell or package manager, so dependencies are installed in a Debian build stage and only the virtual environment is copied into the final image. The build stage must use the same Python version and interpreter path as the runtime image (Python 3.11 at `/usr/bin/python3` on Debian 12); otherwise the copied virtual environment cannot start.

```dockerfile
# AegisLM Base Docker Image - Hardened Version
# Multi-Agent Adversarial LLM Evaluation Framework

# Build stage: Debian 12 ships the same Python 3.11 as distroless python3-debian12
FROM debian:12-slim AS build

# Install build dependencies and create the virtual environment
RUN apt-get update && \
    apt-get install --no-install-recommends --yes build-essential python3-dev python3-venv && \
    python3 -m venv /opt/venv && \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies (requirements.txt must pin hashes for --require-hashes)
COPY requirements.txt .
RUN /opt/venv/bin/pip install --no-cache-dir --require-hashes -r requirements.txt

# Final stage: distroless (no shell, no package manager)
FROM gcr.io/distroless/python3-debian12:nonroot

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PATH="/opt/venv/bin:$PATH"

# Copy virtual environment from the build stage
COPY --from=build /opt/venv /opt/venv

# Copy application
WORKDIR /app
COPY --chown=nonroot:nonroot . /app

# Run as the non-root user built into distroless (UID/GID 65532).
# Kubernetes overrides this with runAsUser: 1000 from the pod securityContext.
USER nonroot

# A read-only root filesystem and the no-new-privileges flag are runtime settings
# (readOnlyRootFilesystem / allowPrivilegeEscalation in Kubernetes, or
# docker run --read-only --security-opt no-new-privileges).
# /tmp is declared as a volume so it stays writable under docker run --read-only.
VOLUME ["/tmp"]

# Image metadata (OCI annotation keys)
ARG BUILD_DATE
ARG VCS_REF
LABEL org.opencontainers.image.created=$BUILD_DATE \
      org.opencontainers.image.title="aegislm" \
      org.opencontainers.image.revision=$VCS_REF \
      org.opencontainers.image.source="https://github.com/aegislm/aegislm"

# Distroless sets the system interpreter as ENTRYPOINT; use the venv interpreter instead
ENTRYPOINT ["/opt/venv/bin/python3"]
```
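Because the distroless runtime image has no `/bin/sh`, every `RUN`, `CMD`, `ENTRYPOINT` and `HEALTHCHECK` in the final stage must use the exec (JSON array) form. A hypothetical lint helper could flag violations in CI; it is a heuristic sketch rather than a full Dockerfile parser, and it assumes the final stage sets `USER` explicitly (it does not inspect the base image's default user):

```python
import json


def logical_lines(text: str) -> list[str]:
    """Join backslash-continued lines; drop blank lines and comments."""
    lines: list[str] = []
    current = ""
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.endswith("\\"):
            current += stripped[:-1] + " "
            continue
        lines.append(current + stripped)
        current = ""
    if current.strip():
        lines.append(current.strip())
    return lines


def is_exec_form(args: str) -> bool:
    """True if the arguments are a JSON array of strings (exec form, no shell needed)."""
    try:
        value = json.loads(args)
    except json.JSONDecodeError:
        return False
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def lint_final_stage(text: str) -> list[str]:
    """Report shell-dependent or root-user instructions in the last build stage."""
    lines = logical_lines(text)
    from_indexes = [i for i, line in enumerate(lines) if line.split()[0].upper() == "FROM"]
    if not from_indexes:
        return ["no FROM instruction"]
    problems: list[str] = []
    user = None
    for line in lines[from_indexes[-1]:]:
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        args = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "USER":
            user = args.split(":")[0]
        elif keyword in ("RUN", "CMD", "ENTRYPOINT") and not is_exec_form(args):
            problems.append(f"shell-form {keyword} needs /bin/sh: {line}")
        elif keyword == "HEALTHCHECK":
            # Options such as --interval come before the CMD keyword
            _, sep, cmd_args = args.partition("CMD ")
            if sep and not is_exec_form(cmd_args.strip()):
                problems.append(f"shell-form HEALTHCHECK needs /bin/sh: {line}")
    if user in (None, "root", "0"):
        problems.append("final stage does not switch to a non-root USER")
    return problems
```

For example, a final stage containing `HEALTHCHECK CMD python -c "..."` or `CMD python app.py` produces one finding per line, since neither can run without a shell.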
### Enhanced API Dockerfile

```dockerfile
# AegisLM API Service Dockerfile
# Production-grade, security-hardened

# Build stage (same Debian/Python as the distroless runtime)
FROM debian:12-slim AS builder

ENV PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Create the virtual environment that the final stage copies
RUN apt-get update && \
    apt-get install --no-install-recommends --yes python3-venv && \
    python3 -m venv /opt/venv && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN /opt/venv/bin/pip install --no-cache-dir --require-hashes -r requirements.txt

# Copy source
COPY . .

# Final stage
FROM gcr.io/distroless/python3-debian12:nonroot

WORKDIR /app

# Copy from builder
COPY --from=builder /opt/venv /opt/venv
COPY --from=builder --chown=nonroot:nonroot /app /app

ENV PATH="/opt/venv/bin:$PATH"

# Read-only filesystem: a Dockerfile cannot declare a tmpfs. Mount one at runtime,
# e.g. docker run --read-only --tmpfs /tmp:rw,size=64m, or an emptyDir at /tmp in Kubernetes.

# Expose port
EXPOSE 8000

# Health check: distroless has no shell, so use the exec (JSON) form.
# Kubernetes ignores HEALTHCHECK and uses liveness/readiness probes instead.
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD ["/opt/venv/bin/python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health/live', timeout=4)"]

# Run as non-root
USER nonroot

ENTRYPOINT ["/opt/venv/bin/python3", "-m", "uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

---

## Container Runtime Security

### Seccomp Profile

On Kubernetes, `seccompProfile.type: RuntimeDefault` in the pod security context (see Pod Security Standards above) applies the container runtime's default seccomp profile. On OpenShift, the same restrictions can also be enforced cluster-side with a SecurityContextConstraints (SCC) object. Despite its file name, the manifest below is an SCC that requires the `runtime/default` seccomp profile, not a seccomp profile definition. Create it in `deployment/k8s/seccomp-profile.yaml`:

```yaml
# SecurityContextConstraints for AegisLM (OpenShift only)
# Enforces the runtime/default seccomp profile and the other pod security controls
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: aegislm-restricted
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegedContainer: false
allowPrivilegeEscalation: false
allowedCapabilities:
  - NET_BIND_SERVICE
defaultAddCapabilities: []
fsGroup:
  type: RunAsAny
priority: 10
readOnlyRootFilesystem: true
requiredDropCapabilities:
  - ALL
runAsUser:
  type: MustRunAs
  uid: 1000
seLinuxContext:
  type: MustRunAs
  seLinuxOptions:
    level: "s0:c123,c456"
seccompProfiles:
  - runtime/default
supplementalGroups:
  type: RunAsAny
volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim
```
---

## Runtime Security

### Resource Limits (Already Implemented)

```yaml
# In each deployment (container spec)
resources:
  requests:
    cpu: 500m
    memory: 1Gi
    nvidia.com/gpu: 1   # Model service only; omit in other deployments
  limits:
    cpu: 1000m
    memory: 2Gi
    nvidia.com/gpu: 1   # Model service only; must equal the GPU request
```
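Kubernetes resource quantities use suffixes such as `m` (thousandths, so `500m` CPU is half a core) and `Gi` (1024³ bytes). Requests may not exceed limits, and extended resources such as `nvidia.com/gpu` cannot be overcommitted, so their request must equal the limit. A small sketch that checks values like the ones above, assuming only the suffixes in `SUFFIXES` (other valid suffixes such as `P`, `E`, `Pi` and `Ei` are not handled and raise `ValueError`); the function names are illustrative:

```python
from fractions import Fraction

# Multipliers for the quantity suffixes this sketch understands
SUFFIXES = {
    "m": Fraction(1, 1000),
    "k": Fraction(10**3), "M": Fraction(10**6), "G": Fraction(10**9), "T": Fraction(10**12),
    "Ki": Fraction(2**10), "Mi": Fraction(2**20), "Gi": Fraction(2**30), "Ti": Fraction(2**40),
}


def parse_quantity(value) -> Fraction:
    """Convert a quantity such as "500m", "2Gi" or 1 into an exact number."""
    text = str(value)
    for suffix, multiplier in SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * multiplier
    return Fraction(text)  # plain number, no suffix


def check_resources(resources: dict) -> list[str]:
    """Return problems with a container's `resources` block."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    problems = []
    for name in ("cpu", "memory"):
        if name not in limits:
            problems.append(f"missing {name} limit")
        elif name in requests and parse_quantity(requests[name]) > parse_quantity(limits[name]):
            problems.append(f"{name} request exceeds its limit")
    # Heuristic: domain-prefixed names such as nvidia.com/gpu are treated as extended
    # resources, which cannot be overcommitted (request must equal limit)
    for name, limit in limits.items():
        if "/" in name and name in requests and parse_quantity(requests[name]) != parse_quantity(limit):
            problems.append(f"{name} request must equal its limit")
    return problems
```

Using `Fraction` keeps `500m` exact (one half) instead of introducing floating-point rounding into the comparisons.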
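### Security Validation Tests

Create `tests/test_container_security.py`:

```python
"""
Container Security Validation Tests

Tests that all security controls are properly configured.
"""
from pathlib import Path

import pytest
import yaml


def _pod_spec(deploy):
    """Return the pod spec from a Deployment's pod template."""
    return deploy.get("spec", {}).get("template", {}).get("spec", {})


class TestContainerSecurity:
    """Test container security configurations."""

    @pytest.fixture
    def k8s_deployments(self):
        """Load Kubernetes deployment files."""
        deploy_dir = Path("deployment/k8s")
        deployments = {}
        for deploy_file in deploy_dir.glob("*-deployment.yaml"):
            with open(deploy_file) as f:
                deployments[deploy_file.stem] = yaml.safe_load(f)
        # Fail instead of passing vacuously when no manifests are found
        assert deployments, f"No *-deployment.yaml files found in {deploy_dir}"
        return deployments

    def test_containers_run_as_non_root(self, k8s_deployments):
        """Verify all pods run as the non-root user 1000."""
        for name, deploy in k8s_deployments.items():
            security_ctx = _pod_spec(deploy).get("securityContext", {})
            assert security_ctx.get("runAsNonRoot") is True, \
                f"{name}: Container must run as non-root"
            assert security_ctx.get("runAsUser") == 1000, \
                f"{name}: Must run as user 1000"

    def test_containers_drop_all_capabilities(self, k8s_deployments):
        """Verify all containers drop all capabilities."""
        for name, deploy in k8s_deployments.items():
            for container in _pod_spec(deploy).get("containers", []):
                security_ctx = container.get("securityContext", {})
                dropped = security_ctx.get("capabilities", {}).get("drop", [])
                assert "ALL" in dropped, \
                    f"{name}/{container['name']}: Must drop ALL capabilities"

    def test_no_privileged_containers(self, k8s_deployments):
        """Verify no containers run privileged or allow privilege escalation."""
        for name, deploy in k8s_deployments.items():
            for container in _pod_spec(deploy).get("containers", []):
                security_ctx = container.get("securityContext", {})
                assert security_ctx.get("privileged") is not True, \
                    f"{name}/{container['name']}: Must not run in privileged mode"
                assert security_ctx.get("allowPrivilegeEscalation") is False, \
                    f"{name}/{container['name']}: Must not allow privilege escalation"

    def test_read_only_root_filesystem(self, k8s_deployments):
        """Verify containers have a read-only root filesystem."""
        for name, deploy in k8s_deployments.items():
            for container in _pod_spec(deploy).get("containers", []):
                security_ctx = container.get("securityContext", {})
                # Note: this requires a writable mount (e.g. an emptyDir) for /tmp
                assert security_ctx.get("readOnlyRootFilesystem") is True, \
                    f"{name}/{container['name']}: Must have read-only root filesystem"

    def test_seccomp_profile_set(self, k8s_deployments):
        """Verify the RuntimeDefault seccomp profile is set at pod level."""
        for name, deploy in k8s_deployments.items():
            security_ctx = _pod_spec(deploy).get("securityContext", {})
            seccomp = security_ctx.get("seccompProfile", {})
            assert seccomp.get("type") == "RuntimeDefault", \
                f"{name}: Must use RuntimeDefault seccomp profile"


class TestNetworkPolicies:
    """Test network policy configurations."""

    @pytest.fixture
    def network_policies(self):
        """Load network policy files."""
        policy_dir = Path("deployment/k8s")
        policies = {}
        for policy_file in policy_dir.glob("network-policy*.yaml"):
            with open(policy_file) as f:
                # Handle multiple documents per file
                for doc in yaml.safe_load_all(f):
                    if doc:
                        name = doc.get("metadata", {}).get("name")
                        policies[name] = doc
        return policies

    def test_api_has_ingress_policy(self, network_policies):
        """Verify the API has an ingress network policy."""
        assert "api-network-policy" in network_policies, \
            "API must have network policy"
        policy = network_policies["api-network-policy"]
        # policyTypes lives under spec, not at the top level of the manifest
        assert "Ingress" in policy.get("spec", {}).get("policyTypes", []), \
            "API must have ingress policy"

    def test_model_service_internal_only(self, network_policies):
        """Verify the model service only accepts traffic from worker pods."""
        if "model-service-network-policy" not in network_policies:
            pytest.skip("Model service network policy not defined")
        policy = network_policies["model-service-network-policy"]
        for rule in policy.get("spec", {}).get("ingress", []):
            # A rule without "from" admits traffic from every source
            peers = rule.get("from")
            assert peers, "Model service ingress rules must restrict their sources"
            for peer in peers:
                assert "namespaceSelector" not in peer and "ipBlock" not in peer, \
                    "Model service must not admit other namespaces or IP ranges"
                labels = peer.get("podSelector", {}).get("matchLabels", {})
                assert labels.get("component") == "worker", \
                    "Model service must only admit worker pods"

    def test_all_pods_have_policies(self, network_policies):
        """Verify all critical components have network policies."""
        required_policies = [
            "api-network-policy",
            "worker-network-policy",
            "dashboard-network-policy",
        ]
        for policy_name in required_policies:
            assert policy_name in network_policies, \
                f"Required network policy {policy_name} not found"


class TestDockerfileSecurity:
    """Test Dockerfile security configurations."""

    @pytest.fixture
    def dockerfiles(self):
        """Load Dockerfile content, keyed by path (every file is named "Dockerfile")."""
        dockerfiles = {}
        for dockerfile_path in Path("services").rglob("Dockerfile"):
            dockerfiles[str(dockerfile_path)] = dockerfile_path.read_text()
        return dockerfiles

    def test_no_root_user_in_dockerfile(self, dockerfiles):
        """Verify Dockerfiles don't switch to the root user."""
        for name, content in dockerfiles.items():
            for line in content.splitlines():
                parts = line.split()
                if len(parts) >= 2 and parts[0].upper() == "USER":
                    # USER accepts "name", "uid", "name:group" or "uid:gid";
                    # a substring check would wrongly flag "USER nonroot"
                    if parts[1].split(":")[0] in ("root", "0"):
                        pytest.fail(f"{name}: Should not use root user")

    def test_no_latest_tag(self, dockerfiles):
        """Verify no 'latest' tag is used."""
        for name, content in dockerfiles.items():
            for line in content.splitlines():
                if line.strip().upper().startswith("FROM") and ":latest" in line.lower():
                    pytest.fail(f"{name}: Should not use :latest tag")
```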
---

## Secrets Management

### External Secrets

```yaml
# deployment/k8s/external-secrets.yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aegislm-vault
spec:
  provider:
    vault:
      server: "https://vault.example.com:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: kubernetes
          role: aegislm  # Vault Kubernetes-auth role (placeholder; must exist in Vault)
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: aegislm-secrets
  namespace: aegislm
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aegislm-vault
    kind: ClusterSecretStore
  target:
    name: aegislm-secrets
    creationPolicy: Owner
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: aegislm/database
        property: url
    - secretKey: API_SECRET_KEY
      remoteRef:
        key: aegislm/api
        property: secret_key
```

---

## Compliance Mapping

| Control | NIST 800-53 | ISO/IEC 27001:2013 (Annex A) | PCI DSS |
|---------|-------------|------------------------------|---------|
| Non-root containers | AC-3 | A.9.1 | Req 7.1 |
| Read-only filesystem | AC-3 | A.9.1 | Req 7.1 |
| Drop capabilities | AC-3 | A.9.1 | - |
| Network segmentation | AC-4 | A.13.1 | Req 1.3 |
| Secrets management | IA-5 | A.9.4 | Req 3.4 |

---

## Security Audit Commands

```bash
# Check the pod-level runAsNonRoot setting of every pod
kubectl get pods -n aegislm -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.securityContext.runAsNonRoot}{"\n"}{end}'

# Check network policies
kubectl get networkpolicies -n aegislm

# Check for privileged containers (an empty second column means "privileged" is unset)
kubectl get pods -n aegislm -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext.privileged}{"\n"}{end}'

# Run security tests
pytest tests/test_container_security.py -v
```
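The jsonpath one-liners above only inspect `.spec.containers` and the pod-level `runAsNonRoot`, so init and ephemeral containers and container-level overrides go unchecked. A sketch that works on the JSON form of the pod list instead, assuming it is given the parsed output of `kubectl get pods -n aegislm -o json`; the `audit` function is illustrative:

```python
from typing import Any


def audit(pod_list: dict[str, Any]) -> list[str]:
    """Flag privileged containers and containers not forced to run as non-root."""
    findings: list[str] = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        spec = pod.get("spec", {})
        pod_ctx = spec.get("securityContext") or {}
        # Check every container kind, not just .spec.containers
        for kind in ("initContainers", "containers", "ephemeralContainers"):
            for container in spec.get(kind) or []:
                ctx = container.get("securityContext") or {}
                where = f"{pod_name}/{container['name']} ({kind})"
                if ctx.get("privileged"):
                    findings.append(f"{where}: privileged")
                # A container-level runAsNonRoot overrides the pod-level value
                non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
                if non_root is not True:
                    findings.append(f"{where}: runAsNonRoot not enforced")
    return findings
```

A hypothetical wrapper script could call `audit(json.load(sys.stdin))`, print each finding, and exit non-zero when the list is non-empty, so that `kubectl get pods -n aegislm -o json | python audit_pods.py` fails a CI job on any violation.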
---

## References

- [NIST 800-53 Security Controls](https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final)
- [Kubernetes Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
- [CIS Docker Benchmark](https://www.cisecurity.org/benchmark/docker)
- [Distroless Docker Images](https://github.com/GoogleContainerTools/distroless)