AegisLM Container Security Hardening Guide

Overview

This document defines the container security hardening requirements for AegisLM. All containers must meet these security standards before deployment to production.

Container Security Checklist

✅ Required Security Controls

| Control | Description | Priority | Status |
|---|---|---|---|
| Non-root user | Containers run as non-root | Required | ✅ Implemented |
| Read-only root filesystem | Root filesystem is read-only | Required | 🔄 To Implement |
| Drop capabilities | Drop all unnecessary Linux capabilities | Required | 🔄 To Implement |
| Seccomp profile | Enable seccomp restriction | Required | 🔄 To Implement |
| No privileged mode | Never run containers in privileged mode | Required | ✅ Implemented |
| Resource limits | Set CPU/memory limits | Required | ✅ Implemented |
| Health checks | Liveness and readiness probes | Required | ✅ Implemented |
| Minimal base image | Use minimal base images | Required | 🔄 To Implement |

Kubernetes Pod Security

Pod Security Standards

All AegisLM pods must use the following security context:

yaml
# deployment/k8s/api-deployment.yaml (updated)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aegislm-api
  namespace: aegislm
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: api
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
            runAsNonRoot: true
            runAsUser: 1000
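
Kubernetes applies container-level `securityContext` values on top of the pod-level ones for the fields that both levels share. The following is a minimal Python sketch of that merge, written under simplifying assumptions: only a handful of shared fields are modelled, and the function names and the `sidecar` example are hypothetical, not part of AegisLM.

```python
from typing import Any

# Fields that may be set at both pod and container level; the container wins.
SHARED_FIELDS = ("runAsUser", "runAsGroup", "runAsNonRoot", "seccompProfile")


def effective_context(pod_ctx: dict[str, Any], container_ctx: dict[str, Any]) -> dict[str, Any]:
    """Return the settings one container ends up with (simplified model)."""
    merged = {k: pod_ctx[k] for k in SHARED_FIELDS if k in pod_ctx}
    merged.update(container_ctx)  # container-level values override pod-level ones
    return merged


def missing_controls(ctx: dict[str, Any]) -> list[str]:
    """List the hardening controls from this guide that ctx does not satisfy."""
    problems = []
    if ctx.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot")
    if ctx.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation")
    if ctx.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        problems.append("capabilities.drop")
    if ctx.get("seccompProfile", {}).get("type") != "RuntimeDefault":
        problems.append("seccompProfile")
    return problems


pod = {"runAsNonRoot": True, "runAsUser": 1000, "fsGroup": 1000,
       "seccompProfile": {"type": "RuntimeDefault"}}
api = {"allowPrivilegeEscalation": False, "readOnlyRootFilesystem": True,
       "capabilities": {"drop": ["ALL"]}}
# Hypothetical sidecar that overrides the pod defaults and runs as root.
sidecar = {"runAsUser": 0, "runAsNonRoot": False}

print(missing_controls(effective_context(pod, api)))
print(missing_controls(effective_context(pod, sidecar)))
```

The `api` container inherits the pod-level non-root and seccomp settings and passes every check, while the hypothetical `sidecar` shows how a container-level override can silently undo a pod-level default.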

Network Policies

Create deployment/k8s/network-policy.yaml:

yaml
---
# AegisLM Network Policies
# Kubernetes NetworkPolicy definitions for micro-segmentation

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
  namespace: aegislm
spec:
  podSelector:
    matchLabels:
      app: aegislm
      component: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow ingress from ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx  # label set automatically on every namespace
      ports:
        - protocol: TCP
          port: 8000
    # Allow ingress from dashboard
    - from:
        - podSelector:
            matchLabels:
              app: aegislm
              component: dashboard
      ports:
        - protocol: TCP
          port: 8000
  egress:
    # Allow egress to PostgreSQL
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    # Allow egress to Redis
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    # Allow egress to worker pods
    - to:
        - podSelector:
            matchLabels:
              app: aegislm
              component: worker
      ports:
        - protocol: TCP
          port: 8001
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        # DNS falls back to TCP for large responses
        - protocol: TCP
          port: 53
    # Allow egress to model service (internal only)
    - to:
        - podSelector:
            matchLabels:
              app: aegislm
              component: model-service
      ports:
        - protocol: TCP
          port: 8002
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: worker-network-policy
  namespace: aegislm
spec:
  podSelector:
    matchLabels:
      app: aegislm
      component: worker
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow ingress from API
    - from:
        - podSelector:
            matchLabels:
              app: aegislm
              component: api
      ports:
        - protocol: TCP
          port: 8001
  egress:
    # Allow egress to model service
    - to:
        - podSelector:
            matchLabels:
              app: aegislm
              component: model-service
      ports:
        - protocol: TCP
          port: 8002
    # Allow egress to object storage
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              app: minio
      ports:
        - protocol: TCP
          port: 9000
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        # DNS falls back to TCP for large responses
        - protocol: TCP
          port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: model-service-network-policy
  namespace: aegislm
spec:
  podSelector:
    matchLabels:
      app: aegislm
      component: model-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only allow ingress from worker pods (internal only)
    - from:
        - podSelector:
            matchLabels:
              app: aegislm
              component: worker
      ports:
        - protocol: TCP
          port: 8002
  # No egress - model service is internal only
  egress: []
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dashboard-network-policy
  namespace: aegislm
spec:
  podSelector:
    matchLabels:
      app: aegislm
      component: dashboard
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow ingress from ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx  # label set automatically on every namespace
      ports:
        - protocol: TCP
          port: 3000
  egress:
    # Allow egress to API
    - to:
        - podSelector:
            matchLabels:
              app: aegislm
              component: api
      ports:
        - protocol: TCP
          port: 8000
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        # DNS falls back to TCP for large responses
        - protocol: TCP
          port: 53
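
To reason about which pods a policy admits, it can help to evaluate the rules by hand. The sketch below is a deliberately simplified Python model of ingress matching: it handles only same-namespace `podSelector` peers with `matchLabels` and numeric ports, and it ignores `namespaceSelector`, `ipBlock`, `matchExpressions`, and named ports. The helper names are hypothetical, and the sample policy mirrors the model-service policy above.

```python
from typing import Any


def labels_match(selector: dict[str, Any], labels: dict[str, str]) -> bool:
    """True if every matchLabels entry is present in labels (empty selector matches all)."""
    wanted = selector.get("matchLabels", {})
    return all(labels.get(k) == v for k, v in wanted.items())


def allows_ingress(policy: dict[str, Any], src_labels: dict[str, str],
                   port: int, protocol: str = "TCP") -> bool:
    """Simplified check: same-namespace podSelector peers and numeric ports only."""
    for rule in policy["spec"].get("ingress", []):
        peers = rule.get("from")
        # A rule without "from" admits every source.
        peer_ok = not peers or any(
            "podSelector" in peer
            and "namespaceSelector" not in peer
            and labels_match(peer["podSelector"], src_labels)
            for peer in peers
        )
        ports = rule.get("ports")
        # A rule without "ports" admits every port.
        port_ok = not ports or any(
            p.get("port") == port and p.get("protocol", "TCP") == protocol
            for p in ports
        )
        if peer_ok and port_ok:
            return True
    return False


model_service_policy = {
    "spec": {
        "podSelector": {"matchLabels": {"app": "aegislm", "component": "model-service"}},
        "policyTypes": ["Ingress", "Egress"],
        "ingress": [{
            "from": [{"podSelector": {"matchLabels": {"app": "aegislm", "component": "worker"}}}],
            "ports": [{"protocol": "TCP", "port": 8002}],
        }],
        "egress": [],
    }
}

worker = {"app": "aegislm", "component": "worker"}
api = {"app": "aegislm", "component": "api"}

print(allows_ingress(model_service_policy, worker, 8002))
print(allows_ingress(model_service_policy, api, 8002))
```

Under this model the worker is admitted on port 8002 and the API pod is not, which matches the intent that the model service is reachable only from workers.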

Docker Security Enhancements

Enhanced Base Dockerfile

Update services/base/Dockerfile to use a minimal base image:

dockerfile
# AegisLM Base Docker Image - Hardened Version
# Multi-Agent Adversarial LLM Evaluation Framework

# Build stage: full Python image with a compiler toolchain for building wheels
FROM python:3.11-slim-bookworm AS build

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --require-hashes -r requirements.txt

# Final stage - distroless for minimal attack surface
# (python3-debian12 ships Python 3.11, matching the build stage; debian11 ships 3.9)
FROM gcr.io/distroless/python3-debian12:nonroot

# Set environment variables
# PYTHONPATH exposes the venv's packages to the distroless interpreter, because
# the venv's own python symlink points at the build image's interpreter path.
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONPATH="/opt/venv/lib/python3.11/site-packages"

# Copy virtual environment from the build stage
COPY --from=build /opt/venv /opt/venv

# Copy application
WORKDIR /app
COPY --chown=nonroot:nonroot . /app

# Switch to non-root user (built into distroless)
USER nonroot

# Additional security: read-only filesystem
# (Requires tmpfs for /tmp and writeable mounts)
VOLUME ["/tmp", "/var/cache/pip"]

# Build metadata labels
ARG BUILD_DATE
ARG VCS_REF
LABEL org.label-schema.build-date=$BUILD_DATE \
      org.label-schema.name="aegislm" \
      org.label-schema.vcs-ref=$VCS_REF \
      org.label-schema.vcs-url="https://github.com/aegislm/aegislm"

# No CMD: the distroless python3 image already sets the Python interpreter as
# its ENTRYPOINT, so derived images supply only the interpreter arguments.

Enhanced API Dockerfile

dockerfile
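# AegisLM API Service Dockerfile
# Production-grade security hardened

# Build stage
FROM python:3.11-slim-bookworm AS builder

ENV PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

WORKDIR /app

# Create the virtual environment that the final stage copies
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --require-hashes -r requirements.txt

# Copy source
COPY . .

# Final stage
# (python3-debian12 ships Python 3.11, matching the build stage; debian11 ships 3.9)
FROM gcr.io/distroless/python3-debian12:nonroot

WORKDIR /app

# Copy from builder
COPY --from=builder /opt/venv /opt/venv
COPY --from=builder --chown=nonroot:nonroot /app /app

# Expose the venv's packages to the distroless interpreter; the venv's own
# python symlink points at the build image's interpreter path.
ENV PYTHONPATH="/opt/venv/lib/python3.11/site-packages"

# Security: Read-only filesystem preparation
# tmpfs mounts are configured at runtime, not in the image, for example:
#   docker run --read-only --tmpfs /tmp:size=64m,mode=1777 ...

# Expose port
EXPOSE 8000

# Health check (exec form, because distroless images contain no shell;
# assumes the interpreter is available at /usr/bin/python3)
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD ["/usr/bin/python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health/live')"]

# Run as non-root
USER nonroot

# The distroless image's ENTRYPOINT is the Python interpreter; CMD supplies its arguments
CMD ["-m", "uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]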
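
In a multi-stage build only the final stage determines the user the container runs as, so a `USER root` in a build stage is harmless while a missing `USER` in the final stage may not be. The following Python sketch finds the effective user of the final stage. It is a rough line-based parser, not a full Dockerfile parser: it ignores line continuations and `ARG` substitution, and the function names are hypothetical.

```python
from typing import Optional


def final_stage_user(dockerfile_text: str) -> Optional[str]:
    """Return the last USER set in the final build stage, or None if unset."""
    user = None
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].startswith("#"):
            continue
        instruction = parts[0].upper()
        if instruction == "FROM":
            user = None  # each stage starts from its base image's default user
        elif instruction == "USER" and len(parts) > 1:
            user = parts[1].split(":")[0]  # drop an optional ":group" suffix
    return user


def runs_as_root(dockerfile_text: str) -> bool:
    """Conservatively treat an unset USER as root."""
    return final_stage_user(dockerfile_text) in (None, "root", "0")


SAMPLE = """\
FROM python:3.11-slim-bookworm AS builder
USER root
RUN pip install --no-cache-dir uvicorn
FROM gcr.io/distroless/python3-debian11:nonroot
USER nonroot
"""

print(final_stage_user(SAMPLE))
print(runs_as_root(SAMPLE))
```

Treating an unset `USER` as root is a conservative choice. Some base images, including the distroless `:nonroot` tags, already default to a non-root user, so this check can report false positives for them.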

Container Runtime Security

Seccomp Profile

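On OpenShift clusters, create a SecurityContextConstraints (SCC) object in deployment/k8s/seccomp-profile.yaml. The SCC is not itself a seccomp profile; it restricts pods to the runtime default seccomp profile and enforces the other restrictions listed below:

yaml
# SecurityContextConstraints for AegisLM (OpenShift only)
# Restricts pods to the runtime/default seccomp profile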
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: aegislm-restricted
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegedContainer: false
allowedCapabilities:
  - NET_BIND_SERVICE
defaultAddCapabilities: []
fsGroup:
  type: RunAsAny
priority: 10
readOnlyRootFilesystem: true
requiredDropCapabilities:
  - ALL
runAsUser:
  type: MustRunAs
  uid: 1000
seLinuxContext:
  type: MustRunAs
  seLinuxOptions:
    level: "s0:c123,c456"
seccompProfiles:
  - runtime/default
supplementalGroups:
  type: RunAsAny
volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim

Runtime Security

Resource Limits (Already Implemented)

yaml
# In each deployment
resources:
  requests:
    cpu: 500m
    memory: 1Gi
    nvidia.com/gpu: 1  # For model service
  limits:
    cpu: 1000m
    memory: 2Gi
    nvidia.com/gpu: 1
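
Kubernetes quantities mix decimal suffixes (`500m`, `1G`) and binary ones (`1Gi`), which makes it easy to misjudge whether a request fits under its limit. The sketch below parses a subset of those suffixes and checks that every request has a limit and does not exceed it. The helper names are hypothetical and the parser covers only the suffixes listed in the table, not the full quantity grammar.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes: milli, decimal SI, and binary SI.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
}


def parse_quantity(value) -> Decimal:
    """Convert a quantity such as '500m' or '1Gi' to a plain number."""
    text = str(value)
    # Check two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def check_resources(resources: dict) -> list[str]:
    """Report requests that lack a limit or exceed it."""
    problems = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for name, requested in requests.items():
        if name not in limits:
            problems.append(f"{name}: no limit set")
        elif parse_quantity(requested) > parse_quantity(limits[name]):
            problems.append(f"{name}: request {requested} exceeds limit {limits[name]}")
    return problems


# Values taken from the resources block above.
resources = {
    "requests": {"cpu": "500m", "memory": "1Gi", "nvidia.com/gpu": 1},
    "limits": {"cpu": "1000m", "memory": "2Gi", "nvidia.com/gpu": 1},
}

print(check_resources(resources))
```

`Decimal` is used instead of `float` so that values such as `500m` compare exactly.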

Security Validation Tests

Create tests/test_container_security.py:

python
"""
Container Security Validation Tests
Tests that all security controls are properly configured
"""

import pytest
import yaml
from pathlib import Path


class TestContainerSecurity:
    """Test container security configurations."""
    
    @pytest.fixture
    def k8s_deployments(self):
        """Load Kubernetes deployment files."""
        deploy_dir = Path("deployment/k8s")
        deployments = {}
        
        for deploy_file in deploy_dir.glob("*-deployment.yaml"):
            with open(deploy_file) as f:
                doc = yaml.safe_load(f)
                deployments[deploy_file.stem] = doc
        
        return deployments
    
    def test_containers_run_as_non_root(self, k8s_deployments):
        """Verify all containers run as non-root user."""
        for name, deploy in k8s_deployments.items():
            spec = deploy.get("spec", {})
            template = spec.get("template", {})
            security_ctx = template.get("spec", {}).get("securityContext", {})
            
            assert security_ctx.get("runAsNonRoot") is True, \
                f"{name}: Container must run as non-root"
            assert security_ctx.get("runAsUser") == 1000, \
                f"{name}: Must run as user 1000"
    
    def test_containers_drop_all_capabilities(self, k8s_deployments):
        """Verify all containers drop all capabilities."""
        for name, deploy in k8s_deployments.items():
            spec = deploy.get("spec", {})
            template = spec.get("template", {})
            containers = template.get("spec", {}).get("containers", [])
            
            for container in containers:
                security_ctx = container.get("securityContext", {})
                capabilities = security_ctx.get("capabilities", {})
                dropped = capabilities.get("drop", [])
                
                assert "ALL" in dropped, \
                    f"{name}/{container['name']}: Must drop ALL capabilities"
    
    def test_no_privileged_containers(self, k8s_deployments):
        """Verify no containers run in privileged mode."""
        for name, deploy in k8s_deployments.items():
            spec = deploy.get("spec", {})
            template = spec.get("template", {})
            containers = template.get("spec", {}).get("containers", [])
            
            for container in containers:
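                security_ctx = container.get("securityContext", {})
                
                assert security_ctx.get("privileged") is not True, \
                    f"{name}/{container['name']}: Must not run in privileged mode"
                assert security_ctx.get("allowPrivilegeEscalation") is False, \
                    f"{name}/{container['name']}: Must not allow privilege escalation"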
    
    def test_read_only_root_filesystem(self, k8s_deployments):
        """Verify containers have read-only root filesystem."""
        for name, deploy in k8s_deployments.items():
            spec = deploy.get("spec", {})
            template = spec.get("template", {})
            containers = template.get("spec", {}).get("containers", [])
            
            for container in containers:
                security_ctx = container.get("securityContext", {})
                
                # Note: This will require tmpfs mounts for /tmp
                assert security_ctx.get("readOnlyRootFilesystem") is True, \
                    f"{name}/{container['name']}: Must have read-only root filesystem"
    
    def test_seccomp_profile_set(self, k8s_deployments):
        """Verify seccomp profile is set."""
        for name, deploy in k8s_deployments.items():
            spec = deploy.get("spec", {})
            template = spec.get("template", {})
            security_ctx = template.get("spec", {}).get("securityContext", {})
            
            seccomp = security_ctx.get("seccompProfile", {})
            assert seccomp.get("type") == "RuntimeDefault", \
                f"{name}: Must use RuntimeDefault seccomp profile"


class TestNetworkPolicies:
    """Test network policy configurations."""
    
    @pytest.fixture
    def network_policies(self):
        """Load network policy files."""
        policy_dir = Path("deployment/k8s")
        policies = {}
        
        for policy_file in policy_dir.glob("network-policy*.yaml"):
            with open(policy_file) as f:
                # Handle multiple documents
                for doc in yaml.safe_load_all(f):
                    if doc:
                        name = doc.get("metadata", {}).get("name")
                        policies[name] = doc
        
        return policies
    
    def test_api_has_ingress_policy(self, network_policies):
        """Verify API has ingress network policy."""
        assert "api-network-policy" in network_policies, \
            "API must have network policy"
        
        policy = network_policies["api-network-policy"]
        assert "Ingress" in policy.get("policyTypes", []), \
            "API must have ingress policy"
    
    def test_model_service_internal_only(self, network_policies):
        """Verify model service is internal only."""
        if "model-service-network-policy" not in network_policies:
            pytest.skip("Model service network policy not defined")
        
        policy = network_policies["model-service-network-policy"]
        ingress = policy.get("spec", {}).get("ingress", [])
        
        # Model service should not have public ingress
        # Should only allow from worker pods
        assert len(ingress) <= 1, \
            "Model service should have limited ingress"
    
    def test_all_pods_have_policies(self, network_policies):
        """Verify all critical components have network policies."""
        required_policies = [
            "api-network-policy",
            "worker-network-policy",
            "dashboard-network-policy",
        ]
        
        for policy_name in required_policies:
            assert policy_name in network_policies, \
                f"Required network policy {policy_name} not found"


class TestDockerfileSecurity:
    """Test Dockerfile security configurations."""
    
    @pytest.fixture
    def dockerfiles(self):
        """Load Dockerfile content."""
        dockerfiles = {}
        
        for dockerfile_path in Path("services").rglob("Dockerfile"):
            with open(dockerfile_path) as f:
                # Key by full path: every file is named "Dockerfile"
                dockerfiles[str(dockerfile_path)] = f.read()
        
        return dockerfiles
    
    def test_no_root_user_in_dockerfile(self, dockerfiles):
        """Verify Dockerfiles don't use root user."""
        for name, content in dockerfiles.items():
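            # Check for an explicit root user ("USER root" or "USER 0");
            # a substring match would also reject "USER nonroot"
            lines = content.split("\n")
            for line in lines:
                parts = line.split()
                if len(parts) >= 2 and parts[0].upper() == "USER":
                    user = parts[1].split(":")[0].lower()
                    if user in ("root", "0"):
                        pytest.fail(f"{name}: Should not use root user")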
    
    def test_no_latest_tag(self, dockerfiles):
        """Verify no 'latest' tag is used."""
        for name, content in dockerfiles.items():
            lines = content.split("\n")
            for line in lines:
                if line.strip().startswith("FROM"):
                    if ":latest" in line.lower():
                        pytest.fail(f"{name}: Should not use :latest tag")

Secrets Management

External Secrets

yaml
# deployment/k8s/external-secrets.yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aegislm-vault
spec:
  provider:
    vault:
      server: "https://vault.example.com:8200"
      path: "secret"
      version: "v2"
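      auth:
        kubernetes:
          mountPath: kubernetes
          # Vault Kubernetes auth also needs a role (role: <vault-role>);
          # the value is deployment-specific and not given in this guide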
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: aegislm-secrets
  namespace: aegislm
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aegislm-vault
    kind: ClusterSecretStore
  target:
    name: aegislm-secrets
    creationPolicy: Owner
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: aegislm/database
        property: url
    - secretKey: API_SECRET_KEY
      remoteRef:
        key: aegislm/api
        property: secret_key
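
On the application side, the synced `aegislm-secrets` Secret still has to be read somewhere. A minimal sketch follows, assuming the Secret is exposed to the container as environment variables (for example through `envFrom`), which this guide does not specify. The `Settings` class and `load_settings` function are hypothetical names. The idea is to fail fast at startup if a key is missing and to avoid logging secret values.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

REQUIRED_KEYS = ("DATABASE_URL", "API_SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    # repr=False keeps secret values out of logs and tracebacks
    database_url: str = field(repr=False)
    api_secret_key: str = field(repr=False)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required secrets from env, raising if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")
    return Settings(env["DATABASE_URL"], env["API_SECRET_KEY"])


# Demonstration with a fake environment rather than real credentials.
fake_env = {"DATABASE_URL": "postgresql://example", "API_SECRET_KEY": "not-a-real-key"}
settings = load_settings(fake_env)
print(settings)
```

Because both fields are declared with `repr=False`, printing the settings object does not reveal the secret values.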

Compliance Mapping

| Control | NIST 800-53 | ISO 27001 | PCI DSS |
|---|---|---|---|
| Non-root containers | AC-3 | A.9.1 | Req 7.1 |
| Read-only filesystem | AC-3 | A.9.1 | Req 7.1 |
| Drop capabilities | AC-3 | A.9.1 | - |
| Network segmentation | AC-4 | A.13.1 | Req 1.3 |
| Secrets management | IA-5 | A.9.4 | Req 3.4 |
Security Audit Commands

bash
# Check container security context
kubectl get pods -n aegislm -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.securityContext.runAsNonRoot}{"\n"}{end}'

# Check network policies
kubectl get networkpolicies -n aegislm

# Check for privileged containers
kubectl get pods -n aegislm -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext.privileged}{"\n"}{end}'

# Run security tests
pytest tests/test_container_security.py -v
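
The jsonpath queries above print raw values that still need to be read by eye. As an alternative, the output of `kubectl get pods -n aegislm -o json` can be post-processed. The sketch below scans such a pod list for privileged containers, including init containers, which the jsonpath query does not cover. The function name and the sample data are hypothetical.

```python
import json


def find_privileged(pod_list: dict) -> list[str]:
    """Return "pod/container" names whose securityContext sets privileged: true."""
    flagged = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        containers = spec.get("containers", []) + spec.get("initContainers", [])
        for container in containers:
            if container.get("securityContext", {}).get("privileged") is True:
                flagged.append(f"{pod['metadata']['name']}/{container['name']}")
    return flagged


# Hypothetical sample in the shape produced by `kubectl get pods -o json`.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "api-1"},
     "spec": {"containers": [{"name": "api",
                              "securityContext": {"privileged": false}}]}},
    {"metadata": {"name": "debug-1"},
     "spec": {"containers": [{"name": "shell"}],
              "initContainers": [{"name": "setup",
                                  "securityContext": {"privileged": true}}]}}
  ]
}
""")

print(find_privileged(sample))
```

In practice the JSON would be read from the `kubectl` output rather than from an inline string. An empty result means no container explicitly requests privileged mode.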

References