
Docker AI: Deploying Local LLMs and MCP Servers

This guide covers the latest ways to use Docker for deploying local Large Language Models (LLMs) and Model Context Protocol (MCP) servers.

Docker Compose Models

Docker Compose v2.38+ introduces a standardized way to define AI model dependencies using the models top-level element in your Compose files.

Prerequisites

  • Docker Compose v2.38 or later
  • Docker Model Runner (DMR) or compatible cloud providers
  • For DMR: See requirements

Basic Model Definition

Define models in your docker-compose.yml:

services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2

This configuration:

  • Defines a service chat-app that uses a model named llm
  • References the ai/smollm2 model image from Docker Hub

Model Configuration Options

Configure models with various runtime parameters:

models:
  llm:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--a-flag"
      - "--another-flag=42"

Key configuration options:

  • model (required): OCI artifact identifier for the model
  • context_size: Maximum token context size (keep as small as feasible for your needs)
  • runtime_flags: Command-line flags passed to the inference engine (e.g., llama.cpp parameters)
  • x-*: Platform-specific extension attributes

Service Model Binding

Short Syntax

The simplest way to bind models to services:

services:
  app:
    image: my-app
    models:
      - llm
      - embedding-model

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm

Auto-generated environment variables:

  • LLM_URL - URL to access the LLM model
  • LLM_MODEL - Model identifier
  • EMBEDDING_MODEL_URL - URL to access the embedding model
  • EMBEDDING_MODEL_MODEL - Model identifier
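
The naming convention behind these variables (the model name uppercased, hyphens replaced with underscores, then suffixed with _URL or _MODEL) can be sketched as a small helper. The function name is illustrative, not part of Compose:

```python
def model_env_vars(model_name: str) -> dict:
    """Derive the environment variable names Compose injects for a model.

    'llm' -> LLM_URL / LLM_MODEL
    'embedding-model' -> EMBEDDING_MODEL_URL / EMBEDDING_MODEL_MODEL
    """
    prefix = model_name.upper().replace("-", "_")
    return {"endpoint": f"{prefix}_URL", "model": f"{prefix}_MODEL"}
```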

Long Syntax

Customize environment variable names:

services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
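
Inside the app container, the service reads whatever variable names it declared. A minimal sketch using the custom names from the example above:

```python
import os

def load_model_config() -> dict:
    """Read the endpoint URL and model name injected by Compose's long syntax."""
    return {
        "endpoint": os.environ["AI_MODEL_URL"],
        "model": os.environ["AI_MODEL_NAME"],
    }
```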

Platform Portability

Docker Model Runner

When Docker Model Runner is enabled locally:

services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--no-prefill-assistant"

Docker Model Runner will:

  • Pull and run the model locally
  • Provide endpoint URLs
  • Inject environment variables into the service
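
Because the endpoint is OpenAI-compatible, a service can assemble a standard chat-completions request from the injected variables. A minimal sketch (request construction only, no network call; the helper name and fallback values are illustrative):

```python
import os

def build_chat_request(prompt: str) -> tuple[str, dict]:
    """Construct the URL and JSON body for an OpenAI-compatible chat call."""
    base_url = os.environ.get("AI_MODEL_URL", "http://localhost").rstrip("/")
    payload = {
        "model": os.environ.get("AI_MODEL_NAME", "ai/smollm2"),
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{base_url}/chat/completions", payload
```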

Cloud Providers

The same Compose file works on cloud providers:

services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
    # Cloud-specific configurations
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"

Cloud providers may:

  • Use managed AI services
  • Apply cloud-specific optimizations
  • Provide monitoring and logging
  • Handle model versioning automatically

Common Runtime Configurations

Development Mode

models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--verbose-prompt"
      - "--log-prefix"
      - "--log-timestamps"
      - "--log-colors"

Conservative (Disabled Reasoning)

models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.1"
      - "--top-k"
      - "1"
      - "--reasoning-budget"
      - "0"

Creative (High Randomness)

models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "1"
      - "--top-p"
      - "0.9"

Highly Deterministic

models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0"
      - "--top-k"
      - "1"

Concurrent Processing

models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"
      - "8"
      - "--mlock"  # Lock memory to prevent swapping

Docker Model Runner

Docker Model Runner (DMR) enables running AI models locally with minimal setup. It integrates seamlessly with Docker Compose models.

Key Features

  • Local model execution: Run models on your local machine
  • GPU support: Leverage local GPU resources
  • Automatic model pulling: Models are pulled from Docker Hub as needed
  • OpenAI-compatible API: Expose models via standard API endpoints

Use Cases

  • Development and testing: Test AI applications locally before cloud deployment
  • Privacy-sensitive workloads: Keep data on-premises
  • Offline development: Work without internet connectivity
  • Cost optimization: Avoid cloud inference costs

Docker MCP Toolkit

The Docker MCP (Model Context Protocol) Toolkit provides a secure, standardized way to connect AI agents to external tools and data sources.

What is MCP?

Model Context Protocol (MCP) is an open, client-server protocol that standardizes how applications provide context and functionality to Large Language Models. It allows AI agents to:

  • Interact with external tools and APIs
  • Access databases and services
  • Execute code in isolated environments
  • Retrieve real-world data
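
Under the hood, MCP messages are JSON-RPC 2.0. Enumerating a server's tools, for example, is a single request; a minimal sketch of the message a client would send (transport and framing omitted):

```python
import json

def make_list_tools_request(request_id: int) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to enumerate tools."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/list",
        "params": {},
    })
```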

Docker MCP Components

1. MCP Gateway

The Docker MCP Gateway is the core open-source component that:

  • Manages MCP containers
  • Provides a unified endpoint for AI applications
  • Mediates between agents and MCP servers
  • Enables dynamic MCP discovery and configuration

2. MCP Catalog

The Docker MCP Catalog is a curated collection of trusted, containerized MCP servers including:

  • 270+ curated servers from publishers such as Stripe, Elastic, and Grafana
  • Publisher trust levels: Distinguish between official, verified, and community servers
  • Commit pinning: Each server tied to specific Git commits for verifiability
  • AI-audited updates: Automated reviews of code changes

3. MCP Toolkit (Docker Desktop)

A management interface in Docker Desktop for:

  • Discovering MCP servers
  • Configuring and managing servers
  • One-click deployment
  • Centralized authentication

Dynamic MCPs: Smart Search and Tool Composition

Recent enhancements enable agents to dynamically discover and configure MCP servers:

Smart Search Features

mcp-find: Find MCP servers by name or description

Agent: "Find MCP servers for web searching"
→ Returns: DuckDuckGo MCP, Brave Search MCP, etc.

mcp-add: Add MCP servers to the current session

Agent: "Add the DuckDuckGo MCP server"
→ Server is pulled, configured, and made available

Benefits of Dynamic MCPs

  1. No manual configuration: Agents discover and add tools as needed
  2. Reduced token usage: Only load tools when required
  3. Autonomous operation: Agents manage their own tool ecosystem
  4. Secure sandbox: Tool composition happens in isolated environments

Security Features

Docker MCP Toolkit implements multiple security layers:

  1. Containerization: Strong isolation limits blast radius
  2. Commit pinning: Precise attribution and verification
  3. Automated auditing: AI-powered code reviews
  4. Publisher trust levels: Clear indicators of server origin
  5. Isolated execution: MCP servers run in separate containers

Using MCP with Docker Compose

Example docker-compose.yml for MCP servers:

services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
    environment:
      - MCP_CATALOG_URL=https://hub.docker.com/mcp

  my-app:
    image: my-ai-app
    depends_on:
      - mcp-gateway
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000

volumes:
  mcp-data:
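
On the application side, the service only needs the MCP_GATEWAY_URL variable defined above; tool discovery and invocation are then negotiated with the gateway. A sketch of resolving it (the default value is illustrative):

```python
import os

def gateway_url() -> str:
    """Resolve the MCP gateway endpoint injected via Compose."""
    return os.environ.get("MCP_GATEWAY_URL", "http://localhost:3000").rstrip("/")
```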

Submitting MCP Servers

To contribute MCP servers to the Docker MCP Catalog:

  1. Follow the submission guidance
  2. Submit to the MCP Registry
  3. Servers undergo automated and manual review
  4. Approved servers appear in the catalog with appropriate trust levels

Common Use Cases

1. Local AI Development Environment

services:
  dev-app:
    build: .
    models:
      - llm
      - embeddings
    depends_on:
      - mcp-gateway

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--log-colors"
  
  embeddings:
    model: ai/all-minilm

2. Multi-Model AI Application

services:
  chat-service:
    image: my-chat-service
    models:
      chat-model:
        endpoint_var: CHAT_MODEL_URL
        model_var: CHAT_MODEL_NAME
      
  code-service:
    image: my-code-service
    models:
      code-model:
        endpoint_var: CODE_MODEL_URL
        model_var: CODE_MODEL_NAME

models:
  chat-model:
    model: ai/smollm2
    context_size: 2048
    runtime_flags:
      - "--temp"
      - "0.7"
  
  code-model:
    model: ai/codellama
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.2"

3. AI Agent with Dynamic MCP Tools

services:
  ai-agent:
    image: my-ai-agent
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
      - ENABLE_DYNAMIC_MCPS=true
    depends_on:
      - mcp-gateway
      - llm-service
    models:
      - agent-llm

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - ./mcp-catalog.yml:/config/catalog.yml

models:
  agent-llm:
    model: ai/smollm2
    context_size: 4096


Last Updated: December 2024