
Docker AI: Deploying Local LLMs and MCP Servers

This guide covers the latest ways to use Docker for deploying local Large Language Models (LLMs) and Model Context Protocol (MCP) servers.

Docker Compose Models

Docker Compose v2.38+ introduces a standardized way to define AI model dependencies using the models top-level element in your Compose files.

Prerequisites

  • Docker Compose v2.38 or later
  • Docker Model Runner (DMR) or compatible cloud providers
  • For DMR: See requirements

Basic Model Definition

Define models in your docker-compose.yml:

services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2

This configuration:

  • Defines a service chat-app that uses a model named llm
  • References the ai/smollm2 model image from Docker Hub

Model Configuration Options

Configure models with various runtime parameters:

models:
  llm:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--a-flag"
      - "--another-flag=42"

Key configuration options:

  • model (required): OCI artifact identifier for the model
  • context_size: Maximum token context size (keep as small as feasible for your needs)
  • runtime_flags: Command-line flags passed to the inference engine (e.g., llama.cpp parameters)
  • x-*: Platform-specific extension attributes

Service Model Binding

Short Syntax

The simplest way to bind models to services:

services:
  app:
    image: my-app
    models:
      - llm
      - embedding-model

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm

Auto-generated environment variables:

  • LLM_URL - URL to access the LLM model
  • LLM_MODEL - Model identifier
  • EMBEDDING_MODEL_URL - URL to access the embedding model
  • EMBEDDING_MODEL_MODEL - Model identifier
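
The naming convention behind these variables (the model name uppercased, hyphens replaced with underscores, then suffixed with _URL or _MODEL) can be sketched as a small helper. The function name is illustrative, not part of Compose:

```python
def model_env_vars(model_name: str) -> dict:
    """Derive the environment variable names Compose injects for a model.

    'llm' -> LLM_URL / LLM_MODEL
    'embedding-model' -> EMBEDDING_MODEL_URL / EMBEDDING_MODEL_MODEL
    """
    prefix = model_name.upper().replace("-", "_")
    return {"endpoint": f"{prefix}_URL", "model": f"{prefix}_MODEL"}
```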

Long Syntax

Customize environment variable names:

services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
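
Inside the app container, the service reads whatever variable names it declared. A minimal sketch using the custom names from the example above:

```python
import os

def load_model_config() -> dict:
    """Read the endpoint URL and model name injected by Compose's long syntax."""
    return {
        "endpoint": os.environ["AI_MODEL_URL"],
        "model": os.environ["AI_MODEL_NAME"],
    }
```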

Platform Portability

Docker Model Runner

When Docker Model Runner is enabled locally:

services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--no-prefill-assistant"

Docker Model Runner will:

  • Pull and run the model locally
  • Provide endpoint URLs
  • Inject environment variables into the service
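
Because the endpoint is OpenAI-compatible, a service can assemble a standard chat-completions request from the injected variables. A minimal sketch (request construction only, no network call; the helper name and fallback values are illustrative):

```python
import os

def build_chat_request(prompt: str) -> tuple[str, dict]:
    """Construct the URL and JSON body for an OpenAI-compatible chat call."""
    base_url = os.environ.get("AI_MODEL_URL", "http://localhost").rstrip("/")
    payload = {
        "model": os.environ.get("AI_MODEL_NAME", "ai/smollm2"),
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{base_url}/chat/completions", payload
```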

Cloud Providers

The same Compose file works on cloud providers:

services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
    # Cloud-specific configurations
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"

Cloud providers may:

  • Use managed AI services
  • Apply cloud-specific optimizations
  • Provide monitoring and logging
  • Handle model versioning automatically

Common Runtime Configurations

Development Mode

models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--verbose-prompt"
      - "--log-prefix"
      - "--log-timestamps"
      - "--log-colors"

Conservative (Disabled Reasoning)

models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.1"
      - "--top-k"
      - "1"
      - "--reasoning-budget"
      - "0"

Creative (High Randomness)

models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "1"
      - "--top-p"
      - "0.9"

Highly Deterministic

models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0"
      - "--top-k"
      - "1"

Concurrent Processing

models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"
      - "8"
      - "--mlock"  # Lock memory to prevent swapping

Docker Model Runner

Docker Model Runner (DMR) enables running AI models locally with minimal setup. It integrates seamlessly with Docker Compose models.

Key Features

  • Local model execution: Run models on your local machine
  • GPU support: Leverage local GPU resources
  • Automatic model pulling: Models are pulled from Docker Hub as needed
  • OpenAI-compatible API: Expose models via standard API endpoints

Use Cases

  • Development and testing: Test AI applications locally before cloud deployment
  • Privacy-sensitive workloads: Keep data on-premises
  • Offline development: Work without internet connectivity
  • Cost optimization: Avoid cloud inference costs

Docker MCP Toolkit

The Docker MCP (Model Context Protocol) Toolkit provides a secure, standardized way to connect AI agents to external tools and data sources.

What is MCP?

Model Context Protocol (MCP) is an open, client-server protocol that standardizes how applications provide context and functionality to Large Language Models. It allows AI agents to:

  • Interact with external tools and APIs
  • Access databases and services
  • Execute code in isolated environments
  • Retrieve real-world data
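
Under the hood, MCP messages are JSON-RPC 2.0. Enumerating a server's tools, for example, is a single request; a minimal sketch of the message a client would send (transport and framing omitted):

```python
import json

def make_list_tools_request(request_id: int) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to enumerate tools."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/list",
        "params": {},
    })
```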

Docker MCP Components

1. MCP Gateway

The Docker MCP Gateway is the core open-source component that:

  • Manages MCP containers
  • Provides a unified endpoint for AI applications
  • Mediates between agents and MCP servers
  • Enables dynamic MCP discovery and configuration

2. MCP Catalog

The Docker MCP Catalog is a curated collection of trusted, containerized MCP servers including:

  • 270+ curated servers from publishers such as Stripe, Elastic, and Grafana
  • Publisher trust levels: Distinguish between official, verified, and community servers
  • Commit pinning: Each server tied to specific Git commits for verifiability
  • AI-audited updates: Automated reviews of code changes

3. MCP Toolkit (Docker Desktop)

A management interface in Docker Desktop for:

  • Discovering MCP servers
  • Configuring and managing servers
  • One-click deployment
  • Centralized authentication

Dynamic MCPs: Smart Search and Tool Composition

Recent enhancements enable agents to dynamically discover and configure MCP servers:

Smart Search Features

mcp-find: Find MCP servers by name or description

Agent: "Find MCP servers for web searching"
→ Returns: DuckDuckGo MCP, Brave Search MCP, etc.

mcp-add: Add MCP servers to the current session

Agent: "Add the DuckDuckGo MCP server"
→ Server is pulled, configured, and made available

Benefits of Dynamic MCPs

  1. No manual configuration: Agents discover and add tools as needed
  2. Reduced token usage: Only load tools when required
  3. Autonomous operation: Agents manage their own tool ecosystem
  4. Secure sandbox: Tool composition happens in isolated environments

Security Features

Docker MCP Toolkit implements multiple security layers:

  1. Containerization: Strong isolation limits blast radius
  2. Commit pinning: Precise attribution and verification
  3. Automated auditing: AI-powered code reviews
  4. Publisher trust levels: Clear indicators of server origin
  5. Isolated execution: MCP servers run in separate containers

Using MCP with Docker Compose

Example docker-compose.yml for MCP servers:

services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
    environment:
      - MCP_CATALOG_URL=https://hub.docker.com/mcp

  my-app:
    image: my-ai-app
    depends_on:
      - mcp-gateway
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000

volumes:
  mcp-data:
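
On the application side, the service only needs the MCP_GATEWAY_URL variable defined above; tool discovery and invocation are then negotiated with the gateway. A sketch of resolving it (the default value is illustrative):

```python
import os

def gateway_url() -> str:
    """Resolve the MCP gateway endpoint injected via Compose."""
    return os.environ.get("MCP_GATEWAY_URL", "http://localhost:3000").rstrip("/")
```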

Submitting MCP Servers

To contribute MCP servers to the Docker MCP Catalog:

  1. Follow the submission guidance
  2. Submit to the MCP Registry
  3. Servers undergo automated and manual review
  4. Approved servers appear in the catalog with appropriate trust levels

Common Use Cases

1. Local AI Development Environment

services:
  dev-app:
    build: .
    models:
      - llm
      - embeddings
    depends_on:
      - mcp-gateway

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--log-colors"
  
  embeddings:
    model: ai/all-minilm

2. Multi-Model AI Application

services:
  chat-service:
    image: my-chat-service
    models:
      chat-model:
        endpoint_var: CHAT_MODEL_URL
        model_var: CHAT_MODEL_NAME
      
  code-service:
    image: my-code-service
    models:
      code-model:
        endpoint_var: CODE_MODEL_URL
        model_var: CODE_MODEL_NAME

models:
  chat-model:
    model: ai/smollm2
    context_size: 2048
    runtime_flags:
      - "--temp"
      - "0.7"
  
  code-model:
    model: ai/codellama
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.2"

3. AI Agent with Dynamic MCP Tools

services:
  ai-agent:
    image: my-ai-agent
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
      - ENABLE_DYNAMIC_MCPS=true
    depends_on:
      - mcp-gateway
      - llm-service
    models:
      - agent-llm

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - ./mcp-catalog.yml:/config/catalog.yml

models:
  agent-llm:
    model: ai/smollm2
    context_size: 4096


Last Updated: December 2024