# Docker AI: Deploying Local LLMs and MCP Servers
This guide covers the latest ways to use Docker to deploy local Large Language Models (LLMs) and Model Context Protocol (MCP) servers.
## Table of Contents
- [Docker Compose Models](#docker-compose-models)
- [Docker Model Runner](#docker-model-runner)
- [Docker MCP Toolkit](#docker-mcp-toolkit)
- [Common Use Cases](#common-use-cases)
---
## Docker Compose Models
Docker Compose v2.38+ introduces a standardized way to define AI model dependencies using the `models` top-level element in your Compose files.
### Prerequisites
- Docker Compose v2.38 or later
- Docker Model Runner (DMR) or compatible cloud providers
- For DMR: See [requirements](https://docs.docker.com/ai/model-runner/#requirements)
### Basic Model Definition
Define models in your `docker-compose.yml`:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
```
This configuration:
- Defines a service `chat-app` that uses a model named `llm`
- References the `ai/smollm2` model image from Docker Hub
### Model Configuration Options
Configure models with various runtime parameters:
```yaml
models:
  llm:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--a-flag"
      - "--another-flag=42"
```
**Key configuration options:**
- **`model`** (required): OCI artifact identifier for the model
- **`context_size`**: Maximum token context size (keep as small as feasible for your needs)
- **`runtime_flags`**: Command-line flags passed to the inference engine (e.g., [llama.cpp parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md))
- **`x-*`**: Platform-specific extension attributes (see the cloud provider example under [Platform Portability](#platform-portability))
### Service Model Binding
#### Short Syntax
The simplest way to bind models to services:
```yaml
services:
  app:
    image: my-app
    models:
      - llm
      - embedding-model

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
With the short syntax, Compose auto-generates environment variables from each model's name (uppercased, with `-` replaced by `_`) and injects them into the service; a consumption sketch follows the list:
- `LLM_URL` - URL to access the LLM model
- `LLM_MODEL` - Model identifier
- `EMBEDDING_MODEL_URL` - URL to access the embedding model
- `EMBEDDING_MODEL_MODEL` - Model identifier
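
A minimal sketch of a service consuming these variables, assuming the `openai` Python client is installed and the injected endpoint is OpenAI-compatible (as it is under Docker Model Runner); the variable names come from the example above:

```python
import os

from openai import OpenAI

# LLM_URL and LLM_MODEL are injected by Compose for the model named "llm".
client = OpenAI(
    base_url=os.environ["LLM_URL"],  # model endpoint provided at runtime
    api_key="not-required",          # local inference engines ignore the key
)

response = client.chat.completions.create(
    model=os.environ["LLM_MODEL"],
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```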
#### Long Syntax
Customize environment variable names:
```yaml
services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
### Platform Portability
#### Docker Model Runner
When Docker Model Runner is enabled locally:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--no-prefill-assistant"
```
Docker Model Runner will:
- Pull and run the model locally
- Provide endpoint URLs
- Inject environment variables into the service
#### Cloud Providers
The same Compose file works on cloud providers:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
    # Cloud-specific configurations
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"
```
Cloud providers may:
- Use managed AI services
- Apply cloud-specific optimizations
- Provide monitoring and logging
- Handle model versioning automatically
### Common Runtime Configurations
#### Development Mode
```yaml
models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--verbose-prompt"
      - "--log-prefix"
      - "--log-timestamps"
      - "--log-colors"
```
#### Conservative (Disabled Reasoning)
```yaml
models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.1"
      - "--top-k"
      - "1"
      - "--reasoning-budget"
      - "0"
```
#### Creative (High Randomness)
```yaml
models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "1"
      - "--top-p"
      - "0.9"
```
#### Highly Deterministic
```yaml
models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0"
      - "--top-k"
      - "1"
```
#### Concurrent Processing
```yaml
models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"
      - "8"
      - "--mlock" # Lock memory to prevent swapping
```
---
## Docker Model Runner
Docker Model Runner (DMR) enables running AI models locally with minimal setup. It integrates seamlessly with Docker Compose models.
### Key Features
- **Local model execution**: Run models on your local machine
- **GPU support**: Leverage local GPU resources
- **Automatic model pulling**: Models are pulled from Docker Hub as needed
- **OpenAI-compatible API**: Expose models via standard API endpoints
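
As a quick illustration of the OpenAI-compatible API, the sketch below sends a chat completion request with plain `requests` from the host. It assumes host-side TCP access to Model Runner is enabled on its default port (12434, a Model Runner setting in Docker Desktop) and that `ai/smollm2` has already been pulled; verify the endpoint path against your installed version:

```python
import requests

# Model Runner's OpenAI-compatible chat completions endpoint
# (assumes TCP host access on the default port 12434).
DMR_URL = "http://localhost:12434/engines/v1/chat/completions"

payload = {
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "What is Docker Model Runner?"}],
}

resp = requests.post(DMR_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```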
### Use Cases
- **Development and testing**: Test AI applications locally before cloud deployment
- **Privacy-sensitive workloads**: Keep data on-premises
- **Offline development**: Work without internet connectivity
- **Cost optimization**: Avoid cloud inference costs
---
## Docker MCP Toolkit
The Docker MCP (Model Context Protocol) Toolkit provides a secure, standardized way to connect AI agents to external tools and data sources.
### What is MCP?
Model Context Protocol (MCP) is an open, client-server protocol that standardizes how applications provide context and functionality to Large Language Models. It allows AI agents to:
- Interact with external tools and APIs
- Access databases and services
- Execute code in isolated environments
- Retrieve real-world data
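
Under the hood, MCP messages are JSON-RPC 2.0 exchanged over a transport such as stdio or HTTP. A minimal sketch of the two most common requests an agent sends, `tools/list` and `tools/call` (the tool name and arguments here are illustrative, not from any particular server):

```python
import json

# Ask a server which tools it offers.
list_tools = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# Invoke one of the advertised tools. The tool name and arguments
# below are hypothetical; real values come from the tools/list result.
call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "web_search",
        "arguments": {"query": "docker model runner"},
    },
}

for msg in (list_tools, call_tool):
    print(json.dumps(msg, indent=2))
```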
### Docker MCP Components
#### 1. MCP Gateway
The [Docker MCP Gateway](https://github.com/docker/mcp-gateway/) is the core open-source component that:
- Manages MCP containers
- Provides a unified endpoint for AI applications
- Mediates between agents and MCP servers
- Enables dynamic MCP discovery and configuration
#### 2. MCP Catalog
The [Docker MCP Catalog](https://hub.docker.com/mcp) is a curated collection of trusted, containerized MCP servers including:
- **270+ curated servers** from publishers such as Stripe, Elastic, and Grafana
- **Publisher trust levels**: Distinguish between official, verified, and community servers
- **Commit pinning**: Each server tied to specific Git commits for verifiability
- **AI-audited updates**: Automated reviews of code changes
#### 3. MCP Toolkit (Docker Desktop)
A management interface in Docker Desktop for:
- Discovering MCP servers
- Configuring and managing servers
- One-click deployment
- Centralized authentication
### Dynamic MCPs: Smart Search and Tool Composition
Recent enhancements enable agents to dynamically discover and configure MCP servers:
#### Smart Search Features
**`mcp-find`**: Find MCP servers by name or description
```
Agent: "Find MCP servers for web searching"
→ Returns: DuckDuckGo MCP, Brave Search MCP, etc.
```
**`mcp-add`**: Add MCP servers to the current session
```
Agent: "Add the DuckDuckGo MCP server"
→ Server is pulled, configured, and made available
```
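
Since `mcp-find` and `mcp-add` are exposed to the agent as MCP tools by the gateway, the agent invokes them with ordinary `tools/call` requests. A hedged sketch of such a call (the argument schema is an assumption for illustration, not the gateway's documented interface):

```python
import json

# A tools/call request targeting the gateway's mcp-find tool.
# The "query" argument name is assumed, not taken from gateway docs.
find_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "mcp-find",
        "arguments": {"query": "web search"},
    },
}
print(json.dumps(find_request, indent=2))
```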
#### Benefits of Dynamic MCPs
1. **No manual configuration**: Agents discover and add tools as needed
2. **Reduced token usage**: Only load tools when required
3. **Autonomous operation**: Agents manage their own tool ecosystem
4. **Secure sandbox**: Tool composition happens in isolated environments
### Security Features
Docker MCP Toolkit implements multiple security layers:
1. **Containerization**: Strong isolation limits blast radius
2. **Commit pinning**: Precise attribution and verification
3. **Automated auditing**: AI-powered code reviews
4. **Publisher trust levels**: Clear indicators of server origin
5. **Isolated execution**: MCP servers run in separate containers
### Using MCP with Docker Compose
Example `docker-compose.yml` for MCP servers:
```yaml
services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
    environment:
      - MCP_CATALOG_URL=https://hub.docker.com/mcp

  my-app:
    image: my-ai-app
    depends_on:
      - mcp-gateway
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000

volumes:
  mcp-data:
```
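
Application code can then talk to the gateway at `MCP_GATEWAY_URL`. A minimal sketch using the reference `mcp` Python SDK, assuming the gateway exposes an SSE transport at `/sse` (the transport and path depend on how the gateway is run):

```python
import asyncio
import os

from mcp import ClientSession
from mcp.client.sse import sse_client

GATEWAY = os.environ.get("MCP_GATEWAY_URL", "http://localhost:3000")

async def main() -> None:
    # Assumes the gateway serves MCP over SSE at /sse.
    async with sse_client(f"{GATEWAY}/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```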
### Submitting MCP Servers
To contribute MCP servers to the Docker MCP Catalog:
1. Follow the [submission guidance](https://github.com/docker/mcp-registry/blob/main/CONTRIBUTING.md)
2. Submit to the [MCP Registry](https://github.com/docker/mcp-registry)
3. Servers undergo automated and manual review
4. Approved servers appear in the catalog with appropriate trust levels
---
## Common Use Cases
### 1. Local AI Development Environment
```yaml
services:
  dev-app:
    build: .
    models:
      - llm
      - embeddings
    depends_on:
      - mcp-gateway

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--log-colors"
  embeddings:
    model: ai/all-minilm
```
### 2. Multi-Model AI Application
```yaml
services:
  chat-service:
    image: my-chat-service
    models:
      chat-model:
        endpoint_var: CHAT_MODEL_URL
        model_var: CHAT_MODEL_NAME

  code-service:
    image: my-code-service
    models:
      code-model:
        endpoint_var: CODE_MODEL_URL
        model_var: CODE_MODEL_NAME

models:
  chat-model:
    model: ai/smollm2
    context_size: 2048
    runtime_flags:
      - "--temp"
      - "0.7"
  code-model:
    model: ai/codellama
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.2"
```
### 3. AI Agent with Dynamic MCP Tools
```yaml
services:
  ai-agent:
    image: my-ai-agent
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
      - ENABLE_DYNAMIC_MCPS=true
    depends_on:
      - mcp-gateway
    models:
      - agent-llm

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - ./mcp-catalog.yml:/config/catalog.yml

models:
  agent-llm:
    model: ai/smollm2
    context_size: 4096
```
---
## Additional Resources
- [Docker AI Documentation](https://docs.docker.com/ai/)
- [Docker Compose Models Reference](https://docs.docker.com/ai/compose/models-and-compose/)
- [Docker Model Runner](https://docs.docker.com/ai/model-runner/)
- [Docker MCP Gateway (GitHub)](https://github.com/docker/mcp-gateway/)
- [Docker MCP Catalog](https://hub.docker.com/mcp)
- [MCP Registry](https://github.com/docker/mcp-registry)
- [Dynamic MCPs Blog Post](https://www.docker.com/blog/dynamic-mcps-stop-hardcoding-your-agents-world/)
- [MCP Security Blog Post](https://www.docker.com/blog/enhancing-mcp-trust-with-the-docker-mcp-catalog/)
---
**Last Updated**: December 2024