# Docker AI: Deploying Local LLMs and MCP Servers
This guide covers the latest ways to use Docker to deploy local Large Language Models (LLMs) and Model Context Protocol (MCP) servers.
## Table of Contents

- [Docker Compose Models](#docker-compose-models)
- [Docker Model Runner](#docker-model-runner)
- [Docker MCP Toolkit](#docker-mcp-toolkit)
- [Common Use Cases](#common-use-cases)

---

## Docker Compose Models

Docker Compose v2.38+ introduces a standardized way to define AI model dependencies using the `models` top-level element in your Compose files.

### Prerequisites

- Docker Compose v2.38 or later
- Docker Model Runner (DMR) or a compatible cloud provider
- For DMR: see the [requirements](https://docs.docker.com/ai/model-runner/#requirements)
### Basic Model Definition

Define models in your `docker-compose.yml`:

```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
```

This configuration:

- Defines a service `chat-app` that uses a model named `llm`
- References the `ai/smollm2` model image from Docker Hub

### Model Configuration Options

Configure models with various runtime parameters:

```yaml
models:
  llm:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--a-flag"
      - "--another-flag=42"
```

**Key configuration options:**

- **`model`** (required): OCI artifact identifier for the model
- **`context_size`**: Maximum token context size (keep it as small as feasible for your needs)
- **`runtime_flags`**: Command-line flags passed to the inference engine (e.g., [llama.cpp parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md))
- **`x-*`**: Platform-specific extension attributes
### Service Model Binding

#### Short Syntax

The simplest way to bind models to services:

```yaml
services:
  app:
    image: my-app
    models:
      - llm
      - embedding-model

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```

Auto-generated environment variables:

- `LLM_URL` - URL for accessing the LLM model
- `LLM_MODEL` - Model identifier
- `EMBEDDING_MODEL_URL` - URL for accessing the embedding model
- `EMBEDDING_MODEL_MODEL` - Model identifier
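The naming convention implied by the examples above (upper-case the model key, turn dashes into underscores, then append `_URL` and `_MODEL`) can be sketched in a few lines; this is an illustration of the rule, not Compose's actual implementation:

```python
def model_env_vars(model_name: str) -> tuple[str, str]:
    """Derive the auto-generated Compose env var names for a model key.

    The key is upper-cased, dashes become underscores, and the
    "_URL" / "_MODEL" suffixes are appended.
    """
    base = model_name.upper().replace("-", "_")
    return f"{base}_URL", f"{base}_MODEL"

print(model_env_vars("llm"))              # ('LLM_URL', 'LLM_MODEL')
print(model_env_vars("embedding-model"))  # ('EMBEDDING_MODEL_URL', 'EMBEDDING_MODEL_MODEL')
```

Note the doubled suffix in `EMBEDDING_MODEL_MODEL`: it falls directly out of the rule, since the model key already ends in `-model`.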
#### Long Syntax

Customize environment variable names:

```yaml
services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```

### Platform Portability

#### Docker Model Runner

When Docker Model Runner is enabled locally:

```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--no-prefill-assistant"
```

Docker Model Runner will:

- Pull and run the model locally
- Provide endpoint URLs
- Inject environment variables into the service

#### Cloud Providers

The same Compose file works on cloud providers:

```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
    # Cloud-specific configurations
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"
```

Cloud providers may:

- Use managed AI services
- Apply cloud-specific optimizations
- Provide monitoring and logging
- Handle model versioning automatically
### Common Runtime Configurations

#### Development Mode

```yaml
models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--verbose-prompt"
      - "--log-prefix"
      - "--log-timestamps"
      - "--log-colors"
```

#### Conservative (Disabled Reasoning)

```yaml
models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.1"
      - "--top-k"
      - "1"
      - "--reasoning-budget"
      - "0"
```

#### Creative (High Randomness)

```yaml
models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "1"
      - "--top-p"
      - "0.9"
```

#### Highly Deterministic

```yaml
models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0"
      - "--top-k"
      - "1"
```
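To see why `--temp 0` plus `--top-k 1` makes output deterministic, here is a simplified sketch of temperature/top-k sampling (not DMR's or llama.cpp's actual implementation): temperature 0 or top-k 1 collapses to a greedy argmax, while a higher temperature flattens the distribution before a weighted draw.

```python
import math
import random

def sample(logits, temperature, top_k=None, rng=None):
    """Pick a token index from raw logits (simplified sketch)."""
    if temperature == 0 or top_k == 1:
        # Greedy: always the single most likely token -> deterministic
        return max(range(len(logits)), key=lambda i: logits[i])
    # Keep only the k most likely candidates
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k:
        idx = idx[:top_k]
    # Softmax over temperature-scaled logits (shifted for stability)
    scaled = [logits[i] / temperature for i in idx]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    rng = rng or random.Random()
    return rng.choices(idx, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5]
print(sample(logits, temperature=0))  # always 0, no matter how often you run it
```

The "Conservative" profile above (`--temp 0.1`, `--top-k 1`) is therefore effectively greedy too: top-k 1 alone already removes all randomness.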
#### Concurrent Processing

```yaml
models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"
      - "8"
      - "--mlock" # Lock memory to prevent swapping
```

---

## Docker Model Runner

Docker Model Runner (DMR) enables running AI models locally with minimal setup. It integrates seamlessly with Docker Compose models.

### Key Features

- **Local model execution**: Run models on your local machine
- **GPU support**: Leverage local GPU resources
- **Automatic model pulling**: Models are pulled from Docker Hub as needed
- **OpenAI-compatible API**: Expose models via standard API endpoints

### Use Cases

- **Development and testing**: Test AI applications locally before cloud deployment
- **Privacy-sensitive workloads**: Keep data on-premises
- **Offline development**: Work without internet connectivity
- **Cost optimization**: Avoid cloud inference costs
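Because the API is OpenAI-compatible, an application only needs the injected endpoint and model-name variables to build a standard `/chat/completions` request. The sketch below shows the request construction without making a network call; the URL and variable values are illustrative placeholders, not guaranteed DMR endpoints:

```python
import json
import os

# Illustrative defaults standing in for the values Compose/DMR would inject.
os.environ.setdefault("LLM_URL", "http://model-runner.docker.internal/engines/v1/")
os.environ.setdefault("LLM_MODEL", "ai/smollm2")

def chat_request(prompt: str) -> tuple[str, bytes]:
    """Build an OpenAI-style /chat/completions request (no network call)."""
    url = os.environ["LLM_URL"].rstrip("/") + "/chat/completions"
    body = json.dumps({
        "model": os.environ["LLM_MODEL"],
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body

url, body = chat_request("Hello!")
print(url)  # http://model-runner.docker.internal/engines/v1/chat/completions
```

In a real service you would POST `body` to `url` with any OpenAI-compatible HTTP client; the point is that no DMR-specific SDK is needed.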
---

## Docker MCP Toolkit

The Docker MCP (Model Context Protocol) Toolkit provides a secure, standardized way to connect AI agents to external tools and data sources.

### What is MCP?

Model Context Protocol (MCP) is an open, client-server protocol that standardizes how applications provide context and functionality to Large Language Models. It allows AI agents to:

- Interact with external tools and APIs
- Access databases and services
- Execute code in isolated environments
- Retrieve real-world data
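On the wire, MCP frames these interactions as JSON-RPC 2.0 messages; a tool invocation uses the `tools/call` method. The tool name and arguments below are illustrative:

```python
import json

def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a minimal MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = tools_call(1, "search", {"query": "docker compose models"})
print(msg)
```

The server replies with a JSON-RPC response carrying the tool's result; other methods such as `tools/list` (enumerate available tools) follow the same framing.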
### Docker MCP Components

#### 1. MCP Gateway

The [Docker MCP Gateway](https://github.com/docker/mcp-gateway/) is the core open-source component that:

- Manages MCP containers
- Provides a unified endpoint for AI applications
- Mediates between agents and MCP servers
- Enables dynamic MCP discovery and configuration

#### 2. MCP Catalog

The [Docker MCP Catalog](https://hub.docker.com/mcp) is a curated collection of trusted, containerized MCP servers, including:

- **270+ curated servers** from publishers such as Stripe, Elastic, and Grafana
- **Publisher trust levels**: Distinguish between official, verified, and community servers
- **Commit pinning**: Each server is tied to specific Git commits for verifiability
- **AI-audited updates**: Automated reviews of code changes

#### 3. MCP Toolkit (Docker Desktop)

A management interface in Docker Desktop for:

- Discovering MCP servers
- Configuring and managing servers
- One-click deployment
- Centralized authentication

### Dynamic MCPs: Smart Search and Tool Composition

Recent enhancements enable agents to dynamically discover and configure MCP servers:

#### Smart Search Features

**`mcp-find`**: Find MCP servers by name or description

```
Agent: "Find MCP servers for web searching"
→ Returns: DuckDuckGo MCP, Brave Search MCP, etc.
```

**`mcp-add`**: Add MCP servers to the current session

```
Agent: "Add the DuckDuckGo MCP server"
→ Server is pulled, configured, and made available
```
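Conceptually, `mcp-find` is a keyword search over a catalog and `mcp-add` registers a hit with the current session. The toy model below is purely hypothetical (catalog entries, names, and data structures are invented for illustration, not the Gateway's real implementation):

```python
# Hypothetical in-memory catalog: server name -> description
CATALOG = {
    "duckduckgo": "Web search via DuckDuckGo",
    "brave-search": "Web search via Brave",
    "postgres": "Query PostgreSQL databases",
}

def mcp_find(query: str) -> list[str]:
    """Return catalog servers whose description matches the query."""
    q = query.lower()
    return [name for name, desc in CATALOG.items() if q in desc.lower()]

def mcp_add(session: set, name: str) -> set:
    """Add a known server to the agent's session; reject unknown names."""
    if name not in CATALOG:
        raise KeyError(f"unknown MCP server: {name}")
    session.add(name)
    return session

session = set()
print(mcp_find("web search"))  # ['duckduckgo', 'brave-search']
mcp_add(session, "duckduckgo")
```

The real Gateway additionally pulls the server's container image and wires its tools into the session; the session-scoped registry is the part this sketch captures.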
#### Benefits of Dynamic MCPs

1. **No manual configuration**: Agents discover and add tools as needed
2. **Reduced token usage**: Tools are only loaded when required
3. **Autonomous operation**: Agents manage their own tool ecosystem
4. **Secure sandbox**: Tool composition happens in isolated environments

### Security Features

Docker MCP Toolkit implements multiple security layers:

1. **Containerization**: Strong isolation limits the blast radius
2. **Commit pinning**: Precise attribution and verification
3. **Automated auditing**: AI-powered code reviews
4. **Publisher trust levels**: Clear indicators of server origin
5. **Isolated execution**: MCP servers run in separate containers

### Using MCP with Docker Compose

Example `docker-compose.yml` for MCP servers:

```yaml
services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
    environment:
      - MCP_CATALOG_URL=https://hub.docker.com/mcp

  my-app:
    image: my-ai-app
    depends_on:
      - mcp-gateway
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000

volumes:
  mcp-data:
```

### Submitting MCP Servers

To contribute MCP servers to the Docker MCP Catalog:

1. Follow the [submission guidance](https://github.com/docker/mcp-registry/blob/main/CONTRIBUTING.md)
2. Submit to the [MCP Registry](https://github.com/docker/mcp-registry)
3. Servers undergo automated and manual review
4. Approved servers appear in the catalog with appropriate trust levels

---

## Common Use Cases

### 1. Local AI Development Environment

```yaml
services:
  dev-app:
    build: .
    models:
      - llm
      - embeddings
    depends_on:
      - mcp-gateway

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--log-colors"
  embeddings:
    model: ai/all-minilm
```
### 2. Multi-Model AI Application

```yaml
services:
  chat-service:
    image: my-chat-service
    models:
      chat-model:
        endpoint_var: CHAT_MODEL_URL
        model_var: CHAT_MODEL_NAME

  code-service:
    image: my-code-service
    models:
      code-model:
        endpoint_var: CODE_MODEL_URL
        model_var: CODE_MODEL_NAME

models:
  chat-model:
    model: ai/smollm2
    context_size: 2048
    runtime_flags:
      - "--temp"
      - "0.7"
  code-model:
    model: ai/codellama
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.2"
```
### 3. AI Agent with Dynamic MCP Tools

```yaml
services:
  ai-agent:
    image: my-ai-agent
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
      - ENABLE_DYNAMIC_MCPS=true
    depends_on:
      - mcp-gateway
    models:
      - agent-llm

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - ./mcp-catalog.yml:/config/catalog.yml

models:
  agent-llm:
    model: ai/smollm2
    context_size: 4096
```
---

## Additional Resources

- [Docker AI Documentation](https://docs.docker.com/ai/)
- [Docker Compose Models Reference](https://docs.docker.com/ai/compose/models-and-compose/)
- [Docker Model Runner](https://docs.docker.com/ai/model-runner/)
- [Docker MCP Gateway (GitHub)](https://github.com/docker/mcp-gateway/)
- [Docker MCP Catalog](https://hub.docker.com/mcp)
- [MCP Registry](https://github.com/docker/mcp-registry)
- [Dynamic MCPs Blog Post](https://www.docker.com/blog/dynamic-mcps-stop-hardcoding-your-agents-world/)
- [MCP Security Blog Post](https://www.docker.com/blog/enhancing-mcp-trust-with-the-docker-mcp-catalog/)

---

**Last Updated**: December 2024