# Docker AI: Deploying Local LLMs and MCP Servers

This guide covers the latest ways to use Docker to deploy local Large Language Models (LLMs) and Model Context Protocol (MCP) servers.

## Table of Contents

- [Docker Compose Models](#docker-compose-models)
- [Docker Model Runner](#docker-model-runner)
- [Docker MCP Toolkit](#docker-mcp-toolkit)
- [Common Use Cases](#common-use-cases)

---

## Docker Compose Models

Docker Compose v2.38+ introduces a standardized way to define AI model dependencies using the `models` top-level element in your Compose files.

### Prerequisites

- Docker Compose v2.38 or later
- Docker Model Runner (DMR) or a compatible cloud provider
- For DMR: see [requirements](https://docs.docker.com/ai/model-runner/#requirements)

### Basic Model Definition

Define models in your `docker-compose.yml`:

```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
```

This configuration:

- Defines a service `chat-app` that uses a model named `llm`
- References the `ai/smollm2` model image from Docker Hub

### Model Configuration Options

Configure models with various runtime parameters:

```yaml
models:
  llm:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--a-flag"
      - "--another-flag=42"
```

**Key configuration options:**

- **`model`** (required): OCI artifact identifier for the model
- **`context_size`**: Maximum token context size (keep it as small as feasible for your needs)
- **`runtime_flags`**: Command-line flags passed to the inference engine (e.g., [llama.cpp parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md))
- **`x-*`**: Platform-specific extension attributes

### Service Model Binding

#### Short Syntax

The simplest way to bind models to services:

```yaml
services:
  app:
    image: my-app
    models:
      - llm
      - embedding-model

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```

Auto-generated environment variables:

- `LLM_URL` - URL to access the LLM model
- `LLM_MODEL` - Model identifier
- `EMBEDDING_MODEL_URL` - URL to access the embedding model
- `EMBEDDING_MODEL_MODEL` - Model identifier

#### Long Syntax

Customize environment variable names:

```yaml
services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```

### Platform Portability

#### Docker Model Runner

When Docker Model Runner is enabled locally:

```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--no-prefill-assistant"
```

Docker Model Runner will:

- Pull and run the model locally
- Provide endpoint URLs
- Inject environment variables into the service

#### Cloud Providers

The same Compose file works on cloud providers:

```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
    # Cloud-specific configurations
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"
```

Cloud providers may:

- Use managed AI services
- Apply cloud-specific optimizations
- Provide monitoring and logging
- Handle model versioning automatically

### Common Runtime Configurations

#### Development Mode

```yaml
models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--verbose-prompt"
      - "--log-prefix"
      - "--log-timestamps"
      - "--log-colors"
```

#### Conservative (Disabled Reasoning)

```yaml
models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.1"
      - "--top-k"
      - "1"
      - "--reasoning-budget"
      - "0"
```

#### Creative (High Randomness)

```yaml
models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "1"
      - "--top-p"
      - "0.9"
```

#### Highly Deterministic

```yaml
models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0"
      - "--top-k"
      - "1"
```

#### Concurrent Processing

```yaml
models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"
      - "8"
      - "--mlock" # Lock memory to prevent swapping
```

---

## Docker Model Runner

Docker Model Runner (DMR) enables running AI models locally with minimal setup. It integrates seamlessly with Docker Compose models.

### Key Features

- **Local model execution**: Run models on your local machine
- **GPU support**: Leverage local GPU resources
- **Automatic model pulling**: Models are pulled from Docker Hub as needed
- **OpenAI-compatible API**: Expose models via standard API endpoints

### Use Cases

- **Development and testing**: Test AI applications locally before cloud deployment
- **Privacy-sensitive workloads**: Keep data on-premises
- **Offline development**: Work without internet connectivity
- **Cost optimization**: Avoid cloud inference costs

---

## Docker MCP Toolkit

The Docker MCP (Model Context Protocol) Toolkit provides a secure, standardized way to connect AI agents to external tools and data sources.

### What is MCP?

Model Context Protocol (MCP) is an open, client-server protocol that standardizes how applications provide context and functionality to Large Language Models. It allows AI agents to:

- Interact with external tools and APIs
- Access databases and services
- Execute code in isolated environments
- Retrieve real-world data

### Docker MCP Components

#### 1. MCP Gateway

The [Docker MCP Gateway](https://github.com/docker/mcp-gateway/) is the core open-source component that:

- Manages MCP containers
- Provides a unified endpoint for AI applications
- Mediates between agents and MCP servers
- Enables dynamic MCP discovery and configuration
#### 2. MCP Catalog

The [Docker MCP Catalog](https://hub.docker.com/mcp) is a curated collection of trusted, containerized MCP servers, including:

- **270+ curated servers** from publishers such as Stripe, Elastic, and Grafana
- **Publisher trust levels**: Distinguish between official, verified, and community servers
- **Commit pinning**: Each server is tied to specific Git commits for verifiability
- **AI-audited updates**: Automated reviews of code changes

#### 3. MCP Toolkit (Docker Desktop)

A management interface in Docker Desktop for:

- Discovering MCP servers
- Configuring and managing servers
- One-click deployment
- Centralized authentication

### Dynamic MCPs: Smart Search and Tool Composition

Recent enhancements enable agents to dynamically discover and configure MCP servers:

#### Smart Search Features

**`mcp-find`**: Find MCP servers by name or description

```
Agent: "Find MCP servers for web searching"
→ Returns: DuckDuckGo MCP, Brave Search MCP, etc.
```

**`mcp-add`**: Add MCP servers to the current session

```
Agent: "Add the DuckDuckGo MCP server"
→ Server is pulled, configured, and made available
```

#### Benefits of Dynamic MCPs

1. **No manual configuration**: Agents discover and add tools as needed
2. **Reduced token usage**: Only load tools when required
3. **Autonomous operation**: Agents manage their own tool ecosystem
4. **Secure sandbox**: Tool composition happens in isolated environments

### Security Features

Docker MCP Toolkit implements multiple security layers:

1. **Containerization**: Strong isolation limits the blast radius
2. **Commit pinning**: Precise attribution and verification
3. **Automated auditing**: AI-powered code reviews
4. **Publisher trust levels**: Clear indicators of server origin
5. **Isolated execution**: MCP servers run in separate containers

### Using MCP with Docker Compose

Example `docker-compose.yml` for MCP servers:

```yaml
services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
    environment:
      - MCP_CATALOG_URL=https://hub.docker.com/mcp

  my-app:
    image: my-ai-app
    depends_on:
      - mcp-gateway
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000

volumes:
  mcp-data:
```

### Submitting MCP Servers

To contribute MCP servers to the Docker MCP Catalog:

1. Follow the [submission guidance](https://github.com/docker/mcp-registry/blob/main/CONTRIBUTING.md)
2. Submit to the [MCP Registry](https://github.com/docker/mcp-registry)
3. Servers undergo automated and manual review
4. Approved servers appear in the catalog with appropriate trust levels

---

## Common Use Cases

### 1. Local AI Development Environment

```yaml
services:
  dev-app:
    build: .
    models:
      - llm
      - embeddings
    depends_on:
      - mcp-gateway

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--log-colors"
  embeddings:
    model: ai/all-minilm
```

### 2. Multi-Model AI Application

```yaml
services:
  chat-service:
    image: my-chat-service
    models:
      chat-model:
        endpoint_var: CHAT_MODEL_URL
        model_var: CHAT_MODEL_NAME

  code-service:
    image: my-code-service
    models:
      code-model:
        endpoint_var: CODE_MODEL_URL
        model_var: CODE_MODEL_NAME

models:
  chat-model:
    model: ai/smollm2
    context_size: 2048
    runtime_flags:
      - "--temp"
      - "0.7"
  code-model:
    model: ai/codellama
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.2"
```
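Inside `chat-service`, the variables injected by Compose (`CHAT_MODEL_URL`, `CHAT_MODEL_NAME`) can be consumed by any OpenAI-compatible client. A minimal standard-library sketch under stated assumptions: the request shape follows the OpenAI chat-completions format that Docker Model Runner exposes, the exact path suffix (`/chat/completions`) relative to the injected URL may vary by setup, and the fallback values are illustrative only.

```python
import json
import os
import urllib.request


def build_chat_request(prompt: str = "Hello!") -> tuple[str, dict]:
    """Read the Compose-injected variables and build an OpenAI-style
    chat-completions URL and payload for the chat model."""
    # CHAT_MODEL_URL / CHAT_MODEL_NAME are injected by Compose per the
    # long syntax above; the fallbacks here are assumptions for local testing.
    base_url = os.environ.get("CHAT_MODEL_URL", "http://localhost:12434/engines/v1")
    model = os.environ.get("CHAT_MODEL_NAME", "ai/smollm2")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{base_url.rstrip('/')}/chat/completions", payload


def chat(prompt: str) -> str:
    """Send the prompt to the bound model and return the reply text."""
    url, payload = build_chat_request(prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    url, payload = build_chat_request("What is Docker Compose?")
    print(url, payload["model"])
```

Because the endpoint and model name come from the environment, the same code runs unchanged whether the model is served locally by Docker Model Runner or by a cloud provider.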
### 3. AI Agent with Dynamic MCP Tools

```yaml
services:
  ai-agent:
    image: my-ai-agent
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
      - ENABLE_DYNAMIC_MCPS=true
    depends_on:
      - mcp-gateway
    models:
      - agent-llm

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - ./mcp-catalog.yml:/config/catalog.yml

models:
  agent-llm:
    model: ai/smollm2
    context_size: 4096
```

---

## Additional Resources

- [Docker AI Documentation](https://docs.docker.com/ai/)
- [Docker Compose Models Reference](https://docs.docker.com/ai/compose/models-and-compose/)
- [Docker Model Runner](https://docs.docker.com/ai/model-runner/)
- [Docker MCP Gateway (GitHub)](https://github.com/docker/mcp-gateway/)
- [Docker MCP Catalog](https://hub.docker.com/mcp)
- [MCP Registry](https://github.com/docker/mcp-registry)
- [Dynamic MCPs Blog Post](https://www.docker.com/blog/dynamic-mcps-stop-hardcoding-your-agents-world/)
- [MCP Security Blog Post](https://www.docker.com/blog/enhancing-mcp-trust-with-the-docker-mcp-catalog/)

---

**Last Updated**: December 2024