# Docker AI: Deploying Local LLMs and MCP Servers
This guide covers the current ways to use Docker to deploy local Large Language Models (LLMs) and Model Context Protocol (MCP) servers.
## Table of Contents

- [Docker Compose Models](#docker-compose-models)
- [Docker Model Runner](#docker-model-runner)
- [Docker MCP Toolkit](#docker-mcp-toolkit)
- [Common Use Cases](#common-use-cases)
- [Additional Resources](#additional-resources)
## Docker Compose Models
Docker Compose v2.38+ introduces a standardized way to define AI model dependencies using the top-level `models` element in your Compose files.
### Prerequisites
- Docker Compose v2.38 or later
- Docker Model Runner (DMR) or compatible cloud providers
- For DMR: See requirements
### Basic Model Definition

Define models in your `docker-compose.yml`:

```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
```
This configuration:
- Defines a service `chat-app` that uses a model named `llm`
- References the `ai/smollm2` model image from Docker Hub
### Model Configuration Options

Configure models with various runtime parameters:

```yaml
models:
  llm:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--a-flag"
      - "--another-flag=42"
```
Key configuration options:
- `model` (required): OCI artifact identifier for the model
- `context_size`: Maximum token context size (keep as small as feasible for your needs)
- `runtime_flags`: Command-line flags passed to the inference engine (e.g., llama.cpp parameters)
- `x-*`: Platform-specific extension attributes
### Service Model Binding

#### Short Syntax
The simplest way to bind models to services:
```yaml
services:
  app:
    image: my-app
    models:
      - llm
      - embedding-model

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
Auto-generated environment variables:

- `LLM_URL` - URL to access the LLM model
- `LLM_MODEL` - Model identifier
- `EMBEDDING_MODEL_URL` - URL to access the embedding model
- `EMBEDDING_MODEL_MODEL` - Model identifier
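An application can pick these variables up at startup to reach whatever endpoint Compose wired in. A minimal Python sketch (the `build_chat_request` helper is illustrative, and the `/chat/completions` path assumes the model is served through an OpenAI-compatible API, which is what Docker Model Runner provides):

```python
def build_chat_request(env: dict) -> tuple:
    """Build an OpenAI-style chat request from the variables Compose injects."""
    base_url = env["LLM_URL"].rstrip("/")
    payload = {
        "model": env["LLM_MODEL"],
        "messages": [{"role": "user", "content": "Hello!"}],
    }
    return f"{base_url}/chat/completions", payload


# Placeholder values standing in for what Compose would inject:
url, payload = build_chat_request({
    "LLM_URL": "http://example-endpoint/v1/",
    "LLM_MODEL": "ai/smollm2",
})
print(url)               # http://example-endpoint/v1/chat/completions
print(payload["model"])  # ai/smollm2
```

In the real service you would read `os.environ` instead of a literal dict and POST the payload with any HTTP client.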
#### Long Syntax
Customize environment variable names:
```yaml
services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
### Platform Portability

#### Docker Model Runner
When Docker Model Runner is enabled locally:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--no-prefill-assistant"
```
Docker Model Runner will:
- Pull and run the model locally
- Provide endpoint URLs
- Inject environment variables into the service
#### Cloud Providers
The same Compose file works on cloud providers:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
    # Cloud-specific configurations
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"
```
Cloud providers may:
- Use managed AI services
- Apply cloud-specific optimizations
- Provide monitoring and logging
- Handle model versioning automatically
### Common Runtime Configurations

#### Development Mode
```yaml
models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--verbose-prompt"
      - "--log-prefix"
      - "--log-timestamps"
      - "--log-colors"
```
#### Conservative (Disabled Reasoning)

```yaml
models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.1"
      - "--top-k"
      - "1"
      - "--reasoning-budget"
      - "0"
```
#### Creative (High Randomness)

```yaml
models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "1"
      - "--top-p"
      - "0.9"
```
#### Highly Deterministic

```yaml
models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0"
      - "--top-k"
      - "1"
```
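Why these flag combinations behave the way they do can be illustrated with a toy sampler. This is a simplified sketch, not llama.cpp's actual sampling code: it shows that `--temp 0` or `--top-k 1` collapses the choice to the single best token, while higher temperatures spread probability over more candidates:

```python
import math
import random


def sample_token(logits: dict, temp: float, top_k: int) -> str:
    """Toy sampler: top-k filtering plus temperature-scaled softmax."""
    if temp == 0 or top_k == 1:
        # Greedy decoding: always the highest-scoring token (deterministic).
        return max(logits, key=logits.get)
    # Keep only the top_k highest-scoring candidates.
    kept = sorted(logits, key=logits.get, reverse=True)[:top_k]
    # Softmax with temperature: lower temp sharpens, higher temp flattens.
    weights = [math.exp(logits[t] / temp) for t in kept]
    return random.choices(kept, weights=weights)[0]


logits = {"yes": 2.0, "no": 1.5, "maybe": 0.1}
print(sample_token(logits, temp=0, top_k=1))    # always "yes"
print(sample_token(logits, temp=1.0, top_k=3))  # random draw over all three
```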
#### Concurrent Processing

```yaml
models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"
      - "8"
      - "--mlock"  # Lock memory to prevent swapping
```
## Docker Model Runner
Docker Model Runner (DMR) enables running AI models locally with minimal setup. It integrates seamlessly with Docker Compose models.
### Key Features
- Local model execution: Run models on your local machine
- GPU support: Leverage local GPU resources
- Automatic model pulling: Models are pulled from Docker Hub as needed
- OpenAI-compatible API: Expose models via standard API endpoints
### Use Cases
- Development and testing: Test AI applications locally before cloud deployment
- Privacy-sensitive workloads: Keep data on-premises
- Offline development: Work without internet connectivity
- Cost optimization: Avoid cloud inference costs
## Docker MCP Toolkit
The Docker MCP (Model Context Protocol) Toolkit provides a secure, standardized way to connect AI agents to external tools and data sources.
### What is MCP?
Model Context Protocol (MCP) is an open, client-server protocol that standardizes how applications provide context and functionality to Large Language Models. It allows AI agents to:
- Interact with external tools and APIs
- Access databases and services
- Execute code in isolated environments
- Retrieve real-world data
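On the wire, MCP is a JSON-RPC 2.0 protocol. As a rough sketch, an MCP client's requests look like this (`tools/list` and `tools/call` are standard MCP methods; transport framing and the initialization handshake are omitted):

```python
import json


def make_mcp_request(method, params=None, req_id=1):
    """Serialize a JSON-RPC 2.0 request of the kind MCP clients send."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# Ask a server which tools it exposes:
print(make_mcp_request("tools/list"))
# Invoke one of those tools with arguments:
print(make_mcp_request(
    "tools/call",
    {"name": "search", "arguments": {"query": "docker"}},
    req_id=2,
))
```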
### Docker MCP Components

#### 1. MCP Gateway
The Docker MCP Gateway is the core open-source component that:
- Manages MCP containers
- Provides a unified endpoint for AI applications
- Mediates between agents and MCP servers
- Enables dynamic MCP discovery and configuration
#### 2. MCP Catalog
The Docker MCP Catalog is a curated collection of trusted, containerized MCP servers including:
- 270+ curated servers from publishers like Stripe, Elastic, Grafana
- Publisher trust levels: Distinguish between official, verified, and community servers
- Commit pinning: Each server tied to specific Git commits for verifiability
- AI-audited updates: Automated reviews of code changes
#### 3. MCP Toolkit (Docker Desktop)
A management interface in Docker Desktop for:
- Discovering MCP servers
- Configuring and managing servers
- One-click deployment
- Centralized authentication
### Dynamic MCPs: Smart Search and Tool Composition
Recent enhancements enable agents to dynamically discover and configure MCP servers:
#### Smart Search Features
**`mcp-find`**: Find MCP servers by name or description

```
Agent: "Find MCP servers for web searching"
→ Returns: DuckDuckGo MCP, Brave Search MCP, etc.
```

**`mcp-add`**: Add MCP servers to the current session

```
Agent: "Add the DuckDuckGo MCP server"
→ Server is pulled, configured, and made available
```
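Conceptually, `mcp-find` is a search over the catalog's names and descriptions. A toy stand-in (the catalog entries below are invented for illustration; the real matching runs inside the MCP Gateway and is smarter than this):

```python
def mcp_find(catalog, query):
    """Return names of catalog entries whose name or description
    contains every word of the query (simplistic keyword match)."""
    words = query.lower().split()
    return [
        entry["name"]
        for entry in catalog
        if all(w in (entry["name"] + " " + entry["description"]).lower()
               for w in words)
    ]


catalog = [
    {"name": "duckduckgo", "description": "Web search via DuckDuckGo"},
    {"name": "brave-search", "description": "Web search via the Brave Search API"},
    {"name": "postgres", "description": "Query PostgreSQL databases"},
]
print(mcp_find(catalog, "web search"))  # ['duckduckgo', 'brave-search']
```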
#### Benefits of Dynamic MCPs
- No manual configuration: Agents discover and add tools as needed
- Reduced token usage: Only load tools when required
- Autonomous operation: Agents manage their own tool ecosystem
- Secure sandbox: Tool composition happens in isolated environments
### Security Features
Docker MCP Toolkit implements multiple security layers:
- Containerization: Strong isolation limits blast radius
- Commit pinning: Precise attribution and verification
- Automated auditing: AI-powered code reviews
- Publisher trust levels: Clear indicators of server origin
- Isolated execution: MCP servers run in separate containers
### Using MCP with Docker Compose

Example `docker-compose.yml` for MCP servers:
```yaml
services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
    environment:
      - MCP_CATALOG_URL=https://hub.docker.com/mcp

  my-app:
    image: my-ai-app
    depends_on:
      - mcp-gateway
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000

volumes:
  mcp-data:
```
### Submitting MCP Servers
To contribute MCP servers to the Docker MCP Catalog:
- Follow the submission guidance
- Submit to the MCP Registry
- Servers undergo automated and manual review
- Approved servers appear in the catalog with appropriate trust levels
## Common Use Cases

### 1. Local AI Development Environment
```yaml
services:
  dev-app:
    build: .
    models:
      - llm
      - embeddings
    depends_on:
      - mcp-gateway

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--log-colors"
  embeddings:
    model: ai/all-minilm
```
### 2. Multi-Model AI Application
```yaml
services:
  chat-service:
    image: my-chat-service
    models:
      chat-model:
        endpoint_var: CHAT_MODEL_URL
        model_var: CHAT_MODEL_NAME

  code-service:
    image: my-code-service
    models:
      code-model:
        endpoint_var: CODE_MODEL_URL
        model_var: CODE_MODEL_NAME

models:
  chat-model:
    model: ai/smollm2
    context_size: 2048
    runtime_flags:
      - "--temp"
      - "0.7"
  code-model:
    model: ai/codellama
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.2"
```
### 3. AI Agent with Dynamic MCP Tools

```yaml
services:
  ai-agent:
    image: my-ai-agent
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
      - ENABLE_DYNAMIC_MCPS=true
    depends_on:
      - mcp-gateway
    models:
      - agent-llm

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - ./mcp-catalog.yml:/config/catalog.yml

models:
  agent-llm:
    model: ai/smollm2
    context_size: 4096
```
## Additional Resources
- Docker AI Documentation
- Docker Compose Models Reference
- Docker Model Runner
- Docker MCP Gateway (GitHub)
- Docker MCP Catalog
- MCP Registry
- Dynamic MCPs Blog Post
- MCP Security Blog Post
Last Updated: December 2024