# Docker AI: Deploying Local LLMs and MCP Servers
This guide covers the latest ways to use Docker for deploying local Large Language Models (LLMs) and Model Context Protocol (MCP) servers.
## Table of Contents
- [Docker Compose Models](#docker-compose-models)
- [Docker Model Runner](#docker-model-runner)
- [Docker MCP Toolkit](#docker-mcp-toolkit)
- [Common Use Cases](#common-use-cases)
---
## Docker Compose Models
Docker Compose v2.38+ introduces a standardized way to define AI model dependencies using the `models` top-level element in your Compose files.
### Prerequisites
- Docker Compose v2.38 or later
- Docker Model Runner (DMR) or compatible cloud providers
- For DMR: See [requirements](https://docs.docker.com/ai/model-runner/#requirements)
### Basic Model Definition
Define models in your `docker-compose.yml`:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
```
This configuration:
- Defines a service `chat-app` that uses a model named `llm`
- References the `ai/smollm2` model image from Docker Hub
### Model Configuration Options
Configure models with various runtime parameters:
```yaml
models:
  llm:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--a-flag"
      - "--another-flag=42"
```
**Key configuration options:**
- **`model`** (required): OCI artifact identifier for the model
- **`context_size`**: Maximum token context size (keep as small as feasible for your needs)
- **`runtime_flags`**: Command-line flags passed to the inference engine (e.g., [llama.cpp parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md))
- **`x-*`**: Platform-specific extension attributes
### Service Model Binding
#### Short Syntax
The simplest way to bind models to services:
```yaml
services:
  app:
    image: my-app
    models:
      - llm
      - embedding-model

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
Auto-generated environment variables:
- `LLM_URL` - URL to access the LLM model
- `LLM_MODEL` - Model identifier
- `EMBEDDING_MODEL_URL` - URL to access the embedding model
- `EMBEDDING_MODEL_MODEL` - Model identifier
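As a minimal sketch of how a service might consume these injected variables, the function below builds a chat-completions request from `LLM_URL` and `LLM_MODEL`, assuming the endpoint speaks the OpenAI-compatible API that Docker Model Runner exposes (the example URL in the comment is illustrative):

```python
import json
import os
import urllib.request

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completions request from the variables Compose injects."""
    base_url = os.environ["LLM_URL"]    # e.g. http://model-runner.docker.internal/engines/v1
    model = os.environ["LLM_MODEL"]     # e.g. ai/smollm2
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
```

Because Compose supplies both the URL and the model name, the application code stays identical whether the model runs locally under DMR or on a cloud provider.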
#### Long Syntax
Customize environment variable names:
```yaml
services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
### Platform Portability
#### Docker Model Runner
When Docker Model Runner is enabled locally:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--no-prefill-assistant"
```
Docker Model Runner will:
- Pull and run the model locally
- Provide endpoint URLs
- Inject environment variables into the service
#### Cloud Providers
The same Compose file works on cloud providers:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
    # Cloud-specific configurations
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"
```
Cloud providers may:
- Use managed AI services
- Apply cloud-specific optimizations
- Provide monitoring and logging
- Handle model versioning automatically
### Common Runtime Configurations
#### Development Mode
```yaml
models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--verbose-prompt"
      - "--log-prefix"
      - "--log-timestamps"
      - "--log-colors"
```
#### Conservative (Disabled Reasoning)
```yaml
models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.1"
      - "--top-k"
      - "1"
      - "--reasoning-budget"
      - "0"
```
#### Creative (High Randomness)
```yaml
models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "1"
      - "--top-p"
      - "0.9"
```
#### Highly Deterministic
```yaml
models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0"
      - "--top-k"
      - "1"
```
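The flags above map to standard sampling controls. The following is an illustrative sketch (not llama.cpp's actual implementation) of why `--temp 0` and `--top-k 1` each force deterministic, greedy decoding: temperature rescales the logits before sampling, and top-k restricts the candidate pool to the k best tokens.

```python
import math
import random

def sample(logits, temperature=1.0, top_k=0):
    """Pick a token index: temperature rescales logits, top_k keeps the k best."""
    if temperature == 0:  # greedy decoding: always take the argmax
        return max(range(len(logits)), key=lambda i: logits[i])
    ranked = sorted(range(len(logits)), key=lambda i: -logits[i])
    if top_k:             # keep only the top_k highest-scoring candidates
        ranked = ranked[:top_k]
    weights = [math.exp(logits[i] / temperature) for i in ranked]
    return random.choices(ranked, weights=weights)[0]
```

With `top_k=1` the candidate pool collapses to the single best token, so the result matches greedy decoding regardless of temperature.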
#### Concurrent Processing
```yaml
models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"
      - "8"
      - "--mlock"  # Lock memory to prevent swapping
```
---
## Docker Model Runner
Docker Model Runner (DMR) enables running AI models locally with minimal setup. It integrates seamlessly with Docker Compose models.
### Key Features
- **Local model execution**: Run models on your local machine
- **GPU support**: Leverage local GPU resources
- **Automatic model pulling**: Models are pulled from Docker Hub as needed
- **OpenAI-compatible API**: Expose models via standard API endpoints
### Use Cases
- **Development and testing**: Test AI applications locally before cloud deployment
- **Privacy-sensitive workloads**: Keep data on-premises
- **Offline development**: Work without internet connectivity
- **Cost optimization**: Avoid cloud inference costs
---
## Docker MCP Toolkit
The Docker MCP (Model Context Protocol) Toolkit provides a secure, standardized way to connect AI agents to external tools and data sources.
### What is MCP?
Model Context Protocol (MCP) is an open, client-server protocol that standardizes how applications provide context and functionality to Large Language Models. It allows AI agents to:
- Interact with external tools and APIs
- Access databases and services
- Execute code in isolated environments
- Retrieve real-world data
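On the wire, MCP messages are JSON-RPC 2.0; `tools/list` and `tools/call` are core protocol methods. A minimal sketch of constructing these messages (the transport and the `search` tool name here are illustrative, not part of any specific server):

```python
import itertools
import json

_next_id = itertools.count(1)

def mcp_request(method, params=None):
    """Serialize an MCP client message as a JSON-RPC 2.0 request."""
    msg = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# A typical agent flow: discover the available tools, then invoke one.
list_tools = mcp_request("tools/list")
call_tool = mcp_request("tools/call",
                        {"name": "search", "arguments": {"query": "docker"}})
```

The Docker MCP Gateway mediates exactly this kind of exchange between agents and containerized servers, so the agent never talks to a server process directly.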
### Docker MCP Components
#### 1. MCP Gateway
The [Docker MCP Gateway](https://github.com/docker/mcp-gateway/) is the core open-source component that:
- Manages MCP containers
- Provides a unified endpoint for AI applications
- Mediates between agents and MCP servers
- Enables dynamic MCP discovery and configuration
#### 2. MCP Catalog
The [Docker MCP Catalog](https://hub.docker.com/mcp) is a curated collection of trusted, containerized MCP servers including:
- **270+ curated servers** from publishers such as Stripe, Elastic, and Grafana
- **Publisher trust levels**: Distinguish between official, verified, and community servers
- **Commit pinning**: Each server tied to specific Git commits for verifiability
- **AI-audited updates**: Automated reviews of code changes
#### 3. MCP Toolkit (Docker Desktop)
A management interface in Docker Desktop for:
- Discovering MCP servers
- Configuring and managing servers
- One-click deployment
- Centralized authentication
### Dynamic MCPs: Smart Search and Tool Composition
Recent enhancements enable agents to dynamically discover and configure MCP servers:
#### Smart Search Features
**`mcp-find`**: Find MCP servers by name or description
```
Agent: "Find MCP servers for web searching"
→ Returns: DuckDuckGo MCP, Brave Search MCP, etc.
```
**`mcp-add`**: Add MCP servers to the current session
```
Agent: "Add the DuckDuckGo MCP server"
→ Server is pulled, configured, and made available
```
#### Benefits of Dynamic MCPs
1. **No manual configuration**: Agents discover and add tools as needed
2. **Reduced token usage**: Only load tools when required
3. **Autonomous operation**: Agents manage their own tool ecosystem
4. **Secure sandbox**: Tool composition happens in isolated environments
### Security Features
Docker MCP Toolkit implements multiple security layers:
1. **Containerization**: Strong isolation limits blast radius
2. **Commit pinning**: Precise attribution and verification
3. **Automated auditing**: AI-powered code reviews
4. **Publisher trust levels**: Clear indicators of server origin
5. **Isolated execution**: MCP servers run in separate containers
### Using MCP with Docker Compose
Example `docker-compose.yml` for MCP servers:
```yaml
services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
    environment:
      - MCP_CATALOG_URL=https://hub.docker.com/mcp

  my-app:
    image: my-ai-app
    depends_on:
      - mcp-gateway
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000

volumes:
  mcp-data:
```
### Submitting MCP Servers
To contribute MCP servers to the Docker MCP Catalog:
1. Follow the [submission guidance](https://github.com/docker/mcp-registry/blob/main/CONTRIBUTING.md)
2. Submit to the [MCP Registry](https://github.com/docker/mcp-registry)
3. Servers undergo automated and manual review
4. Approved servers appear in the catalog with appropriate trust levels
---
## Common Use Cases
### 1. Local AI Development Environment
```yaml
services:
  dev-app:
    build: .
    models:
      - llm
      - embeddings
    depends_on:
      - mcp-gateway

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--log-colors"
  embeddings:
    model: ai/all-minilm
```
### 2. Multi-Model AI Application
```yaml
services:
  chat-service:
    image: my-chat-service
    models:
      chat-model:
        endpoint_var: CHAT_MODEL_URL
        model_var: CHAT_MODEL_NAME

  code-service:
    image: my-code-service
    models:
      code-model:
        endpoint_var: CODE_MODEL_URL
        model_var: CODE_MODEL_NAME

models:
  chat-model:
    model: ai/smollm2
    context_size: 2048
    runtime_flags:
      - "--temp"
      - "0.7"
  code-model:
    model: ai/codellama
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.2"
```
### 3. AI Agent with Dynamic MCP Tools
```yaml
services:
  ai-agent:
    image: my-ai-agent
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
      - ENABLE_DYNAMIC_MCPS=true
    depends_on:
      - mcp-gateway
    models:
      - agent-llm

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - ./mcp-catalog.yml:/config/catalog.yml

models:
  agent-llm:
    model: ai/smollm2
    context_size: 4096
```
---
## Additional Resources
- [Docker AI Documentation](https://docs.docker.com/ai/)
- [Docker Compose Models Reference](https://docs.docker.com/ai/compose/models-and-compose/)
- [Docker Model Runner](https://docs.docker.com/ai/model-runner/)
- [Docker MCP Gateway (GitHub)](https://github.com/docker/mcp-gateway/)
- [Docker MCP Catalog](https://hub.docker.com/mcp)
- [MCP Registry](https://github.com/docker/mcp-registry)
- [Dynamic MCPs Blog Post](https://www.docker.com/blog/dynamic-mcps-stop-hardcoding-your-agents-world/)
- [MCP Security Blog Post](https://www.docker.com/blog/enhancing-mcp-trust-with-the-docker-mcp-catalog/)
---
**Last Updated**: December 2024