# Docker AI: Deploying Local LLMs and MCP Servers
This guide covers the latest ways to use Docker for deploying local Large Language Models (LLMs) and Model Context Protocol (MCP) servers.
## Table of Contents
- [Docker Compose Models](#docker-compose-models)
- [Docker Model Runner](#docker-model-runner)
- [Docker MCP Toolkit](#docker-mcp-toolkit)
- [Common Use Cases](#common-use-cases)
---
## Docker Compose Models
Docker Compose v2.38+ introduces a standardized way to define AI model dependencies using the `models` top-level element in your Compose files.
### Prerequisites
- Docker Compose v2.38 or later
- Docker Model Runner (DMR) or compatible cloud providers
- For DMR: See [requirements](https://docs.docker.com/ai/model-runner/#requirements)
### Basic Model Definition
Define models in your `docker-compose.yml`:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
```
This configuration:
- Defines a service `chat-app` that uses a model named `llm`
- References the `ai/smollm2` model image from Docker Hub
### Model Configuration Options
Configure models with various runtime parameters:
```yaml
models:
  llm:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--a-flag"
      - "--another-flag=42"
```
**Key configuration options:**
- **`model`** (required): OCI artifact identifier for the model
- **`context_size`**: Maximum token context size (keep as small as feasible for your needs)
- **`runtime_flags`**: Command-line flags passed to the inference engine (e.g., [llama.cpp parameters](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md))
- **`x-*`**: Platform-specific extension attributes
### Service Model Binding
#### Short Syntax
The simplest way to bind models to services:
```yaml
services:
  app:
    image: my-app
    models:
      - llm
      - embedding-model

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
Auto-generated environment variables:
- `LLM_URL` - URL to access the LLM model
- `LLM_MODEL` - Model identifier
- `EMBEDDING_MODEL_URL` - URL to access the embedding model
- `EMBEDDING_MODEL_MODEL` - Model identifier
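As a minimal sketch of how a service might consume these injected variables, the function below builds a chat-completions request from `LLM_URL` and `LLM_MODEL`, assuming the endpoint speaks the OpenAI-compatible API that Docker Model Runner exposes (the example URL in the comment is illustrative):

```python
import json
import os
import urllib.request

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completions request from the variables Compose injects."""
    base_url = os.environ["LLM_URL"]    # e.g. http://model-runner.docker.internal/engines/v1
    model = os.environ["LLM_MODEL"]     # e.g. ai/smollm2
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
```

Because Compose supplies both the URL and the model name, the application code stays identical whether the model runs locally under DMR or on a cloud provider.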
#### Long Syntax
Customize environment variable names:
```yaml
services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
### Platform Portability
#### Docker Model Runner
When Docker Model Runner is enabled locally:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--no-prefill-assistant"
```
Docker Model Runner will:
- Pull and run the model locally
- Provide endpoint URLs
- Inject environment variables into the service
#### Cloud Providers
The same Compose file works on cloud providers:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
    # Cloud-specific configurations
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"
```
Cloud providers may:
- Use managed AI services
- Apply cloud-specific optimizations
- Provide monitoring and logging
- Handle model versioning automatically
### Common Runtime Configurations
#### Development Mode
```yaml
models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--verbose-prompt"
      - "--log-prefix"
      - "--log-timestamps"
      - "--log-colors"
```
#### Conservative (Disabled Reasoning)
```yaml
models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.1"
      - "--top-k"
      - "1"
      - "--reasoning-budget"
      - "0"
```
#### Creative (High Randomness)
```yaml
models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "1"
      - "--top-p"
      - "0.9"
```
#### Highly Deterministic
```yaml
models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0"
      - "--top-k"
      - "1"
```
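The flags above map to standard sampling controls. The following is an illustrative sketch (not llama.cpp's actual implementation) of why `--temp 0` and `--top-k 1` each force deterministic, greedy decoding: temperature rescales the logits before sampling, and top-k restricts the candidate pool to the k best tokens.

```python
import math
import random

def sample(logits, temperature=1.0, top_k=0):
    """Pick a token index: temperature rescales logits, top_k keeps the k best."""
    if temperature == 0:  # greedy decoding: always take the argmax
        return max(range(len(logits)), key=lambda i: logits[i])
    ranked = sorted(range(len(logits)), key=lambda i: -logits[i])
    if top_k:             # keep only the top_k highest-scoring candidates
        ranked = ranked[:top_k]
    weights = [math.exp(logits[i] / temperature) for i in ranked]
    return random.choices(ranked, weights=weights)[0]
```

With `top_k=1` the candidate pool collapses to the single best token, so the result matches greedy decoding regardless of temperature.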
#### Concurrent Processing
```yaml
models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"
      - "8"
      - "--mlock"  # Lock memory to prevent swapping
```
---
## Docker Model Runner
Docker Model Runner (DMR) enables running AI models locally with minimal setup. It integrates seamlessly with Docker Compose models.
### Key Features
- **Local model execution**: Run models on your local machine
- **GPU support**: Leverage local GPU resources
- **Automatic model pulling**: Models are pulled from Docker Hub as needed
- **OpenAI-compatible API**: Expose models via standard API endpoints
### Use Cases
- **Development and testing**: Test AI applications locally before cloud deployment
- **Privacy-sensitive workloads**: Keep data on-premises
- **Offline development**: Work without internet connectivity
- **Cost optimization**: Avoid cloud inference costs
---
## Docker MCP Toolkit
The Docker MCP (Model Context Protocol) Toolkit provides a secure, standardized way to connect AI agents to external tools and data sources.
### What is MCP?
Model Context Protocol (MCP) is an open, client-server protocol that standardizes how applications provide context and functionality to Large Language Models. It allows AI agents to:
- Interact with external tools and APIs
- Access databases and services
- Execute code in isolated environments
- Retrieve real-world data
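On the wire, MCP messages are JSON-RPC 2.0; `tools/list` and `tools/call` are core protocol methods. A minimal sketch of constructing these messages (the transport and the `search` tool name here are illustrative, not part of any specific server):

```python
import itertools
import json

_next_id = itertools.count(1)

def mcp_request(method, params=None):
    """Serialize an MCP client message as a JSON-RPC 2.0 request."""
    msg = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# A typical agent flow: discover the available tools, then invoke one.
list_tools = mcp_request("tools/list")
call_tool = mcp_request("tools/call",
                        {"name": "search", "arguments": {"query": "docker"}})
```

The Docker MCP Gateway mediates exactly this kind of exchange between agents and containerized servers, so the agent never talks to a server process directly.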
### Docker MCP Components
#### 1. MCP Gateway
The [Docker MCP Gateway](https://github.com/docker/mcp-gateway/) is the core open-source component that:
- Manages MCP containers
- Provides a unified endpoint for AI applications
- Mediates between agents and MCP servers
- Enables dynamic MCP discovery and configuration
#### 2. MCP Catalog
The [Docker MCP Catalog](https://hub.docker.com/mcp) is a curated collection of trusted, containerized MCP servers including:
- **270+ curated servers** from publishers such as Stripe, Elastic, and Grafana
- **Publisher trust levels**: Distinguish between official, verified, and community servers
- **Commit pinning**: Each server tied to specific Git commits for verifiability
- **AI-audited updates**: Automated reviews of code changes
#### 3. MCP Toolkit (Docker Desktop)
A management interface in Docker Desktop for:
- Discovering MCP servers
- Configuring and managing servers
- One-click deployment
- Centralized authentication
### Dynamic MCPs: Smart Search and Tool Composition
Recent enhancements enable agents to dynamically discover and configure MCP servers:
#### Smart Search Features
**`mcp-find`**: Find MCP servers by name or description
```
Agent: "Find MCP servers for web searching"
→ Returns: DuckDuckGo MCP, Brave Search MCP, etc.
```
**`mcp-add`**: Add MCP servers to the current session
```
Agent: "Add the DuckDuckGo MCP server"
→ Server is pulled, configured, and made available
```
#### Benefits of Dynamic MCPs
1. **No manual configuration**: Agents discover and add tools as needed
2. **Reduced token usage**: Only load tools when required
3. **Autonomous operation**: Agents manage their own tool ecosystem
4. **Secure sandbox**: Tool composition happens in isolated environments
### Security Features
Docker MCP Toolkit implements multiple security layers:
1. **Containerization**: Strong isolation limits blast radius
2. **Commit pinning**: Precise attribution and verification
3. **Automated auditing**: AI-powered code reviews
4. **Publisher trust levels**: Clear indicators of server origin
5. **Isolated execution**: MCP servers run in separate containers
### Using MCP with Docker Compose
Example `docker-compose.yml` for MCP servers:
```yaml
services:
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
    environment:
      - MCP_CATALOG_URL=https://hub.docker.com/mcp

  my-app:
    image: my-ai-app
    depends_on:
      - mcp-gateway
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000

volumes:
  mcp-data:
```
### Submitting MCP Servers
To contribute MCP servers to the Docker MCP Catalog:
1. Follow the [submission guidance](https://github.com/docker/mcp-registry/blob/main/CONTRIBUTING.md)
2. Submit to the [MCP Registry](https://github.com/docker/mcp-registry)
3. Servers undergo automated and manual review
4. Approved servers appear in the catalog with appropriate trust levels
---
## Common Use Cases
### 1. Local AI Development Environment
```yaml
services:
  dev-app:
    build: .
    models:
      - llm
      - embeddings
    depends_on:
      - mcp-gateway

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--log-colors"
  embeddings:
    model: ai/all-minilm
```
### 2. Multi-Model AI Application
```yaml
services:
  chat-service:
    image: my-chat-service
    models:
      chat-model:
        endpoint_var: CHAT_MODEL_URL
        model_var: CHAT_MODEL_NAME

  code-service:
    image: my-code-service
    models:
      code-model:
        endpoint_var: CODE_MODEL_URL
        model_var: CODE_MODEL_NAME

models:
  chat-model:
    model: ai/smollm2
    context_size: 2048
    runtime_flags:
      - "--temp"
      - "0.7"
  code-model:
    model: ai/codellama
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.2"
```
### 3. AI Agent with Dynamic MCP Tools
```yaml
services:
  ai-agent:
    image: my-ai-agent
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
      - ENABLE_DYNAMIC_MCPS=true
    depends_on:
      - mcp-gateway
    models:
      - agent-llm

  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - ./mcp-catalog.yml:/config/catalog.yml

models:
  agent-llm:
    model: ai/smollm2
    context_size: 4096
```
---
## Additional Resources
- [Docker AI Documentation](https://docs.docker.com/ai/)
- [Docker Compose Models Reference](https://docs.docker.com/ai/compose/models-and-compose/)
- [Docker Model Runner](https://docs.docker.com/ai/model-runner/)
- [Docker MCP Gateway (GitHub)](https://github.com/docker/mcp-gateway/)
- [Docker MCP Catalog](https://hub.docker.com/mcp)
- [MCP Registry](https://github.com/docker/mcp-registry)
- [Dynamic MCPs Blog Post](https://www.docker.com/blog/dynamic-mcps-stop-hardcoding-your-agents-world/)
- [MCP Security Blog Post](https://www.docker.com/blog/enhancing-mcp-trust-with-the-docker-mcp-catalog/)
---
**Last Updated**: December 2024