
Learning Path: Building AI-Powered CLI Tools with Python

A structured learning path for developers with basic Python knowledge who want to build AI-powered CLI tools using modern development practices.

🎯 Prerequisites

What You Should Know:

  • Basic Python syntax (variables, functions, loops, conditionals)
  • How to run Python scripts from the command line
  • Basic understanding of files and directories
  • Familiarity with text editors or IDEs

What You'll Learn:

  • Building professional CLI applications
  • Integrating AI/LLM capabilities
  • Modern Python package management with pixi
  • AI-assisted development with GitHub Copilot
  • Package publishing and distribution

πŸ“š Learning Phases

Phase 1: Foundation Setup (Week 1)

1.1 Development Environment Setup

Install Required Tools:

# Install pixi (cross-platform package manager)
curl -fsSL https://pixi.sh/install.sh | bash

# Verify installation
pixi --version

# Install Git (if not already installed)
# Linux: sudo apt install git
# macOS: brew install git
# Windows: Download from git-scm.com

Set Up GitHub Copilot:

  1. Install VS Code or your preferred IDE
  2. Install GitHub Copilot extension
  3. Sign in with your GitHub account (requires Copilot subscription)
  4. Complete the Copilot quickstart tutorial

Resources:

1.2 Understanding Modern Python Project Structure

Learn About:

  • Project organization (src layout vs flat layout)
  • Virtual environments and dependency isolation
  • Configuration files (pyproject.toml, pixi.toml)
  • Version control with Git
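
The pieces above come together in the project manifest. A minimal pixi.toml sketch (field names follow the pixi manifest format; the platforms, version constraint, and task name here are illustrative):

```toml
# pixi.toml - roughly what `pixi init` plus `pixi add python` produce
[project]
name = "my-first-cli"
channels = ["conda-forge"]
platforms = ["linux-64", "osx-arm64", "win-64"]

[dependencies]
python = ">=3.11"

[tasks]
hello = "python src/hello.py"  # run with: pixi run hello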

Hands-On Exercise: Create a simple "Hello World" project with pixi:

# Create new project
pixi init my-first-cli
cd my-first-cli

# Add Python dependency
pixi add python

# Create a simple script
mkdir src
echo 'print("Hello from pixi!")' > src/hello.py

# Run it
pixi run python src/hello.py

Use Copilot to:

  • Generate a .gitignore file for Python projects
  • Create a basic README.md template
  • Write docstrings for your functions
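
For reference, a Python .gitignore like the one Copilot might generate could look like this (entries are typical, not exhaustive):

```gitignore
# Byte-compiled files
__pycache__/
*.py[cod]

# Environments and secrets
.env
.pixi/

# Build artifacts
dist/
*.egg-info/
```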

Phase 2: CLI Development Fundamentals (Week 2-3)

2.1 Building Your First CLI with Typer

Learning Objectives:

  • Understand command-line argument parsing with type hints
  • Create commands, options, and flags using Python types
  • Handle user input and validation
  • Display formatted output with Rich integration

Project: Simple File Organizer CLI

Note: This is a simplified version for learning CLI basics. For a comprehensive, production-ready example that integrates Docker AI, MCP servers, and multi-agent systems, see the FileOrganizer project in Phase 7.

# Initialize project with pixi
pixi init file-organizer-cli
cd file-organizer-cli

# Add dependencies
pixi add python typer rich

# Create project structure
mkdir -p src/file_organizer
touch src/file_organizer/__init__.py
touch src/file_organizer/cli.py

Example CLI Structure (use Copilot to help generate):

# src/file_organizer/cli.py
import typer
from pathlib import Path
from rich.console import Console
from typing import Optional

app = typer.Typer(help="File organizer CLI tool")
console = Console()

@app.command()
def organize(
    directory: Path = typer.Argument(..., help="Directory to organize", exists=True),
    dry_run: bool = typer.Option(False, "--dry-run", help="Preview changes without executing"),
    verbose: bool = typer.Option(False, "--verbose", "-v", help="Show detailed output")
):
    """Organize files in DIRECTORY by extension."""
    if verbose:
        console.print(f"[blue]Organizing files in: {directory}[/blue]")
    
    # Use Copilot to generate the organization logic
    if dry_run:
        console.print("[yellow]DRY RUN - No changes will be made[/yellow]")
    
    pass

@app.command()
def stats(directory: Path = typer.Argument(..., exists=True)):
    """Show statistics about files in DIRECTORY."""
    # Use Copilot to generate statistics logic
    pass

if __name__ == '__main__':
    app()

Copilot Prompts to Try:

  • "Create a function to organize files by extension using pathlib"
  • "Add error handling for file operations with try-except"
  • "Generate help text and docstrings for CLI commands"
  • "Add progress bar using rich library for file processing"
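
As a reference point, here is one way the organization logic left as `pass` in the stub above could look. This is a hedged sketch: the helper name `organize_by_extension` and its return shape are invented for illustration, not part of the tutorial's API.

```python
from pathlib import Path


def organize_by_extension(directory: Path, dry_run: bool = False) -> dict[str, list[str]]:
    """Group files in *directory* into subfolders named after their extensions.

    Returns a mapping of target folder name -> file names moved (or that
    would be moved, when dry_run is True).
    """
    plan: dict[str, list[str]] = {}
    for item in sorted(directory.iterdir()):
        if not item.is_file():
            continue  # leave existing subdirectories alone
        folder = item.suffix.lstrip(".").lower() or "no_extension"
        plan.setdefault(folder, []).append(item.name)
        if not dry_run:
            target_dir = directory / folder
            target_dir.mkdir(exist_ok=True)
            item.rename(target_dir / item.name)
    return plan
```

The `organize` command can call this and print the returned plan with Rich; in dry-run mode the plan is computed but nothing moves.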

Resources:

2.2 Configuration and Settings Management

Learn About:

  • Reading configuration files (YAML, TOML, JSON)
  • Environment variables
  • User preferences and defaults
  • Configuration validation with Pydantic

Add to Your Project:

# Add configuration dependencies
pixi add pydantic pyyaml python-dotenv

Use Copilot to Generate:

  • Configuration schema with Pydantic
  • Config file loader functions
  • Environment variable handling

Phase 3: AI Integration Basics (Week 4-5)

3.1 Understanding HuggingFace and LLM APIs

Learning Objectives:

  • API authentication and token management
  • Using HuggingFace Inference API and local models
  • Making API requests with transformers and huggingface_hub
  • Handling streaming responses
  • Error handling and rate limiting

Project: Add AI Capabilities to Your CLI

# Add AI dependencies (note: the conda-forge package name is huggingface_hub)
pixi add transformers huggingface_hub python-dotenv

# For local inference (optional; the conda-forge package is named pytorch)
pixi add pytorch

# Create .env file for API keys
echo "HUGGINGFACE_TOKEN=your-token-here" > .env
echo ".env" >> .gitignore

Simple AI Integration Example:

# src/file_organizer/ai_helper.py
from huggingface_hub import InferenceClient
import os
from dotenv import load_dotenv

load_dotenv()

def suggest_organization_strategy(file_list: list[str]) -> str:
    """Use AI to suggest file organization strategy."""
    client = InferenceClient(token=os.getenv("HUGGINGFACE_TOKEN"))
    
    prompt = f"""Given these files: {', '.join(file_list)}
    
Suggest an intelligent organization strategy. Group related files and explain your reasoning.
Respond in JSON format."""
    
    # Use a free model like Mistral or Llama
    response = client.text_generation(
        prompt,
        model="mistralai/Mistral-7B-Instruct-v0.2",
        max_new_tokens=500,
        temperature=0.7
    )
    
    return response

# Alternative: Using local models with transformers
from transformers import pipeline

def analyze_file_content_local(content: str) -> str:
    """Analyze file content using a local model."""
    # Use Copilot to complete this function
    # Prompt: "Create a function that uses a local HuggingFace model 
    # to analyze and categorize file content"
    
    classifier = pipeline(
        "text-classification",
        model="distilbert-base-uncased-finetuned-sst-2-english"
    )
    
    result = classifier(content[:512])  # Truncate for model limits
    return result

Copilot Exercises:

  • "Create a function to summarize file contents using HuggingFace models"
  • "Add retry logic for API failures with exponential backoff"
  • "Implement streaming response handler for long-form generation"
  • "Create a model selector that chooses between local and API inference"
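
The retry exercise above can be sketched as a small standard-library helper. The delay schedule and the catch-all exception handling are illustrative; in practice you would catch the specific error types your API client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying failures with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... plus up to 1s of jitter to avoid thundering herds
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

Injecting `sleep` as a parameter makes the backoff behavior testable without real waiting.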

Resources:

Popular Models to Try:

  • Text Generation: mistralai/Mistral-7B-Instruct-v0.2, meta-llama/Llama-2-7b-chat-hf
  • Summarization: facebook/bart-large-cnn, google/pegasus-xsum
  • Classification: distilbert-base-uncased, roberta-base
  • Embeddings: sentence-transformers/all-MiniLM-L6-v2

Local vs API Inference:

# src/file_organizer/inference.py
from typing import Literal
import os

class AIHelper:
    """Flexible AI helper supporting both local and API inference."""
    
    def __init__(self, mode: Literal["local", "api"] = "api"):
        self.mode = mode
        
        if mode == "api":
            from huggingface_hub import InferenceClient
            self.client = InferenceClient(token=os.getenv("HUGGINGFACE_TOKEN"))
        else:
            from transformers import pipeline
            # Load model once at initialization
            self.pipeline = pipeline(
                "text-generation",
                model="distilgpt2",  # Smaller model for local use
                device=-1  # CPU, use 0 for GPU
            )
    
    def generate(self, prompt: str) -> str:
        """Generate text using configured mode."""
        if self.mode == "api":
            return self.client.text_generation(
                prompt,
                model="mistralai/Mistral-7B-Instruct-v0.2",
                max_new_tokens=500
            )
        else:
            result = self.pipeline(prompt, max_new_tokens=100)
            return result[0]['generated_text']

# Usage in CLI
# Use Copilot: "Add a --local flag to switch between API and local inference"

When to Use Each:

  • API Inference: Better quality, larger models, no local resources needed, requires internet
  • Local Inference: Privacy, offline use, no API costs, but requires more RAM/GPU
  • vLLM Server: Best of both worlds - local privacy with high performance and OpenAI-compatible API

Advanced: Serving Local Models with vLLM

vLLM is a high-performance inference engine that serves local models with significantly higher throughput and lower latency than the standard transformers library.

# Install vLLM as a PyPI dependency (requires a GPU for best performance;
# vLLM publishes CUDA-specific wheels, so check the install docs for your setup)
pixi add --pypi vllm

Starting a vLLM Server:

# Start vLLM server with a model
# This creates an OpenAI-compatible API endpoint
vllm serve mistralai/Mistral-7B-Instruct-v0.2 \
    --host 0.0.0.0 \
    --port 8000 \
    --max-model-len 4096

# For smaller GPUs, use quantized models
vllm serve TheBloke/Mistral-7B-Instruct-v0.2-GPTQ \
    --quantization gptq \
    --dtype half

Using vLLM Server in Your CLI:

# src/file_organizer/vllm_client.py
from openai import OpenAI
from typing import Optional

class vLLMClient:
    """Client for vLLM server with OpenAI-compatible API."""
    
    def __init__(self, base_url: str = "http://localhost:8000/v1"):
        # vLLM provides OpenAI-compatible endpoints
        self.client = OpenAI(
            base_url=base_url,
            api_key="not-needed"  # vLLM doesn't require API key
        )
    
    def generate(
        self, 
        prompt: str, 
        model: str = "mistralai/Mistral-7B-Instruct-v0.2",
        max_tokens: int = 500,
        temperature: float = 0.7
    ) -> str:
        """Generate text using vLLM server."""
        response = self.client.completions.create(
            model=model,
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature
        )
        return response.choices[0].text
    
    def chat_generate(
        self,
        messages: list[dict],
        model: str = "mistralai/Mistral-7B-Instruct-v0.2",
        max_tokens: int = 500
    ) -> str:
        """Generate using chat completion format."""
        response = self.client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=max_tokens
        )
        return response.choices[0].message.content

# Usage in your CLI
def suggest_organization_with_vllm(file_list: list[str]) -> str:
    """Use local vLLM server for suggestions."""
    client = vLLMClient()
    
    messages = [
        {"role": "system", "content": "You are a file organization assistant."},
        {"role": "user", "content": f"Organize these files: {', '.join(file_list)}"}
    ]
    
    return client.chat_generate(messages)

Complete Inference Strategy:

# src/file_organizer/ai_strategy.py
from typing import Literal
import os
from enum import Enum

class InferenceMode(str, Enum):
    """Available inference modes."""
    API = "api"              # HuggingFace Inference API
    LOCAL = "local"          # Direct transformers
    VLLM = "vllm"           # vLLM server
    AUTO = "auto"           # Auto-detect best option

class UnifiedAIClient:
    """Unified client supporting multiple inference backends."""
    
    def __init__(self, mode: InferenceMode = InferenceMode.AUTO):
        self.mode = self._resolve_mode(mode)
        self._setup_client()
    
    def _resolve_mode(self, mode: InferenceMode) -> InferenceMode:
        """Auto-detect best available mode."""
        if mode != InferenceMode.AUTO:
            return mode
        
        # Check if vLLM server is running
        try:
            import requests  # local import keeps requests optional
            requests.get("http://localhost:8000/health", timeout=1).raise_for_status()
            return InferenceMode.VLLM
        except Exception:
            pass
        
        # Check if HuggingFace token is available
        if os.getenv("HUGGINGFACE_TOKEN"):
            return InferenceMode.API
        
        # Fall back to local
        return InferenceMode.LOCAL
    
    def _setup_client(self):
        """Initialize the appropriate client."""
        if self.mode == InferenceMode.VLLM:
            from openai import OpenAI
            self.client = OpenAI(
                base_url="http://localhost:8000/v1",
                api_key="not-needed"
            )
        elif self.mode == InferenceMode.API:
            from huggingface_hub import InferenceClient
            self.client = InferenceClient(token=os.getenv("HUGGINGFACE_TOKEN"))
        else:  # LOCAL
            from transformers import pipeline
            self.client = pipeline("text-generation", model="distilgpt2")
    
    def generate(self, prompt: str, **kwargs) -> str:
        """Generate text using configured backend."""
        if self.mode == InferenceMode.VLLM:
            response = self.client.completions.create(
                model="mistralai/Mistral-7B-Instruct-v0.2",
                prompt=prompt,
                max_tokens=kwargs.get("max_tokens", 500)
            )
            return response.choices[0].text
        
        elif self.mode == InferenceMode.API:
            return self.client.text_generation(
                prompt,
                model="mistralai/Mistral-7B-Instruct-v0.2",
                max_new_tokens=kwargs.get("max_tokens", 500)
            )
        
        else:  # LOCAL
            result = self.client(prompt, max_new_tokens=kwargs.get("max_tokens", 100))
            return result[0]['generated_text']

# Use in CLI with Typer
from pathlib import Path

import typer

app = typer.Typer()

@app.command()
def organize(
    directory: Path,
    inference_mode: InferenceMode = typer.Option(
        InferenceMode.AUTO,
        "--mode",
        help="Inference mode: api, local, vllm, or auto"
    )
):
    """Organize files using AI."""
    ai_client = UnifiedAIClient(mode=inference_mode)
    # Use ai_client.generate() for suggestions

vLLM Performance Tips:

  1. GPU Memory: Use --gpu-memory-utilization 0.9 to maximize GPU usage
  2. Batch Size: vLLM automatically batches requests for better throughput
  3. Quantization: Use GPTQ or AWQ quantized models for lower memory usage
  4. Tensor Parallelism: For multi-GPU: --tensor-parallel-size 2

Docker Compose for vLLM (Optional):

# docker-compose.vllm.yml

services:
  vllm:
    image: vllm/vllm-openai:latest
    ports:
      - "8000:8000"
    environment:
      - MODEL=mistralai/Mistral-7B-Instruct-v0.2
      - MAX_MODEL_LEN=4096
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    command: >
      --host 0.0.0.0
      --port 8000
      --model ${MODEL}
      --max-model-len ${MAX_MODEL_LEN}

Comparison:

| Feature           | HF API      | Transformers | vLLM         |
|-------------------|-------------|--------------|--------------|
| Setup             | Easy        | Easy         | Medium       |
| Speed             | Fast        | Slow         | Very fast    |
| Cost              | Pay per use | Free         | Free (local) |
| GPU required      | No          | Optional     | Recommended  |
| Offline           | No          | Yes          | Yes          |
| Batch processing  | Limited     | Poor         | Excellent    |
| Memory efficient  | N/A         | No           | Yes          |
| OpenAI compatible | No          | No           | Yes          |

Recommended Workflow:

  1. Development: Use HuggingFace API for quick prototyping
  2. Testing: Use vLLM locally for faster iteration
  3. Production: Deploy vLLM server for best performance and privacy

3.2 Docker-Based Model Deployment

Docker provides a modern, standardized way to deploy local LLM models with minimal configuration using Docker Compose v2.38+.

Why Use Docker for AI Models?

  • Consistent environments: Same setup across development, testing, and production
  • Easy deployment: One command to start models and services
  • Resource isolation: Models run in containers with defined resource limits
  • Portability: Works locally with Docker Model Runner or on cloud providers
  • Version control: Pin specific model versions with OCI artifacts

Prerequisites:

# Ensure Docker Compose v2.38 or later
docker compose version

# Enable Docker Model Runner in Docker Desktop settings
# Or install separately: https://docs.docker.com/ai/model-runner/

Basic Model Deployment with Docker Compose:

Create a docker-compose.yml for your CLI project:

# docker-compose.yml
services:
  # Your CLI application
  file-organizer:
    build: .
    models:
      - llm  # Reference to the model defined below
    environment:
      # Auto-injected by Docker:
      # LLM_URL - endpoint to access the model
      # LLM_MODEL - model identifier
    volumes:
      - ./data:/app/data

models:
  llm:
    model: ai/smollm2  # Model from Docker Hub
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--log-colors"

Using Models in Your Python CLI:

# src/file_organizer/docker_ai.py
import os
from openai import OpenAI

class DockerModelClient:
    """Client for Docker-deployed models with OpenAI-compatible API."""
    
    def __init__(self):
        # Docker automatically injects these environment variables
        model_url = os.getenv("LLM_URL")
        model_name = os.getenv("LLM_MODEL")
        
        if not model_url:
            raise ValueError("LLM_URL not set. Are you running with Docker Compose?")
        
        # Docker models provide OpenAI-compatible endpoints
        self.client = OpenAI(
            base_url=model_url,
            api_key="not-needed"  # Docker models don't require API keys
        )
        self.model_name = model_name
    
    def generate(self, prompt: str, max_tokens: int = 500) -> str:
        """Generate text using Docker-deployed model."""
        response = self.client.completions.create(
            model=self.model_name,
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=0.7
        )
        return response.choices[0].text
    
    def chat_generate(self, messages: list[dict], max_tokens: int = 500) -> str:
        """Generate using chat completion format."""
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=messages,
            max_tokens=max_tokens
        )
        return response.choices[0].message.content

# Usage in your CLI
import typer

@app.command()
def organize(directory: Path):
    """Organize files using Docker-deployed AI model."""
    try:
        ai_client = DockerModelClient()
        # Use the model for suggestions
        suggestion = ai_client.generate(f"Organize these files: {list(directory.iterdir())}")
        console.print(suggestion)
    except ValueError as e:
        console.print(f"[red]Error: {e}[/red]")
        console.print("[yellow]Run with: docker compose up[/yellow]")

Multi-Model Setup:

Deploy multiple models for different tasks:

services:
  file-organizer:
    build: .
    models:
      chat-model:
        endpoint_var: CHAT_MODEL_URL
        model_var: CHAT_MODEL_NAME
      embeddings:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  chat-model:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.7"
  
  embeddings:
    model: ai/all-minilm
    context_size: 512

Model Configuration Presets:

# Development mode - verbose logging
models:
  dev_model:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--verbose"
      - "--verbose-prompt"
      - "--log-timestamps"
      - "--log-colors"

# Production mode - deterministic output
models:
  prod_model:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.1"  # Low temperature for consistency
      - "--top-k"
      - "1"

# Creative mode - high randomness
models:
  creative_model:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "1.0"
      - "--top-p"
      - "0.9"

Running Your Dockerized CLI:

# Start models and services
docker compose up -d

# Check model status
docker compose ps

# View model logs
docker compose logs llm

# Run your CLI (models are available via environment variables)
docker compose exec file-organizer python -m file_organizer organize ./data

# Stop everything
docker compose down

Complete Example Dockerfile:

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY pyproject.toml .
RUN pip install -e .

# Copy application code
COPY src/ ./src/

# The CLI will use environment variables injected by Docker Compose
CMD ["python", "-m", "file_organizer.cli"]

Benefits of Docker Deployment:

| Feature           | Docker Compose     | Manual Setup        |
|-------------------|--------------------|---------------------|
| Setup time        | Minutes            | Hours               |
| Consistency       | ✅ Same everywhere | ❌ Varies by system |
| Resource control  | ✅ Built-in limits | ⚠️ Manual config    |
| Multi-model       | ✅ Easy            | ❌ Complex          |
| Cloud portability | ✅ Same config     | ❌ Rewrite needed   |
| Version control   | ✅ Git-friendly    | ⚠️ Documentation    |

Cloud Deployment:

The same docker-compose.yml works on cloud providers with extensions:

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    # Cloud-specific options (provider-dependent)
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"
      - "cloud.auto-scaling=true"

Resources:

3.3 Docker MCP Toolkit: Secure Tool Integration

The Model Context Protocol (MCP) provides a standardized way for AI agents to interact with external tools and data sources. Docker's MCP Toolkit makes this secure and easy.

What is MCP?

MCP is an open protocol that allows AI models to:

  • Execute code in isolated environments
  • Access databases and APIs securely
  • Use external tools (web search, calculators, etc.)
  • Retrieve real-world data

Why Docker MCP?

  1. Security: Tools run in isolated containers
  2. Trust: Curated catalog with publisher verification
  3. Simplicity: One-click deployment from Docker Desktop
  4. Dynamic Discovery: Agents find and add tools as needed

Docker MCP Components:

# docker-compose.yml with MCP Gateway
services:
  # Your AI-powered CLI
  file-organizer:
    build: .
    models:
      - llm
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
    depends_on:
      - mcp-gateway
  
  # MCP Gateway - manages MCP servers
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
    environment:
      - MCP_CATALOG_URL=https://hub.docker.com/mcp

models:
  llm:
    model: ai/smollm2
    context_size: 4096

volumes:
  mcp-data:

Using MCP in Your CLI:

# src/file_organizer/mcp_client.py
import os
import requests
from typing import Any

class MCPClient:
    """Client for Docker MCP Gateway."""
    
    def __init__(self):
        self.gateway_url = os.getenv("MCP_GATEWAY_URL", "http://localhost:3000")
    
    def find_servers(self, query: str) -> list[dict]:
        """Find MCP servers by name or description."""
        response = requests.post(
            f"{self.gateway_url}/mcp-find",
            json={"query": query}
        )
        return response.json()["servers"]
    
    def add_server(self, server_name: str) -> dict:
        """Add an MCP server to the current session."""
        response = requests.post(
            f"{self.gateway_url}/mcp-add",
            json={"server": server_name}
        )
        return response.json()
    
    def call_tool(self, server: str, tool: str, params: dict) -> Any:
        """Call a tool from an MCP server."""
        response = requests.post(
            f"{self.gateway_url}/mcp-call",
            json={
                "server": server,
                "tool": tool,
                "parameters": params
            }
        )
        return response.json()["result"]

# Example: Web search integration
@app.command()
def research(topic: str):
    """Research a topic using web search MCP."""
    mcp = MCPClient()
    
    # Find web search servers
    servers = mcp.find_servers("web search")
    console.print(f"Found {len(servers)} search servers")
    
    # Add DuckDuckGo MCP
    mcp.add_server("duckduckgo-mcp")
    
    # Use the search tool
    results = mcp.call_tool(
        server="duckduckgo-mcp",
        tool="search",
        params={"query": topic, "max_results": 5}
    )
    
    # Display results
    for result in results:
        console.print(f"[bold]{result['title']}[/bold]")
        console.print(f"  {result['url']}")
        console.print(f"  {result['snippet']}\n")

Dynamic MCP Discovery:

Let AI agents discover and use tools automatically:

# src/file_organizer/ai_agent.py
from openai import OpenAI
import json

class AIAgentWithMCP:
    """AI agent that can discover and use MCP tools."""
    
    def __init__(self):
        self.llm = OpenAI(base_url=os.getenv("LLM_URL"), api_key="not-needed")
        self.mcp = MCPClient()
        self.available_tools = []
    
    def discover_tools(self, task_description: str):
        """Ask LLM what tools are needed for a task."""
        prompt = f"""Task: {task_description}
        
        What MCP tools would be helpful? Respond with JSON:
        {{"tools": ["tool-name-1", "tool-name-2"]}}
        """
        
        response = self.llm.completions.create(
            model=os.getenv("LLM_MODEL"),
            prompt=prompt,
            max_tokens=200
        )
        
        tools_needed = json.loads(response.choices[0].text)
        
        # Add each tool
        for tool in tools_needed["tools"]:
            servers = self.mcp.find_servers(tool)
            if servers:
                self.mcp.add_server(servers[0]["name"])
                self.available_tools.append(servers[0])
    
    def execute_task(self, task: str):
        """Execute a task using available tools."""
        # First, discover what tools we need
        self.discover_tools(task)
        
        # Then execute with those tools
        # (Implementation depends on your specific use case)
        pass

# Usage
@app.command()
def smart_organize(directory: Path, strategy: str):
    """Organize files using AI with dynamic tool discovery."""
    agent = AIAgentWithMCP()
    
    task = f"Organize files in {directory} using strategy: {strategy}"
    agent.execute_task(task)

Available MCP Servers:

The Docker MCP Catalog includes 270+ servers:

  • Web Search: DuckDuckGo, Brave Search
  • Databases: PostgreSQL, MongoDB, Elasticsearch
  • APIs: Stripe, GitHub, Slack
  • Monitoring: Grafana, Prometheus
  • File Systems: Local files, S3, Google Drive
  • Development: Git, Docker, Kubernetes

Security Features:

  1. Container Isolation: Each MCP server runs in its own container
  2. Commit Pinning: Servers tied to specific Git commits
  3. Publisher Trust Levels: Official, verified, and community servers
  4. AI-Audited Updates: Automated code review for changes
  5. Resource Limits: CPU and memory constraints per server

Complete Example with MCP:

# docker-compose.yml - Full AI CLI with MCP
services:
  file-organizer:
    build: .
    models:
      - llm
    environment:
      - MCP_GATEWAY_URL=http://mcp-gateway:3000
      - ENABLE_DYNAMIC_MCPS=true
    depends_on:
      - mcp-gateway
    volumes:
      - ./data:/app/data
  
  mcp-gateway:
    image: docker/mcp-gateway:latest
    ports:
      - "3000:3000"
    volumes:
      - mcp-data:/data
      - ./mcp-config.yml:/config/catalog.yml

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--temp"
      - "0.7"

volumes:
  mcp-data:

MCP Best Practices:

  1. Start with trusted servers: Use official and verified publishers
  2. Enable only needed tools: Reduce attack surface
  3. Monitor MCP usage: Track which tools are called
  4. Set resource limits: Prevent runaway processes
  5. Review permissions: Understand what each MCP can access

Resources:

3.4 Prompt Engineering for CLI Tools

Learn About:

  • Crafting effective prompts for different model types
  • Understanding model-specific prompt formats (Mistral, Llama, etc.)
  • System vs user messages (for chat models)
  • Few-shot learning examples
  • Prompt templates and variables

Hands-On: Create a prompt template system:

# src/file_organizer/prompts.py

# For instruction-tuned models like Mistral
MISTRAL_ORGANIZATION_PROMPT = """[INST] You are a helpful file organization assistant.

Given the following list of files:
{file_list}

Suggest an intelligent organization strategy that:
1. Groups related files together
2. Creates meaningful folder names
3. Explains the reasoning

Respond in JSON format with this structure:
{{
  "strategy": "description",
  "folders": [
    {{"name": "folder_name", "files": ["file1", "file2"], "reason": "why"}}
  ]
}} [/INST]"""

# For Llama-2 chat models
LLAMA_SYSTEM_PROMPT = """You are a helpful file organization assistant. 
Always respond in valid JSON format."""

def format_llama_prompt(user_message: str) -> str:
    """Format prompt for Llama-2 chat models."""
    return f"""<s>[INST] <<SYS>>
{LLAMA_SYSTEM_PROMPT}
<</SYS>>

{user_message} [/INST]"""

# For general models without special formatting
GENERIC_PROMPT_TEMPLATE = """Task: Organize the following files intelligently.

Files: {file_list}

Instructions:
- Group related files together
- Suggest meaningful folder names
- Explain your reasoning
- Output as JSON

Response:"""

# Use Copilot to generate more prompt templates for different tasks

Model-Specific Considerations:

# src/file_organizer/model_config.py

MODEL_CONFIGS = {
    "mistralai/Mistral-7B-Instruct-v0.2": {
        "max_tokens": 8192,
        "prompt_format": "mistral",
        "temperature": 0.7,
        "use_case": "general instruction following"
    },
    "meta-llama/Llama-2-7b-chat-hf": {
        "max_tokens": 4096,
        "prompt_format": "llama2",
        "temperature": 0.7,
        "use_case": "conversational tasks"
    },
    "facebook/bart-large-cnn": {
        "max_tokens": 1024,
        "prompt_format": "none",
        "use_case": "summarization only"
    }
}

def get_model_config(model_name: str) -> dict:
    """Get configuration for a specific model."""
    return MODEL_CONFIGS.get(model_name, {})

Copilot Prompts:

  • "Create a function to format prompts based on model type"
  • "Generate few-shot examples for file categorization"
  • "Build a prompt validator that checks token limits"
  • "Create a prompt optimization function that reduces token usage"
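
Few-shot examples (mentioned in the list above) can be assembled with a small template builder. The `Input:`/`Output:` wording below is one common convention for instruction-tuned models, not a requirement of any particular model.

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a prompt from a task description, (input, output) examples, and a query."""
    parts = [f"Task: {task}", ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {query}")
    parts.append("Output:")  # the model completes from here
    return "\n".join(parts)
```

For example, `build_few_shot_prompt("Categorize a file by its name", [("report_q3.pdf", "documents"), ("IMG_0042.jpg", "photos")], "budget_2024.xlsx")` produces a prompt that nudges the model toward one-word category answers.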

Phase 4: Advanced CLI Features (Week 6-7)

4.1 Interactive CLI Elements

Add Dependencies:

pixi add questionary rich typer

Learn to Build:

  • Interactive prompts and menus
  • Progress bars and spinners
  • Tables and formatted output
  • Color-coded messages

Example with Copilot:

# Ask Copilot: "Create an interactive menu using questionary 
# to select file organization options"

import questionary
from rich.progress import track

def interactive_organize():
    # Copilot will help generate this
    pass

4.2 Batch Processing and Async Operations

Learn About:

  • Processing multiple files efficiently
  • Async/await for concurrent API calls
  • Rate limiting and throttling
  • Progress tracking for long operations

Add Async Dependencies:

# Add async dependencies (asyncio ships with Python; only aiohttp needs installing)
pixi add aiohttp

Copilot Exercise:

  • "Create an async function to process multiple files with concurrent LLM API calls"
  • "Add rate limiting to prevent API quota exhaustion"
  • "Implement a queue system for batch processing"
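
The ideas above can be sketched with asyncio alone. Here `process_file` is a stand-in for a real API call, and the semaphore acts as a crude concurrency cap rather than a true requests-per-second limiter:

```python
import asyncio


async def process_file(name: str) -> str:
    """Stand-in for a per-file LLM API call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"processed {name}"


async def process_batch(names: list[str], max_concurrent: int = 5) -> list[str]:
    """Process files concurrently while capping the number of in-flight requests."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(name: str) -> str:
        async with sem:  # at most max_concurrent coroutines run the body at once
            return await process_file(name)

    # gather preserves input order even though tasks finish out of order
    return await asyncio.gather(*(bounded(n) for n in names))
```

From synchronous CLI code, run it with `asyncio.run(process_batch(file_names))`.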

Phase 5: Testing and Quality (Week 8)

5.1 Writing Tests

Add Testing Dependencies:

pixi add pytest pytest-cov pytest-asyncio pytest-mock

Learn to Test:

  • Unit tests for individual functions
  • Integration tests for CLI commands
  • Mocking API calls
  • Test coverage reporting

Example Test Structure:

# tests/test_cli.py
import pytest
from typer.testing import CliRunner
from file_organizer.cli import app

runner = CliRunner()

def test_organize_command():
    # Use Copilot to generate test cases
    result = runner.invoke(app, ['organize', 'test_dir', '--dry-run'])
    assert result.exit_code == 0
    assert "DRY RUN" in result.stdout

def test_organize_with_verbose():
    result = runner.invoke(app, ['organize', 'test_dir', '--verbose'])
    assert result.exit_code == 0
    
def test_stats_command():
    result = runner.invoke(app, ['stats', 'test_dir'])
    assert result.exit_code == 0

Copilot Prompts:

  • "Generate pytest fixtures for mocking HuggingFace Inference API"
  • "Create test cases for error handling with API timeouts"
  • "Write integration tests for the organize command"
  • "Mock transformers pipeline for local model testing"

Example Mocking HuggingFace:

# tests/conftest.py
import pytest
from unittest.mock import Mock, patch

@pytest.fixture
def mock_hf_client():
    """Mock HuggingFace InferenceClient."""
    with patch('huggingface_hub.InferenceClient') as mock:
        mock_instance = Mock()
        mock_instance.text_generation.return_value = '{"strategy": "test"}'
        mock.return_value = mock_instance
        yield mock_instance

@pytest.fixture
def mock_transformers_pipeline():
    """Mock transformers pipeline for local models."""
    with patch('transformers.pipeline') as mock:
        mock_pipeline = Mock()
        mock_pipeline.return_value = [{"label": "POSITIVE", "score": 0.99}]
        mock.return_value = mock_pipeline
        yield mock_pipeline

5.2 Code Quality Tools

# Add quality tools
pixi add ruff mypy black isort

Set Up:

  • Linting with ruff
  • Type checking with mypy
  • Code formatting with black
  • Import sorting with isort

Create pyproject.toml configuration (use Copilot):

[tool.ruff]
line-length = 100
target-version = "py311"

[tool.mypy]
python_version = "3.11"
strict = true

[tool.black]
line-length = 100

Phase 6: Package Publishing with Pixi (Week 9)

6.1 Preparing for Publication

Project Structure:

my-cli-tool/
β”œβ”€β”€ pixi.toml              # Pixi configuration
β”œβ”€β”€ pyproject.toml         # Python package metadata
β”œβ”€β”€ README.md              # Documentation
β”œβ”€β”€ LICENSE                # License file
β”œβ”€β”€ src/
β”‚   └── my_cli_tool/
β”‚       β”œβ”€β”€ __init__.py
β”‚       β”œβ”€β”€ cli.py
β”‚       └── ...
β”œβ”€β”€ tests/
β”‚   └── test_*.py
└── docs/
    └── ...

Configure pyproject.toml for Publishing:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "my-cli-tool"
version = "0.1.0"
description = "AI-powered file organization CLI"
authors = [{name = "Your Name", email = "you@example.com"}]
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "typer>=0.9",
    "rich>=13.0",
    "transformers>=4.30",
    "huggingface-hub>=0.16",
]

[project.scripts]
my-cli = "my_cli_tool.cli:app"

[project.urls]
Homepage = "https://github.com/yourusername/my-cli-tool"
Documentation = "https://my-cli-tool.readthedocs.io"

Use Copilot to:

  • Generate comprehensive README with usage examples
  • Create CHANGELOG.md
  • Write contributing guidelines
  • Generate documentation

6.2 Building and Publishing

Build Package:

# Add build tools
pixi add hatchling build twine

# Build the package
pixi run python -m build

# This creates:
# - dist/my_cli_tool-0.1.0.tar.gz
# - dist/my_cli_tool-0.1.0-py3-none-any.whl

Publish to PyPI:

# Test on TestPyPI first
pixi run twine upload --repository testpypi dist/*

# Then publish to PyPI
pixi run twine upload dist/*

Publish as Pixi Package:

# Create pixi.toml with package metadata (if not already done)
pixi init my-cli-tool

# Add to pixi.toml:
[project]
name = "my-cli-tool"
version = "0.1.0"
description = "AI-powered file organization CLI"
channels = ["conda-forge"]
platforms = ["linux-64", "osx-64", "win-64"]

[dependencies]
python = ">=3.11"
typer = ">=0.9"
rich = ">=13.0"

[tasks]
start = "my-cli"

Resources:


Phase 7: Real-World Project (Weeks 10-12)

7.1 Choose a Project from the Ideas List

Comprehensive Example Project:

FileOrganizer - AI-Powered File Organization CLI

  • What it demonstrates: Complete integration of all concepts from this learning path
  • Key technologies: Docker Model Runner, MCP servers, CrewAI multi-agent system, Typer CLI
  • Complexity: Advanced
  • Best for: Learners who have completed Phases 1-6 and want to see a production-ready example
  • Features:
    • Multi-agent system (Scanner, Classifier, Organizer, Deduplicator)
    • Docker-based LLM deployment
    • MCP server for file operations
    • Research paper management with metadata extraction
    • Comprehensive CLI with multiple commands
  • Learning outcomes: See how Docker AI, MCP, multi-agent systems, and CLI development work together in a real project

Recommended Starter Projects:

  1. smart-csv (Data & Analytics)

    • Good for: Learning data manipulation
    • Key skills: Pandas, CSV processing, LLM integration
    • Complexity: Medium
  2. smart-summarize (Document Processing)

    • Good for: Text processing and AI integration
    • Key skills: File I/O, API integration, prompt engineering
    • Complexity: Low-Medium
  3. error-translator (DevOps)

    • Good for: String processing and knowledge retrieval
    • Key skills: Pattern matching, API usage, caching
    • Complexity: Medium
  4. task-prioritizer (Productivity)

    • Good for: Building practical tools
    • Key skills: Data structures, AI reasoning, persistence
    • Complexity: Medium

πŸ’‘ Tip: Start with one of the simpler projects (2-4) to build confidence, then tackle FileOrganizer to see how all the concepts integrate in a production-ready application.
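As a taste of the caching skill listed for error-translator, the standard library's functools.lru_cache memoizes repeated lookups; explain_error below is a hypothetical stand-in for a real LLM call:

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def explain_error(message: str) -> str:
    # In the real tool this would call an LLM API; the simple
    # normalization here just makes the caching behaviour observable.
    return f"explained:{message.strip()}"

explain_error("FileNotFoundError: config.yaml")
explain_error("FileNotFoundError: config.yaml")  # identical message: served from cache
print(explain_error.cache_info())  # hits=1, misses=1
```

For a real CLI you would likely persist the cache to disk (e.g. with shelve or a JSON file) so translations survive between runs.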

7.2 Development Workflow with GitHub Copilot

Step-by-Step Process:

  1. Planning Phase:

    • Use Copilot Chat to brainstorm features
    • Generate project structure
    • Create initial documentation
  2. Implementation Phase:

    • Use Copilot for boilerplate code
    • Ask Copilot to explain unfamiliar concepts
    • Generate test cases alongside code
  3. Refinement Phase:

    • Use Copilot to suggest optimizations
    • Generate documentation and examples
    • Create user guides

Effective Copilot Prompts:

# In comments, be specific:
# "Create a function that reads a CSV file, analyzes column types,
# and returns a dictionary with column names as keys and suggested
# data types as values. Handle errors gracefully."

# Use descriptive function names:
def analyze_csv_column_types(filepath: str) -> dict[str, str]:
    # Copilot will suggest implementation
    pass

# Ask for explanations:
# "Explain how to use asyncio to make concurrent API calls with rate limiting"

7.3 Project Milestones

Week 10: MVP (Minimum Viable Product)

  • Core functionality working
  • Basic CLI interface
  • Simple AI integration
  • README with usage examples

Week 11: Enhancement

  • Add configuration system
  • Implement error handling
  • Add progress indicators
  • Write tests (>70% coverage)

Week 12: Polish & Publish

  • Complete documentation
  • Add examples and tutorials
  • Set up CI/CD (GitHub Actions)
  • Publish to PyPI
  • Share on GitHub/social media

πŸ› οΈ Essential Pixi Commands Reference

# Project initialization
pixi init my-project
pixi init --channel conda-forge --channel bioconda

# Dependency management
pixi add package-name              # Add runtime dependency
pixi add --feature dev pytest      # Add dependency to a "dev" feature
pixi add "package>=1.0,<2.0"       # Version constraints
pixi remove package-name           # Remove dependency
pixi update                        # Update all dependencies

# Environment management
pixi shell                         # Activate environment
pixi run python script.py          # Run command in environment
pixi run --environment prod start  # Run in specific environment

# Task management
pixi task add start "python -m my_cli"
pixi task add test "pytest tests/"
pixi task add lint "ruff check src/"
pixi run start                     # Run defined task

# Multi-environment setup (these sections go in pixi.toml, not the shell)
[feature.dev.dependencies]
pytest = "*"
ruff = "*"

[environments]
default = ["dev"]
prod = []

πŸŽ“ Learning Resources

Documentation

Tutorials & Courses

Example Projects

Community


πŸ’‘ Tips for Success

Using GitHub Copilot Effectively

  1. Write Clear Comments:

    # Create a function that takes a list of file paths,
    # sends them to GPT-4 for analysis, and returns
    # a structured JSON response with organization suggestions
    
  2. Use Descriptive Names:

    • Good: analyze_and_categorize_files()
    • Bad: process()
  3. Break Down Complex Tasks:

    • Don't ask Copilot to generate entire applications
    • Build incrementally, function by function
  4. Review and Understand:

    • Always review Copilot's suggestions
    • Understand the code before accepting it
    • Test thoroughly
  5. Use Copilot Chat for:

    • Explaining error messages
    • Suggesting alternative approaches
    • Generating test cases
    • Writing documentation

Pixi Best Practices

  1. Use Feature Flags:

    [feature.ai]
    dependencies = {openai = "*", anthropic = "*"}
    
    [feature.dev]
    dependencies = {pytest = "*", ruff = "*"}
    
    [environments]
    default = ["ai"]
    dev = ["ai", "dev"]
    
  2. Define Tasks:

    [tasks]
    dev = "python -m my_cli --debug"
    test = "pytest tests/ -v"
    lint = "ruff check src/"
    format = "black src/ tests/"
    
  3. Lock Dependencies:

    • Commit pixi.lock to version control
    • Ensures reproducible builds
  4. Use Channels Wisely:

    • Start with conda-forge
    • Add specialized channels as needed

Development Workflow

  1. Start Small:

    • Build the simplest version first
    • Add features incrementally
    • Test each addition
  2. Iterate Based on Feedback:

    • Share early with friends/colleagues
    • Gather feedback
    • Improve based on real usage
  3. Document as You Go:

    • Write docstrings immediately
    • Update README with new features
    • Keep CHANGELOG current
  4. Test Continuously:

    • Write tests alongside code
    • Run tests before committing
    • Aim for >80% coverage

🎯 Success Metrics

By the end of this learning path, you should be able to:

  • βœ… Set up a Python project with pixi
  • βœ… Build a CLI application with commands and options
  • βœ… Integrate AI/LLM capabilities effectively
  • βœ… Write tests and maintain code quality
  • βœ… Publish a package to PyPI
  • βœ… Use GitHub Copilot to accelerate development
  • βœ… Build one complete AI-powered CLI tool

πŸ“… Next Steps

After completing this learning path:

  1. Build More Projects:

    • Try different project ideas from the list
    • Experiment with different AI models
    • Contribute to open-source CLI tools
  2. Advanced Topics:

    • Plugin architectures
    • Multi-command CLIs
    • Database integration
    • Web dashboards for CLI tools
    • CI/CD automation
  3. Share Your Work:

    • Write blog posts about your projects
    • Create video tutorials
    • Contribute to the community
    • Help others learn

Last Updated: 2024-12-04