Docker Usage Guide
Complete guide for using ULTRATHINK with Docker.
Quick Start
Run Web Interface (Default)
docker compose up app
# Visit http://localhost:7860
Run Training (CPU)
docker compose --profile train up
Run Training (GPU)
docker compose --profile train-gpu up
Available Services
1. Web Interface (app)
Purpose: Interactive Gradio UI for model inference
# Start the web interface
docker compose up app
# Run in background
docker compose up -d app
# View logs
docker compose logs -f app
Ports:
- 7860: Gradio web interface
- 8000: FastAPI (if needed)
Volumes:
- ./outputs - Model outputs
- ./checkpoints - Model checkpoints
2. CPU Training (train)
Purpose: Train models on CPU (for testing/small models)
# Start training with default config
docker compose --profile train up
# Custom training command
docker compose run --rm train python train_ultrathink.py \
--dataset wikitext \
--hidden_size 512 \
--num_layers 6 \
--batch_size 2 \
--num_epochs 3
Example - WikiText Training:
docker compose run --rm train python train_advanced.py \
--config /app/configs/train_small.yaml
3. GPU Training (train-gpu)
Purpose: Train models with GPU acceleration
Prerequisites:
- NVIDIA GPU
- NVIDIA Docker runtime
- nvidia-container-toolkit
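The prerequisites above can be sanity-checked before starting the profile. A small sketch (the nvidia-ctk CLI ships with nvidia-container-toolkit; nvidia-smi comes with the driver):

```shell
# Check that the GPU prerequisites are on the PATH before running train-gpu.
missing=0
for tool in docker nvidia-smi nvidia-ctk; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing prerequisite(s) missing"
```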
# Start GPU training
docker compose --profile train-gpu up
# Custom GPU training
docker compose run --rm train-gpu python train_ultrathink.py \
--dataset c4 --streaming \
--hidden_size 768 --num_layers 12 \
--use_amp --gradient_checkpointing \
--output_dir /app/outputs/c4_model
4. MLflow Tracking (mlflow)
Purpose: Experiment tracking and model registry
# Start MLflow server
docker compose --profile mlflow up -d
# Access UI
# http://localhost:5000
Train with MLflow tracking:
docker compose run --rm \
--env MLFLOW_TRACKING_URI=http://mlflow:5000 \
train python train_ultrathink.py \
--use_mlflow \
--dataset wikitext \
--num_epochs 3
5. Development Environment (dev)
Purpose: Interactive development with all tools
# Start dev container
docker compose --profile dev run --rm dev
# Inside container:
pytest # Run tests
python train_ultrathink.py --help
jupyter notebook --ip 0.0.0.0 --port 8888
Access Jupyter:
# docker compose run does not publish service ports by default;
# add --service-ports (or -p 8888:8888) when starting the dev container:
docker compose --profile dev run --rm --service-ports dev
# Then visit http://localhost:8888
Common Use Cases
Use Case 1: Quick Demo
# Start web interface only
docker compose up app
# Visit http://localhost:7860
Use Case 2: Training + Monitoring
# Start MLflow and training
docker compose --profile mlflow up -d
docker compose --profile train up
Use Case 3: Full Development Stack
# Start all services
docker compose --profile dev --profile mlflow up
Use Case 4: Production Training
# GPU training with checkpointing
docker compose run --rm train-gpu \
python train_advanced.py \
--config /app/configs/train_medium.yaml \
--checkpoint_frequency 1000 \
--output_dir /app/outputs/production_model
Advanced Usage
Building Specific Stages
Production image (minimal):
docker build --target production -t ultrathink:prod .
Development image (with tools):
docker build --target development -t ultrathink:dev .
Training image:
docker build --target training -t ultrathink:train .
Custom Environment Variables
Create .env file:
WANDB_API_KEY=your_api_key_here
HF_TOKEN=your_hf_token_here
MLFLOW_TRACKING_URI=http://mlflow:5000
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
Run with env file:
docker compose --env-file .env up
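Leaving the placeholder values in place is an easy mistake; a small guard sketch (it writes a sample file here for illustration; point the grep at your real .env, and note the `_here` pattern simply matches the placeholders above):

```shell
# Refuse to proceed if the env file still contains placeholder values.
cat > .env.sample <<'EOF'
WANDB_API_KEY=your_api_key_here
HF_TOKEN=your_hf_token_here
MLFLOW_TRACKING_URI=http://mlflow:5000
EOF
if grep -q '_here$' .env.sample; then
  echo "placeholders found - fill in real values before docker compose up"
fi
```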
Volume Mounting for Development
# Mount entire project (live editing)
docker run -it --rm \
-v $(pwd):/app \
ultrathink:dev bash
Multi-GPU Training
# Use specific GPUs
docker compose run --rm \
--env CUDA_VISIBLE_DEVICES=0,1,2,3 \
train-gpu python train_ultrathink.py \
--distributed \
--num_gpus 4
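The device list can be derived from a GPU count instead of typed by hand. A sketch (NUM_GPUS is a variable introduced here, not a flag of the training script):

```shell
# Build the CUDA_VISIBLE_DEVICES string 0,1,...,N-1 for N GPUs.
NUM_GPUS=4
DEVICES=$(seq -s, 0 $((NUM_GPUS - 1)))
echo "CUDA_VISIBLE_DEVICES=$DEVICES"
# Then pass it along:
# docker compose run --rm --env CUDA_VISIBLE_DEVICES=$DEVICES \
#   train-gpu python train_ultrathink.py --distributed --num_gpus $NUM_GPUS
```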
Monitoring
View Logs
# View all logs
docker compose logs
# Follow specific service
docker compose logs -f app
# Last 100 lines
docker compose logs --tail 100 train
Check Resource Usage
# Container stats
docker stats
# Specific container
docker stats ultrathink_train_gpu
Access Running Container
# Execute command in running container
docker compose exec app bash
# Run pytest in running container
docker compose exec app pytest
Cleanup
Stop Services
# Stop all services
docker compose down
# Stop and remove volumes
docker compose down -v
Remove Images
# Remove project images
docker rmi ultrathink:latest ultrathink:training ultrathink:dev
# Prune unused images
docker image prune -a
Clean Build Cache
docker builder prune -a
Troubleshooting
Issue: GPU not detected
Solution:
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# Install nvidia-container-toolkit if needed
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
Issue: Out of memory
Solution:
# Reduce batch size
docker compose run --rm train python train_ultrathink.py \
--batch_size 1 \
--gradient_accumulation_steps 32
# Use gradient checkpointing
docker compose run --rm train python train_ultrathink.py \
--gradient_checkpointing
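Why the first fix works: gradient accumulation trades memory for steps while keeping the effective batch size constant. A quick check of the arithmetic using the flags above:

```shell
# Effective batch size = per-step batch size x accumulation steps.
BATCH_SIZE=1
ACCUM_STEPS=32
echo "effective batch size: $((BATCH_SIZE * ACCUM_STEPS))"
```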
Issue: Slow training
Solution:
# Enable AMP and optimizations
docker compose run --rm train-gpu python train_ultrathink.py \
--use_amp \
--gradient_checkpointing \
--use_flash_attention
Issue: Container fails to start
Solution:
# Check logs
docker compose logs app
# Rebuild image
docker compose build --no-cache app
# Check disk space
docker system df
Security Best Practices
- Don't hardcode secrets - Use environment variables
- Use .env file - Keep secrets out of docker-compose.yml
- Limit port exposure - Only expose necessary ports
- Use specific tags - Avoid latest in production
- Scan images - Use docker scan ultrathink:latest
- Non-root user - Run containers as non-root (future enhancement)
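Until non-root images land, containers can be started as the invoking host user via docker's --user flag. A sketch (the command is printed rather than executed so it is visible; the app service name comes from the compose file above):

```shell
# Construct a docker compose invocation that runs as the current host
# user instead of root, using the caller's uid:gid.
USER_FLAG="--user $(id -u):$(id -g)"
echo "docker compose run --rm $USER_FLAG app bash"
```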
Additional Resources
Examples
Example 1: Complete Training Pipeline
# 1. Start MLflow
docker compose --profile mlflow up -d
# 2. Train model
docker compose run --rm train python train_ultrathink.py \
--dataset wikitext \
--use_mlflow \
--num_epochs 5 \
--output_dir /app/outputs/wikitext_model
# 3. Check MLflow UI
# Visit http://localhost:5000
# 4. Start inference UI
docker compose up app
# 5. Cleanup
docker compose down
Example 2: Distributed Training
# Multi-GPU training with DeepSpeed
docker compose run --rm \
--env CUDA_VISIBLE_DEVICES=0,1,2,3 \
train-gpu python train_ultrathink.py \
--distributed \
--deepspeed \
--deepspeed_config /app/configs/deepspeed_z3.json \
--dataset c4 --streaming
Example 3: Development Workflow
# Start dev container
docker compose --profile dev run --rm dev bash
# Inside container:
# 1. Run tests
pytest
# 2. Train small model
python train_ultrathink.py --dataset wikitext --num_epochs 1
# 3. Profile performance
python scripts/profile_model.py --size tiny
# 4. Exit
exit
For more information, see the main README or Advanced Training Guide.