# Docker Usage Guide

A guide to running ULTRATHINK with Docker and Docker Compose.

## 🚀 Quick Start

### Run Web Interface (Default)
```bash
docker compose up app
# Visit http://localhost:7860
```

### Run Training (CPU)
```bash
docker compose --profile train up
```

### Run Training (GPU)
```bash
docker compose --profile train-gpu up
```

## 📦 Available Services

### 1. Web Interface (`app`)
**Purpose**: Interactive Gradio UI for model inference

```bash
# Start the web interface
docker compose up app

# Run in background
docker compose up -d app

# View logs
docker compose logs -f app
```

**Ports**:
- 7860: Gradio web interface
- 8000: FastAPI (if needed)

**Volumes**:
- `./outputs` - Model outputs
- `./checkpoints` - Model checkpoints

---

### 2. CPU Training (`train`)
**Purpose**: Train models on CPU (for testing/small models)

```bash
# Start training with default config
docker compose --profile train up

# Custom training command
docker compose run --rm train python train_ultrathink.py \
  --dataset wikitext \
  --hidden_size 512 \
  --num_layers 6 \
  --batch_size 2 \
  --num_epochs 3
```

**Example - WikiText Training**:
```bash
docker compose run --rm train python train_advanced.py \
  --config /app/configs/train_small.yaml
```

---

### 3. GPU Training (`train-gpu`)
**Purpose**: Train models with GPU acceleration

**Prerequisites**:
- NVIDIA GPU with a working driver on the host
- nvidia-container-toolkit, which provides the NVIDIA runtime for Docker

```bash
# Start GPU training
docker compose --profile train-gpu up

# Custom GPU training
docker compose run --rm train-gpu python train_ultrathink.py \
  --dataset c4 --streaming \
  --hidden_size 768 --num_layers 12 \
  --use_amp --gradient_checkpointing \
  --output_dir /app/outputs/c4_model
```
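
Before launching a long GPU job, it can help to check that Docker knows about an NVIDIA runtime. The sketch below is a rough preflight check only. `has_nvidia_runtime` is a hypothetical helper, and it assumes the toolkit registered a runtime whose name contains `nvidia`. Setups that expose GPUs another way may fail this check and still work, so the `nvidia-smi` test in Troubleshooting remains the more reliable signal.

```bash
# Hypothetical preflight check (not part of ULTRATHINK).
# Succeeds if the output of `docker info` mentions an nvidia runtime.
has_nvidia_runtime() {
  docker info 2>/dev/null | grep -qi 'nvidia'
}

# Example:
#   has_nvidia_runtime && docker compose --profile train-gpu up
```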

---

### 4. MLflow Tracking (`mlflow`)
**Purpose**: Experiment tracking and model registry

```bash
# Start MLflow server
docker compose --profile mlflow up -d

# Access UI
# http://localhost:5000
```

**Train with MLflow tracking**:
```bash
docker compose run --rm \
  --env MLFLOW_TRACKING_URI=http://mlflow:5000 \
  train python train_ultrathink.py \
    --use_mlflow \
    --dataset wikitext \
    --num_epochs 3
```
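
The hostname `mlflow` resolves only inside the Compose network, while the UI is reached from the host at `localhost:5000`. A small sketch of picking the right address is shown below. `mlflow_uri` is a hypothetical helper. It relies on the `/.dockerenv` file that Docker normally creates inside containers, which is a common convention rather than a documented guarantee.

```bash
# Hypothetical helper (not part of ULTRATHINK): choose the MLflow URL
# depending on whether the command runs inside a container or on the host.
mlflow_uri() {
  local marker="${1:-/.dockerenv}"   # file Docker normally creates in containers
  if [ -e "$marker" ]; then
    echo "http://mlflow:5000"        # service name, valid inside the Compose network
  else
    echo "http://localhost:5000"     # published port, valid on the host
  fi
}

# Example:
#   export MLFLOW_TRACKING_URI="$(mlflow_uri)"
```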

---

### 5. Development Environment (`dev`)
**Purpose**: Interactive development with all tools

```bash
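# Start dev container (--service-ports publishes the service's ports, e.g. 8888 for Jupyter;
# without it, `docker compose run` does not map them to the host)
docker compose --profile dev run --rm --service-ports dev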

# Inside container:
pytest                  # Run tests
python train_ultrathink.py --help
jupyter notebook --ip 0.0.0.0 --port 8888
```

**Access Jupyter**:
- http://localhost:8888

---

## 🎯 Common Use Cases

### Use Case 1: Quick Demo
```bash
# Start web interface only
docker compose up app
# Visit http://localhost:7860
```

### Use Case 2: Training + Monitoring
```bash
# Start MLflow and training
docker compose --profile mlflow up -d
docker compose --profile train up
```

### Use Case 3: Full Development Stack
```bash
# Start the default services plus those in the dev and mlflow profiles
docker compose --profile dev --profile mlflow up
```

### Use Case 4: Production Training
```bash
# GPU training with checkpointing
docker compose run --rm train-gpu \
  python train_advanced.py \
    --config /app/configs/train_medium.yaml \
    --checkpoint_frequency 1000 \
    --output_dir /app/outputs/production_model
```

---

## 🔧 Advanced Usage

### Building Specific Stages

**Production image (minimal)**:
```bash
docker build --target production -t ultrathink:prod .
```

**Development image (with tools)**:
```bash
docker build --target development -t ultrathink:dev .
```

**Training image**:
```bash
docker build --target training -t ultrathink:train .
```

### Custom Environment Variables

Create `.env` file:
```env
WANDB_API_KEY=your_api_key_here
HF_TOKEN=your_hf_token_here
MLFLOW_TRACKING_URI=http://mlflow:5000
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```

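Compose reads `.env` from the project directory automatically. Pass `--env-file` to be explicit or to point at a different file: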
```bash
docker compose --env-file .env up
```
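
A missing token usually surfaces only after the container has started. The sketch below checks an env file up front without printing any secret values. `check_env_keys` is a hypothetical helper, and the key names in the example are the ones listed in this section.

```bash
# Hypothetical helper (not part of ULTRATHINK): verify that each named key
# has a non-empty value in an env file, without echoing the values.
# Usage: check_env_keys FILE KEY...
check_env_keys() {
  local file="$1" missing=0 key
  shift
  for key in "$@"; do
    # Match lines of the form KEY=<at least one character>
    if ! grep -Eq "^${key}=.+" "$file" 2>/dev/null; then
      echo "Missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_env_keys .env WANDB_API_KEY HF_TOKEN && docker compose up
```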

### Volume Mounting for Development

```bash
# Mount entire project (live editing)
docker run -it --rm \
  -v "$(pwd)":/app \
  ultrathink:dev bash
```

### Multi-GPU Training

```bash
# Use specific GPUs
docker compose run --rm \
  --env CUDA_VISIBLE_DEVICES=0,1,2,3 \
  train-gpu python train_ultrathink.py \
    --distributed \
    --num_gpus 4
```
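
The device list and `--num_gpus` are easy to let drift apart. The sketch below derives the count from the list so the two stay consistent. `count_visible_gpus` is a hypothetical helper, and the variable names are illustrative.

```bash
# Hypothetical helper (not part of ULTRATHINK): count the entries in a
# comma-separated device list such as "0,1,2,3".
count_visible_gpus() {
  local list="$1" commas
  [ -n "$list" ] || { echo 0; return 0; }
  # Keep only the commas, count them, then add one for the final entry.
  commas="$(printf '%s' "$list" | tr -cd ',' | wc -c)"
  echo $((commas + 1))
}

GPUS="0,1,2,3"
NUM_GPUS="$(count_visible_gpus "$GPUS")"
echo "Using $NUM_GPUS GPUs: $GPUS"   # prints "Using 4 GPUs: 0,1,2,3"

# Example (flags as used in this guide):
#   docker compose run --rm --env CUDA_VISIBLE_DEVICES="$GPUS" \
#     train-gpu python train_ultrathink.py --distributed --num_gpus "$NUM_GPUS"
```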

---

## 📊 Monitoring

### View Logs
```bash
# View all logs
docker compose logs

# Follow specific service
docker compose logs -f app

# Last 100 lines
docker compose logs --tail 100 train
```

### Check Resource Usage
```bash
# Container stats
docker stats

# Specific container
docker stats ultrathink_train_gpu
```

### Access Running Container
```bash
# Execute command in running container
docker compose exec app bash

# Run pytest in running container
docker compose exec app pytest
```

---

## 🧹 Cleanup

### Stop Services
```bash
# Stop all services
docker compose down

# Stop and remove named and anonymous volumes (bind mounts such as ./outputs are kept)
docker compose down -v
```

### Remove Images
```bash
# Remove project images (check `docker images` first; tags depend on how the images were built)
docker rmi ultrathink:latest ultrathink:training ultrathink:dev

# Prune every image not used by a container, not just dangling ones
docker image prune -a
```

### Clean Build Cache
```bash
docker builder prune -a
```

---

## 🐛 Troubleshooting

### Issue: GPU not detected
**Solution**:
```bash
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Install nvidia-container-toolkit if needed
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
```

### Issue: Out of memory
**Solution**:
```bash
# Reduce batch size and compensate with gradient accumulation
docker compose run --rm train python train_ultrathink.py \
  --batch_size 1 \
  --gradient_accumulation_steps 32

# Use gradient checkpointing
docker compose run --rm train python train_ultrathink.py \
  --gradient_checkpointing
```
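
Lowering `--batch_size` while raising `--gradient_accumulation_steps` keeps the effective batch size the same while holding fewer samples in memory at once. The arithmetic is sketched below, assuming the training script accumulates gradients in the usual way (one optimizer step per N micro-batches). `accum_steps_for` is a hypothetical helper.

```bash
# Effective batch size = per-device batch size x accumulation steps x devices.
BATCH_SIZE=1
GRAD_ACCUM_STEPS=32
NUM_DEVICES=1   # assumption: a single CPU or GPU worker

EFFECTIVE_BATCH=$((BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES))
echo "Effective batch size: $EFFECTIVE_BATCH"   # prints "Effective batch size: 32"

# Hypothetical helper: accumulation steps needed to reach a target
# effective batch size, rounded up.
accum_steps_for() {
  local target="$1" batch="$2" devices="${3:-1}"
  echo $(( (target + batch * devices - 1) / (batch * devices) ))
}

accum_steps_for 32 2   # prints 16
```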

### Issue: Slow training
**Solution**:
```bash
# Enable AMP and FlashAttention. Gradient checkpointing is left out here:
# it saves memory but recomputes activations, which slows each step.
docker compose run --rm train-gpu python train_ultrathink.py \
  --use_amp \
  --use_flash_attention
```

### Issue: Container fails to start
**Solution**:
```bash
# Check logs
docker compose logs app

# Rebuild image
docker compose build --no-cache app

# Check disk space
docker system df
```
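
The exit code of a stopped container often narrows down the cause before the logs are read. The sketch below maps common codes to their usual meaning. `explain_exit_code` is a hypothetical helper, and `CONTAINER` in the example is a placeholder for a name or ID from `docker ps -a`.

```bash
# Hypothetical helper (not part of ULTRATHINK): map a container exit code
# to its usual meaning.
explain_exit_code() {
  case "$1" in
    0)   echo "exited normally" ;;
    125) echo "docker run itself failed (bad flag or daemon error)" ;;
    126) echo "container command could not be invoked (not executable)" ;;
    127) echo "container command not found" ;;
    137) echo "killed by SIGKILL (128+9), often the out-of-memory killer" ;;
    139) echo "segmentation fault, SIGSEGV (128+11)" ;;
    143) echo "terminated by SIGTERM (128+15), e.g. docker stop" ;;
    *)   echo "application error, check the service logs" ;;
  esac
}

# Example:
#   code="$(docker inspect --format '{{.State.ExitCode}}' CONTAINER)"
#   explain_exit_code "$code"
```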

---

## 🔐 Security Best Practices

1. **Don't hardcode secrets** - Use environment variables
2. **Use .env file** - Keep secrets out of docker-compose.yml, and keep `.env` out of version control
3. **Limit port exposure** - Only expose necessary ports
4. **Use specific tags** - Avoid `latest` in production
5. **Scan images** - Use `docker scout cves ultrathink:latest` (Docker Scout replaces the retired `docker scan` command)
6. **Non-root user** - Run containers as non-root (future enhancement)
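
The "specific tags" rule can be enforced in a deploy script. The sketch below rejects image references that resolve to `latest`, whether written out or implied by a missing tag. `is_pinned_image` is a hypothetical helper, and the image names in the example are illustrative.

```bash
# Hypothetical helper (not part of ULTRATHINK): succeed only if an image
# reference carries a tag other than "latest", or a digest.
is_pinned_image() {
  local ref="$1" name
  # A digest reference (name@sha256:...) is always pinned.
  case "$ref" in *@sha256:*) return 0 ;; esac
  # Drop the registry/namespace part, which may contain ':' for a port.
  name="${ref##*/}"
  case "$name" in
    *:latest) return 1 ;;   # explicit latest
    *:*)      return 0 ;;   # any other tag
    *)        return 1 ;;   # no tag means implicit latest
  esac
}

# Example:
#   is_pinned_image ultrathink:prod || { echo "refusing to deploy :latest" >&2; exit 1; }
```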

---

## 📚 Additional Resources

- [Docker Documentation](https://docs.docker.com/)
- [Docker Compose Reference](https://docs.docker.com/compose/compose-file/)
- [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit)
- [Main README](README.md)
- [Training Guide](TRAINING_QUICKSTART.md)

---

## 🎓 Examples

### Example 1: Complete Training Pipeline
```bash
# 1. Start MLflow
docker compose --profile mlflow up -d

# 2. Train model
docker compose run --rm train python train_ultrathink.py \
  --dataset wikitext \
  --use_mlflow \
  --num_epochs 5 \
  --output_dir /app/outputs/wikitext_model

# 3. Check MLflow UI
# Visit http://localhost:5000

# 4. Start inference UI
docker compose up app

# 5. Cleanup
docker compose down
```
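
If training fails partway through, the MLflow container from step 1 keeps running. The sketch below wraps steps 1, 2 and 5 so that cleanup runs whether training succeeds or not. `run_pipeline` is a hypothetical wrapper, and the training flags are copied from the example above.

```bash
# Hypothetical wrapper (not part of ULTRATHINK). The function body is a
# subshell, so the EXIT trap fires when the function ends, pass or fail.
run_pipeline() (
  trap 'docker compose --profile mlflow down' EXIT

  # 1. Start MLflow in the background
  docker compose --profile mlflow up -d || exit 1

  # 2. Train; a failure here still triggers the cleanup trap
  docker compose run --rm train python train_ultrathink.py \
    --dataset wikitext \
    --use_mlflow \
    --num_epochs 5 \
    --output_dir /app/outputs/wikitext_model || exit 1
)

# Example:
#   run_pipeline && echo "Training finished, outputs in ./outputs"
```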

### Example 2: Distributed Training
```bash
# Multi-GPU training with DeepSpeed
docker compose run --rm \
  --env CUDA_VISIBLE_DEVICES=0,1,2,3 \
  train-gpu python train_ultrathink.py \
    --distributed \
    --deepspeed \
    --deepspeed_config /app/configs/deepspeed_z3.json \
    --dataset c4 --streaming
```

### Example 3: Development Workflow
```bash
# Start dev container
docker compose --profile dev run --rm dev bash

# Inside container:
# 1. Run tests
pytest

# 2. Train small model
python train_ultrathink.py --dataset wikitext --num_epochs 1

# 3. Profile performance
python scripts/profile_model.py --size tiny

# 4. Exit
exit
```

---

For more information, see the [main README](README.md) or [Advanced Training Guide](ADVANCED_TRAINING_GUIDE.md).