
Running Model Optimization with Docker

This guide shows you how to run the model optimization scripts using Docker.

Prerequisites

  • Docker installed and running
  • Docker Compose (usually comes with Docker Desktop)
  • At least 8GB RAM available for Docker
  • Data file: content/cardio_train_extended.csv
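
The checklist above can be sketched as a quick preflight script. This is a minimal sketch: the data path comes from this guide, everything else is an assumption.

```python
import shutil
from pathlib import Path

def preflight(data_path="content/cardio_train_extended.csv"):
    """Return a list of problems that would block the optimization run."""
    problems = []
    # Docker must be installed and on PATH.
    if shutil.which("docker") is None:
        problems.append("docker binary not found on PATH")
    # The training data must exist before content/ is mounted into the container.
    if not Path(data_path).is_file():
        problems.append(f"data file missing: {data_path}")
    return problems

if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

(Checking available RAM portably is messier, so the script leaves that check to you.)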

Quick Start

Option 1: Using Docker Compose (Recommended)

# Build and run optimization
docker-compose -f docker-compose.optimization.yml up --build

# Run in detached mode (background)
docker-compose -f docker-compose.optimization.yml up -d --build

# View logs
docker-compose -f docker-compose.optimization.yml logs -f

# Stop when done
docker-compose -f docker-compose.optimization.yml down

Option 2: Using Docker Directly

# Build the image
docker build -f Dockerfile.optimization -t heart-optimization .

# Run optimization
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/model_assets:/app/model_assets:ro" \
  --name heart-optimization \
  heart-optimization

# Run with resource limits
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/model_assets:/app/model_assets:ro" \
  --cpus="4" \
  --memory="8g" \
  --name heart-optimization \
  heart-optimization

Running Specific Scripts

Run Model Optimization Only

docker-compose -f docker-compose.optimization.yml run --rm optimization python improve_models.py

Run Feature Analysis Only

docker-compose -f docker-compose.optimization.yml run --rm optimization python feature_importance_analysis.py

Run Comparison

docker-compose -f docker-compose.optimization.yml run --rm optimization python compare_models.py

Customization

Adjust Resource Limits

Edit docker-compose.optimization.yml:

deploy:
  resources:
    limits:
      cpus: '8'      # Use more CPUs if available
      memory: 16G    # More RAM for faster processing

Reduce Optimization Time

Edit improve_models.py before building:

n_trials = 50  # Reduce from 100 to 50 for faster results

Or override at runtime:

docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/improve_models.py:/app/improve_models.py" \
  heart-optimization python -c "
import sys
sys.path.insert(0, '/app')
# Modify n_trials here or use environment variable
exec(open('/app/improve_models.py').read().replace('n_trials = 100', 'n_trials = 50'))
"
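
The override above works by patching the script's source text before executing it. Here is the same pattern on a toy snippet (hypothetical code, not the real improve_models.py):

```python
# Source text of a hypothetical script that hard-codes a setting.
src = "n_trials = 100\nresult = n_trials * 2\n"

# Patch the literal before executing -- the same trick as the docker run above.
patched = src.replace("n_trials = 100", "n_trials = 50")

namespace = {}
exec(patched, namespace)
print(namespace["result"])  # 100, i.e. 50 * 2: the patched value is what ran
```

Note that plain string replacement is fragile: if the literal in improve_models.py ever changes, the replace silently does nothing. Reading an environment variable (next section) is more robust.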

Use Environment Variables

Create a .env file:

N_TRIALS=50
STUDY_TIMEOUT=1800

Then use it (note: the compose file must still forward these variables to the container, e.g. via an environment: or env_file: entry):

docker-compose -f docker-compose.optimization.yml --env-file .env up
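
These variables only take effect if the script reads them. improve_models.py would need something along these lines (a sketch: the variable names come from the .env example above; the defaults are assumptions):

```python
import os

# Fall back to defaults when the variables are unset.
n_trials = int(os.environ.get("N_TRIALS", "100"))
study_timeout = int(os.environ.get("STUDY_TIMEOUT", "3600"))

print(f"running {n_trials} trials with a {study_timeout}s timeout")
```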

Monitoring Progress

View Real-time Logs

# Using docker-compose
docker-compose -f docker-compose.optimization.yml logs -f

# Using docker
docker logs -f heart-optimization

Check Container Status

docker ps
docker stats heart-optimization

Results Location

All results are saved to your host machine in:

  • content/models/ - Optimized models and metrics
  • content/reports/ - Feature importance visualizations

These persist after the container stops.

Troubleshooting

Out of Memory

Error: Killed or memory errors

Solution:

  1. Reduce n_trials in improve_models.py
  2. Increase the memory limit in docker-compose.optimization.yml (or free up memory on the host)
  3. Close other applications

Build Fails

Error: Package installation fails

Solution:

# Clean build
docker-compose -f docker-compose.optimization.yml build --no-cache

Data Not Found

Error: Data file not found

Solution:

# Verify data file exists
ls -lh content/cardio_train_extended.csv

# Check volume mount
docker-compose -f docker-compose.optimization.yml config

Slow Performance

Solutions:

  1. Increase CPU allocation in docker-compose.optimization.yml
  2. Use fewer trials: n_trials = 30
  3. Run on a machine with more resources

Advanced Usage

Interactive Shell

# Get a shell in the container
docker-compose -f docker-compose.optimization.yml run --rm optimization bash

# Then run scripts manually
python improve_models.py

Run Multiple Optimizations

# Run optimization with different trial counts
for trials in 30 50 100; do
  docker run --rm \
    -v "$(pwd)/content:/app/content" \
    -e N_TRIALS=$trials \
    heart-optimization \
    python -c "import os, sys; sys.path.insert(0, '/app'); exec(open('/app/improve_models.py').read().replace('n_trials = 100', 'n_trials = ' + os.environ['N_TRIALS']))"
done

Save Container State

# Commit the container to an image
# (only possible if the container still exists, i.e. it was NOT started with --rm)
docker commit heart-optimization heart-optimization:snapshot

# Use later
docker run --rm -v "$(pwd)/content:/app/content" heart-optimization:snapshot

Performance Tips

  1. Use SSD storage - Faster I/O for data loading
  2. Allocate more CPUs - Parallel processing in Optuna
  3. Increase memory - Better for large datasets
  4. Run overnight - Let it run while you sleep
  5. Use GPU (if available) - Requires NVIDIA Docker runtime

GPU Support (Optional)

If you have an NVIDIA GPU:

# Add to docker-compose.optimization.yml
runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all

Then rebuild the image (note: the ML libraries installed in the image must themselves be GPU-enabled builds for the GPU to be used):

docker build -f Dockerfile.optimization -t heart-optimization .

Example Workflow

# 1. Build image
docker-compose -f docker-compose.optimization.yml build

# 2. Run optimization (takes 1-2 hours)
docker-compose -f docker-compose.optimization.yml up

# 3. In another terminal, check progress
docker-compose -f docker-compose.optimization.yml logs -f

# 4. When done, run feature analysis
docker-compose -f docker-compose.optimization.yml run --rm optimization \
  python feature_importance_analysis.py

# 5. Compare results
docker-compose -f docker-compose.optimization.yml run --rm optimization \
  python compare_models.py

# 6. Clean up
docker-compose -f docker-compose.optimization.yml down

Benefits of Using Docker

  • Isolation - No conflicts with your system Python
  • Reproducibility - Same environment every time
  • Resource Control - Limit CPU/memory usage
  • Easy Cleanup - Remove container when done
  • Portability - Run on any machine with Docker

Next Steps

After optimization completes:

  1. Check results in content/models/model_metrics_optimized.csv
  2. Review feature importance in content/reports/
  3. Compare with baseline using compare_models.py
  4. Deploy optimized models to your Streamlit app

Note: The optimization process can take 1-2 hours. Make sure your laptop is plugged in and won't go to sleep!