# Running Model Optimization with Docker

This guide shows you how to run the model optimization scripts using Docker.

## Prerequisites

- Docker installed and running
- Docker Compose (usually comes with Docker Desktop)
- At least 8GB RAM available for Docker
- Data file: `content/cardio_train_extended.csv`

## Quick Start

### Option 1: Using Docker Compose (Recommended)

```bash
# Build and run optimization
docker-compose -f docker-compose.optimization.yml up --build

# Run in detached mode (background)
docker-compose -f docker-compose.optimization.yml up -d --build

# View logs
docker-compose -f docker-compose.optimization.yml logs -f

# Stop when done
docker-compose -f docker-compose.optimization.yml down
```

### Option 2: Using Docker Directly

```bash
# Build the image
docker build -f Dockerfile.optimization -t heart-optimization .

# Run optimization
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/model_assets:/app/model_assets:ro" \
  --name heart-optimization \
  heart-optimization

# Run with resource limits
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/model_assets:/app/model_assets:ro" \
  --cpus="4" \
  --memory="8g" \
  --name heart-optimization \
  heart-optimization
```

## Running Specific Scripts

### Run Model Optimization Only

```bash
docker-compose -f docker-compose.optimization.yml run --rm optimization python improve_models.py
```

### Run Feature Analysis Only

```bash
docker-compose -f docker-compose.optimization.yml run --rm optimization python feature_importance_analysis.py
```

### Run Comparison

```bash
docker-compose -f docker-compose.optimization.yml run --rm optimization python compare_models.py
```

## Customization

### Adjust Resource Limits

Edit `docker-compose.optimization.yml`:

```yaml
deploy:
  resources:
    limits:
      cpus: '8'      # Use more CPUs if available
      memory: 16G    # More RAM for faster processing
```

### Reduce Optimization Time

Edit `improve_models.py` before building:

```python
n_trials = 50  # Reduce from 100 to 50 for faster results
```

Or override at runtime:

```bash
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/improve_models.py:/app/improve_models.py" \
  heart-optimization python -c "
import sys
sys.path.insert(0, '/app')
# Modify n_trials here or use environment variable
exec(open('/app/improve_models.py').read().replace('n_trials = 100', 'n_trials = 50'))
"
```

### Use Environment Variables

Create a `.env` file:

```env
N_TRIALS=50
STUDY_TIMEOUT=1800
```

Then use it:

```bash
docker-compose -f docker-compose.optimization.yml --env-file .env up
```

## Monitoring Progress

### View Real-time Logs

```bash
# Using docker-compose
docker-compose -f docker-compose.optimization.yml logs -f

# Using docker
docker logs -f heart-optimization
```

### Check Container Status

```bash
docker ps
docker stats heart-optimization
```

## Results Location

All results are saved to your host machine in:
- `content/models/` - Optimized models and metrics
- `content/reports/` - Feature importance visualizations

These persist after the container stops.

## Troubleshooting

### Out of Memory

**Error:** `Killed` or memory errors

**Solution:**
1. Reduce `n_trials` in `improve_models.py`
2. Reduce memory limit in docker-compose.yml
3. Close other applications

### Build Fails

**Error:** Package installation fails

**Solution:**
```bash
# Clean build
docker-compose -f docker-compose.optimization.yml build --no-cache
```

### Data Not Found

**Error:** `Data file not found`

**Solution:**
```bash
# Verify data file exists
ls -lh content/cardio_train_extended.csv

# Check volume mount
docker-compose -f docker-compose.optimization.yml config
```

### Slow Performance

**Solutions:**
1. Increase CPU allocation in docker-compose.yml
2. Use fewer trials: `n_trials = 30`
3. Run on a machine with more resources

## Advanced Usage

### Interactive Shell

```bash
# Get a shell in the container
docker-compose -f docker-compose.optimization.yml run --rm optimization bash

# Then run scripts manually
python improve_models.py
```

### Run Multiple Optimizations

```bash
# Run optimization with different trial counts
for trials in 30 50 100; do
  docker run --rm \
    -v "$(pwd)/content:/app/content" \
    -e N_TRIALS=$trials \
    heart-optimization \
    python -c "import sys; sys.path.insert(0, '/app'); exec(open('/app/improve_models.py').read().replace('n_trials = 100', f'n_trials = {trials}'))"
done
```

### Save Container State

```bash
# Commit container to image
docker commit heart-optimization heart-optimization:snapshot

# Use later
docker run --rm -v "$(pwd)/content:/app/content" heart-optimization:snapshot
```

## Performance Tips

1. **Use SSD storage** - Faster I/O for data loading
2. **Allocate more CPUs** - Parallel processing in Optuna
3. **Increase memory** - Better for large datasets
4. **Run overnight** - Let it run while you sleep
5. **Use GPU** (if available) - Requires NVIDIA Docker runtime

## GPU Support (Optional)

If you have an NVIDIA GPU:

```yaml
# Add to docker-compose.optimization.yml
runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all
```

Then build with:
```bash
docker build -f Dockerfile.optimization -t heart-optimization .
```

## Example Workflow

```bash
# 1. Build image
docker-compose -f docker-compose.optimization.yml build

# 2. Run optimization (takes 1-2 hours)
docker-compose -f docker-compose.optimization.yml up

# 3. In another terminal, check progress
docker-compose -f docker-compose.optimization.yml logs -f

# 4. When done, run feature analysis
docker-compose -f docker-compose.optimization.yml run --rm optimization \
  python feature_importance_analysis.py

# 5. Compare results
docker-compose -f docker-compose.optimization.yml run --rm optimization \
  python compare_models.py

# 6. Clean up
docker-compose -f docker-compose.optimization.yml down
```

## Benefits of Using Docker

✅ **Isolation** - No conflicts with your system Python  
✅ **Reproducibility** - Same environment every time  
✅ **Resource Control** - Limit CPU/memory usage  
✅ **Easy Cleanup** - Remove container when done  
✅ **Portability** - Run on any machine with Docker  

## Next Steps

After optimization completes:
1. Check results in `content/models/model_metrics_optimized.csv`
2. Review feature importance in `content/reports/`
3. Compare with baseline using `compare_models.py`
4. Deploy optimized models to your Streamlit app

---

**Note:** The optimization process can take 1-2 hours. Make sure your laptop is plugged in and won't go to sleep!