# Running Model Optimization with Docker
This guide shows you how to run the model optimization scripts using Docker.
## Prerequisites
- Docker installed and running
- Docker Compose (usually comes with Docker Desktop)
- At least 8GB RAM available for Docker
- Data file: `content/cardio_train_extended.csv`
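Before building anything, it can help to sanity-check these prerequisites with a short script. This is a minimal sketch: the data path mirrors the list above, and `shutil.which` only confirms the `docker` binary is on PATH, not that the daemon is actually running.

```python
import shutil
from pathlib import Path


def check_prereqs(data_path="content/cardio_train_extended.csv"):
    """Return a list of prerequisite problems (an empty list means all good)."""
    problems = []
    # Checks only that the docker CLI is installed, not that the daemon is up
    if shutil.which("docker") is None:
        problems.append("docker binary not found on PATH")
    if not Path(data_path).is_file():
        problems.append(f"data file missing: {data_path}")
    return problems


if __name__ == "__main__":
    for problem in check_prereqs():
        print("MISSING:", problem)
```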
## Quick Start
### Option 1: Using Docker Compose (Recommended)
```bash
# Build and run optimization
docker-compose -f docker-compose.optimization.yml up --build
# Run in detached mode (background)
docker-compose -f docker-compose.optimization.yml up -d --build
# View logs
docker-compose -f docker-compose.optimization.yml logs -f
# Stop when done
docker-compose -f docker-compose.optimization.yml down
```
### Option 2: Using Docker Directly
```bash
# Build the image
docker build -f Dockerfile.optimization -t heart-optimization .
# Run optimization
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/model_assets:/app/model_assets:ro" \
  --name heart-optimization \
  heart-optimization
# Run with resource limits
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/model_assets:/app/model_assets:ro" \
  --cpus="4" \
  --memory="8g" \
  --name heart-optimization \
  heart-optimization
```
## Running Specific Scripts
### Run Model Optimization Only
```bash
docker-compose -f docker-compose.optimization.yml run --rm optimization python improve_models.py
```
### Run Feature Analysis Only
```bash
docker-compose -f docker-compose.optimization.yml run --rm optimization python feature_importance_analysis.py
```
### Run Comparison
```bash
docker-compose -f docker-compose.optimization.yml run --rm optimization python compare_models.py
```
## Customization
### Adjust Resource Limits
Edit `docker-compose.optimization.yml`:
```yaml
deploy:
  resources:
    limits:
      cpus: '8'    # Use more CPUs if available
      memory: 16G  # More RAM for faster processing
```
### Reduce Optimization Time
Edit `improve_models.py` before building:
```python
n_trials = 50 # Reduce from 100 to 50 for faster results
```
Or override at runtime:
```bash
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/improve_models.py:/app/improve_models.py" \
  heart-optimization python -c "
import sys
sys.path.insert(0, '/app')
# Modify n_trials here, or read it from an environment variable
exec(open('/app/improve_models.py').read().replace('n_trials = 100', 'n_trials = 50'))
"
```
### Use Environment Variables
Create a `.env` file:
```env
N_TRIALS=50
STUDY_TIMEOUT=1800
```
Then use it:
```bash
docker-compose -f docker-compose.optimization.yml --env-file .env up
```
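For the `.env` values to have any effect, the optimization script itself has to read them; the scripts in this repo may not do that out of the box, so the following is a hedged sketch of how `improve_models.py` could pick up `N_TRIALS` and `STUDY_TIMEOUT` with the defaults used elsewhere in this guide.

```python
import os

# Read tuning knobs from the environment, falling back to the defaults
# used elsewhere in this guide (100 trials, no time limit).
n_trials = int(os.environ.get("N_TRIALS", "100"))
study_timeout = int(os.environ.get("STUDY_TIMEOUT", "0")) or None  # seconds; None = no limit

# These values would then be passed to Optuna, e.g.:
# study.optimize(objective, n_trials=n_trials, timeout=study_timeout)
```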
## Monitoring Progress
### View Real-time Logs
```bash
# Using docker-compose
docker-compose -f docker-compose.optimization.yml logs -f
# Using docker
docker logs -f heart-optimization
```
### Check Container Status
```bash
docker ps
docker stats heart-optimization
```
## Results Location
All results are saved to your host machine in:
- `content/models/` - Optimized models and metrics
- `content/reports/` - Feature importance visualizations
These persist after the container stops.
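To confirm the volume mounts worked after a run, you can enumerate what landed in those two directories from the host. The directory names follow the list above; the exact file names inside them depend on what the scripts write.

```python
from pathlib import Path


def list_results(base="content"):
    """Map each expected results directory to a sorted list of its file names."""
    results = {}
    for sub in ("models", "reports"):
        d = Path(base) / sub
        # None signals a directory that was never created (e.g. the run failed early)
        results[sub] = sorted(p.name for p in d.iterdir()) if d.is_dir() else None
    return results


if __name__ == "__main__":
    for sub, files in list_results().items():
        print(sub, "->", files if files is not None else "directory not found")
```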
## Troubleshooting
### Out of Memory
**Error:** the container exits with `Killed` or other out-of-memory errors

**Solution:**
1. Reduce `n_trials` in `improve_models.py`
2. Increase the memory limit in `docker-compose.optimization.yml` (and in Docker Desktop's resource settings, if applicable)
3. Close other applications to free host memory
### Build Fails
**Error:** Package installation fails
**Solution:**
```bash
# Clean build
docker-compose -f docker-compose.optimization.yml build --no-cache
```
### Data Not Found
**Error:** `Data file not found`
**Solution:**
```bash
# Verify data file exists
ls -lh content/cardio_train_extended.csv
# Check volume mount
docker-compose -f docker-compose.optimization.yml config
```
### Slow Performance
**Solutions:**
1. Increase the CPU allocation in `docker-compose.optimization.yml`
2. Use fewer trials: `n_trials = 30`
3. Run on a machine with more resources
## Advanced Usage
### Interactive Shell
```bash
# Get a shell in the container
docker-compose -f docker-compose.optimization.yml run --rm optimization bash
# Then run scripts manually
python improve_models.py
```
### Run Multiple Optimizations
```bash
# Run optimization with different trial counts
for trials in 30 50 100; do
  docker run --rm \
    -v "$(pwd)/content:/app/content" \
    -e N_TRIALS=$trials \
    heart-optimization \
    python -c "import sys; sys.path.insert(0, '/app'); exec(open('/app/improve_models.py').read().replace('n_trials = 100', 'n_trials = $trials'))"
done
```
### Save Container State
```bash
# Commit the container to an image while it is still running
# (containers started with --rm are removed on exit, so commit before it finishes)
docker commit heart-optimization heart-optimization:snapshot
# Use later
docker run --rm -v "$(pwd)/content:/app/content" heart-optimization:snapshot
```
## Performance Tips
1. **Use SSD storage** - Faster I/O for data loading
2. **Allocate more CPUs** - Parallel processing in Optuna
3. **Increase memory** - Better for large datasets
4. **Run overnight** - Let it run while you sleep
5. **Use GPU** (if available) - Requires NVIDIA Docker runtime
## GPU Support (Optional)
If you have an NVIDIA GPU:
```yaml
# Add to the service definition in docker-compose.optimization.yml
runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all
```
`runtime` and `environment` are applied when the container starts, so no rebuild is needed; bring the service up as usual:
```bash
docker-compose -f docker-compose.optimization.yml up
```
## Example Workflow
```bash
# 1. Build image
docker-compose -f docker-compose.optimization.yml build
# 2. Run optimization (takes 1-2 hours)
docker-compose -f docker-compose.optimization.yml up
# 3. In another terminal, check progress
docker-compose -f docker-compose.optimization.yml logs -f
# 4. When done, run feature analysis
docker-compose -f docker-compose.optimization.yml run --rm optimization \
python feature_importance_analysis.py
# 5. Compare results
docker-compose -f docker-compose.optimization.yml run --rm optimization \
python compare_models.py
# 6. Clean up
docker-compose -f docker-compose.optimization.yml down
```
## Benefits of Using Docker
- **Isolation** - No conflicts with your system Python
- **Reproducibility** - The same environment every time
- **Resource Control** - Limit CPU and memory usage
- **Easy Cleanup** - Remove the container when done
- **Portability** - Runs on any machine with Docker
## Next Steps
After optimization completes:
1. Check results in `content/models/model_metrics_optimized.csv`
2. Review feature importance in `content/reports/`
3. Compare with baseline using `compare_models.py`
4. Deploy optimized models to your Streamlit app
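As a quick first look at step 1, the metrics CSV can be inspected with the standard library. The file path comes from the list above; its column layout depends on what `improve_models.py` writes, so the reader below makes no assumptions about column names.

```python
import csv
from pathlib import Path


def load_metrics(path="content/models/model_metrics_optimized.csv"):
    """Read the metrics CSV into a list of row dicts, one per model."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    p = Path("content/models/model_metrics_optimized.csv")
    if p.is_file():
        rows = load_metrics(p)
        print(f"{len(rows)} rows, columns: {list(rows[0]) if rows else []}")
```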
---
**Note:** The optimization process can take 1-2 hours. Make sure your laptop is plugged in and won't go to sleep!