# Quick Start Guide: Model Improvement
## Overview
This guide helps you improve your heart attack risk prediction models using advanced optimization techniques.
## 🐳 Docker Option (Recommended)
If you have Docker installed, this is the easiest way to run the optimization:
```bash
# Simple one-command execution
./run_optimization_docker.sh
# Or with custom settings
./run_optimization_docker.sh --trials 50
# Run feature analysis
./run_optimization_docker.sh --script feature_importance_analysis.py
```
See [DOCKER_OPTIMIZATION.md](DOCKER_OPTIMIZATION.md) for detailed Docker instructions.
---
## Local Installation Option
## Current Performance
Your current models achieve:
- **Accuracy:** ~85.1%
- **Recall:** ~84.3%
- **ROC-AUC:** ~92.5%
## Quick Start (4 Steps)
### Step 1: Install Dependencies
```bash
pip install -r requirements.txt
```
This will install Optuna and other required packages.
### Step 2: Run Model Optimization
```bash
python improve_models.py
```
**What this does:**
- Optimizes hyperparameters for XGBoost, CatBoost, and LightGBM using Optuna
- Finds optimal prediction thresholds for each model
- Optimizes ensemble weights
- Saves improved models to `content/models/`
**Time:** ~1-2 hours (100 trials per model)
**Output:**
- `XGBoost_optimized.joblib`
- `CatBoost_optimized.joblib`
- `LightGBM_optimized.joblib`
- `model_metrics_optimized.csv`
- `ensemble_info_optimized.json`
- `best_params_optimized.json`
### Step 3: Analyze Feature Importance (Optional)
```bash
python feature_importance_analysis.py
```
**What this does:**
- Analyzes feature importance across all models
- Performs statistical feature selection
- Generates visualizations
- Provides feature selection recommendations
**Time:** ~5-10 minutes
**Output:**
- `feature_selection_recommendations.json`
- `feature_importance_top30.png`
- `feature_correlation_top30.png`
### Step 4: Compare Results
```bash
python compare_models.py
```
**What this does:**
- Compares baseline vs optimized models
- Shows improvement metrics
- Displays optimal ensemble configuration
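As a rough illustration of the comparison, the sketch below computes per-metric changes in percentage points. It is not the logic of `compare_models.py`; the function name and dictionary layout are assumptions, and the optimized values are placeholders taken from the lower end of the expected range, not measured results.

```python
def compare_metrics(baseline: dict, optimized: dict) -> dict:
    """Return the change in percentage points for each shared metric."""
    return {
        name: round((optimized[name] - baseline[name]) * 100, 2)
        for name in baseline
        if name in optimized
    }


# Baseline values come from this guide; the optimized values are
# placeholders (lower end of the expected range), not measured results.
baseline = {"accuracy": 0.851, "recall": 0.843}
optimized = {"accuracy": 0.860, "recall": 0.860}

deltas = compare_metrics(baseline, optimized)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} percentage points")
```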
## Expected Improvements
After running the optimization:
| Metric | Current | Expected | Improvement (percentage points) |
|--------|---------|----------|---------------------------------|
| Accuracy | 85.1% | 86-87% | +1-2 |
| Recall | 84.3% | 86-87.5% | +2-3 |
| F1 Score | 85.0% | 86-87% | +1-2 |
## Key Improvements Implemented
1. **Optuna Hyperparameter Optimization**
   - Tree-structured Parzen Estimator (TPE)
   - 100+ trials per model
   - Expanded parameter search spaces
2. **Multi-Objective Optimization**
   - Combined accuracy + recall scoring
   - Threshold optimization per model
3. **Enhanced Ensemble**
   - Three-model ensemble (XGBoost + CatBoost + LightGBM)
   - Optimized weights
   - Optimized threshold
4. **Feature Analysis**
   - Importance extraction
   - Statistical selection methods
   - Recommendations for feature engineering
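The threshold and ensemble-weight ideas in items 2 and 3 can be sketched in plain Python. This is a simplified stand-in, not the repository's implementation: the equal weighting of accuracy and recall, the coarse grid search, and the toy probabilities are all assumptions made for illustration.

```python
from itertools import product


def combined_score(y_true, y_prob, threshold):
    """Average of accuracy and recall at a threshold (assumed 50/50 weighting)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
    positives = sum(y_true)
    true_pos = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))
    recall = true_pos / positives if positives else 0.0
    return 0.5 * accuracy + 0.5 * recall


def best_threshold(y_true, y_prob):
    """Scan thresholds 0.05..0.95 and keep the one with the best score."""
    candidates = [i / 100 for i in range(5, 100, 5)]
    return max(candidates, key=lambda t: combined_score(y_true, y_prob, t))


def best_weights(y_true, model_probs, threshold=0.5, step=0.1):
    """Grid-search non-negative weights summing to 1 across three models."""
    steps = round(1 / step)
    best = None
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        w = (i * step, j * step, (steps - i - j) * step)
        # Blend the three models' probabilities sample by sample.
        blended = [
            sum(wk * pk for wk, pk in zip(w, probs))
            for probs in zip(*model_probs)
        ]
        score = combined_score(y_true, blended, threshold)
        if best is None or score > best[0]:
            best = (score, w)
    return best[1]


# Toy labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
model_probs = [
    [0.2, 0.6, 0.7, 0.4, 0.1, 0.9],  # stand-in for XGBoost
    [0.3, 0.4, 0.6, 0.6, 0.2, 0.8],  # stand-in for CatBoost
    [0.1, 0.5, 0.8, 0.3, 0.3, 0.7],  # stand-in for LightGBM
]
threshold = best_threshold(y_true, model_probs[1])
weights = best_weights(y_true, model_probs)
print("threshold:", threshold, "weights:", weights)
```

A grid search is used here only because it is easy to read; thresholds and weights tuned this way should be chosen on validation data rather than on the test set.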
## Faster Alternative
For faster but less thoroughly tuned results, edit `improve_models.py` and lower the trial count:
```python
n_trials = 100 # Change to 30-50 for faster results
```
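As a back-of-the-envelope estimate, the runtime can be scaled from the documented 1-2 hours for 100 trials. The sketch assumes runtime grows linearly with the number of trials, which is only an approximation; the helper name is hypothetical.

```python
def estimate_minutes(n_trials, baseline_trials=100, baseline_minutes=(60, 120)):
    """Scale the documented 1-2 hour estimate linearly with the trial count."""
    low, high = baseline_minutes
    factor = n_trials / baseline_trials
    return low * factor, high * factor


for n in (30, 50, 100):
    low, high = estimate_minutes(n)
    print(f"{n} trials: roughly {low:.0f}-{high:.0f} minutes")
```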
## Troubleshooting
**Problem:** Script takes too long
- **Solution:** Reduce `n_trials` to 30-50
**Problem:** Memory errors
- **Solution:** Reduce `n_jobs` or use a smaller data sample
**Problem:** No improvement
- **Solution:** Check that the data preprocessing matches the preprocessing used for the training data
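One quick way to spot a preprocessing mismatch is to compare the feature columns seen at training time with those being fed to the model. The sketch below is a generic check, not part of the repository's scripts; the function and the column names are hypothetical.

```python
def check_feature_alignment(training_features, input_features):
    """Report columns that are missing, unexpected, or out of order."""
    train_set, input_set = set(training_features), set(input_features)
    return {
        "missing": sorted(train_set - input_set),
        "unexpected": sorted(input_set - train_set),
        "order_matches": list(training_features) == list(input_features),
    }


# Hypothetical column names, for illustration only.
training_features = ["age", "cholesterol", "resting_bp"]
input_features = ["age", "resting_bp", "smoker"]

report = check_feature_alignment(training_features, input_features)
print(report)
```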
## Next Steps
1. Run optimization scripts
2. Compare results with baseline
3. Test the optimized models on a validation set
4. Deploy best performing model
5. Monitor performance
## Files Created
- `improve_models.py` - Main optimization script
- `feature_importance_analysis.py` - Feature analysis
- `compare_models.py` - Comparison tool
- `IMPROVEMENTS.md` - Detailed improvement analysis
- `QUICK_START.md` - This guide
## Questions?
See `IMPROVEMENTS.md` for detailed explanations of all improvements.