Kasilanka Bhoopesh Siva Srikar

Quick Start Guide: Model Improvement

Overview

This guide helps you improve your heart attack risk prediction models using advanced optimization techniques.

🐳 Docker Option (Recommended)

If you have Docker installed, this is the easiest way to run optimization:

# Simple one-command execution
./run_optimization_docker.sh

# Or with custom settings
./run_optimization_docker.sh --trials 50

# Run feature analysis
./run_optimization_docker.sh --script feature_importance_analysis.py

See DOCKER_OPTIMIZATION.md for detailed Docker instructions.


Local Installation Option

Current Performance

Your current models achieve:

  • Accuracy: ~85.1%
  • Recall: ~84.3%
  • ROC-AUC: ~92.5%

Quick Start (4 Steps)

Step 1: Install Dependencies

pip install -r requirements.txt

This will install Optuna and other required packages.

Step 2: Run Model Optimization

python improve_models.py

What this does:

  • Optimizes hyperparameters for XGBoost, CatBoost, and LightGBM using Optuna
  • Finds optimal prediction thresholds for each model
  • Optimizes ensemble weights
  • Saves improved models to content/models/

Time: ~1-2 hours (100 trials per model)

Output:

  • XGBoost_optimized.joblib
  • CatBoost_optimized.joblib
  • LightGBM_optimized.joblib
  • model_metrics_optimized.csv
  • ensemble_info_optimized.json
  • best_params_optimized.json

Step 3: Analyze Feature Importance (Optional)

python feature_importance_analysis.py

What this does:

  • Analyzes feature importance across all models
  • Performs statistical feature selection
  • Generates visualizations
  • Provides feature selection recommendations

Time: ~5-10 minutes

Output:

  • feature_selection_recommendations.json
  • feature_importance_top30.png
  • feature_correlation_top30.png
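The two analyses this step combines, model-based importance and statistical selection, can be sketched as follows. This is a hedged illustration on synthetic data with a RandomForest stand-in; the actual script pulls importances from the trained boosters and produces the plots listed above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data; feature names are placeholders for the real dataset's columns.
X, y = make_classification(n_samples=400, n_features=15, n_informative=5,
                           random_state=0)
feature_names = [f"feat_{i}" for i in range(X.shape[1])]

# Model-based importance: tree ensembles expose feature_importances_.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda pair: -pair[1])

# Statistical selection: an ANOVA F-test keeps the k strongest features.
selector = SelectKBest(f_classif, k=5).fit(X, y)
kept = [name for name, keep in zip(feature_names, selector.get_support()) if keep]

print("Top 5 by model importance:", [name for name, _ in ranked[:5]])
print("Kept by F-test:", kept)
```

Features that rank highly under both views are the safest candidates to keep; disagreements are where the script's recommendations file earns its keep.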

Step 4: Compare Results

python compare_models.py

What this does:

  • Compares baseline vs optimized models
  • Shows improvement metrics
  • Displays optimal ensemble configuration
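The comparison boils down to computing per-metric deltas between the two runs. A minimal sketch, using hypothetical numbers in line with the expected-improvement ranges below (the real script reads the saved metrics files instead):

```python
# Hypothetical baseline vs. optimized metrics; compare_models.py loads these
# from model_metrics.csv / model_metrics_optimized.csv rather than hard-coding.
baseline  = {"accuracy": 0.851, "recall": 0.843, "f1": 0.850}
optimized = {"accuracy": 0.865, "recall": 0.868, "f1": 0.862}

for metric in baseline:
    delta = optimized[metric] - baseline[metric]
    print(f"{metric:>8}: {baseline[metric]:.3f} -> "
          f"{optimized[metric]:.3f} ({delta:+.3f})")
```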

Expected Improvements

After running the optimization:

| Metric   | Current | Expected | Improvement |
|----------|---------|----------|-------------|
| Accuracy | 85.1%   | 86-87%   | +1-2%       |
| Recall   | 84.3%   | 86-87.5% | +2-4%       |
| F1 Score | 85.0%   | 86-87%   | +1-2%       |

Key Improvements Implemented

  1. Optuna Hyperparameter Optimization

    • Tree-structured Parzen Estimator (TPE)
    • 100+ trials per model
    • Expanded parameter search spaces
  2. Multi-Objective Optimization

    • Combined accuracy + recall scoring
    • Threshold optimization per model
  3. Enhanced Ensemble

    • Three-model ensemble (XGBoost + CatBoost + LightGBM)
    • Optimized weights
    • Optimized threshold
  4. Feature Analysis

    • Importance extraction
    • Statistical selection methods
    • Recommendations for feature engineering
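Items 2 and 3, per-model threshold tuning and ensemble weighting, can be sketched with a simple grid search over a validation set. This is an illustration only: two scikit-learn models stand in for the three boosters, and the grids are coarser than what the optimization script explores:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split off a validation set for tuning.
X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

# Two stand-in models (the real ensemble blends XGBoost, CatBoost, LightGBM).
p1 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]
p2 = RandomForestClassifier(random_state=1).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]

best = (0.0, 0.5, 0.5)  # (f1, weight on model 1, decision threshold)
for w in np.linspace(0, 1, 21):          # grid over ensemble weights
    blended = w * p1 + (1 - w) * p2
    for t in np.linspace(0.2, 0.8, 25):  # grid over decision thresholds
        f1 = f1_score(y_val, (blended >= t).astype(int))
        if f1 > best[0]:
            best = (f1, w, t)

print(f"Best F1={best[0]:.3f} at weight={best[1]:.2f}, threshold={best[2]:.2f}")
```

Moving the threshold below the default 0.5 is the usual lever for raising recall, which matters more than raw accuracy when missing an at-risk patient is the costly error.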

Faster Alternative

If you want quicker results at the cost of some optimization quality:

Edit improve_models.py and change:

n_trials = 100  # Change to 30-50 for faster results

Troubleshooting

Problem: Script takes too long

  • Solution: Reduce n_trials to 30-50

Problem: Memory errors

  • Solution: Reduce n_jobs or use a smaller data sample

Problem: No improvement

  • Solution: Check data preprocessing matches training data

Next Steps

  1. Run optimization scripts
  2. Compare results with baseline
  3. Test optimized models on validation set
  4. Deploy best performing model
  5. Monitor performance

Files Created

  • improve_models.py - Main optimization script
  • feature_importance_analysis.py - Feature analysis
  • compare_models.py - Comparison tool
  • IMPROVEMENTS.md - Detailed improvement analysis
  • QUICK_START.md - This guide

Questions?

See IMPROVEMENTS.md for detailed explanations of all improvements.