---
tags:
- tabular-classification
- sklearn
- medical
- stroke-prediction
metrics:
- recall
- precision
- f1
library_name: sklearn
model_type: stack-ensemble
---

# Stroke Risk Prediction - Stacked Ensemble

This repository contains a **Stacked Ensemble Machine Learning Model** optimized for predicting stroke risk. It was developed as part of the DVAE26 Final Project.

## Model Description

The model is a stacked ensemble consisting of 5 base learners:

- Logistic Regression (L1 & L2 penalties)
- Random Forest (Balanced)
- XGBoost
- Gradient Boosting

The meta-learner is a Logistic Regression model that aggregates these predictions. The model includes a custom probability threshold optimized for high recall (sensitivity) to minimize missed stroke cases.

## Performance

- **Recall:** 80%
- **Precision:** 15.7%
- **AUC-ROC:** 0.865

## How to Use

### 1. Installation

Clone this repository and install dependencies:

```bash
git clone https://huggingface.co/RealFishSam/DVAE26-proj
cd DVAE26-proj
pip install -r requirements.txt
```

### 2. Run Prediction Script

We provide a standalone script `predict.py` that loads the model and runs a prediction.

**Basic Usage (Default Sample):**

```bash
python predict.py
```

**Custom Input Usage:**

You can pass patient data as command-line arguments:

```bash
python predict.py --age 65 --bmi 28.5 --hypertension 1 --gender Female
```

Use `python predict.py --help` to see all available options.

### 3. Usage in Python

```python
import pickle
import pandas as pd
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="RealFishSam/DVAE26-proj",
    filename="stacked_ensemble_model.pkl",
)

# Load
with open(model_path, 'rb') as f:
    components = pickle.load(f)

# Unpack
model = components['meta_model']
preprocessor = components['preprocessor']
base_models = components['base_models']

# Prepare Data (Example)
data = pd.DataFrame([{
    'gender': 'Male',
    'age': 75,
    'hypertension': 1,
    'heart_disease': 1,
    'ever_married': 'Yes',
    'work_type': 'Private',
    'Residence_type': 'Urban',
    'avg_glucose_level': 220.5,
    'bmi': 30.1,
    'smoking_status': 'formerly smoked'
}])

# Predict
# ... (See predict.py for full stacking logic) ...
```

## Limitations

* **Imbalanced Data:** The model is trained on a highly imbalanced dataset (only ~5% stroke cases).
* **Not a Diagnostic Tool:** This model is for educational and screening assistance purposes only. It should not replace professional medical advice.
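
To make the class imbalance and the reported recall and precision more concrete, the sketch below works out the confusion-matrix counts those figures would imply for a cohort of 1,000 patients. It is illustrative arithmetic only: the helper `confusion_counts` is a hypothetical name and not part of this repository, and it assumes the evaluation data has roughly the same ~5% stroke prevalence as the training data, which the card does not state.

```python
def confusion_counts(n_patients, prevalence, recall, precision):
    """Expected confusion-matrix counts implied by prevalence, recall and precision.

    Hypothetical helper for illustration; not part of predict.py.
    """
    positives = n_patients * prevalence      # patients who actually have a stroke
    negatives = n_patients - positives       # patients who do not
    tp = positives * recall                  # stroke cases the model flags
    fn = positives - tp                      # stroke cases the model misses
    flagged = tp / precision                 # all flagged patients (TP + FP)
    fp = flagged - tp                        # healthy patients flagged by mistake
    tn = negatives - fp                      # healthy patients correctly cleared
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn, "flagged": flagged}


# Figures from the Performance section; the 5% prevalence is an assumption
# carried over from the training-set description.
counts = confusion_counts(n_patients=1000, prevalence=0.05,
                          recall=0.80, precision=0.157)

for name, value in counts.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, the model would flag roughly 255 of the 1,000 patients in order to catch 40 of the 50 expected stroke cases, missing about 10. This is the trade-off that comes with a threshold tuned for high recall on imbalanced data: few missed cases, at the cost of many false alarms.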