---
tags:
- tabular-classification
- sklearn
- medical
- stroke-prediction
metrics:
- recall
- precision
- f1
library_name: sklearn
model_type: stack-ensemble
---

# Stroke Risk Prediction - Stacked Ensemble

This repository contains a **Stacked Ensemble Machine Learning Model** optimized for predicting stroke risk. It was developed as part of the DVAE26 Final Project.

## Model Description

The model is a stacked ensemble consisting of 5 base learners:

- Logistic Regression (two models, one with an L1 penalty and one with an L2 penalty)
- Random Forest (Balanced)
- XGBoost
- Gradient Boosting

The meta-learner is a Logistic Regression model that combines the base learners' predictions. Its output probability is compared against a custom threshold tuned for high recall (sensitivity) to minimize missed stroke cases.

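
The description does not say how the recall-oriented threshold was chosen. As one plausible illustration (an assumption, not the project's documented procedure), the sketch below scans the observed scores from high to low and keeps the highest threshold that still reaches a target recall. The function names, the `min_recall` parameter, and the toy labels and scores are all hypothetical.

```python
def recall_at(y_true, scores, threshold):
    """Recall when every score >= threshold is flagged as positive."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    if not positives:
        raise ValueError("y_true contains no positive cases")
    return sum(s >= threshold for s in positives) / len(positives)


def pick_threshold(y_true, scores, min_recall=0.8):
    """Return the highest observed score that achieves recall >= min_recall."""
    for t in sorted(set(scores), reverse=True):
        if recall_at(y_true, scores, t) >= min_recall:
            return t
    # Only reachable when min_recall is greater than 1.
    raise ValueError("min_recall cannot be met")


# Toy validation data (made up for illustration): 5 positives out of 10 rows.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.50, 0.60, 0.80, 0.02]

threshold = pick_threshold(y_true, scores, min_recall=0.8)
print(threshold)  # 0.3
```

Picking the highest qualifying threshold keeps as many negatives as possible below the cut, which limits false alarms for a given recall target.
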
## Performance

- **Recall:** 80%
- **Precision:** 15.7%
- **AUC-ROC:** 0.865

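
To make the trade-off in these numbers concrete, the short calculation below derives two quantities from the reported recall and precision: the F1 score they imply, and the number of false alarms per correctly flagged case. It assumes both figures were measured on the same data at the same threshold, which the card does not state explicitly.

```python
recall = 0.80      # reported above
precision = 0.157  # reported above

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Of all flagged patients, a fraction (1 - precision) are false alarms,
# so this is the ratio of false positives to true positives.
false_alarms_per_hit = (1 - precision) / precision

print(f"Implied F1: {f1:.3f}")                                 # Implied F1: 0.262
print(f"False alarms per true positive: {false_alarms_per_hit:.1f}")  # 5.4
```

In other words, at this operating point roughly five patients are flagged incorrectly for each stroke case that is caught.
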
## How to Use

### 1. Installation

Clone this repository and install dependencies:

```bash
git clone https://huggingface.co/RealFishSam/DVAE26-proj
cd DVAE26-proj
pip install -r requirements.txt
```

### 2. Run Prediction Script

We provide a standalone script `predict.py` that loads the model and runs a prediction.

**Basic Usage (Default Sample):**

```bash
python predict.py
```

**Custom Input Usage:**

You can pass patient data as command-line arguments:

```bash
python predict.py --age 65 --bmi 28.5 --hypertension 1 --gender Female
```

Use `python predict.py --help` to see all available options.

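
If you want to drive the script from Python, for example to score several patients in a loop, a small helper can build the command line from a dictionary. This is a sketch: `build_command` is a hypothetical helper, and it assumes each dictionary key matches a flag name. Only the four flags shown in the example above are confirmed here, so check `--help` before adding others.

```python
import subprocess
import sys


def build_command(patient, script="predict.py"):
    """Turn {'age': 65, ...} into [python, 'predict.py', '--age', '65', ...]."""
    cmd = [sys.executable, script]
    for name, value in patient.items():
        cmd += [f"--{name}", str(value)]
    return cmd


patient = {"age": 65, "bmi": 28.5, "hypertension": 1, "gender": "Female"}
command = build_command(patient)
print(" ".join(command[1:]))
# predict.py --age 65 --bmi 28.5 --hypertension 1 --gender Female

# To actually run it, execute from inside the cloned repository:
# subprocess.run(command, check=True)
```
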
### 3. Usage in Python

```python
import pickle

import pandas as pd
from huggingface_hub import hf_hub_download

# Download the pickled model bundle from the Hugging Face Hub
model_path = hf_hub_download(repo_id="RealFishSam/DVAE26-proj", filename="stacked_ensemble_model.pkl")

# Load the bundle (a mapping of named components)
with open(model_path, 'rb') as f:
    components = pickle.load(f)

# Unpack the pieces needed for stacked inference
model = components['meta_model']
preprocessor = components['preprocessor']
base_models = components['base_models']

# Prepare Data (Example): one patient per row
data = pd.DataFrame([{
    'gender': 'Male', 'age': 75, 'hypertension': 1, 'heart_disease': 1,
    'ever_married': 'Yes', 'work_type': 'Private', 'Residence_type': 'Urban',
    'avg_glucose_level': 220.5, 'bmi': 30.1, 'smoking_status': 'formerly smoked'
}])

# Predict
# ... (See predict.py for full stacking logic) ...
```

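
The prediction step is left to `predict.py`, which remains the authoritative reference. For orientation only, here is a generic sketch of how stacked inference usually works: transform the input, collect each base learner's positive-class probability, feed those as features to the meta-learner, and apply a threshold. Everything here is an assumption about the bundle: that `base_models` is a dict or list of fitted estimators exposing `predict_proba`, that the meta-learner expects one column per base model in that order, and that `0.5` stands in for the custom threshold, whose actual value and location are not shown in this card. `stacked_predict` and `positive_proba` are hypothetical names.

```python
def positive_proba(model, X):
    """Return P(class 1) for each row; works with NumPy arrays or nested lists."""
    return [float(row[1]) for row in model.predict_proba(X)]


def stacked_predict(preprocessor, base_models, meta_model, data, threshold=0.5):
    """Generic stacking inference (assumed layout, not the project's exact logic)."""
    # 1. Apply the fitted preprocessing to the raw input rows.
    X = preprocessor.transform(data)

    # 2. Accept either a dict of name -> model or a plain sequence of models.
    if isinstance(base_models, dict):
        models = list(base_models.values())
    else:
        models = list(base_models)

    # 3. One column of positive-class probabilities per base model.
    columns = [positive_proba(m, X) for m in models]

    # 4. Transpose so each row holds one sample's base-model probabilities.
    meta_features = [list(row) for row in zip(*columns)]

    # 5. The meta-learner turns those into a final probability.
    probs = positive_proba(meta_model, meta_features)

    # 6. Flag a sample when its probability reaches the threshold.
    labels = [int(p >= threshold) for p in probs]
    return probs, labels
```

With the names from the snippet above, a call would look like `stacked_predict(preprocessor, base_models, model, data)`, passing the project's own threshold once you have located it in `predict.py`.
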
## Limitations

* **Imbalanced Data:** The model is trained on a highly imbalanced dataset (only ~5% stroke cases).
* **Not a Diagnostic Tool:** This model is for educational and screening assistance purposes only. It should not replace professional medical advice.