---
tags:
- tabular-classification
- sklearn
- medical
- stroke-prediction
metrics:
- recall
- precision
- f1
library_name: sklearn
model_type: stack-ensemble
---
# Stroke Risk Prediction - Stacked Ensemble
This repository contains a **Stacked Ensemble Machine Learning Model** optimized for predicting stroke risk. It was developed as part of the DVAE26 Final Project.
## Model Description
The model is a stacked ensemble of five base learners:
- Logistic Regression, two variants (L1 and L2 penalties)
- Random Forest (Balanced)
- XGBoost
- Gradient Boosting
The meta-learner is a Logistic Regression model that combines the base learners' predictions. The final classification uses a custom probability threshold, tuned for high recall (sensitivity) so that fewer stroke cases are missed.
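One way a recall-oriented threshold could be chosen is sketched below. This is an illustration only: the helper name `pick_threshold`, the target recall of 0.8, and the toy validation scores are assumptions, not taken from the project's training code.

```python
def pick_threshold(y_true, y_prob, target_recall=0.8):
    """Return the highest threshold whose recall is at least target_recall.

    Hypothetical helper: a higher threshold flags fewer patients, so the
    highest one that still meets the recall target limits false alarms.
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("y_true contains no positive cases")
    # Candidate thresholds: every distinct predicted probability, high to low.
    for threshold in sorted(set(y_prob), reverse=True):
        true_pos = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        if true_pos / positives >= target_recall:
            return threshold
    return 0.0  # only reached if target_recall is greater than 1


# Toy validation data (made up for illustration).
y_true = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
threshold = pick_threshold(y_true, y_prob, target_recall=0.8)
print(threshold)  # 0.3
```

On this toy data, lowering the threshold to 0.3 catches four of the five positive cases, at the cost of also flagging two negatives. That is the same trade-off between recall and precision that the model's tuned threshold makes.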
## Performance
- **Recall:** 80%
- **Precision:** 15.7%
- **AUC-ROC:** 0.865
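The model card metadata lists F1 among its metrics, but no F1 value is reported here. As a quick check, the F1 implied by the stated precision and recall can be computed directly. The result is derived arithmetic, not a separately reported number.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


recall = 0.80
precision = 0.157
f1 = f1_from_precision_recall(precision, recall)

# Precision of 15.7% means 1 / 0.157 patients are flagged per true stroke case.
flagged_per_true_case = 1 / precision

print(f"F1 = {f1:.3f}")  # F1 = 0.262
print(f"Flagged patients per true stroke case: {flagged_per_true_case:.1f}")  # 6.4
```

In other words, at this operating point roughly one in six flagged patients is a true stroke case, which is the price paid for the high recall.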
## How to Use
### 1. Installation
Clone this repository and install dependencies:
```bash
git clone https://huggingface.co/RealFishSam/DVAE26-proj
cd DVAE26-proj
pip install -r requirements.txt
```
### 2. Run Prediction Script
We provide a standalone script `predict.py` that loads the model and runs a prediction.
**Basic Usage (Default Sample):**
```bash
python predict.py
```
**Custom Input Usage:**
You can pass patient data as command-line arguments:
```bash
python predict.py --age 65 --bmi 28.5 --hypertension 1 --gender Female
```
Use `python predict.py --help` to see all available options.
### 3. Usage in Python
```python
import pickle

import pandas as pd
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(repo_id="RealFishSam/DVAE26-proj", filename="stacked_ensemble_model.pkl")

# Load
with open(model_path, 'rb') as f:
    components = pickle.load(f)

# Unpack
model = components['meta_model']
preprocessor = components['preprocessor']
base_models = components['base_models']

# Prepare Data (Example)
data = pd.DataFrame([{
    'gender': 'Male', 'age': 75, 'hypertension': 1, 'heart_disease': 1,
    'ever_married': 'Yes', 'work_type': 'Private', 'Residence_type': 'Urban',
    'avg_glucose_level': 220.5, 'bmi': 30.1, 'smoking_status': 'formerly smoked'
}])

# Predict
# ... (See predict.py for full stacking logic) ...
```
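The prediction step left out of the snippet above might look roughly like the following sketch. It is not the logic from `predict.py`. It assumes that `base_models` is a dict or list of fitted estimators exposing `predict_proba`, that the meta-model was trained on their positive-class probabilities in the same order, and that the tuned threshold is supplied by the caller (the `threshold` parameter and its 0.5 default are placeholders). Check `predict.py` for the actual feature layout and threshold.

```python
def stacked_predict(preprocessor, base_models, meta_model, data, threshold=0.5):
    """Hypothetical stacking inference: base probabilities -> meta-model -> label."""
    features = preprocessor.transform(data)
    # Accept either a dict of named models or a plain sequence (assumption).
    if isinstance(base_models, dict):
        models = list(base_models.values())
    else:
        models = list(base_models)
    # One column per base model: its positive-class probability for each row.
    columns = [[row[1] for row in m.predict_proba(features)] for m in models]
    meta_features = [list(row) for row in zip(*columns)]
    # The meta-model maps the stacked probabilities to a final stroke probability.
    probabilities = [row[1] for row in meta_model.predict_proba(meta_features)]
    # Apply the decision threshold to get 0/1 labels.
    labels = [int(p >= threshold) for p in probabilities]
    return probabilities, labels
```

With the objects unpacked earlier, a call would take the form `stacked_predict(preprocessor, base_models, model, data, threshold=t)`, where `t` is the recall-tuned threshold used by the project.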
## Limitations
* **Imbalanced Data:** The model is trained on a highly imbalanced dataset (only ~5% stroke cases).
* **Not a Diagnostic Tool:** This model is for educational and screening assistance purposes only. It should not replace professional medical advice.
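The class imbalance noted above is also why recall and precision are reported rather than accuracy. A small worked example, using round illustrative numbers (1,000 patients at a 5% stroke rate, not the actual dataset counts), shows that a classifier that never predicts a stroke still looks accurate:

```python
# Illustrative cohort: 1,000 patients with a 5% stroke rate (assumed round numbers).
n_patients = 1000
n_stroke = 50

# A trivial classifier that predicts "no stroke" for everyone.
true_negatives = n_patients - n_stroke
true_positives = 0

accuracy = (true_positives + true_negatives) / n_patients
recall = true_positives / n_stroke

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")  # accuracy = 0.95, recall = 0.00
```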