---
tags:
- tabular-classification
- sklearn
- medical
- stroke-prediction
metrics:
- recall
- precision
- f1
library_name: sklearn
model_type: stack-ensemble
---

# Stroke Risk Prediction - Stacked Ensemble

This repository contains a **Stacked Ensemble Machine Learning Model** optimized for predicting stroke risk. It was developed as part of the DVAE26 Final Project.

## Model Description
The model is a stacked ensemble of five base learners:
- Logistic Regression (L1 penalty)
- Logistic Regression (L2 penalty)
- Random Forest (Balanced)
- XGBoost
- Gradient Boosting
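If you want to reproduce a comparable architecture, the sketch below expresses it with scikit-learn's `StackingClassifier`. It is not the project's training code. The hyperparameters, the reading of "Balanced" as `class_weight="balanced"`, the 5-fold `cv`, and the use of predicted probabilities as meta-features are all assumptions. XGBoost is left as a comment so the snippet runs with scikit-learn alone.

```python
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression


def build_stack(random_state: int = 42) -> StackingClassifier:
    """Assumed reconstruction of the five-learner stack (not the authors' code)."""
    base_learners = [
        # The L1 penalty needs a solver that supports it, e.g. liblinear or saga.
        ("lr_l1", LogisticRegression(penalty="l1", solver="liblinear", max_iter=1000)),
        ("lr_l2", LogisticRegression(penalty="l2", max_iter=1000)),
        # "Balanced" interpreted here as class_weight="balanced" (assumption).
        ("rf", RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                      random_state=random_state)),
        # ("xgb", xgboost.XGBClassifier(...)),  # requires the xgboost package
        ("gb", GradientBoostingClassifier(random_state=random_state)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # logistic-regression meta-learner
        stack_method="predict_proba",  # feed base-model probabilities to the meta-learner
        cv=5,  # out-of-fold predictions are used to train the meta-learner
    )
```

`StackingClassifier` trains the meta-learner on out-of-fold base-model outputs, which avoids leaking training labels into the meta-features. A hand-rolled stack, like the one the pickled `base_models` / `meta_model` components suggest, needs the same care.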

The meta-learner is a Logistic Regression model that aggregates the base learners' predictions. The model also applies a custom probability threshold, tuned for high recall (sensitivity), to minimize missed stroke cases.
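This card does not say how the threshold was chosen or what its value is. One common approach, sketched here as an assumption, is to pick the highest probability cut-off on a validation set that still reaches a target recall, using scikit-learn's `precision_recall_curve`. The helper names `threshold_for_recall` and `apply_threshold` and the 0.80 default target are illustrative.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def threshold_for_recall(y_true, proba, target_recall=0.80):
    """Return the highest threshold whose recall is >= target_recall.

    Hypothetical helper: a prediction is positive when proba >= threshold.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, proba)
    # recall has one more entry than thresholds (the final point has no threshold),
    # and it is non-increasing as the threshold increases.
    ok = np.flatnonzero(recall[:-1] >= target_recall)
    if ok.size == 0:
        raise ValueError("target recall is not reachable")
    return float(thresholds[ok[-1]])


def apply_threshold(proba, threshold):
    """Binary labels: 1 (stroke risk flagged) when proba >= threshold."""
    return (np.asarray(proba) >= threshold).astype(int)
```

Choosing the highest threshold that meets the recall target keeps as many true negatives as possible, which limits false alarms. It also explains why precision stays low on a rare outcome.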

## Performance
- **Recall:** 80%
- **Precision:** 15.7%
- **AUC-ROC:** 0.865
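Because strokes are rare, a precision of 15.7% is easier to interpret as counts. The sketch below converts the reported recall and precision into expected outcomes per 1,000 screened patients. It assumes that the ~5% prevalence mentioned under Limitations also holds for the screened population, which will not be true everywhere.

```python
def expected_counts(n_patients, prevalence, recall, precision):
    """Expected confusion-matrix counts implied by recall and precision."""
    positives = n_patients * prevalence   # patients who will have a stroke
    tp = recall * positives               # strokes correctly flagged
    fn = positives - tp                   # strokes missed
    flagged = tp / precision              # all patients flagged as high risk
    fp = flagged - tp                     # false alarms
    tn = n_patients - positives - fp      # correctly cleared patients
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn, "flagged": flagged}


counts = expected_counts(1000, prevalence=0.05, recall=0.80, precision=0.157)
for name, value in counts.items():
    print(f"{name}: {value:.0f}")
```

With these inputs, about 255 of 1,000 patients are flagged: roughly 40 true stroke cases and 215 false alarms, with about 10 strokes missed.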

## How to Use

### 1. Installation
Clone this repository and install dependencies:
```bash
git clone https://huggingface.co/RealFishSam/DVAE26-proj
cd DVAE26-proj
pip install -r requirements.txt
```

### 2. Run Prediction Script
We provide a standalone script `predict.py` that loads the model and runs a prediction.

**Basic Usage (Default Sample):**
```bash
python predict.py
```

**Custom Input Usage:**
You can pass patient data as command-line arguments:
```bash
python predict.py --age 65 --bmi 28.5 --hypertension 1 --gender Female
```
Use `python predict.py --help` to see all available options.
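The following is a hypothetical sketch of how flags like these could be turned into the one-row `DataFrame` that the preprocessor expects. It is not the contents of `predict.py`. Only `--age`, `--bmi`, `--hypertension`, and `--gender` come from this README. The fallback values are copied from the Python example below and are assumptions, not the script's actual default sample.

```python
import argparse

import pandas as pd

# Hypothetical fallback patient (copied from the Python usage example); not predict.py's defaults.
FALLBACK = {
    "gender": "Male", "age": 75, "hypertension": 1, "heart_disease": 1,
    "ever_married": "Yes", "work_type": "Private", "Residence_type": "Urban",
    "avg_glucose_level": 220.5, "bmi": 30.1, "smoking_status": "formerly smoked",
}


def parse_patient(argv=None) -> pd.DataFrame:
    """Build a single-row DataFrame from the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Stroke risk prediction (sketch)")
    parser.add_argument("--age", type=float)
    parser.add_argument("--bmi", type=float)
    parser.add_argument("--hypertension", type=int, choices=[0, 1])
    parser.add_argument("--gender", type=str)
    args = parser.parse_args(argv)
    row = dict(FALLBACK)
    # Override only the fields the user actually supplied.
    row.update({k: v for k, v in vars(args).items() if v is not None})
    return pd.DataFrame([row])
```

For example, `parse_patient(["--age", "65", "--bmi", "28.5", "--hypertension", "1", "--gender", "Female"])` mirrors the command above and keeps the fallback values for every other column.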

### 3. Usage in Python
```python
import pickle
import pandas as pd
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(repo_id="RealFishSam/DVAE26-proj", filename="stacked_ensemble_model.pkl")

# Load
with open(model_path, 'rb') as f:
    components = pickle.load(f)

# Unpack
model = components['meta_model']
preprocessor = components['preprocessor']
base_models = components['base_models']

# Prepare Data (Example)
data = pd.DataFrame([{
    'gender': 'Male', 'age': 75, 'hypertension': 1, 'heart_disease': 1,
    'ever_married': 'Yes', 'work_type': 'Private', 'Residence_type': 'Urban',
    'avg_glucose_level': 220.5, 'bmi': 30.1, 'smoking_status': 'formerly smoked'
}])

# Predict: the full stacking logic (base-model predictions fed to the meta-learner)
# is implemented in predict.py.
```
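The exact stacking steps live in `predict.py`. The function below is a hedged sketch of what such a step typically looks like. It makes three assumptions, none of which this card confirms:

- `base_models` is a list (or dict) of fitted classifiers exposing `predict_proba`.
- The meta-learner was trained on their positive-class probabilities in that same order.
- The caller supplies the decision threshold.

`stacked_predict` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def stacked_predict(preprocessor, base_models, meta_model, data: pd.DataFrame, threshold: float):
    """Hypothetical stacking inference: base models -> meta-learner -> threshold."""
    # Apply the same preprocessing used at training time.
    X = preprocessor.transform(data)
    # Accept either a list or a dict of fitted base models (assumed storage formats).
    models = list(base_models.values()) if isinstance(base_models, dict) else list(base_models)
    # One meta-feature per base model: its predicted probability of stroke (class 1).
    meta_features = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    # Meta-learner's probability of stroke for each patient.
    proba = meta_model.predict_proba(meta_features)[:, 1]
    # Flag as high risk when the probability reaches the threshold.
    labels = (proba >= threshold).astype(int)
    return labels, proba
```

With the objects unpacked above, a call would look like `labels, proba = stacked_predict(preprocessor, base_models, model, data, threshold=t)`. Here `t` is the project's tuned threshold; this card does not give its value or where it is stored.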

## Limitations
*   **Imbalanced Data:** The model was trained on a highly imbalanced dataset (only ~5% stroke cases).
*   **Not a Diagnostic Tool:** This model is intended only for educational use and as a screening aid. It should not replace professional medical advice.