RealFishSam committed on
Commit 4a1a806 · verified · 1 Parent(s): 494ffaf

Upload README.md with huggingface_hub

  1. README.md +80 -3
README.md CHANGED
@@ -1,3 +1,80 @@
- ---
- license: mit
- ---
+ ---
+ tags:
+ - tabular-classification
+ - sklearn
+ - medical
+ - stroke-prediction
+ metrics:
+ - recall
+ - precision
+ - f1
+ library_name: sklearn
+ model_type: stack-ensemble
+ ---
+
+ # Stroke Risk Prediction - Stacked Ensemble
+
+ This repository contains a **Stacked Ensemble Machine Learning Model** optimized for predicting stroke risk. It was developed as part of the DVAE26 Final Project.
+
+ ## Model Description
+ The model is a stacked ensemble of five base learners:
+ - Logistic Regression (two models: one with an L1 penalty, one with an L2 penalty)
+ - Random Forest (Balanced)
+ - XGBoost
+ - Gradient Boosting
+
+ The meta-learner is a Logistic Regression model that combines the base learners' predictions. The model includes a custom probability threshold tuned for high recall (sensitivity), so that fewer stroke cases are missed.
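
The README does not say how the threshold was chosen. One common approach, sketched here as an assumption rather than the project's actual procedure, is to scan candidate thresholds on validation data and keep the highest one that still reaches a target recall. The names `recall_at`, `pick_threshold`, and `target_recall` are hypothetical.

```python
def recall_at(y_true, y_proba, threshold):
    """Recall of the positive class when predicting 1 for proba >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_proba) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_proba) if y == 1 and p < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def pick_threshold(y_true, y_proba, target_recall=0.8):
    """Return the highest observed probability that still meets target_recall.

    Hypothetical helper: the project's real selection procedure is not documented.
    """
    # Scan candidates from strictest to most lenient; recall only grows as
    # the threshold drops, so the first hit is the highest valid threshold.
    for candidate in sorted(set(y_proba), reverse=True):
        if recall_at(y_true, y_proba, candidate) >= target_recall:
            return candidate
    return 0.0  # predicting 1 for everyone always reaches full recall


# Toy validation set: five positives, three negatives (made-up numbers).
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_proba = [0.9, 0.7, 0.6, 0.4, 0.2, 0.8, 0.3, 0.1]
threshold = pick_threshold(y_true, y_proba, target_recall=0.8)
print(threshold)
```

Lowering the threshold this way trades precision for recall, which is consistent with the high-recall, low-precision profile reported for this model.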
+
+ ## Performance
+ - **Recall:** 80%
+ - **Precision:** 15.7%
+ - **AUC-ROC:** 0.865
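
As a worked illustration of how these figures relate, the F1 score implied by the reported precision and recall can be derived directly. The values computed below are derived from the two numbers above, not reported by the authors, and they assume both metrics were measured at the same threshold on the same data.

```python
precision = 0.157  # reported precision
recall = 0.80      # reported recall

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# With precision p, on average 1 / p positive predictions are made
# for every true stroke case that is correctly flagged.
flags_per_true_case = 1 / precision

print(f"F1 = {f1:.3f}")
print(f"Positive predictions per true stroke case = {flags_per_true_case:.1f}")
```

Under those assumptions the implied F1 is roughly 0.26, and about 6.4 patients are flagged for every true stroke case caught.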
+
+ ## How to Use
+
+ ### 1. Installation
+ Clone this repository and install dependencies:
+ ```bash
+ git clone https://huggingface.co/RealFishSam/DVAE26-proj
+ cd DVAE26-proj
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Run Prediction Script
+ We provide a standalone script `predict.py` that loads the model and runs a prediction on sample data:
+ ```bash
+ python predict.py
+ ```
+
+ ### 3. Usage in Python
+ ```python
+ import pickle
+ import pandas as pd
+ from huggingface_hub import hf_hub_download
+
+ # Download model
+ model_path = hf_hub_download(repo_id="RealFishSam/DVAE26-proj", filename="stacked_ensemble_model.pkl")
+
+ # Load
+ with open(model_path, 'rb') as f:
+     components = pickle.load(f)
+
+ # Unpack
+ model = components['meta_model']
+ preprocessor = components['preprocessor']
+ base_models = components['base_models']
+
+ # Prepare Data (Example)
+ data = pd.DataFrame([{
+     'gender': 'Male', 'age': 75, 'hypertension': 1, 'heart_disease': 1,
+     'ever_married': 'Yes', 'work_type': 'Private', 'Residence_type': 'Urban',
+     'avg_glucose_level': 220.5, 'bmi': 30.1, 'smoking_status': 'formerly smoked'
+ }])
+
+ # Predict
+ # ... (See predict.py for full stacking logic) ...
+ ```
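
The full stacking logic lives in `predict.py`, which is not shown here. As a rough sketch only, assuming that each entry of `base_models` has a `predict_proba` method and that the meta-model was trained on the base learners' positive-class probabilities in their stored order, the flow could look like the following. `stack_predict` and the default `threshold=0.5` are hypothetical; the tuned threshold is not stated in this README.

```python
def stack_predict(base_models, meta_model, X_processed, threshold=0.5):
    """Hypothetical stacking sketch; not the logic from predict.py."""
    # Accept either a dict of name -> estimator or a plain sequence (assumption).
    if isinstance(base_models, dict):
        models = list(base_models.values())
    else:
        models = list(base_models)

    # One column per base learner: its positive-class probability for each row.
    columns = [[row[1] for row in m.predict_proba(X_processed)] for m in models]

    # Transpose so that each sample becomes one row of meta-features.
    meta_features = [list(sample) for sample in zip(*columns)]

    # The meta-learner maps the stacked probabilities to a final probability.
    final_proba = [row[1] for row in meta_model.predict_proba(meta_features)]

    # Apply the decision threshold to obtain 0/1 labels.
    labels = [int(p >= threshold) for p in final_proba]
    return final_proba, labels
```

If those assumptions hold and `preprocessor` exposes `transform`, a call with the objects unpacked above might look like `stack_predict(base_models, model, preprocessor.transform(data))`. Check `predict.py` for the actual feature order and threshold before relying on the output.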
+
+ ## Limitations
+ * **Imbalanced Data:** The model is trained on a highly imbalanced dataset (only ~5% stroke cases).
+ * **Not a Diagnostic Tool:** This model is for educational and screening assistance purposes only. It should not replace professional medical advice.