	---
	tags:
	- tabular-classification
	- sklearn
	- medical
	- stroke-prediction
	metrics:
	- recall
	- precision
	- f1
	library_name: sklearn
	model_type: stack-ensemble
	---

	# Stroke Risk Prediction - Stacked Ensemble

	This repository contains a stacked ensemble machine learning model optimized for predicting stroke risk. It was developed as part of the DVAE26 Final Project.

	## Model Description
	The model is a stacked ensemble of 5 base learners:
	- Logistic Regression with an L1 penalty
	- Logistic Regression with an L2 penalty
	- Random Forest (Balanced)
	- XGBoost
	- Gradient Boosting

	The meta-learner is a Logistic Regression model that aggregates the base learners' predictions. Its output is compared against a custom probability threshold, tuned for high recall (sensitivity) so that fewer stroke cases are missed.
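
As a rough illustration of why a lower threshold favors recall, the sketch below applies two cutoffs to a handful of made-up probabilities. The scores, labels, and the 0.2 cutoff are illustrative assumptions, not values taken from this model.

```python
def recall_precision(probs, labels, threshold):
    """Return (recall, precision) when predicting stroke for prob >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Toy data (hypothetical): 2 stroke cases among 8 patients.
probs  = [0.05, 0.10, 0.15, 0.25, 0.30, 0.35, 0.60, 0.08]
labels = [0,    0,    0,    0,    1,    0,    1,    0]

# Conventional 0.5 cutoff versus a lowered, recall-oriented cutoff.
default_recall, default_precision = recall_precision(probs, labels, 0.5)
low_recall, low_precision = recall_precision(probs, labels, 0.2)
```

On this toy data, the lower cutoff catches both stroke cases but also flags two healthy patients, which is the same kind of trade-off behind a high-recall, low-precision model.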

	## Performance
	- Recall: 80%
	- Precision: 15.7%
	- AUC-ROC: 0.865
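
The metadata lists F1 as a metric, but no F1 value is reported here. Assuming the recall and precision figures come from the same evaluation set and the same threshold, F1 can be derived from them:

```python
recall = 0.80
precision = 0.157

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Average number of flagged patients per true stroke case.
flagged_per_true_case = 1 / precision

print(f"F1: {f1:.3f}")
print(f"Flagged patients per true stroke case: {flagged_per_true_case:.1f}")
```

This works out to an F1 of roughly 0.26, with about one true stroke case for every six to seven patients flagged.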

	## How to Use

	### 1. Installation
	Clone this repository and install dependencies:
	```bash
	git clone https://huggingface.co/RealFishSam/DVAE26-proj
	cd DVAE26-proj
	pip install -r requirements.txt
	```

	### 2. Run Prediction Script
	We provide a standalone script `predict.py` that loads the model and runs a prediction.

	Basic Usage (Default Sample):
	```bash
	python predict.py
	```

	Custom Input Usage:
	You can pass patient data as command-line arguments:
	```bash
	python predict.py --age 65 --bmi 28.5 --hypertension 1 --gender Female
	```
	Use `python predict.py --help` to see all available options.
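
For readers who want to build a similar interface, here is a minimal sketch of how the four flags shown above could be parsed with `argparse`. It is not the actual `predict.py`: the function name `parse_patient_args`, the argument types, and the handling of omitted flags are all assumptions, and the real script may define more options.

```python
import argparse

def parse_patient_args(argv=None):
    """Parse the patient flags from the example command into a plain dict."""
    parser = argparse.ArgumentParser(description="Stroke risk prediction (sketch)")
    parser.add_argument("--age", type=float)
    parser.add_argument("--bmi", type=float)
    parser.add_argument("--hypertension", type=int, choices=[0, 1])
    parser.add_argument("--gender", type=str)
    args = parser.parse_args(argv)
    # Keep only the flags the user actually supplied.
    return {k: v for k, v in vars(args).items() if v is not None}

# Same arguments as the custom-input example.
overrides = parse_patient_args(
    ["--age", "65", "--bmi", "28.5", "--hypertension", "1", "--gender", "Female"]
)
```

Returning only the supplied values makes it easy to overlay them on a default patient record before prediction.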

	### 3. Usage in Python
	```python
	import pickle
	import pandas as pd
	from huggingface_hub import hf_hub_download

	# Download model
	model_path = hf_hub_download(repo_id="RealFishSam/DVAE26-proj", filename="stacked_ensemble_model.pkl")

	# Load
	with open(model_path, 'rb') as f:
	    components = pickle.load(f)

	# Unpack
	model = components['meta_model']
	preprocessor = components['preprocessor']
	base_models = components['base_models']

	# Prepare Data (Example)
	data = pd.DataFrame([{
	    'gender': 'Male', 'age': 75, 'hypertension': 1, 'heart_disease': 1,
	    'ever_married': 'Yes', 'work_type': 'Private', 'Residence_type': 'Urban',
	    'avg_glucose_level': 220.5, 'bmi': 30.1, 'smoking_status': 'formerly smoked'
	}])

	# Predict
	# ... (See predict.py for full stacking logic) ...
	```
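
The prediction step is elided above. The following is a minimal sketch of what a typical stacking pass looks like, not a copy of `predict.py`. It assumes that `preprocessor` exposes `transform`, that `base_models` is a dict or list of fitted estimators with `predict_proba`, that the meta-model was trained only on the base learners' positive-class probabilities in that same order, and that the caller supplies the threshold. The README does not state the tuned threshold value or where it is stored. The function name `stacked_predict` is hypothetical.

```python
def stacked_predict(components, data, threshold):
    """Return (probabilities, flags), one entry per row of `data`.

    Assumed layout (not confirmed by the README): see the notes above.
    """
    X = components['preprocessor'].transform(data)

    base_models = components['base_models']
    if isinstance(base_models, dict):
        models = list(base_models.values())
    else:
        models = list(base_models)

    # One column of positive-class probabilities per base learner.
    columns = [[row[1] for row in m.predict_proba(X)] for m in models]

    # Transpose so each row holds one patient's base-learner probabilities.
    meta_features = [list(row) for row in zip(*columns)]

    # The meta-model turns those probabilities into a final stroke probability.
    meta_proba = components['meta_model'].predict_proba(meta_features)
    probs = [row[1] for row in meta_proba]

    # Flag a patient when the probability reaches the chosen threshold.
    flags = [int(p >= threshold) for p in probs]
    return probs, flags
```

Call it with the `components` and `data` objects from the snippet above and the threshold used in `predict.py`. If the saved meta-model expects a different feature order or additional inputs, defer to `predict.py`.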

	## Limitations
	* Imbalanced Data: The model is trained on a highly imbalanced dataset (only ~5% stroke cases).
	* Not a Diagnostic Tool: This model is for educational and screening assistance purposes only. It should not replace professional medical advice.
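
To see why the class imbalance matters, consider a trivial classifier that never predicts a stroke. The sketch below uses the approximate 5% stroke rate mentioned above, which is a rounded figure rather than an exact dataset count:

```python
stroke_rate = 0.05  # approximate share of stroke cases stated above

# A classifier that always predicts "no stroke":
baseline_accuracy = 1 - stroke_rate  # correct on every non-stroke case
baseline_recall = 0.0                # misses every stroke case

print(f"Always-negative accuracy: {baseline_accuracy:.0%}")
print(f"Always-negative recall:   {baseline_recall:.0%}")
```

Such a baseline looks accurate while detecting nothing, which is why this model is judged on recall, precision, and AUC-ROC rather than on accuracy.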