---
license: mit
language:
- en
metrics:
- precision
- recall
- accuracy
pipeline_tag: tabular-classification
library_name: sklearn
tags:
- healthcare
- science
---

### Model Description

<!-- Provide a longer summary of what this model is. -->

This model predicts, from a set of clinical inputs, whether a person has or is at risk of developing heart disease. It takes 13 input features and is designed to work within form-based applications, i.e. software applications that collect user input.

NOTE: This model is meant as an assistive tool and must NOT be used directly to produce the final verdict on a person's or patient's condition. Its prediction is meant to prompt further evaluation.

- **Developed by:** DeepNeural
- **Model type:** Tabular Classifier
- **Language(s):** English
- **License:** MIT

### Model Inputs

| Variable Name | Type | Description & Input Value |
|---------------|------|---------------------------|
| age | Integer | Patient's age |
| sex | Binary | Patient's sex (1 = male; 0 = female) |
| chest pain type | Integer | 1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic |
| resting blood pressure | Integer | Resting blood pressure (in mm Hg on admission to the hospital) |
| serum cholesterol | Integer | In mg/dl |
| fasting blood sugar > 120 mg/dl | Binary | Is the patient's fasting blood sugar level greater than 120 mg/dl? |
| resting electrocardiographic results | Integer | 0 = normal; 1 = ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); 2 = probable or definite left ventricular hypertrophy by Estes' criteria |
| maximum heart rate achieved | Integer | |
| exercise induced angina | Binary | Does the patient suffer from exercise-induced angina? |
| oldpeak | Integer | ST depression induced by exercise relative to rest |
| slope of the peak exercise ST segment | Integer | 1 = upsloping; 2 = flat; 3 = downsloping |
| number of major vessels colored by fluoroscopy | Integer | 0-3 |
| thal | Integer | 0 = normal; 1 = fixed defect; 2 = reversible defect |

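
A form-based application would typically check the coded fields against the values in the Model Inputs table before calling the model. The following is a minimal sketch of such a check, not part of the released model. The field names, the `ALLOWED_CODES` mapping, and the `validate_form` helper are all hypothetical, and the 0/1 coding of the two binary fields without listed values is an assumption.

```python
# Allowed codes per coded field, copied from the Model Inputs table.
# Field names are hypothetical; 0/1 for the binary fields is assumed.
ALLOWED_CODES = {
    "sex": {0, 1},
    "chest_pain_type": {1, 2, 3, 4},
    "fasting_blood_sugar_gt_120": {0, 1},
    "resting_ecg": {0, 1, 2},
    "exercise_induced_angina": {0, 1},
    "st_slope": {1, 2, 3},
    "major_vessels": {0, 1, 2, 3},
    "thal": {0, 1, 2},
}

# Free-valued fields: the table gives no ranges, so only the presence of a
# numeric value is checked.
NUMERIC_FIELDS = ("age", "resting_blood_pressure", "serum_cholesterol",
                  "max_heart_rate", "oldpeak")


def validate_form(form):
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    for name in NUMERIC_FIELDS:
        value = form.get(name)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: expected a number, got {value!r}")
    for name, allowed in ALLOWED_CODES.items():
        value = form.get(name)
        if value not in allowed:
            errors.append(
                f"{name}: expected one of {sorted(allowed)}, got {value!r}")
    return errors
```

Calling `validate_form` on a submitted form and refusing to predict when the returned list is non-empty keeps malformed input away from the classifier.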
### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

This model is primarily designed for data scientists, software engineers, and machine learning engineers who are developing heart disease software applications for healthcare institutions, ranging from hospitals to clinics. It is also intended for educational use within academia, where heart disease risk analysis is a priority of the study.

Foreseeable users of software applications built with this model include doctors and nurses working with their patients.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Please be advised that this model was trained on a specific dataset for heart disease classification. Although it achieves a high level of accuracy and precision on that dataset, misclassifications can still occur.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More research is needed before further recommendations can be made. The model will continue to undergo improvement and testing to address the limitations mentioned in the previous section.

## How to Get Started with the Model

The steps below show how this model can be loaded directly into an application. Because it was built with the scikit-learn machine learning library, the model has been saved as a .joblib file. To get started, copy the following code into your Python environment.

1. Install Joblib
```python
# Notebook cell syntax; in a terminal, drop the leading "!"
!pip install joblib
```

2. Load the model after installation
```python
import joblib
my_model = joblib.load('heart_disease_classifier_model_v1.joblib')
```

3. Make predictions (binary or probability)
```python
# X_test: a table holding the 13 input features, one row per patient
# For binary class outputs
my_model.predict(X_test)

# For probability-based outputs
my_model.predict_proba(X_test)
```

NOTE: This model requires input data in a two-dimensional format with column names (a pandas DataFrame rather than a one-dimensional Series), since it is meant to be used in form-based applications.

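
Putting the steps above together for a single form submission might look like the sketch below. It assumes the model was trained on the column names of the Kaggle dataset linked under Model Sources (`age`, `sex`, `cp`, and so on) and that class 1 means heart disease, neither of which this card confirms. To stay runnable without the .joblib file, it fits a small stand-in `KNeighborsClassifier` on random data; in an application, `model` would come from `joblib.load` instead. `predict_from_form` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Assumed column names (those of the Kaggle heart disease CSV); replace them
# with the names the released model was actually trained on.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

# Stand-in model fitted on random data, only so this sketch is runnable.
# In practice: model = joblib.load('heart_disease_classifier_model_v1.joblib')
rng = np.random.default_rng(0)
X_fake = pd.DataFrame(rng.random((40, len(COLUMNS))), columns=COLUMNS)
y_fake = np.tile([0, 1], 20)
model = KNeighborsClassifier(n_neighbors=5).fit(X_fake, y_fake)


def predict_from_form(model, form):
    """Turn one form submission (a dict) into a one-row DataFrame and predict."""
    row = pd.DataFrame([form], columns=COLUMNS)  # 2-D input: shape (1, 13)
    label = int(model.predict(row)[0])
    probability = float(model.predict_proba(row)[0, 1])  # P(class 1)
    return label, probability


# Example form values, invented for illustration.
form = {"age": 54, "sex": 1, "cp": 2, "trestbps": 130, "chol": 250, "fbs": 0,
        "restecg": 1, "thalach": 150, "exang": 0, "oldpeak": 1, "slope": 2,
        "ca": 0, "thal": 2}
label, probability = predict_from_form(model, form)
print(label, round(probability, 2))
```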
#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

We evaluated several ML models, namely logistic regression, Stochastic Gradient Descent, Support Vector Machines, and K-Nearest Neighbors. After hyperparameter tuning, we chose the K-Nearest Neighbors model for prediction because it showed the best results. The metrics used were accuracy, precision, recall, F1-score, and AUC. The results for the model are listed in the 'Results' section.

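
This card does not spell out the tuning and scoring procedure. As a rough illustration only, the sketch below tunes a K-Nearest Neighbors classifier with `GridSearchCV` and computes the five metrics named above. The synthetic data from `make_classification`, the parameter grid, the 80/20 split, and the scaling step are all assumptions rather than the authors' actual setup, so the printed scores will not match the 'Results' section.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 13-feature heart disease table (assumption).
X, y = make_classification(n_samples=400, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# KNN is distance-based, so features are standardized first.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {  # hypothetical grid, chosen only for illustration
    "kneighborsclassifier__n_neighbors": [3, 5, 7, 9],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]  # probability of class 1

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```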
### Results

- Accuracy: 94%
- Precision: 94%
- Recall: 94%
- AUC ROC: 94%