license: mit
language:
- en
base_model:
- ViniMig/stroke-risk-package
pipeline_tag: tabular-classification
Model Card for stroke-risk-package
Model Details
Model Description
- Developed by: Vini
Model Sources
- Repository: GitLab repo
Uses
This model was developed to predict stroke risk. It is currently used in my stroke-risk-prediction HF Space.
Bias, Risks, and Limitations
The main limitations come from the training dataset: it is small and highly imbalanced (the minority class accounts for less than 5% of the samples).
Recommendations
The dataset would benefit from better data quality and from more data: new features, new categories for existing features, and especially more individuals from the positive class.
How to Get Started with the Model
Use the code below to get started with the model. The purpose of this project is to have a fully reproducible pipeline.
Training Details
Training Data
This section should ideally link to a Dataset card. The data used to train this model was obtained from Kaggle.
Training Procedure
Preprocessing
The preprocessing was initially based on a first EDA step and a previous version of the model. After further exploration, the main steps of the preprocessing stage are:
- Imputation
- Binning continuous features
- Ordinal encoding
- Oversampling with SMOTE on the training set
Some artifacts are saved with joblib, one of which is the custom function that applies the pandas cut method to generate the bins. This joblib file is currently flagged as unsafe on HF, which is probably expected since joblib files are pickle-based, but all the code can be inspected in the project repository linked above.
This artifact is stored here so that the Streamlit app, which handles user input and preprocessing for inference, can load it.
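To make the joblib round trip concrete, here is a minimal sketch of saving and reloading a binning callable. The project's actual custom function lives in the repository and is not reproduced here; as a stand-in, this example wraps pandas.cut with functools.partial. The bin edges, labels, and file name are hypothetical.

```python
import os
import tempfile
from functools import partial

import joblib
import pandas as pd

# Hypothetical bin edges and labels, chosen only for this example.
AGE_BINS = [0, 40, 60, 120]
AGE_LABELS = ["young", "middle", "senior"]

# Stand-in for the custom binning function: pandas.cut with the bins baked in.
age_binner = partial(pd.cut, bins=AGE_BINS, labels=AGE_LABELS)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "age_binner.joblib")  # hypothetical file name
    joblib.dump(age_binner, path)
    # joblib.load unpickles the file, so only load artifacts you trust.
    restored = joblib.load(path)

# Intervals are right-closed by default, so 40 falls into the first bin.
binned = restored(pd.Series([25, 40, 61]))
print(binned.tolist())
```

Because joblib serializes through pickle, loading a file can import and call arbitrary code. That is the usual reason scanners flag such artifacts, and why reviewing the source before loading is worthwhile.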