license: mit
language:
  - en
base_model:
  - ViniMig/stroke-risk-package
pipeline_tag: tabular-classification

Model Card for stroke-risk-package

Model Details

Model Description

Developed by: Vini

Model Sources

Repository: GitLab repo

Uses

This model predicts stroke risk. It is currently used in my stroke-risk-prediction Hugging Face Space.

Bias, Risks, and Limitations

The main limitations come from the training dataset: it is small and highly imbalanced (the minority class makes up less than 5% of the samples).
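To make the effect of the imbalance concrete, here is a minimal sketch of how a trivial classifier scores on data like this. The row counts are hypothetical and chosen only to match the "under 5%" figure; they are not taken from the actual dataset.

```python
def always_negative_baseline(n_total: int, n_positive: int) -> tuple[float, float]:
    """Return (accuracy, recall) for a classifier that always predicts "no stroke"."""
    n_negative = n_total - n_positive
    accuracy = n_negative / n_total  # every negative is right, every positive is wrong
    recall = 0 / n_positive          # no positive case is ever detected
    return accuracy, recall


# Hypothetical counts: 48 positives out of 1,000 rows (4.8%, i.e. under 5%).
accuracy, recall = always_negative_baseline(n_total=1000, n_positive=48)
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")  # accuracy=0.952, recall=0.000
```

Accuracy alone looks strong here even though no positive case is found, which is why recall or precision on the positive class is usually more informative for this kind of data.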

Recommendations

The dataset would benefit from better data quality and from more data: new features, new categories for existing features, and especially more individuals from the positive class.

How to Get Started with the Model

To get started with the model, use the code in the repository linked below. The purpose of this project is to have a fully reproducible pipeline.

Reproducible repo

Training Details

Training Data

Ideally this section would link to a Dataset card. For now, note that the training data was obtained from Kaggle.

Stroke risk dataset

Training Procedure

Preprocessing

Preprocessing was informed by an initial EDA step and by a previous version of the model. After further exploration, the main preprocessing steps are:

Imputation
Binning continuous features
Ordinal encoding
Oversampling with SMOTE on the training set

Preprocessing code

Some artifacts are saved with joblib, including the custom function that applies the pandas cut method to generate the bins. This joblib file is currently flagged as unsafe on Hugging Face, which is probably expected because joblib files are pickle-based, but all the code can be inspected in the project repository linked above.

The artifact is stored here so that the Streamlit app, which handles user input and preprocessing for inference, can load it.
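As a rough illustration of how a binning artifact can be saved and reloaded with joblib, here is a minimal sketch. It is not the project's actual function: it assumes the binner can be expressed as a `functools.partial` around `pandas.cut`, and the bin edges, labels, and file name are hypothetical.

```python
import tempfile
from functools import partial
from pathlib import Path

import joblib
import numpy as np
import pandas as pd

# Hypothetical binner: pandas.cut with fixed edges and labels baked in.
age_binner = partial(
    pd.cut,
    bins=[0, 40, 60, np.inf],
    labels=["<40", "40-59", "60+"],
    right=False,  # intervals are [0, 40), [40, 60), [60, inf)
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "age_binner.joblib"  # hypothetical file name
    joblib.dump(age_binner, path)           # training side: persist the artifact
    loaded_binner = joblib.load(path)       # inference side: reload it

# Apply the reloaded binner to new user input.
binned = loaded_binner(pd.Series([25, 45, 70]))
print(binned.tolist())  # ['<40', '40-59', '60+']
```

Because joblib relies on pickle, loading a file can execute arbitrary code, so such artifacts should only be loaded from sources you trust. A function defined in the project itself is pickled by reference, so its module would also need to be importable wherever the artifact is loaded.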