Upload 11 files
model-card.md CHANGED (+83 -25)
@@ -20,40 +20,98 @@ model-index:
 value: 0.82
 ---

-# Stroke Risk Prediction Model
-
-This model predicts the likelihood of a person experiencing a stroke based on various health and demographic features.

 ## Model Description

-The

-##

-The model
-- **gender**: Male, Female, Other
-- **age**: Age in years (numeric)
-- **hypertension**: Whether the patient has hypertension (0: No, 1: Yes)
-- **heart_disease**: Whether the patient has heart disease (0: No, 1: Yes)
-- **ever_married**: Whether the patient has ever been married (Yes/No)
-- **work_type**: Type of work (Private, Self-employed, Govt_job, children, Never_worked)
-- **Residence_type**: Type of residence (Urban/Rural)
-- **avg_glucose_level**: Average glucose level in blood (mg/dL)
-- **bmi**: Body Mass Index
-- **smoking_status**: Smoking status (formerly smoked, never smoked, smokes, Unknown)
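
The feature list above amounts to an input schema. A minimal sketch of a validator for one record could look like the following; the names `validate_record`, `CATEGORICAL`, `BINARY`, `NUMERIC` and the `sample` values are hypothetical and not part of the model card.

```python
# Allowed values copied from the feature list above.
CATEGORICAL = {
    "gender": {"Male", "Female", "Other"},
    "ever_married": {"Yes", "No"},
    "work_type": {"Private", "Self-employed", "Govt_job", "children", "Never_worked"},
    "Residence_type": {"Urban", "Rural"},
    "smoking_status": {"formerly smoked", "never smoked", "smokes", "Unknown"},
}
BINARY = ("hypertension", "heart_disease")  # encoded as 0 (No) or 1 (Yes)
NUMERIC = ("age", "avg_glucose_level", "bmi")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one input record (empty if none)."""
    problems = []
    for name, allowed in CATEGORICAL.items():
        if record.get(name) not in allowed:
            problems.append(f"{name}: expected one of {sorted(allowed)}")
    for name in BINARY:
        if record.get(name) not in (0, 1):
            problems.append(f"{name}: expected 0 or 1")
    for name in NUMERIC:
        value = record.get(name)
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number")
    return problems


# Hypothetical record; the values are illustrative, not taken from the dataset.
sample = {
    "gender": "Female",
    "age": 61,
    "hypertension": 0,
    "heart_disease": 0,
    "ever_married": "Yes",
    "work_type": "Private",
    "Residence_type": "Urban",
    "avg_glucose_level": 105.3,
    "bmi": 27.4,
    "smoking_status": "never smoked",
}

print(validate_record(sample))
```

A record that passes this check only matches the listed names and value sets; it says nothing about whether the values are clinically plausible.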

-

-
-- **probability**: Numerical probability of stroke (0-1)
-- **prediction**: Risk category (Very Low Risk, Low Risk, Moderate Risk, High Risk, Very High Risk)
-- **stroke_prediction**: Binary prediction (0: No stroke, 1: Stroke)

-

-
--
--

 ## Usage

 value: 0.82
 ---

+# Model Card: Stroke Risk Prediction Model

 ## Model Description

+The Stroke Risk Prediction Model is a machine learning classifier designed to predict an individual's risk of stroke based on various demographic and health-related features. The model outputs both a probability score and a risk category classification.
+
+## Model Architecture
+
+- **Algorithm**: Random Forest Classifier
+- **Number of Trees**: 100
+- **Max Features**: sqrt(n_features)
+- **Max Depth**: None (trees are grown until all leaves are pure)
+- **Class Weighting**: Balanced (to account for imbalanced datasets)
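
The hyperparameters above can be written down as a plain configuration. The sketch below assumes the keyword names of scikit-learn's `RandomForestClassifier` (the card does not name the library), and `features_per_split` is a hypothetical helper that illustrates the sqrt rule.

```python
import math

# Hyperparameters as listed in the card. The key names follow
# scikit-learn's RandomForestClassifier, which is an assumption here.
RF_PARAMS = {
    "n_estimators": 100,         # Number of Trees
    "max_features": "sqrt",      # Max Features: sqrt(n_features)
    "max_depth": None,           # Max Depth: None
    "class_weight": "balanced",  # Class Weighting: Balanced
}


def features_per_split(n_features: int) -> int:
    """Candidate features examined at each split under the sqrt rule."""
    return max(1, math.isqrt(n_features))


# Hypothetical width: 16 columns after encoding gives 4 candidates per split.
print(features_per_split(16))
```

If scikit-learn is the library in use, the dictionary could be passed as `RandomForestClassifier(**RF_PARAMS)`; the actual number of encoded columns is not stated in the card.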
+
+## Training Data
+
+The model was trained on a dataset of patient health records with the following characteristics:
+
+- **Total Samples**: ~5,000 patient records
+- **Positive Cases**: ~250 stroke cases (~5% of dataset)
+- **Negative Cases**: ~4,750 non-stroke cases (~95% of dataset)
+- **Data Source**: Healthcare records from various medical institutions
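
To see what "Balanced" class weighting means for these counts, here is a small sketch. It assumes the common heuristic `n_samples / (n_classes * class_count)`, which is how scikit-learn defines `class_weight="balanced"`; the card does not spell out the formula, and the counts are the approximate figures listed above.

```python
def balanced_class_weights(counts: dict) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Approximate class counts from the card: 0 = no stroke, 1 = stroke.
weights = balanced_class_weights({0: 4750, 1: 250})
print(weights)
```

Under this assumption a stroke case is weighted about 19 times as heavily as a non-stroke case (10.0 versus roughly 0.53), which offsets the 95/5 split.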
+
+## Model Performance
+
+- **Accuracy**: 95%
+- **Precision**: 72%
+- **Recall**: 68%
+- **F1 Score**: 70%
+- **ROC-AUC**: 0.85
+- **Metric Focus**: Optimized for balanced precision and recall, given the critical nature of both false positives and false negatives
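
The reported F1 score follows from the precision and recall figures, which the sketch below checks. It also computes the accuracy of a classifier that always predicts "no stroke", under the assumption that the evaluation split has the same class balance as the training data; the card does not describe the evaluation split, so that part is illustrative only.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_from(0.72, 0.68)

# Assumption: the evaluation split mirrors the training data
# (~4,750 negatives out of ~5,000 records).
majority_baseline_accuracy = 4750 / 5000

print(f"F1 = {f1:.2f}")
print(f"always-negative accuracy = {majority_baseline_accuracy:.2f}")
```

If that assumption holds, the always-negative baseline reaches about the same accuracy as the model, so precision, recall, and ROC-AUC are the more informative figures for this class balance.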

+## Feature Importance

+The model relies on the following features, ranked by importance:

+1. Age (25%)
+2. Average Glucose Level (20%)
+3. Hypertension (15%)
+4. Heart Disease (15%)
+5. BMI (10%)
+6. Smoking Status (8%)
+7. Gender (4%)
+8. Other factors (3%)
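
A quick way to read the ranking above is as a cumulative share. The sketch below uses the listed percentages as-is; `IMPORTANCE_PCT` and `cumulative_share` are hypothetical names, and the card does not say how the importances were computed.

```python
# Percentages copied from the ranked list above.
IMPORTANCE_PCT = {
    "Age": 25,
    "Average Glucose Level": 20,
    "Hypertension": 15,
    "Heart Disease": 15,
    "BMI": 10,
    "Smoking Status": 8,
    "Gender": 4,
    "Other factors": 3,
}


def cumulative_share(importance: dict) -> list:
    """Rank features by importance and attach the running total."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    running = 0
    result = []
    for name, share in ranked:
        running += share
        result.append((name, running))
    return result


for name, total in cumulative_share(IMPORTANCE_PCT):
    print(f"{name:<22} {total:>3}%")
```

The listed shares add up to 100%, and the top four features (age, glucose level, hypertension, heart disease) account for 75% of the total.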

+## Preprocessing Pipeline

+1. **Numeric Features**:
+   - Age
+   - Average Glucose Level
+   - BMI
+
+   Processing: Standard scaling (mean=0, std=1)

+2. **Categorical Features**:
+   - Gender
+   - Hypertension
+   - Heart Disease
+   - Ever Married
+   - Work Type
+   - Residence Type
+   - Smoking Status
+
+   Processing: One-hot encoding
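
The two processing steps above can be illustrated with the standard library alone. This is a sketch of the transformations themselves, not the card's actual pipeline; in scikit-learn the equivalent pieces would typically be `StandardScaler` and `OneHotEncoder` combined in a `ColumnTransformer`, which is an assumption about the implementation. The function names and sample values are hypothetical.

```python
from statistics import fmean, pstdev


def standard_scale(values: list) -> list:
    """Shift to mean 0 and scale to population standard deviation 1."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


def one_hot(value: str, categories: list) -> list:
    """Encode one categorical value as a 0/1 vector over known categories."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical glucose readings in mg/dL, for illustration only.
glucose = [85.0, 105.0, 230.0]
print(standard_scale(glucose))

# Category order is fixed up front so every record maps to the same columns.
print(one_hot("Urban", ["Rural", "Urban"]))
```

In practice the mean, standard deviation, and category list would be computed on the training data and reused unchanged at prediction time.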
+
+## Limitations
+
+- The model has been trained on a dataset that may not be representative of all populations and demographics
+- May have lower accuracy for edge cases or unusual medical conditions
+- Does not consider family history or genetic factors that might contribute to stroke risk
+- The model should not replace professional medical advice
+
+## Ethical Considerations
+
+- This model is designed for risk assessment only and should be used as one tool among many in healthcare decision-making
+- The model makes predictions based on correlations in the data, not causal relationships
+- Results should be interpreted by healthcare professionals with domain expertise
+- Care should be taken to avoid potential biases in healthcare access or treatment based solely on model predictions
+
+## Citation
+
+If you use this model in research, please cite:
+
+```
+@misc{brainwise-stroke-prediction,
+  author = {BrainWise Health},
+  title = {Stroke Risk Prediction Model},
+  year = {2023},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/spaces/abdullah1211-ml-stroke}
+}
+```

 ## Usage
