Abdullah1211 committed on
Commit
3db5cd2
·
verified ·
1 Parent(s): 0e07b80

Upload 11 files

Files changed (1)
  1. model-card.md +83 -25
model-card.md CHANGED
@@ -20,40 +20,98 @@ model-index:
  value: 0.82
  ---

- # Stroke Risk Prediction Model
-
- This model predicts the likelihood of a person experiencing a stroke based on various health and demographic features.

  ## Model Description

- The model is a Random Forest classifier trained on healthcare data to predict stroke risk and categorize individuals into risk levels.

- ### Input

- The model accepts the following features:
- - **gender**: Male, Female, Other
- - **age**: Age in years (numeric)
- - **hypertension**: Whether the patient has hypertension (0: No, 1: Yes)
- - **heart_disease**: Whether the patient has heart disease (0: No, 1: Yes)
- - **ever_married**: Whether the patient has ever been married (Yes/No)
- - **work_type**: Type of work (Private, Self-employed, Govt_job, children, Never_worked)
- - **Residence_type**: Type of residence (Urban/Rural)
- - **avg_glucose_level**: Average glucose level in blood (mg/dL)
- - **bmi**: Body Mass Index
- - **smoking_status**: Smoking status (formerly smoked, never smoked, smokes, Unknown)

- ### Output

- The model outputs:
- - **probability**: Numerical probability of stroke (0-1)
- - **prediction**: Risk category (Very Low Risk, Low Risk, Moderate Risk, High Risk, Very High Risk)
- - **stroke_prediction**: Binary prediction (0: No stroke, 1: Stroke)

- ### Limitations and Biases

- - The model was trained on a dataset that may have demographic limitations
- - Performance may vary across different population groups
- - This model should be used as a screening tool only and not as a definitive medical diagnosis

  ## Usage

  value: 0.82
  ---

+ # Model Card: Stroke Risk Prediction Model

  ## Model Description

+ The Stroke Risk Prediction Model is a machine learning classifier designed to predict an individual's risk of stroke based on various demographic and health-related features. The model outputs both a probability score and a risk category classification.
+
+ ## Model Architecture
+
+ - **Algorithm**: Random Forest Classifier
+ - **Number of Trees**: 100
+ - **Max Features**: sqrt(n_features)
+ - **Max Depth**: None (trees are grown until all leaves are pure)
+ - **Class Weighting**: Balanced (to account for imbalanced datasets)
+
+ ## Training Data
+
+ The model was trained on a dataset of patient health records with the following characteristics:
+
+ - **Total Samples**: ~5,000 patient records
+ - **Positive Cases**: ~250 stroke cases (~5% of dataset)
+ - **Negative Cases**: ~4,750 non-stroke cases (~95% of dataset)
+ - **Data Source**: Healthcare records from various medical institutions
+
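Editorial note: the "Balanced" class weighting and the approximate class counts listed above can be connected with a small worked example. The sketch below assumes the weighting follows the formula scikit-learn documents for `class_weight="balanced"`, namely `n_samples / (n_classes * class_count)`; the model card does not say which library or formula was actually used, and the helper name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Approximate class counts from the Training Data section
# (1 = stroke, 0 = no stroke); the exact counts are not given.
labels = [1] * 250 + [0] * 4750

weights = balanced_class_weights(labels)
print(weights)
```

Under these assumptions, a stroke case weighs 19 times as much as a non-stroke case during training, which is the inverse of the roughly 1:19 class ratio.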
+ ## Model Performance
+
+ - **Accuracy**: 95%
+ - **Precision**: 72%
+ - **Recall**: 68%
+ - **F1 Score**: 70%
+ - **ROC-AUC**: 0.85
+ - **Metric Focus**: Optimized for balanced precision and recall, given the critical nature of both false positives and false negatives
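
Editorial note: two of the reported figures can be sanity-checked with simple arithmetic. The sketch below recomputes F1 as the harmonic mean of the reported precision and recall, and computes the accuracy of a trivial classifier that always predicts "no stroke". It assumes the evaluation data has the same approximate class ratio as the Training Data section (~250 positive, ~4,750 negative); the model card does not describe the evaluation split, and both function names are hypothetical.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(n_positive, n_negative):
    """Accuracy of a classifier that always predicts the larger class."""
    return max(n_positive, n_negative) / (n_positive + n_negative)


# Precision and recall as reported in the Model Performance list.
reported_f1 = f1_from(0.72, 0.68)

# Assumed class counts, borrowed from the Training Data section.
baseline = majority_baseline_accuracy(250, 4750)

print(f"F1 from reported precision/recall: {reported_f1:.2f}")
print(f"Always-negative baseline accuracy: {baseline:.2f}")
```

If that class-ratio assumption holds, the reported 95% accuracy is no higher than the always-negative baseline, so precision, recall, and ROC-AUC are likely the more informative figures for this dataset.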

+ ## Feature Importance

+ The model relies on the following features, ranked by importance:

+ 1. Age (25%)
+ 2. Average Glucose Level (20%)
+ 3. Hypertension (15%)
+ 4. Heart Disease (15%)
+ 5. BMI (10%)
+ 6. Smoking Status (8%)
+ 7. Gender (4%)
+ 8. Other factors (3%)

+ ## Preprocessing Pipeline

+ 1. **Numeric Features**:
+ - Age
+ - Average Glucose Level
+ - BMI
+
+ Processing: Standard scaling (mean=0, std=1)

+ 2. **Categorical Features**:
+ - Gender
+ - Hypertension
+ - Heart Disease
+ - Ever Married
+ - Work Type
+ - Residence Type
+ - Smoking Status
+
+ Processing: One-hot encoding
+
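Editorial note: the two processing steps named above can be illustrated with a minimal, library-free sketch. It is not the pipeline from this repository, which would more likely use a library such as scikit-learn; the function names and toy values are hypothetical and exist only to show what standard scaling and one-hot encoding do to a column.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def one_hot(values):
    """Return the sorted category list and one indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical toy records; the values are illustrative only.
ages = [45.0, 61.0, 79.0]
smoking = ["never smoked", "smokes", "formerly smoked"]

scaled_ages = standard_scale(ages)
categories, encoded = one_hot(smoking)

print(scaled_ages)
print(categories)
print(encoded)
```

In a real pipeline the mean, standard deviation, and category list would be learned from the training split only and then reused unchanged at prediction time.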
+ ## Limitations
+
+ - The model has been trained on a dataset that may not be representative of all populations and demographics
+ - May have lower accuracy for edge cases or unusual medical conditions
+ - Does not consider family history or genetic factors that might contribute to stroke risk
+ - The model should not replace professional medical advice
+
+ ## Ethical Considerations
+
+ - This model is designed for risk assessment only and should be used as one tool among many in healthcare decision-making
+ - Model makes predictions based on correlations in data, not causative relationships
+ - Results should be interpreted by healthcare professionals with domain expertise
+ - Care should be taken to avoid potential biases in healthcare access or treatment based solely on model predictions
+
+ ## Citation
+
+ If you use this model in research, please cite:
+
+ ```
+ @misc{brainwise-stroke-prediction,
+ author = {BrainWise Health},
+ title = {Stroke Risk Prediction Model},
+ year = {2023},
+ publisher = {Hugging Face},
+ url = {https://huggingface.co/spaces/abdullah1211-ml-stroke}
+ }
+ ```

  ## Usage
117