auphong2707 committed
Commit 1f035b7 · verified · 1 Parent(s): bba7dd7

Upload hospital-readmission-logistic-regression training results and visualizations

.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ calibration_curve.png filter=lfs diff=lfs merge=lfs -text
+ confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
+ cv_results_analysis.png filter=lfs diff=lfs merge=lfs -text
+ precision_recall_curve.png filter=lfs diff=lfs merge=lfs -text
+ roc_curve.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,197 @@
+ ---
+ tags:
+ - healthcare
+ - clinical-ml
+ - diabetes
+ - readmission-prediction
+ - logistic-regression
+ - scikit-learn
+ library_name: sklearn
+ pipeline_tag: tabular-classification
+ ---
+
+ # hospital-readmission-logistic-regression - Hospital Readmission Risk Prediction
+
+ ## Model Description
+
+ This logistic regression model predicts the risk of 30-day hospital readmission for diabetic patients. It was trained on the UCI Diabetes 130-US Hospitals dataset, tuned with 5-fold stratified cross-validation, and evaluated on a held-out test set.
+
+ **Task:** Hospital 30-Day Readmission Risk Prediction
+ **Model Type:** Logistic Regression (L1-regularized)
+ **Training Date:** 2025-12-10 03:11:28
+ **Environment:** kaggle (CPU)
+
+ ## Performance Metrics
+
+ ### Cross-Validation Results (5-Fold CV)
+
+ | Metric | Value |
+ |--------|-------|
+ | Mean ROC-AUC | 0.8115 ± 0.0062 |
+
+ ### Final Test Set Results
+
+ #### Primary Metrics
+ | Metric | Value |
+ |--------|-------|
+ | ROC-AUC | 0.8075 |
+ | PR-AUC | 0.3009 |
+ | F1 Score | 0.4066 |
+
+ #### Classification Metrics
+ | Metric | Value |
+ |--------|-------|
+ | Precision | 0.2782 |
+ | Recall | 0.7553 |
+
+ #### Clinical Metrics
+ | Metric | Value |
+ |--------|-------|
+ | Sensitivity (TPR) | 0.7553 |
+ | Specificity (TNR) | 0.7538 |
+
+ ## Model Visualizations
+
+ ### ROC Curve
+ ![ROC Curve](./roc_curve.png)
+
+ ### Precision-Recall Curve
+ ![Precision-Recall Curve](./precision_recall_curve.png)
+
+ ### Confusion Matrix
+ ![Confusion Matrix](./confusion_matrix.png)
+
+ ### Calibration Curve
+ ![Calibration Curve](./calibration_curve.png)
+
+ ### Feature Importance
+ ![Feature Importance](./feature_importance.png)
+
+ ### Learning Curves
+ ![Learning Curves](./learning_curves.png)
+
+ ### Validation Curves
+ ![Validation Curves](./validation_curves.png)
+
+ ### Cross-Fold Metrics Comparison
+ ![Metrics Comparison](./metrics_comparison_across_folds.png)
+
+ ## Dataset Information
+
+ | Property | Value |
+ |----------|-------|
+ | Total Samples | 101,766 |
+ | Features | 113 |
+ | Development Set | 86,501 |
+ | Final Test Set | 15,265 |
+
+ ## Training Configuration
+
+ ### Evaluation Pipeline
+ - **Final Holdout Split:** Stratified split into development (85%) and test (15%) sets
+ - **Hyperparameter Search:** Grid search with 5-fold stratified cross-validation
+ - **Validation Monitoring:** Inner validation split (15% of the development set)
+ - **Final Evaluation:** Untouched holdout test set
+
+ ### Best Hyperparameters
+
+ ```python
+ {
+     "C": 0.1,
+     "class_weight": "balanced",
+     "max_iter": 2000,
+     "penalty": "l1",
+     "solver": "liblinear"
+ }
+ ```
+
+ ## Training Details
+
+ - **Total Training Time:** 107.28 minutes
+ - **Hyperparameter Search Time:** 0.00 minutes
+ - **Cross-Validation Folds:** 5
+ - **Early Stopping:** Yes
+ - **Device:** CPU
+
+ ## Usage
+
+ ### Loading the Model
+
+ ```python
+ import joblib
+ import pandas as pd
+
+ # Load the trained model (uploaded in this commit as logistic_regression.pkl)
+ model = joblib.load('logistic_regression.pkl')
+
+ # Load your preprocessed features (columns must match the 113 training features)
+ X_new = pd.read_csv('your_features.csv')
+
+ # Predict class labels and readmission probabilities
+ predictions = model.predict(X_new)
+ probabilities = model.predict_proba(X_new)[:, 1]
+ ```
+
+ ### Feature Requirements
+
+ The model expects preprocessed features from the UCI Diabetes 130-US Hospitals dataset. Features include:
+ - Patient demographics (age, gender, race)
+ - Admission details (admission type, source, length of stay)
+ - Medical history (number of diagnoses, procedures)
+ - Medication information
+ - Lab results (A1c test results, glucose serum test)
+ - Previous utilization (outpatient, inpatient, emergency visits)
+
+ See `feature_importance.csv` for the complete feature list and importance scores.
+
+ ## Limitations and Biases
+
+ - **Domain-Specific:** The model is trained only on hospital encounters of diabetic patients
+ - **Dataset Bias:** Training data from 130 US hospitals (1999-2008) may not generalize to all healthcare settings
+ - **Class Imbalance:** Readmissions are the minority class (1,704 of 15,265 test samples, about 11.2%), so precision and PR-AUC are low relative to ROC-AUC
+ - **Temporal Drift:** Healthcare practices have evolved since data collection
+ - **Geographic Limitation:** US-based dataset may not apply to other healthcare systems
+
+ ## Ethical Considerations
+
+ This model is intended to assist healthcare providers in identifying patients at risk of readmission. It should:
+ - **NOT** be used as the sole basis for treatment decisions
+ - Be validated on your specific patient population before deployment
+ - Be monitored for fairness across different demographic groups
+ - Be regularly retrained with recent data to account for changing patterns
+
+ ## Citation
+
+ ```bibtex
+ @misc{hospital-readmission-logistic-regression,
+   author = {Your Name},
+   title = {Logistic Regression Model for Hospital Readmission Prediction},
+   year = {2025},
+   url = {https://huggingface.co/your-repo}
+ }
+ ```
+
+ ## Dataset Citation
+
+ ```bibtex
+ @article{strack2014impact,
+   title={Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records},
+   author={Strack, Beata and DeShazo, Jonathan P and Gennings, Chris and Olmo, Juan L and Ventura, Sebastian and Cios, Krzysztof J and Clore, John N},
+   journal={BioMed Research International},
+   volume={2014},
+   year={2014},
+   publisher={Hindawi}
+ }
+ ```
+
+ ## License
+
+ This model is released under the MIT License. The underlying dataset has its own license terms.
+
+ ## Contact
+
+ For questions or issues, please open an issue in the repository.
+
+ ---
+
+ **Disclaimer:** This model is for research and educational purposes. Always consult healthcare professionals for medical decisions.
calibration_curve.png ADDED

Git LFS Details

  • SHA256: 7e8dcd29699362ddfda19cfc450b5f04ce8654fe3ebbae4f9a29a5e0e1d3d5a3
  • Pointer size: 131 Bytes
  • Size of remote file: 160 kB
confusion_matrix.png ADDED

Git LFS Details

  • SHA256: 558ccebc8619869aa63696baaba58cc88638df5e6b50014785a80789f089bc0d
  • Pointer size: 131 Bytes
  • Size of remote file: 124 kB
cv_results_analysis.png ADDED

Git LFS Details

  • SHA256: 8e2a77a926ec754dc65171bc96b0aec689e8bae906fdc285bd505348dc455ba9
  • Pointer size: 131 Bytes
  • Size of remote file: 302 kB
logistic_regression.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c9ee1d425237e10a6fbef180c70223030dd99ab360b04f51718810ec28cdfad5
+ size 1631
logistic_regression_cv_fold_details.json ADDED
@@ -0,0 +1,127 @@
+ [
+   {
+     "fold": 1,
+     "metrics": {
+       "roc_auc": 0.8122679751356745,
+       "pr_auc": 0.3028558665578993,
+       "precision": 0.279434546862897,
+       "recall": 0.7472812014500259,
+       "f1": 0.4067653276955603,
+       "accuracy": 0.7567192647823825,
+       "balanced_accuracy": 0.7525931056046486,
+       "sensitivity": 0.7472812014500259,
+       "specificity": 0.7579050097592713,
+       "ppv": 0.279434546862897,
+       "npv": 0.9597923704375052,
+       "fpr": 0.2420949902407287,
+       "fnr": 0.2527187985499741,
+       "true_positives": 1443,
+       "true_negatives": 11649,
+       "false_positives": 3721,
+       "false_negatives": 488,
+       "brier_score": 0.17748822906267203
+     },
+     "train_size": 69200,
+     "test_size": 17301
+   },
+   {
+     "fold": 2,
+     "metrics": {
+       "roc_auc": 0.8159403454006695,
+       "pr_auc": 0.30635822352791775,
+       "precision": 0.2831858407079646,
+       "recall": 0.7626943005181347,
+       "f1": 0.41301907968574636,
+       "accuracy": 0.758150289017341,
+       "balanced_accuracy": 0.7601370006169073,
+       "sensitivity": 0.7626943005181347,
+       "specificity": 0.7575797007156799,
+       "ppv": 0.2831858407079646,
+       "npv": 0.9621550156998843,
+       "fpr": 0.2424202992843201,
+       "fnr": 0.23730569948186528,
+       "true_positives": 1472,
+       "true_negatives": 11644,
+       "false_positives": 3726,
+       "false_negatives": 458,
+       "brier_score": 0.17780692500569936
+     },
+     "train_size": 69201,
+     "test_size": 17300
+   },
+   {
+     "fold": 3,
+     "metrics": {
+       "roc_auc": 0.8091322844785448,
+       "pr_auc": 0.30491163142671524,
+       "precision": 0.27534866189219753,
+       "recall": 0.7569948186528498,
+       "f1": 0.4038142620232173,
+       "accuracy": 0.750635838150289,
+       "balanced_accuracy": 0.753416082065527,
+       "sensitivity": 0.7569948186528498,
+       "specificity": 0.7498373454782042,
+       "ppv": 0.27534866189219753,
+       "npv": 0.9608971152242788,
+       "fpr": 0.2501626545217957,
+       "fnr": 0.24300518134715027,
+       "true_positives": 1461,
+       "true_negatives": 11525,
+       "false_positives": 3845,
+       "false_negatives": 469,
+       "brier_score": 0.1788931634704443
+     },
+     "train_size": 69201,
+     "test_size": 17300
+   },
+   {
+     "fold": 4,
+     "metrics": {
+       "roc_auc": 0.8023195252140012,
+       "pr_auc": 0.28982929011467456,
+       "precision": 0.270821421764818,
+       "recall": 0.7358881408596583,
+       "f1": 0.39593201448871557,
+       "accuracy": 0.749364161849711,
+       "balanced_accuracy": 0.7434727320213446,
+       "sensitivity": 0.7358881408596583,
+       "specificity": 0.7510573231830308,
+       "ppv": 0.270821421764818,
+       "npv": 0.9576868829337094,
+       "fpr": 0.24894267681696922,
+       "fnr": 0.2641118591403418,
+       "true_positives": 1421,
+       "true_negatives": 11543,
+       "false_positives": 3826,
+       "false_negatives": 510,
+       "brier_score": 0.18181133483024475
+     },
+     "train_size": 69201,
+     "test_size": 17300
+   },
+   {
+     "fold": 5,
+     "metrics": {
+       "roc_auc": 0.8180064391457796,
+       "pr_auc": 0.30548944186346383,
+       "precision": 0.2858253937390628,
+       "recall": 0.7612635939927499,
+       "f1": 0.4156064461407972,
+       "accuracy": 0.7610404624277457,
+       "balanced_accuracy": 0.761138010803389,
+       "sensitivity": 0.7612635939927499,
+       "specificity": 0.7610124276140282,
+       "ppv": 0.2858253937390628,
+       "npv": 0.9620794603931891,
+       "fpr": 0.23898757238597176,
+       "fnr": 0.23873640600725013,
+       "true_positives": 1470,
+       "true_negatives": 11696,
+       "false_positives": 3673,
+       "false_negatives": 461,
+       "brier_score": 0.17472857169481126
+     },
+     "train_size": 69201,
+     "test_size": 17300
+   }
+ ]
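The per-fold `roc_auc` values above aggregate to the cross-validation summary reported in the README (0.8115 ± 0.0062). The following is a minimal sketch of that aggregation, assuming the spread is the sample standard deviation (n - 1 in the denominator), which is the convention that reproduces the reported figure. The variable names are illustrative and are not taken from the training script.

```python
import statistics

# Per-fold ROC-AUC values copied from logistic_regression_cv_fold_details.json
fold_roc_auc = [
    0.8122679751356745,
    0.8159403454006695,
    0.8091322844785448,
    0.8023195252140012,
    0.8180064391457796,
]

mean_auc = statistics.mean(fold_roc_auc)
# Assumption: sample standard deviation (n - 1 in the denominator)
std_auc = statistics.stdev(fold_roc_auc)

summary = f"{mean_auc:.4f} ± {std_auc:.4f}"
print(f"Mean ROC-AUC: {summary}")
```

The population standard deviation (`statistics.pstdev`) would give a smaller spread, so which convention is used matters when comparing runs.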
logistic_regression_metadata.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01182053796102d87d817b171997e02c4314355bcd72479dff199977d3b6a4ce
+ size 2660
logistic_regression_metrics.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "roc_auc": 0.8075390869910365,
+   "pr_auc": 0.30086630344304616,
+   "precision": 0.2782101167315175,
+   "recall": 0.7552816901408451,
+   "f1": 0.4066350710900474,
+   "accuracy": 0.753946937438585,
+   "balanced_accuracy": 0.7545304549811961,
+   "sensitivity": 0.7552816901408451,
+   "specificity": 0.753779219821547,
+   "ppv": 0.2782101167315175,
+   "npv": 0.9608045868972648,
+   "fpr": 0.24622078017845292,
+   "fnr": 0.24471830985915494,
+   "true_positives": 1287,
+   "true_negatives": 10222,
+   "false_positives": 3339,
+   "false_negatives": 417,
+   "brier_score": 0.17877245555730717
+ }
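The rate metrics in this file follow directly from the four confusion-matrix counts. The sketch below recomputes them as a consistency check. The function name `rates_from_counts` is illustrative and does not come from the training code.

```python
def rates_from_counts(tp, tn, fp, fn):
    """Derive standard binary-classification rates from confusion-matrix counts."""
    total = tp + tn + fp + fn
    recall = tp / (tp + fn)        # sensitivity / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "precision": tp / (tp + fp),
        "recall": recall,
        "specificity": specificity,
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "prevalence": (tp + fn) / total,
    }

# Counts from the final test set metrics
metrics = rates_from_counts(tp=1287, tn=10222, fp=3339, fn=417)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The positive-class prevalence (about 0.112) is roughly the PR-AUC a no-skill classifier would achieve, so the reported PR-AUC of 0.3009 is about 2.7 times that baseline.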
logistic_regression_training_summary.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "model": "Logistic Regression",
+   "task": "Hospital 30-Day Readmission Risk Prediction",
+   "timestamp": "2025-12-10 03:11:28",
+   "evaluation_pipeline": {
+     "description": "Robust nested CV with final holdout and validation monitoring",
+     "final_holdout_size": 0.15,
+     "inner_val_size": 0.15,
+     "k_folds": 5,
+     "cv_strategy": "StratifiedKFold"
+   },
+   "data": {
+     "total_samples": 101766,
+     "development_size": 86501,
+     "dev_train_size": 73525,
+     "dev_val_size": 12976,
+     "final_test_size": 15265,
+     "n_features": 113
+   },
+   "best_hyperparameters": {
+     "C": 0.1,
+     "class_weight": "balanced",
+     "max_iter": 2000,
+     "penalty": "l1",
+     "solver": "liblinear"
+   },
+   "cross_validation": {
+     "mean_roc_auc": 0.8115333138749339,
+     "std_roc_auc": 0.00617498681090857,
+     "fold_scores": [
+       0.8122679751356745,
+       0.8159403454006695,
+       0.8091322844785448,
+       0.8023195252140012,
+       0.8180064391457796
+     ],
+     "n_folds": 5
+   },
+   "validation_monitoring": {
+     "dev_val_auc": 0.8071556977774029
+   },
+   "final_test_metrics": {
+     "roc_auc": 0.8075390869910365,
+     "pr_auc": 0.30086630344304616,
+     "precision": 0.2782101167315175,
+     "recall": 0.7552816901408451,
+     "f1": 0.4066350710900474,
+     "accuracy": 0.753946937438585,
+     "balanced_accuracy": 0.7545304549811961,
+     "sensitivity": 0.7552816901408451,
+     "specificity": 0.753779219821547,
+     "ppv": 0.2782101167315175,
+     "npv": 0.9608045868972648,
+     "fpr": 0.24622078017845292,
+     "fnr": 0.24471830985915494,
+     "true_positives": 1287,
+     "true_negatives": 10222,
+     "false_positives": 3339,
+     "false_negatives": 417,
+     "brier_score": 0.17877245555730717
+   },
+   "total_time_seconds": 6436.704177379608,
+   "random_state": 42
+ }
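The sample counts in the `data` block are consistent with two successive 15% holdouts. The sketch below reproduces that arithmetic, assuming the held-out size is rounded up, which is the convention scikit-learn's `train_test_split` uses for a fractional `test_size`. Whether the training script used that function is not stated in the summary, and the helper name `holdout_sizes` is hypothetical.

```python
import math

def holdout_sizes(n_samples, holdout_fraction):
    """Return (kept, held_out) sizes, rounding the held-out part up."""
    held_out = math.ceil(holdout_fraction * n_samples)
    return n_samples - held_out, held_out

# Values from logistic_regression_training_summary.json
total_samples = 101766
development, final_test = holdout_sizes(total_samples, 0.15)  # final_holdout_size
dev_train, dev_val = holdout_sizes(development, 0.15)         # inner_val_size

print(development, final_test)
print(dev_train, dev_val)

# Convert total_time_seconds to minutes for comparison with the README
total_minutes = 6436.704177379608 / 60
print(f"{total_minutes:.2f} minutes")
```

Under this assumption the computed sizes match `development_size`, `final_test_size`, `dev_train_size`, and `dev_val_size`, and the run time matches the 107.28 minutes listed in the README.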
precision_recall_curve.png ADDED

Git LFS Details

  • SHA256: 5927e55ad83f1429593a9bbfd98c5b74432d0fa995c8ee8657833256b98354c7
  • Pointer size: 131 Bytes
  • Size of remote file: 133 kB
roc_curve.png ADDED

Git LFS Details

  • SHA256: 624352325bb0fa142d3851b557e3a2aa43a7c9e5e04bb795f5e57020cea5a687
  • Pointer size: 131 Bytes
  • Size of remote file: 191 kB