Upload hospital-readmission-logistic-regression training results and visualizations

Files changed (12)

.gitattributes +5 -0
README.md +197 -0
calibration_curve.png +3 -0
confusion_matrix.png +3 -0
cv_results_analysis.png +3 -0
logistic_regression.pkl +3 -0
logistic_regression_cv_fold_details.json +127 -0
logistic_regression_metadata.pkl +3 -0
logistic_regression_metrics.json +20 -0
logistic_regression_training_summary.json +64 -0
precision_recall_curve.png +3 -0
roc_curve.png +3 -0

.gitattributes CHANGED

@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+calibration_curve.png filter=lfs diff=lfs merge=lfs -text
+confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
+cv_results_analysis.png filter=lfs diff=lfs merge=lfs -text
+precision_recall_curve.png filter=lfs diff=lfs merge=lfs -text
+roc_curve.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,197 @@

+---
+tags:
+- healthcare
+- clinical-ml
+- diabetes
+- readmission-prediction
+- logistic-regression
+- scikit-learn
+library_name: sklearn
+pipeline_tag: tabular-classification
+---
+# hospital-readmission-logistic-regression - Hospital Readmission Risk Prediction
+## Model Description
+This model predicts the risk of 30-day hospital readmission for diabetic patients. It was trained on the UCI Diabetes 130-US Hospitals dataset, cross-validated with stratified 5-fold CV on an 85% development set, and evaluated on a separate 15% holdout test set.
+**Task:** Hospital 30-Day Readmission Risk Prediction
+**Model Type:** L1-regularized Logistic Regression (scikit-learn)
+**Training Date:** 2025-12-10 03:11:28
+**Environment:** Kaggle (CPU)
+## Performance Metrics
+### Cross-Validation Results (5-Fold CV)
+| Metric | Value |
+|--------|-------|
+| Mean ROC-AUC | 0.8115 ± 0.0062 |
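The ± figure is the sample standard deviation (n − 1 denominator) of the five per-fold ROC-AUC values in `logistic_regression_cv_fold_details.json`. A minimal check using only the standard library, with the fold scores copied from that file:

```python
import statistics

# Per-fold ROC-AUC values copied from logistic_regression_cv_fold_details.json
fold_roc_auc = [
    0.8122679751356745,
    0.8159403454006695,
    0.8091322844785448,
    0.8023195252140012,
    0.8180064391457796,
]

mean_auc = statistics.mean(fold_roc_auc)
# statistics.stdev uses the sample (n - 1) denominator
std_auc = statistics.stdev(fold_roc_auc)
print(f"Mean ROC-AUC: {mean_auc:.4f} ± {std_auc:.4f}")  # Mean ROC-AUC: 0.8115 ± 0.0062
```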
+### Final Test Set Results
+#### Primary Metrics
+| Metric | Value |
+|--------|-------|
+| ROC-AUC | 0.8075 |
+| PR-AUC | 0.3009 |
+| F1 Score | 0.4066 |
+#### Classification Metrics
+| Metric | Value |
+|--------|-------|
+| Precision | 0.2782 |
+| Recall | 0.7553 |
+#### Clinical Metrics
+| Metric | Value |
+|--------|-------|
+| Sensitivity (TPR) | 0.7553 |
+| Specificity (TNR) | 0.7538 |
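All of the threshold-dependent test-set metrics above follow from the confusion-matrix counts stored in `logistic_regression_metrics.json` (TP 1,287, TN 10,222, FP 3,339, FN 417). A small sketch that re-derives them, plus the readmission prevalence in the test set:

```python
def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive threshold-dependent metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)      # positive predictive value (PPV)
    recall = tp / (tp + fn)         # sensitivity / true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    total = tp + tn + fp + fn
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "npv": tn / (tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / total,
        "balanced_accuracy": (recall + specificity) / 2,
        "prevalence": (tp + fn) / total,  # share of actual readmissions
    }

# Final test-set counts from logistic_regression_metrics.json
m = metrics_from_counts(tp=1287, tn=10222, fp=3339, fn=417)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```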
+## Model Visualizations
+### ROC Curve
+![ROC Curve](./roc_curve.png)
+### Precision-Recall Curve
+![Precision-Recall Curve](./precision_recall_curve.png)
+### Confusion Matrix
+![Confusion Matrix](./confusion_matrix.png)
+### Calibration Curve
+![Calibration Curve](./calibration_curve.png)
+### Cross-Validation Results Analysis
+![Cross-Validation Results Analysis](./cv_results_analysis.png)
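The calibration curve is complemented by the Brier score stored in the metrics files: 0.1788 on the final test set, and 0.1747 to 0.1818 across the CV folds. The Brier score is the mean squared difference between the predicted probability and the 0/1 outcome (lower is better). A minimal standard-library sketch, using toy labels and probabilities, along with the reference score of a constant prediction equal to the test-set prevalence, which works out to p(1 − p):

```python
def brier_score(y_true, proba):
    """Mean squared difference between predicted probability and the 0/1 label."""
    pairs = list(zip(y_true, proba))
    return sum((p - y) ** 2 for y, p in pairs) / len(pairs)

# Toy example with hypothetical labels and probabilities
print(brier_score([0, 1, 1, 0], [0.1, 0.9, 0.6, 0.4]))

# Reference point: always predicting the test-set prevalence p scores p * (1 - p)
prevalence = (1287 + 417) / 15265
print(f"{prevalence * (1 - prevalence):.4f}")
```

Probabilities from a model trained with `class_weight='balanced'` are shifted away from the roughly 11% base rate. This is one reason to read the calibration curve alongside the ranking metrics.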
+## Dataset Information
+| Property | Value |
+|----------|-------|
+| Total Samples | 101,766 |
+| Features | 113 |
+| Development Set | 86,501 (85%) |
+| Development Train / Validation Split | 73,525 / 12,976 |
+| Final Test Set | 15,265 (15%) |
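These sizes follow from the fractions recorded in `logistic_regression_training_summary.json` (`final_holdout_size` 0.15, `inner_val_size` 0.15, `k_folds` 5), assuming that the held-out portion is rounded up, as scikit-learn's `train_test_split` does for a float `test_size`. The arithmetic below also reproduces the CV fold sizes in `logistic_regression_cv_fold_details.json`. This shows the folds were drawn from the full development set:

```python
import math

total = 101_766
# Held-out share rounded up; the remainder goes to training (assumed sklearn-style rounding)
test = math.ceil(total * 0.15)
dev = total - test
inner_val = math.ceil(dev * 0.15)
dev_train = dev - inner_val

# k near-equal CV test folds over the development set; the first (dev % k) folds get one extra sample
k = 5
fold_sizes = [dev // k + (1 if i < dev % k else 0) for i in range(k)]

print(test, dev, dev_train, inner_val)  # 15265 86501 73525 12976
print(fold_sizes)                       # [17301, 17300, 17300, 17300, 17300]
```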
+## Training Configuration
+### Evaluation Pipeline
+- **Final Holdout Split:** Stratified 85/15 split into development (86,501) and test (15,265) sets
+- **Hyperparameter Search:** Grid search with stratified 5-fold cross-validation (`StratifiedKFold`) on the development set
+- **Validation Monitoring:** 15% validation split of the development set (73,525 train / 12,976 validation)
+- **Final Evaluation:** Untouched holdout test set
+### Best Hyperparameters
+```python
+{
+  "C": 0.1,
+  "class_weight": "balanced",
+  "max_iter": 2000,
+  "penalty": "l1",
+  "solver": "liblinear"
+}
+```
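A condensed scikit-learn sketch of this evaluation pipeline on synthetic data, with the best hyperparameters plugged in directly. The grid search itself is omitted. Assumptions: `StratifiedKFold` uses shuffling with `random_state=42` (the summary records `random_state: 42` but not the shuffle setting). The final model is fit on the development-train split; the summary does not say whether it was refit on the full development set.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the 113 preprocessed features (roughly 11% positives)
X, y = make_classification(n_samples=5000, n_features=113, n_informative=20,
                           weights=[0.89], random_state=42)

# Best hyperparameters from the training summary
params = {"C": 0.1, "class_weight": "balanced", "max_iter": 2000,
          "penalty": "l1", "solver": "liblinear"}

# 1) Stratified 85/15 split: development set vs. untouched holdout test set
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)

# 2) Stratified 5-fold CV on the full development set
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(LogisticRegression(**params), X_dev, y_dev,
                         cv=cv, scoring="roc_auc")

# 3) 15% validation split of the development set for monitoring
X_tr, X_val, y_tr, y_val = train_test_split(
    X_dev, y_dev, test_size=0.15, stratify=y_dev, random_state=42)
model = LogisticRegression(**params).fit(X_tr, y_tr)
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

# 4) Final evaluation on the holdout test set
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"CV ROC-AUC {cv_auc.mean():.3f} ± {cv_auc.std(ddof=1):.3f}, "
      f"validation {val_auc:.3f}, test {test_auc:.3f}")
```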
+## Training Details
+- **Total Training Time:** 107.28 minutes
+- **Hyperparameter Search Time:** 0.00 minutes
+- **Cross-Validation Folds:** 5
+- **Validation Monitoring:** Yes (development validation ROC-AUC: 0.8072)
+- **Device:** CPU
+## Usage
+### Loading the Model
+```python
+import joblib
+import pandas as pd
+# Load the trained scikit-learn LogisticRegression saved in this repository
+model = joblib.load('logistic_regression.pkl')
+# Load features preprocessed exactly as for training (113 columns, same order)
+X_new = pd.read_csv('your_features.csv')
+# Class labels (0.5 probability cutoff) and readmission probabilities
+predictions = model.predict(X_new)
+probabilities = model.predict_proba(X_new)[:, 1]
+```
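For a binary scikit-learn `LogisticRegression`, `model.predict` is equivalent to flagging predicted probabilities above 0.5. Only about 11% of encounters in this dataset are readmissions, so it can be worth choosing a cutoff on local validation data to match the intended sensitivity and alert workload. A small NumPy helper that reports the trade-off at candidate thresholds. The labels and probabilities below are toy placeholders; in practice they would be local validation labels and `model.predict_proba(X_val)[:, 1]`.

```python
import numpy as np

def threshold_report(y_true, proba, thresholds):
    """Sensitivity, specificity and precision when flagging proba >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    proba = np.asarray(proba, dtype=float)
    rows = []
    for t in thresholds:
        flagged = proba >= t
        tp = np.sum(flagged & y_true)
        fp = np.sum(flagged & ~y_true)
        fn = np.sum(~flagged & y_true)
        tn = np.sum(~flagged & ~y_true)
        # Guard against empty denominators
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        prec = tp / (tp + fp) if tp + fp else float("nan")
        rows.append((t, sens, spec, prec))
    return rows

# Toy placeholders for validation labels and predicted probabilities
y_val = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
p_val = [0.1, 0.4, 0.3, 0.8, 0.6, 0.55, 0.2, 0.7, 0.35, 0.05]
for t, sens, spec, prec in threshold_report(y_val, p_val, [0.3, 0.5, 0.7]):
    print(f"threshold={t:.1f} sensitivity={sens:.2f} specificity={spec:.2f} precision={prec:.2f}")
```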
+### Feature Requirements
+The model expects the 113 preprocessed features derived from the UCI Diabetes 130-US Hospitals dataset, prepared with the same preprocessing and column order used in training. They cover:
+- Patient demographics (age, gender, race)
+- Admission details (admission type, source, length of stay)
+- Medical history (number of diagnoses, procedures)
+- Medication information
+- Lab results (A1c test results, glucose serum test)
+- Previous utilization (outpatient, inpatient, emergency visits)
+Because this is a linear model, its per-feature weights are available from the fitted estimator's `coef_` attribute. With the L1 penalty, some weights may be exactly zero.
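In a linear model, the fitted coefficients show each feature's effect on the log-odds. Magnitudes are only comparable when features are on similar scales. The sketch below ranks features by absolute coefficient and counts those zeroed by the L1 penalty. It fits a toy model on hypothetical column names and assumes that the shipped model exposes `coef_` like any scikit-learn `LogisticRegression`. Feature names are stored in `feature_names_in_` only if the model was fit on a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def rank_coefficients(model, feature_names):
    """Sort features by absolute coefficient; also count coefficients that are exactly zero."""
    coefs = pd.Series(model.coef_.ravel(), index=list(feature_names))
    order = coefs.abs().sort_values(ascending=False).index
    return coefs[order], int((coefs == 0).sum())

# Toy data with hypothetical column names (the real model uses 113 preprocessed features)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)),
                 columns=["feat_strong", "feat_medium", "noise_a", "noise_b"])
y = (X["feat_strong"] + 0.5 * X["feat_medium"] + rng.normal(size=500) > 1).astype(int)

model = LogisticRegression(C=0.1, penalty="l1", solver="liblinear",
                           class_weight="balanced", max_iter=2000).fit(X, y)
# feature_names_in_ only exists when the model was fit on a DataFrame
names = getattr(model, "feature_names_in_", X.columns)
ranked, n_zero = rank_coefficients(model, names)
print(ranked.round(3))
print(f"{n_zero} coefficient(s) set exactly to zero by the L1 penalty")
```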
+## Limitations and Biases
+- **Domain-Specific:** Model is trained specifically for diabetic patient readmissions
+- **Dataset Bias:** Training data from 130 US hospitals (1999-2008) may not generalize to all healthcare settings
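+- **Class Imbalance:** Only about 11% of test-set encounters are 30-day readmissions (1,704 of 15,265). Training used `class_weight='balanced'`, and precision at the reported operating point is 0.28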
+- **Temporal Drift:** Healthcare practices have evolved since data collection
+- **Geographic Limitation:** US-based dataset may not apply to other healthcare systems
+## Ethical Considerations
+This model is intended to assist healthcare providers in identifying patients at risk of readmission. It should:
+- **NOT** be used as the sole basis for treatment decisions
+- Be validated on your specific patient population before deployment
+- Be monitored for fairness across different demographic groups
+- Be regularly retrained with recent data to account for changing patterns
+## Citation
+```bibtex
+@misc{hospital-readmission-logreg,
+  author = {Your Name},
+  title = {Logistic Regression Model for Hospital Readmission Prediction},
+  year = {2025},
+  url = {https://huggingface.co/your-repo}
+}
+```
+## Dataset Citation
+```bibtex
+@article{strack2014impact,
+  title={Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records},
+  author={Strack, Beata and DeShazo, Jonathan P and Gennings, Chris and Olmo, Juan L and Ventura, Sebastian and Cios, Krzysztof J and Clore, John N},
+  journal={BioMed Research International},
+  volume={2014},
+  year={2014},
+  publisher={Hindawi}
+}
+```
+## License
+This model is released under the MIT License. The underlying dataset has its own license terms.
+## Contact
+For questions or issues, please open an issue in the repository.
+---
+**Disclaimer:** This model is for research and educational purposes. Always consult healthcare professionals for medical decisions.

calibration_curve.png ADDED

Git LFS Details

SHA256: 7e8dcd29699362ddfda19cfc450b5f04ce8654fe3ebbae4f9a29a5e0e1d3d5a3
Pointer size: 131 Bytes
Size of remote file: 160 kB

confusion_matrix.png ADDED

Git LFS Details

SHA256: 558ccebc8619869aa63696baaba58cc88638df5e6b50014785a80789f089bc0d
Pointer size: 131 Bytes
Size of remote file: 124 kB

cv_results_analysis.png ADDED

Git LFS Details

SHA256: 8e2a77a926ec754dc65171bc96b0aec689e8bae906fdc285bd505348dc455ba9
Pointer size: 131 Bytes
Size of remote file: 302 kB

logistic_regression.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c9ee1d425237e10a6fbef180c70223030dd99ab360b04f51718810ec28cdfad5
+size 1631

logistic_regression_cv_fold_details.json ADDED

	@@ -0,0 +1,127 @@

+[
+  {
+    "fold": 1,
+    "metrics": {
+      "roc_auc": 0.8122679751356745,
+      "pr_auc": 0.3028558665578993,
+      "precision": 0.279434546862897,
+      "recall": 0.7472812014500259,
+      "f1": 0.4067653276955603,
+      "accuracy": 0.7567192647823825,
+      "balanced_accuracy": 0.7525931056046486,
+      "sensitivity": 0.7472812014500259,
+      "specificity": 0.7579050097592713,
+      "ppv": 0.279434546862897,
+      "npv": 0.9597923704375052,
+      "fpr": 0.2420949902407287,
+      "fnr": 0.2527187985499741,
+      "true_positives": 1443,
+      "true_negatives": 11649,
+      "false_positives": 3721,
+      "false_negatives": 488,
+      "brier_score": 0.17748822906267203
+    },
+    "train_size": 69200,
+    "test_size": 17301
+  },
+  {
+    "fold": 2,
+    "metrics": {
+      "roc_auc": 0.8159403454006695,
+      "pr_auc": 0.30635822352791775,
+      "precision": 0.2831858407079646,
+      "recall": 0.7626943005181347,
+      "f1": 0.41301907968574636,
+      "accuracy": 0.758150289017341,
+      "balanced_accuracy": 0.7601370006169073,
+      "sensitivity": 0.7626943005181347,
+      "specificity": 0.7575797007156799,
+      "ppv": 0.2831858407079646,
+      "npv": 0.9621550156998843,
+      "fpr": 0.2424202992843201,
+      "fnr": 0.23730569948186528,
+      "true_positives": 1472,
+      "true_negatives": 11644,
+      "false_positives": 3726,
+      "false_negatives": 458,
+      "brier_score": 0.17780692500569936
+    },
+    "train_size": 69201,
+    "test_size": 17300
+  },
+  {
+    "fold": 3,
+    "metrics": {
+      "roc_auc": 0.8091322844785448,
+      "pr_auc": 0.30491163142671524,
+      "precision": 0.27534866189219753,
+      "recall": 0.7569948186528498,
+      "f1": 0.4038142620232173,
+      "accuracy": 0.750635838150289,
+      "balanced_accuracy": 0.753416082065527,
+      "sensitivity": 0.7569948186528498,
+      "specificity": 0.7498373454782042,
+      "ppv": 0.27534866189219753,
+      "npv": 0.9608971152242788,
+      "fpr": 0.2501626545217957,
+      "fnr": 0.24300518134715027,
+      "true_positives": 1461,
+      "true_negatives": 11525,
+      "false_positives": 3845,
+      "false_negatives": 469,
+      "brier_score": 0.1788931634704443
+    },
+    "train_size": 69201,
+    "test_size": 17300
+  },
+  {
+    "fold": 4,
+    "metrics": {
+      "roc_auc": 0.8023195252140012,
+      "pr_auc": 0.28982929011467456,
+      "precision": 0.270821421764818,
+      "recall": 0.7358881408596583,
+      "f1": 0.39593201448871557,
+      "accuracy": 0.749364161849711,
+      "balanced_accuracy": 0.7434727320213446,
+      "sensitivity": 0.7358881408596583,
+      "specificity": 0.7510573231830308,
+      "ppv": 0.270821421764818,
+      "npv": 0.9576868829337094,
+      "fpr": 0.24894267681696922,
+      "fnr": 0.2641118591403418,
+      "true_positives": 1421,
+      "true_negatives": 11543,
+      "false_positives": 3826,
+      "false_negatives": 510,
+      "brier_score": 0.18181133483024475
+    },
+    "train_size": 69201,
+    "test_size": 17300
+  },
+  {
+    "fold": 5,
+    "metrics": {
+      "roc_auc": 0.8180064391457796,
+      "pr_auc": 0.30548944186346383,
+      "precision": 0.2858253937390628,
+      "recall": 0.7612635939927499,
+      "f1": 0.4156064461407972,
+      "accuracy": 0.7610404624277457,
+      "balanced_accuracy": 0.761138010803389,
+      "sensitivity": 0.7612635939927499,
+      "specificity": 0.7610124276140282,
+      "ppv": 0.2858253937390628,
+      "npv": 0.9620794603931891,
+      "fpr": 0.23898757238597176,
+      "fnr": 0.23873640600725013,
+      "true_positives": 1470,
+      "true_negatives": 11696,
+      "false_positives": 3673,
+      "false_negatives": 461,
+      "brier_score": 0.17472857169481126
+    },
+    "train_size": 69201,
+    "test_size": 17300
+  }
+]

logistic_regression_metadata.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:01182053796102d87d817b171997e02c4314355bcd72479dff199977d3b6a4ce
+size 2660

logistic_regression_metrics.json ADDED

	@@ -0,0 +1,20 @@

+{
+  "roc_auc": 0.8075390869910365,
+  "pr_auc": 0.30086630344304616,
+  "precision": 0.2782101167315175,
+  "recall": 0.7552816901408451,
+  "f1": 0.4066350710900474,
+  "accuracy": 0.753946937438585,
+  "balanced_accuracy": 0.7545304549811961,
+  "sensitivity": 0.7552816901408451,
+  "specificity": 0.753779219821547,
+  "ppv": 0.2782101167315175,
+  "npv": 0.9608045868972648,
+  "fpr": 0.24622078017845292,
+  "fnr": 0.24471830985915494,
+  "true_positives": 1287,
+  "true_negatives": 10222,
+  "false_positives": 3339,
+  "false_negatives": 417,
+  "brier_score": 0.17877245555730717
+}

logistic_regression_training_summary.json ADDED

	@@ -0,0 +1,64 @@

+{
+  "model": "Logistic Regression",
+  "task": "Hospital 30-Day Readmission Risk Prediction",
+  "timestamp": "2025-12-10 03:11:28",
+  "evaluation_pipeline": {
+    "description": "Robust nested CV with final holdout and validation monitoring",
+    "final_holdout_size": 0.15,
+    "inner_val_size": 0.15,
+    "k_folds": 5,
+    "cv_strategy": "StratifiedKFold"
+  },
+  "data": {
+    "total_samples": 101766,
+    "development_size": 86501,
+    "dev_train_size": 73525,
+    "dev_val_size": 12976,
+    "final_test_size": 15265,
+    "n_features": 113
+  },
+  "best_hyperparameters": {
+    "C": 0.1,
+    "class_weight": "balanced",
+    "max_iter": 2000,
+    "penalty": "l1",
+    "solver": "liblinear"
+  },
+  "cross_validation": {
+    "mean_roc_auc": 0.8115333138749339,
+    "std_roc_auc": 0.00617498681090857,
+    "fold_scores": [
+      0.8122679751356745,
+      0.8159403454006695,
+      0.8091322844785448,
+      0.8023195252140012,
+      0.8180064391457796
+    ],
+    "n_folds": 5
+  },
+  "validation_monitoring": {
+    "dev_val_auc": 0.8071556977774029
+  },
+  "final_test_metrics": {
+    "roc_auc": 0.8075390869910365,
+    "pr_auc": 0.30086630344304616,
+    "precision": 0.2782101167315175,
+    "recall": 0.7552816901408451,
+    "f1": 0.4066350710900474,
+    "accuracy": 0.753946937438585,
+    "balanced_accuracy": 0.7545304549811961,
+    "sensitivity": 0.7552816901408451,
+    "specificity": 0.753779219821547,
+    "ppv": 0.2782101167315175,
+    "npv": 0.9608045868972648,
+    "fpr": 0.24622078017845292,
+    "fnr": 0.24471830985915494,
+    "true_positives": 1287,
+    "true_negatives": 10222,
+    "false_positives": 3339,
+    "false_negatives": 417,
+    "brier_score": 0.17877245555730717
+  },
+  "total_time_seconds": 6436.704177379608,
+  "random_state": 42
+}

precision_recall_curve.png ADDED

Git LFS Details

SHA256: 5927e55ad83f1429593a9bbfd98c5b74432d0fa995c8ee8657833256b98354c7
Pointer size: 131 Bytes
Size of remote file: 133 kB

roc_curve.png ADDED

Git LFS Details

SHA256: 624352325bb0fa142d3851b557e3a2aa43a7c9e5e04bb795f5e57020cea5a687
Pointer size: 131 Bytes
Size of remote file: 191 kB