hospital-readmission-logistic-regression - Hospital Readmission Risk Prediction
Model Description
This hospital-readmission-logistic-regression model predicts the risk of 30-day hospital readmission for diabetic patients. It was trained on the UCI Diabetes 130-US Hospitals dataset, tuned with 5-fold cross-validation on a development set, and evaluated once on a held-out test set.
Task: Hospital 30-Day Readmission Risk Prediction
Model Type: Logistic Regression (L1-regularized, liblinear solver)
Training Date: 2025-12-10 05:49:14
Environment: kaggle (CPU)
Performance Metrics
Cross-Validation Results (5-Fold CV)
| Metric | Value |
|---|---|
| Mean ROC-AUC | 0.8115 ± 0.0062 |
Final Test Set Results
Primary Metrics
| Metric | Value |
|---|---|
| ROC-AUC | 0.8075 |
| PR-AUC | 0.3009 |
| F1 Score | 0.4066 |
Classification Metrics
| Metric | Value |
|---|---|
| Precision | 0.2782 |
| Recall | 0.7553 |
Clinical Metrics
| Metric | Value |
|---|---|
| Sensitivity (TPR) | 0.7553 |
| Specificity (TNR) | 0.7538 |
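The reported test-set figures can be cross-checked against each other. The sketch below recomputes F1 from precision and recall, then back-solves the positive-class rate implied by precision, sensitivity, and specificity. The function names are illustrative and not part of the released code, and the implied prevalence is a derived estimate rather than a number reported with the model.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_prevalence(precision: float, sensitivity: float, specificity: float) -> float:
    """Solve precision = TP / (TP + FP) for the positive-class rate p,
    with TP = sensitivity * p and FP = (1 - specificity) * (1 - p)."""
    fpr = 1.0 - specificity
    return precision * fpr / (sensitivity * (1.0 - precision) + precision * fpr)


# Values copied from the tables above.
f1 = f1_from_pr(precision=0.2782, recall=0.7553)
prevalence = implied_prevalence(precision=0.2782, sensitivity=0.7553, specificity=0.7538)

print(f"F1 recomputed from precision and recall: {f1:.4f}")
print(f"Implied positive-class rate: {prevalence:.3f}")
```

The recomputed F1 matches the reported 0.4066, and the implied positive rate is roughly 11%, which explains why precision stays low even though sensitivity and specificity are both near 0.75.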
Model Visualizations
Figures (images not reproduced here): ROC curve, precision-recall curve, confusion matrix, calibration curve, feature importance, learning curves, validation curves, and cross-fold metrics comparison.
Dataset Information
| Property | Value |
|---|---|
| Total Samples | 101,766 |
| Features | 113 |
| Development Set | 86,501 |
| Final Test Set | 15,265 |
Training Configuration
Evaluation Pipeline
- Final Holdout Split: Stratified split into development and test sets
- Hyperparameter Search: Grid search with 5-fold cross-validation
- Nested Early Stopping: Inner validation split within each fold
- Final Evaluation: Untouched holdout test set
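The holdout split, grid search, and final evaluation steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' training script: the data is synthetic, the `C` grid is made up, and the random seeds are arbitrary. The early-stopping step is left out because the source does not say how it was implemented.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the preprocessed feature matrix (assumption).
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.89, 0.11], random_state=0
)

# 1. Stratified holdout split: the test set is set aside and not used for tuning.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0
)

# 2. Grid search with 5-fold stratified cross-validation on the development set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    LogisticRegression(
        penalty="l1", solver="liblinear", class_weight="balanced", max_iter=2000
    ),
    param_grid={"C": [0.01, 0.1, 1.0]},  # illustrative grid, not the authors'
    scoring="roc_auc",
    cv=cv,
)
search.fit(X_dev, y_dev)

# 3. Final evaluation on the untouched holdout set, using the refit best model.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("Best C:", search.best_params_["C"])
print(f"Holdout ROC-AUC: {test_auc:.4f}")
```

Keeping the test set out of `search.fit` is what makes the final ROC-AUC an unbiased estimate; the cross-validated score is only used to pick hyperparameters.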
Best Hyperparameters
{
  "C": 0.1,
  "class_weight": "balanced",
  "max_iter": 2000,
  "penalty": "l1",
  "solver": "liblinear"
}
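The key names above match the constructor arguments of scikit-learn's `LogisticRegression`. Assuming that is the estimator in use, an unfitted model with the same settings could be rebuilt like this:

```python
import json

from sklearn.linear_model import LogisticRegression

# Hyperparameters copied verbatim from the JSON above.
best_params = json.loads("""
{
  "C": 0.1,
  "class_weight": "balanced",
  "max_iter": 2000,
  "penalty": "l1",
  "solver": "liblinear"
}
""")

# Assumption: the estimator is scikit-learn's LogisticRegression.
model = LogisticRegression(**best_params)
print(model.get_params()["solver"])
```

In scikit-learn, `C` is the inverse of the regularization strength, so 0.1 is a fairly strong penalty. The L1 penalty can drive some coefficients to exactly zero, and `class_weight="balanced"` reweights samples inversely to class frequency.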
Training Details
- Total Training Time: 110.84 minutes
- Hyperparameter Search Time: 0.00 minutes
- Cross-Validation Folds: 5
- Early Stopping: Yes
- Device: CPU
Usage
Loading the Model
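import joblib
import pandas as pd

# Load the trained model (a fitted estimator serialized with joblib)
model = joblib.load('gradient_boosting_model.joblib')

# Load your preprocessed features; columns must match the training features
X_new = pd.read_csv('your_features.csv')

# Predicted class labels
predictions = model.predict(X_new)

# Predicted probability of the positive class (readmission risk)
probabilities = model.predict_proba(X_new)[:, 1]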
Feature Requirements
The model expects preprocessed features from the UCI Diabetes 130-US Hospitals dataset. Features include:
- Patient demographics (age, gender, race)
- Admission details (admission type, source, length of stay)
- Medical history (number of diagnoses, procedures)
- Medication information
- Lab results (A1c test results, glucose serum test)
- Previous utilization (outpatient, inpatient, emergency visits)
See feature_importance.csv for complete feature list and importance scores.
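Because the model expects the same preprocessed columns it was trained on, it can help to check an input frame before scoring. The helper below is a hypothetical sketch: `align_features` is not part of the released code, and reading the expected names from a `feature` column of feature_importance.csv is an assumption about that file's layout.

```python
from typing import List

import pandas as pd


def align_features(X: pd.DataFrame, expected: List[str]) -> pd.DataFrame:
    """Return X restricted to the expected columns, in the expected order.

    Raises ValueError if any expected column is absent.
    """
    missing = [c for c in expected if c not in X.columns]
    if missing:
        raise ValueError(f"Missing {len(missing)} expected feature(s): {missing[:5]}")
    # Drop unexpected columns and enforce the training-time column order.
    return X.loc[:, expected]


# Toy demonstration. In practice the expected list might come from, e.g.:
#   pd.read_csv("feature_importance.csv")["feature"].tolist()  # column name assumed
demo = pd.DataFrame({"b": [1], "a": [2], "extra": [3]})
aligned = align_features(demo, ["a", "b"])
print(list(aligned.columns))
```

Failing loudly on missing columns is a deliberate choice here: silently filling absent features with zeros would produce risk scores that look valid but are not.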
Limitations and Biases
- Domain-Specific: Model is trained specifically for diabetic patient readmissions
- Dataset Bias: Training data from 130 US hospitals (1999-2008) may not generalize to all healthcare settings
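- Class Imbalance: Readmissions within 30 days are the minority class. The reported precision (0.2782), recall (0.7553), and specificity (0.7538) imply a positive rate of roughly 11% in the test set, which is why precision and PR-AUC are low despite a ROC-AUC above 0.80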
- Temporal Drift: Healthcare practices have evolved since data collection
- Geographic Limitation: US-based dataset may not apply to other healthcare systems
Ethical Considerations
This model is intended to assist healthcare providers in identifying patients at risk of readmission. It should:
- NOT be used as the sole basis for treatment decisions
- Be validated on your specific patient population before deployment
- Be monitored for fairness across different demographic groups
- Be regularly retrained with recent data to account for changing patterns
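One simple way to monitor fairness across demographic groups is to compare sensitivity (recall) per group on a labeled validation set. The sketch below uses only the standard library and toy data. The function name and the group labels are illustrative, and recall is just one of several metrics worth tracking.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def recall_by_group(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Recall (true positive rate) computed separately for each group.

    Groups with no positive cases are omitted, since recall is undefined there.
    """
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / positives[g] for g in positives}


# Toy labels, predictions, and group membership (not real patient data).
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["A", "A", "A", "B", "B", "B"]

per_group = recall_by_group(y_true, y_pred, groups)
print(per_group)
```

A large gap in recall between groups means the model misses at-risk patients in one group more often than in another, which is worth investigating before deployment.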
Citation
@misc{hospital-readmission-logistic-regression,
  author = {Your Name},
  title = {Logistic Regression Model for Hospital Readmission Prediction},
  year = {2025},
  url = {https://huggingface.co/your-repo}
}
Dataset Citation
@article{strack2014impact,
  title={Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records},
  author={Strack, Beata and DeShazo, Jonathan P and Gennings, Chris and Olmo, Juan L and Ventura, Sebastian and Cios, Krzysztof J and Clore, John N},
  journal={BioMed Research International},
  volume={2014},
  year={2014},
  publisher={Hindawi}
}
License
This model is released under the MIT License. The underlying dataset has its own license terms.
Contact
For questions or issues, please open an issue in the repository.
Disclaimer: This model is for research and educational purposes. Always consult healthcare professionals for medical decisions.