Gradient Boosting (LightGBM) - Calibrated for Hospital Readmission Prediction

Model Description

This is a calibrated Gradient Boosting (LightGBM) model for predicting 30-day hospital readmission risk in diabetic patients. The model has been calibrated with Platt scaling so that predicted probabilities accurately reflect true readmission risk.

Calibration Method: Platt scaling
Calibration Date: 2025-12-10 08:15:54
Base Model: Gradient Boosting (LightGBM)

Why Calibration Matters

Model calibration ensures that predicted probabilities are reliable for clinical decision-making:

  • A predicted 15% risk should mean ~15 out of 100 similar patients are readmitted
  • Enables accurate risk stratification and resource allocation
  • Critical for patient safety and clinical trust
  • Required for regulatory compliance in healthcare AI
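Platt scaling fits a logistic regression on the base model's scores to produce calibrated probabilities. Below is a minimal sketch on synthetic data, not the actual ModelCalibrator shipped with this repository; the data and variable names are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, deliberately miscalibrated scores (hypothetical example data)
rng = np.random.default_rng(0)
uncalibrated = rng.uniform(0.01, 0.99, size=1000)
y_true = rng.binomial(1, uncalibrated ** 2)  # miscalibrated by construction

# Platt scaling = logistic regression fit on the log-odds of the raw scores
log_odds = np.log(uncalibrated / (1 - uncalibrated)).reshape(-1, 1)
platt = LogisticRegression().fit(log_odds, y_true)
calibrated = platt.predict_proba(log_odds)[:, 1]
```

In practice the logistic regression is fit on a held-out calibration set, never on the data used to train the base model.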

Calibration Performance

Success Criteria

Criterion              Target     Result   Status
Brier Score            < 0.15     0.0818   ✅ PASS
ECE (±5% accuracy)     < 0.05     0.0180   ✅ PASS
Hosmer-Lemeshow Test   p > 0.05   0.0000   ❌ FAIL

Overall Result: ⚠️ SOME CRITERIA NOT MET
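The Brier score and ECE reported above can be computed along the following lines. This is a sketch, not the project's exact evaluation code; bin count and binning strategy are assumptions:

```python
import numpy as np

def brier_score(y_true, p):
    """Mean squared error between predicted probability and outcome."""
    return np.mean((p - y_true) ** 2)

def expected_calibration_error(y_true, p, n_bins=10):
    """Weighted mean |observed rate - mean prediction| over equal-width bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        # Include the right edge only in the last bin
        mask = (p >= lo) & ((p < hi) if hi < 1.0 else (p <= hi))
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - p[mask].mean())
    return ece
```

A perfectly calibrated set of predictions yields an ECE of 0; the Brier score additionally rewards sharpness (predictions close to 0 or 1).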

Before vs After Calibration

Metric        Uncalibrated   Calibrated   Change
Brier Score   0.0802         0.0818       -0.0016
Log Loss      0.2665         0.2734       -0.0068
ECE           0.0064         0.0180       -0.0116
ROC-AUC       0.8424         0.8424       Unchanged*

Negative values indicate the metric worsened slightly after calibration; the base model was already well calibrated on this test set.

*Note: Calibration improves probability estimates without changing discrimination (ranking) ability.

Hosmer-Lemeshow Goodness-of-Fit Test

  • Uncalibrated: χ² = 15.23, p = 0.0185 (poorly calibrated)
  • Calibrated: χ² = 136.54, p = 0.0000 (poorly calibrated)
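The Hosmer-Lemeshow statistic compares observed and expected event counts over risk groups (typically deciles). A sketch of the standard computation, assuming scipy is available; this is not the project's exact implementation:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, p, n_groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-size risk groups."""
    order = np.argsort(p)  # sort patients by predicted risk
    stat = 0.0
    for g in np.array_split(order, n_groups):
        obs = y_true[g].sum()   # observed readmissions in the group
        exp = p[g].sum()        # expected readmissions in the group
        n = len(g)
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n))
    p_value = chi2.sf(stat, df=n_groups - 2)
    return stat, p_value
```

Note that with large samples the test flags even tiny deviations as significant, which is one reason a failed HL test can coexist with a low ECE.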

Clinical Risk Categories

Calibrated probabilities are mapped to actionable risk categories:

Risk Level   Probability Range   Recommended Action
Low          0-5%                Standard discharge planning
Medium       5-15%               Enhanced patient education + 1-week follow-up call
High         15%+                Intensive case management + home health visit
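The mapping above can be sketched as a simple threshold function; the function name is hypothetical, and the thresholds should be validated clinically before use:

```python
def risk_category(prob):
    """Map a calibrated probability to the clinical risk levels above."""
    if prob < 0.05:
        return "Low"
    if prob < 0.15:
        return "Medium"
    return "High"

# Recommended actions per risk level (from the table above)
ACTIONS = {
    "Low": "Standard discharge planning",
    "Medium": "Enhanced patient education + 1-week follow-up call",
    "High": "Intensive case management + home health visit",
}
```

Boundary values (exactly 5% or 15%) are assigned to the higher category here; the source table does not specify this, so confirm the intended convention.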

Risk Category Validation

The file risk_validation_detailed.csv (included in this repository) shows how well predicted risk categories align with actual readmission rates.

Visualizations

Reliability Diagram (Calibration Curve)

Reliability Diagram

Shows how well predicted probabilities match observed frequencies. The closer to the diagonal, the better calibrated.

Reliability Comparison (Before vs After)

Reliability Comparison

Compares uncalibrated vs calibrated predictions.

Risk Distribution

Risk Distribution

Distribution of patients across risk categories.

Detailed Risk Distribution

Detailed Risk Distribution

Enhanced visualization with probability thresholds.

Usage

Loading the Calibrated Model

import joblib
import pandas as pd
import sys

# Add utilities to path
sys.path.append('./phase-3-model-calibration')
from utilities import ModelCalibrator

# Load original model
model = joblib.load('gradient_boosting_model_original.joblib')

# Load calibrator
calibrator = ModelCalibrator.load('Gradient_Boosting_(LightGBM)_calibrator.pkl')

# Load your preprocessed features (MUST use same preprocessing as training!)
X_new = pd.read_csv('your_preprocessed_features.csv')

# Step 1: Get uncalibrated predictions from original model
uncalibrated_proba = model.predict_proba(X_new)[:, 1]

# Step 2: Apply calibration
calibrated_proba = calibrator.predict_proba(uncalibrated_proba)

# Create results DataFrame
results = pd.DataFrame({
    'patient_id': X_new.index,
    'uncalibrated_probability': uncalibrated_proba,
    'calibrated_probability': calibrated_proba
})

# Display results
print(results.head(10))

# Example output (columns match the DataFrame built above):
#    patient_id  uncalibrated_probability  calibrated_probability
# 0           0                    0.0834                  0.0234
# 1           1                    0.2341                  0.1876
# 2           2                    0.1123                  0.0891

Quick Prediction Pipeline

def predict_readmission_risk(patient_features):
    """
    Complete pipeline for readmission risk prediction.
    
    Args:
        patient_features: DataFrame with preprocessed patient features
        
    Returns:
        DataFrame with calibrated probabilities
    """
    # Load models (in production, load once at startup rather than per call)
    model = joblib.load('gradient_boosting_model_original.joblib')
    calibrator = ModelCalibrator.load('Gradient_Boosting_(LightGBM)_calibrator.pkl')
    
    # Generate calibrated predictions
    uncalibrated = model.predict_proba(patient_features)[:, 1]
    calibrated = calibrator.predict_proba(uncalibrated)
    
    # Return calibrated probabilities
    results = pd.DataFrame({
        'readmission_probability': calibrated
    })
    
    return results

# Use the pipeline
predictions = predict_readmission_risk(X_new)
print(predictions)

Important Notes

1. Preprocessing Requirements

Input features MUST be preprocessed using the exact same pipeline as training:

  • Same missing value imputation
  • Same feature engineering
  • Same scaling/encoding
  • Same feature set

See phase-1-data-explore-preprocessing/simple_preprocessing.py for the preprocessing pipeline.

2. Calibration Preserves Discrimination

  • Calibration improves probability estimates
  • Does NOT change model's ranking ability (ROC-AUC stays the same)
  • Patients ranked as higher risk remain higher risk
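Because Platt scaling is a strictly increasing transform, it cannot reorder patients, so ROC-AUC is unchanged. A quick self-contained check with synthetic data (scores, labels, and coefficients are all illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical scores and labels, just to demonstrate the invariance
rng = np.random.default_rng(1)
scores = rng.uniform(0.01, 0.99, size=500)
labels = rng.binomial(1, scores)

def platt_transform(p, a=1.7, b=-0.3):  # illustrative coefficients only
    """Strictly increasing sigmoid remap of probabilities (a > 0)."""
    logit = np.log(p / (1 - p))
    return 1.0 / (1.0 + np.exp(-(a * logit + b)))

auc_raw = roc_auc_score(labels, scores)
auc_cal = roc_auc_score(labels, platt_transform(scores))
# auc_raw == auc_cal: a strictly increasing map never reorders patients
```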

3. When to Recalibrate

Recalibrate the model when:

  • Patient population characteristics change
  • Healthcare practices evolve
  • Model performance degrades
  • Annually as a best practice

4. Clinical Validation Required

Before deployment:

  • Validate risk thresholds with clinical experts
  • Test on local patient population
  • Ensure alignment with clinical workflows
  • Obtain necessary regulatory approvals

Files Included

  • gradient_boosting_model_original.joblib - Original trained model
  • Gradient_Boosting_(LightGBM)_calibrator.pkl - Calibration transformer (Platt scaling)
  • Gradient_Boosting_(LightGBM)_report.txt - Detailed calibration report
  • Gradient_Boosting_(LightGBM)_metrics.json - Metrics in JSON format
  • calibration_comparison_metrics.json - Before/after comparison
  • risk_validation_detailed.csv - Risk category validation table
  • reliability_diagram*.png - Calibration visualizations
  • risk_distribution*.png - Risk distribution plots
  • DEPLOYMENT_INSTRUCTIONS.md - Deployment guide

Limitations and Ethical Considerations

Limitations

  1. Domain-Specific: Trained for diabetic patient readmissions only
  2. Temporal Drift: Data from 1999-2008 may not reflect current practices
  3. Geographic Bias: US hospital data may not generalize internationally
  4. Population Shift: Recalibration needed if patient demographics change

Ethical Considerations

This model should:

  • ✅ Assist clinical decision-making, not replace it
  • ✅ Be validated on your local patient population
  • ✅ Be monitored for fairness across demographic groups
  • ✅ Be recalibrated regularly with recent data
  • ❌ NOT be the sole basis for treatment decisions
  • ❌ NOT be deployed without clinical expert validation

Fairness

  • Evaluate calibration quality separately for different demographic groups
  • Monitor for disparate impact across protected attributes
  • Consider group-specific calibration if needed
  • Document fairness metrics for regulatory compliance
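One way to evaluate calibration per demographic group is to compute a reliability curve for each group separately. A sketch using scikit-learn's calibration_curve; the group labels are hypothetical:

```python
import numpy as np
from sklearn.calibration import calibration_curve

def groupwise_reliability(y_true, p, groups, n_bins=5):
    """Per-group reliability points: (observed rate, mean prediction) per bin."""
    return {g: calibration_curve(y_true[groups == g], p[groups == g],
                                 n_bins=n_bins)
            for g in np.unique(groups)}
```

If the curves differ materially between groups, group-specific calibration (fitting a separate calibrator per group) may be warranted, with the usual caution about small group sizes.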

Citation

@misc{hospital-readmission-calibrated,
  title={Calibrated Model for Hospital Readmission Prediction},
  author={Your Name},
  year={2025},
  howpublished={\url{https://huggingface.co/your-username/your-repo}}
}

Dataset Citation

@article{strack2014impact,
  title={Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records},
  author={Strack, Beata and DeShazo, Jonathan P and Gennings, Chris and Olmo, Juan L and Ventura, Sebastian and Cios, Krzysztof J and Clore, John N},
  journal={BioMed Research International},
  volume={2014},
  year={2014},
  publisher={Hindawi}
}

License

This calibrated model is released under the MIT License. The underlying dataset and original model have their own license terms.

Contact

For questions or issues, please open an issue in the repository.


Disclaimer: This model is for research and educational purposes. Always consult healthcare professionals for medical decisions. Regular monitoring and recalibration are essential for safe deployment.

Last Updated: 2025-12-10 08:15:54
