CRWS-ICU Mortality Prediction Models

Clinical Risk-Weighted Score (CRWS): ICU Mortality Prediction from MIMIC-IV

This repository contains four trained classifiers for predicting in-hospital mortality of ICU patients, trained and evaluated on the MIMIC-IV dataset. Models are evaluated with the Clinical Risk-Weighted Score (CRWS), a novel metric that penalises missed deaths (False Negatives) more heavily than false alarms.


Models Included

| File | Model | F1 | CRWS (w_fn=3) | AUPRC | FN |
|---|---|---|---|---|---|
| logistic_regression.pkl | Logistic Regression | 0.2659 | 0.4805 | 0.1794 | 564 |
| random_forest.pkl | Random Forest | 0.1910 | 0.1470 | 0.2247 | 1220 |
| xgboost.pkl | XGBoost ⭐ | 0.2978 | 0.5965 | 0.2658 | 289 |
| mlp.pkl | MLP Neural Network | 0.2691 | 0.2817 | 0.2263 | 1013 |

⭐ XGBoost is the recommended model: it has the highest F1, CRWS, and AUPRC of the four models and the fewest missed deaths (FN = 289).


The CRWS Metric

The Clinical Risk-Weighted Score generalises F1 to account for asymmetric clinical costs:

$$
\mathrm{CRWS}(w_{fn}, w_{fp}) = \frac{(1 + w_{fn}^2)\,\mathrm{TP}}{(1 + w_{fn}^2)\,\mathrm{TP} + w_{fn}^2\,\mathrm{FN} + w_{fp}\,\mathrm{FP}}
$$
  • w_fn = 3, w_fp = 1 (default): each missed death (FN) enters the denominator with weight w_fn² = 9, against weight 1 for each false alarm (FP)
  • w_fn = 1, w_fp = 1: CRWS reduces to F1
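The formula translates directly into code. The minimal sketch below (the function names `crws` and `f1` are illustrative, not part of this repository) computes CRWS from confusion-matrix counts and shows the asymmetry: two error mixes with identical F1 receive different CRWS values, because each FN is weighted by w_fn² = 9 while each FP is weighted by w_fp = 1. The counts are made up for illustration.

```python
def crws(tp: int, fn: int, fp: int, w_fn: float = 3.0, w_fp: float = 1.0) -> float:
    """Clinical Risk-Weighted Score from confusion-matrix counts."""
    numerator = (1 + w_fn ** 2) * tp
    denominator = numerator + (w_fn ** 2) * fn + w_fp * fp
    # Assumed convention: score 0.0 when the denominator is zero (no TP, FN or FP).
    return numerator / denominator if denominator > 0 else 0.0


def f1(tp: int, fn: int, fp: int) -> float:
    """Standard F1 = 2TP / (2TP + FN + FP), for comparison."""
    denominator = 2 * tp + fn + fp
    return 2 * tp / denominator if denominator > 0 else 0.0


# Illustrative counts only (TP, FN, FP); not results from this repository.
few_missed = (10, 2, 8)
many_missed = (10, 8, 2)

for name, counts in [("few missed deaths", few_missed), ("many missed deaths", many_missed)]:
    print(f"{name}: F1={f1(*counts):.4f}  CRWS={crws(*counts):.4f}  "
          f"CRWS(w_fn=1)={crws(*counts, w_fn=1):.4f}")
# few missed deaths: F1=0.6667  CRWS=0.7937  CRWS(w_fn=1)=0.6667
# many missed deaths: F1=0.6667  CRWS=0.5747  CRWS(w_fn=1)=0.6667
```

With w_fp = 1, CRWS(w_fn, 1) has exactly the form of the standard F-beta score with β = w_fn, so scikit-learn's `sklearn.metrics.fbeta_score(y_true, y_pred, beta=3)` should reproduce the default CRWS directly from label arrays.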

Files

model_artifacts/
├── scaler.pkl                 # StandardScaler (fit on SMOTE-balanced training data)
├── logistic_regression.pkl    # Logistic Regression
├── random_forest.pkl          # Random Forest (100 trees)
├── xgboost.pkl                # XGBoost (scale_pos_weight=10)
├── mlp.pkl                    # MLP (hidden layers 64→32)
├── model_metadata.json        # Feature info, metrics, label-encoder mappings
└── README.md                  # This file

Features

The models take 10 features derived from MIMIC-IV patients, admissions, and icustays tables:

| Feature | Description |
|---|---|
| los | ICU length of stay (days) |
| anchor_age | Patient age (MIMIC-IV anchor_age, i.e. age in the patient's anchor year) |
| first_careunit | First ICU care unit (label-encoded) |
| last_careunit | Last ICU care unit (label-encoded) |
| gender | Patient gender (label-encoded) |
| insurance | Insurance type (label-encoded) |
| marital_status | Marital status (label-encoded) |
| race | Race/ethnicity (label-encoded) |
| admission_type | Type of hospital admission (label-encoded) |
| admission_location | Admission location (label-encoded) |
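The models expect the ten features as numbers in exactly this order, with categorical values already label-encoded. A minimal sketch of building one such row from a raw record follows. It assumes a hypothetical `mappings` dict of the form `{feature: {category: code}}`; the actual layout of the label-encoder mappings in model_metadata.json is not documented here, so the lookup may need adapting. `encode_record` and `demo_mappings` are illustrative names.

```python
# Feature order expected by scaler.pkl and the models (from the table above).
FEATURE_ORDER = [
    "los", "anchor_age", "first_careunit", "last_careunit", "gender",
    "insurance", "marital_status", "race", "admission_type", "admission_location",
]
NUMERIC_FEATURES = {"los", "anchor_age"}


def encode_record(record, mappings):
    """Turn one raw patient record (dict) into a model-ready list of 10 numbers.

    ASSUMPTION: `mappings` is {feature: {category_string: integer_code}}; the real
    layout of the label-encoder mappings in model_metadata.json may differ.
    """
    row = []
    for name in FEATURE_ORDER:
        value = record.get(name)
        if name in NUMERIC_FEATURES:
            if value is None:
                # Training used median imputation; those medians are not reproduced here.
                raise ValueError(f"missing numeric feature {name!r}")
            row.append(float(value))
        else:
            # Training filled missing categories with "Unknown" before label encoding.
            category = "Unknown" if value is None else str(value)
            codes = mappings[name]
            if category not in codes:
                raise ValueError(f"category {category!r} was not seen in training for {name!r}")
            row.append(codes[category])
    return row


# Hypothetical mapping: every categorical feature knows only "A" and "Unknown".
demo_mappings = {name: {"A": 0, "Unknown": 1}
                 for name in FEATURE_ORDER if name not in NUMERIC_FEATURES}
print(encode_record({"los": 3.5, "anchor_age": 67, "gender": "A"}, demo_mappings))
# [3.5, 67.0, 1, 1, 0, 1, 1, 1, 1, 1]
```

The resulting list can be wrapped as `np.array([row])` and passed to `scaler.transform`. Raising on unseen categories is a deliberate choice here: silently mapping them to an arbitrary code would feed the model a value it never saw in training.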

Quick Start

```python
import joblib
import numpy as np

# Load the model and scaler (paths relative to model_artifacts/)
model  = joblib.load("xgboost.pkl")
scaler = joblib.load("scaler.pkl")

# One example patient; values must follow the feature order above.
# Categorical features must already be label-encoded with the mappings in model_metadata.json.
# [los, anchor_age, first_careunit, last_careunit, gender,
#  insurance, marital_status, race, admission_type, admission_location]
X_raw = np.array([[3.5, 67, 2, 2, 1, 1, 2, 0, 1, 3]])

# Scale with the training scaler, then take P(death) from the positive-class column
X_scaled = scaler.transform(X_raw)
prob     = model.predict_proba(X_scaled)[:, 1]

# Default threshold (0.5)
pred_default = (prob >= 0.5).astype(int)

# CRWS-optimal threshold for XGBoost (0.26): fewer missed deaths at the cost of more false alarms
pred_clinical = (prob >= 0.26).astype(int)

print(f"Mortality probability: {prob[0]:.4f}")
print(f"Prediction (default):  {'High risk' if pred_default[0] else 'Low risk'}")
print(f"Prediction (clinical): {'High risk' if pred_clinical[0] else 'Low risk'}")
```
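The README does not say how the CRWS-optimal thresholds (such as 0.26 for XGBoost) were chosen. One plausible approach, sketched here as an assumption rather than the author's procedure, is a grid search over thresholds 0.01 to 0.99 that keeps the value maximising CRWS on held-out labels and predicted probabilities, ideally on a validation split rather than the test set. `confusion_counts` and `crws_optimal_threshold` are hypothetical helper names, and the toy data is made up.

```python
def crws(tp, fn, fp, w_fn=3.0, w_fp=1.0):
    """CRWS from confusion counts (same formula as in 'The CRWS Metric')."""
    numerator = (1 + w_fn ** 2) * tp
    denominator = numerator + (w_fn ** 2) * fn + w_fp * fp
    return numerator / denominator if denominator > 0 else 0.0


def confusion_counts(y_true, probs, threshold):
    """Return (TP, FN, FP) when predicting 'death' for prob >= threshold."""
    tp = fn = fp = 0
    for y, p in zip(y_true, probs):
        predicted_death = p >= threshold
        if y == 1 and predicted_death:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_death:
            fp += 1
    return tp, fn, fp


def crws_optimal_threshold(y_true, probs, w_fn=3.0, w_fp=1.0):
    """Grid-search thresholds 0.01..0.99; return (threshold, CRWS) with the highest CRWS."""
    best_threshold, best_score = 0.5, -1.0
    for i in range(1, 100):
        threshold = i / 100
        tp, fn, fp = confusion_counts(y_true, probs, threshold)
        score = crws(tp, fn, fp, w_fn=w_fn, w_fp=w_fp)
        if score > best_score:  # strict '>' keeps the lowest threshold on ties
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy held-out labels (1 = died) and predicted probabilities, for illustration only.
y_val = [1, 1, 0, 0, 0, 0]
p_val = [0.9, 0.3, 0.4, 0.2, 0.1, 0.05]
best_t, best_s = crws_optimal_threshold(y_val, p_val)
print(best_t, round(best_s, 4))  # 0.21 0.9524
```

In practice `probs` would be `model.predict_proba(X_scaled)[:, 1]` on held-out rows. Because CRWS weights each FN nine times as heavily as each FP, the maximiser tends to sit well below 0.5, which is consistent with the thresholds in the table that follows.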

Threshold Recommendations

| Model | FN at default threshold (t=0.5) | CRWS-optimal threshold | FN at CRWS-optimal threshold | FN reduction |
|---|---|---|---|---|
| Logistic Regression | 564 | 0.37 | 142 | ↓74.8% |
| Random Forest | 1208 | 0.05 | 188 | ↓84.4% |
| XGBoost | 289 | 0.26 | 99 | ↓65.7% |
| MLP Neural Network | 1013 | 0.05 | 174 | ↓82.8% |
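The FN reduction column is the relative drop in missed deaths from the default threshold to the CRWS-optimal one, i.e. (FN_default − FN_optimal) / FN_default. A quick check using the counts from the table above (`fn_reduction_pct` is an illustrative name):

```python
# (FN at t=0.5, FN at CRWS-optimal threshold), copied from the table above.
fn_pairs = {
    "Logistic Regression": (564, 142),
    "Random Forest": (1208, 188),
    "XGBoost": (289, 99),
    "MLP Neural Network": (1013, 174),
}


def fn_reduction_pct(fn_default: int, fn_optimal: int) -> float:
    """Percentage drop in false negatives relative to the default-threshold baseline."""
    return 100.0 * (fn_default - fn_optimal) / fn_default


for model_name, (before, after) in fn_pairs.items():
    print(f"{model_name}: {fn_reduction_pct(before, after):.1f}%")
# Logistic Regression: 74.8%
# Random Forest: 84.4%
# XGBoost: 65.7%
# MLP Neural Network: 82.8%
```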

Preprocessing Pipeline

MIMIC-IV CSVs
  → Merge patients + admissions + icustays
  → Keep first ICU stay per patient
  → Median imputation (los, anchor_age)
  → "Unknown" fill + LabelEncoder for 8 categorical features
  → 80/20 stratified train/test split
  → SMOTE (k=3) on training set only  ← no data leakage
  → StandardScaler (fit on SMOTE train, transform test)
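The first four pipeline steps could be sketched in pandas as follows. This is an illustration under stated assumptions, not the repository's training code: column names follow MIMIC-IV v2.2 (`subject_id`, `hadm_id`, `intime`, ...), `hospital_expire_flag` from the admissions table is assumed to be the mortality label (the README does not name it), and pandas categorical codes stand in for scikit-learn's `LabelEncoder` (both assign codes in sorted order for string categories). The split, SMOTE (for example imbalanced-learn's `SMOTE(k_neighbors=3).fit_resample` on the training rows only) and scaling steps are not shown. `build_cohort` is a hypothetical name.

```python
import pandas as pd

NUMERIC = ["los", "anchor_age"]
CATEGORICAL = ["first_careunit", "last_careunit", "gender", "insurance",
               "marital_status", "race", "admission_type", "admission_location"]
LABEL = "hospital_expire_flag"  # ASSUMPTION: mortality label column


def build_cohort(patients: pd.DataFrame, admissions: pd.DataFrame,
                 icustays: pd.DataFrame) -> pd.DataFrame:
    # Merge ICU stays with their admission and patient records.
    df = (icustays
          .merge(admissions, on=["subject_id", "hadm_id"], how="inner")
          .merge(patients, on="subject_id", how="inner"))
    # Keep each patient's earliest ICU stay (by ICU admission time).
    df = (df.sort_values(["subject_id", "intime"])
            .drop_duplicates(subset="subject_id", keep="first")
            .copy())
    # Median imputation for the two numeric features.
    for col in NUMERIC:
        df[col] = df[col].fillna(df[col].median())
    # "Unknown" fill, then integer codes in sorted category order.
    for col in CATEGORICAL:
        df[col] = df[col].fillna("Unknown").astype(str)
        df[col] = df[col].astype("category").cat.codes
    return df[NUMERIC + CATEGORICAL + [LABEL]]
```

Note that this sketch does not keep the category-to-code mappings; a real pipeline would store them (as model_metadata.json does) so the same codes can be applied at inference time.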

Dataset & Ethics

  • Dataset: MIMIC-IV v2.2 (requires PhysioNet credentialed access)
  • Cohort: 65,366 adult ICU patients (first ICU stay), BIDMC 2008–2019
  • Mortality rate: 10.84% (7,086 deaths)
  • Intended use: research and clinical decision support; not a replacement for clinical judgment
  • Limitations: Single-centre data (BIDMC), limited feature set (no vitals/labs), label encoding may not generalise

Citation

If you use this work, please cite:

@misc{crws-icu-mortality-2025,
  title  = {CRWS: A Clinical Risk-Weighted Score for ICU Mortality Prediction},
  author = {Amanchandra H},
  year   = {2025},
  note   = {Trained on MIMIC-IV. HuggingFace Model Repository.}
}

License

MIT (see LICENSE). Note: MIMIC-IV data requires a separate credentialed access agreement from PhysioNet.
