CRWS-ICU Mortality Prediction Models

Clinical Risk-Weighted Score (CRWS) — ICU Mortality Prediction from MIMIC-IV

This repository contains four trained classifiers for predicting in-hospital mortality of ICU patients, trained and evaluated on the MIMIC-IV dataset. Models are evaluated with the Clinical Risk-Weighted Score (CRWS), a novel metric that penalises missed deaths (false negatives) more heavily than false alarms (false positives).

Models Included

| File | Model | F1 | CRWS (w_fn=3) | AUPRC | FN |
|---|---|---|---|---|---|
| `logistic_regression.pkl` | Logistic Regression | 0.2659 | 0.4805 | 0.1794 | 564 |
| `random_forest.pkl` | Random Forest | 0.1910 | 0.1470 | 0.2247 | 1220 |
| `xgboost.pkl` | XGBoost ⭐ | 0.2978 | 0.5965 | 0.2658 | 289 |
| `mlp.pkl` | MLP Neural Network | 0.2691 | 0.2817 | 0.2263 | 1013 |

⭐ XGBoost is the recommended model: it has the highest F1, CRWS, and AUPRC of the four models and the fewest missed deaths (FN = 289).
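The columns of the table above can be computed from test-set labels and predicted probabilities roughly as follows. This is a sketch, not the authors' evaluation script: it assumes scikit-learn, the default 0.5 threshold, and `average_precision_score` as the AUPRC estimator (the README does not say which estimator was used). Because CRWS with w_fp = 1 is algebraically the F-beta score with β = w_fn, `fbeta_score(beta=3)` yields CRWS (w_fn=3). `evaluation_row` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             f1_score, fbeta_score)


def evaluation_row(y_true, prob, threshold=0.5, w_fn=3.0):
    """Compute F1, CRWS (w_fp = 1), AUPRC and FN for one model.

    Hypothetical helper. CRWS is computed as F-beta with beta = w_fn,
    which matches the CRWS formula only when w_fp = 1.
    """
    y_true = np.asarray(y_true)
    prob = np.asarray(prob)
    y_pred = (prob >= threshold).astype(int)
    # With labels=[0, 1], the 2x2 confusion matrix ravels to (tn, fp, fn, tp)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "F1": float(f1_score(y_true, y_pred)),
        "CRWS": float(fbeta_score(y_true, y_pred, beta=w_fn)),
        "AUPRC": float(average_precision_score(y_true, prob)),  # threshold-free
        "FN": int(fn),
    }


# Toy example: 3 deaths, 1 survivor; only one death scores above 0.5
row = evaluation_row([0, 1, 1, 1], [0.1, 0.9, 0.4, 0.3])
print(row)
```

In this toy case F1 is 0.5 but CRWS is only about 0.357, because the two missed deaths are weighted by w_fn² = 9.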

The CRWS Metric

The Clinical Risk-Weighted Score generalises F1 to account for asymmetric clinical costs:

$$
\mathrm{CRWS}(w_{fn}, w_{fp}) = \frac{(1 + w_{fn}^2)\,TP}{(1 + w_{fn}^2)\,TP + w_{fn}^2\,FN + w_{fp}\,FP}
$$

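w_fn = 3, w_fp = 1 (default): each missed death (FN) contributes w_fn² = 9 to the denominator, versus w_fp = 1 for each false alarm (FP). With w_fp = 1, CRWS is identical to the F-beta score with β = w_fn, so the default equals F3, which weights recall 3× as heavily as precision.
When w_fn = 1, w_fp = 1: CRWS reduces to F1 = 2TP / (2TP + FN + FP).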
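A direct transcription of the formula, written against raw confusion-matrix counts so that w_fp ≠ 1 is also supported. The function names `crws` and `f1` and the toy counts are illustrative only; the zero-denominator convention is an assumption.

```python
def crws(tp: int, fn: int, fp: int, w_fn: float = 3.0, w_fp: float = 1.0) -> float:
    """Clinical Risk-Weighted Score from confusion-matrix counts."""
    numerator = (1 + w_fn ** 2) * tp
    denominator = numerator + w_fn ** 2 * fn + w_fp * fp
    # Assumed convention: return 0.0 when there are no TP, FN or FP at all
    return numerator / denominator if denominator > 0 else 0.0


def f1(tp: int, fn: int, fp: int) -> float:
    """Plain F1 score, for comparison."""
    return 2 * tp / (2 * tp + fn + fp)


# Toy counts: 8 deaths caught, 2 missed, 4 false alarms
print(round(crws(8, 2, 4), 4))            # → 0.7843 (w_fn = 3, w_fp = 1)
print(round(crws(8, 2, 4, 1.0, 1.0), 4))  # → 0.7273, identical to F1
print(round(f1(8, 2, 4), 4))              # → 0.7273

# Catch one more death at the cost of 6 extra false alarms
print(round(crws(9, 1, 10), 4))  # → 0.8257: CRWS rewards the trade
print(round(f1(9, 1, 10), 4))    # → 0.6207: F1 penalises it
```

The last two lines show the asymmetry in practice: converting one FN into a TP outweighs six additional FPs under CRWS, whereas F1 moves in the opposite direction.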

Files

```
model_artifacts/
├── scaler.pkl                 # StandardScaler (fit on SMOTE-balanced training data)
├── logistic_regression.pkl    # Logistic Regression
├── random_forest.pkl          # Random Forest (100 trees)
├── xgboost.pkl                # XGBoost (scale_pos_weight=10)
├── mlp.pkl                    # MLP (64→32 hidden layers)
├── model_metadata.json        # Feature info, metrics, label encoder mappings
└── README.md                  # This file
```

Features

The models take 10 features, in the order below, derived from the MIMIC-IV `patients`, `admissions`, and `icustays` tables:

| Feature | Description |
|---|---|
| `los` | ICU length of stay (days) |
| `anchor_age` | Patient age in the MIMIC-IV anchor year (used as a proxy for age at admission) |
| `first_careunit` | First ICU care unit (label-encoded) |
| `last_careunit` | Last ICU care unit (label-encoded) |
| `gender` | Patient gender (label-encoded) |
| `insurance` | Insurance type (label-encoded) |
| `marital_status` | Marital status (label-encoded) |
| `race` | Race/ethnicity (label-encoded) |
| `admission_type` | Type of hospital admission (label-encoded) |
| `admission_location` | Admission location (label-encoded) |
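Because the scaler and models expect a plain array, column order is easy to get wrong. A small helper can build the input row from a dict keyed by the feature names in the table above. This is a sketch: `FEATURES` and `to_feature_matrix` are hypothetical names, and categorical values are assumed to be label-encoded already with the mappings stored in `model_metadata.json` (whose schema is not documented here, so it is not parsed).

```python
import numpy as np

# Feature order expected by scaler.pkl and the models (from the Features table)
FEATURES = [
    "los", "anchor_age", "first_careunit", "last_careunit", "gender",
    "insurance", "marital_status", "race", "admission_type", "admission_location",
]


def to_feature_matrix(records):
    """Turn a list of dicts (one per ICU stay) into an (n, 10) float array.

    Raises KeyError listing any missing features instead of silently
    shifting columns.
    """
    rows = []
    for rec in records:
        missing = [f for f in FEATURES if f not in rec]
        if missing:
            raise KeyError(f"missing features: {missing}")
        rows.append([float(rec[f]) for f in FEATURES])
    return np.array(rows, dtype=float)


# Keys may arrive in any order; the helper restores the model's order
patient = {
    "anchor_age": 67, "los": 3.5, "gender": 1, "race": 0,
    "first_careunit": 2, "last_careunit": 2, "insurance": 1,
    "marital_status": 2, "admission_type": 1, "admission_location": 3,
}
X_raw = to_feature_matrix([patient])
print(X_raw)  # same row as the Quick Start example
```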

Quick Start

```python
import joblib
import numpy as np

# Load model and scaler (paths relative to model_artifacts/)
model  = joblib.load("xgboost.pkl")
scaler = joblib.load("scaler.pkl")

# Example input: label-encoded values, in the feature order listed above
# [los, anchor_age, first_careunit, last_careunit, gender,
#  insurance, marital_status, race, admission_type, admission_location]
X_raw = np.array([[3.5, 67, 2, 2, 1, 1, 2, 0, 1, 3]])

# Scale with the training-set scaler, then take P(in-hospital death)
X_scaled = scaler.transform(X_raw)
prob     = model.predict_proba(X_scaled)[:, 1]

# Default threshold (0.5)
pred_default = (prob >= 0.5).astype(int)

# CRWS-optimal threshold for XGBoost (0.26): flags more patients, fewer missed deaths
pred_clinical = (prob >= 0.26).astype(int)

print(f"Mortality probability: {prob[0]:.4f}")
print(f"Prediction (default):  {'High risk' if pred_default[0] else 'Low risk'}")
print(f"Prediction (clinical): {'High risk' if pred_clinical[0] else 'Low risk'}")
```

Threshold Recommendations

| Model | FN at default (t = 0.5) | CRWS-optimal threshold | FN at CRWS-optimal threshold | FN reduction |
|---|---|---|---|---|
| Logistic Regression | 564 | 0.37 | 142 | ↓74.8% |
| Random Forest | 1208 | 0.05 | 188 | ↓84.4% |
| XGBoost | 289 | 0.26 | 99 | ↓65.7% |
| MLP Neural Network | 1013 | 0.05 | 174 | ↓82.8% |
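The README does not state how the CRWS-optimal thresholds were selected. One plausible approach, sketched here as an assumption, is a grid sweep that keeps the threshold maximising CRWS; ideally the sweep runs on a validation split rather than the test set, so the reported test metrics are not optimistically biased. `crws_at`, `crws_optimal_threshold`, the 0.01-step grid and the `CRWS_THRESHOLDS` key names are all hypothetical; the threshold values are copied from the table.

```python
import numpy as np

# Recommended thresholds from the table (key names are illustrative)
CRWS_THRESHOLDS = {
    "logistic_regression": 0.37,
    "random_forest": 0.05,
    "xgboost": 0.26,
    "mlp": 0.05,
}


def crws_at(y_true, y_pred, w_fn=3.0, w_fp=1.0):
    """CRWS for boolean label and prediction arrays."""
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    num = (1 + w_fn ** 2) * tp
    den = num + w_fn ** 2 * fn + w_fp * fp
    return float(num / den) if den > 0 else 0.0


def crws_optimal_threshold(y_true, prob, w_fn=3.0, w_fp=1.0, grid=None):
    """Return (threshold, CRWS) for the grid point with the highest CRWS.

    Ties resolve to the lowest threshold (np.argmax returns the first maximum).
    """
    y_true = np.asarray(y_true).astype(bool)
    prob = np.asarray(prob, dtype=float)
    if grid is None:
        grid = np.round(np.linspace(0.01, 0.99, 99), 2)
    scores = [crws_at(y_true, prob >= t, w_fn, w_fp) for t in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), scores[best]


# Toy validation data: two survivors, two deaths
y_val = [0, 0, 1, 1]
p_val = [0.10, 0.40, 0.35, 0.80]
t, score = crws_optimal_threshold(y_val, p_val)
print(t, round(score, 4))

# Applying a stored threshold to new probabilities
flags = np.asarray(p_val) >= CRWS_THRESHOLDS["xgboost"]
```

Because FNs are weighted by w_fn² = 9, the sweep tends to settle on low thresholds, which is consistent with the 0.05 values chosen for Random Forest and the MLP.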

Preprocessing Pipeline

```
MIMIC-IV CSVs
  → Merge patients + admissions + icustays
  → Keep first ICU stay per patient
  → Median imputation (los, anchor_age)
  → "Unknown" fill + LabelEncoder for 8 categorical features
  → 80/20 stratified train/test split
  → SMOTE (k=3) on training set only  ← no data leakage
  → StandardScaler (fit on SMOTE train, transform test)
```
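The pipeline above can be sketched with pandas and scikit-learn. Assumptions not confirmed by the README: column names follow MIMIC-IV v2.2 (`intime`, `race`, `hospital_expire_flag`), the mortality label is `hospital_expire_flag`, and "first ICU stay" means the earliest `intime`. The SMOTE step needs the separate `imbalanced-learn` package, so it is imported lazily inside the function; `random_state=42` is an arbitrary choice. Function names are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

NUMERIC = ["los", "anchor_age"]
CATEGORICAL = [
    "first_careunit", "last_careunit", "gender", "insurance",
    "marital_status", "race", "admission_type", "admission_location",
]
FEATURES = NUMERIC + CATEGORICAL  # same order as the Features table
LABEL = "hospital_expire_flag"    # assumed in-hospital mortality label


def build_cohort(patients, admissions, icustays):
    """Merge the three tables, keep each patient's first ICU stay, impute, encode."""
    df = (icustays
          .merge(admissions, on=["subject_id", "hadm_id"], how="inner")
          .merge(patients, on="subject_id", how="inner"))
    df["intime"] = pd.to_datetime(df["intime"])
    df = (df.sort_values(["subject_id", "intime"])
            .drop_duplicates("subject_id", keep="first")
            .reset_index(drop=True))
    for col in NUMERIC:  # median imputation on the first-stay cohort
        df[col] = df[col].fillna(df[col].median())
    encoders = {}
    for col in CATEGORICAL:  # "Unknown" fill, then label-encode
        enc = LabelEncoder()
        df[col] = enc.fit_transform(df[col].fillna("Unknown").astype(str))
        encoders[col] = enc
    return df[FEATURES], df[LABEL], encoders


def split_resample_scale(X, y, random_state=42):
    """80/20 stratified split; SMOTE (k=3) and the scaler see training data only."""
    from imblearn.over_sampling import SMOTE  # requires imbalanced-learn

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state)
    X_res, y_res = SMOTE(k_neighbors=3, random_state=random_state).fit_resample(
        X_train, y_train)
    scaler = StandardScaler().fit(X_res)  # fit on resampled training data
    return scaler.transform(X_res), scaler.transform(X_test), y_res, y_test, scaler
```

Keeping SMOTE after the split is what prevents leakage: synthetic minority samples are interpolated only from training patients, so no test patient influences the training set or the scaler statistics.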

Dataset & Ethics

Dataset: MIMIC-IV v2.2 — requires PhysioNet credentialed access
Cohort: 65,366 adult ICU patients (first ICU stay), BIDMC 2008–2019
Mortality rate: 10.84% (7,086 deaths)
Intended use: Research and clinical decision support — not a replacement for clinical judgment
Limitations: Single-centre data (BIDMC), limited feature set (no vitals/labs), label encoding may not generalise

Citation

If you use this work, please cite:

```bibtex
@misc{crws-icu-mortality-2025,
  title  = {CRWS: A Clinical Risk-Weighted Score for ICU Mortality Prediction},
  author = {Amanchandra H},
  year   = {2025},
  note   = {Trained on MIMIC-IV. Hugging Face Model Repository.}
}
```

License

MIT — see LICENSE. Note: MIMIC-IV data requires a separate credentialed access agreement from PhysioNet.
