CRWS-ICU Mortality Prediction Models
Clinical Risk-Weighted Score (CRWS): ICU Mortality Prediction from MIMIC-IV
This repository contains four trained classifiers for predicting in-hospital mortality of ICU patients, trained and evaluated on the MIMIC-IV dataset. Models are evaluated with the Clinical Risk-Weighted Score (CRWS), a novel metric that penalises missed deaths (False Negatives) more heavily than false alarms.
Models Included
| File | Model | F1 | CRWS (w_fn=3) | AUPRC | FN |
|---|---|---|---|---|---|
| `logistic_regression.pkl` | Logistic Regression | 0.2659 | 0.4805 | 0.1794 | 564 |
| `random_forest.pkl` | Random Forest | 0.1910 | 0.1470 | 0.2247 | 1220 |
| `xgboost.pkl` | XGBoost ⭐ | 0.2978 | 0.5965 | 0.2658 | 289 |
| `mlp.pkl` | MLP Neural Network | 0.2691 | 0.2817 | 0.2263 | 1013 |

⭐ XGBoost is the recommended model: highest CRWS, AUROC, and fewest missed deaths (FN=289).
The CRWS Metric
The Clinical Risk-Weighted Score generalises F1 to account for asymmetric clinical costs:
$$
\mathrm{CRWS}(w_{fn}, w_{fp}) = \frac{(1 + w_{fn}^2)\,\mathrm{TP}}{(1 + w_{fn}^2)\,\mathrm{TP} + w_{fn}^2\,\mathrm{FN} + w_{fp}\,\mathrm{FP}}
$$
- w_fn = 3, w_fp = 1 (default): missed deaths are treated as 3× as important as false alarms; because w_fn enters squared, each FN adds 9 to the denominator and each FP adds 1
- When w_fn = 1, w_fp = 1: CRWS = F1
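The formula can be checked numerically with a few lines of Python. This is a minimal sketch and is not taken from the repository: the function names `crws` and `f1`, the choice to return 0.0 for an empty denominator, and the confusion counts are all assumptions made for illustration.

```python
def crws(tp, fn, fp, w_fn=3.0, w_fp=1.0):
    """Clinical Risk-Weighted Score from confusion-matrix counts."""
    numerator = (1 + w_fn ** 2) * tp
    denominator = numerator + w_fn ** 2 * fn + w_fp * fp
    # Assumed convention: score 0.0 when there is nothing to count.
    return numerator / denominator if denominator else 0.0


def f1(tp, fn, fp):
    """Standard F1 score, used here only to check the w_fn = w_fp = 1 case."""
    denominator = 2 * tp + fn + fp
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical counts for illustration; they are not taken from the results table.
tp, fn, fp = 80, 20, 100

print(f"CRWS(3, 1) = {crws(tp, fn, fp):.4f}")
print(f"CRWS(1, 1) = {crws(tp, fn, fp, w_fn=1, w_fp=1):.4f}")
print(f"F1         = {f1(tp, fn, fp):.4f}")
```

With these counts, CRWS(3, 1) is 800/1080 ≈ 0.741, while CRWS(1, 1) and F1 both equal 160/280 ≈ 0.571.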
Files

    model_artifacts/
    ├── scaler.pkl                  # StandardScaler (fit on SMOTE-balanced training data)
    ├── logistic_regression.pkl     # Logistic Regression
    ├── random_forest.pkl           # Random Forest (100 trees)
    ├── xgboost.pkl                 # XGBoost (scale_pos_weight=10)
    ├── mlp.pkl                     # MLP (64→32 hidden layers)
    ├── model_metadata.json         # Feature info, metrics, label encoder mappings
    └── README.md                   # This file
Features
The models take 10 features derived from MIMIC-IV patients, admissions, and icustays tables:
| Feature | Description |
|---|---|
| `los` | ICU Length of Stay (days) |
| `anchor_age` | Patient age at admission |
| `first_careunit` | First ICU care unit (label-encoded) |
| `last_careunit` | Last ICU care unit (label-encoded) |
| `gender` | Patient gender (label-encoded) |
| `insurance` | Insurance type (label-encoded) |
| `marital_status` | Marital status (label-encoded) |
| `race` | Race/ethnicity (label-encoded) |
| `admission_type` | Type of hospital admission (label-encoded) |
| `admission_location` | Admission location (label-encoded) |
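Because the models expect the ten features in a fixed order, it can help to build each input row from named values rather than by position. The helper below is a hypothetical sketch: `FEATURE_ORDER` and `to_feature_row` are illustrative names, not part of the repository, and the categorical codes in the example are placeholders. Real codes would have to come from the label encoder mappings in `model_metadata.json`, whose layout is not shown here.

```python
# Feature order as listed in the table above.
FEATURE_ORDER = [
    "los", "anchor_age", "first_careunit", "last_careunit", "gender",
    "insurance", "marital_status", "race", "admission_type",
    "admission_location",
]


def to_feature_row(record):
    """Return the values of `record` as floats, in the order the models expect."""
    missing = [name for name in FEATURE_ORDER if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(record[name]) for name in FEATURE_ORDER]


# Placeholder values; the integer codes are assumed, not real encoder output.
record = {
    "anchor_age": 67, "los": 3.5, "gender": 1, "insurance": 1,
    "first_careunit": 2, "last_careunit": 2, "marital_status": 2,
    "race": 0, "admission_type": 1, "admission_location": 3,
}
row = to_feature_row(record)
print(row)
```

Wrapping `row` in an outer list gives the two-dimensional input shape that scikit-learn style `transform` and `predict_proba` methods expect.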
Quick Start

    import joblib
    import numpy as np

    # Load model and scaler
    model = joblib.load("xgboost.pkl")
    scaler = joblib.load("scaler.pkl")

    # Example input (must match feature order above)
    # [los, anchor_age, first_careunit, last_careunit, gender,
    #  insurance, marital_status, race, admission_type, admission_location]
    X_raw = np.array([[3.5, 67, 2, 2, 1, 1, 2, 0, 1, 3]])

    # Preprocess and predict
    X_scaled = scaler.transform(X_raw)
    prob = model.predict_proba(X_scaled)[:, 1]

    # Default threshold (0.5): balanced
    pred_default = (prob >= 0.5).astype(int)

    # CRWS-optimal threshold (0.26 for XGBoost): minimises missed deaths
    pred_clinical = (prob >= 0.26).astype(int)

    print(f"Mortality probability: {prob[0]:.4f}")
    print(f"Prediction (clinical): {'High risk' if pred_clinical[0] else 'Low risk'}")
Threshold Recommendations
| Model | Default (t=0.5) FN | CRWS-Optimal Threshold | CRWS-Optimal FN | FN Reduction |
|---|---|---|---|---|
| Logistic Regression | 564 | 0.37 | 142 | -74.8% |
| Random Forest | 1208 | 0.05 | 188 | -84.4% |
| XGBoost | 289 | 0.26 | 99 | -65.7% |
| MLP Neural Network | 1013 | 0.05 | 174 | -82.8% |
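One way a CRWS-optimal threshold could be found is a grid search over candidate cut-offs, keeping the one with the highest CRWS. The repository does not describe its own search procedure, so the sketch below is an assumption: the function names, the 0.01 to 0.99 grid, the tie-breaking rule (lowest threshold wins), and the toy labels and probabilities are all hypothetical.

```python
def crws_at_threshold(y_true, y_prob, threshold, w_fn=3.0, w_fp=1.0):
    """Return (CRWS, FN count) when predicting positive for prob >= threshold."""
    tp = fn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
    numerator = (1 + w_fn ** 2) * tp
    denominator = numerator + w_fn ** 2 * fn + w_fp * fp
    score = numerator / denominator if denominator else 0.0
    return score, fn


def best_threshold(y_true, y_prob, grid=None):
    """Scan the grid and return (threshold, CRWS) for the highest CRWS."""
    grid = grid or [i / 100 for i in range(1, 100)]
    scored = [(crws_at_threshold(y_true, y_prob, t)[0], t) for t in grid]
    # max() keeps the first best entry, so ties go to the lowest threshold.
    best_score, best_t = max(scored, key=lambda item: item[0])
    return best_t, best_score


# Toy data for illustration only; not model output.
y_true = [1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.3, 0.2, 0.6, 0.55, 0.1]

threshold, score = best_threshold(y_true, y_prob)
print(f"best threshold = {threshold:.2f}, CRWS = {score:.4f}")
```

Choosing the threshold on a held-out validation split, rather than on the data used for the final evaluation, keeps the reported FN counts from being optimistic.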
Preprocessing Pipeline

    MIMIC-IV CSVs
      → Merge patients + admissions + icustays
      → Keep first ICU stay per patient
      → Median imputation (los, anchor_age)
      → "Unknown" fill + LabelEncoder for 8 categorical features
      → 80/20 stratified train/test split
      → SMOTE (k=3) on training set only (no data leakage)
      → StandardScaler (fit on SMOTE train, transform test)
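The steps from imputation onward could be sketched with pandas, scikit-learn, and imbalanced-learn as below. This is not the repository's code. It assumes the input frame is already merged and reduced to one row per patient, that the label column is `hospital_expire_flag` (the source only says in-hospital mortality), and that a seed of 42 is acceptable; the function names are illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

NUMERIC = ["los", "anchor_age"]
CATEGORICAL = [
    "first_careunit", "last_careunit", "gender", "insurance",
    "marital_status", "race", "admission_type", "admission_location",
]


def impute_and_encode(df):
    """Median-impute numeric columns; fill and label-encode categoricals."""
    out = df.copy()
    encoders = {}
    for col in NUMERIC:
        out[col] = out[col].fillna(out[col].median())
    for col in CATEGORICAL:
        encoder = LabelEncoder()
        out[col] = encoder.fit_transform(out[col].fillna("Unknown").astype(str))
        encoders[col] = encoder  # keep the mapping for use at inference time
    return out, encoders


def split_balance_scale(df, target="hospital_expire_flag", seed=42):
    """80/20 stratified split, SMOTE on the training part, then scaling."""
    # imbalanced-learn is a separate package, so it is imported lazily here.
    from imblearn.over_sampling import SMOTE

    X, y = df[NUMERIC + CATEGORICAL], df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Oversample the training split only, so no synthetic rows reach the test set.
    X_train, y_train = SMOTE(k_neighbors=3, random_state=seed).fit_resample(
        X_train, y_train
    )
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test, scaler
```

Keeping the fitted encoders and scaler alongside the model is what lets new inputs be transformed the same way at prediction time.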
Dataset & Ethics
- Dataset: MIMIC-IV v2.2 (requires PhysioNet credentialed access)
- Cohort: 65,366 adult ICU patients (first ICU stay), BIDMC 2008–2019
- Mortality rate: 10.84% (7,086 deaths)
- Intended use: Research and clinical decision support; not a replacement for clinical judgment
- Limitations: Single-centre data (BIDMC), limited feature set (no vitals/labs), label encoding may not generalise
Citation
If you use this work, please cite:

    @misc{crws-icu-mortality-2025,
      title  = {CRWS: A Clinical Risk-Weighted Score for ICU Mortality Prediction},
      author = {Amanchandra H},
      year   = {2025},
      note   = {Trained on MIMIC-IV. HuggingFace Model Repository.}
    }
License
MIT; see LICENSE. Note: MIMIC-IV data requires a separate credentialed access agreement from PhysioNet.