arxiv:2511.11935

SurvBench: A Standardised Preprocessing Pipeline for Multi-Modal Electronic Health Record Survival Analysis

Published on May 12

Abstract

A standardized preprocessing pipeline for deep-learning survival models in electronic health records addresses inconsistencies in cohort definition, time discretization, and missing data handling to enable fair model comparison.

AI-generated summary

Deep-learning survival models for electronic health record (EHR) data are hard to compare across papers because the upstream preprocessing (cohort definition, time discretisation, missingness handling, and censoring rules) is typically undocumented and inconsistent. A reported difference in concordance between two mortality models can therefore reflect any of these choices rather than a modelling contribution. We present SurvBench, an open-source preprocessing pipeline that converts raw PhysioNet exports into model-ready tensors for survival analysis. SurvBench covers four critical-care databases (MIMIC-IV, eICU, MC-MED, HiRID) and four input modalities: time-series vitals and laboratory values, static demographics, International Classification of Diseases (ICD) codes, and radiology report embeddings. Every preprocessing decision is controlled through YAML configuration. Imputation, scaling, and feature filtering are fit on the training fold only. Missingness is recorded as a binary mask alongside each feature tensor. The pipeline handles single-risk endpoints (in-hospital and in-ICU mortality) and competing-risks endpoints (a three-way emergency-department admission pathway, with home discharge treated as administrative censoring). It also supports harmonised cross-dataset external validation between eICU and MIMIC-IV. SurvBench is publicly available at https://github.com/munibmesinovic/SurvBench and provides a common baseline against which future deep-learning EHR survival work, especially nascent multi-modal approaches, can be measured under matched preprocessing.
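The two rules the abstract states for imputation and missingness (statistics fit on the training fold only, and a binary mask stored alongside each feature tensor) can be illustrated with a minimal NumPy sketch. This is not SurvBench's code: the function names `fit_train_stats` and `transform`, the choice of mean imputation, the z-score scaling, and the convention that a mask value of 1 means "observed" are all assumptions made for illustration.

```python
import numpy as np

def fit_train_stats(x_train):
    """Per-feature mean and std computed from the training fold only.

    x_train has shape (patients, time bins, features); NaN marks a missing value.
    """
    flat = x_train.reshape(-1, x_train.shape[-1])
    mean = np.nanmean(flat, axis=0)
    std = np.nanstd(flat, axis=0)
    std = np.where(std > 0, std, 1.0)  # avoid dividing by zero for constant features
    return mean, std

def transform(x, mean, std):
    """Impute and scale any fold with the training statistics.

    Returns (scaled values, mask). Assumed convention: mask is 1 where a value
    was actually recorded and 0 where it was imputed.
    """
    mask = (~np.isnan(x)).astype(np.float32)
    filled = np.where(np.isnan(x), mean, x)  # mean imputation (an assumption)
    scaled = (filled - mean) / std
    return scaled.astype(np.float32), mask

# Toy tensors: 2 training patients and 1 test patient, 2 time bins, 2 features.
x_train = np.array([[[1.0, np.nan], [3.0, 10.0]],
                    [[5.0, 20.0], [np.nan, 30.0]]])
x_test = np.array([[[np.nan, 20.0], [3.0, np.nan]]])

mean, std = fit_train_stats(x_train)                        # fit on train only
x_test_scaled, x_test_mask = transform(x_test, mean, std)   # reuse on test
```

Because the test fold never contributes to `mean` or `std`, no information leaks from evaluation data into preprocessing, and the mask lets a downstream model distinguish imputed entries from recorded ones.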
