Leveraging Large Language Models to Predict Unplanned ICU Readmissions from Electronic Health Records

This repository contains the fine-tuned Large Language Models (LLMs) developed for predicting unplanned Intensive Care Unit (ICU) readmissions from Electronic Health Records (EHRs).

The models were introduced in the paper:

Leveraging Large Language Models to Predict Unplanned ICU Readmissions from Electronic Health Records

Hoda Helmy, Ahmed Ibrahim, Maryam Arabi, Aamenah Sattar, and Ahmed Serag

Natural Language Processing Journal (2025)

Paper: https://doi.org/10.1016/j.nlp.2025.100182

Abstract

Unplanned ICU readmissions are associated with increased mortality, prolonged hospital stays, and higher healthcare costs. Early identification of high-risk patients can support clinical decision-making and improve patient outcomes.

This work investigates the use of Large Language Models (LLMs) for ICU readmission prediction by transforming structured Electronic Health Records (EHRs) into a textual representation suitable for language model processing. We evaluate both explicit classification and implicit text-generation approaches using fine-tuned Gemma 2B and Apollo 2B models.

The proposed framework not only predicts readmission risk but also generates interpretable clinical explanations that can assist healthcare professionals in understanding the reasoning behind model predictions.

Dataset

This work utilizes the publicly available eICU Collaborative Research Database (eICU-CRD).

Dataset access: https://eicu-crd.mit.edu/gettingstarted/access/

Please ensure compliance with all data usage agreements before accessing or using the dataset.

Loading the Model

PEFT / LoRA Models

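from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel

# Adapter repository on the Hugging Face Hub
peft_model_id = "serag-ai/ICU-GEMMA-EXP"

# The adapter config records which base model the adapter was trained on
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model first
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path
)

# Attach the fine-tuned adapter weights to the base model
model = PeftModel.from_pretrained(
    model,
    peft_model_id
)

# The tokenizer comes from the base model
tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path
)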

Example Usage

prompt = """
Patient Summary:

Age: 72
Gender: Male
Heart Rate: 118 bpm
White Blood Cell Count: Elevated
Mechanical Ventilation: Yes

Predict ICU readmission risk.
"""

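# Tokenize the prompt into PyTorch tensors
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 128 new tokens
outputs = model.generate(
    **inputs,
    max_new_tokens=128
)

# The decoded text contains the prompt followed by the generated continuation
print(tokenizer.decode(outputs[0], skip_special_tokens=True))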
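
The prompt in this example is written by hand, whereas the paper describes transforming structured EHRs into text. The following is a minimal sketch of that idea, not the authors' template: it serializes a flat patient record into the same "Patient Summary" layout used above. The function name `build_prompt`, the field names, the handling of missing values, and the `Lactate` entry are all illustrative assumptions.

```python
from typing import Mapping


def build_prompt(
    record: Mapping[str, object],
    instruction: str = "Predict ICU readmission risk.",
) -> str:
    """Serialize a flat patient record into a text prompt (hypothetical helper)."""
    lines = ["Patient Summary:", ""]
    for name, value in record.items():
        if value is None:
            # Assumption: missing measurements are skipped, not printed as "None"
            continue
        if isinstance(value, bool):
            # Render boolean flags as Yes/No, as in the handwritten example
            value = "Yes" if value else "No"
        lines.append(f"{name}: {value}")
    lines += ["", instruction]
    return "\n".join(lines)


# Hypothetical record; the keys mirror the handwritten example above
record = {
    "Age": 72,
    "Gender": "Male",
    "Heart Rate": "118 bpm",
    "White Blood Cell Count": "Elevated",
    "Mechanical Ventilation": True,
    "Lactate": None,  # illustrative missing value, omitted from the prompt
}

prompt = build_prompt(record)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the handwritten prompt. Whether the fine-tuned models expect this exact layout is not stated here, so the template should be checked against the paper before relying on the predictions.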

Citation

If you use this work in your research, please cite:

@article{HELMY2025100182,
title = {Leveraging large language models to predict unplanned ICU readmissions from electronic health records},
journal = {Natural Language Processing Journal},
volume = {13},
pages = {100182},
year = {2025},
issn = {2949-7191},
doi = {10.1016/j.nlp.2025.100182},
url = {https://www.sciencedirect.com/science/article/pii/S2949719125000585},
author = {Hoda Helmy and Ahmed Ibrahim and Maryam Arabi and Aamenah Sattar and Ahmed Serag}
}

Model size: 3B params (Safetensors; tensor types F32, F16)

Base model: google/gemma-2b-it