Cadence: Next Clinical Event Prediction

cadence-core is a pretrained neural model for next clinical event prediction from electronic health record (EHR) sequences. Given a patient's longitudinal clinical history, it predicts which of 48 clinical event categories will occur next and how many days until that event.


Model Description

Cadence implements the Narrative Velocity Composite (NV-C) framework: a 5.86M-parameter residual MLP that fuses structured EHR features with PubMedBERT cluster-semantic embeddings under self-knowledge distillation.

| Component | Details |
|-----------|---------|
| Architecture | 3-block residual MLP with LayerNorm |
| Input dimension | 2,420 (884 NV features + 768 PubMedBERT mean-pooled + 768 PubMedBERT last-token) |
| Classification head | Linear → 48 event-class logits |
| Regression head | Linear → 19-bin discretized time-to-event |
| Parameters | 5.86M |
| Training objective | Cross-entropy + ordinal regression + self-knowledge distillation |
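
The table above pins down shapes but not internals. As a rough illustration, a 3-block pre-norm residual MLP with the two heads could be sketched as follows in PyTorch; the hidden width, activation, and block internals are assumptions for this sketch, not details taken from the paper:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One pre-norm residual MLP block (internal structure assumed)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.ff(self.norm(x))

class CadenceLikeModel(nn.Module):
    """Hypothetical NV-C-style model: input projection, 3 residual blocks,
    a 48-way event-classification head, and a 19-bin time-to-event head."""
    def __init__(self, in_dim=2420, hidden=512, n_events=48, n_bins=19):
        super().__init__()
        self.proj = nn.Linear(in_dim, hidden)
        self.blocks = nn.Sequential(*[ResidualBlock(hidden) for _ in range(3)])
        self.cls_head = nn.Linear(hidden, n_events)   # event-class logits
        self.reg_head = nn.Linear(hidden, n_bins)     # discretized time-to-event

    def forward(self, x):
        h = self.blocks(self.proj(x))
        return self.cls_head(h), self.reg_head(h)

model = CadenceLikeModel()
logits, time_bins = model(torch.randn(4, 2420))  # shapes (4, 48) and (4, 19)
```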

Performance

Trained and evaluated on MIMIC-IV v3.1 (100k training tier, male cohort):

| Model | Top-1 Accuracy | 95% CI | MAE (days) | 95% CI |
|-------|----------------|--------|------------|--------|
| Cadence (NV-C) | 34.18% | [33.84%, 34.42%] | 36.95 | [36.10, 37.68] |
| XGBoost-884 | 32.35% | — | 38.58 | — |
| Majority-class | 9.25% | — | — | — |
| Random | 2.08% | — | — | — |

Bootstrap 95% CIs were computed over N=105,968 test instances with 2,000 resamples. XGBoost falls outside Cadence's CI on both metrics.
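
The CI construction can be illustrated with a percentile bootstrap over per-instance correctness. This is a sketch of the standard procedure, not the released evaluation code:

```python
import numpy as np

def bootstrap_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for top-1 accuracy.
    `correct` is a 0/1 array of per-instance hits."""
    rng = np.random.default_rng(seed)
    n = len(correct)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, n)       # resample instances with replacement
        stats[i] = correct[idx].mean()
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return correct.mean(), (lo, hi)

# Synthetic per-instance hits at roughly the reported accuracy
correct = (np.random.default_rng(1).random(10_000) < 0.3418).astype(float)
acc, (lo, hi) = bootstrap_ci(correct)
```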

Key finding: self-knowledge distillation yields a disproportionately large top-1 gain (+0.81 pp) when applied after PubMedBERT cluster-semantic fusion, compared with ~0 pp gain from self-KD on structured features alone, suggesting a genuine interaction between frozen semantic embeddings and knowledge distillation.
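
A common formulation of self-knowledge distillation combines cross-entropy on the hard labels with a temperature-softened KL term against a frozen snapshot of the same model. The sketch below shows that generic objective only; the temperature and weighting are hypothetical, and the paper's exact loss (including the ordinal regression term) may differ:

```python
import torch
import torch.nn.functional as F

def self_kd_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Generic self-distillation objective: CE on hard labels plus
    temperature-scaled KL to a frozen earlier snapshot ('teacher')."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to be comparable across temperatures
    return (1 - alpha) * ce + alpha * kd

torch.manual_seed(0)
student = torch.randn(8, 48)                 # student event logits
teacher = torch.randn(8, 48)                 # frozen-snapshot logits
targets = torch.randint(0, 48, (8,))         # hard event labels
loss = self_kd_loss(student, teacher, targets)
```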

Installation & Usage

```shell
pip install cadence-core
```

```python
import torch
from cadence import CadenceModel, load_checkpoint

model = CadenceModel()
load_checkpoint(model)  # load the pretrained checkpoint weights
model.eval()

# Input: a 2420-dimensional feature vector, laid out as:
#   [0:884]     - 884 Narrative Velocity (NV) clinical features
#   [884:1652]  - 768-dim PubMedBERT mean-pooled cluster-semantic embedding
#   [1652:2420] - 768-dim PubMedBERT last-token cluster-semantic embedding
x = torch.randn(1, 2420)

with torch.no_grad():
    logits, time_bins = model(x)  # (1, 48) event logits, (1, 19) time-bin logits

event_probs = torch.softmax(logits, dim=-1)
top1_event = event_probs.argmax(dim=-1).item()
```
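
The 19 regression logits discretize time-to-event, so turning them into a day estimate requires the bin definition, which this card does not document. The sketch below therefore assumes hypothetical log-spaced bin centers from 1 to 365 days, purely for illustration:

```python
import math
import torch

# Stand-in for the model's 19 time-bin logits from the example above
time_bins = torch.randn(1, 19)

# Hypothetical log-spaced bin centers in days (1 to 365); the checkpoint's
# actual discretization is not documented in this card.
bin_centers = torch.logspace(0, math.log10(365.0), 19)

time_probs = torch.softmax(time_bins, dim=-1)
expected_days = (time_probs * bin_centers).sum(dim=-1)  # probability-weighted estimate
top_bin = time_probs.argmax(dim=-1)                     # most likely time bin
```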

Training Data

Trained on MIMIC-IV v3.1 (PhysioNet credentialed access required):

  • 100k patient sequences, male cohort
  • 48 event categories derived from ICD-10 diagnosis and procedure codes
  • External validation: 1,120 BWH patients (JSD=0.27 domain shift)
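
The reported JSD=0.27 quantifies distribution shift between the MIMIC-IV training cohort and the BWH validation cohort. As an illustration of the metric only (not necessarily the paper's exact computation), the Jensen-Shannon divergence between two event-frequency distributions can be computed as:

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.
    Returns a value in [0, 1]."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        return np.sum(a * np.log2((a + eps) / (b + eps)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy event-frequency vectors for two cohorts (illustrative values)
shift = jsd([0.4, 0.3, 0.3], [0.2, 0.5, 0.3])
```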

Intended Use & Limitations

Intended use: Research on clinical event prediction, EHR sequence modelling, benchmarking.

Not intended for: Direct clinical decision-making without prospective validation.

Limitations:

  • Trained on MIMIC-IV (US academic medical center); performance under distribution shift varies
  • Structured features (labs, vitals, medications) required at inference; performance degrades when unavailable
  • 48-class coarse event vocabulary; within-cluster distinctions are discarded
  • Calibration analysis pending
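
Since calibration analysis is listed as pending, downstream users may want to check it themselves. A standard binned expected calibration error (ECE) over top-1 confidences is one way to do so; this is a generic illustration, not part of the released package:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - confidence| gap per bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.sum() / n * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Toy check: confidence 0.8 with 80% hit rate is perfectly calibrated
conf = np.full(10, 0.8)
hits = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
ece = expected_calibration_error(conf, hits)
```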

Citation

```bibtex
@article{rouhollahi2026cadence,
  title   = {Cadence: A Benchmark Evaluation of the Narrative Velocity Framework for Next Clinical Event Prediction in {MIMIC-IV}},
  author  = {Rouhollahi, Amir and Nezami, Farhad R.},
  journal = {bioRxiv},
  year    = {2026},
  doi     = {10.64898/2026.05.06.722409},
  url     = {https://doi.org/10.64898/2026.05.06.722409}
}
```

License

MIT License. The pretrained checkpoint is provided for research use only. MIMIC-IV data is subject to the PhysioNet Credentialed Health Data License.
