Cadence: Next Clinical Event Prediction

cadence-core is a neural model for next clinical event prediction from electronic health record (EHR) sequences. Given a patient's longitudinal clinical history, it predicts which of 48 clinical event categories will occur next and how many days until that event.


Model Description

Cadence implements the Narrative Velocity Composite (NV-C) framework: a 5.86M-parameter residual MLP that fuses structured EHR features with PubMedBERT cluster-semantic embeddings under self-knowledge distillation.

| Component | Details |
|---|---|
| Architecture | 3-block residual MLP with LayerNorm |
| Input dimension | 2,420 (884 NV features + 768 PubMedBERT mean + 768 PubMedBERT last) |
| Classification head | Linear → 48 event-class logits |
| Regression head | Linear → 19-bin discretized time-to-event |
| Parameters | 5.86M |
| Training | Cross-entropy + ordinal regression + self-knowledge distillation |
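
To make the shapes in the table above concrete, here is a minimal NumPy sketch of a 3-block residual MLP with two linear heads. Only the dimensions (884 + 768 + 768 inputs, 48 logits, 19 bins) come from this card. The hidden width, the ReLU activation, the pre-norm block layout, and the parameter-free LayerNorm are assumptions made for illustration, and this is not the cadence-core implementation.

```python
import numpy as np

N_STRUCT, N_BERT = 884, 768            # from the component table
INPUT_DIM = N_STRUCT + 2 * N_BERT      # 884 + 768 + 768 = 2,420
HIDDEN = 256                           # hypothetical width, not from the card
N_CLASSES, N_BINS = 48, 19             # from the component table

rng = np.random.default_rng(0)


def linear(d_in, d_out):
    """Return a (weight, bias) pair with small random weights."""
    return rng.normal(0.0, 0.02, size=(d_in, d_out)), np.zeros(d_out)


def layer_norm(x, eps=1e-5):
    """LayerNorm over the feature axis (no learnable gain/bias in this sketch)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


params = {
    "proj": linear(INPUT_DIM, HIDDEN),
    "blocks": [(linear(HIDDEN, HIDDEN), linear(HIDDEN, HIDDEN)) for _ in range(3)],
    "cls": linear(HIDDEN, N_CLASSES),
    "reg": linear(HIDDEN, N_BINS),
}


def forward(x):
    """Map (N, 2420) inputs to (N, 48) event logits and (N, 19) time-bin logits."""
    w, b = params["proj"]
    h = x @ w + b
    for (w1, b1), (w2, b2) in params["blocks"]:
        z = np.maximum(layer_norm(h) @ w1 + b1, 0.0)   # assumed: pre-norm + ReLU
        h = h + (z @ w2 + b2)                          # residual connection
    h = layer_norm(h)
    (wc, bc), (wr, br) = params["cls"], params["reg"]
    return h @ wc + bc, h @ wr + br


# Fuse the three feature groups by concatenation, then run a batch of 4.
x = np.concatenate(
    [
        rng.normal(size=(4, N_STRUCT)),   # structured NV features
        rng.normal(size=(4, N_BERT)),     # PubMedBERT mean embedding
        rng.normal(size=(4, N_BERT)),     # PubMedBERT last embedding
    ],
    axis=1,
)
event_logits, time_logits = forward(x)
```

With this hypothetical width the parameter count will not match the reported 5.86M; the sketch is only meant to show how the input blocks and the two heads fit together.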

Performance

Trained and evaluated on MIMIC-IV v3.1 (100k training tier, male cohort):

| Model | Top-1 accuracy | 95% CI (accuracy) | MAE (days) | 95% CI (MAE) |
|---|---|---|---|---|
| Cadence (NV-C) | 34.18% | [33.84%, 34.42%] | 36.95 | [36.10, 37.68] |
| XGBoost-884 | 32.35% | - | 38.58 | - |
| Majority-class | 9.25% | - | - | - |
| Random | 2.08% | - | - | - |

Bootstrap 95% CI: N=105,968 test instances, 2,000 resamples. XGBoost falls outside Cadence's CI on both metrics.
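
For readers who want to reproduce this kind of interval on their own predictions, the following is a minimal sketch of a percentile bootstrap for top-1 accuracy. The card does not say which bootstrap variant was used, so the percentile method, the function name `bootstrap_ci`, and the toy data are all assumptions.

```python
import random


def bootstrap_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of a 0/1 'prediction correct' list."""
    rng = random.Random(seed)
    n = len(correct)
    stats = []
    for _ in range(n_resamples):
        sample = rng.choices(correct, k=n)      # resample with replacement
        stats.append(sum(sample) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Toy data: 34 correct out of 100 (hypothetical, not the MIMIC-IV test set).
correct = [1] * 34 + [0] * 66
point = sum(correct) / len(correct)
lo, hi = bootstrap_ci(correct, n_resamples=500)
print(f"top-1 = {point:.2%}, 95% CI = [{lo:.2%}, {hi:.2%}]")
```

The same function works for MAE if you pass per-instance absolute errors instead of 0/1 flags, since it only averages the resampled values.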

Key finding: Self-knowledge distillation yields a disproportionately large top-1 gain (+0.81 pp) when applied after PubMedBERT cluster-semantic fusion, compared with roughly 0 pp from self-KD on structured features alone. This suggests a genuine interaction between the frozen semantic embeddings and knowledge distillation.
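
As a rough illustration of what a self-distillation objective looks like, here is a sketch of the classification part of such a loss in NumPy. It assumes the common temperature-scaled formulation in which a previously trained copy of the same architecture acts as teacher. The mixing weight `alpha`, the temperature, and the function names are hypothetical, the ordinal regression term is omitted, and the exact formulation used by Cadence is described in the paper rather than here.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def self_kd_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """(1 - alpha) * cross-entropy + alpha * T^2 * KL(teacher || student)."""
    eps = 1e-12
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + eps).mean()
    p_teacher = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = (p_teacher * (np.log(p_teacher + eps) - np.log(p_student_t + eps))).sum(axis=-1).mean()
    return (1 - alpha) * ce + alpha * temperature**2 * kl


rng = np.random.default_rng(0)
student = rng.normal(size=(8, 48))   # 48 event-class logits per instance
teacher = rng.normal(size=(8, 48))   # logits from the earlier copy of the model
labels = rng.integers(0, 48, size=8)
loss = self_kd_loss(student, teacher, labels)
```

When the teacher and student logits coincide, the KL term vanishes and only the scaled cross-entropy remains, which is a quick sanity check for any implementation of this form.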

Installation & Usage

pip install "cadence-core>=1.3.0"

Pretrained weights are not distributed. The MIMIC-trained classifier is dataset-specific (50-cluster space derived from MIMIC text). Transfer to other datasets is not meaningful, so we ship the architecture and training code rather than weights. Train your own model on your own data using cadence.train(...).

import cadence

classifier = cadence.train(
    train_jsonl="my_data/train.jsonl",
    val_jsonl="my_data/val.jsonl",
    embeddings_path="my_data/embeddings.npy",
    event_index_path="my_data/event_index.json",
    n_clusters=50,
    out_dir="./runs/my_run",
    n_epochs=30,
)

preds = cadence.predict(
    classifier,
    "my_data/test.jsonl",
    embeddings_path="my_data/embeddings.npy",
    event_index_path="my_data/event_index.json",
)
# preds: [{"patient_id": "...", "top_3_clusters": [...], "top_3_probs": [...], "days_until_next": ...}, ...]
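
A small sketch of how the prediction records might be consumed downstream. It relies only on the four keys shown in the comment above and assumes that `top_3_clusters` and `top_3_probs` are index-aligned; the helper name `summarize` and the sample values are hypothetical.

```python
# Hypothetical records in the shape shown above (values are made up).
preds = [
    {"patient_id": "p1", "top_3_clusters": [12, 4, 31],
     "top_3_probs": [0.41, 0.22, 0.09], "days_until_next": 14.0},
    {"patient_id": "p2", "top_3_clusters": [7, 12, 3],
     "top_3_probs": [0.30, 0.28, 0.11], "days_until_next": 62.5},
]


def summarize(pred):
    """Pick the most probable cluster without assuming the lists are sorted."""
    probs = pred["top_3_probs"]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "patient_id": pred["patient_id"],
        "cluster": pred["top_3_clusters"][best],
        "prob": probs[best],
        "days_until_next": pred["days_until_next"],
    }


summary = [summarize(p) for p in preds]
for row in summary:
    print(row)
```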

Train on Custom Labels (Binary / Multiclass)

Starting with v1.2.0, cadence.train() accepts task="binary" or task="multiclass" so you can train Cadence on arbitrary labels instead of next-event prediction. Store the label under a key of your choice in each target object, then pass that key's name as label_field:

import cadence

# Binary classification on JSONL data
# (your target objects include e.g. {"cluster_id": ..., "readmitted_30d": 1})
classifier = cadence.train(
    train_jsonl="train.jsonl",
    val_jsonl="val.jsonl",
    embeddings_path="embeddings.npy",
    event_index_path="event_index.json",
    n_clusters=50,
    n_epochs=30,
    out_dir="./runs/binary_run",
    task="binary",
    label_field="readmitted_30d",
)
preds = cadence.predict(
    classifier,
    "test.jsonl",
    embeddings_path="embeddings.npy",
    event_index_path="event_index.json",
)
# preds: [{"patient_id": "...", "probabilities": 0.83}, ...]

# Multiclass (4 classes) on JSONL data
classifier = cadence.train(
    train_jsonl="train.jsonl",
    val_jsonl="val.jsonl",
    embeddings_path="embeddings.npy",
    event_index_path="event_index.json",
    n_clusters=50,
    n_epochs=30,
    out_dir="./runs/multiclass_run",
    task="multiclass",
    label_field="discharge_category",
    n_classes=4,
)
preds = cadence.predict(
    classifier,
    "test.jsonl",
    embeddings_path="embeddings.npy",
    event_index_path="event_index.json",
)
# preds: [{"patient_id": "...", "probabilities": [0.1, 0.5, 0.3, 0.1]}, ...]
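
To turn either output shape into hard labels, something like the following can be used. It assumes only the `probabilities` key shown in the two comments above (a float for binary, a list for multiclass); the helper name `to_label` and the 0.5 threshold are illustrative choices, and in practice the threshold should be tuned on validation data.

```python
def to_label(pred, threshold=0.5):
    """Binary: threshold the scalar probability. Multiclass: take the argmax."""
    p = pred["probabilities"]
    if isinstance(p, (int, float)):
        return int(p >= threshold)
    return max(range(len(p)), key=p.__getitem__)


binary_pred = {"patient_id": "a", "probabilities": 0.83}
multi_pred = {"patient_id": "b", "probabilities": [0.1, 0.5, 0.3, 0.1]}

binary_label = to_label(binary_pred)
multi_label = to_label(multi_pred)
```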

Pre-built feature matrix

If you already have a feature matrix, skip JSONL entirely:

import cadence
import numpy as np

# X_train: (N, D) numpy array, y_train: (N,) integer labels
classifier = cadence.train_classifier(
    X_train, y_train,
    X_val=X_val, y_val=y_val,
    task="binary",
    n_epochs=30,
    out_dir="./runs/features_run",
)
probs = cadence.predict_from_features(classifier, X_test)
# probs: (N,) array of probabilities for binary; (N, K) for multiclass
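
Before handing arrays to the feature-matrix API, it can help to check that they match the shapes described in the comments above. The validator below is a hypothetical helper, not part of cadence-core, and the checks (2-D features, 1-D integer labels in range, finite values) are assumptions about what a sensible input looks like.

```python
import numpy as np


def check_features(X, y, n_classes=2):
    """Validate an (N, D) feature matrix and (N,) integer labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D (N, D), got shape {X.shape}")
    if y.shape != (X.shape[0],):
        raise ValueError(f"y must have shape ({X.shape[0]},), got {y.shape}")
    if not np.issubdtype(y.dtype, np.integer):
        raise ValueError("y must contain integer class labels")
    if y.min() < 0 or y.max() >= n_classes:
        raise ValueError(f"labels must lie in [0, {n_classes - 1}]")
    if not np.isfinite(X).all():
        raise ValueError("X contains NaN or infinite values")
    return X, y


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
y_demo = rng.integers(0, 2, size=20)
X_ok, y_ok = check_features(X_demo, y_demo)
```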

Recommended for small datasets (n < 5000)

On small datasets Cadence can overfit quickly. Use early stopping, class weighting, and stronger L2 regularization to stabilize training (v1.2.1+):

classifier = cadence.train_classifier(
    X_train, y_train,
    X_val=X_val, y_val=y_val,
    task="binary",
    n_epochs=200,
    hidden_dims=(128, 64),         # smaller model for small n
    early_stopping_patience=10,    # halt when val plateaus
    early_stopping_metric="val_auroc",
    class_weight="balanced",       # imbalanced clinical labels
    weight_decay=1e-3,             # stronger L2 vs default 1e-4
    lr=1e-3,
)
probs = cadence.predict_from_features(classifier, X_test)

The same kwargs are available on cadence.train() for task="binary" or task="multiclass" (JSONL path). For task="next_event", these kwargs are accepted but ignored.
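
For intuition about two of these options, here is a standard-library sketch of patience-based early stopping and of "balanced" class weights. The weight formula is the widely used n_samples / (n_classes * class_count) convention; whether cadence-core computes weights exactly this way is an assumption, and `EarlyStopper` and `balanced_class_weights` are hypothetical names.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


class EarlyStopper:
    """Stop once a higher-is-better metric has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        if metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# 90 negatives and 10 positives: the rare class gets the larger weight.
weights = balanced_class_weights([0] * 90 + [1] * 10)

# Made-up validation AUROC values, one per epoch.
val_auroc = [0.70, 0.74, 0.73, 0.74, 0.72, 0.71]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, score in enumerate(val_auroc):
    if stopper.step(score):
        stopped_at = epoch
        break
```

In this toy run the best score (0.74) is reached at epoch index 1 and training halts three non-improving epochs later, at index 4.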


Training Data

Trained on MIMIC-IV v3.1 (PhysioNet credentialed access required):

  • 100k patient sequences, male cohort
  • 48 event categories derived from ICD-10 diagnosis and procedure codes
  • External validation: 1,120 BWH patients (JSD=0.27 domain shift)
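
The domain-shift figure above is a Jensen-Shannon divergence. A minimal sketch of how such a value can be computed between two discrete distributions follows. The card does not state the logarithm base or which distributions were compared, so base 2 (which bounds the divergence at 1) and the example event-frequency vectors are assumptions.

```python
import math


def jsd(p, q, base=2):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero numerator contribute nothing; m > 0 wherever a > 0.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical event-category frequencies for two cohorts.
source = [0.5, 0.3, 0.2]
target = [0.2, 0.3, 0.5]
shift = jsd(source, target)
print(f"JSD = {shift:.3f}")
```

Identical distributions give 0 and distributions with disjoint support give 1 under base 2, which makes the reported 0.27 easier to place on a scale.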

Intended Use & Limitations

Intended use: Research on clinical event prediction, EHR sequence modelling, benchmarking.

Not intended for: Direct clinical decision-making without prospective validation.

Limitations:

  • Trained on MIMIC-IV (US academic medical center); performance under distribution shift varies
  • Structured features (labs, vitals, medications) required at inference; performance degrades when unavailable
  • 48-class coarse event vocabulary; within-cluster distinctions are discarded
  • Calibration analysis pending

Citation

@article{rouhollahi2026cadence,
  title   = {Cadence: A Benchmark Evaluation of the Narrative Velocity Framework for Next Clinical Event Prediction in {MIMIC-IV}},
  author  = {Rouhollahi, Amir and Nezami, Farhad R.},
  journal = {bioRxiv},
  year    = {2026},
  doi     = {10.64898/2026.05.06.722409},
  url     = {https://doi.org/10.64898/2026.05.06.722409}
}

Changelog

| Version | Notes |
|---|---|
| v1.3.0 | Main model class renamed NVCClean → Cadence for clarity. |
| v1.2.1 | Early stopping, class weighting, and L2 regularization kwargs for small datasets. |
| v1.2.0 | Binary and multiclass task support via task= and label_field= arguments. |
| v1.1.1 | Removed load_checkpoint(); pretrained weights are not portable across datasets. |
| v1.1.0 | Public training/inference API. |

License

MIT License. MIMIC-IV data is subject to the PhysioNet Credentialed Health Data License.
