CardioSafe — paper-snapshot weights

Paper-snapshot weights for CardioSafe: multi-task prediction of cardiac ion channel activity with reverse-leak audited benchmarking (Jovanović et al., 2026, bioRxiv).

CardioSafe is a three-branch multi-task neural network that predicts blocker status and pIC50 for the four CiPA cardiac ion channels — hERG, Nav1.5, Cav1.2, and (exploratory) IKs — trained on the largest publicly reported multi-channel cardiac ion channel dataset (ChEMBL 36 + hERG Central, 334,444 curated compounds, 8 heads).

This HuggingFace repo is a mirror. The canonical home is github.com/AppliedScientific/CardioSafe-benchmark, which ships the curated dataset, splits, supplementary materials, the reverse-leak audit script, the reference model + training-step code, and runnable inference (inference/predict.py). The continually-updated deployed ensemble is served at platform.appliedscientific.ai/cardiosafe.

Files

v1.0/                                     # preprint snapshot, 5-seed ensemble
  cardiosafe_v1.0_seed_{42..46}.pt        # 15 MB each
v1.1/                                     # audit-clean snapshot, 5-seed ensemble
  cardiosafe_v1.1_seed_{42..46}.pt        # 15 MB each — RECOMMENDED for new work
l1000/
  l1000_encoder.pt                        # 10 MB — shared by v1.0 + v1.1
  l1000_per_gene_pearson.json             # per-gene test-set Pearson r (diagnostic)

Each .pt contains model_state_dict, descriptor / L1000 / regression-head scalers, and a clean config dict. The L1000 encoder checkpoint additionally contains the gene co-expression edge_index and per-gene scaler stats.

v1.0 vs v1.1

  • v1.0 is the exact ensemble evaluated in the bioRxiv preprint.
  • v1.1 is an audit-clean retrain: the exhaustive O(n_train × n_other) Tanimoto leakage audit flagged 12 train↔val edges in tan70 v1.0 at Morgan-r2-2048 Tanimoto ≥ 0.70, all within the canonical cardiac-cliff cluster (terfenadine / fexofenadine / hydroxymethyl-terfenadine analogs). v1.1 force-routes the 2 HMT analogs (rows 317153, 331406) to val so the cluster is fully audit-clean.
  • Test fold is identical between v1.0 and v1.1 — headline test metrics (Tables 2 / 3 of the paper) are unchanged. v1.1 just gives an audit-clean training set for the per-seed val fold selection.
  • See Note S3 for the full audit findings + re-evaluation of the cardiac-cliff case study.
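The exhaustive pairwise check described in the bullets above can be illustrated with a minimal sketch. This is not the repository's audit script: the function names, the set-of-on-bits fingerprint representation, and the empty-fingerprint convention are all assumptions made for illustration. Only the 0.70 threshold and the all-pairs train × other comparison come from the text.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch only
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def cross_fold_edges(train, other, threshold=0.70):
    """Compare every train fingerprint with every fingerprint in another fold.

    `train` and `other` map a row id to a set of on-bit indices. The loop is
    O(n_train x n_other) by construction. Returns (train_id, other_id, sim)
    for each pair whose similarity is at or above `threshold`.
    """
    edges = []
    for t_id, t_fp in train.items():
        for o_id, o_fp in other.items():
            sim = tanimoto(t_fp, o_fp)
            if sim >= threshold:
                edges.append((t_id, o_id, sim))
    return edges
```

An empty result for a given pair of folds means no cross-fold pair reaches the threshold, which is the sense of "audit-clean" in this sketch.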

Use v1.1 for new work. v1.0 is retained so the preprint numbers stay reproducible.

Inputs and outputs

The model expects a single flat float32 tensor of shape (B, 7526):

| dims | block | source |
|---|---|---|
| 0 – 2047 | Morgan radius-2 2048-bit binary fingerprint | RDKit GetMorganGenerator(radius=2, fpSize=2048) |
| 2048 – 4095 | AtomPair 2048-bit binary fingerprint | RDKit GetAtomPairGenerator(fpSize=2048) |
| 4096 – 6143 | TopologicalTorsion 2048-bit binary fingerprint | RDKit GetTopologicalTorsionGenerator(fpSize=2048) |
| 6144 – 6163 | 20-descriptor block, training-fold z-scored | Spec in data/supplementary/table_s0_descriptor_spec.* |
| 6164 – 6547 | ChemBERTa-77M-MTR mean-pooled embedding (384) | model/chemberta_encoder.py |
| 6548 – 7525 | L1000 predicted expression z-scores (978) | model/l1000_encoder.py |
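As a sanity check on the layout above, the following sketch encodes the block boundaries and verifies that they are contiguous and sum to 7526. The block names and both helper functions are hypothetical and do not come from the repository; only the index ranges are taken from the table.

```python
# Block layout of the 7526-dim input vector, copied from the table above.
# Each entry is (name, start, stop) with stop exclusive. Names are illustrative.
BLOCKS = [
    ("morgan_r2", 0, 2048),
    ("atom_pair", 2048, 4096),
    ("topological_torsion", 4096, 6144),
    ("descriptors", 6144, 6164),
    ("chemberta", 6164, 6548),
    ("l1000", 6548, 7526),
]
INPUT_DIM = 7526


def check_layout(blocks=BLOCKS, total=INPUT_DIM):
    """Verify that the blocks are contiguous and cover [0, total)."""
    cursor = 0
    for name, start, stop in blocks:
        if start != cursor or stop <= start:
            raise ValueError(f"block {name!r} is misaligned")
        cursor = stop
    if cursor != total:
        raise ValueError(f"blocks cover {cursor} dims, expected {total}")
    return True


def split_features(row, blocks=BLOCKS):
    """Split one flat feature row (a sequence of length 7526) into named blocks."""
    if len(row) != INPUT_DIM:
        raise ValueError(f"expected {INPUT_DIM} features, got {len(row)}")
    return {name: row[start:stop] for name, start, stop in blocks}
```

Splitting a row this way is mainly useful for debugging a custom featurizer, for example to confirm that the descriptor block has 20 entries before passing the tensor to the model.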

forward(x) returns a dict[str, Tensor] with 8 keys, each value a (B,) tensor:

| Head | Output | Channel |
|---|---|---|
| herg_pchembl | regression, raw pIC50 | hERG |
| herg_blocker_10um | logit (apply sigmoid for P) | hERG |
| herg_blocker_1um | logit | hERG |
| nav15_pchembl | regression, raw pIC50 | Nav1.5 |
| nav15_blocker | logit | Nav1.5 |
| cav12_pchembl | regression, raw pIC50 | Cav1.2 |
| cav12_blocker | logit | Cav1.2 |
| iks_blocker | logit | IKs |

IKs has no regression head (n = 115 labelled compounds; treated as exploratory). See the full model card for architecture details.
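A minimal post-processing sketch for one compound's outputs follows, assuming the eight head values have already been pulled out of the tensors as plain floats. The helper names (`sigmoid`, `pic50_to_ic50_um`, `postprocess`) and the output key suffixes are hypothetical; the head names are those listed above. The IC50 conversion uses the standard definition pIC50 = -log10(IC50 in mol/L).

```python
import math

# Head names as listed in the table above.
LOGIT_HEADS = (
    "herg_blocker_10um",
    "herg_blocker_1um",
    "nav15_blocker",
    "cav12_blocker",
    "iks_blocker",
)
REGRESSION_HEADS = ("herg_pchembl", "nav15_pchembl", "cav12_pchembl")


def sigmoid(z):
    """Numerically stable logistic function: maps a logit to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pic50_to_ic50_um(pic50):
    """Convert pIC50 to IC50 in micromolar: IC50_uM = 10 ** (6 - pIC50)."""
    return 10.0 ** (6.0 - pic50)


def postprocess(outputs):
    """Turn raw head outputs (dict of head name -> float) into readable values."""
    result = {}
    for head in LOGIT_HEADS:
        result[head + "_prob"] = sigmoid(outputs[head])
    for head in REGRESSION_HEADS:
        result[head] = outputs[head]
        result[head.replace("_pchembl", "_ic50_um")] = pic50_to_ic50_um(outputs[head])
    return result
```

How the five seed models are combined into an ensemble prediction is not specified here; the loader in the GitHub repo is the reference for that step.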

Usage

The recommended path is the runnable inference shipped in the GitHub repo. It handles all featurization (RDKit + ChemBERTa + L1000 encoder) and the ensemble forward pass:

git clone https://github.com/AppliedScientific/CardioSafe-benchmark
cd CardioSafe-benchmark
pip install -e ".[inference]"   # quoted so shells such as zsh do not glob the brackets

# CSV in / CSV out — auto-downloads weights from GitHub Releases on first call
python -m inference.predict --in inference/example_smiles.csv \
                            --out predictions.csv \
                            --version v1.1

To download these weight files from the HuggingFace mirror instead:

from huggingface_hub import snapshot_download

local = snapshot_download(repo_id="appliedscientific/cardiosafe")
# v1.0/, v1.1/, l1000/ subdirectories under `local`

The repo's inference.ensemble module loads the seed checkpoints; see inference/README.md for the loader API and a Python example.
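If you only need the file paths inside a downloaded snapshot, for example to check that all five seeds are present, the naming pattern from the Files section can be expanded as in the sketch below. The helper names are hypothetical and are not part of `inference.ensemble`.

```python
from pathlib import Path

SEEDS = range(42, 47)  # seeds 42..46, as listed under Files


def seed_checkpoints(root, version="v1.1"):
    """Return the five expected seed checkpoint paths under a snapshot root."""
    return [
        Path(root) / version / f"cardiosafe_{version}_seed_{seed}.pt"
        for seed in SEEDS
    ]


def l1000_encoder_path(root):
    """Return the expected path of the shared L1000 encoder checkpoint."""
    return Path(root) / "l1000" / "l1000_encoder.pt"


def missing_files(root, version="v1.1"):
    """List expected files that are not present on disk."""
    expected = seed_checkpoints(root, version) + [l1000_encoder_path(root)]
    return [p for p in expected if not p.exists()]
```

Here `root` would be the directory returned by `snapshot_download`.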

Verified

Loading the v1.1 weights into the public model.cross_attn.CrossAttnIonChannelPredictor and running the cardiac-cliff anchors reproduces the published v1.1 case-study values to within 0.02 pIC50 units: terfenadine pIC50 6.258 (published 6.247), fexofenadine pIC50 4.505 (published 4.512), cliff 1.754 (published 1.736).
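The size of the agreement can be checked by direct arithmetic on the values quoted above. The sketch below only restates those numbers; the dictionary keys and function names are illustrative.

```python
# (reproduced, published) pairs quoted in the paragraph above.
ANCHORS = {
    "terfenadine_pic50": (6.258, 6.247),
    "fexofenadine_pic50": (4.505, 4.512),
    "cliff": (1.754, 1.736),
}


def deviations(anchors=ANCHORS):
    """Absolute difference between reproduced and published values, rounded to 3 dp."""
    return {name: round(abs(got - pub), 3) for name, (got, pub) in anchors.items()}


def max_deviation(anchors=ANCHORS):
    """Largest absolute deviation across all anchors."""
    return max(deviations(anchors).values())
```

The per-anchor deviations come out as 0.011, 0.007, and 0.018 pIC50 units, so the largest gap is on the cliff value.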

License

CC-BY-NC-4.0. Academic, educational, and non-profit research use is permitted with attribution. Commercial use requires a separate license — contact the authors (lukas@appliedscientific.ai, mihailo@appliedscientific.ai).

The code in the GitHub repository is MIT; the dataset there is CC-BY-4.0. Only the model weights distributed here and in the GitHub Releases are CC-BY-NC-4.0.

Citation

@article{cardiosafe2026,
  title   = {CardioSafe: multi-task prediction of cardiac ion channel
             activity with reverse-leak audited benchmarking},
  author  = {Jovanović, Mihailo and Weidener, Lukas and Brkić, Marko and
             Ulgac, Emre and Meduri, Aakaash},
  year    = {2026},
  journal = {bioRxiv},
  doi     = {10.64898/2026.05.06.723181},
  url     = {https://www.biorxiv.org/content/10.64898/2026.05.06.723181v1}
}