chorus-epinformerseq-v2

Per-cell EPInformer-seq checkpoints for 11 Roadmap Epigenomics cell types. Drop-in artifacts for the Chorus epinformerseq oracle.

Architecture

PerCellProfileNetWide: a dilated-CNN, 2-channel profile model. A 2114-bp input is run through the body, then the central 1024 bp is cropped for the heads (ChromBPNet-style "valid" geometry, so every output base has a full real-sequence receptive field). The two channels are:

  • ch0: DNase 5′ cut-sites (motif-sensitive, sharp).
  • ch1: H3K27ac read signal (the active-enhancer histone mark).
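The crop geometry above can be checked with a few lines of arithmetic. This is a minimal sketch, not the Chorus implementation; the function name `central_crop_bounds` is hypothetical, and only the 2114-bp and 1024-bp widths come from the text.

```python
INPUT_LEN = 2114   # model input width in bp (from the text above)
OUTPUT_LEN = 1024  # central profile crop in bp (from the text above)


def central_crop_bounds(input_len: int = INPUT_LEN, output_len: int = OUTPUT_LEN):
    """Return (start, end) of the centered crop; end is exclusive.

    Hypothetical helper: it only illustrates the window arithmetic.
    """
    extra = input_len - output_len
    if extra < 0 or extra % 2:
        raise ValueError("crop must fit inside the input with equal flanks")
    flank = extra // 2
    return flank, flank + output_len


start, end = central_crop_bounds()
# Each side of the crop keeps (2114 - 1024) / 2 = 545 bp of real flanking sequence.
print(start, end)  # 545 1569
```

With these widths, output base 0 corresponds to input base 545, so every cropped position has 545 bp of real sequence on either side.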

Each cell line gets its own main checkpoint (no FiLM, no cell embedding), paired with a per-cell frozen 2-channel BiasNet (1024-bp, run on the central crop): ch0 subtracts the Tn5/DNase enzymatic cut bias, ch1 the H3K27ac background bias. H3K27ac has no enzymatic cut, so ch1 covers background only and no separate bias model is needed.

The retired joint CellCondProfileNet (FiLM over 11 cells) and the earlier 1024-bp SAME-padded PerCellProfileNet are not used by Chorus.

Layout

per_cell_widewin/
  K562/main.pt        # PerCellProfileNetWide state_dict (~136K params)
  ... (11 cells)
bias/
  K562/bias.pt        # 2-channel BiasNet state_dict (~37K params, frozen)
  ... (11 cells)
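
The layout maps each cell type to two files. A small sketch of that mapping follows; the helper name `checkpoint_paths` is hypothetical, and the paths are relative to the repository root as shown in the tree. Loading the files with a deep-learning framework is deliberately left out.

```python
from pathlib import PurePosixPath

# The 11 cell types listed in this card.
CELLS = ("K562", "GM12878", "HepG2", "A549", "H1", "HeLa",
         "HMEC", "HSMM", "HUVEC", "NHEK", "NHLF")


def checkpoint_paths(cell: str) -> dict:
    """Return repo-relative paths of the main and bias checkpoints for one cell.

    Hypothetical helper that mirrors the directory tree above.
    """
    if cell not in CELLS:
        raise ValueError(f"unknown cell type: {cell!r}")
    return {
        "main": PurePosixPath("per_cell_widewin") / cell / "main.pt",
        "bias": PurePosixPath("bias") / cell / "bias.pt",
    }


paths = checkpoint_paths("K562")
print(paths["main"])  # per_cell_widewin/K562/main.pt
print(paths["bias"])  # bias/K562/bias.pt
```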

Shipped weights (2026-06-04): the Roadmap retrain, trained on Roadmap DNase-summit peaks with the Roadmap-pipeline H3K27ac. It gives the strongest per-cell DNase test-r and a functional H3K27ac channel, and supersedes an earlier ENCODE-IDR-peak / full-coverage-H3K27ac variant.

Training

  • Window: 2114-bp input → central 1024-bp profile crop, on Roadmap DNase peak summits.
  • Signals: ch0 = per-bp 5′ DNase cut-site counts; ch1 = per-bp H3K27ac signal (Roadmap v2.1 pipeline).
  • Data: ENCODE DNase + H3K27ac BAMs (SE/PE), fold-10 leave-chromosomes-out split (chr11+chr21 held out).
  • Loss: multinomial NLL on the per-bp 2-channel profile + MSE on log10 count per channel. AdamW + OneCycleLR.
  • Per-cell held-out test-r: DNase 0.59–0.88, H3K27ac 0.48–0.67.
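
The loss line above can be illustrated in pure Python. This is a sketch of a ChromBPNet-style objective, not the training code: the `+1` pseudocount inside the log10 count target, the `count_weight` parameter, and all function names are assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of per-bp logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def multinomial_nll(counts, logits):
    """Negative log-likelihood of per-bp counts under softmax(logits)."""
    logp = log_softmax(logits)
    n = sum(counts)
    # Log multinomial coefficient; constant with respect to the logits.
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return -(log_coeff + sum(c * lp for c, lp in zip(counts, logp)))


def count_mse(counts, pred_log10_count):
    """Squared error on log10 total count (the +1 pseudocount is an assumption)."""
    target = math.log10(sum(counts) + 1.0)
    return (pred_log10_count - target) ** 2


def profile_loss(counts_by_channel, logits_by_channel, pred_log10_counts,
                 count_weight=1.0):
    """Sum the profile NLL and the weighted count MSE over both channels."""
    total = 0.0
    for counts, logits, pred in zip(counts_by_channel, logits_by_channel,
                                    pred_log10_counts):
        total += multinomial_nll(counts, logits)
        total += count_weight * count_mse(counts, pred)
    return total


# Toy 3-bp example with two channels (DNase, H3K27ac).
loss = profile_loss(
    counts_by_channel=[[1, 0, 0], [0, 2, 0]],
    logits_by_channel=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    pred_log10_counts=[math.log10(2.0), math.log10(3.0)],
)
print(loss)
```

The profile term only depends on the shape of the predicted distribution, while the count term carries the overall signal magnitude, which is why the two are trained with separate heads in this family of models.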

Usage

from chorus.oracles import EPInformerSeqOracle

oracle = EPInformerSeqOracle(cell_type="K562")
oracle.load_pretrained_model()  # downloads from this repo on first run

result = oracle.predict(
    sequence="A" * 2114,
    assay_ids=["Enhancer_DNase:K562"],   # or Enhancer_H3K27ac, Enhancer_H3K27ac_DNase
)

Available cells: K562, GM12878, HepG2, A549, H1, HeLa, HMEC, HSMM, HUVEC, NHEK, NHLF.
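
Before calling the oracle it can help to validate inputs locally. The sketch below is an assumption-laden convenience, not part of Chorus: `build_request` is a hypothetical helper, the exact-length check assumes the oracle expects a 2114-bp sequence as in the usage example, and accepting `N` as a base is also an assumption. The `assay:cell` identifier format is taken from the usage example.

```python
CELLS = ("K562", "GM12878", "HepG2", "A549", "H1", "HeLa",
         "HMEC", "HSMM", "HUVEC", "NHEK", "NHLF")
ASSAYS = ("Enhancer_DNase", "Enhancer_H3K27ac", "Enhancer_H3K27ac_DNase")
INPUT_LEN = 2114  # input width used in the usage example


def build_request(sequence: str, cell: str, assay: str = "Enhancer_DNase") -> dict:
    """Validate inputs and return keyword arguments shaped like the predict() call.

    Hypothetical helper; it never touches the oracle itself.
    """
    if cell not in CELLS:
        raise ValueError(f"unknown cell type: {cell!r}")
    if assay not in ASSAYS:
        raise ValueError(f"unknown assay: {assay!r}")
    if len(sequence) != INPUT_LEN:
        raise ValueError(f"expected {INPUT_LEN} bp, got {len(sequence)}")
    seq = sequence.upper()
    bad = set(seq) - set("ACGTN")  # allowing N is an assumption
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    return {"sequence": seq, "assay_ids": [f"{assay}:{cell}"]}


request = build_request("a" * 2114, "K562")
print(request["assay_ids"])  # ['Enhancer_DNase:K562']
```

The returned dictionary could then be unpacked into `oracle.predict(**request)`, assuming the signature shown in the usage example.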

Available assays (max over the central 256 bp of the 1024-bp output): Enhancer_DNase (default, max DNase), Enhancer_H3K27ac (max H3K27ac), Enhancer_H3K27ac_DNase (composite sqrt(max DNase · max H3K27ac)).
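
The three assay scores can be reproduced from a pair of 1024-bp profiles as follows. This is a minimal sketch under the definitions in the sentence above; the function names are hypothetical, and the profiles are assumed to be non-negative so the square root is defined.

```python
import math

SCORE_WINDOW = 256  # central window in bp used for scoring (from the text above)


def central_max(profile, window: int = SCORE_WINDOW) -> float:
    """Maximum of the profile over its central `window` positions."""
    if window > len(profile):
        raise ValueError("window is wider than the profile")
    start = (len(profile) - window) // 2
    return max(profile[start:start + window])


def enhancer_scores(dnase, h3k27ac) -> dict:
    """Compute the three assay scores from two per-bp profiles."""
    d = central_max(dnase)
    h = central_max(h3k27ac)
    return {
        "Enhancer_DNase": d,
        "Enhancer_H3K27ac": h,
        # Geometric mean of the two maxima.
        "Enhancer_H3K27ac_DNase": math.sqrt(d * h),
    }


# Toy profiles: the spike at position 10 lies outside the central window
# (positions 384..639 of a 1024-bp profile) and is ignored.
dnase = [0.0] * 1024
dnase[10] = 100.0
dnase[500] = 4.0
h3k27ac = [0.0] * 1024
h3k27ac[600] = 9.0

print(enhancer_scores(dnase, h3k27ac))
# {'Enhancer_DNase': 4.0, 'Enhancer_H3K27ac': 9.0, 'Enhancer_H3K27ac_DNase': 6.0}
```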

Background CDFs

Chorus pulls per-track CDFs from the companion dataset lucapinello/chorus-backgrounds (epinformerseq_pertrack.npz, 33 tracks = 3 assays × 11 cells).
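
To make the track count and the role of a background CDF concrete, here is a small sketch. It does not read the `.npz` file: the internal key names and array layout of that file are not described here, so the `assay:cell` track identifiers and the sorted-sample representation below are assumptions for illustration only.

```python
from bisect import bisect_right
from itertools import product

ASSAYS = ("Enhancer_DNase", "Enhancer_H3K27ac", "Enhancer_H3K27ac_DNase")
CELLS = ("K562", "GM12878", "HepG2", "A549", "H1", "HeLa",
         "HMEC", "HSMM", "HUVEC", "NHEK", "NHLF")

# 3 assays x 11 cells = 33 tracks. The "assay:cell" naming is an assumption
# borrowed from the assay_ids used in the usage example.
TRACK_IDS = [f"{assay}:{cell}" for assay, cell in product(ASSAYS, CELLS)]


def percentile_from_background(sorted_background, score: float) -> float:
    """Empirical CDF: fraction of background scores less than or equal to `score`.

    `sorted_background` must be sorted in ascending order.
    """
    if not sorted_background:
        raise ValueError("background sample is empty")
    return bisect_right(sorted_background, score) / len(sorted_background)


print(len(TRACK_IDS))  # 33
print(percentile_from_background([1.0, 2.0, 3.0, 4.0], 2.5))  # 0.5
```

A raw score is hard to interpret on its own; ranking it against a per-track background distribution puts scores from different assays and cell types on a common 0 to 1 scale.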

Citation

Cite the EPInformer paper and the Chorus pipeline (see the Chorus README).
