license: cc-by-nd-4.0
tags:
  - TFBS
  - transcription-factor
  - genomics
  - mixture-of-experts
  - DNA
library_name: pytorch
pretty_name: ENCODE-TFBS Heterogeneous Mixture-of-Experts checkpoints
datasets:
  - Lab-Rasool/ENCODE-TFBS

ENCODE-TFBS — Heterogeneous Mixture-of-Experts checkpoints

Trained model checkpoints for robust transcription-factor binding-site (TFBS) prediction with a heterogeneous Mixture-of-Experts (MoE). A dense, soft MoE gates over per-expert embeddings from a heterogeneous expert zoo (modified-DeepBIND ConvNet + DeepSEA + DanQ, each probed to a common 32-dim embedding), which improves out-of-distribution (OOD) generalization to unseen transcription factors.

These weights back the paper "Robust Transcription Factor Binding Site Prediction and Explainability Using a Heterogeneous Mixture of Experts Architecture." Code, training and evaluation pipeline: https://github.com/lab-rasool/TFBS. Training/eval data: Lab-Rasool/ENCODE-TFBS.

Headline result (genomic, fair-negative protocol, 7 training factors)

Feeding the unchanged embedding-gating MoE a heterogeneous expert pool beats a fine-tuned DNABERT-6 baseline on the motif-bearing OOD strata, averaged over seeds 0/1/42:

Model	OOD AUC (mean ± std)
HetMoE (this work)	0.821 ± 0.005
DNABERT-6	0.799 ± 0.008

Margin +0.022. Per-seed: seed 42 → 0.827, seed 0 → 0.818, seed 1 → 0.819.
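The summary row can be re-derived from the per-seed values above. The short check below is a sketch that assumes the reported spread is the sample standard deviation (n - 1 in the denominator); the card does not say which estimator was used, but the sample form is the one that rounds to 0.005 for these three numbers.

```python
from statistics import mean, stdev

# Per-seed OOD AUCs quoted above, keyed by seed.
hetmoe_auc = {42: 0.827, 0: 0.818, 1: 0.819}
dnabert_mean = 0.799  # DNABERT-6 mean from the table

values = list(hetmoe_auc.values())
hetmoe_mean = mean(values)
hetmoe_std = stdev(values)  # sample standard deviation (assumed estimator)
margin = round(hetmoe_mean, 3) - dnabert_mean

print(f"HetMoE: {hetmoe_mean:.3f} ± {hetmoe_std:.3f}")  # HetMoE: 0.821 ± 0.005
print(f"margin: {margin:+.3f}")                          # margin: +0.022
```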

Repository contents

experts/<TF>.pth            7 ConvNet experts (modified DeepBIND), one per training TF
hyperparams/<TF>.pth        per-expert hyperparameters (reproduce training without Optuna)
moe/moe_model.pth           homogeneous ConvNet-only MoE gate (+ moe_model_config.pth)
zoo/seed{0,1,42}/           heterogeneous zoo probes — DeepSEA_<TF>.pth, DanQ_<TF>.pth
                            (E=32 FeatureProbeExpert heads over frozen DeepSEA/DanQ trunks)

The genomic HetMoE for a given seed is the 21-expert pool: the 7 experts/ ConvNets plus the 14 zoo/seed<N>/ DeepSEA + DanQ probes, with the MixtureOfExperts gate applied unchanged over the concatenated 32-dim embeddings. Only the three paper seeds (0, 1, 42) are published here.
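To make "dense, soft gating over concatenated 32-dim embeddings" concrete, here is a generic NumPy illustration. It is not the repository's MixtureOfExperts class: the single linear gate, the random stand-in inputs, and the choice to mix per-expert scores are all assumptions made for the sketch. The point it shows is only that every expert receives a non-zero softmax weight (dense, no top-k routing) and that the weights are computed from the concatenated embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, emb_dim, batch = 21, 32, 4  # 21-expert pool, 32-dim embeddings

# Random stand-ins for cached expert outputs (illustrative only).
embeddings = rng.normal(size=(batch, n_experts, emb_dim))
expert_scores = rng.uniform(size=(batch, n_experts))  # assumed per-expert predictions

# Hypothetical linear gate over the concatenated embeddings.
W = rng.normal(scale=0.01, size=(n_experts * emb_dim, n_experts))
b = np.zeros(n_experts)

def dense_soft_gate(embeddings, expert_scores, W, b):
    """Softmax-weight every expert (no top-k) and mix their scores."""
    flat = embeddings.reshape(embeddings.shape[0], -1)  # (batch, 21 * 32)
    logits = flat @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights * expert_scores).sum(axis=1), weights

pred, weights = dense_soft_gate(embeddings, expert_scores, W, b)
print(pred.shape, weights.shape)
```

Because the weights form a convex combination, each mixed prediction lies between the smallest and largest expert score for that input.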

Transcription factors. Training: ARID3A, FOXM1, GATA3, JUND, MAX, GABPA, SP1. OOD evaluation is stratified into within-family, cross-family (e.g. CTCF, STAT3), and cell-line-transfer strata, plus a separately reported non-motif appendix; see tfbs/constants.py in the code repo.
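The file layout and the training-factor list together determine which 21 checkpoints make up one seed's pool. The helper below is a hypothetical convenience, not part of the tfbs package: it copies the factor list from this card (the authoritative order lives in tfbs.constants.TRAIN_TFS), and the ConvNet, DeepSEA, DanQ grouping order is an assumption.

```python
from __future__ import annotations

# Copied from this card; the pinned order is tfbs.constants.TRAIN_TFS in the code repo.
TRAIN_TFS = ["ARID3A", "FOXM1", "GATA3", "JUND", "MAX", "GABPA", "SP1"]
PAPER_SEEDS = (0, 1, 42)

def hetmoe_pool_files(seed: int) -> list[str]:
    """Repo-relative paths of the 21 expert checkpoints for one seed (hypothetical helper)."""
    if seed not in PAPER_SEEDS:
        raise ValueError(f"only seeds {PAPER_SEEDS} are published, got {seed}")
    convnets = [f"experts/{tf}.pth" for tf in TRAIN_TFS]             # 7 ConvNet experts
    deepsea = [f"zoo/seed{seed}/DeepSEA_{tf}.pth" for tf in TRAIN_TFS]  # 7 DeepSEA probes
    danq = [f"zoo/seed{seed}/DanQ_{tf}.pth" for tf in TRAIN_TFS]        # 7 DanQ probes
    return convnets + deepsea + danq

files = hetmoe_pool_files(42)
print(len(files))  # 21
```

Each returned path can be passed as the filename argument of hf_hub_download.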

Usage

Install the tfbs package from the GitHub repo, then load checkpoints with the provided classes. Passing map_location="cpu" lets them load on nodes without a GPU:

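from huggingface_hub import hf_hub_download
import torch
from tfbs.models import ConvNet, MixtureOfExperts  # pip install -e . from github.com/lab-rasool/TFBS

# Download one ConvNet expert checkpoint (cached locally after the first call).
ckpt = hf_hub_download("Lab-Rasool/ENCODE-TFBS", "experts/GATA3.pth")
# weights_only=True restricts unpickling to tensors and plain containers.
state = torch.load(ckpt, map_location="cpu", weights_only=True)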

See the GitHub repo's README and experiments/hetmoe/ for the full caching + gating pipeline that rebuilds the heterogeneous MoE from these checkpoints. DNABERT-6 features are derived on the fly from zhihan1996/DNA_bert_6; no BERT weights are stored here.

Reproducibility

The ConvNet conv bias (wRect) is saved as an nn.Parameter and expert order is pinned to tfbs.constants.TRAIN_TFS, so re-running evaluation from these checkpoints gives byte-identical results on a given machine. Minor device-numerics differences may remain across machines.
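One simple way to check byte-identity between two evaluation runs is to hash the raw bytes of the predicted scores. The sketch below uses only the standard library; scores_digest is a hypothetical helper and the score values are made-up placeholders, not results from these checkpoints.

```python
import hashlib
from array import array

def scores_digest(scores) -> str:
    """SHA-256 over the raw float64 bytes of a score sequence (hypothetical helper)."""
    return hashlib.sha256(array("d", scores).tobytes()).hexdigest()

# Placeholder scores standing in for two runs on the same machine.
run_a = [0.8271, 0.1033, 0.5549]
run_b = [0.8271, 0.1033, 0.5549]
# A tiny numeric drift, as might appear on a different device.
run_c = [0.8271, 0.1033, 0.5549 + 1e-12]

same_machine_match = scores_digest(run_a) == scores_digest(run_b)
cross_machine_match = scores_digest(run_a) == scores_digest(run_c)
print(same_machine_match, cross_machine_match)  # True False
```

A digest mismatch only says that some byte differs; comparing the arrays with a tolerance is the natural follow-up when runs come from different hardware.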

License

cc-by-nd-4.0, matching the ENCODE-TFBS dataset. The underlying ENCODE data are from the ENCODE Project.