	---
	license: cc-by-nd-4.0
	tags:
	- TFBS
	- transcription-factor
	- genomics
	- mixture-of-experts
	- DNA
	library_name: pytorch
	pretty_name: ENCODE-TFBS Heterogeneous Mixture-of-Experts checkpoints
	datasets:
	- Lab-Rasool/ENCODE-TFBS
	---

	# ENCODE-TFBS — Heterogeneous Mixture-of-Experts checkpoints

	Trained model checkpoints for **robust transcription-factor binding-site (TFBS) prediction with a
	heterogeneous Mixture-of-Experts (MoE)**. A dense, soft MoE gates over *per-expert embeddings* from a
	heterogeneous expert zoo (modified-DeepBIND ConvNet + DeepSEA + DanQ, each probed to a
	common 32-dim embedding), which improves out-of-distribution (OOD) generalization to unseen
	transcription factors.

	These weights back the paper *"Robust Transcription Factor Binding Site Prediction and Explainability
	Using a Heterogeneous Mixture of Experts Architecture."* Code, training and evaluation pipeline:
	https://github.com/lab-rasool/TFBS. Training/eval data: [Lab-Rasool/ENCODE-TFBS](https://huggingface.co/datasets/Lab-Rasool/ENCODE-TFBS).

	## Headline result (genomic, fair-negative protocol, 7 training factors)

	Feeding the unchanged embedding-gating MoE a heterogeneous expert pool beats a fine-tuned DNABERT-6
	baseline on the motif-bearing OOD strata, averaged over seeds 0/1/42:

	| Model | OOD AUC (mean ± std) |
	|---|---|
	| HetMoE (this work) | 0.821 ± 0.005 |
	| DNABERT-6 | 0.799 ± 0.008 |

	Margin +0.022. Per-seed: seed 42 → 0.827, seed 0 → 0.818, seed 1 → 0.819.
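
The headline numbers can be re-derived from the per-seed values above. The sketch below is only an arithmetic check; it assumes the reported spread is the sample standard deviation (`ddof=1`), which is the convention that reproduces the 0.005 in the table. The variable names are illustrative.

```python
from statistics import mean, stdev

# Per-seed OOD AUCs for HetMoE, copied from the line above.
hetmoe_auc = {42: 0.827, 0: 0.818, 1: 0.819}
dnabert_mean = 0.799  # DNABERT-6 mean from the table

values = list(hetmoe_auc.values())
hetmoe_mean = mean(values)
hetmoe_std = stdev(values)  # sample std (ddof=1); an assumed convention

# The margin is taken between the rounded means, as in the table.
margin = round(hetmoe_mean, 3) - dnabert_mean

print(f"HetMoE: {hetmoe_mean:.3f} ± {hetmoe_std:.3f}")  # HetMoE: 0.821 ± 0.005
print(f"Margin: {margin:+.3f}")                          # Margin: +0.022
```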

	## Repository contents

	```
	experts/<TF>.pth        7 ConvNet experts (modified DeepBIND), one per training TF
	hyperparams/<TF>.pth    per-expert hyperparameters (reproduce training without Optuna)
	moe/moe_model.pth       homogeneous ConvNet-only MoE gate (+ moe_model_config.pth)
	zoo/seed{0,1,42}/       heterogeneous zoo probes: DeepSEA_<TF>.pth, DanQ_<TF>.pth
	                        (E=32 FeatureProbeExpert heads over frozen DeepSEA/DanQ trunks)
	```

	The genomic HetMoE for a given seed is the 21-expert pool: the 7 `experts/` ConvNets plus the
	14 `zoo/seed<N>/` DeepSEA + DanQ probes, with the `MixtureOfExperts` gate applied unchanged over the
	concatenated 32-dim embeddings. Only the three paper seeds (0, 1, 42) are published here.
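
As a rough illustration of dense, soft gating over concatenated per-expert embeddings, here is a minimal PyTorch sketch. It is not the repo's `MixtureOfExperts`: the class name `SoftEmbeddingGate`, the single linear gate, the choice to mix embeddings (rather than per-expert logits), and the linear output head are all assumptions made for this example. Only the 21 experts and the 32-dim embeddings come from the description above.

```python
import torch
from torch import nn


class SoftEmbeddingGate(nn.Module):
    """Hypothetical dense soft gate; NOT the repo's MixtureOfExperts."""

    def __init__(self, num_experts: int, embed_dim: int = 32) -> None:
        super().__init__()
        # The gate sees every expert's embedding, concatenated into one vector.
        self.gate = nn.Linear(num_experts * embed_dim, num_experts)
        # Assumed binary (bound / not bound) head on the mixed embedding.
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, embeddings: torch.Tensor):
        # embeddings: (batch, num_experts, embed_dim)
        batch = embeddings.shape[0]
        gate_logits = self.gate(embeddings.reshape(batch, -1))
        # "Dense" and "soft": every expert gets a non-zero weight, no top-k.
        weights = torch.softmax(gate_logits, dim=-1)
        mixed = (weights.unsqueeze(-1) * embeddings).sum(dim=1)
        return self.head(mixed).squeeze(-1), weights


torch.manual_seed(0)
moe = SoftEmbeddingGate(num_experts=21)          # 7 ConvNet + 14 zoo probes
logits, weights = moe(torch.randn(4, 21, 32))    # random stand-in embeddings
print(logits.shape, weights.shape)
```

Because the gate only consumes embeddings, the expert pool can grow from 7 to 21 without changing the gating code, provided every expert emits the same embedding width.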

	Transcription factors. Training: `ARID3A, FOXM1, GATA3, JUND, MAX, GABPA, SP1`. OOD evaluation is
	stratified into within-family, cross-family (e.g. CTCF, STAT3), cell-line-transfer, and a separately
	reported non-motif appendix — see `tfbs/constants.py` in the code repo.
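
Combining the repository layout with the training-TF list, the files behind one seed's 21-expert pool can be enumerated as below. This is a sketch: the helper `pool_files` is hypothetical, and the ordering (ConvNets, then DeepSEA, then DanQ) is an assumption rather than the order the code repo necessarily uses.

```python
# Training TFs as listed in this card.
TRAIN_TFS = ["ARID3A", "FOXM1", "GATA3", "JUND", "MAX", "GABPA", "SP1"]
PAPER_SEEDS = (0, 1, 42)  # the only seeds published in this repo


def pool_files(seed: int) -> list[str]:
    """Return the 21 checkpoint paths for one seed (hypothetical helper)."""
    if seed not in PAPER_SEEDS:
        raise ValueError(f"seed {seed} is not published; choose from {PAPER_SEEDS}")
    # 7 seed-independent ConvNet experts.
    convnets = [f"experts/{tf}.pth" for tf in TRAIN_TFS]
    # 14 seed-specific probes: one DeepSEA and one DanQ head per training TF.
    probes = [
        f"zoo/seed{seed}/{trunk}_{tf}.pth"
        for trunk in ("DeepSEA", "DanQ")
        for tf in TRAIN_TFS
    ]
    return convnets + probes


files = pool_files(42)
print(len(files))  # 21
```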

	## Usage

	Install the `tfbs` package and load with the provided classes (`map_location` handles CPU-only nodes):

	```python
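	from huggingface_hub import hf_hub_download
	import torch
	from tfbs.models import ConvNet, MixtureOfExperts  # pip install -e . from github.com/lab-rasool/TFBS

	# Download one expert checkpoint (cached locally after the first call).
	ckpt = hf_hub_download("Lab-Rasool/ENCODE-TFBS", "experts/GATA3.pth")
	# weights_only=True restricts unpickling to tensors and plain containers.
	state = torch.load(ckpt, map_location="cpu", weights_only=True)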
	```
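
Before passing a loaded checkpoint to a model's `load_state_dict`, it can help to inspect what it contains. The sketch below assumes the checkpoint is a flat state dict mapping parameter names to tensors, which this card does not confirm. `summarize_state_dict` is a hypothetical helper, and an `nn.Linear` stands in for a real checkpoint so the example runs offline.

```python
import torch
from torch import nn


def summarize_state_dict(state):
    """List (name, shape, element count) per tensor, plus the total count."""
    rows = [(name, tuple(t.shape), t.numel()) for name, t in state.items()]
    total = sum(count for _, _, count in rows)
    return rows, total


# Stand-in for `state` from torch.load(...); NOT a real TFBS checkpoint.
state = nn.Linear(32, 1).state_dict()

rows, total = summarize_state_dict(state)
for name, shape, count in rows:
    print(f"{name:10s} {str(shape):10s} {count}")
print("total parameters:", total)  # total parameters: 33
```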

	See the GitHub repo's README and `experiments/hetmoe/` for the full caching + gating pipeline that
	rebuilds the heterogeneous MoE from these checkpoints. DNABERT-6 features are derived on the fly from
	`zhihan1996/DNA_bert_6`; no BERT weights are stored here.

	## Reproducibility

	The ConvNet conv bias (`wRect`) is a saved `nn.Parameter` and expert order is pinned to
	`tfbs.constants.TRAIN_TFS`, so re-running evaluation from these checkpoints is byte-identical on a
	given machine (minor device-numerics differences may remain across machines).
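
Pinning the expert order matters because the gate's input is a concatenation: feeding the same embeddings in a different order gives a different input vector. The sketch below shows one way to order checkpoint paths by a pinned list instead of by directory or alphabetical order. The tuple here is assumed to mirror `tfbs.constants.TRAIN_TFS` (it follows the order given in this card), and `pinned_order` is a hypothetical helper.

```python
from pathlib import Path

# Assumed to mirror tfbs.constants.TRAIN_TFS; order taken from this card.
TRAIN_TFS = ("ARID3A", "FOXM1", "GATA3", "JUND", "MAX", "GABPA", "SP1")


def pinned_order(paths):
    """Reorder checkpoint paths to follow TRAIN_TFS, failing on gaps."""
    by_tf = {Path(p).stem: p for p in paths}
    missing = [tf for tf in TRAIN_TFS if tf not in by_tf]
    if missing:
        raise FileNotFoundError(f"missing expert checkpoints: {missing}")
    return [by_tf[tf] for tf in TRAIN_TFS]


# Alphabetical order (as a sorted directory listing would give) differs
# from the pinned order: GABPA sorts before GATA3.
listing = sorted(f"experts/{tf}.pth" for tf in TRAIN_TFS)
ordered = pinned_order(listing)
print(listing == ordered)  # False
print([Path(p).stem for p in ordered])
```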

	## License

	`cc-by-nd-4.0`, matching the [ENCODE-TFBS dataset](https://huggingface.co/datasets/Lab-Rasool/ENCODE-TFBS).
	The underlying ENCODE data are from the ENCODE Project.