Ctrl-DNA × NTV3-100M-POST per-cell enhancer generators (T1)

Lagrangian primal-dual reinforcement-learning fine-tunes of InstaDeepAI's NTv3 backbones for cell-type-specific enhancer generation in a 7-cell brain panel (Excitatory, Inhibitory, OPC, Astrocyte, Oligodendrocyte, Microglia, Endothelial). One model is trained per target cell; the on-target oracle drives the reward, and the six off-target oracles drive the constraint terms. Training follows the Ctrl-DNA recipe (Chen et al., 2025; bowang-lab/Ctrl-DNA): 2000 RL steps per cell on a stratified 35,000-row T1 subset (train.enhancer_generation.strat7c.n35k).
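The reward/constraint split described above can be illustrated with a generic Lagrangian primal-dual update. This is a minimal sketch of the textbook form, not the exact Ctrl-DNA or lab-port implementation: the function names, the per-constraint thresholds, and the dual learning rate are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def lagrangian_reward(
    on_target: float,
    off_target: Sequence[float],
    lambdas: Sequence[float],
    thresholds: Sequence[float],
) -> float:
    """Shaped reward: on-target score minus lambda-weighted constraint terms.

    Each constraint is assumed to have the form off_target[i] <= thresholds[i].
    """
    penalty = sum(
        lam * (off - thr)
        for lam, off, thr in zip(lambdas, off_target, thresholds)
    )
    return on_target - penalty


def dual_ascent_step(
    lambdas: Sequence[float],
    off_target: Sequence[float],
    thresholds: Sequence[float],
    lr: float = 0.1,
) -> List[float]:
    """Raise a multiplier when its constraint is violated, lower it otherwise.

    Multipliers are clipped at zero so they stay valid dual variables.
    """
    return [
        max(0.0, lam + lr * (off - thr))
        for lam, off, thr in zip(lambdas, off_target, thresholds)
    ]


# Toy example with six off-target oracles (all values are made up).
lambdas = [0.0] * 6
thresholds = [0.0] * 6  # hypothetical constraint thresholds
off_scores = [0.5, -0.2, 0.0, 1.0, -1.0, 0.3]

shaped = lagrangian_reward(1.0, off_scores, lambdas, thresholds)
lambdas = dual_ascent_step(lambdas, off_scores, thresholds, lr=0.1)
```

In this form the policy (primal) step maximizes the shaped reward while the dual step adjusts each multiplier, so off-target oracles that exceed their threshold are penalized more heavily on later steps.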

Folder layout

Each cell type has its own subfolder; load the policy via regureasoner.baselines.ctrl_dna.policy.CtrlDNAPolicy.load(...) with the matching NTv3 backbone snapshot.

Ex/   best.pt             # full policy state dict (NTv3 + lambda + step)
      metrics.json        # 28-column eval against gold n7k held-out test
      samples_generated.jsonl  # ~64 generated enhancers from final policy
      training_log.jsonl  # per-step train metrics (loss / reward / lambda)
In/   ...
OPC/  ...
Ast/  ...
Oli/  ...
Mic/  ...
End/  ...
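The JSON artifacts in each subfolder can be inspected with the standard library alone. The sketch below assumes that metrics.json is a single JSON document and that the two .jsonl files hold one JSON object per line; the helper names (`read_jsonl`, `load_cell_artifacts`) are hypothetical, and the record keys are not assumed. Loading best.pt is not covered here, since it goes through CtrlDNAPolicy.load(...), whose arguments are not documented in this card.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Subfolder names as listed in the folder layout.
CELL_DIRS = ["Ex", "In", "OPC", "Ast", "Oli", "Mic", "End"]


def read_jsonl(path: Path) -> List[Dict[str, Any]]:
    """Parse a JSON Lines file into a list of records, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def load_cell_artifacts(root: Path, cell: str) -> Dict[str, Any]:
    """Load the non-checkpoint artifacts for one cell-type subfolder."""
    cell_dir = root / cell
    metrics_text = (cell_dir / "metrics.json").read_text(encoding="utf-8")
    return {
        "metrics": json.loads(metrics_text),
        "samples": read_jsonl(cell_dir / "samples_generated.jsonl"),
        "training_log": read_jsonl(cell_dir / "training_log.jsonl"),
    }
```

For example, `load_cell_artifacts(Path("."), "Ex")["training_log"]` would return the per-step records for the Excitatory model when run from the repository root.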

Headline eval (held-out gold n7k, oracle-based)

Higher argmax_acc and specificity are better; lower FID is better. The random argmax baseline is 0.143 (1/7 cell types).

cell  parse_rate  argmax_acc  on_target  off_target  specificity      FID
Ex         1.000       0.625      0.045     -20.283       20.329   22.946
In         1.000       0.625     -1.379     -15.172       13.793   48.750
OPC        1.000       0.000     -2.984      -3.627        0.644  342.561
Ast        1.000       0.109    -18.677     -15.324       -3.354   90.369
Oli        1.000       0.703      2.115      -2.494        4.609  433.018
Mic        1.000       0.000     -6.625     -10.303        3.677  210.108
End        1.000       0.562      2.611      -1.775        4.386  432.451
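The columns in the table above appear to be related by specificity ≈ on_target − off_target, up to rounding in the third decimal. That relationship is inferred from the reported numbers and is not stated in the evaluation code or this card, so the sketch below is only a consistency check under that assumption. It also reproduces the 1/7 random-argmax baseline.

```python
# (on_target, off_target, specificity) copied from the headline table.
rows = {
    "Ex": (0.045, -20.283, 20.329),
    "In": (-1.379, -15.172, 13.793),
    "OPC": (-2.984, -3.627, 0.644),
    "Ast": (-18.677, -15.324, -3.354),
    "Oli": (2.115, -2.494, 4.609),
    "Mic": (-6.625, -10.303, 3.677),
    "End": (2.611, -1.775, 4.386),
}


def specificity(on_target: float, off_target: float) -> float:
    """Assumed definition: on-target score minus off-target score."""
    return on_target - off_target


# Largest disagreement between the assumed formula and the reported column.
max_gap = max(
    abs(specificity(on, off) - reported)
    for on, off, reported in rows.values()
)

# Chance level for picking the right cell type out of seven.
random_baseline = 1 / len(rows)

print(f"max |recomputed - reported| = {max_gap:.4f}")
print(f"random argmax baseline = {random_baseline:.3f}")
```

Under this reading, Ast is the only model with negative specificity, meaning its on-target oracle score is lower than its off-target score.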

Training recipe (verbatim)

# Per-cell run (one of seven, NTv3-650M shown — 100M variant
# uses NTV3_SNAPSHOT_PATH=/extra/.../ntv3_local/100m_post)
TARGET_CELL=Ex N_RL_STEPS=2000 BATCH_SIZE=8 \
  NTV3_SNAPSHOT_PATH=/extra/zhanglab0/INDV/pengchx3/ntv3_local/generative \
  sbatch slurm/run_ctrl_dna_t1.sh

See the upstream Ctrl-DNA repository for the algorithmic background and the lab port at explcre/biomodel_reasoning_calling_study2 for our extension (NTv3 backbone, oracle reward, 7-cell panel).

Citation

@misc{dnathinker_ctrl_dna_per_cell_2026,
  title = {Per-cell Ctrl-DNA enhancer generators on the NTv3 backbone},
  author = {Xu, Pengcheng and the DNAThinker team},
  year = {2026},
  howpublished = {\url{https://huggingface.co/explcre/ctrl_dna_*_per_cell_t1}}
}