Ctrl-DNA × NTv3-650M per-cell enhancer generators (T1)

These models are Lagrangian primal-dual reinforcement-learning fine-tunes of InstaDeepAI's NTv3 backbones for cell-type-specific enhancer generation in a 7-cell brain panel (Excitatory, Inhibitory, OPC, Astrocyte, Oligodendrocyte, Microglia, Endothelial). One model is trained per target cell; the on-target oracle drives the reward, and the six off-target oracles drive the constraint terms. Following the Ctrl-DNA recipe (Chen et al., 2025; bowang-lab/Ctrl-DNA), each model is trained for 2000 RL steps on a stratified 35,000-row T1 subset (train.enhancer_generation.strat7c.n35k).
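As a rough illustration of the reward/constraint split described above, here is a minimal sketch of a generic Lagrangian primal-dual step. It is not taken from the Ctrl-DNA code: the function names, the per-constraint `thresholds`, and the dual learning rate `lr` are assumptions made for illustration, and the actual recipe may define its constraints differently.

```python
from typing import List, Sequence


def lagrangian_reward(
    on_target: float,
    off_targets: Sequence[float],
    lambdas: Sequence[float],
    thresholds: Sequence[float],
) -> float:
    """On-target score minus multiplier-weighted off-target constraint terms.

    Each constraint is assumed (hypothetically) to have the form
    off_target_i <= threshold_i.
    """
    penalty = sum(
        lam * (off - thr)
        for lam, off, thr in zip(lambdas, off_targets, thresholds)
    )
    return on_target - penalty


def dual_update(
    lambdas: Sequence[float],
    off_targets: Sequence[float],
    thresholds: Sequence[float],
    lr: float = 0.01,
) -> List[float]:
    """Dual ascent: raise lambda_i when constraint i is violated, clip at zero."""
    return [
        max(0.0, lam + lr * (off - thr))
        for lam, off, thr in zip(lambdas, off_targets, thresholds)
    ]


# Six off-target oracles, matching the 7-cell panel (one target, six others).
# All numbers here are made-up inputs, not values from the released models.
lambdas = [0.0] * 6
off_scores = [0.5] * 6
limits = [0.0] * 6

shaped = lagrangian_reward(1.0, off_scores, lambdas, limits)
lambdas = dual_update(lambdas, off_scores, limits, lr=0.1)
```

With all multipliers at zero the shaped reward equals the on-target score; each violated constraint then pushes its multiplier up, so later policy updates are penalized for off-target activity.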

Folder layout

Each cell type has its own subfolder; load the policy via regureasoner.baselines.ctrl_dna.policy.CtrlDNAPolicy.load(...) with the matching NTv3 backbone snapshot.

Ex/   best.pt             # full policy state dict (NTv3 + lambda + step)
      metrics.json        # 28-column eval against gold n7k held-out test
      samples_generated.jsonl  # ~64 generated enhancers from final policy
      training_log.jsonl  # per-step train metrics (loss / reward / lambda)
In/   ...
OPC/  ...
Ast/  ...
Oli/  ...
Mic/  ...
End/  ...
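The JSON and JSONL artifacts in each subfolder can be inspected without loading the policy. The sketch below uses only the Python standard library; `read_jsonl` and `load_cell_artifacts` are hypothetical helper names, and nothing is assumed about the keys inside each record beyond the files being valid JSON / JSON Lines.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

PathLike = Union[str, Path]


def read_jsonl(path: PathLike) -> List[Any]:
    """Parse a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with Path(path).open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_cell_artifacts(root: PathLike, cell: str) -> Dict[str, Any]:
    """Load the non-checkpoint files for one cell-type subfolder (e.g. "Ex")."""
    cell_dir = Path(root) / cell
    with (cell_dir / "metrics.json").open("r", encoding="utf-8") as fh:
        metrics = json.load(fh)
    return {
        "metrics": metrics,
        "samples": read_jsonl(cell_dir / "samples_generated.jsonl"),
        "training_log": read_jsonl(cell_dir / "training_log.jsonl"),
    }
```

For example, `load_cell_artifacts(".", "Ex")["training_log"]` would return the per-step training records for the Excitatory model, assuming the repository has been downloaded into the current directory.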

Headline eval (held-out gold n7k, oracle-based)

Higher argmax_acc and specificity are better; lower FID is better. The random argmax baseline is 0.143 (1/7 cells).

cell  parse_rate  argmax_acc  on_target  off_target  specificity      FID
Ex         1.000       0.609      0.026     -21.278       21.304   28.729
In         1.000       0.156    -15.064     -18.658        3.594   48.876
OPC        1.000       0.078    -21.557     -18.071       -3.486   87.056
Ast        1.000       0.016    -24.481     -17.239       -7.241   99.053
Oli        1.000       0.234    -10.657     -18.704        8.047   40.697
Mic        1.000       0.250     -9.934     -19.493        9.559   40.670
End        1.000       0.016    -35.284     -15.215      -20.069  279.892
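The table values are consistent with specificity being on_target minus off_target, and only some models beat the 1/7 random argmax baseline. The sketch below checks both points against the numbers copied from the table; the relationship specificity = on_target - off_target is inferred from those numbers rather than stated in the card, so treat it as an assumption.

```python
RANDOM_BASELINE = 1 / 7  # seven cell types, uniform guess

# cell: (argmax_acc, on_target, off_target, reported specificity)
ROWS = {
    "Ex": (0.609, 0.026, -21.278, 21.304),
    "In": (0.156, -15.064, -18.658, 3.594),
    "OPC": (0.078, -21.557, -18.071, -3.486),
    "Ast": (0.016, -24.481, -17.239, -7.241),
    "Oli": (0.234, -10.657, -18.704, 8.047),
    "Mic": (0.250, -9.934, -19.493, 9.559),
    "End": (0.016, -35.284, -15.215, -20.069),
}

# Largest gap between the reported specificity and on_target - off_target.
max_gap = max(
    abs((on - off) - spec) for _, on, off, spec in ROWS.values()
)

# Cells whose argmax accuracy exceeds the random baseline.
above_baseline = [
    cell for cell, (acc, _, _, _) in ROWS.items() if acc > RANDOM_BASELINE
]

print(f"max |(on - off) - specificity| = {max_gap:.4f}")
print("above random baseline:", above_baseline)
```

The gap stays within rounding error of the three-decimal table entries, and the models for Ex, In, Oli, and Mic exceed the baseline while those for OPC, Ast, and End do not.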

Training recipe (verbatim)

# Per-cell run (one of seven, NTv3-650M shown — 100M variant
# uses NTV3_SNAPSHOT_PATH=/extra/.../ntv3_local/100m_post)
TARGET_CELL=Ex N_RL_STEPS=2000 BATCH_SIZE=8 \
  NTV3_SNAPSHOT_PATH=/extra/zhanglab0/INDV/pengchx3/ntv3_local/generative \
  sbatch slurm/run_ctrl_dna_t1.sh
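# The command above launches one of the seven runs. A minimal Python sketch for
# enumerating all seven follows. It assumes that TARGET_CELL accepts the same
# short names as the subfolders (only Ex is confirmed by the recipe) and that
# the script reads its settings from the environment; build_jobs and submit are
# hypothetical helpers, and submit only prints unless dry_run=False.

```python
import os
import subprocess
from typing import Dict, List, Tuple

# Assumed TARGET_CELL values, taken from the subfolder names.
CELLS = ["Ex", "In", "OPC", "Ast", "Oli", "Mic", "End"]

Job = Tuple[Dict[str, str], List[str]]


def build_jobs(
    snapshot_path: str, n_rl_steps: int = 2000, batch_size: int = 8
) -> List[Job]:
    """Return (environment overrides, command) pairs, one per target cell."""
    jobs = []
    for cell in CELLS:
        env = {
            "TARGET_CELL": cell,
            "N_RL_STEPS": str(n_rl_steps),
            "BATCH_SIZE": str(batch_size),
            "NTV3_SNAPSHOT_PATH": snapshot_path,
        }
        jobs.append((env, ["sbatch", "slurm/run_ctrl_dna_t1.sh"]))
    return jobs


def submit(jobs: List[Job], dry_run: bool = True) -> None:
    """Print each job, or submit it with sbatch when dry_run is False."""
    for env, cmd in jobs:
        if dry_run:
            prefix = " ".join(f"{k}={v}" for k, v in env.items())
            print(prefix, " ".join(cmd))
        else:
            subprocess.run(cmd, env={**os.environ, **env}, check=True)


if __name__ == "__main__":
    # Dry run: prints seven command lines without contacting the scheduler.
    submit(
        build_jobs("/extra/zhanglab0/INDV/pengchx3/ntv3_local/generative")
    )
```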

See the upstream Ctrl-DNA repository for the algorithmic background and the lab port at explcre/biomodel_reasoning_calling_study2 for our extension (NTv3 backbone, oracle reward, 7-cell panel).

Citation

@misc{dnathinker_ctrl_dna_per_cell_2026,
  title = {Per-cell Ctrl-DNA enhancer generators on the NTv3 backbone},
  author = {Xu, Pengcheng and the DNAThinker team},
  year = {2026},
  howpublished = {\url{https://huggingface.co/explcre/ctrl_dna_*_per_cell_t1}}
}