🧬 NTv3: Foundation Models for Long-Range Genomics

This Space is the companion hub for NTv3 models: runnable notebooks for inference, fine-tuning, interpretation, and sequence generation.

🤖 Foundation Models 🧬 Long-context genomics 🌍 Multi-species ⚡ Inference • Fine-tune • Interpret • Generate 📓 Torch notebooks

📖 About NTv3

NTv3 is a multi-species genomic foundation model family that unifies representation learning, functional-track prediction, genome annotation, and controllable sequence generation within a single U-Net-style backbone. It models up to 1 Mb of DNA at single-base resolution, using a conv–Transformer–deconv architecture that efficiently captures both local motifs and long-range regulatory dependencies. NTv3 is first pretrained with masked language modeling on ~9T base pairs from the OpenGenome2 corpus, which spans >128k species. It is then post-trained with a joint objective on ~16k functional tracks and annotation labels across 24 animal and plant species, enabling state-of-the-art cross-species functional prediction and base-resolution genome annotation.
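To see why a conv–Transformer–deconv U-Net makes megabase inputs tractable, here is a small piece of length arithmetic. It is a sketch under a stated assumption: each convolutional stage is taken to halve the sequence length, with the deconvolution path mirroring it. The actual number of stages in NTv3 is not given here, so `num_stages` and both helper functions are hypothetical.

```python
from __future__ import annotations


def unet_lengths(seq_len: int, num_stages: int) -> list[int]:
    """Token count at each resolution level of a U-Net whose encoder halves the
    sequence length at every stage (illustrative assumption, not NTv3's exact config).

    Index 0 is base resolution; the last entry is the bottleneck seen by the Transformer.
    The decoder (deconvolution) path would walk the same list in reverse.
    """
    factor = 2 ** num_stages
    if seq_len % factor != 0:
        raise ValueError(f"seq_len must be divisible by 2**num_stages = {factor}")
    return [seq_len // 2 ** k for k in range(num_stages + 1)]


def attention_pairs(length: int) -> int:
    """Query-key pairs scored by full self-attention over `length` tokens."""
    return length * length


if __name__ == "__main__":
    seq_len = 2 ** 20   # 1,048,576 bp, roughly 1 Mb at single-base resolution
    num_stages = 7      # hypothetical number of halving stages
    lengths = unet_lengths(seq_len, num_stages)
    bottleneck = lengths[-1]
    print("tokens per level:", lengths)
    print("bottleneck tokens:", bottleneck)  # 8192
    # How much cheaper full attention is at the bottleneck than at base resolution.
    print("attention cost ratio:", attention_pairs(seq_len) // attention_pairs(bottleneck))  # 16384
```

With these assumed numbers, the Transformer attends over 8,192 tokens instead of about a million, a 2**14-fold reduction in scored pairs. In a typical U-Net, the deconvolution path and skip connections then restore single-base output resolution.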

Beyond prediction, NTv3 can be fine-tuned into a controllable generative model via masked-diffusion language modeling, allowing targeted design of regulatory sequences (for example, enhancers with specified activity and promoter selectivity) that have been validated experimentally.
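Masked-diffusion language modeling generalizes the masking used in MLM pretraining. During training, a noise level t is drawn per sequence and each token is masked with probability t. At generation time, the model starts from a fully masked sequence and unmasks positions over several steps. The sketch below illustrates both halves with a toy, hypothetical `toy_predict` function standing in for the network. The 4-letter alphabet, the `MASK` symbol, and the even unmasking schedule are illustrative assumptions, not NTv3's actual tokenizer or sampler. Common masked-diffusion formulations also reweight the loss on masked positions by a function of t; that weighting is omitted here.

```python
import random
from typing import Callable, List, Tuple

BASES = "ACGT"  # illustrative alphabet, not NTv3's actual 11-token vocabulary
MASK = "?"      # hypothetical mask symbol


def forward_mask(seq: str, t: float, rng: random.Random) -> Tuple[str, List[int]]:
    """Training-time corruption: mask each position independently with probability t.

    Returns the corrupted sequence and the masked positions, which are the
    positions a denoising model would be trained to reconstruct.
    """
    chars = list(seq)
    masked = [i for i in range(len(chars)) if rng.random() < t]
    for i in masked:
        chars[i] = MASK
    return "".join(chars), masked


def sample(length: int, steps: int, predict: Callable[[str, int], str],
           rng: random.Random) -> str:
    """Generation: start fully masked and unmask a share of the remaining
    positions at each step, so every position is filled after `steps` steps."""
    chars = [MASK] * length
    for step in range(steps):
        still_masked = [i for i, c in enumerate(chars) if c == MASK]
        n_reveal = len(still_masked) // (steps - step)  # last step reveals the rest
        context = "".join(chars)  # positions in one step are predicted from the same state
        for i in rng.sample(still_masked, n_reveal):
            chars[i] = predict(context, i)
    return "".join(chars)


def toy_predict(context: str, position: int) -> str:
    """Stand-in for the network. A real model would sample a base from its
    predicted distribution at `position`, conditioned on `context`."""
    return BASES[position % len(BASES)]


if __name__ == "__main__":
    rng = random.Random(0)
    t = rng.random()  # per-sequence noise level, t ~ Uniform(0, 1)
    corrupted, positions = forward_mask("ACGTACGTAC", t, rng)
    print(f"t={t:.2f}", corrupted, positions)
    print(sample(12, steps=4, predict=toy_predict, rng=rng))  # ACGTACGTACGT
```

A standard MLM objective roughly corresponds to holding t at a fixed ratio instead of sampling it. Conditioning on targets such as enhancer activity would enter through the predictor, which this toy does not model.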

🤖 Models (see collection)

📓 Notebooks (browse folder)

💻 Model usage

Here is a quick example of how to use the post-trained NTv3 650M model on a human genomic window.

from transformers import AutoConfig

model_name = "InstaDeepAI/NTv3_650M"

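# Load the config (trust_remote_code pulls in the model's custom code) and build the track prediction pipeline.
# force_download=True re-fetches the files even if they are already cached; drop it to reuse the cache.
cfg = AutoConfig.from_pretrained(model_name, trust_remote_code=True, force_download=True)
pipe = cfg.load_tracks_pipeline(model_name, device="auto")  # or "cpu"/"cuda"/"mps"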

# Run track prediction on a 131,072 bp (2**17) window of chr19
out = pipe(
    {
        "chrom": "chr19",
        "start": 6_700_000,
        "end": 6_831_072,
        "species": "human"
    }
)

print(out.bigwig_tracks_logits.shape)   # functional track predictions
print(out.bed_tracks_logits.shape)      # genome annotation predictions
print(out.mlm_logits.shape)             # MLM logits: (B, L, V = 11)
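The request above spans chr19:6,700,000-6,831,072, exactly 131,072 bp (2**17). The sketch below builds requests of the same shape centered on a position of interest, and turns MLM logits into per-position probabilities. `window_request` and `mlm_probs` are hypothetical helpers, not part of the NTv3 API. The sketch assumes 0-based, end-exclusive coordinates and that keeping the example's 131,072 bp length is appropriate. It also assumes the logits have been converted to a NumPy array; for a torch tensor, `.detach().float().cpu().numpy()` does this. Mapping argmax ids back to nucleotides needs the model's tokenizer vocabulary, which is not shown here.

```python
from __future__ import annotations

import numpy as np

WINDOW = 131_072  # 2**17 bp, the window length used in the example above


def window_request(chrom: str, center: int, species: str = "human",
                   length: int = WINDOW) -> dict:
    """Build a request dict shaped like the pipeline input above, centered on
    `center`. Assumes 0-based, end-exclusive coordinates; clips at the
    chromosome start but not at its end (that needs the chromosome length)."""
    start = max(0, center - length // 2)
    return {"chrom": chrom, "start": start, "end": start + length, "species": species}


def mlm_probs(logits: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Softmax over the last (vocabulary) axis of (B, L, V) logits.

    Returns probabilities with shape (B, L, V) and argmax token ids with shape (B, L).
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    return probs, probs.argmax(axis=-1)


if __name__ == "__main__":
    print(window_request("chr19", 6_765_536))  # reproduces the chr19 window in the example
    # With the real pipeline output (assuming a torch tensor):
    #   logits = out.mlm_logits.detach().float().cpu().numpy()
    dummy = np.random.default_rng(0).normal(size=(1, 4, 11))  # stand-in for (B, L, V = 11)
    probs, ids = mlm_probs(dummy)
    print(probs.shape, ids.shape)  # (1, 4, 11) (1, 4)
```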

🔗 Links

📄 A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction

NTv3 Paper Summary