---
title: NTv3 — Foundation Models for Long-Range Genomics
emoji: 🧬
colorFrom: indigo
colorTo: blue
sdk: static
pinned: false
---

NTv3 — Foundation Models for Long-Range Genomics

This Space is the companion hub for NTv3 checkpoints on the Hugging Face Hub. It provides PyTorch notebooks and minimal examples for inference, sequence-to-function prediction (functional tracks), genome annotation, fine-tuning, model interpretation and sequence generation.

Notebooks

Notebooks live in ./notebooks/:

  • 00_quickstart_inference.ipynb — load a checkpoint + run inference
  • 01_tracks_prediction.ipynb — sequence → functional tracks (+ plotting)
  • 02_genome_annotation_segmentation.ipynb — sequence → annotation
  • 03_finetune_head.ipynb — fine-tune on a bigWig track
  • 04_model_interpretation.ipynb — interpretation of a post-trained model
  • 05_sequence_generation.ipynb — fine-tune NTv3 to generate enhancer sequences
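Fine-tuning on a bigWig track usually requires per-base signal values to be summarized at the resolution the prediction head outputs. The sketch below shows one common choice, mean pooling into fixed-width, non-overlapping bins. The `bin_signal` helper, the mean-pooling choice and the `bin_size` value are assumptions for illustration. Reading the bigWig file itself (for example with pyBigWig) is out of scope here, and `03_finetune_head.ipynb` may preprocess targets differently.

```python
from typing import List, Sequence


def bin_signal(values: Sequence[float], bin_size: int) -> List[float]:
    """Hypothetical helper: average a per-base signal into consecutive bins.

    Trailing bases that do not fill a complete bin are dropped.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    n_bins = len(values) // bin_size
    return [
        sum(values[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


# Toy per-base coverage (made-up values, not real bigWig data)
coverage = [0.0, 2.0, 4.0, 6.0, 1.0, 1.0, 3.0, 3.0, 9.0]
print(bin_signal(coverage, bin_size=4))  # [3.0, 2.0]
```

Dropping the incomplete trailing bin keeps every target value an average over the same number of bases. Padding the input instead would be an equally valid choice.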

Install

```bash
pip install torch transformers accelerate safetensors huggingface_hub numpy
```
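To confirm the install, a small standard-library check can report which of these packages Python cannot find. The `missing_packages` helper is hypothetical. It only checks that each module is importable; it does not verify versions.

```python
import importlib.util
from typing import List

# Import names of the packages from the pip install line above
REQUIRED = ["torch", "transformers", "accelerate", "safetensors", "huggingface_hub", "numpy"]


def missing_packages(names: List[str]) -> List[str]:
    """Return the module names that cannot be found (without importing them)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```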

Load a model (TODO)
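Until this section is written, the sketch below shows one plausible way to load a checkpoint with the generic `AutoModel` and `AutoTokenizer` classes from transformers. It assumes that the NTv3 checkpoints expose these auto classes through their remote code and ship a tokenizer; neither is confirmed here. `load_ntv3` is a hypothetical helper. `00_quickstart_inference.ipynb` remains the reference for loading.

```python
def load_ntv3(model_id: str = "InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb"):
    """Hypothetical helper: load an NTv3 tokenizer and model onto the best available device."""
    # Local imports so that defining this function does not require the dependencies
    import torch
    from transformers import AutoModel, AutoTokenizer

    # ASSUMPTION: the checkpoint's remote code supports AutoTokenizer/AutoModel
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()  # eval() disables dropout for inference
    return tokenizer, model, device
```

Usage: `tokenizer, model, device = load_ntv3()` downloads the weights on the first call and reuses the local Hugging Face cache afterwards.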


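Pipelines (TODO)

```python
import torch
from transformers import pipeline

# "ntv3-tracks" is not a built-in transformers task; it is expected to come
# from the model repository's custom code, hence trust_remote_code=True.
pipe = pipeline(
    task="ntv3-tracks",
    model="InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb",
    trust_remote_code=True,
    device="cuda" if torch.cuda.is_available() else "cpu",  # fall back to CPU without a GPU
    torch_dtype=torch.bfloat16,  # bfloat16 weights reduce memory use
)

# Predict functional tracks for a DNA sequence (replace "ACGT..." with your input)
out = pipe("ACGT...")
```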
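Before calling `pipe(...)`, it may help to normalize the input sequence. The helper below is a hypothetical sketch and not part of the NTv3 API. It uppercases the sequence, rejects characters outside `ACGTN`, and right-pads with `N`. The padding target rests on an assumption: that `7downsample` in the checkpoint name means seven 2x downsampling stages, so the input length must be a multiple of 2^7 = 128. Check the model card or config before relying on this, since the pipeline may already handle padding internally.

```python
VALID_BASES = set("ACGTN")
# ASSUMPTION: "7downsample" is read as seven 2x downsampling stages,
# so the sequence length must be a multiple of 2**7 = 128.
NUM_DOWNSAMPLES = 7


def prepare_sequence(seq: str, num_downsamples: int = NUM_DOWNSAMPLES, pad_char: str = "N") -> str:
    """Hypothetical helper: uppercase, validate and right-pad a DNA sequence."""
    seq = seq.strip().upper()
    invalid = set(seq) - VALID_BASES
    if invalid:
        raise ValueError(f"Unexpected characters in sequence: {sorted(invalid)}")
    multiple = 2 ** num_downsamples
    remainder = len(seq) % multiple
    if remainder:
        seq += pad_char * (multiple - remainder)  # pad up to the next multiple
    return seq


padded = prepare_sequence("acgtacgtnn")
print(len(padded), padded[:10])  # 128 ACGTACGTNN
```

Padding on the right keeps the original bases at their positions, so predicted tracks for the first `len(seq)` bases can be read off directly. Predictions over the padded tail should be discarded.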

Checkpoints

Pre-trained:

  • InstaDeepAI/ntv3_8M_7downsample_pretrained_le_1mb
  • InstaDeepAI/ntv3_106M_7downsample_pretrained_le_1mb
  • InstaDeepAI/ntv3_650M_7downsample_pretrained_le_1mb

Post-trained:

  • InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb
  • InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb
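The checkpoint IDs share a common pattern: parameter count, number of downsampling stages, training stage, and a context suffix. The parser below is a sketch inferred only from the IDs listed here, not an official naming specification. The exact meaning of the `le_1mb` suffix versus `1mb` is not spelled out in this README, so the parser treats it as an opaque string. `describe_checkpoint` is a hypothetical helper.

```python
import re
from typing import Dict

# Pattern inferred from the checkpoint IDs listed above (not an official spec)
CHECKPOINT_RE = re.compile(
    r"^InstaDeepAI/ntv3_(?P<size>\d+M)_(?P<downsamples>\d+)downsample_"
    r"(?P<stage>pretrained|post_trained)_(?P<context>(?:le_)?1mb)$"
)

CHECKPOINTS = [
    "InstaDeepAI/ntv3_8M_7downsample_pretrained_le_1mb",
    "InstaDeepAI/ntv3_106M_7downsample_pretrained_le_1mb",
    "InstaDeepAI/ntv3_650M_7downsample_pretrained_le_1mb",
    "InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb",
    "InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb",
]


def describe_checkpoint(model_id: str) -> Dict[str, str]:
    """Split an NTv3 checkpoint ID into its name components."""
    match = CHECKPOINT_RE.match(model_id)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint ID: {model_id}")
    return match.groupdict()


# Example: select the post-trained checkpoints
post_trained = [c for c in CHECKPOINTS if describe_checkpoint(c)["stage"] == "post_trained"]
print(describe_checkpoint(CHECKPOINTS[-1]))
# {'size': '106M', 'downsamples': '7', 'stage': 'post_trained', 'context': '1mb'}
```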

Links

Citation

```bibtex
@article{ntv3,
  title   = {A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction},
  author  = {…},
  journal = {…},
  year    = {…}
}
```

License

Code & notebooks in this Space: (choose and add, e.g., Apache-2.0)

Model weights: see the license specified in each model repository