Spaces: InstaDeepAI/ntv3

bernardo-de-almeida committed on Dec 11, 2025

Commit 1fb2a3c (1 parent: df00fce)

feat: improve main page

Files changed (2)

README.md +57 -29
index.html +88 -34

README.md CHANGED

@@ -7,63 +7,91 @@ sdk: static
 pinned: false
 ---
-# NTv3 — Foundation Models for Long-Range Genomics
 This Space is the companion hub for NTv3 checkpoints on the Hugging Face Hub. It provides PyTorch notebooks and minimal examples for inference, sequence-to-function prediction (functional tracks), genome annotation, fine-tuning, model interpretation and sequence generation.
-## Notebooks
 Notebooks live in `./notebooks/`:
-- `00_quickstart_inference.ipynb` — load a checkpoint + run inference
-- `01_tracks_prediction.ipynb` — sequence → functional tracks (+ plotting)
-- `02_genome_annotation_segmentation.ipynb` — sequence → annotation
-- `03_finetune_head.ipynb` — fine-tune on bigwig tracks
-- `04_model_interpretation.ipynb` — interpretation of post-trained model
-- `05_sequence_generation.ipynb` — fine-tune NTv3 to generate enhancer sequences
-## Install
 ```bash
 pip install torch transformers accelerate safetensors huggingface_hub numpy
 ```
-## Load a model (To DO)
 ```python
 ```
-## Pipelines (To DO)
 ```python
-from transformers import pipeline
-import torch
-pipe = pipeline(
-    task="ntv3-tracks",
-    model="InstaDeepAI/NTv3_650M",
-    trust_remote_code=True,
-    device="cuda",
-    torch_dtype=torch.bfloat16,
 )
-out = pipe("ACGT...")
 ```
-## Checkpoints
-**Pre-trained:** `InstaDeepAI/NTv3_8M_pre`, `InstaDeepAI/NTv3_100M_pre`, `InstaDeepAI/NTv3_650M_pre`
-**Post-trained:** `InstaDeepAI/NTv3_100M`, `InstaDeepAI/NTv3_650M`
-## Links
-- **Paper:** (add link)
-- **JAX research code (GitHub):** [https://github.com/instadeepai/nucleotide-transformer](https://github.com/instadeepai/nucleotide-transformer)
-## Citation
 ```bibtex
 @article{ntv3,
@@ -74,7 +102,7 @@ out = pipe("ACGT...")
 }
 ```
-## License
 **Code & notebooks in this Space:** (choose and add, e.g., Apache-2.0)

 pinned: false
 ---
+# 🧬 NTv3 — Foundation Models for Long-Range Genomics
 This Space is the companion hub for NTv3 checkpoints on the Hugging Face Hub. It provides PyTorch notebooks and minimal examples for inference, sequence-to-function prediction (functional tracks), genome annotation, fine-tuning, model interpretation and sequence generation.
+## 📖 About NTv3
+NTv3 is a multi-species genomic foundation model family that unifies representation learning, functional-track prediction, genome annotation, and controllable sequence generation within a single U-Net-style backbone. It models up to 1 Mb of DNA at single-base resolution, using a conv–Transformer–deconv architecture that efficiently captures both local motifs and long-range regulatory dependencies. NTv3 is first pretrained on ~9T base pairs from the OpenGenome2 corpus spanning >128k species using masked language modeling, and then post-trained with a joint objective on ~16k functional tracks and annotation labels across 24 animal and plant species, enabling state-of-the-art cross-species functional prediction and base-resolution genome annotation.
+Beyond prediction, NTv3 can be fine-tuned into a controllable generative model via masked-diffusion language modeling, allowing targeted design of regulatory sequences (for example, enhancers with specified activity and promoter selectivity) that have been validated experimentally.
+## 📓 Notebooks
 Notebooks live in `./notebooks/`:
+- 🚀 `00_quickstart_inference.ipynb` — load a checkpoint + run inference
+- 📊 `01_tracks_prediction.ipynb` — sequence → functional tracks (+ plotting)
+- 🏷️ `02_genome_annotation_segmentation.ipynb` — sequence → annotation
+- 🎯 `03_finetune_head.ipynb` — fine-tune on bigwig tracks
+- 🔍 `04_model_interpretation.ipynb` — interpretation of post-trained model
+- 🧪 `05_sequence_generation.ipynb` — fine-tune NTv3 to generate enhancer sequences
+## 📦 Install
 ```bash
 pip install torch transformers accelerate safetensors huggingface_hub numpy
 ```
+## 🤖 Load a pre-trained model
 ```python
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+repo = "InstaDeepAI/NTv3_650M_pre"
+tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
+model = AutoModelForMaskedLM.from_pretrained(repo, trust_remote_code=True)
+batch = tok(["ATCGNATCG", "ACGT"], add_special_tokens=False, padding=True, pad_to_multiple_of=128, return_tensors="pt")
+out = model(**batch, output_hidden_states=True, output_attentions=True)
+print(out.logits.shape)       # (B, L, V = 11)
+print(len(out.hidden_states)) # convs + transformers + deconvs
+print(len(out.attentions))    # equals transformer layers = 12
 ```
+## 💻 Pipelines
+Here is a quick example of how to use the post-trained NTv3 650M model on a human genomic window.
 ```python
+from transformers import AutoConfig
+model_name = "InstaDeepAI/NTv3_650M"
+# Load track prediction pipeline
+cfg = AutoConfig.from_pretrained(model_name, trust_remote_code=True, force_download=True)
+pipe = cfg.load_tracks_pipeline(model_name, device="auto")  # or "cpu"/"cuda"/"mps"
+# Run track prediction
+out = pipe(
+    {
+        "chrom": "chr19",
+        "start": 6_700_000,
+        "end": 6_831_072,
+        "species": "human"
+    }
 )
+print(out.bigwig_tracks_logits.shape)   # functional track predictions
+print(out.bed_tracks_logits.shape)      # genome annotation predictions
+print(out.mlm_logits.shape)             # MLM logits: (B, L, V = 11)
 ```
+## 🤖 Checkpoints
+**📦 Pre-trained:** `InstaDeepAI/NTv3_8M_pre`, `InstaDeepAI/NTv3_100M_pre`, `InstaDeepAI/NTv3_650M_pre`
+**🎯 Post-trained:** `InstaDeepAI/NTv3_100M`, `InstaDeepAI/NTv3_650M`
+## 🔗 Links
+- **📄 Paper:** (add link)
+- **💻 JAX research code (GitHub):** [https://github.com/instadeepai/nucleotide-transformer](https://github.com/instadeepai/nucleotide-transformer)
+- **🏆 NTv3 benchmark leaderboard:** (add link)
+## 📝 Citation
 ```bibtex
 @article{ntv3,
 }
 ```
+## 📜 License
 **Code & notebooks in this Space:** (choose and add, e.g., Apache-2.0)
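
The loading example in the README pads tokenized batches with `pad_to_multiple_of=128`. The source does not say why. One plausible reading is that the U-Net-style backbone halves the sequence length several times (the older checkpoint names removed from `index.html` mention `7downsample`, and 2**7 = 128), so input lengths would need to be divisible by 128. The minimal sketch below only illustrates the length arithmetic. `padded_length` is a hypothetical helper, not part of the NTv3 or `transformers` API, and it assumes one token per nucleotide.

```python
def padded_length(n_tokens: int, multiple: int = 128) -> int:
    """Smallest multiple of `multiple` that is >= n_tokens (hypothetical helper)."""
    if n_tokens < 0 or multiple <= 0:
        raise ValueError("n_tokens must be >= 0 and multiple must be > 0")
    # Ceiling division, then scale back up to a whole number of blocks.
    return -(-n_tokens // multiple) * multiple


# The README batch holds sequences of 9 and 4 nucleotides. Assuming one token
# per nucleotide, both fit in a single padded block of 128 tokens.
for seq in ["ATCGNATCG", "ACGT"]:
    print(len(seq), "->", padded_length(len(seq)))
```

Under these assumptions, any sequence of up to 128 nucleotides is padded to exactly 128 tokens, and a 129-nucleotide sequence would be padded to 256.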

index.html CHANGED

@@ -5,6 +5,7 @@
   <meta name="viewport" content="width=device-width,initial-scale=1" />
   <title>NTv3 — Foundation Models for Long-Range Genomics</title>
   <meta name="description" content="NTv3 companion hub: PyTorch notebooks for inference, fine-tuning, interpretation, and sequence generation on NTv3 models hosted on Hugging Face." />
   <style>
     :root {
       --bg: #0b1020;
@@ -85,6 +86,36 @@
       font-size: inherit;
       color: inherit;
     }
     .paper-summary {
       margin-top: 12px;
       padding: 24px;
@@ -114,79 +145,100 @@
 <body>
   <div class="wrap">
     <div class="hero">
-      <h1>NTv3 — Foundation Models for Long-Range Genomics</h1>
       <p>
         This Space is the companion hub for <strong>NTv3</strong> models: runnable notebooks for inference, fine-tuning, interpretation, and sequence generation.
       </p>
       <div class="pillrow">
-        <span class="pill">Foundation Models</span>
-		<span class="pill">Long-context genomics</span>
-		<span class="pill">Multi-species</span>
-        <span class="pill">Inference • Fine-tune • Interpret • Generate</span>
-        <span class="pill">Torch notebooks</span>
       </div>
     </div>
     <div class="grid">
       <div class="card">
-        <h2>Models</h2>
         <ul>
-          <li>Pretrained checkpoints (see <a href="https://huggingface.co/collections/InstaDeepAI/nucleotide-transformer-v3" target="_blank" rel="noopener">collection</a>):
             <div style="margin-top: 8px; margin-left: 0;">
               <div><a href="https://huggingface.co/InstaDeepAI/NTv3_8M_pre"><code>InstaDeepAI/NTv3_8M_pre</code></a></div>
               <div><a href="https://huggingface.co/InstaDeepAI/NTv3_100M_pre"><code>InstaDeepAI/NTv3_100M_pre</code></a></div>
               <div><a href="https://huggingface.co/InstaDeepAI/NTv3_650M_pre"><code>InstaDeepAI/NTv3_650M_pre</code></a></div>
             </div>
           </li>
-          <li>Post-trained checkpoints:
             <div style="margin-top: 8px; margin-left: 0;">
-              <div><a href="https://huggingface.co/InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb"><code>InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb</code></a></div>
-              <div><a href="https://huggingface.co/InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb"><code>InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb</code></a></div>
             </div>
           </li>
         </ul>
       </div>
 	  <div class="card">
-        <h2>Notebooks</h2>
         <ul>
-          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/tree/main/notebooks" target="_blank" rel="noopener">Browse notebooks folder</a></li>
-          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/00_quickstart_inference.ipynb" target="_blank" rel="noopener">00 — Quickstart inference</a></li>
-          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/01_tracks_prediction.ipynb" target="_blank" rel="noopener">01 — Tracks prediction</a></li>
-          <li>02 — Genome annotation / segmentation</li>
-          <li>03 — Fine-tune on bigwig tracks</li>
-          <li>04 — Model interpretation</li>
-          <li>05 — Sequence generation</li>
         </ul>
       </div>
       <div class="card">
-        <h2>Model usage (to update)</h2>
-        <p>Here is a quick example of how to use NTv3 models.</p>
-        <div class="code"><code>from transformers import pipeline
-pipe = pipeline(
-    task="ntv3-tracks",
-    model="InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb",
-    trust_remote_code=True,
-    device="cuda",
-    torch_dtype=torch.bfloat16,
-)</code></div>
       </div>
       <div class="card">
-        <h2>Links</h2>
         <ul>
-          <li>Paper: (add link)</li>
-          <li><a href="https://github.com/instadeepai/nucleotide-transformer">JAX model code (GitHub)</a></li>
-		  <li>NTv3 benchmark leaderboard: (add link)</li>
         </ul>
       </div>
     </div>
     <div class="paper-summary">
-		<h2>A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction</h2>
 		<img src="assets/paper_summary.png" alt="NTv3 Paper Summary" />
     </div>
@@ -194,5 +246,7 @@ pipe = pipeline(
       © instadeep-ai — NTv3 companion Space.
     </p>
   </div>
 </body>
 </html>

   <meta name="viewport" content="width=device-width,initial-scale=1" />
   <title>NTv3 — Foundation Models for Long-Range Genomics</title>
   <meta name="description" content="NTv3 companion hub: PyTorch notebooks for inference, fine-tuning, interpretation, and sequence generation on NTv3 models hosted on Hugging Face." />
+  <link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism-tomorrow.min.css" rel="stylesheet" />
   <style>
     :root {
       --bg: #0b1020;
       font-size: inherit;
       color: inherit;
     }
+    /* Prism.js theme overrides to match dark theme */
+    .code pre[class*="language-"] {
+      background: transparent;
+      margin: 0;
+      padding: 0;
+    }
+    .code code[class*="language-"] {
+      background: transparent;
+    }
+    .summary {
+      margin-top: 18px;
+      padding: 24px;
+      border: 1px solid var(--border);
+      background: var(--card);
+      border-radius: var(--radius);
+      box-shadow: var(--shadow);
+    }
+    .summary h2 {
+      margin: 0 0 16px 0;
+      font-size: 18px;
+      letter-spacing: 0.01em;
+    }
+    .summary p {
+      margin: 0 0 14px 0;
+      color: var(--muted);
+      line-height: 1.7;
+    }
+    .summary p:last-child {
+      margin-bottom: 0;
+    }
     .paper-summary {
       margin-top: 12px;
       padding: 24px;
 <body>
   <div class="wrap">
     <div class="hero">
+      <h1>🧬 NTv3 — Foundation Models for Long-Range Genomics</h1>
       <p>
         This Space is the companion hub for <strong>NTv3</strong> models: runnable notebooks for inference, fine-tuning, interpretation, and sequence generation.
       </p>
       <div class="pillrow">
+        <span class="pill">🤖 Foundation Models</span>
+		<span class="pill">🧬 Long-context genomics</span>
+		<span class="pill">🌍 Multi-species</span>
+        <span class="pill">⚡ Inference • Fine-tune • Interpret • Generate</span>
+        <span class="pill">📓 Torch notebooks</span>
       </div>
     </div>
+    <div class="summary">
+      <h2>📖 About NTv3</h2>
+      <p>
+        NTv3 is a multi-species genomic foundation model family that unifies representation learning, functional-track prediction, genome annotation, and controllable sequence generation within a single U-Net-style backbone. It models up to 1 Mb of DNA at single-base resolution, using a conv–Transformer–deconv architecture that efficiently captures both local motifs and long-range regulatory dependencies. NTv3 is first pretrained on ~9T base pairs from the OpenGenome2 corpus spanning >128k species using masked language modeling, and then post-trained with a joint objective on ~16k functional tracks and annotation labels across 24 animal and plant species, enabling state-of-the-art cross-species functional prediction and base-resolution genome annotation.
+      </p>
+      <p>
+        Beyond prediction, NTv3 can be fine-tuned into a controllable generative model via masked-diffusion language modeling, allowing targeted design of regulatory sequences (for example, enhancers with specified activity and promoter selectivity) that have been validated experimentally.
+      </p>
+    </div>
     <div class="grid">
       <div class="card">
+        <h2>🤖 Models (see <a href="https://huggingface.co/collections/InstaDeepAI/nucleotide-transformer-v3" target="_blank" rel="noopener">collection</a>)</h2>
         <ul>
+          <li>📦 Pretrained checkpoints:
             <div style="margin-top: 8px; margin-left: 0;">
               <div><a href="https://huggingface.co/InstaDeepAI/NTv3_8M_pre"><code>InstaDeepAI/NTv3_8M_pre</code></a></div>
               <div><a href="https://huggingface.co/InstaDeepAI/NTv3_100M_pre"><code>InstaDeepAI/NTv3_100M_pre</code></a></div>
               <div><a href="https://huggingface.co/InstaDeepAI/NTv3_650M_pre"><code>InstaDeepAI/NTv3_650M_pre</code></a></div>
             </div>
           </li>
+          <li>🎯 Post-trained checkpoints:
             <div style="margin-top: 8px; margin-left: 0;">
+              <div><a href="https://huggingface.co/InstaDeepAI/NTv3_100M"><code>InstaDeepAI/NTv3_100M</code></a></div>
+              <div><a href="https://huggingface.co/InstaDeepAI/NTv3_650M"><code>InstaDeepAI/NTv3_650M</code></a></div>
             </div>
           </li>
         </ul>
       </div>
 	  <div class="card">
+        <h2>📓 Notebooks (browse <a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/tree/main/notebooks" target="_blank" rel="noopener">folder</a>)</h2>
         <ul>
+          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/00_quickstart_inference.ipynb" target="_blank" rel="noopener">🚀 00 — Quickstart inference</a></li>
+          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/01_tracks_prediction.ipynb" target="_blank" rel="noopener">📊 01 — Tracks prediction</a></li>
+          <li>🏷️ 02 — Genome annotation / segmentation</li>
+          <li>🎯 03 — Fine-tune on bigwig tracks</li>
+          <li>🔍 04 — Model interpretation</li>
+          <li>🧪 05 — Sequence generation</li>
         </ul>
       </div>
       <div class="card">
+        <h2>💻 Model usage</h2>
+        <p>Here is a quick example of how to use the post-trained NTv3 650M model on a human genomic window.</p>
+        <div class="code"><pre><code class="language-python">from transformers import AutoConfig
+model_name = "InstaDeepAI/NTv3_650M"
+# Load track prediction pipeline
+cfg = AutoConfig.from_pretrained(model_name, trust_remote_code=True, force_download=True)
+pipe = cfg.load_tracks_pipeline(model_name, device="auto")  # or "cpu"/"cuda"/"mps"
+# Run track prediction
+out = pipe(
+    {
+        "chrom": "chr19",
+        "start": 6_700_000,
+        "end": 6_831_072,
+        "species": "human"
+    }
+)
+print(out.bigwig_tracks_logits.shape)   # functional track predictions
+print(out.bed_tracks_logits.shape)      # genome annotation predictions
+print(out.mlm_logits.shape)             # MLM logits: (B, L, V = 11)</code></pre></div>
       </div>
       <div class="card">
+        <h2>🔗 Links</h2>
         <ul>
+          <li>📄 Paper: (add link)</li>
+          <li><a href="https://github.com/instadeepai/nucleotide-transformer">💻 JAX model code (GitHub)</a></li>
+		  <li>🏆 NTv3 benchmark leaderboard: (add link)</li>
         </ul>
       </div>
     </div>
     <div class="paper-summary">
+		<h2>📄 A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction</h2>
 		<img src="assets/paper_summary.png" alt="NTv3 Paper Summary" />
     </div>
       © instadeep-ai — NTv3 companion Space.
     </p>
   </div>
+  <script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-core.min.js"></script>
+  <script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/plugins/autoloader/prism-autoloader.min.js"></script>
 </body>
 </html>
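
The usage example in this commit passes a genomic window to the tracks pipeline as a dict with `chrom`, `start`, `end` and `species` keys. That window spans 6,831,072 - 6,700,000 = 131,072 bases, which is 2**17 and therefore a multiple of 128. The sketch below builds a dict of the same shape from a start position and a length. `make_window` is a hypothetical helper. The half-open `[start, end)` coordinate convention and the multiple-of-128 check are both assumptions, not documented requirements of the NTv3 pipeline.

```python
def make_window(chrom: str, start: int, length: int, species: str,
                multiple: int = 128) -> dict:
    """Build a window dict shaped like the one in the usage example.

    Hypothetical helper: assumes half-open [start, end) coordinates and that
    the window length should be a multiple of `multiple`.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    if length % multiple != 0:
        raise ValueError(f"length {length} is not a multiple of {multiple}")
    return {"chrom": chrom, "start": start, "end": start + length, "species": species}


# Reproduce the chr19 window used in the commit's usage example.
window = make_window("chr19", start=6_700_000, length=131_072, species="human")
print(window["end"] - window["start"])
```

Validating the length before calling the pipeline is a design choice for this sketch: it surfaces a malformed window early instead of relying on whatever the remote pipeline code does with it.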