Commit df00fce · committed by bernardo-de-almeida
1 Parent(s): a3047bf

fix model links
README.md CHANGED
@@ -18,7 +18,7 @@ Notebooks live in `./notebooks/`:
 - `00_quickstart_inference.ipynb` — load a checkpoint + run inference
 - `01_tracks_prediction.ipynb` — sequence → functional tracks (+ plotting)
 - `02_genome_annotation_segmentation.ipynb` — sequence → annotation
-- `03_finetune_head.ipynb` — fine-tune on a bigwig track
+- `03_finetune_head.ipynb` — fine-tune on bigwig tracks
 - `04_model_interpretation.ipynb` — interpretation of post-trained model
 - `05_sequence_generation.ipynb` — fine-tune NTv3 to generate enhancer sequences

@@ -43,7 +43,7 @@ import torch

 pipe = pipeline(
     task="ntv3-tracks",
-    model="InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb",
+    model="InstaDeepAI/NTv3_650M",
     trust_remote_code=True,
     device="cuda",
     torch_dtype=torch.bfloat16,
@@ -54,9 +54,9 @@ out = pipe("ACGT...")

 ## Checkpoints

-**Pre-trained:** `InstaDeepAI/ntv3_8M_pre`, `InstaDeepAI/ntv3_100M_pre`, `InstaDeepAI/ntv3_650M_pre`
+**Pre-trained:** `InstaDeepAI/NTv3_8M_pre`, `InstaDeepAI/NTv3_100M_pre`, `InstaDeepAI/NTv3_650M_pre`

-**Post-trained:** `InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb`, `InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb`
+**Post-trained:** `InstaDeepAI/NTv3_100M`, `InstaDeepAI/NTv3_650M`

 ## Links

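For reference, the README quickstart reads as follows after this commit. This is a sketch assembled from the diff: the task string, model ID, and all keyword arguments are taken verbatim from the hunk, while the `import torch` line and the `"ACGT..."` placeholder input are inferred from the `@@` context lines.

```python
# Quickstart as it reads after this commit (assembled from the diff context).
# "ntv3-tracks" is a custom pipeline task loaded via trust_remote_code.
import torch
from transformers import pipeline

pipe = pipeline(
    task="ntv3-tracks",
    model="InstaDeepAI/NTv3_650M",
    trust_remote_code=True,
    device="cuda",
    torch_dtype=torch.bfloat16,
)

out = pipe("ACGT...")  # placeholder sequence, as in the README
```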
index.html CHANGED
@@ -134,9 +134,9 @@
 <ul>
   <li>Pretrained checkpoints (see <a href="https://huggingface.co/collections/InstaDeepAI/nucleotide-transformer-v3" target="_blank" rel="noopener">collection</a>):
     <div style="margin-top: 8px; margin-left: 0;">
-      <div><a href="https://huggingface.co/InstaDeepAI/ntv3_8M_pre"><code>InstaDeepAI/ntv3_8M_pre</code></a></div>
-      <div><a href="https://huggingface.co/InstaDeepAI/ntv3_100M_pre"><code>InstaDeepAI/ntv3_100M_pre</code></a></div>
-      <div><a href="https://huggingface.co/InstaDeepAI/ntv3_650M_pre"><code>InstaDeepAI/ntv3_650M_pre</code></a></div>
+      <div><a href="https://huggingface.co/InstaDeepAI/NTv3_8M_pre"><code>InstaDeepAI/NTv3_8M_pre</code></a></div>
+      <div><a href="https://huggingface.co/InstaDeepAI/NTv3_100M_pre"><code>InstaDeepAI/NTv3_100M_pre</code></a></div>
+      <div><a href="https://huggingface.co/InstaDeepAI/NTv3_650M_pre"><code>InstaDeepAI/NTv3_650M_pre</code></a></div>
     </div>
   </li>
   <li>Post-trained checkpoints:
@@ -179,7 +179,8 @@ pipe = pipeline(
 <h2>Links</h2>
 <ul>
   <li>Paper: (add link)</li>
-  <li><a href="https://github.com/instadeepai/nucleotide-transformer">JAX training code</a></li>
+  <li><a href="https://github.com/instadeepai/nucleotide-transformer">JAX model code (GitHub)</a></li>
+  <li>NTv3 benchmark leaderboard: (add link)</li>
 </ul>
 </div>
 </div>
notebooks/00_quickstart_inference.ipynb CHANGED
@@ -9,8 +9,8 @@
    "\n",
    "This notebook demonstrates how to run **quick inference** with both the pre- and post-trained NTv3 checkpoints:\n",
    "\n",
-   "- **Pre-trained (MLM-focused):** `InstaDeepAI/ntv3_8M_pre`, `InstaDeepAI/ntv3_100M_pre`, `InstaDeepAI/ntv3_650M_pre`\n",
-   "- **Post-trained (task heads):** `InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb`, `InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb`\n",
+   "- **Pre-trained (MLM-focused):** `InstaDeepAI/NTv3_8M_pre`, `InstaDeepAI/NTv3_100M_pre`, `InstaDeepAI/NTv3_650M_pre`\n",
+   "- **Post-trained (task heads):** `InstaDeepAI/NTv3_100M`, `InstaDeepAI/NTv3_650M`\n",
    "\n",
    "We show how to:\n",
    "\n",
@@ -105,7 +105,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": null,
    "id": "336bb40c",
    "metadata": {},
    "outputs": [
@@ -260,7 +260,7 @@
    }
   ],
   "source": [
-   "pretrained_model_name = \"InstaDeepAI/ntv3_8M_pre\"\n",
+   "pretrained_model_name = \"InstaDeepAI/NTv3_8M_pre\"\n",
    "\n",
    "# Load tokenizer/model\n",
    "tok_pre = AutoTokenizer.from_pretrained(pretrained_model_name, trust_remote_code=True)\n",
@@ -318,7 +318,7 @@
    }
   ],
   "source": [
-   "posttrained_model_name = \"InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb\"\n",
+   "posttrained_model_name = \"InstaDeepAI/NTv3_100M\"\n",
    "\n",
    "# Load config/tokenizers/model\n",
    "cfg_pos = AutoConfig.from_pretrained(posttrained_model_name, trust_remote_code=True)\n",
@@ -345,10 +345,12 @@
    "    output_attentions=True,\n",
    ")\n",
    "\n",
-   "# Access model outputs\n",
-   "print(out[\"bigwig_tracks_logits\"].shape)  # per-assembly functional track predictions\n",
-   "print(out[\"bed_tracks_logits\"].shape)  # genomic element classifications\n",
-   "print(out[\"logits\"].shape)  # masked LM logits"
+   "# 7k human tracks over 37.5 % center region of the input sequence\n",
+   "print(\"bigwig_tracks_logits:\", out[\"bigwig_tracks_logits\"].shape)\n",
+   "# Location of 21 genomic elements over 37.5 % center region of the input sequence\n",
+   "print(\"bed_tracks_logits:\", out[\"bed_tracks_logits\"].shape)\n",
+   "# Language model logits for whole sequence over vocabulary\n",
+   "print(\"language model logits:\", out[\"logits\"].shape)"
   ]
  }
 ],
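Taken together, the updated cells in this notebook correspond to roughly the following flow. This is a minimal sketch: the checkpoint name, the `AutoConfig` call, the `output_attentions=True` keyword, and the three output keys come from the diff, while the assumption that `AutoModel`/`AutoTokenizer` resolve the custom NTv3 classes via `trust_remote_code`, the tokenizer call shape, and the toy sequence are not confirmed by it.

```python
# Minimal sketch of the post-trained inference flow shown in this notebook.
# Assumption: AutoModel/AutoTokenizer load the custom NTv3 code via
# trust_remote_code; the tokenizer call and toy sequence are illustrative.
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

posttrained_model_name = "InstaDeepAI/NTv3_100M"

cfg_pos = AutoConfig.from_pretrained(posttrained_model_name, trust_remote_code=True)
tok_pos = AutoTokenizer.from_pretrained(posttrained_model_name, trust_remote_code=True)
model_pos = AutoModel.from_pretrained(posttrained_model_name, trust_remote_code=True)
model_pos.eval()

inputs = tok_pos("ACGT" * 256, return_tensors="pt")  # toy sequence
with torch.no_grad():
    out = model_pos(**inputs, output_attentions=True)

# The three output heads printed in the updated cell:
print("bigwig_tracks_logits:", out["bigwig_tracks_logits"].shape)
print("bed_tracks_logits:", out["bed_tracks_logits"].shape)
print("language model logits:", out["logits"].shape)
```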
notebooks/01_tracks_prediction.ipynb CHANGED
@@ -112,7 +112,7 @@
    "# -----------------------------\n",
    "# User inputs\n",
    "# -----------------------------\n",
-   "model_name = \"InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb\"  # options: \"InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb\" or \"InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb_v2\"\n",
+   "model_name = \"InstaDeepAI/NTv3_100M\"  # options: \"InstaDeepAI/NTv3_100M\" or \"InstaDeepAI/NTv3_650M\"\n",
    "\n",
    "# Example window from a given species (edit these) - needs to be a multiple of 128 due to the model downsampling\n",
    "assembly = \"hg38\"\n",