Spaces: InstaDeepAI/ntv3

bernardo-de-almeida committed on Dec 10, 2025

Commit

66b0297

1 Parent(s): cb9b9a5

feat: improve main page and notebooks

Files changed (4)

README.md +1 -1
index.html +44 -19
notebooks/00_quickstart_inference.ipynb +142 -12
notebooks/01_tracks_prediction.ipynb +0 -0

README.md CHANGED

@@ -54,7 +54,7 @@ out = pipe("ACGT...")
 ## Checkpoints
-**Pre-trained:** `InstaDeepAI/ntv3_8M_7downsample_pretrained_le_1mb`, `InstaDeepAI/ntv3_106M_7downsample_pretrained_le_1mb`, `InstaDeepAI/ntv3_650M_7downsample_pretrained_le_1mb`
 **Post-trained:** `InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb`, `InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb`

 ## Checkpoints
+**Pre-trained:** `InstaDeepAI/ntv3_8M_pre`, `InstaDeepAI/ntv3_100M_pre`, `InstaDeepAI/ntv3_650M_pre`
 **Post-trained:** `InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb`, `InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb`
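
The README hunk above replaces the three pre-trained checkpoint IDs with shorter ones, so code that still uses the old IDs will likely need updating. A minimal sketch of a lookup helper follows. It assumes the old and new IDs correspond positionally as listed in the diff (including `106M` becoming `100M`); the names `RENAMED_CHECKPOINTS` and `resolve_checkpoint` are hypothetical and not part of the repository.

```python
# Old -> new pre-trained checkpoint IDs, read off the README diff.
# Assumption: the IDs map positionally (8M, 106M -> 100M, 650M).
RENAMED_CHECKPOINTS = {
    "InstaDeepAI/ntv3_8M_7downsample_pretrained_le_1mb": "InstaDeepAI/ntv3_8M_pre",
    "InstaDeepAI/ntv3_106M_7downsample_pretrained_le_1mb": "InstaDeepAI/ntv3_100M_pre",
    "InstaDeepAI/ntv3_650M_7downsample_pretrained_le_1mb": "InstaDeepAI/ntv3_650M_pre",
}


def resolve_checkpoint(name: str) -> str:
    """Return the new ID for a renamed checkpoint, or the name unchanged."""
    return RENAMED_CHECKPOINTS.get(name, name)


if __name__ == "__main__":
    # A renamed pre-trained ID is translated; a post-trained ID passes through.
    print(resolve_checkpoint("InstaDeepAI/ntv3_8M_7downsample_pretrained_le_1mb"))
    print(resolve_checkpoint("InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb"))
```

The post-trained IDs are untouched by this commit, which is why the helper returns unknown names as-is instead of raising.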

index.html CHANGED

@@ -85,6 +85,24 @@
       font-size: inherit;
       color: inherit;
     }
     .footer { margin-top: 22px; color: var(--muted); font-size: 13px; }
     @media (max-width: 860px) {
       .card { grid-column: span 12; }
@@ -102,34 +120,23 @@
       </p>
       <div class="pillrow">
-        <span class="pill">Long-context genomics</span>
-        <span class="pill">Torch notebooks</span>
         <span class="pill">Inference • Fine-tune • Interpret • Generate</span>
       </div>
     </div>
     <div class="grid">
-      <div class="card">
-        <h2>Notebooks</h2>
-        <ul>
-          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/tree/main/notebooks" target="_blank" rel="noopener">Browse notebooks folder</a></li>
-          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/00_quickstart_inference.ipynb" target="_blank" rel="noopener">00 — Quickstart inference</a></li>
-          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/01_tracks_prediction.ipynb" target="_blank" rel="noopener">01 — Tracks prediction</a></li>
-          <li>02 — Genome annotation / segmentation</li>
-          <li>03 — Fine-tune a head</li>
-          <li>04 — Model interpretation</li>
-          <li>05 — Sequence generation</li>
-        </ul>
-      </div>
       <div class="card">
         <h2>Models</h2>
         <ul>
-          <li>Pretrained checkpoints:
             <div style="margin-top: 8px; margin-left: 0;">
-              <div><a href="https://huggingface.co/InstaDeepAI/ntv3_8M_7downsample_pretrained_le_1mb"><code>InstaDeepAI/ntv3_8M_7downsample_pretrained_le_1mb</code></a></div>
-              <div><a href="https://huggingface.co/InstaDeepAI/ntv3_106M_7downsample_pretrained_le_1mb"><code>InstaDeepAI/ntv3_106M_7downsample_pretrained_le_1mb</code></a></div>
-              <div><a href="https://huggingface.co/InstaDeepAI/ntv3_650M_7downsample_pretrained_le_1mb"><code>InstaDeepAI/ntv3_650M_7downsample_pretrained_le_1mb</code></a></div>
             </div>
           </li>
           <li>Post-trained checkpoints:
@@ -141,6 +148,19 @@
         </ul>
       </div>
       <div class="card">
         <h2>Model usage (to update)</h2>
         <p>Here is a quick example of how to use NTv3 models.</p>
@@ -164,6 +184,11 @@ pipe = pipeline(
       </div>
     </div>
     <p class="footer">
       © instadeep-ai — NTv3 companion Space.
     </p>

       font-size: inherit;
       color: inherit;
     }
+    .paper-summary {
+      margin-top: 12px;
+      padding: 24px;
+      border: 1px solid var(--border);
+      background: var(--card);
+      border-radius: var(--radius);
+      box-shadow: var(--shadow);
+    }
+    .paper-summary h2 {
+      text-align: center;
+      margin: 0 0 20px 0;
+    }
+    .paper-summary img {
+      width: 100%;
+      height: auto;
+      display: block;
+      border-radius: 12px;
+    }
     .footer { margin-top: 22px; color: var(--muted); font-size: 13px; }
     @media (max-width: 860px) {
       .card { grid-column: span 12; }
       </p>
       <div class="pillrow">
+        <span class="pill">Foundation Models</span>
+		<span class="pill">Long-context genomics</span>
+		<span class="pill">Multi-species</span>
         <span class="pill">Inference • Fine-tune • Interpret • Generate</span>
+        <span class="pill">Torch notebooks</span>
       </div>
     </div>
     <div class="grid">
       <div class="card">
         <h2>Models</h2>
         <ul>
+          <li>Pretrained checkpoints (see <a href="https://huggingface.co/collections/InstaDeepAI/nucleotide-transformer-v3" target="_blank" rel="noopener">collection</a>):
             <div style="margin-top: 8px; margin-left: 0;">
+              <div><a href="https://huggingface.co/InstaDeepAI/ntv3_8M_pre"><code>InstaDeepAI/ntv3_8M_pre</code></a></div>
+              <div><a href="https://huggingface.co/InstaDeepAI/ntv3_100M_pre"><code>InstaDeepAI/ntv3_100M_pre</code></a></div>
+              <div><a href="https://huggingface.co/InstaDeepAI/ntv3_650M_pre"><code>InstaDeepAI/ntv3_650M_pre</code></a></div>
             </div>
           </li>
           <li>Post-trained checkpoints:
         </ul>
       </div>
+	  <div class="card">
+        <h2>Notebooks</h2>
+        <ul>
+          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/tree/main/notebooks" target="_blank" rel="noopener">Browse notebooks folder</a></li>
+          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/00_quickstart_inference.ipynb" target="_blank" rel="noopener">00 — Quickstart inference</a></li>
+          <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/01_tracks_prediction.ipynb" target="_blank" rel="noopener">01 — Tracks prediction</a></li>
+          <li>02 — Genome annotation / segmentation</li>
+          <li>03 — Fine-tune a head</li>
+          <li>04 — Model interpretation</li>
+          <li>05 — Sequence generation</li>
+        </ul>
+      </div>
       <div class="card">
         <h2>Model usage (to update)</h2>
         <p>Here is a quick example of how to use NTv3 models.</p>
       </div>
     </div>
+    <div class="paper-summary">
+		<h2>A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction</h2>
+		<img src="assets/paper_summary.png" alt="NTv3 Paper Summary" />
+    </div>
     <p class="footer">
       © instadeep-ai — NTv3 companion Space.
     </p>

notebooks/00_quickstart_inference.ipynb CHANGED

@@ -7,9 +7,9 @@
       "source": [
         "# NTv3 Quickstart — Pre-trained and Post-trained models\n",
         "\n",
-        "This notebook demonstrates how to run **quick inference** with bothe pre- and post-trained NTv3 checkpoints:\n",
         "\n",
-        "- **Pre-trained (MLM-focused):** `InstaDeepAI/ntv3_8M_7downsample_pretrained_le_1mb`, `InstaDeepAI/ntv3_106M_7downsample_pretrained_le_1mb`, `InstaDeepAI/ntv3_650M_ntv3_650M_7downsample_pretrained_le_1mb7downsample_pre_trained_1mb`\n",
         "- **Post-trained (task heads):** `InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb`, `InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb`\n",
         "\n",
         "We show how to:\n",
@@ -103,32 +103,162 @@
     },
     {
       "cell_type": "code",
-      "execution_count": null,
       "id": "336bb40c",
       "metadata": {},
       "outputs": [
         {
-          "name": "stdout",
           "output_type": "stream",
           "text": [
-            "torch.Size([2, 128, 11])\n",
-            "16\n",
-            "2\n",
-            "MLM logits shape: (2, 128, 11)\n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "/opt/anaconda3/envs/hf-finetune/lib/python3.10/site-packages/torch/amp/autocast_mode.py:283: UserWarning: In CPU autocast, but the target dtype is not supported. Disabling autocast.\n",
-            "CPU Autocast only supports dtype of torch.bfloat16, torch.float16 currently.\n",
-            "  warnings.warn(error_message)\n"
           ]
         }
       ],
       "source": [
-        "pretrained_model_name = \"InstaDeepAI/ntv3_8M_7downsample_pretrained_le_1mb\"\n",
         "\n",
         "# Load tokenizer/model\n",
         "tok_pre = AutoTokenizer.from_pretrained(pretrained_model_name, trust_remote_code=True)\n",

       "source": [
         "# NTv3 Quickstart — Pre-trained and Post-trained models\n",
         "\n",
+        "This notebook demonstrates how to run **quick inference** with both the pre- and post-trained NTv3 checkpoints:\n",
         "\n",
+        "- **Pre-trained (MLM-focused):** `InstaDeepAI/ntv3_8M_pre`, `InstaDeepAI/ntv3_100M_pre`, `InstaDeepAI/ntv3_650M_pre`\n",
         "- **Post-trained (task heads):** `InstaDeepAI/ntv3_106M_7downsample_post_trained_1mb`, `InstaDeepAI/ntv3_650M_7downsample_post_trained_1mb`\n",
         "\n",
         "We show how to:\n",
     },
     {
       "cell_type": "code",
+      "execution_count": 14,
       "id": "336bb40c",
       "metadata": {},
       "outputs": [
         {
+          "data": {
+            "application/vnd.jupyter.widget-view+json": {
+              "model_id": "411ee47e94ae467f9685c35b65e3e52d",
+              "version_major": 2,
+              "version_minor": 0
+            },
+            "text/plain": [
+              "tokenizer_config.json:   0%|          | 0.00/1.48k [00:00<?, ?B/s]"
+            ]
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "data": {
+            "application/vnd.jupyter.widget-view+json": {
+              "model_id": "30447edb44b849bd936290f3a6b1b863",
+              "version_major": 2,
+              "version_minor": 0
+            },
+            "text/plain": [
+              "tokenization_ntv3.py:   0%|          | 0.00/12.0k [00:00<?, ?B/s]"
+            ]
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "name": "stderr",
           "output_type": "stream",
           "text": [
+            "A new version of the following files was downloaded from https://huggingface.co/InstaDeepAI/ntv3_base_model:\n",
+            "- tokenization_ntv3.py\n",
+            ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n"
           ]
         },
+        {
+          "data": {
+            "application/vnd.jupyter.widget-view+json": {
+              "model_id": "766f183dcc84421588e5cf0241d3efe7",
+              "version_major": 2,
+              "version_minor": 0
+            },
+            "text/plain": [
+              "vocab.json:   0%|          | 0.00/138 [00:00<?, ?B/s]"
+            ]
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "data": {
+            "application/vnd.jupyter.widget-view+json": {
+              "model_id": "b0db83f7cb824d3288a30bebf7891a63",
+              "version_major": 2,
+              "version_minor": 0
+            },
+            "text/plain": [
+              "special_tokens_map.json:   0%|          | 0.00/149 [00:00<?, ?B/s]"
+            ]
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "data": {
+            "application/vnd.jupyter.widget-view+json": {
+              "model_id": "33cf5391dcc549f088e4e927651d1cdb",
+              "version_major": 2,
+              "version_minor": 0
+            },
+            "text/plain": [
+              "config.json:   0%|          | 0.00/1.70k [00:00<?, ?B/s]"
+            ]
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "data": {
+            "application/vnd.jupyter.widget-view+json": {
+              "model_id": "85772d5369234ca286cfa518e1725b12",
+              "version_major": 2,
+              "version_minor": 0
+            },
+            "text/plain": [
+              "configuration_ntv3.py:   0%|          | 0.00/5.90k [00:00<?, ?B/s]"
+            ]
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
+            "A new version of the following files was downloaded from https://huggingface.co/InstaDeepAI/ntv3_base_model:\n",
+            "- configuration_ntv3.py\n",
+            ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n"
+          ]
+        },
+        {
+          "data": {
+            "application/vnd.jupyter.widget-view+json": {
+              "model_id": "ec1153d073e444c5b255ee5adea6ba68",
+              "version_major": 2,
+              "version_minor": 0
+            },
+            "text/plain": [
+              "modeling_ntv3_base.py:   0%|          | 0.00/33.9k [00:00<?, ?B/s]"
+            ]
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "A new version of the following files was downloaded from https://huggingface.co/InstaDeepAI/ntv3_base_model:\n",
+            "- modeling_ntv3_base.py\n",
+            ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n"
+          ]
+        },
+        {
+          "data": {
+            "application/vnd.jupyter.widget-view+json": {
+              "model_id": "94b9bb7fe0da4f4994adb9127d9af7e6",
+              "version_major": 2,
+              "version_minor": 0
+            },
+            "text/plain": [
+              "model.safetensors:   0%|          | 0.00/30.8M [00:00<?, ?B/s]"
+            ]
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "torch.Size([2, 128, 11])\n",
+            "16\n",
+            "2\n",
+            "MLM logits shape: (2, 128, 11)\n"
           ]
         }
       ],
       "source": [
+        "pretrained_model_name = \"InstaDeepAI/ntv3_8M_pre\"\n",
         "\n",
         "# Load tokenizer/model\n",
         "tok_pre = AutoTokenizer.from_pretrained(pretrained_model_name, trust_remote_code=True)\n",

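The stderr output recorded in this notebook suggests pinning a revision to avoid picking up new versions of the remote code files. A minimal sketch of how that could look with `AutoTokenizer.from_pretrained` follows. The helper names are hypothetical, the revision values are placeholders to be replaced with real commit hashes, and the use of `code_revision` (for code hosted in a separate repository such as `InstaDeepAI/ntv3_base_model`) is an assumption about how these checkpoints are wired, not something the diff confirms.

```python
from typing import Optional


def pinned_kwargs(revision: str, code_revision: Optional[str] = None) -> dict:
    """Build from_pretrained keyword arguments that pin the Hub revision."""
    kwargs = {"trust_remote_code": True, "revision": revision}
    if code_revision is not None:
        # Assumption: the remote code lives in a different repo than the weights.
        kwargs["code_revision"] = code_revision
    return kwargs


def load_pinned_tokenizer(model_name: str, revision: str,
                          code_revision: Optional[str] = None):
    """Load a tokenizer at a fixed revision (requires transformers and network)."""
    from transformers import AutoTokenizer  # imported lazily on purpose

    return AutoTokenizer.from_pretrained(
        model_name, **pinned_kwargs(revision, code_revision)
    )


if __name__ == "__main__":
    # "<commit-hash>" is a placeholder, not a real revision of the checkpoint.
    print(pinned_kwargs("<commit-hash>"))
```

Pinning trades automatic updates for reproducibility: the notebook would keep running the reviewed code until the hash is changed deliberately.
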
notebooks/01_tracks_prediction.ipynb CHANGED

The diff for this file is too large to render. See raw diff