Spaces: InstaDeepAI/ntv3

bernardo-de-almeida committed on Dec 11, 2025

Commit b7abdc5

1 Parent(s): c7d398c

feat: improve style of notebooks

Files changed (2)

notebooks/00_quickstart_inference.ipynb +7 -7
notebooks/01_tracks_prediction.ipynb +14 -14

notebooks/00_quickstart_inference.ipynb CHANGED

@@ -5,7 +5,7 @@
       "id": "024bb8a8",
       "metadata": {},
       "source": [
-        "# NTv3 Quickstart — Pre-trained and Post-trained models\n",
+        "# 🚀 NTv3 Quickstart — Pre-trained and Post-trained models\n",
         "\n",
         "This notebook demonstrates how to run **quick inference** with both the pre- and post-trained NTv3 checkpoints:\n",
         "\n",
@@ -18,7 +18,7 @@
         "2. Run a forward pass on a DNA sequence window\n",
         "3. Inspect key outputs\n",
         "\n",
-        "> **Note for Google Colab users:** This notebook is compatible with Colab! For faster inference, make sure to enable GPU: Runtime → Change runtime type → GPU (T4 or better recommended)."
+        "> 📝 **Note for Google Colab users:** This notebook is compatible with Colab! For faster inference, make sure to enable GPU: Runtime → Change runtime type → GPU (T4 or better recommended)."
       ]
     },
     {
@@ -26,7 +26,7 @@
       "id": "5d58bf1d",
       "metadata": {},
       "source": [
-        "## 0) Colab Setup (if running on Google Colab)\n",
+        "## 0) ⚙️ Colab Setup (if running on Google Colab)\n",
         "\n",
         "This cell detects if you're running on Google Colab and sets up the environment accordingly."
       ]
@@ -46,7 +46,7 @@
       "id": "5827af7e",
       "metadata": {},
       "source": [
-        "## 1) Imports + setup"
+        "## 1) 📦 Imports + setup"
       ]
     },
     {
@@ -95,7 +95,7 @@
       "id": "82146876",
       "metadata": {},
       "source": [
-        "## 2) Pre-trained checkpoint (MLM-focused)\n",
+        "## 2) 🎯 Pre-trained checkpoint (MLM-focused)\n",
         "\n",
         "This shows the simplest usage: load model + tokenizer, then run a forward pass.\n",
         "\n",
@@ -285,7 +285,7 @@
       "id": "60a01798",
       "metadata": {},
       "source": [
-        "## 3) Post-trained checkpoint (task heads: BigWig + BED)\n",
+        "## 3) 🧠 Post-trained checkpoint (task heads: BigWig + BED)\n",
         "\n",
         "Post-trained checkpoints add task-specific heads.\n",
         "\n",
@@ -298,7 +298,7 @@
         "- `bed_tracks_logits`\n",
         "- `logits` (MLM)\n",
         "\n",
-        "> If your post-trained checkpoint supports multiple assemblies, the config typically exposes a mapping like `cfg.bigwigs_per_file_assembly`."
+        "> 💡 If your post-trained checkpoint supports multiple assemblies, the config typically exposes a mapping like `cfg.bigwigs_per_file_assembly`."
       ]
     },
     {

notebooks/01_tracks_prediction.ipynb CHANGED

@@ -5,13 +5,13 @@
       "id": "7adaa9f8",
       "metadata": {},
       "source": [
-        "# NTv3 Post-Trained Inference on Human Genomic Windows\n",
+        "# 🧬 NTv3 Post-Trained Inference on Human Genomic Windows\n",
         "\n",
         "This notebook demonstrates how to use the **NTv3 post-trained model** to predict functional genomics tracks and genomic element annotations from DNA sequences.\n",
         "\n",
-        "> **Note for Google Colab users:** This notebook is compatible with Colab! For faster inference, make sure to enable GPU: Runtime → Change runtime type → GPU (T4 or better recommended).\n",
+        "> 📝 **Note for Google Colab users:** This notebook is compatible with Colab! For faster inference, make sure to enable GPU: Runtime → Change runtime type → GPU (T4 or better recommended).\n",
         "\n",
-        "## Overview\n",
+        "## 📋 Overview\n",
         "\n",
         "Given a genomic window from the **human genome (hg38)**, the model performs inference and generates:\n",
         "\n",
@@ -19,7 +19,7 @@
         "- **Genomic element annotations** (`bed_tracks_logits`): Classification predictions for genomic elements such as genes, exons, introns, splice sites, promoters, enhancers, and more\n",
         "- **Masked Language Model logits** (`logits`): Standard transformer language model outputs\n",
         "\n",
-        "## Notebook Structure\n",
+        "## 📚 Notebook Structure\n",
         "\n",
         "1. **Setup**: Install dependencies and define the genomic window of interest\n",
         "2. **Data Loading**: Download and fetch the chromosome sequence from UCSC\n",
@@ -27,7 +27,7 @@
         "4. **Inference**: Run the model on the genomic window to generate predictions\n",
         "5. **Visualization**: Plot selected functional tracks and genomic element predictions together in a unified view\n",
         "\n",
-        "## Additional Features\n",
+        "## ✨ Additional Features\n",
         "\n",
         "- Supports multiple NTv3 post-trained models\n",
         "- Supports the 24 species that NTv3 was post-trained on"
@@ -89,7 +89,7 @@
       "id": "19db4774",
       "metadata": {},
       "source": [
-        "## 1) Imports + configuration\n",
+        "## 1) 📦 Imports + configuration\n",
         "\n",
         "Set your NTv3 model and genomic window here"
       ]
@@ -162,7 +162,7 @@
       "id": "94b54a99",
       "metadata": {},
       "source": [
-        "## 2) Fetch chromosome sequence for the chosen window"
+        "## 2) 📥 Fetch chromosome sequence for the chosen window"
       ]
     },
     {
@@ -229,7 +229,7 @@
       "id": "9f82945c",
       "metadata": {},
       "source": [
-        "## 3) Load NTv3 model + tokenizers"
+        "## 3) 🤖 Load NTv3 model + tokenizers"
       ]
     },
     {
@@ -302,7 +302,7 @@
       "id": "70413b72",
       "metadata": {},
       "source": [
-        "## 4) Tokenize the window and run inference\n",
+        "## 4) ⚡ Tokenize the window and run inference\n",
         "\n",
         "We pass:\n",
         "\n",
@@ -360,7 +360,7 @@
       "id": "b8423e62",
       "metadata": {},
       "source": [
-        "## 5) Plot functional tracks and genome annotation predictions\n",
+        "## 5) 📊 Plot functional tracks and genome annotation predictions\n",
         "\n",
         "This plots track probabilities for selected functional tracks and genomic elements.\n",
         "\n",
@@ -391,12 +391,12 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 15,
+      "execution_count": null,
       "id": "717539e2",
       "metadata": {},
       "outputs": [],
       "source": [
-        "### Select functional tracks to plot\n",
+        "### 🎯 Select functional tracks to plot\n",
         "tracks_to_plot = {\n",
         "    \"K562 RNA-seq\": \"ENCSR056HPM\",\n",
         "    \"K562 DNAse\": \"ENCSR921NMD\",\n",
@@ -416,7 +416,7 @@
         "        f\"Available tracks: {bigwig_names}\"\n",
         "    )\n",
         "    \n",
-        "### Select genomic elements to plot\n",
+        "### 🧬 Select genomic elements to plot\n",
         "elements_to_plot = [\n",
         "    \"protein_coding_gene\",\n",
         "    \"exon\",\n",
@@ -491,7 +491,7 @@
       "id": "1ce34dc4",
       "metadata": {},
       "source": [
-        "# To improve\n",
+        "# 💡 To improve\n",
         "- Add gene annotation at top"
       ]
     }