bernardo-de-almeida committed
Commit 3b6e7d5 · 1 Parent(s): 507be8b

feat: add annotation notebook
notebooks_tutorials/{02_fine_tuning_pretrained_model.ipynb → 02_fine_tuning_pretrained_model_biwig.ipynb} RENAMED
@@ -10,10 +10,9 @@
   "\n",
   "📊 We provide access to the NTv3-benchmark data that we released on our Hugging Face dataset: `InstaDeepAI/NTv3_benchmark_dataset`. In this repository, you will find ready-to-use genome FASTA files, BigWig tracks, and metadata, as well as the splits that were used for the benchmark.\n",
   "\n",
-  "**🔧 Main Simplifications**: Compared to the full supervised tracks pipeline, this notebook simplifies several aspects to enable faster iteration:\n",
-  "- **Random sequence sampling**: The dataset randomly samples sequences from chromosomes/regions on-the-fly, rather than using pre-computed sliding windows\n",
+  "**🔧 Main Simplifications**: Compared to the full supervised tracks pipeline used in the paper, this notebook simplifies several aspects to enable faster experimentation for users with limited resources:\n",
   "- **Constant learning rate**: Uses a fixed learning rate throughout training without learning rate scheduling\n",
-  "- **No gradient accumulation**: Implements simple step-based training without gradient accumulation, making the training loop more straightforward\n",
+  "- **No gradient accumulation**: Implements simple step-based training without gradient accumulation, making the training loop more straightforward but changing the effective batch size compared with the full pipeline\n",
   "\n",
   "**⚡ Key Advantage**: This simplified pipeline achieves performance close to that of more complex training approaches while enabling fast fine-tuning: on an H100 GPU with 16 workers for data loading, it takes ~15 min to reach acceptable performance on a 32 kb functional tracks prediction task with the **NTv3_8M_pre** model. The training speed benefits from the efficient NTv3 model architecture, but of course depends on your hardware capabilities (GPU acceleration and multi-worker data loading significantly reduce training time)."
   ]
@@ -24,7 +23,7 @@
   "source": [
   "## 💻 A note on hardware\n",
   "\n",
-  "While this pipeline is designed to run on limited resources (e.g., Google Colab with a T4 GPU and 2 CPUs), the quoted training time and displayed performances (see the **Test evaluation** section) were obtained on a more powerful setup. If you want to reach similar performance levels, be aware that you will need **significant hardware resources** (high-end GPUs with substantial memory and multiple data-loading workers). Training times will vary significantly based on your hardware configuration.\n",
+  "While this pipeline is designed to run on limited resources (e.g., Google Colab with a T4 GPU and 2 CPUs), the quoted training time and displayed performances (see the **Test evaluation** section) were obtained on a more powerful setup and are shown only as a reference. If you want to reach similar performance levels, or those reported in the paper, be aware that you will need **significant hardware resources** (high-end GPUs with substantial memory and multiple data-loading workers). Training times will vary significantly based on your hardware configuration.\n",
   "\n",
   "📝 Note for Google Colab users: This notebook is compatible with Colab and designed to work with limited resources! For faster training, make sure to enable GPU: Runtime → Change runtime type → GPU (T4 or better recommended)."
   ]
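The simplified setup described above (a constant learning rate and per-step updates with no gradient accumulation) can be sketched in a few lines. This is an illustrative toy loop on a one-parameter least-squares problem using only the standard library, not the notebook's actual pipeline; the data, learning rate, and step count are all made up for the example:

```python
import random

# Toy data: y = 3x exactly, so plain SGD should recover w ≈ 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

w = 0.0
lr = 0.05        # constant learning rate: no scheduler, never decayed
num_steps = 500  # simple step-based loop: no epochs, no gradient accumulation

random.seed(0)
for step in range(num_steps):
    x, y = random.choice(data)    # sample one example per step
    grad = 2.0 * (w * x - y) * x  # d/dw of the squared error (w*x - y)**2
    w -= lr * grad                # update immediately: effective batch size is 1

print(round(w, 3))
```

The point of the structure is that each sampled batch produces exactly one optimizer step at a fixed step size; the full pipeline would instead accumulate gradients over several batches (raising the effective batch size) and anneal the learning rate.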
notebooks_tutorials/{03_fine_tuning_posttrained_model.ipynb → 03_fine_tuning_posttrained_model_biwig.ipynb} RENAMED
@@ -10,7 +10,7 @@
   "\n",
   "**🎯 Notebook purpose:**\n",
   "This notebook is configured to train the `NTv3_650M_post` model on the `human` species from the NTv3 benchmark dataset. To run this training, you will need a large GPU (either A100 or H100).\n",
-  "For a simplified version of this notebook that uses the `NTv3_8M_pre` model and runs on a CPU, please see the [02_fine_tuning_pretrained_model.ipynb](https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/02_fine_tuning_pretrained_model.ipynb) notebook.\n",
+  "For a simplified version of this notebook that uses the `NTv3_8M_pre` model and runs on a CPU, please see the [02_fine_tuning_pretrained_model_biwig.ipynb](https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/02_fine_tuning_pretrained_model_biwig.ipynb) notebook.\n",
   "The notebook uses the same \"simplified setup\" as described there.\n",
   "\n",
   "📝 Note for Google Colab users: This notebook is compatible with Colab! It is designed to run on a high-performance GPU; the default parameters can be used with an H100 with 80 GB of HBM."
notebooks_tutorials/04_fine_tuning_pretrained_model_annotation.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
notebooks_tutorials/{04_model_interpretation.ipynb → 05_model_interpretation.ipynb} RENAMED
File without changes
tabs/home.html CHANGED
@@ -84,11 +84,12 @@
   <ul>
     <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/00_quickstart_inference.ipynb" target="_blank" rel="noopener noreferrer">🚀 00 — Quickstart inference</a></li>
     <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/01_tracks_prediction.ipynb" target="_blank" rel="noopener noreferrer">📊 01 — Tracks prediction</a></li>
-    <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/02_fine_tuning_pretrained_model.ipynb" target="_blank" rel="noopener noreferrer">🎯 02 — Fine-tune a pre-trained model on bigwig tracks</a></li>
-    <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/03_fine_tuning_posttrained_model.ipynb" target="_blank" rel="noopener noreferrer">🎯 03 — Fine-tune a post-trained model on bigwig tracks</a></li>
-    <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/04_model_interpretation.ipynb" target="_blank" rel="noopener noreferrer">🔍 04 — Model interpretation</a></li>
-    <li>🧪 05 — Training NTv3-generative <em>(coming soon)</em></li>
-    <li>🪰 06 — Generating enhancer sequences <em>(coming soon)</em></li>
+    <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/02_fine_tuning_pretrained_model_biwig.ipynb" target="_blank" rel="noopener noreferrer">🎯 02 — Fine-tune a pre-trained model on bigwig tracks</a></li>
+    <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/03_fine_tuning_posttrained_model_biwig.ipynb" target="_blank" rel="noopener noreferrer">🎯 03 — Fine-tune a post-trained model on bigwig tracks</a></li>
+    <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/04_fine_tuning_pretrained_model_annotation.ipynb" target="_blank" rel="noopener noreferrer">🏷️ 04 — Fine-tune a pre-trained model on annotations</a></li>
+    <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks_tutorials/05_model_interpretation.ipynb" target="_blank" rel="noopener noreferrer">🔍 05 — Model interpretation</a></li>
+    <li>🧪 06 — Training NTv3-generative <em>(coming soon)</em></li>
+    <li>🪰 07 — Generating enhancer sequences <em>(coming soon)</em></li>
   </ul>
   </div>
   <div class="card">
@@ -111,10 +112,11 @@
   </div>
   </div>

-  <div class="card">
-  <h2>🤖 Load a pre-trained model</h2>
-  <p>Here is an example of how to load and use a pre-trained NTv3 model.</p>
-  <div class="code"><pre><code class="language-python">from transformers import AutoTokenizer, AutoModelForMaskedLM
+  <div class="card-stack">
+  <div class="card">
+  <h2>🤖 Load a pre-trained model</h2>
+  <p>Here is an example of how to load and use a pre-trained NTv3 model.</p>
+  <div class="code"><pre><code class="language-python">from transformers import AutoTokenizer, AutoModelForMaskedLM

   model_name = "InstaDeepAI/NTv3_650M_pre"

@@ -131,11 +133,12 @@ out = model(**batch)
   # Print output shapes
   print(out.logits.shape)  # (B, L, V = 11)
   </code></pre></div>
-  <p>Model embeddings can be used for fine-tuning on downstream tasks.</p>
-
-  <h2 style="margin-top: 40px;">🔍 Model interpretation</h2>
-  <p>Here is an example of how to use the interpretation pipeline on the NTv3 post-trained model for multi-scale analysis of DNA sequences:</p>
-  <div class="code"><pre><code class="language-python">from transformers import pipeline
+  <p>Model embeddings can be used for fine-tuning on downstream tasks.</p>
+  </div>
+  <div class="card">
+  <h2>🔍 Model interpretation</h2>
+  <p>Here is an example of how to use the interpretation pipeline on the NTv3 post-trained model for multi-scale analysis of DNA sequences:</p>
+  <div class="code"><pre><code class="language-python">from transformers import pipeline
   import torch
   import matplotlib.pyplot as plt

@@ -168,7 +171,8 @@ plt.show()
   result.plot_saliency(window_size=128)
   plt.show()
   </code></pre></div>
-  <img src="assets/saliency_example.png" alt="Output tracks visualization" style="max-width: 100%; margin-top: 20px;" />
+  <img src="assets/saliency_example.png" alt="Output tracks visualization" style="max-width: 100%; margin-top: 20px;" />
+  </div>
   </div>

   <div class="card">