Commit 82d9d50 · Parent: 8e984cf
feat: improve main page

Files changed:
- .gitattributes (+1 -0)
- assets/output_tracks.png (+3 -0)
- index.html (+31 -3)
.gitattributes CHANGED

@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 assets/paper_summary.jpg filter=lfs diff=lfs merge=lfs -text
 assets/paper_summary.png filter=lfs diff=lfs merge=lfs -text
+assets/output_tracks.png filter=lfs diff=lfs merge=lfs -text

assets/output_tracks.png ADDED (binary file, stored with Git LFS)
|
index.html CHANGED

@@ -233,6 +233,11 @@
 </p>
 </div>
 
+<div class="paper-summary">
+<!-- <h2>📄 A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction</h2> -->
+<img src="assets/paper_summary.png" alt="NTv3 Paper Summary" />
+</div>
+
 <div class="why-ntv3">
 <h2>✨ Why NTv3?</h2>
 <ul>
@@ -348,11 +353,14 @@ print(out.logits.shape)  # (B, L, V = 11)
 print(len(out.hidden_states))  # convs + transformers + deconvs
 print(len(out.attentions))  # equals transformer layers = 12
 </code></pre></div>
+<p>Model embeddings can be used for fine-tuning on downstream tasks.</p>
+
+<p style="margin-top: 40px;">TO DO: add pipeline for fine-tuning on functional tracks or genome annotation.</p>
 </div>
 
 <div class="card">
 <h2>💻 Use a post-trained model</h2>
-<p>Here is a quick example of how to use the post-trained NTv3 650M model
+<p>Here is a quick example of how to use the post-trained NTv3 650M model to predict tracks for a human genomic window.</p>
 <div class="code"><pre><code class="language-python">from transformers import AutoConfig
 
 model_name = "InstaDeepAI/NTv3_650M"
@@ -375,13 +383,33 @@ out = pipe(
 print(out.bigwig_tracks_logits.shape)  # functional track predictions
 print(out.bed_tracks_logits.shape)  # genome annotation predictions
 print(out.mlm_logits.shape)  # MLM logits: (B, L, V = 11)</code></pre></div>
+<p>Predictions can also be plotted for a subset of functional tracks and genomic elements:</p>
+<div class="code"><pre><code class="language-python">tracks_to_plot = {
+    "K562 RNA-seq": "ENCSR056HPM",
+    "K562 DNase": "ENCSR921NMD",
+    "K562 H3K4me3": "ENCSR000DWD",
+    "K562 CTCF": "ENCSR000AKO",
+    "HepG2 RNA-seq": "ENCSR561FEE_P",
+    "HepG2 DNase": "ENCSR000EJV",
+    "HepG2 H3K4me3": "ENCSR000AMP",
+    "HepG2 CTCF": "ENCSR000BIE",
+}
+elements_to_plot = ["protein_coding_gene", "exon", "intron", "splice_donor", "splice_acceptor"]
+
+out = pipe(
+    {"chrom": "chr19", "start": 6_700_000, "end": 6_831_072, "species": "human"},
+    plot=True,
+    tracks_to_plot=tracks_to_plot,
+    elements_to_plot=elements_to_plot,
+)</code></pre></div>
+<img src="assets/output_tracks.png" alt="Output tracks visualization" style="max-width: 100%; margin-top: 20px;" />
 </div>
 </div>
 
+<!-- <div class="paper-summary">
 <h2>📄 A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction</h2>
 <img src="assets/paper_summary.png" alt="NTv3 Paper Summary" />
+</div> -->
 
 <p class="footer">
 © instadeep-ai — NTv3 companion Space.
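A side note on the plotting snippet this commit adds to index.html: the first argument to pipe is a plain dict describing a genomic window. The sketch below is a hypothetical helper, not part of the NTv3 API; only the key names (chrom, start, end, species) and the chr19 coordinates come from the diff, and the length check is an inference from that example.

```python
def make_window(chrom: str, start: int, end: int, species: str) -> dict:
    """Build a genomic-window payload like the one passed to pipe(...).

    Hypothetical helper for illustration; key names mirror the
    example in the diff.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    return {"chrom": chrom, "start": start, "end": end, "species": species}

window = make_window("chr19", 6_700_000, 6_831_072, "human")
print(window["end"] - window["start"])  # 131072, i.e. a 2**17 bp window
```

The example window in the diff spans exactly 131,072 bp (2^17), which suggests the model expects fixed power-of-two input lengths, though the diff itself does not state this.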