bernardo-de-almeida committed on
Commit c7d398c · 1 parent: 1fb2a3c

feat: improve main page

Files changed (1)
  1. index.html +166 -29
index.html CHANGED
@@ -54,9 +54,49 @@
  border-radius: var(--radius);
  box-shadow: 0 6px 18px rgba(0,0,0,0.22);
  }
  .card h2 { margin: 0 0 10px 0; font-size: 16px; letter-spacing: 0.01em; }
  .card ul { margin: 0; padding-left: 18px; color: var(--muted); }
  .card li { margin: 8px 0; }
  a { color: var(--link); text-decoration: none; }
  a:hover { text-decoration: underline; }
  .pillrow { display: flex; gap: 8px; flex-wrap: wrap; margin-top: 8px; }
@@ -116,6 +156,30 @@
  .summary p:last-child {
  margin-bottom: 0;
  }
  .paper-summary {
  margin-top: 12px;
  padding: 24px;
@@ -169,6 +233,17 @@
  </p>
  </div>

  <div class="grid">
  <div class="card">
  <h2>🤖 Models (see <a href="https://huggingface.co/collections/InstaDeepAI/nucleotide-transformer-v3" target="_blank" rel="noopener">collection</a>)</h2>
@@ -187,22 +262,92 @@
  </div>
  </li>
  </ul>
  </div>

- <div class="card">
- <h2>📓 Notebooks (browse <a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/tree/main/notebooks" target="_blank" rel="noopener">folder</a>)</h2>
- <ul>
- <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/00_quickstart_inference.ipynb" target="_blank" rel="noopener">🚀 00 — Quickstart inference</a></li>
- <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/01_tracks_prediction.ipynb" target="_blank" rel="noopener">📊 01 — Tracks prediction</a></li>
- <li>🏷️ 02 — Genome annotation / segmentation</li>
- <li>🎯 03 — Fine-tune on bigwig tracks</li>
- <li>🔍 04 — Model interpretation</li>
- <li>🧪 05 — Sequence generation</li>
- </ul>
  </div>

  <div class="card">
- <h2>💻 Model usage</h2>
  <p>Here is a quick example of how to use the post-trained NTv3 650M model on a human genomic window.</p>
  <div class="code"><pre><code class="language-python">from transformers import AutoConfig

@@ -214,29 +359,21 @@ pipe = cfg.load_tracks_pipeline(model_name, device="auto") # or "cpu"/"cuda"/"m

  # Run track prediction
  out = pipe(
- {
- "chrom": "chr19",
- "start": 6_700_000,
- "end": 6_831_072,
- "species": "human"
- }
  )

  print(out.bigwig_tracks_logits.shape) # functional track predictions
  print(out.bed_tracks_logits.shape) # genome annotation predictions
  print(out.mlm_logits.shape) # MLM logits: (B, L, V = 11)</code></pre></div>
- </div>
-
- <div class="card">
- <h2>🔗 Links</h2>
- <ul>
- <li>📄 Paper: (add link)</li>
- <li><a href="https://github.com/instadeepai/nucleotide-transformer">💻 JAX model code (GitHub)</a></li>
- <li>🏆 NTv3 benchmark leaderboard: (add link)</li>
- </ul>
- </div>
- </div>
-
  <div class="paper-summary">
  <h2>📄 A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction</h2>
  <img src="assets/paper_summary.png" alt="NTv3 Paper Summary" />
 
  border-radius: var(--radius);
  box-shadow: 0 6px 18px rgba(0,0,0,0.22);
  }
+ .card-stack {
+ grid-column: span 6;
+ display: flex;
+ flex-direction: column;
+ gap: 14px;
+ }
+ .card-stack .card {
+ grid-column: span 1;
+ margin: 0;
+ }
  .card h2 { margin: 0 0 10px 0; font-size: 16px; letter-spacing: 0.01em; }
  .card ul { margin: 0; padding-left: 18px; color: var(--muted); }
  .card li { margin: 8px 0; }
+ .card table {
+ width: 100%;
+ margin-top: 12px;
+ border-collapse: collapse;
+ font-size: 13px;
+ }
+ .card table th {
+ text-align: left;
+ padding: 10px 12px;
+ border-bottom: 2px solid var(--border);
+ color: var(--text);
+ font-weight: 600;
+ font-size: 12px;
+ text-transform: uppercase;
+ letter-spacing: 0.05em;
+ }
+ .card table td {
+ padding: 10px 12px;
+ border-bottom: 1px solid var(--border);
+ color: var(--muted);
+ }
+ .card table tr:last-child td {
+ border-bottom: none;
+ }
+ .card table tr:hover {
+ background: rgba(255, 255, 255, 0.02);
+ }
+ .card table td .checkmark {
+ color: #4ade80 !important;
+ }
  a { color: var(--link); text-decoration: none; }
  a:hover { text-decoration: underline; }
  .pillrow { display: flex; gap: 8px; flex-wrap: wrap; margin-top: 8px; }
 
  .summary p:last-child {
  margin-bottom: 0;
  }
+ .why-ntv3 {
+ margin-top: 18px;
+ padding: 24px;
+ border: 1px solid var(--border);
+ background: var(--card);
+ border-radius: var(--radius);
+ box-shadow: var(--shadow);
+ }
+ .why-ntv3 h2 {
+ margin: 0 0 16px 0;
+ font-size: 18px;
+ letter-spacing: 0.01em;
+ }
+ .why-ntv3 ul {
+ margin: 0;
+ padding-left: 0;
+ list-style: none;
+ color: var(--muted);
+ }
+ .why-ntv3 li {
+ margin: 12px 0;
+ padding-left: 0;
+ line-height: 1.7;
+ }
  .paper-summary {
  margin-top: 12px;
  padding: 24px;
 
  </p>
  </div>

+ <div class="why-ntv3">
+ <h2>✨ Why NTv3?</h2>
+ <ul>
+ <li>📏 <strong>1 Mb long context at nucleotide resolution</strong> — ~100× longer than typical genomics models.</li>
+ <li>🔗 <strong>Unified architecture</strong> for masked language modeling, functional-track prediction, genome annotation, and sequence generation.</li>
+ <li>🌍 <strong>Cross-species generalization</strong> across 24 animals + plants with a shared conditioned representation space.</li>
+ <li>⚡ <strong>U-Net–style architecture</strong> improves stability and GPU efficiency on very long sequences.</li>
+ <li>🎯 <strong>Controllable generative modeling</strong>, enabling targeted enhancer/promoter engineering validated by experimental assays.</li>
+ </ul>
+ </div>
+
  <div class="grid">
  <div class="card">
  <h2>🤖 Models (see <a href="https://huggingface.co/collections/InstaDeepAI/nucleotide-transformer-v3" target="_blank" rel="noopener">collection</a>)</h2>
 
  </div>
  </li>
  </ul>
+ <table>
+ <thead>
+ <tr>
+ <th>Model</th>
+ <th>Size</th>
+ <th>Pre-training</th>
+ <th>Post-training</th>
+ <th>Tasks</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><strong>NTv3-8M</strong></td>
+ <td>8M params</td>
+ <td>MLM</td>
+ <td>❌</td>
+ <td>Embeddings, light inference</td>
+ </tr>
+ <tr>
+ <td><strong>NTv3-100M</strong></td>
+ <td>100M params</td>
+ <td>MLM</td>
+ <td><span class="checkmark">✅</span></td>
+ <td>Tracks, annotation</td>
+ </tr>
+ <tr>
+ <td><strong>NTv3-650M</strong></td>
+ <td>650M params</td>
+ <td>MLM</td>
+ <td><span class="checkmark">✅</span></td>
+ <td>Tracks, annotation, best accuracy</td>
+ </tr>
+ </tbody>
+ </table>
  </div>

+ <div class="card-stack">
+ <div class="card">
+ <h2>📓 Notebooks (browse <a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/tree/main/notebooks" target="_blank" rel="noopener">folder</a>)</h2>
+ <ul>
+ <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/00_quickstart_inference.ipynb" target="_blank" rel="noopener">🚀 00 — Quickstart inference</a></li>
+ <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/blob/main/notebooks/01_tracks_prediction.ipynb" target="_blank" rel="noopener">📊 01 — Tracks prediction</a></li>
+ <li>🏷️ 02 — Genome annotation / segmentation</li>
+ <li>🎯 03 — Fine-tune on bigwig tracks</li>
+ <li>🔍 04 — Model interpretation</li>
+ <li>🧪 05 — Sequence generation</li>
+ </ul>
+ </div>
+ <div class="card">
+ <h2>🔗 Links</h2>
+ <ul>
+ <li>📄 Paper: (add link)</li>
+ <li><a href="https://github.com/instadeepai/nucleotide-transformer">💻 JAX model code (GitHub)</a></li>
+ <li><a href="https://huggingface.co/collections/InstaDeepAI/nucleotide-transformer-v3" target="_blank" rel="noopener">🎯 HF Model Collection (all NTv3 models)</a></li>
+ <li><a href="https://huggingface.co/spaces/InstaDeepAI/ntv3/tree/main/notebooks" target="_blank" rel="noopener">📓 All notebooks</a></li>
+ <li>🏆 NTv3 benchmark leaderboard: (add link)</li>
+ </ul>
+ </div>
  </div>

  <div class="card">
+ <h2>🤖 Load a pre-trained model</h2>
+ <p>Here is an example of how to load and use a pre-trained NTv3 model.</p>
+ <div class="code"><pre><code class="language-python">from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ model_name = "InstaDeepAI/NTv3_650M_pre"
+
+ # Load model and tokenizer
+ model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)
+ tok = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+
+ # Tokenize input sequences
+ batch = tok(["ATCGNATCG", "ACGT"], add_special_tokens=False, padding=True, pad_to_multiple_of=128, return_tensors="pt")
+
+ # Run model
+ out = model(**batch, output_hidden_states=True, output_attentions=True)
+
+ # Print output shapes
+ print(out.logits.shape) # (B, L, V = 11)
+ print(len(out.hidden_states)) # convs + transformers + deconvs
+ print(len(out.attentions)) # equals transformer layers = 12
+ </code></pre></div>
+ </div>
+
+ <div class="card">
+ <h2>💻 Use a post-trained model</h2>
  <p>Here is a quick example of how to use the post-trained NTv3 650M model on a human genomic window.</p>
  <div class="code"><pre><code class="language-python">from transformers import AutoConfig

 

  # Run track prediction
  out = pipe(
+ {
+ "chrom": "chr19",
+ "start": 6_700_000,
+ "end": 6_831_072,
+ "species": "human"
+ }
  )

+ # Print output shapes
  print(out.bigwig_tracks_logits.shape) # functional track predictions
  print(out.bed_tracks_logits.shape) # genome annotation predictions
  print(out.mlm_logits.shape) # MLM logits: (B, L, V = 11)</code></pre></div>
+ </div>
+ </div>
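The two examples on this page use specific sizes. The post-trained example uses a chr19 window from 6,700,000 to 6,831,072. The pre-trained example pads its batch with `pad_to_multiple_of=128`. The short Python sketch below checks that arithmetic. It assumes the tokenizer emits one token per nucleotide, which this page does not state. `padded_length` is a hypothetical helper for illustration and is not part of the NTv3 or `transformers` API.

```python
# Sanity check of the sizes used in the NTv3 examples.
# Assumption: the tokenizer emits one token per nucleotide, so a sequence's
# length in base pairs equals its length in tokens.


def padded_length(n_tokens: int, multiple: int = 128) -> int:
    """Hypothetical helper: round n_tokens up to the next multiple of `multiple`."""
    return -(-n_tokens // multiple) * multiple  # ceiling division


# Genomic window from the post-trained tracks example (human chr19).
start, end = 6_700_000, 6_831_072
window = end - start
print(window)              # 131072
print(window == 2 ** 17)   # True
print(window % 128 == 0)   # True, so no extra padding is needed

# Sequences from the pre-trained example. padding=True pads to the longest
# sequence in the batch; pad_to_multiple_of=128 then rounds that length up.
seqs = ["ATCGNATCG", "ACGT"]
longest = max(len(s) for s in seqs)
print(longest, padded_length(longest))  # 9 128
```

Under this assumption, both toy sequences in the pre-trained example are padded to 128 positions. The 131,072 bp window (2^17) is already a multiple of 128.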
+
  <div class="paper-summary">
  <h2>📄 A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction</h2>
  <img src="assets/paper_summary.png" alt="NTv3 Paper Summary" />