tfrere HF Staff Cursor committed on
Commit
7ec25a8
·
1 Parent(s): 5c11b40

§5 Folding + §6 UMAP: editorial polish for prod

§5 Folding
- AA block laid out as two columns (Carbon left, Reference right)
mirroring the 3D viewer grid below, with chips and length tags
on each label so the visitor reads "131 aa carbon vs 147 aa
reference, 96 mismatches" at a glance
- Reference row highlights the same positions as Carbon plus the
trailing residues that Carbon dropped (length asymmetry), in the
same red as §1 mismatches
- mRNA-info strip simplified — the duplicate REFERENCE chip moved
out, Carbon/Reference labels in the columns now disambiguate
- Soft-wrap tightened from 60 to 40 chars so the two columns line
up row-by-row at typical viewport widths
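
The §5 layout above can be sketched as follows. This is a minimal illustration, not the code in demo.html: `highlightRow`, `compareAA` and `WRAP` are hypothetical names, and the sketch assumes a position-by-position comparison (no alignment), with residues past the end of the shorter sequence counted as mismatches. The `ref-mismatch` class is the one used by the previous `renderAAComparison`.

```javascript
// Hypothetical helpers; names are illustrative, not taken from demo.html.
const WRAP = 40; // soft-wrap width stated in the commit message

// Wrap `seq` at `wrap` characters and mark every position where it
// differs from `other` (or where `other` has no residue at all).
function highlightRow(seq, other, wrap = WRAP) {
  const parts = [];
  let mismatches = 0;
  for (let i = 0; i < seq.length; i++) {
    const differs = other[i] === undefined || seq[i] !== other[i];
    if (differs) mismatches++;
    parts.push(differs ? `<span class="ref-mismatch">${seq[i]}</span>` : seq[i]);
  }
  const lines = [];
  for (let i = 0; i < parts.length; i += wrap) {
    lines.push(parts.slice(i, i + wrap).join(""));
  }
  return { html: lines.join("\n"), mismatches };
}

// Both columns use the same rule, so the longer sequence automatically
// gets its trailing residues highlighted (the length asymmetry case).
function compareAA(carbonAA, refAA) {
  return {
    carbon: highlightRow(carbonAA, refAA),
    reference: highlightRow(refAA, carbonAA),
  };
}
```

Because both rows are wrapped at the same width, row k of the Carbon column covers the same residue positions as row k of the Reference column.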

§6 UMAP
- Six-kingdom species palette (27 species) replacing the synthetic
24-species one
- Biotype palette rebalanced: dominant protein_coding desaturated
to sage so minority biotypes (lncRNA, snRNA, misc_RNA) read
through the overplot
- Continuous gc_content gradient replaces the codon-phase pill
- Deterministic Fisher-Yates shuffle on the points before GPU
upload so no category systematically occludes the others
- Premultiplied alpha blending fix in the fragment shader, base
alpha lowered to 0.22 for a softer cloud
- Paper-tone canvas background (#fbfaf6) consistent with the rest
of the demo
- Pan/zoom clamped: min scale = 1, translation bounded to data
extent so the cloud always fills the viewport
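
One way to get the deterministic shuffle described above is to seed a small PRNG, run Fisher-Yates over an index array, and apply the same order to every per-point column. The generator (a mulberry32-style mixer), the seed, and the names `shuffledOrder` and `applyOrder` are assumptions for illustration; the commit does not state which generator demo.html uses.

```javascript
// Small seeded PRNG (mulberry32-style). Assumption: any seeded generator
// would do; this one is chosen only because it fits in a few lines.
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// Fisher-Yates over an index array: same seed, same order on every load.
function shuffledOrder(n, seed) {
  const rand = seededRandom(seed);
  const order = new Uint32Array(n);
  for (let i = 0; i < n; i++) order[i] = i;
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1)); // 0 <= j <= i
    const tmp = order[i];
    order[i] = order[j];
    order[j] = tmp;
  }
  return order;
}

// Apply one order to a typed-array column. `stride` is 2 for interleaved
// x,y positions and 1 for the per-point category bytes.
function applyOrder(src, order, stride = 1) {
  const out = new src.constructor(src.length);
  for (let i = 0; i < order.length; i++) {
    for (let k = 0; k < stride; k++) {
      out[i * stride + k] = src[order[i] * stride + k];
    }
  }
  return out;
}
```

Shuffling an index array once and reusing it keeps positions and category columns in step, which matters because draw order decides which points end up on top.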

Co-authored-by: Cursor <cursoragent@cursor.com>

Files changed (1)
  1. demo.html +552 -126
demo.html CHANGED
@@ -296,8 +296,11 @@
296
  position: relative;
297
  width: 100%;
298
  aspect-ratio: 16 / 10;
299
- background: #fff;
300
- border: 1px solid #eee;
 
 
 
301
  overflow: hidden;
302
  }
303
  .umap-canvas {
@@ -361,6 +364,17 @@
361
  align-items: center;
362
  cursor: default;
363
  }
364
 
365
  /* --- Gene-completion specifics (§1) --- */
366
  .gene-info {
@@ -393,6 +407,25 @@
393
  display: inline-block; width: 8px; height: 8px; vertical-align: middle;
394
  margin-right: 4px; border-radius: 1px;
395
  }
396
  .stat-row {
397
  display: flex; flex-wrap: wrap; gap: 24px;
398
  margin-top: 14px; padding-top: 12px; border-top: 1px solid #eee;
@@ -448,6 +481,12 @@
448
  }
449
  .fold-viewer.running .fold-overlay { display: flex; }
450
  .fold-viewer.running canvas { opacity: 0.28; }
 
 
 
 
 
 
451
  .fold-legend {
452
  font-family: "JetBrains Mono", monospace;
453
  font-size: 9px; color: #888; text-transform: uppercase; letter-spacing: 1.2px;
@@ -470,8 +509,43 @@
470
  padding: 1px 6px;
471
  border-radius: 2px;
472
  }
473
  @media (max-width: 720px) {
474
  .fold-grid { grid-template-columns: 1fr; }
 
475
  }
476
 
477
  /* Mismatch highlighting in reference row */
@@ -1091,27 +1165,39 @@
1091
  </p>
1092
 
1093
  <div class="demo" id="demoFold">
 
 
 
 
 
1094
  <div class="demo-toolbar">
1095
  <span>gene</span>
1096
  <span id="dfold-pills" class="pills"></span>
1097
- <span class="spacer"></span>
1098
- <span>prompt</span>
1099
- <span id="dfold-prefix-pills" class="pills">
1100
- <button class="pill" data-prefix="100">100</button>
1101
- <button class="pill active" data-prefix="200">200</button>
1102
- <button class="pill" data-prefix="400">400</button>
1103
- </span>
1104
- <button id="dfold-go" class="action primary">▶ fold</button>
1105
- <span class="status" id="dfold-status"><span class="dot"></span><span>idle</span></span>
1106
  </div>
1107
 
1108
  <div class="gene-info" id="dfold-info">loading genes…</div>
1109
-
1110
- <div class="seq-label">
1111
- carbon-translated protein
1112
- <span style="color:#b00020">· mismatches vs reference highlighted</span>
1113
  </div>
1114
- <div class="seq-block" id="dfold-aa">— click fold —</div>
1115
 
1116
  <div class="fold-grid">
1117
  <div class="fold-viewer-col">
@@ -1131,7 +1217,7 @@
1131
  <div class="fold-legend">
1132
  pLDDT
1133
  <span class="fold-legend-bar" aria-hidden="true"></span>
1134
- low → high · drag to rotate · scroll to zoom
1135
  </div>
1136
 
1137
  <div class="stat-row" id="dfold-stats">
@@ -1158,7 +1244,8 @@
1158
  <div class="section-num">§6 · Embedding space</div>
1159
  <div class="section-title">The genome, organized</div>
1160
  <p class="lede">
1161
- Embed half a million sequences from 24 eukaryotes with Carbon, project to 2D
 
1162
  with UMAP, color by anything. Switch the coloring and a completely different
1163
  organization emerges from the same points — the model's embedding space
1164
  carries multiple axes of biology at once, none of which were ever labeled.
@@ -1171,11 +1258,10 @@
1171
  <button class="pill active" data-color="species">species</button>
1172
  <button class="pill" data-color="biotype">biotype</button>
1173
  <button class="pill" data-color="strand">strand</button>
1174
- <button class="pill" data-color="phase">codon phase</button>
1175
  </span>
1176
  <span class="spacer"></span>
1177
- <button id="dumap-reset" class="action">↺ reset view</button>
1178
- <span class="status" id="dumap-status"><span class="dot"></span><span>idle</span></span>
1179
  </div>
1180
 
1181
  <div class="gene-info" id="dumap-info">scroll to zoom · drag to pan · hover for details</div>
@@ -1183,7 +1269,7 @@
1183
  <div class="umap-frame">
1184
  <canvas class="umap-canvas" id="dumap-canvas"></canvas>
1185
  <div class="umap-tooltip" id="dumap-tooltip"></div>
1186
- <div class="umap-status-overlay" id="dumap-overlay">loading 500K points · ~2 MB gzipped</div>
1187
  </div>
1188
 
1189
  <div class="umap-legend" id="dumap-legend"></div>
@@ -1199,10 +1285,12 @@
1199
  <div class="takeaway">
1200
  <strong>What to look for</strong>
1201
  Switch coloring from <em>species</em> to <em>biotype</em>: same points, completely
1202
- different organization emerges. The five rough macro-clusters trace the eukaryotic
1203
- kingdoms — vertebrates, invertebrates, plants, fungi, protozoa — discovered from
1204
- raw sequence alone. <em>The current 500K dataset is synthetic, awaiting the real
1205
- Carbon 3B embeddings.</em>
 
 
1206
  </div>
1207
  </section>
1208
 
@@ -1673,7 +1761,7 @@ function loadGenes() {
1673
 
1674
  function renderInfo() {
1675
  if (!gene) { els.info.textContent = "loading genes…"; return; }
1676
- els.info.innerHTML = `<strong>${gene.symbol}</strong> · ${gene.blurb} · <span style="color:#888">${gene.length.toLocaleString()} bp</span>`;
1677
  }
1678
 
1679
  function basesPerLine() {
@@ -2124,7 +2212,7 @@ function loadGenes() {
2124
  if (!v) return;
2125
  selected = v;
2126
  els.pills.querySelectorAll(".pill").forEach(p => p.classList.toggle("active", p.dataset.rs === rs));
2127
- els.info.innerHTML = `<strong>${v.name}</strong> · ${v.blurb} · <span style="color:#888">chr${v.chrom}:${v.pos.toLocaleString()} · ${v.ref}>${v.alt} (gene strand)</span>`;
2128
  renderWindowDisplay(v, "ref");
2129
  renderResult(v);
2130
  renderForestBars();
@@ -2293,7 +2381,7 @@ function loadGenes() {
2293
  }
2294
 
2295
  els.chart.innerHTML = svg;
2296
- els.bpLabel.textContent = `${scoredLen.toLocaleString()} bp scored`;
2297
  }
2298
 
2299
  function updateStats() {
@@ -2388,7 +2476,7 @@ function loadGenes() {
2388
  if (!g) return;
2389
  gene = g;
2390
  els.pills.querySelectorAll(".pill").forEach(p => p.classList.toggle("active", p.dataset.gene === symbol));
2391
- els.info.innerHTML = `<strong>${gene.symbol}</strong> · ${gene.blurb} · <span style="color:#888">${Math.min(gene.length, MAX_WINDOW).toLocaleString()} bp will be scored${gene.length > MAX_WINDOW ? ` (of ${gene.length.toLocaleString()})` : ""}</span>`;
2392
  scoreData = cache[symbol] || null;
2393
  renderTrack(scoreData ? scoreData.scoredLength : Math.min(gene.length, MAX_WINDOW));
2394
  renderChart();
@@ -2810,7 +2898,11 @@ function loadGenes() {
2810
  pills: document.getElementById("dfold-pills"),
2811
  prefixPills: document.getElementById("dfold-prefix-pills"),
2812
  info: document.getElementById("dfold-info"),
 
2813
  aa: document.getElementById("dfold-aa"),
 
 
 
2814
  go: document.getElementById("dfold-go"),
2815
  status: document.getElementById("dfold-status"),
2816
  statusText: document.querySelector("#dfold-status span:last-child"),
@@ -2822,38 +2914,122 @@ function loadGenes() {
2822
  identity: document.getElementById("dfold-id"),
2823
  };
2824
 
 
 
 
2825
  function setStatus(text, cls) {
 
2826
  els.status.className = "status" + (cls ? " " + cls : "");
2827
- els.statusText.textContent = text;
2828
  }
2829
 
2830
  function renderInfo(extra = "") {
2831
  const g = GENES_LOCAL?.find(x => x.symbol === currentGeneSymbol);
2832
  if (!g) { els.info.textContent = "—"; return; }
2833
  const blurb = g.blurb ? ` · ${g.blurb}` : "";
2834
- const f = geneFeasibility(g);
2835
- const tooLong = !f.feasible
2836
- ? ` · <span class="fold-warn">${f.lastExonEnd.toLocaleString()} bp to last exon — too long for live fold</span>`
2837
- : "";
2838
- els.info.innerHTML = `<strong>${g.symbol}</strong> · ${g.length} bp${blurb}${tooLong}` + (extra ? ` · ${extra}` : "");
2839
- }
2840
-
2841
- // Render Carbon's translated protein with mismatches vs the reference AA
2842
- // highlighted in red — mirrors §1's reference-row mismatch styling so the
2843
- // visual grammar carries over.
2844
  function renderAAComparison(carbonAA, refAA) {
2845
- const n = carbonAA.length;
2846
- const parts = new Array(n);
2847
- for (let i = 0; i < n; i++) {
2848
- const c = carbonAA[i];
2849
- const r = refAA[i];
2850
- if (r === undefined || c !== r) parts[i] = `<span class="ref-mismatch">${c}</span>`;
2851
- else parts[i] = c;
 
 
 
2852
  }
2853
- // Soft-wrap at 60 chars to match the §1 sequence blocks.
2854
- let html = "";
2855
- for (let i = 0; i < n; i += 60) html += parts.slice(i, i + 60).join("") + "\n";
2856
- els.aa.innerHTML = html;
2857
  }
2858
 
2859
  // Hydrate the viewers and stat row from a precomputed `fold_example`
@@ -2862,6 +3038,7 @@ function loadGenes() {
2862
  // can still trigger a fresh run with the ▶ fold button.
2863
  function hydrateFoldExample(ex) {
2864
  if (!ensureViewers()) return false;
 
2865
  renderStructure(viewerCarbon, ex.carbon_pdb);
2866
  renderStructure(viewerRef, ex.ref_pdb);
2867
  els.nRes.textContent = `${ex.carbon_aa.length} / ${ex.ref_aa.length}`;
@@ -2872,18 +3049,25 @@ function loadGenes() {
2872
  el.classList.remove("muted");
2873
  }
2874
  renderAAComparison(ex.carbon_aa, ex.ref_aa);
2875
- setStatus("cached example · click fold to run fresh", "");
2876
  return true;
2877
  }
2878
 
 
 
 
 
 
 
2879
  function resetFoldUI() {
2880
- els.aa.innerHTML = "— click fold —";
2881
  for (const el of [els.nRes, els.plddtC, els.plddtR, els.identity]) {
2882
  el.textContent = "—";
2883
  el.classList.add("muted");
2884
  }
2885
  if (viewerCarbon) { viewerCarbon.removeAllModels(); viewerCarbon.render(); }
2886
  if (viewerRef) { viewerRef.removeAllModels(); viewerRef.render(); }
 
2887
  }
2888
 
2889
  function selectGene(symbol) {
@@ -2892,6 +3076,7 @@ function loadGenes() {
2892
  p.classList.toggle("active", p.dataset.gene === symbol)
2893
  );
2894
  renderInfo();
 
2895
  const g = GENES_LOCAL?.find(x => x.symbol === symbol);
2896
  if (g?.fold_example) {
2897
  // 3Dmol might not be loaded on the very first paint; retry shortly.
@@ -2904,7 +3089,11 @@ function loadGenes() {
2904
  }
2905
  }
2906
 
 
 
 
2907
  function bindPrefixPills() {
 
2908
  els.prefixPills.querySelectorAll(".pill").forEach(p => {
2909
  p.addEventListener("click", () => {
2910
  prefixLen = +p.dataset.prefix;
@@ -2926,14 +3115,80 @@ function loadGenes() {
2926
  function makeViewer(host) {
2927
  if (!window.$3Dmol) return null;
2928
  host.innerHTML = "";
2929
- return $3Dmol.createViewer(host, { backgroundColor: "#fafaf7", antialias: true });
2930
- }
2931
-
2932
- // Create both viewers (idempotent) and link them so a drag/zoom on one
2933
- // propagates to the other. Mirrors the side-by-side "synced cameras"
2934
- // setup PyMOL/ChimeraX use for structure comparison: the visitor sees
2935
- // the same orientation of both proteins, which is what makes the
2936
- // visual comparison meaningful.
2937
  let viewersLinked = false;
2938
  function ensureViewers() {
2939
  if (!window.$3Dmol) return false;
@@ -2945,6 +3200,7 @@ function loadGenes() {
2945
  viewerRef.linkViewer(viewerCarbon);
2946
  viewersLinked = true;
2947
  }
 
2948
  return !!(viewerCarbon && viewerRef);
2949
  }
2950
 
@@ -2964,13 +3220,28 @@ function loadGenes() {
2964
  function setRunning(running, label = "computing") {
2965
  for (const host of [els.vCarbon, els.vRef]) {
2966
  host.classList.toggle("running", running);
2967
- const t = host.querySelector(".fold-overlay-label");
2968
- if (t) t.textContent = label;
 
 
2969
  }
2970
  for (const el of [els.nRes, els.plddtC, els.plddtR, els.identity]) {
2971
  el.classList.toggle("muted", running);
2972
  }
2973
- els.go.textContent = running ? "running…" : "▶ fold";
2974
  }
2975
 
2976
  // Editorial pLDDT palette. The three anchor colours match the legend
@@ -3032,7 +3303,7 @@ function loadGenes() {
3032
  const f = geneFeasibility(gene);
3033
  if (!f.feasible) {
3034
  setStatus(
3035
- `${gene.symbol} spans ${f.lastExonEnd.toLocaleString()} bp of genomic DNA — ` +
3036
  `outside what Carbon can generate live. Try HBB or INS.`,
3037
  "error"
3038
  );
@@ -3041,7 +3312,7 @@ function loadGenes() {
3041
 
3042
  abortCtrl?.abort();
3043
  abortCtrl = new AbortController();
3044
- els.go.disabled = true;
3045
  ensureViewers(); // overlay must exist before we toggle .running on it
3046
 
3047
  try {
@@ -3105,7 +3376,7 @@ function loadGenes() {
3105
  } finally {
3106
  setRunning(false);
3107
  abortCtrl = null;
3108
- els.go.disabled = false;
3109
  }
3110
  }
3111
 
@@ -3120,7 +3391,7 @@ function loadGenes() {
3120
  );
3121
  selectGene(genes[0].symbol);
3122
  bindPrefixPills();
3123
- els.go.addEventListener("click", runFold);
3124
  }).catch(e => {
3125
  els.info.textContent = "failed to load genes: " + (e.message || e);
3126
  });
@@ -3175,10 +3446,10 @@ function loadGenes() {
3175
 
3176
  const n1 = seq.length;
3177
  const n6 = Math.ceil(seq.length / 6);
3178
- els.oneTok.textContent = n1.toLocaleString();
3179
- els.sixTok.textContent = n6.toLocaleString();
3180
- els.oneAtt.innerHTML = `${(n1*n1).toLocaleString()}<span style="color:#999;font-size:9px;margin-left:3px">L²</span>`;
3181
- els.sixAtt.innerHTML = `${(n6*n6).toLocaleString()}<span style="color:#999;font-size:9px;margin-left:3px">L²</span>`;
3182
 
3183
  // Speedup bars: visualize attention cost ratio
3184
  const maxCost = n1 * n1 || 1;
@@ -3193,7 +3464,7 @@ function loadGenes() {
3193
  svg += `<text x="${padL - 8}" y="${y + 13}" font-family="JetBrains Mono" font-size="11" fill="#333" text-anchor="end">${r.label}</text>`;
3194
  const w = (r.cost / maxCost) * (W - padL - padR);
3195
  svg += `<rect x="${padL}" y="${y}" width="${Math.max(2, w)}" height="${rowH}" fill="${r.color}"/>`;
3196
- svg += `<text x="${padL + w + 6}" y="${y + 13}" font-family="JetBrains Mono" font-size="10" fill="#333">${r.cost.toLocaleString()}</text>`;
3197
  });
3198
  els.bars.setAttribute("viewBox", `0 0 ${W} ${H}`);
3199
  els.bars.style.height = `${H}px`;
@@ -3988,14 +4259,15 @@ function loadGenes() {
3988
  })();
3989
 
3990
  // =========================================================================
3991
- // §6 — UMAP scatter (WebGL, 500K points)
3992
  //
3993
- // Loads a binary-packed scatter (int16 quantized positions + 4 uint8 category
3994
- // columns) and renders it via WebGL gl.POINTS with a 1D palette texture for
3995
- // coloring. Toggle between coloring axes (species / biotype / strand / phase)
3996
- // rebinds a single byte-attribute buffer and swaps the palette texture — no
3997
- // re-upload of the 500K vertex stream. Hover lookup uses a flat grid index
3998
- // so picking stays O(small) regardless of total point count.
 
3999
  // =========================================================================
4000
  (function initDemoUmap() {
4001
  const canvas = document.getElementById("dumap-canvas");
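
The header comment in this hunk mentions a flat grid index for hover lookup. A minimal sketch of such an index follows, assuming points normalized to [0, 1) and a counting-sort bucket layout. `buildGrid` and `pick` are hypothetical names, and the actual structure in demo.html may differ.

```javascript
// Bucket point indices by grid cell using two flat typed arrays:
// `start[c]..start[c + 1]` is the slice of `items` that falls in cell c.
function buildGrid(xs, ys, cells) {
  const n = xs.length;
  const cellOf = (i) => {
    const cx = Math.min(cells - 1, Math.max(0, Math.floor(xs[i] * cells)));
    const cy = Math.min(cells - 1, Math.max(0, Math.floor(ys[i] * cells)));
    return cy * cells + cx;
  };
  const start = new Uint32Array(cells * cells + 1);
  for (let i = 0; i < n; i++) start[cellOf(i) + 1]++;
  for (let c = 0; c < cells * cells; c++) start[c + 1] += start[c];
  const cursor = start.slice(0, cells * cells); // next free slot per cell
  const items = new Uint32Array(n);
  for (let i = 0; i < n; i++) items[cursor[cellOf(i)]++] = i;
  return { cells, start, items };
}

// Nearest point within `maxDist` of (x, y), or -1. Only the 3x3 block of
// cells around the cursor is scanned, so this is exact only when
// maxDist <= 1 / cells (one cell width).
function pick(grid, xs, ys, x, y, maxDist) {
  const { cells, start, items } = grid;
  const cx = Math.floor(x * cells);
  const cy = Math.floor(y * cells);
  let best = -1;
  let bestD2 = maxDist * maxDist;
  for (let gy = cy - 1; gy <= cy + 1; gy++) {
    for (let gx = cx - 1; gx <= cx + 1; gx++) {
      if (gx < 0 || gy < 0 || gx >= cells || gy >= cells) continue;
      const c = gy * cells + gx;
      for (let k = start[c]; k < start[c + 1]; k++) {
        const i = items[k];
        const dx = xs[i] - x;
        const dy = ys[i] - y;
        const d2 = dx * dx + dy * dy;
        if (d2 <= bestD2) { best = i; bestD2 = d2; }
      }
    }
  }
  return best;
}
```

The cost of a lookup depends on how many points share the nine scanned cells, not on the total point count.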
@@ -4005,61 +4277,94 @@ function loadGenes() {
4005
  const info = document.getElementById("dumap-info");
4006
  const legend = document.getElementById("dumap-legend");
4007
  const resetBtn = document.getElementById("dumap-reset");
4008
- const status = document.getElementById("dumap-status");
4009
- const statusText = status.querySelector("span:last-child");
 
 
 
 
 
 
4010
  const colorPills = document.querySelectorAll("#dumap-color-pills .pill");
4011
  const elN = document.getElementById("dumap-n");
4012
  const elNsp = document.getElementById("dumap-nsp");
4013
  const elFps = document.getElementById("dumap-fps");
4014
 
4015
  // ---- Palettes ----------------------------------------------------------
4016
- // 24 species are grouped into 5 kingdoms — each kingdom gets a hue band.
4017
  // Within a band, lightness varies to keep adjacent species distinguishable.
 
4018
  const SPECIES_PALETTE = [
4019
- // vertebrates (9) — blue/indigo band
4020
- [69,117,180],[97,144,200],[125,170,220],[153,194,240],
4021
- [120,90,170],[140,110,190],
4022
- [80,90,150],[100,110,170],[120,130,190],
4023
- // invertebrates (5) — orange band
4024
- [217,95,2],[230,120,30],[240,150,60],[250,180,90],[253,210,120],
4025
  // plants (5) — olive/lime band (intentionally different from Carbon's
4026
  // signal-green #317f3f so the UI chrome doesn't blend with the data)
4027
- [85,140,55],[115,165,75],[145,195,100],[175,220,135],[205,240,170],
4028
- // fungi (3) — magenta/rose band
4029
- [200,40,120],[220,80,140],[240,130,170],
4030
- // protozoa (2) — gold band
4031
- [200,150,30],[230,180,60],
 
 
4032
  ];
 
 
 
 
4033
  const BIOTYPE_PALETTE = [
4034
- [49,127,63], // protein_coding — Carbon green
4035
- [188,46,37], // lncRNA — Carbon red
4036
- [70,90,140], // miRNA - slate blue
4037
- [170,170,170], // pseudogene - neutral gray
4038
  ];
4039
  const STRAND_PALETTE = [
4040
  [49,127,63], // + (forward)
4041
  [188,46,37], // - (reverse)
4042
  ];
4043
- // 3-step ordinal palette (viridis-ish endpoints) for codon phase 0/1/2.
4044
- const PHASE_PALETTE = [
4045
- [68,1,84], [33,144,140], [253,231,37],
4046
- ];
4047
  const PALETTES = {
4048
  species: SPECIES_PALETTE,
4049
  biotype: BIOTYPE_PALETTE,
4050
  strand: STRAND_PALETTE,
4051
- phase: PHASE_PALETTE,
4052
  };
4053
 
4054
  // ---- State -------------------------------------------------------------
4055
  let gl, program;
4056
  let posBuf; // int16 interleaved x,y
4057
- let catBufs = {}; // { species|biotype|strand|phase: GLBuffer of uint8 }
4058
  let paletteTex;
4059
  let n = 0;
4060
- let labels = null; // { species:[], biotypes:[], strands:[], phases:[], bounds:[xmin,xmax,ymin,ymax] }
4061
  // Raw category bytes — kept on CPU side too for tooltip lookups.
4062
- let cats = { species: null, biotype: null, strand: null, phase: null };
4063
  // World bounds + current colorBy axis.
4064
  let bounds = [0,0,0,0];
4065
  let colorBy = "species";
@@ -4072,6 +4377,7 @@ function loadGenes() {
4072
  let grid = null;
4073
 
4074
  function setStatus(state, text) {
 
4075
  status.classList.remove("streaming", "error");
4076
  if (state === "streaming") status.classList.add("streaming");
4077
  if (state === "error") status.classList.add("error");
@@ -4103,9 +4409,13 @@ function loadGenes() {
4103
  float r = length(d);
4104
  float aa = smoothstep(0.50, 0.42, r);
4105
  if (aa <= 0.001) discard;
 
4106
  float t = (v_cat + 0.5) / u_paletteN;
4107
  vec3 color = texture2D(u_palette, vec2(t, 0.5)).rgb;
4108
- gl_FragColor = vec4(color, aa * u_alpha);
 
 
 
4109
  }
4110
  `;
4111
  function compile(type, src) {
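
The removed line writes straight (non-premultiplied) color, while `gl.blendFunc(gl.ONE, gl.ONE_MINUS_SRC_ALPHA)` expects premultiplied input. The arithmetic below sketches why that matters. It is plain JavaScript standing in for the GPU blend stage, and the function names are illustrative. The added shader lines are not visible in this hunk, so the usual fix (`vec4(color * a, a)`) is stated here as an assumption.

```javascript
// Blend equation selected by gl.blendFunc(gl.ONE, gl.ONE_MINUS_SRC_ALPHA):
//   out = src * 1 + dst * (1 - src.a)
function blendOneOneMinusSrcAlpha(src, dst) {
  const k = 1 - src[3];
  return [
    src[0] + dst[0] * k,
    src[1] + dst[1] * k,
    src[2] + dst[2] * k,
    src[3] + dst[3] * k,
  ];
}

// What the removed shader line emitted: rgb untouched, alpha separate.
function straight(rgb, a) {
  return [rgb[0], rgb[1], rgb[2], a];
}

// What this blend function expects: rgb already scaled by alpha.
function premultiplied(rgb, a) {
  return [rgb[0] * a, rgb[1] * a, rgb[2] * a, a];
}

const white = [1, 1, 1, 1];
const sage = [0.2, 0.5, 0.3];
// Straight output adds the full color on top of the background, so
// channels can exceed 1 and dense regions wash out.
console.log(blendOneOneMinusSrcAlpha(straight(sage, 0.22), white));
// Premultiplied output gives the standard "over" composite.
console.log(blendOneOneMinusSrcAlpha(premultiplied(sage, 0.22), white));
```

With straight color, each fragment contributes its full color regardless of alpha, which is consistent with the commit message pairing this fix with a lower base alpha for a softer cloud.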
@@ -4136,7 +4446,9 @@ function loadGenes() {
4136
  // the paper background and over each other cleanly at dense overlaps.
4137
  gl.enable(gl.BLEND);
4138
  gl.blendFunc(gl.ONE, gl.ONE_MINUS_SRC_ALPHA);
4139
- gl.clearColor(1, 1, 1, 0);
 
 
4140
 
4141
  paletteTex = gl.createTexture();
4142
  gl.bindTexture(gl.TEXTURE_2D, paletteTex);
@@ -4160,6 +4472,36 @@ function loadGenes() {
4160
  }
4161
 
4162
  // ---- Data load ---------------------------------------------------------
4163
  async function loadData() {
4164
  setStatus("streaming", "loading…");
4165
  const t0 = performance.now();
@@ -4171,27 +4513,55 @@ function loadGenes() {
4171
  const buf = await binResp.arrayBuffer();
4172
  labels = await labelsResp.json();
4173
 
4174
- // Parse header (matches scripts/gen_fake_umap.py).
 
 
 
 
 
4175
  const hdrU32 = new Uint32Array(buf, 0, 6);
4176
- const magic = hdrU32[0];
4177
  if (magic !== 0xCAB0FA1D) throw new Error("bad magic: " + magic.toString(16));
4178
  n = hdrU32[1];
4179
- const nSp = hdrU32[2], nBt = hdrU32[3], nSt = hdrU32[4], nPh = hdrU32[5];
4180
- const hdrF32 = new Float32Array(buf, 24, 4);
 
 
4181
  bounds = [hdrF32[0], hdrF32[1], hdrF32[2], hdrF32[3]];
 
 
 
4182
 
4183
- let off = 40;
4184
  const pos16 = new Int16Array(buf, off, n * 2); off += n * 2 * 2;
 
 
 
 
 
 
4185
  cats.species = new Uint8Array(buf, off, n); off += n;
4186
  cats.biotype = new Uint8Array(buf, off, n); off += n;
4187
  cats.strand = new Uint8Array(buf, off, n); off += n;
4188
- cats.phase = new Uint8Array(buf, off, n); off += n;
 
 
 
 
 
 
 
 
 
 
 
 
4189
 
4190
  // Upload to GPU.
4191
  posBuf = gl.createBuffer();
4192
  gl.bindBuffer(gl.ARRAY_BUFFER, posBuf);
4193
  gl.bufferData(gl.ARRAY_BUFFER, pos16, gl.STATIC_DRAW);
4194
- for (const key of ["species", "biotype", "strand", "phase"]) {
4195
  const b = gl.createBuffer();
4196
  gl.bindBuffer(gl.ARRAY_BUFFER, b);
4197
  gl.bufferData(gl.ARRAY_BUFFER, cats[key], gl.STATIC_DRAW);
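
The removed lines in this hunk describe the previous binary layout: six `uint32` header fields, four `float32` bounds at byte 24, then `int16` positions from byte 40 followed by one `uint8` column per category. Below is a self-contained sketch of a reader for that old layout, plus a hypothetical `makeScatter` writer used only to build test input. The new layout introduced by this commit is not visible here and may differ.

```javascript
const MAGIC = 0xCAB0FA1D; // magic value checked by the removed code
const KEYS = ["species", "biotype", "strand", "phase"];

// Reader for the old layout. Typed-array views use the platform byte
// order, so the file is assumed to be written on a matching platform.
function parseScatter(buf) {
  const hdrU32 = new Uint32Array(buf, 0, 6);
  if (hdrU32[0] !== MAGIC) throw new Error("bad magic: " + hdrU32[0].toString(16));
  const n = hdrU32[1];
  const counts = {};
  KEYS.forEach((key, c) => { counts[key] = hdrU32[2 + c]; });
  const bounds = Array.from(new Float32Array(buf, 24, 4)); // xmin,xmax,ymin,ymax
  let off = 40;
  const pos16 = new Int16Array(buf, off, n * 2); off += n * 2 * 2;
  const cats = {};
  for (const key of KEYS) {
    cats[key] = new Uint8Array(buf, off, n); off += n;
  }
  return { n, counts, bounds, pos16, cats };
}

// Hypothetical writer (not part of demo.html), used to exercise the reader.
function makeScatter(points, bounds, counts) {
  const n = points.length;
  const buf = new ArrayBuffer(40 + n * 4 + n * KEYS.length);
  new Uint32Array(buf, 0, 6).set([MAGIC, n, ...counts]);
  new Float32Array(buf, 24, 4).set(bounds);
  const pos16 = new Int16Array(buf, 40, n * 2);
  points.forEach((p, i) => {
    pos16[2 * i] = p.x;
    pos16[2 * i + 1] = p.y;
    KEYS.forEach((key, c) => {
      new Uint8Array(buf, 40 + n * 4 + c * n, n)[i] = p[key];
    });
  });
  return buf;
}
```

The views returned by `parseScatter` alias the original `ArrayBuffer`, so nothing is copied before the `gl.bufferData` uploads.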
@@ -4216,7 +4586,7 @@ function loadGenes() {
4216
 
4217
  const ms = (performance.now() - t0) | 0;
4218
  setStatus("idle", `loaded ${(n/1000)|0}k pts · ${ms} ms`);
4219
- info.textContent = `${n.toLocaleString("en-US")} sequences · ${labels.species.length} eukaryotic species · drag to pan, wheel to zoom`;
4220
  overlay.classList.add("hidden");
4221
 
4222
  return pos16;
@@ -4274,8 +4644,9 @@ function loadGenes() {
4274
  // but the dots get visibly bigger when you zoom in.
4275
  const ps = Math.min(8.0, Math.max(1.4, 1.4 + 0.6 * Math.log2(view.scale + 1))) * dpr;
4276
  gl.uniform1f(gl.getUniformLocation(program, "u_pointSize"), ps);
4277
- // Alpha falls off slightly with zoom-out so the dense cloud doesn't burn.
4278
- const alpha = Math.min(0.85, Math.max(0.35, 0.35 + 0.18 * Math.log2(view.scale + 1)));
 
4279
  gl.uniform1f(gl.getUniformLocation(program, "u_alpha"), alpha);
4280
 
4281
  gl.drawArrays(gl.POINTS, 0, n);
@@ -4314,11 +4685,28 @@ function loadGenes() {
4314
  // ---- Legend ------------------------------------------------------------
4315
  function renderLegend() {
4316
  if (!labels) return;
4317
  const palette = PALETTES[colorBy];
4318
  const itemLabels = (colorBy === "species") ? labels.species
4319
  : (colorBy === "biotype") ? labels.biotypes
4320
- : (colorBy === "strand") ? labels.strands
4321
- : labels.phases;
4322
  legend.innerHTML = itemLabels.map((name, i) => {
4323
  const [r, g, b] = palette[i % palette.length];
4324
  return `<span class="item"><span class="swatch" style="background:rgb(${r},${g},${b})"></span>${name}</span>`;
@@ -4326,7 +4714,37 @@ function loadGenes() {
4326
  }
4327
 
4328
  // ---- Pan / zoom / hover ------------------------------------------------
4329
- function resetView() { view = { tx: 0, ty: 0, scale: 1 }; requestRedraw(); }
4330
 
4331
  // Convert a clientX/Y to NDC (-1..1) and to normalized data space ([-1, 1]).
4332
  function clientToNDC(e) {
@@ -4358,6 +4776,8 @@ function loadGenes() {
4358
  const dx = ((e.clientX - panLast.x) / rect.width) * 2;
4359
  const dy = -((e.clientY - panLast.y) / rect.height) * 2;
4360
  view.tx += dx; view.ty += dy;
 
 
4361
  panLast = { x: e.clientX, y: e.clientY };
4362
  requestRedraw();
4363
  } else {
@@ -4379,13 +4799,19 @@ function loadGenes() {
4379
  const ndc = clientToNDC(e);
4380
  // Zoom factor — natural feeling on both trackpad and mouse wheel.
4381
  const factor = Math.exp(-e.deltaY * 0.0018);
4382
- const newScale = Math.min(50, Math.max(0.5, view.scale * factor));
 
 
 
 
4383
  const k = newScale / view.scale;
4384
  // Zoom around the cursor: shift translate so the point under the cursor
4385
  // stays under the cursor.
4386
  view.tx = ndc.x - (ndc.x - view.tx) * k;
4387
  view.ty = ndc.y - (ndc.y - view.ty) * k;
4388
  view.scale = newScale;
 
 
4389
  requestRedraw();
4390
  hideTooltip();
4391
  }, { passive: false });
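
A sketch of the clamped zoom described in the commit message (min scale = 1, translation bounded to the data extent), assuming data normalized to [-1, 1] and the transform `ndc = data * scale + t`. Under that assumption the cloud covers the whole viewport exactly when `|t| <= scale - 1`. `clampView` and `zoomAt` are hypothetical names, the upper bound of 50 is carried over from the removed line, and aspect-ratio handling is ignored.

```javascript
const MIN_SCALE = 1;  // from the commit message
const MAX_SCALE = 50; // upper bound used by the removed line

// Keep the scaled data extent [-scale + t, scale + t] covering [-1, 1].
function clampView(view) {
  const scale = Math.min(MAX_SCALE, Math.max(MIN_SCALE, view.scale));
  const lim = scale - 1;
  return {
    scale,
    tx: Math.min(lim, Math.max(-lim, view.tx)),
    ty: Math.min(lim, Math.max(-lim, view.ty)),
  };
}

// Zoom around the cursor: the data point under `ndc` stays under it
// unless the clamp has to pull the view back inside the data extent.
function zoomAt(view, ndc, deltaY) {
  const factor = Math.exp(-deltaY * 0.0018);
  const newScale = Math.min(MAX_SCALE, Math.max(MIN_SCALE, view.scale * factor));
  const k = newScale / view.scale;
  return clampView({
    scale: newScale,
    tx: ndc.x - (ndc.x - view.tx) * k,
    ty: ndc.y - (ndc.y - view.ty) * k,
  });
}
```

The same `clampView` can be applied after a pan step, so dragging can never expose the area outside the cloud.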
@@ -4397,11 +4823,11 @@ function loadGenes() {
4397
  const sp = labels.species[cats.species[idx]];
4398
  const bt = labels.biotypes[cats.biotype[idx]];
4399
  const st = labels.strands[cats.strand[idx]];
4400
- const ph = labels.phases[cats.phase[idx]];
4401
  tooltip.innerHTML =
4402
  `<div><span class="t-label">species</span>${sp}</div>` +
4403
  `<div><span class="t-label">biotype</span>${bt}</div>` +
4404
- `<div><span class="t-label">strand</span>${st} &nbsp; <span class="t-label">phase</span>${ph}</div>`;
4405
  tooltip.style.left = x + "px";
4406
  tooltip.style.top = y + "px";
4407
  tooltip.classList.add("visible");
@@ -4468,7 +4894,7 @@ function loadGenes() {
4468
  });
4469
  });
4470
 
4471
- // Defer loading until the umap section is near the viewport — 500K points
4472
  // doesn't need to fight for bandwidth on first paint.
4473
  const io = new IntersectionObserver(async (entries) => {
4474
  if (!entries[0].isIntersecting) return;
 
296
  position: relative;
297
  width: 100%;
298
  aspect-ratio: 16 / 10;
299
+ /* Slight off-white that matches the editorial paper tone (body uses
300
+ #f7f5ee). Pure white made the desaturated minority biotypes vanish
301
+ into the page and made the saturated palette look harsh. */
302
+ background: #fbfaf6;
303
+ border: 1px solid #e5e3da;
304
  overflow: hidden;
305
  }
306
  .umap-canvas {
 
364
  align-items: center;
365
  cursor: default;
366
  }
367
+ .umap-legend .item.gc-grad {
368
+ gap: 8px;
369
+ }
370
+ .umap-legend .item.gc-grad svg {
371
+ border-radius: 2px;
372
+ display: block;
373
+ }
374
+ .umap-legend .item.gc-grad .gc-ticks {
375
+ letter-spacing: 0.5px;
376
+ color: #888;
377
+ }
378
 
379
  /* --- Gene-completion specifics (§1) --- */
380
  .gene-info {
 
407
  display: inline-block; width: 8px; height: 8px; vertical-align: middle;
408
  margin-right: 4px; border-radius: 1px;
409
  }
410
+ /* Inline tag chips used in §5 to disambiguate carbon vs reference rows.
411
+ Same shape/size, different colour band so the eye instantly maps a
412
+ row of AAs to the correct identity without re-reading the full label. */
413
+ .seq-label .seq-tag {
414
+ display: inline-block;
415
+ font-size: 9px; letter-spacing: 1.5px;
416
+ padding: 1px 6px; margin-right: 8px;
417
+ border-radius: 2px;
418
+ text-transform: uppercase;
419
+ font-weight: 600;
420
+ }
421
+ .seq-label .seq-tag.carbon { background: #1f1f1d; color: #f7f5ee; }
422
+ .seq-label .seq-tag.ref { background: #f0eee5; color: #555; border: 1px solid #d8d5c8; }
423
+ .seq-label .aa-len-tag {
424
+ color: #1f1f1d;
425
+ font-variant-numeric: tabular-nums;
426
+ text-transform: none;
427
+ letter-spacing: 0.3px;
428
+ }
429
  .stat-row {
430
  display: flex; flex-wrap: wrap; gap: 24px;
431
  margin-top: 14px; padding-top: 12px; border-top: 1px solid #eee;
 
481
  }
482
  .fold-viewer.running .fold-overlay { display: flex; }
483
  .fold-viewer.running canvas { opacity: 0.28; }
484
+ /* Same overlay reused for genes whose precomputed fixture isn't ready
485
+ yet (HF endpoint downtime, fresh symbol added to the list, etc.).
486
+ Canvas fades almost fully so the empty WebGL frame doesn't read as
487
+ a bug — the overlay carries the explanation instead. */
488
+ .fold-viewer.pending .fold-overlay { display: flex; }
489
+ .fold-viewer.pending canvas { opacity: 0.08; }
490
  .fold-legend {
491
  font-family: "JetBrains Mono", monospace;
492
  font-size: 9px; color: #888; text-transform: uppercase; letter-spacing: 1.2px;
 
509
  padding: 1px 6px;
510
  border-radius: 2px;
511
  }
512
+ /* Materialises the DNA → mRNA → protein arrow under the gene info,
513
+ using the same monospace family/colour family as the rest of the
514
+ metadata strip. The chevron is drawn with → to read as a flow,
515
+ not a list. */
516
+ .mrna-info {
517
+ font-family: "JetBrains Mono", monospace;
518
+ font-size: 11px;
519
+ color: #888;
520
+ margin: 4px 0 16px;
521
+ letter-spacing: 0.3px;
522
+ }
523
+ .mrna-info .arrow { color: #b8b8b6; padding: 0 6px; }
524
+ .mrna-info strong { color: #555; font-weight: 500; }
525
+ .mrna-info .mrna-trunc {
526
+ color: #b00020;
527
+ background: rgba(188, 46, 37, 0.08);
528
+ padding: 0 4px;
529
+ margin-left: 6px;
530
+ border-radius: 2px;
531
+ }
532
+ /* Two-column AA grid: Carbon (left) / Reference (right), mirroring the
533
+ fold-grid below so the eye lines up "carbon prediction → carbon
534
+ fold" on one side and "reference truth → reference fold" on the
535
+ other. Stacks on narrow screens to keep each line readable. */
536
+ .fold-aa-grid {
537
+ display: grid; grid-template-columns: 1fr 1fr; gap: 16px;
538
+ margin-top: 4px;
539
+ }
540
+ .fold-aa-col { display: flex; flex-direction: column; min-width: 0; }
541
+ /* Soft-wrap as a safety net if the wrapped 40-char line ever still
542
+ overflows (very narrow viewport, big font-size override, etc.).
543
+ The JS still inserts \n every 40 chars so Carbon and Reference
544
+ line up row-by-row in the common case. */
545
+ .fold-aa-col .seq-block { white-space: pre-wrap; word-break: break-all; overflow-x: visible; }
546
  @media (max-width: 720px) {
547
  .fold-grid { grid-template-columns: 1fr; }
548
+ .fold-aa-grid { grid-template-columns: 1fr; }
549
  }
550
 
551
  /* Mismatch highlighting in reference row */
 
1165
  </p>
1166
 
1167
  <div class="demo" id="demoFold">
1168
+ <!-- Cached-only UI: live fold UI (prefix selector, ▶ fold button,
1169
+ status indicator) is intentionally not rendered. The pipeline
1170
+ JS (runFold/streamGenerate/postFold) and the backend /fold
1171
+ endpoint are still in place — see commit history or app.py if
1172
+ you want to wire interactivity back in. -->
1173
  <div class="demo-toolbar">
1174
  <span>gene</span>
1175
  <span id="dfold-pills" class="pills"></span>
 
 
 
 
 
 
 
 
 
1176
  </div>
1177
 
1178
  <div class="gene-info" id="dfold-info">loading genes…</div>
1179
+ <!-- Materialises the §5 lede's "DNA → mRNA → protein → 3D" arrow:
1180
+ shows the genomic→mature-mRNA→protein progression for the
1181
+ currently selected gene, so the splicing step (which both the
1182
+ reference and Carbon's continuation go through) isn't invisible. -->
1183
+ <div class="mrna-info" id="dfold-mrna">—</div>
1184
+
1185
+ <div class="fold-aa-grid">
1186
+ <div class="fold-aa-col">
1187
+ <div class="seq-label" id="dfold-aa-label">
1188
+ carbon-translated protein
1189
+ <span style="color:#b00020">· mismatches vs reference highlighted</span>
1190
+ </div>
1191
+ <div class="seq-block" id="dfold-aa">— click fold —</div>
1192
+ </div>
1193
+ <div class="fold-aa-col">
1194
+ <div class="seq-label" id="dfold-ref-aa-label">
1195
+ reference protein
1196
+ <span style="color:#b00020">· same positions highlighted</span>
1197
+ </div>
1198
+ <div class="seq-block" id="dfold-ref-aa">—</div>
1199
+ </div>
1200
  </div>
 
1201
 
1202
  <div class="fold-grid">
1203
  <div class="fold-viewer-col">
 
1217
  <div class="fold-legend">
1218
  pLDDT
1219
  <span class="fold-legend-bar" aria-hidden="true"></span>
1220
+ low → high · drag to rotate
1221
  </div>
1222
 
1223
  <div class="stat-row" id="dfold-stats">
 
1244
  <div class="section-num">§6 · Embedding space</div>
1245
  <div class="section-title">The genome, organized</div>
1246
  <p class="lede">
1247
+ Embed 571,810 sequences from 27 species across six kingdoms (vertebrates,
1248
+ invertebrates, plants, fungi, bacteria, viruses) with Carbon, project to 2D
1249
  with UMAP, color by anything. Switch the coloring and a completely different
1250
  organization emerges from the same points — the model's embedding space
1251
  carries multiple axes of biology at once, none of which were ever labeled.
 
1258
  <button class="pill active" data-color="species">species</button>
1259
  <button class="pill" data-color="biotype">biotype</button>
1260
  <button class="pill" data-color="strand">strand</button>
1261
+ <button class="pill" data-color="gc">gc content</button>
1262
  </span>
1263
  <span class="spacer"></span>
1264
+ <button id="dumap-reset" class="action" disabled>↺ reset view</button>
 
1265
  </div>
1266
 
1267
  <div class="gene-info" id="dumap-info">scroll to zoom · drag to pan · hover for details</div>
 
1269
  <div class="umap-frame">
1270
  <canvas class="umap-canvas" id="dumap-canvas"></canvas>
1271
  <div class="umap-tooltip" id="dumap-tooltip"></div>
1272
+ <div class="umap-status-overlay" id="dumap-overlay">loading 571K points · ~5.8 MB gzipped</div>
1273
  </div>
1274
 
1275
  <div class="umap-legend" id="dumap-legend"></div>
 
1285
  <div class="takeaway">
1286
  <strong>What to look for</strong>
1287
  Switch coloring from <em>species</em> to <em>biotype</em>: same points, completely
1288
+ different organization emerges. The macro-clusters trace six kingdoms — vertebrates,
1289
+ invertebrates, plants, fungi, bacteria, viruses — discovered from raw sequence alone.
1290
+ Switch again to <em>gc content</em> and a perpendicular axis appears: AT-rich (cool
1291
+ blue) vs GC-rich (warm amber) regions cut across the species clusters, revealing the
1292
+ composition gradient the model has internalised. <em>Points: 571,810 real Carbon 3B
1293
+ embeddings, projected to 2D via UMAP.</em>
1294
  </div>
1295
  </section>
1296
 
 
1761
 
1762
  function renderInfo() {
1763
  if (!gene) { els.info.textContent = "loading genes…"; return; }
1764
+ els.info.innerHTML = `<strong>${gene.symbol}</strong> · ${gene.blurb} · <span style="color:#888">${gene.length.toLocaleString("en-US")} bp</span>`;
1765
  }
1766
 
1767
  function basesPerLine() {
 
2212
  if (!v) return;
2213
  selected = v;
2214
  els.pills.querySelectorAll(".pill").forEach(p => p.classList.toggle("active", p.dataset.rs === rs));
2215
+ els.info.innerHTML = `<strong>${v.name}</strong> · ${v.blurb} · <span style="color:#888">chr${v.chrom}:${v.pos.toLocaleString("en-US")} · ${v.ref}>${v.alt} (gene strand)</span>`;
2216
  renderWindowDisplay(v, "ref");
2217
  renderResult(v);
2218
  renderForestBars();
 
2381
  }
2382
 
2383
  els.chart.innerHTML = svg;
2384
+ els.bpLabel.textContent = `${scoredLen.toLocaleString("en-US")} bp scored`;
2385
  }
2386
 
2387
  function updateStats() {
 
2476
  if (!g) return;
2477
  gene = g;
2478
  els.pills.querySelectorAll(".pill").forEach(p => p.classList.toggle("active", p.dataset.gene === symbol));
2479
+ els.info.innerHTML = `<strong>${gene.symbol}</strong> · ${gene.blurb} · <span style="color:#888">${Math.min(gene.length, MAX_WINDOW).toLocaleString("en-US")} bp will be scored${gene.length > MAX_WINDOW ? ` (of ${gene.length.toLocaleString("en-US")})` : ""}</span>`;
2480
  scoreData = cache[symbol] || null;
2481
  renderTrack(scoreData ? scoreData.scoredLength : Math.min(gene.length, MAX_WINDOW));
2482
  renderChart();
 
2898
  pills: document.getElementById("dfold-pills"),
2899
  prefixPills: document.getElementById("dfold-prefix-pills"),
2900
  info: document.getElementById("dfold-info"),
2901
+ mrna: document.getElementById("dfold-mrna"),
2902
  aa: document.getElementById("dfold-aa"),
2903
+ aaLabel: document.getElementById("dfold-aa-label"),
2904
+ refAa: document.getElementById("dfold-ref-aa"),
2905
+ refAaLabel: document.getElementById("dfold-ref-aa-label"),
2906
  go: document.getElementById("dfold-go"),
2907
  status: document.getElementById("dfold-status"),
2908
  statusText: document.querySelector("#dfold-status span:last-child"),
 
2914
  identity: document.getElementById("dfold-id"),
2915
  };
2916
 
2917
+ // No-ops gracefully when the status indicator isn't rendered (current
2918
+ // cached-only UI doesn't ship one). All call sites are kept so the
2919
+ // live-fold path stays a drop-in restore.
2920
  function setStatus(text, cls) {
2921
+ if (!els.status) return;
2922
  els.status.className = "status" + (cls ? " " + cls : "");
2923
+ if (els.statusText) els.statusText.textContent = text;
2924
  }
2925
 
2926
  function renderInfo(extra = "") {
2927
  const g = GENES_LOCAL?.find(x => x.symbol === currentGeneSymbol);
2928
  if (!g) { els.info.textContent = "—"; return; }
2929
  const blurb = g.blurb ? ` · ${g.blurb}` : "";
2930
+ els.info.innerHTML = `<strong>${g.symbol}</strong> · ${g.length.toLocaleString("en-US")} bp${blurb}` + (extra ? ` · ${extra}` : "");
2931
+ }
2932
+
2933
+ // Render the "DNA → mRNA → protein" progression for the current gene
2934
+ // by reusing the same splicing + ORF logic the rest of the pipeline
2935
+ // runs on the reference side. The numbers shown are gene-intrinsic
2936
+ // (architecture of the gene + canonical reference protein), so they
2937
+ // hold whether the user has clicked fold yet or not — they materialise
2938
+ // the splicing step that's otherwise invisible between the toolbar
2939
+ // and the AA block.
2940
+ //
2941
+ // Prefix is "reference:" because every number here comes from the canonical
2942
+ // sequence in genes.json, NOT from Carbon's prediction. Without the prefix
2943
+ // it's easy to read the strip, scroll past it, and assume the AA block
2944
+ // below shows that same length — but Carbon's ORF is usually shorter
2945
+ // (e.g. HBB ref 147 aa vs Carbon 131 aa).
2946
+ function renderMRNAInfo() {
2947
+ const g = GENES_LOCAL?.find(x => x.symbol === currentGeneSymbol);
2948
+ if (!g) { els.mrna.textContent = "—"; return; }
2949
+ const mrna = spliceExons(g.seq, g.exons);
2950
+ const orf = findLongestORF(mrna, 30);
2951
+ const genomicBP = g.length;
2952
+ const mrnaBP = mrna.length;
2953
+ const nExons = g.exons.length;
2954
+ if (!orf) {
2955
+ els.mrna.innerHTML =
2956
+ `<strong>${genomicBP.toLocaleString("en-US")} bp</strong> genomic` +
2957
+ ` · <strong>${nExons}</strong> exon${nExons === 1 ? "" : "s"}` +
2958
+ ` <span class="arrow">→</span> <strong>${mrnaBP.toLocaleString("en-US")} bp</strong> mRNA` +
2959
+ ` <span class="arrow">→</span> no ORF ≥30 aa`;
2960
+ return;
2961
+ }
2962
+ const trunc = orf.truncated
2963
+ ? `<span class="mrna-trunc">truncated · no stop codon</span>` : "";
2964
+ els.mrna.innerHTML =
2965
+ `<strong>${genomicBP.toLocaleString("en-US")} bp</strong> genomic` +
2966
+ ` · <strong>${nExons}</strong> exon${nExons === 1 ? "" : "s"}` +
2967
+ ` <span class="arrow">→</span> <strong>${mrnaBP.toLocaleString("en-US")} bp</strong> mRNA` +
2968
+ ` <span class="arrow">→</span> <strong>${orf.aa.length} aa</strong>` +
2969
+ ` from ATG @ ${orf.startBP + 1}${trunc}`;
2970
+ }
2971
+
2972
+ // Render Carbon's translated protein AND the reference protein side by
2973
+ // side, with mismatches highlighted in red on both rows so the visitor
2974
+ // can read the divergence in either direction. Mirrors §1's two-row
2975
+ // model-output / reference layout so the visual grammar carries over.
2976
+ //
2977
+ // Length asymmetry handling:
2978
+ // - When Carbon's ORF is shorter than the reference (typical case),
2979
+ // positions past Carbon's end are highlighted on the reference row
2980
+ // only — they materialise "Carbon stopped early".
2981
+ // - When Carbon's ORF is longer than the reference (rarer), positions
2982
+ // past the reference's end are highlighted on Carbon's row — they
2983
+ // materialise "Carbon kept reading past the real stop codon".
2984
  function renderAAComparison(carbonAA, refAA) {
2985
+ const nC = carbonAA.length;
2986
+ const nR = refAA.length;
2987
+
2988
+ // Carbon row: render every position of carbon, highlight when c[i] != r[i]
2989
+ // (or when ref ran out at i — extra Carbon residue).
2990
+ const cParts = new Array(nC);
2991
+ for (let i = 0; i < nC; i++) {
2992
+ const c = carbonAA[i], r = refAA[i];
2993
+ cParts[i] = (r === undefined || c !== r)
2994
+ ? `<span class="ref-mismatch">${c}</span>` : c;
2995
  }
2996
+ // Reference row: symmetric. Render every position of ref, highlight
2997
+ // when r[i] != c[i] (or when carbon ran out — Carbon stopped early).
2998
+ const rParts = new Array(nR);
2999
+ for (let i = 0; i < nR; i++) {
3000
+ const r = refAA[i], c = carbonAA[i];
3001
+ rParts[i] = (c === undefined || r !== c)
3002
+ ? `<span class="ref-mismatch">${r}</span>` : r;
3003
+ }
3004
+ // Soft-wrap at 40 chars — the two columns are narrower than §1's
3005
+ // single-column block, so a tighter wrap keeps lines from spilling
3006
+ // and lets the eye scan Carbon ↔ Reference at the same y position.
3007
+ const wrap = parts => {
3008
+ let out = "";
3009
+ for (let i = 0; i < parts.length; i += 40) out += parts.slice(i, i + 40).join("") + "\n";
3010
+ return out;
3011
+ };
3012
+ els.aa.innerHTML = wrap(cParts);
3013
+ els.refAa.innerHTML = wrap(rParts);
3014
+
3015
+ // Length-aware labels — the visitor sees that 131 ≠ 147 at a glance and
3016
+ // doesn't have to cross-reference with the stat row at the bottom.
3017
+ const lenTag = (n, prefix) =>
3018
+ `<span class="aa-len-tag">${prefix}${n} aa</span>`;
3019
+ const mismatches = (() => {
3020
+ const k = Math.min(nC, nR);
3021
+ let m = 0;
3022
+ for (let i = 0; i < k; i++) if (carbonAA[i] !== refAA[i]) m++;
3023
+ return m;
3024
+ })();
3025
+ els.aaLabel.innerHTML =
3026
+ `<span class="seq-tag carbon">carbon</span>translated protein ` +
3027
+ lenTag(nC, "") +
3028
+ ` <span style="color:#b00020">· ${mismatches}/${Math.min(nC,nR)} mismatches highlighted</span>`;
3029
+ els.refAaLabel.innerHTML =
3030
+ `<span class="seq-tag ref">reference</span>protein ` +
3031
+ lenTag(nR, "") +
3032
+ ` <span style="color:#888">· same positions highlighted for alignment</span>`;
3033
  }
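The highlighting and counting rules in `renderAAComparison` can be checked in isolation. The following is a minimal sketch, not the demo's code: `markDiffs`, `countMismatches`, and the two short sequences are hypothetical names and data chosen for illustration. It assumes the same rule as the hunk above, where a residue is flagged if it differs from its counterpart or has no counterpart at all.

```javascript
// Flag each residue of `seq` that differs from `other` at the same index,
// or that has no counterpart because `other` is shorter.
function markDiffs(seq, other) {
  return Array.from(seq, (ch, i) => other[i] === undefined || ch !== other[i]);
}

// Count substitutions over the overlapping prefix only, like the label text.
function countMismatches(a, b) {
  const k = Math.min(a.length, b.length);
  let m = 0;
  for (let i = 0; i < k; i++) if (a[i] !== b[i]) m++;
  return m;
}

// Hypothetical toy sequences: the "carbon" one stops two residues early.
const carbon = "MVHLT";
const ref = "MVKLTPE";

const carbonFlags = markDiffs(carbon, ref); // one flag, at the H/K position
const refFlags = markDiffs(ref, carbon);    // same flag plus the two trailing residues
const overlapMismatches = countMismatches(carbon, ref);
```

Note the asymmetry this makes visible: trailing residues are highlighted on the longer row but are not part of the `m/min(nC, nR)` count shown in the label.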
3034
 
3035
  // Hydrate the viewers and stat row from a precomputed `fold_example`
 
3038
  // can still trigger a fresh run with the ▶ fold button.
3039
  function hydrateFoldExample(ex) {
3040
  if (!ensureViewers()) return false;
3041
+ setPending(false); // clear any leftover "fixture pending" state
3042
  renderStructure(viewerCarbon, ex.carbon_pdb);
3043
  renderStructure(viewerRef, ex.ref_pdb);
3044
  els.nRes.textContent = `${ex.carbon_aa.length} / ${ex.ref_aa.length}`;
 
3049
  el.classList.remove("muted");
3050
  }
3051
  renderAAComparison(ex.carbon_aa, ex.ref_aa);
3052
+ setStatus("cached example", "");
3053
  return true;
3054
  }
3055
 
3056
+ // Used when a gene has no precomputed fold_example. In the shipped
3057
+ // cached-only build this happens for genes whose fixture is still
3058
+ // queued for precompute (e.g. when the Carbon HF endpoint was in
3059
+ // error during the last `python scripts/precompute.py --folds` run).
3060
+ // We surface that state explicitly via an overlay on both viewers so
3061
+ // it doesn't read as a bug.
3062
  function resetFoldUI() {
3063
+ els.aa.innerHTML = "— fixture pending · precompute hasn't run yet for this gene —";
3064
  for (const el of [els.nRes, els.plddtC, els.plddtR, els.identity]) {
3065
  el.textContent = "—";
3066
  el.classList.add("muted");
3067
  }
3068
  if (viewerCarbon) { viewerCarbon.removeAllModels(); viewerCarbon.render(); }
3069
  if (viewerRef) { viewerRef.removeAllModels(); viewerRef.render(); }
3070
+ if (ensureViewers()) setPending(true, "fixture pending");
3071
  }
3072
 
3073
  function selectGene(symbol) {
 
3076
  p.classList.toggle("active", p.dataset.gene === symbol)
3077
  );
3078
  renderInfo();
3079
+ renderMRNAInfo();
3080
  const g = GENES_LOCAL?.find(x => x.symbol === symbol);
3081
  if (g?.fold_example) {
3082
  // 3Dmol might not be loaded on the very first paint; retry shortly.
 
3089
  }
3090
  }
3091
 
3092
+ // No-ops in the cached-only build — the prefix selector isn't rendered.
3093
+ // Kept here so re-adding the .pills element in the toolbar wires it
3094
+ // back up without a JS change.
3095
  function bindPrefixPills() {
3096
+ if (!els.prefixPills) return;
3097
  els.prefixPills.querySelectorAll(".pill").forEach(p => {
3098
  p.addEventListener("click", () => {
3099
  prefixLen = +p.dataset.prefix;
 
3115
  function makeViewer(host) {
3116
  if (!window.$3Dmol) return null;
3117
  host.innerHTML = "";
3118
+ const v = $3Dmol.createViewer(host, { backgroundColor: "#fafaf7", antialias: true });
3119
+ // 3Dmol installs a wheel listener on its internal canvas that zooms
3120
+ // the camera AND preventDefaults the page scroll. We only want orbit
3121
+ // controls; scroll should keep scrolling the page. Intercept wheel
3122
+ // events at the host in capture phase and stopImmediatePropagation
3123
+ // so 3Dmol never sees them. No preventDefault, so browser scroll still runs.
3124
+ // We also use this hook to bump the idle-rotation timer below so
3125
+ // ambient spin pauses the instant the visitor touches a viewer.
3126
+ host.addEventListener("wheel", (e) => {
3127
+ e.stopImmediatePropagation();
3128
+ bumpInteraction();
3129
+ }, { capture: true, passive: true });
3130
+ for (const ev of ["pointerdown", "touchstart"]) {
3131
+ host.addEventListener(ev, bumpInteraction, { capture: true, passive: true });
3132
+ }
3133
+ return v;
3134
+ }
3135
+
3136
+ // ── Idle auto-rotation ────────────────────────────────────────────
3137
+ // Gentle constant-velocity Y-spin while the visitor isn't interacting,
3138
+ // to give the side-by-side comparison some life without forcing them
3139
+ // to drag every time. Any pointer/wheel input pauses immediately;
3140
+ // after IDLE_ROT_DELAY_MS of silence we ramp the spin back in over IDLE_ROT_RAMP_MS
3141
+ // with an ease-in-out so the resume isn't jarring. We rotate only
3142
+ // viewerCarbon — linkViewer mirrors it onto viewerRef in the same
3143
+ // frame, so the two cartoons stay perfectly in sync.
3144
+ const IDLE_ROT_DELAY_MS = 2500;
3145
+ const IDLE_ROT_RAMP_MS = 900;
3146
+ const IDLE_ROT_MAX_DPS = 1; // degrees per second; 1°/s is one revolution every 6 minutes
3147
+ const PREFERS_REDUCED_MOTION = window.matchMedia
3148
+ ? window.matchMedia("(prefers-reduced-motion: reduce)").matches
3149
+ : false;
3150
+ let lastInteractionAt = performance.now();
3151
+ let idleRotRAF = 0;
3152
+ let idleRotLastT = 0;
3153
+ let idleRotSectionVisible = true;
3154
+ function bumpInteraction() { lastInteractionAt = performance.now(); }
3155
+ function idleRotStep(now) {
3156
+ idleRotRAF = 0;
3157
+ if (!viewerCarbon || !viewerRef) return;
3158
+ const dt = idleRotLastT ? Math.min(100, now - idleRotLastT) : 16;
3159
+ idleRotLastT = now;
3160
+ const idle = now - lastInteractionAt;
3161
+ if (idle >= IDLE_ROT_DELAY_MS && idleRotSectionVisible && !PREFERS_REDUCED_MOTION) {
3162
+ const k = Math.min(1, (idle - IDLE_ROT_DELAY_MS) / IDLE_ROT_RAMP_MS);
3163
+ const eased = k < 0.5 ? 2 * k * k : 1 - Math.pow(-2 * k + 2, 2) / 2;
3164
+ const deg = IDLE_ROT_MAX_DPS * eased * (dt / 1000);
3165
+ if (deg > 0) viewerCarbon.rotate(deg, "y", 0, false);
3166
+ }
3167
+ idleRotRAF = requestAnimationFrame(idleRotStep);
3168
+ }
3169
+ function startIdleRotation() {
3170
+ if (idleRotRAF || PREFERS_REDUCED_MOTION) return;
3171
+ idleRotLastT = 0;
3172
+ idleRotRAF = requestAnimationFrame(idleRotStep);
3173
+ }
3174
+
3175
+ // Pause the rAF loop when the §5 section is offscreen — no point
3176
+ // burning frames on cartoons the visitor can't see.
3177
+ function watchFoldingVisibility() {
3178
+ const section = document.getElementById("folding");
3179
+ if (!section || !window.IntersectionObserver) return;
3180
+ new IntersectionObserver((entries) => {
3181
+ for (const e of entries) idleRotSectionVisible = e.isIntersecting;
3182
+ }, { threshold: 0.01 }).observe(section);
3183
+ }
3184
+ watchFoldingVisibility();
3185
+
3186
+ // Create both viewers (idempotent) and link them so an orbit drag on
3187
+ // one propagates to the other. Mirrors the side-by-side "synced
3188
+ // cameras" setup PyMOL/ChimeraX use for structure comparison — the
3189
+ // visitor sees the same orientation of both proteins, which is what
3190
+ // makes the visual comparison meaningful. (Wheel zoom is intentionally
3191
+ // disabled in makeViewer so scroll keeps scrolling the page.)
3192
  let viewersLinked = false;
3193
  function ensureViewers() {
3194
  if (!window.$3Dmol) return false;
 
3200
  viewerRef.linkViewer(viewerCarbon);
3201
  viewersLinked = true;
3202
  }
3203
+ startIdleRotation();
3204
  return !!(viewerCarbon && viewerRef);
3205
  }
3206
 
 
3220
  function setRunning(running, label = "computing") {
3221
  for (const host of [els.vCarbon, els.vRef]) {
3222
  host.classList.toggle("running", running);
3223
+ if (running) {
3224
+ const t = host.querySelector(".fold-overlay-label");
3225
+ if (t) t.textContent = label;
3226
+ }
3227
  }
3228
  for (const el of [els.nRes, els.plddtC, els.plddtR, els.identity]) {
3229
  el.classList.toggle("muted", running);
3230
  }
3231
+ if (els.go) els.go.textContent = running ? "running…" : "▶ fold";
3232
+ }
3233
+
3234
+ // Mirror of setRunning for the "fixture not ready" state. Reuses the
3235
+ // same overlay markup but a different CSS class, so the two states
3236
+ // can never visually conflict.
3237
+ function setPending(pending, label = "fixture pending") {
3238
+ for (const host of [els.vCarbon, els.vRef]) {
3239
+ host.classList.toggle("pending", pending);
3240
+ if (pending) {
3241
+ const t = host.querySelector(".fold-overlay-label");
3242
+ if (t) t.textContent = label;
3243
+ }
3244
+ }
3245
  }
3246
 
3247
  // Editorial pLDDT palette. The three anchor colours match the legend
 
3303
  const f = geneFeasibility(gene);
3304
  if (!f.feasible) {
3305
  setStatus(
3306
+ `${gene.symbol} spans ${f.lastExonEnd.toLocaleString("en-US")} bp of genomic DNA — ` +
3307
  `outside what Carbon can generate live. Try HBB or INS.`,
3308
  "error"
3309
  );
 
3312
 
3313
  abortCtrl?.abort();
3314
  abortCtrl = new AbortController();
3315
+ if (els.go) els.go.disabled = true;
3316
  ensureViewers(); // overlay must exist before we toggle .running on it
3317
 
3318
  try {
 
3376
  } finally {
3377
  setRunning(false);
3378
  abortCtrl = null;
3379
+ if (els.go) els.go.disabled = false;
3380
  }
3381
  }
3382
 
 
3391
  );
3392
  selectGene(genes[0].symbol);
3393
  bindPrefixPills();
3394
+ els.go?.addEventListener("click", runFold);
3395
  }).catch(e => {
3396
  els.info.textContent = "failed to load genes: " + (e.message || e);
3397
  });
 
3446
 
3447
  const n1 = seq.length;
3448
  const n6 = Math.ceil(seq.length / 6);
3449
+ els.oneTok.textContent = n1.toLocaleString("en-US");
3450
+ els.sixTok.textContent = n6.toLocaleString("en-US");
3451
+ els.oneAtt.innerHTML = `${(n1*n1).toLocaleString("en-US")}<span style="color:#999;font-size:9px;margin-left:3px">L²</span>`;
3452
+ els.sixAtt.innerHTML = `${(n6*n6).toLocaleString("en-US")}<span style="color:#999;font-size:9px;margin-left:3px">L²</span>`;
3453
 
3454
  // Speedup bars: visualize attention cost ratio
3455
  const maxCost = n1 * n1 || 1;
 
3464
  svg += `<text x="${padL - 8}" y="${y + 13}" font-family="JetBrains Mono" font-size="11" fill="#333" text-anchor="end">${r.label}</text>`;
3465
  const w = (r.cost / maxCost) * (W - padL - padR);
3466
  svg += `<rect x="${padL}" y="${y}" width="${Math.max(2, w)}" height="${rowH}" fill="${r.color}"/>`;
3467
+ svg += `<text x="${padL + w + 6}" y="${y + 13}" font-family="JetBrains Mono" font-size="10" fill="#333">${r.cost.toLocaleString("en-US")}</text>`;
3468
  });
3469
  els.bars.setAttribute("viewBox", `0 0 ${W} ${H}`);
3470
  els.bars.style.height = `${H}px`;
 
4259
  })();
4260
 
4261
  // =========================================================================
4262
+ // §6 — UMAP scatter (WebGL, 571K points)
4263
  //
4264
+ // Loads a binary-packed scatter (int16 quantized 2D positions + 4 uint8 category
4265
+ // columns — species, biotype, strand, gc_content) and renders it via WebGL
4266
+ // gl.POINTS with a 1D palette texture for coloring. Toggle between coloring axes
4267
+ // (species / biotype / strand / gc) rebinds a single byte-attribute buffer and
4268
+ // swaps the palette texture — no re-upload of the 571K vertex stream. Hover
4269
+ // lookup uses a flat grid index so picking stays O(small) regardless of total
4270
+ // point count.
4271
  // =========================================================================
4272
  (function initDemoUmap() {
4273
  const canvas = document.getElementById("dumap-canvas");
 
4277
  const info = document.getElementById("dumap-info");
4278
  const legend = document.getElementById("dumap-legend");
4279
  const resetBtn = document.getElementById("dumap-reset");
4280
+ // The UMAP toolbar used to ship a `<span class="status">` indicator that
4281
+ // showed "loading…" / "loaded 571k pts · 1274 ms" / "error" next to the
4282
+ // pills. Removed because (a) loading is already explained by the fullscreen
4283
+ // overlay, (b) the post-load metric was telemetry-grade detail not visitor-
4284
+ // grade insight. Calls into setStatus below survive as no-ops so the live
4285
+ // load path doesn't have to be rewritten.
4286
+ const status = null;
4287
+ const statusText = null;
4288
  const colorPills = document.querySelectorAll("#dumap-color-pills .pill");
4289
  const elN = document.getElementById("dumap-n");
4290
  const elNsp = document.getElementById("dumap-nsp");
4291
  const elFps = document.getElementById("dumap-fps");
4292
 
4293
  // ---- Palettes ----------------------------------------------------------
4294
+ // 27 species grouped into 6 kingdoms — each kingdom gets a hue band.
4295
  // Within a band, lightness varies to keep adjacent species distinguishable.
4296
+ // Order MUST match labels.species (= the order from scripts/build_real_umap.py).
4297
  const SPECIES_PALETTE = [
4298
+ // vertebrates (10) — blue/indigo/violet band
4299
+ [40,80,160], [60,100,180], [80,120,195], [100,140,210], [120,160,225],
4300
+ [140,100,200], [160,120,215], [125,90,170], [105,75,150], [85,60,130],
4301
+ // invertebrates (2) — orange band
4302
+ [220,110,30], [240,160,70],
 
4303
  // plants (5) — olive/lime band (intentionally different from Carbon's
4304
  // signal-green #317f3f so the UI chrome doesn't blend with the data)
4305
+ [85,140,55], [115,170,75], [145,200,100], [175,220,135], [205,240,170],
4306
+ // fungi (5) — magenta/rose band
4307
+ [180,40,110], [200,70,140], [220,100,160], [235,130,175], [245,160,190],
4308
+ // bacteria (3) — ochre/amber band
4309
+ [180,140,40], [200,160,60], [220,180,80],
4310
+ // viruses (2) — deep red band (outliers, intentionally dramatic)
4311
+ [160,30,40], [200,50,55],
4312
  ];
4313
+ // protein_coding is ~80% of the points — using a saturated colour for it
4314
+ // floods the canvas and erases the three minority biotypes. We give it a
4315
+ // washed-out sage instead (still readable as "the green class") and crank
4316
+ // the saturation on the rare classes so they pop on top of the carpet.
4317
  const BIOTYPE_PALETTE = [
4318
+ [180,205,180], // protein_coding — washed sage (volume class)
4319
+ [210,55,45], // lncRNA — vivid Carbon red
4320
+ [40,100,200], // snRNA: vivid blue
4321
+ [240,160,30], // misc_RNA: amber (was gray, invisible)
4322
  ];
4323
  const STRAND_PALETTE = [
4324
  [49,127,63], // + (forward)
4325
  [188,46,37], // - (reverse)
4326
  ];
4327
+ // Continuous gradient for gc_content (uint8 0..255 maps to [0, 1]).
4328
+ // 3-stop: low GC (AT-rich) reads as cool steel, mid as neutral, high
4329
+ // GC (GC-rich) as warm amber — natural "density" feel without
4330
+ // colliding with the categorical palettes.
4331
+ function buildGCPalette() {
4332
+ const out = [];
4333
+ for (let i = 0; i < 256; i++) {
4334
+ const t = i / 255;
4335
+ let r, g, b;
4336
+ if (t < 0.5) {
4337
+ const u = t * 2;
4338
+ r = Math.round(60 + (170 - 60) * u);
4339
+ g = Math.round(90 + (170 - 90) * u);
4340
+ b = Math.round(160 + (170 - 160) * u);
4341
+ } else {
4342
+ const u = (t - 0.5) * 2;
4343
+ r = Math.round(170 + (230 - 170) * u);
4344
+ g = Math.round(170 + (190 - 170) * u);
4345
+ b = Math.round(170 + (50 - 170) * u);
4346
+ }
4347
+ out.push([r, g, b]);
4348
+ }
4349
+ return out;
4350
+ }
4351
+ const GC_PALETTE = buildGCPalette();
4352
  const PALETTES = {
4353
  species: SPECIES_PALETTE,
4354
  biotype: BIOTYPE_PALETTE,
4355
  strand: STRAND_PALETTE,
4356
+ gc: GC_PALETTE,
4357
  };
4358
 
4359
  // ---- State -------------------------------------------------------------
4360
  let gl, program;
4361
  let posBuf; // int16 interleaved x,y
4362
+ let catBufs = {}; // { species|biotype|strand|gc: GLBuffer of uint8 }
4363
  let paletteTex;
4364
  let n = 0;
4365
+ let labels = null; // { species:[], biotypes:[], strands:[] }; see scripts/build_real_umap.py for the full schema
4366
  // Raw category bytes — kept on CPU side too for tooltip lookups.
4367
+ let cats = { species: null, biotype: null, strand: null, gc: null };
4368
  // World bounds + current colorBy axis.
4369
  let bounds = [0,0,0,0];
4370
  let colorBy = "species";
 
4377
  let grid = null;
4378
 
4379
  function setStatus(state, text) {
4380
+ if (!status) return;
4381
  status.classList.remove("streaming", "error");
4382
  if (state === "streaming") status.classList.add("streaming");
4383
  if (state === "error") status.classList.add("error");
 
4409
  float r = length(d);
4410
  float aa = smoothstep(0.50, 0.42, r);
4411
  if (aa <= 0.001) discard;
4412
+ float a = aa * u_alpha;
4413
  float t = (v_cat + 0.5) / u_paletteN;
4414
  vec3 color = texture2D(u_palette, vec2(t, 0.5)).rgb;
4415
+ // Pre-multiplied output matches blendFunc(ONE, ONE_MINUS_SRC_ALPHA)
4416
+ // and prevents the dense-overlap brightening you get with straight
4417
+ // alpha (which would need blendFunc(SRC_ALPHA, ONE_MINUS_SRC_ALPHA)).
4418
+ gl_FragColor = vec4(color * a, a);
4419
  }
4420
  `;
4421
  function compile(type, src) {
 
4446
  // the paper background and over each other cleanly at dense overlaps.
4447
  gl.enable(gl.BLEND);
4448
  gl.blendFunc(gl.ONE, gl.ONE_MINUS_SRC_ALPHA);
4449
+ // Transparent clear — the .umap-frame CSS background (paper tone) shows
4450
+ // through, keeping the canvas in tune with the rest of the page.
4451
+ gl.clearColor(0, 0, 0, 0);
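The arithmetic behind the premultiplied-alpha choice can be worked through on the CPU. The sketch below emulates the blend equation for a single point drawn onto the transparent clear colour and then composited over the page. It assumes the WebGL context uses the default `premultipliedAlpha: true`, which this diff does not show; `blend`, `compositeOverPage`, and the sample colours are hypothetical.

```javascript
// Emulate fixed-function blending on [r, g, b, a]:
// out = src * srcFactor + dst * dstFactor, applied to every channel.
function blend(src, dst, srcFactor, dstFactor) {
  return src.map((s, i) => s * srcFactor + dst[i] * dstFactor);
}

// How the browser composites a premultiplied canvas pixel over an opaque page colour.
function compositeOverPage(px, page) {
  return page.map((p, i) => px[i] + p * (1 - px[3]));
}

const a = 0.22;                   // base alpha from the diff
const color = [0.2, 0.4, 0.8];    // hypothetical point colour
const clear = [0, 0, 0, 0];       // transparent clear colour
const page = [0.98, 0.98, 0.96];  // roughly the paper tone #fbfaf6

// Premultiplied fragment output with blendFunc(ONE, ONE_MINUS_SRC_ALPHA).
const premul = blend([...color.map(c => c * a), a], clear, 1, 1 - a);

// Straight fragment output with blendFunc(SRC_ALPHA, ONE_MINUS_SRC_ALPHA):
// the alpha channel is also scaled by SRC_ALPHA, so it ends up as a * a.
const straight = blend([...color, a], clear, a, 1 - a);

// What an ideal "colour at opacity a over the page" should look like.
const expected = color.map((c, i) => c * a + page[i] * (1 - a));
const shownPremul = compositeOverPage(premul, page);
const shownStraight = compositeOverPage(straight, page);
```

Under these assumptions the premultiplied path reproduces `expected`, while the straight-alpha path stores too little alpha in the canvas and lets more of the light page colour through.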
4452
 
4453
  paletteTex = gl.createTexture();
4454
  gl.bindTexture(gl.TEXTURE_2D, paletteTex);
 
4472
  }
4473
 
4474
  // ---- Data load ---------------------------------------------------------
4475
+ // Mulberry32: tiny seeded PRNG, ~10 lines, good enough for visual shuffling.
4476
+ // Picked over Math.random() because we want the same layout across reloads
4477
+ // (so users can describe what they see and we can reproduce it).
4478
+ function mulberry32(seed) {
4479
+ return function() {
4480
+ seed = (seed + 0x6D2B79F5) | 0;
4481
+ let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
4482
+ t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
4483
+ return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
4484
+ };
4485
+ }
4486
+
4487
+ // Fisher-Yates over N parallel arrays: pos16 (2 entries / point, x then y)
4488
+ // and catArrays (1 entry / point, e.g. species / biotype / strand / gc).
4489
+ // Mutating the typed arrays in place avoids allocating a 16 MB reshuffled
4490
+ // buffer — important at 571 K points.
4491
+ function shuffleParallel(pos16, catArrays, n, seed) {
4492
+ const rand = mulberry32(seed);
4493
+ for (let i = n - 1; i > 0; i--) {
4494
+ const j = (rand() * (i + 1)) | 0;
4495
+ if (i === j) continue;
4496
+ const xi = pos16[2*i], yi = pos16[2*i + 1];
4497
+ pos16[2*i] = pos16[2*j]; pos16[2*i + 1] = pos16[2*j + 1];
4498
+ pos16[2*j] = xi; pos16[2*j + 1] = yi;
4499
+ for (const a of catArrays) {
4500
+ const t = a[i]; a[i] = a[j]; a[j] = t;
4501
+ }
4502
+ }
4503
+ }
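Two properties of the seeded parallel shuffle are worth checking on toy data: the same seed gives the same order, and each point keeps its own category bytes. The sketch below repeats the two functions from this hunk so it runs standalone; `makeToy` and the seed `42` are hypothetical.

```javascript
// Mulberry32 seeded PRNG, same recurrence as in the hunk above.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// In-place Fisher-Yates over interleaved x,y positions and parallel category columns.
function shuffleParallel(pos16, catArrays, n, seed) {
  const rand = mulberry32(seed);
  for (let i = n - 1; i > 0; i--) {
    const j = (rand() * (i + 1)) | 0;
    if (i === j) continue;
    const xi = pos16[2 * i], yi = pos16[2 * i + 1];
    pos16[2 * i] = pos16[2 * j]; pos16[2 * i + 1] = pos16[2 * j + 1];
    pos16[2 * j] = xi; pos16[2 * j + 1] = yi;
    for (const a of catArrays) { const t = a[i]; a[i] = a[j]; a[j] = t; }
  }
}

// Hypothetical toy data: point i sits at (i, 10 * i) and carries category i,
// so any broken pairing between position and category is easy to detect.
function makeToy(n) {
  const pos = new Int16Array(2 * n);
  const cat = new Uint8Array(n);
  for (let i = 0; i < n; i++) { pos[2 * i] = i; pos[2 * i + 1] = 10 * i; cat[i] = i; }
  return { pos, cat };
}

const run1 = makeToy(8); shuffleParallel(run1.pos, [run1.cat], 8, 42);
const run2 = makeToy(8); shuffleParallel(run2.pos, [run2.cat], 8, 42);
```

Because both runs start from the same data and seed, their outputs should be identical, and every shuffled slot should still satisfy `x === cat` and `y === 10 * cat`.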
4504
+
4505
  async function loadData() {
4506
  setStatus("streaming", "loading…");
4507
  const t0 = performance.now();
 
4513
  const buf = await binResp.arrayBuffer();
4514
  labels = await labelsResp.json();
4515
 
4516
+ // Parse header (matches scripts/build_real_umap.py — 64-byte header).
4517
+ // Layout:
4518
+ // u32 [magic, n_points, n_species, n_biotypes, n_strands, flags] (24 b)
4519
+ // f32 [x2d_min, x2d_max, y2d_min, y2d_max] (16 b)
4520
+ // f32 [x3d_min, x3d_max, y3d_min, y3d_max, z3d_min, z3d_max] (24 b)
4521
+ // flags bit0 = has_3D positions, bit1 = has gc_content column.
4522
  const hdrU32 = new Uint32Array(buf, 0, 6);
4523
+ const magic = hdrU32[0];
4524
  if (magic !== 0xCAB0FA1D) throw new Error("bad magic: " + magic.toString(16));
4525
  n = hdrU32[1];
4526
+ const flags = hdrU32[5];
4527
+ const has3D = (flags & 0b01) !== 0;
4528
+ const hasGC = (flags & 0b10) !== 0;
4529
+ const hdrF32 = new Float32Array(buf, 24, 10);
4530
  bounds = [hdrF32[0], hdrF32[1], hdrF32[2], hdrF32[3]];
4531
+ // bounds_3d (hdrF32[4..10]) is parsed but unused — the v1 viewer
4532
+ // renders the 2D projection only. Kept in the binary so a future
4533
+ // 3D mode can switch attribute streams without re-fetching.
4534
 
4535
+ let off = 64;
4536
  const pos16 = new Int16Array(buf, off, n * 2); off += n * 2 * 2;
4537
+ if (has3D) {
4538
+ // Skip pos_3d (int16 × 3 × n). Loading it into RAM is unnecessary
4539
+ // for v1 — the binary stays small enough that re-fetching for
4540
+ // a 3D mode is fine, and skipping keeps GPU memory tight.
4541
+ off += n * 3 * 2;
4542
+ }
4543
  cats.species = new Uint8Array(buf, off, n); off += n;
4544
  cats.biotype = new Uint8Array(buf, off, n); off += n;
4545
  cats.strand = new Uint8Array(buf, off, n); off += n;
4546
+ if (hasGC) {
4547
+ cats.gc = new Uint8Array(buf, off, n); off += n;
4548
+ }
4549
+ const catKeys = ["species", "biotype", "strand"];
4550
+ if (hasGC) catKeys.push("gc");
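The offset bookkeeping above can be exercised without the real asset. The sketch below builds a tiny synthetic buffer following the 64-byte header layout described in the comment and parses it back. `buildToyBinary` and `parseScatter` are hypothetical helpers, the values are made up, and the typed-array views assume the writer and reader share the platform's byte order.

```javascript
const MAGIC = 0xCAB0FA1D;

// Build a 3-point buffer: 64-byte header, int16 x,y pairs, then
// species / biotype / strand / gc columns (flags bit1 set, no 3D block).
function buildToyBinary() {
  const n = 3;
  const buf = new ArrayBuffer(64 + n * 2 * 2 + n * 4);
  new Uint32Array(buf, 0, 6).set([MAGIC, n, 2, 2, 2, 0b10]);
  new Float32Array(buf, 24, 10).set([-1, 1, -2, 2, 0, 0, 0, 0, 0, 0]);
  let off = 64;
  new Int16Array(buf, off, n * 2).set([0, 0, 100, -100, 32767, -32768]); off += n * 4;
  new Uint8Array(buf, off, n).set([0, 1, 1]); off += n;     // species
  new Uint8Array(buf, off, n).set([0, 0, 1]); off += n;     // biotype
  new Uint8Array(buf, off, n).set([0, 1, 0]); off += n;     // strand
  new Uint8Array(buf, off, n).set([0, 128, 255]); off += n; // gc_content
  return buf;
}

// Parse following the same order as loadData: header, 2D positions,
// optional 3D block (skipped), three category columns, optional gc column.
function parseScatter(buf) {
  const u32 = new Uint32Array(buf, 0, 6);
  if (u32[0] !== MAGIC) throw new Error("bad magic: " + u32[0].toString(16));
  const n = u32[1], flags = u32[5];
  const has3D = (flags & 0b01) !== 0;
  const hasGC = (flags & 0b10) !== 0;
  const f32 = new Float32Array(buf, 24, 10);
  const bounds = [f32[0], f32[1], f32[2], f32[3]];
  let off = 64;
  const pos16 = new Int16Array(buf, off, n * 2); off += n * 2 * 2;
  if (has3D) off += n * 3 * 2;
  const cats = {};
  for (const key of ["species", "biotype", "strand"]) {
    cats[key] = new Uint8Array(buf, off, n); off += n;
  }
  if (hasGC) { cats.gc = new Uint8Array(buf, off, n); off += n; }
  return { n, bounds, pos16, cats, bytesRead: off };
}

const parsed = parseScatter(buildToyBinary());
```

A useful sanity check in this style is that `bytesRead` equals the buffer length, which catches a skipped or double-counted column early.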
4551
+
4552
+ // Deterministic shuffle of the parallel arrays. The binary is sorted by
4553
+ // species (= order of viz.csv), so without this protein_coding (≈80% of
4554
+ // points) systematically lands on top of the minority biotypes/rare
4555
+ // species and visually erases them. A fixed seed keeps the layout stable
4556
+ // across reloads — same dot in the same place every time. Mulberry32 is
4557
+ // good enough and one line; Fisher-Yates over 571 K entries is ~30 ms.
4558
+ shuffleParallel(pos16, catKeys.map(k => cats[k]), n, 0xC4B0FA1D);
4559
 
4560
  // Upload to GPU.
4561
  posBuf = gl.createBuffer();
4562
  gl.bindBuffer(gl.ARRAY_BUFFER, posBuf);
4563
  gl.bufferData(gl.ARRAY_BUFFER, pos16, gl.STATIC_DRAW);
4564
+ for (const key of catKeys) {
4565
  const b = gl.createBuffer();
4566
  gl.bindBuffer(gl.ARRAY_BUFFER, b);
4567
  gl.bufferData(gl.ARRAY_BUFFER, cats[key], gl.STATIC_DRAW);
 
4586
 
4587
  const ms = (performance.now() - t0) | 0;
4588
  setStatus("idle", `loaded ${(n/1000)|0}k pts · ${ms} ms`);
4589
+ info.textContent = `${n.toLocaleString("en-US")} sequences · ${labels.species.length} species · drag to pan, wheel to zoom`;
4590
  overlay.classList.add("hidden");
4591
 
4592
  return pos16;
 
  // but the dots get visibly bigger when you zoom in.
  const ps = Math.min(8.0, Math.max(1.4, 1.4 + 0.6 * Math.log2(view.scale + 1))) * dpr;
  gl.uniform1f(gl.getUniformLocation(program, "u_pointSize"), ps);
+ // Alpha rises with zoom so individual dots stay readable, but starts low
+ // so the dense 571K cloud doesn't blow out at zoom 1.
+ const alpha = Math.min(0.85, Math.max(0.22, 0.22 + 0.20 * Math.log2(view.scale + 1)));
  gl.uniform1f(gl.getUniformLocation(program, "u_alpha"), alpha);

  gl.drawArrays(gl.POINTS, 0, n);
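Review note: a quick way to see what the two zoom curves in this hunk evaluate to. `pointStyle` is a hypothetical wrapper around the same two expressions, with `dpr` defaulting to 1. One thing the arithmetic suggests: if `view.scale` never drops below 1 (as the wheel handler's clamp implies), then `log2(scale + 1)` is at least 1 and the lowest alpha actually reached is about 0.42 rather than the 0.22 floor.

```javascript
// Hypothetical wrapper: same formulas as the render hunk, pulled out so
// they can be evaluated without a GL context.
function pointStyle(scale, dpr = 1) {
  const z = Math.log2(scale + 1);
  return {
    pointSize: Math.min(8.0, Math.max(1.4, 1.4 + 0.6 * z)) * dpr,
    alpha: Math.min(0.85, Math.max(0.22, 0.22 + 0.20 * z)),
  };
}

for (const s of [1, 2, 8, 50]) {
  const { pointSize, alpha } = pointStyle(s);
  console.log(`scale ${s}: size ${pointSize.toFixed(2)} px, alpha ${alpha.toFixed(2)}`);
}
```

With these constants alpha reaches its 0.85 cap at roughly 8× zoom, while the point size stays under its 8 px cap across the whole 1× to 50× range.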
 
  // ---- Legend ------------------------------------------------------------
  function renderLegend() {
  if (!labels) return;
+ // gc_content is continuous, so render a horizontal gradient bar with
+ // 0.0 / 0.5 / 1.0 ticks instead of one swatch per value (that would be
+ // 256 entries, visually useless).
+ if (colorBy === "gc") {
+ const stops = GC_PALETTE
+ .filter((_, i) => i % 8 === 0) // 32 stops is plenty for a 1D bar
+ .map((c, i, a) => `<stop offset="${(i / (a.length - 1)) * 100}%" stop-color="rgb(${c[0]},${c[1]},${c[2]})"/>`)
+ .join("");
+ legend.innerHTML =
+ `<span class="item gc-grad">
+ <svg width="160" height="10" aria-hidden="true">
+ <defs><linearGradient id="umap-gc-grad" x1="0" x2="1">${stops}</linearGradient></defs>
+ <rect width="160" height="10" fill="url(#umap-gc-grad)"/>
+ </svg>
+ <span class="gc-ticks">0.0 &middot; 0.5 &middot; 1.0</span>
+ </span>`;
+ return;
+ }
  const palette = PALETTES[colorBy];
  const itemLabels = (colorBy === "species") ? labels.species
  : (colorBy === "biotype") ? labels.biotypes
+ : labels.strands;

  legend.innerHTML = itemLabels.map((name, i) => {
  const [r, g, b] = palette[i % palette.length];
  return `<span class="item"><span class="swatch" style="background:rgb(${r},${g},${b})"></span>${name}</span>`;

  }
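Review note: a small sketch of the stop subsampling used for the gc gradient bar, assuming `GC_PALETTE` has 256 RGB entries (the hunk's "256 entries" and "32 stops" comments suggest so, but the palette itself is not in this diff). The palette below is a made-up ramp and `gradientStops` is a hypothetical helper. It shows that the last kept entry is index 248, which is then stretched to the 100% offset.

```javascript
// Made-up 256-entry ramp standing in for GC_PALETTE.
const demoPalette = Array.from({ length: 256 }, (_, i) => [i, 64, 255 - i]);

// Hypothetical helper: keep every `step`-th entry and spread the kept
// entries evenly over 0..100%, as the legend hunk does.
function gradientStops(palette, step = 8) {
  return palette
    .map((color, sourceIndex) => ({ color, sourceIndex }))
    .filter(({ sourceIndex }) => sourceIndex % step === 0)
    .map(({ color, sourceIndex }, i, kept) => ({
      offset: (i / (kept.length - 1)) * 100,
      rgb: `rgb(${color[0]},${color[1]},${color[2]})`,
      sourceIndex,
    }));
}

const demoStops = gradientStops(demoPalette);
console.log(demoStops.length, demoStops[demoStops.length - 1]);
```

If the right end of the bar has to match the colour used for the highest gc values exactly, appending the final palette entry as an extra stop would be one option; at this size the difference is probably not visible.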

  // ---- Pan / zoom / hover ------------------------------------------------
+ // Reset is a no-op when we're already at the fit-the-data view, so the
+ // button switches to a disabled state in that case, the same affordance as
+ // a back-button greying out at the top of the history stack. Avoids a
+ // distracting always-active control on first paint.
+ function updateResetEnabled() {
+ if (!resetBtn) return;
+ const atDefault = view.tx === 0 && view.ty === 0 && view.scale === 1;
+ resetBtn.disabled = atDefault;
+ }
+ function resetView() {
+ view = { tx: 0, ty: 0, scale: 1 };
+ updateResetEnabled();
+ requestRedraw();
+ }
+
+ // Keep the viewport always full of data. The data spans [-0.92, 0.92]·scale
+ // in world space; the viewport spans [-1, 1]. As long as 0.92·scale ≥ 1
+ // (zoom ≥ ~1.087), there's "slack" we can pan within:
+ // |tx|, |ty| ≤ 0.92·scale − 1.
+ // Below that (i.e. at minimum zoom, where the UMAP fits the viewport with
+ // margin) we snap to (0, 0) so the data stays centered and no white edge
+ // creeps in. Paired with the scale clamp in the wheel handler, this means
+ // "fully zoomed out" = "UMAP exactly fit, perfectly centered".
+ function clampPan() {
+ const m = Math.max(0, 0.92 * view.scale - 1);
+ if (m === 0) {
+ view.tx = 0; view.ty = 0;
+ } else {
+ view.tx = Math.max(-m, Math.min(m, view.tx));
+ view.ty = Math.max(-m, Math.min(m, view.ty));
+ }
+ }

  // Convert a clientX/Y to NDC (-1..1) and to normalized data space ([-1, 1]).
  function clientToNDC(e) {
 
  const dx = ((e.clientX - panLast.x) / rect.width) * 2;
  const dy = -((e.clientY - panLast.y) / rect.height) * 2;
  view.tx += dx; view.ty += dy;
+ clampPan();
+ updateResetEnabled();
  panLast = { x: e.clientX, y: e.clientY };
  requestRedraw();
  } else {
 
  const ndc = clientToNDC(e);
  // Zoom factor — natural feeling on both trackpad and mouse wheel.
  const factor = Math.exp(-e.deltaY * 0.0018);
+ // Min scale = 1 means "fully zoomed out = UMAP fits the viewport". We
+ // intentionally don't let the visitor zoom out further: there's no
+ // information past the data bounds, and the empty margin makes the
+ // dataset feel small. Max 50× keeps individual points pickable.
+ const newScale = Math.min(50, Math.max(1, view.scale * factor));
  const k = newScale / view.scale;
  // Zoom around the cursor: shift translate so the point under the cursor
  // stays under the cursor.
  view.tx = ndc.x - (ndc.x - view.tx) * k;
  view.ty = ndc.y - (ndc.y - view.ty) * k;
  view.scale = newScale;
+ clampPan();
+ updateResetEnabled();
  requestRedraw();
  hideTooltip();
  }, { passive: false });
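Review note: a sketch that checks the zoom-around-cursor update in this handler, under the assumption that the shader maps data to NDC as `data * scale + translate` (the vertex shader is not part of this hunk). `zoomAt`, `toData` and `toScreen` are hypothetical helpers.

```javascript
// Same update as the wheel handler, as a pure function.
function zoomAt(view, ndc, factor, minScale = 1, maxScale = 50) {
  const newScale = Math.min(maxScale, Math.max(minScale, view.scale * factor));
  const k = newScale / view.scale;
  return {
    tx: ndc.x - (ndc.x - view.tx) * k,
    ty: ndc.y - (ndc.y - view.ty) * k,
    scale: newScale,
  };
}

// Assumed transform between data space and NDC.
const toData = (view, p) => ({ x: (p.x - view.tx) / view.scale, y: (p.y - view.ty) / view.scale });
const toScreen = (view, d) => ({ x: d.x * view.scale + view.tx, y: d.y * view.scale + view.ty });

const before = { tx: 0.1, ty: -0.2, scale: 2 };
const cursor = { x: 0.4, y: 0.3 };
const under = toData(before, cursor);      // data point currently under the cursor
const after = zoomAt(before, cursor, 1.5);
console.log(toScreen(after, under));       // lands back on the cursor, up to rounding
```

In the handler `clampPan()` runs right after this update, so near the data edge or at minimum zoom the point under the cursor can still shift; the invariant only holds while the clamp does not engage.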
 
  const sp = labels.species[cats.species[idx]];
  const bt = labels.biotypes[cats.biotype[idx]];
  const st = labels.strands[cats.strand[idx]];
+ const gc = cats.gc ? (cats.gc[idx] / 255).toFixed(2) : "—";
  tooltip.innerHTML =
  `<div><span class="t-label">species</span>${sp}</div>` +
  `<div><span class="t-label">biotype</span>${bt}</div>` +
+ `<div><span class="t-label">strand</span>${st} &nbsp; <span class="t-label">gc</span>${gc}</div>`;
  tooltip.style.left = x + "px";
  tooltip.style.top = y + "px";
  tooltip.classList.add("visible");
 
  });
  });

+ // Defer loading until the UMAP section is near the viewport; a 571K-point cloud
  // doesn't need to fight for bandwidth on first paint.
  const io = new IntersectionObserver(async (entries) => {
  if (!entries[0].isIntersecting) return;