lvwerra HF Staff Claude Opus 4.7 (1M context) committed on
Commit 0659e8b · 1 Parent(s): d835cbf

Pareto chart: rebuild as inline SVG, add Dataset link to banner

Replaces /img/pareto.png with a self-contained SVG built from
pareto/pareto_data.csv. Geometry mirrors the matplotlib reference
(log-scale throughput, linear win-rate %, family badges, 275×
speedup arrow, "better/faster" indicator) but the chrome is pulled
back to fit the editorial blog tone: hairline frame + tick lines,
JetBrains Mono tabular tick labels, mono-uppercase indicator
eyebrows, and plain text data labels with a paint-order halo
instead of pill boxes. Carbon points scale up + use a heavier
label per the source script's HIGHLIGHT_LOGO_SCALE.

Also adds a "Dataset" link (HuggingFaceBio/carbon-pretraining-corpus)
to the banner resources row, between Models and Tech report so the
two HF-hub resources sit together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
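
The 275× figure on the speedup arrow can be checked against the throughput values quoted in the data-point comments of the demo.html diff. This is a minimal sketch, assuming those comment values are the same numbers stored in pareto/pareto_data.csv; the dictionary and the `speedup` helper are illustrative names, not part of the commit.

```python
# Throughput in base pairs per second, copied from the data-point comments
# in the demo.html diff (assumed to mirror pareto/pareto_data.csv).
THROUGHPUT_BP_PER_S = {
    "Evo2 7B": 453.8,
    "Carbon 8B": 76582.7,
    "Carbon 3B": 125130.8,
}


def speedup(model: str, baseline: str = "Evo2 7B") -> float:
    """Return how many times faster `model` is than `baseline`."""
    return THROUGHPUT_BP_PER_S[model] / THROUGHPUT_BP_PER_S[baseline]


if __name__ == "__main__":
    for name in ("Carbon 3B", "Carbon 8B"):
        print(f"{name}: {speedup(name):.1f}x the throughput of Evo2 7B")
```

Under these values the Carbon 3B ratio lands just above 275, which is the pair the arrow connects. The Carbon 8B ratio comes out lower, just under 170.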

Files changed (4)
  1. assets/styles/section-intro.css +145 -0
  2. demo.html +161 -3
  3. img/arc.webp +0 -0
  4. img/generator.webp +0 -0
assets/styles/section-intro.css CHANGED
@@ -444,3 +444,148 @@
444
  @media (max-width: 720px) {
445
  .cd-mols { grid-template-columns: repeat(2, 1fr); }
446
  }
447
+
448
+ /* ------------------------------------------------------------------ */
449
+ /* §0 release lede · native Pareto chart. */
450
+ /* Replaces /img/pareto.png with an inline SVG built from */
451
+ /* pareto/pareto_data.csv. Geometry mirrors the matplotlib reference, */
452
+ /* but the chrome is pulled back to fit the editorial blog tone: */
453
+ /* hairline frame + tick lines instead of the 3px black box, mono */
454
+ /* tabular tick labels, plain text data labels with a paper-coloured */
455
+ /* paint-order halo (no pill box around each marker), and the */
456
+ /* "better/faster" indicator styled as a small mono uppercase eyebrow */
457
+ /* the same way as the section labels elsewhere on the page. Carbon */
458
+ /* points still scale up + use a bolder label so the eye lands on */
459
+ /* them first. */
460
+ /* ------------------------------------------------------------------ */
461
+ .tab-lede__figure--pareto {
462
+ /* Wider than the default tab-lede__figure so the long x-axis
463
+ (1-2-5 ticks, 200 → 200k) doesn't squash the right-edge labels. */
464
+ max-width: 760px;
465
+ }
466
+ .pareto-chart {
467
+ display: block;
468
+ width: 100%;
469
+ height: auto;
470
+ background: #ffffff;
471
+ border: 1px solid #cfcdbf;
472
+ }
473
+ .pareto-bg {
474
+ fill: #ffffff;
475
+ }
476
+
477
+ /* Hairline frame at the same weight as the rest of the demo's
478
+ section borders — the chart reads as another paper card rather
479
+ than a heavy matplotlib export. */
480
+ .pareto-frame {
481
+ fill: none;
482
+ stroke: #cfcdbf;
483
+ stroke-width: 1;
484
+ }
485
+
486
+ /* Tick marks at the same hairline weight. Tick labels in JetBrains
487
+ Mono with tabular nums so the tick-label digits align and
488
+ the chart picks up the page's technical-mono register. Dimmed so
489
+ they read as scale references, not primary content. */
490
+ .pareto-axis line {
491
+ stroke: #cfcdbf;
492
+ stroke-width: 1;
493
+ }
494
+ .pareto-axis text {
495
+ font-family: "JetBrains Mono", ui-monospace, monospace;
496
+ font-size: 13px;
497
+ fill: var(--ink-soft);
498
+ font-feature-settings: "tnum";
499
+ }
500
+ .pareto-axis--y text {
501
+ text-anchor: end;
502
+ dominant-baseline: middle;
503
+ }
504
+ .pareto-axis--x text {
505
+ text-anchor: middle;
506
+ dominant-baseline: hanging;
507
+ }
508
+
509
+ /* Axis titles in Inter to match the page body; italic subtitle under
510
+ "Throughput" carries the units in the muted ink-soft tone. */
511
+ .pareto-axis-title {
512
+ font-family: "Inter", "Helvetica Neue", sans-serif;
513
+ font-size: 18px;
514
+ font-weight: 600;
515
+ fill: var(--ink);
516
+ text-anchor: middle;
517
+ }
518
+ .pareto-axis-subtitle {
519
+ font-family: "Inter", "Helvetica Neue", sans-serif;
520
+ font-size: 13px;
521
+ font-style: italic;
522
+ fill: var(--ink-soft);
523
+ text-anchor: middle;
524
+ }
525
+
526
+ /* "Better/faster" axes-of-improvement indicator in the lower-left.
527
+ Arrows in muted ink, labels in the same mono-uppercase eyebrow
528
+ style as the section labels (banner-links, section-num, etc.)
529
+ so the chart's chrome doesn't read as a foreign matplotlib glyph. */
530
+ .pareto-indicator line {
531
+ stroke: var(--ink-faint);
532
+ stroke-width: 1.5;
533
+ stroke-linecap: round;
534
+ }
535
+ .pareto-indicator polygon {
536
+ fill: var(--ink-faint);
537
+ }
538
+ .pareto-indicator-text {
539
+ font-family: "JetBrains Mono", ui-monospace, monospace;
540
+ font-size: 10px;
541
+ font-weight: 500;
542
+ letter-spacing: 0.14em;
543
+ text-transform: uppercase;
544
+ fill: var(--ink-faint);
545
+ text-anchor: middle;
546
+ dominant-baseline: middle;
547
+ }
548
+
549
+ /* 275× speedup arrow — the editorial headline. Solid ink, slightly
550
+ thinner than before so it doesn't overpower the chart. The label
551
+ gets a paper-coloured paint-order halo so it reads cleanly where
552
+ it crosses the arrow line behind it. */
553
+ .pareto-speedup line {
554
+ stroke: var(--ink);
555
+ stroke-width: 2.5;
556
+ stroke-linecap: round;
557
+ }
558
+ .pareto-speedup polygon {
559
+ fill: var(--ink);
560
+ }
561
+ .pareto-speedup-label {
562
+ font-family: "Inter", "Helvetica Neue", sans-serif;
563
+ font-size: 26px;
564
+ font-weight: 700;
565
+ fill: var(--ink);
566
+ text-anchor: middle;
567
+ paint-order: stroke;
568
+ stroke: #ffffff;
569
+ stroke-width: 6px;
570
+ stroke-linejoin: round;
571
+ }
572
+
573
+ /* Data labels: plain text, no pill box. The paint-order stroke acts
574
+ as a paper-coloured halo so the text always reads cleanly — even
575
+ when it sits next to a logo or crosses a tick line. Carbon labels
576
+ step up in size + weight so the highlighted models still pop. */
577
+ .pareto-label {
578
+ font-family: "Inter", "Helvetica Neue", sans-serif;
579
+ font-size: 13px;
580
+ fill: var(--ink);
581
+ text-anchor: middle;
582
+ dominant-baseline: middle;
583
+ paint-order: stroke;
584
+ stroke: #ffffff;
585
+ stroke-width: 4px;
586
+ stroke-linejoin: round;
587
+ }
588
+ .pareto-point--highlight .pareto-label {
589
+ font-size: 15px;
590
+ font-weight: 600;
591
+ }
demo.html CHANGED
@@ -199,6 +199,11 @@
199
  Models<span class="arrow" aria-hidden="true">↗</span>
200
  </a>
201
  </li>
202
  <li>
203
  <a href="#" target="_blank" rel="noopener">
204
  Tech report<span class="arrow" aria-hidden="true">↗</span>
@@ -275,8 +280,161 @@
275
  shipping with the full training code, the data pipeline, and the model weights.
276
  Everything is open source on the Hugging Face Hub.
277
  </p>
278
- <figure class="tab-lede__figure">
279
- <img src="/img/pareto.png" alt="Throughput vs win rate pareto frontier: Carbon 3B/8B sit at high win rate and ~275× the throughput of Arc Evo2 7B, well ahead of GENERator-v2.">
280
  <figcaption>Throughput (base pairs per second, log scale) vs win rate across open DNA foundation models. Carbon 3B matches Evo2 7B's win rate at roughly 275× the throughput.</figcaption>
281
  </figure>
282
  </div>
@@ -486,7 +644,7 @@
486
  <div class="section--two-col intro-subsection">
487
  <div class="section-narrative">
488
  <div class="section-num">§6 · Applications</div>
489
- <div class="section-title">From bases to outcomes</div>
490
  <p class="lede">
491
  A model that understands and writes DNA is useful wherever DNA is the
492
  input or the output. There are three interesting use-cases for such
 
199
  Models<span class="arrow" aria-hidden="true">↗</span>
200
  </a>
201
  </li>
202
+ <li>
203
+ <a href="https://huggingface.co/datasets/HuggingFaceBio/carbon-pretraining-corpus" target="_blank" rel="noopener">
204
+ Dataset<span class="arrow" aria-hidden="true">↗</span>
205
+ </a>
206
+ </li>
207
  <li>
208
  <a href="#" target="_blank" rel="noopener">
209
  Tech report<span class="arrow" aria-hidden="true">↗</span>
 
280
  shipping with the full training code, the data pipeline, and the model weights.
281
  Everything is open source on the Hugging Face Hub.
282
  </p>
283
+ <!-- Pareto chart, drawn natively as inline SVG so the figure scales
284
+ sharply, picks up the page's typography, and can be tuned in
285
+ CSS without a matplotlib re-export. Source data lives in
286
+ pareto/pareto_data.csv; geometry mirrors the matplotlib
287
+ reference (scratch/plot_pareto_winrate_throughput_8b_32k_hf.py):
288
+ log-scale throughput on x, linear win-rate % on y, family
289
+ badges sitting on each data point with a plain text label
290
+ below. Chrome is pulled back to match the editorial blog
291
+ tone — hairline frame + tick lines, mono tabular tick
292
+ labels, mono-uppercase "better/faster" eyebrow indicator —
293
+ and the data labels use a paint-order halo (see
294
+ .pareto-label in section-intro.css) instead of pill boxes.
295
+ Carbon points scale up + use a heavier label per the source
296
+ script's HIGHLIGHT_LOGO_SCALE so the eye lands on them. -->
297
+ <figure class="tab-lede__figure tab-lede__figure--pareto">
298
+ <svg
299
+ class="pareto-chart"
300
+ viewBox="0 0 1000 600"
301
+ xmlns="http://www.w3.org/2000/svg"
302
+ role="img"
303
+ aria-labelledby="pareto-title pareto-desc"
304
+ >
305
+ <title id="pareto-title">Throughput vs win rate across open DNA foundation models</title>
306
+ <desc id="pareto-desc">Log-scale throughput in base pairs per second on the x-axis and win-rate percentage on the y-axis. Carbon 3B sits at roughly 275 times the throughput of Arc Evo2 7B at a comparable win rate, and Carbon 8B at roughly 170 times with a higher win rate.</desc>
307
+
308
+ <!-- Plot interior. -->
309
+ <rect class="pareto-bg" x="100" y="30" width="870" height="470"/>
310
+
311
+ <!-- Y axis: linear win-rate %, ticks at 0/20/40/60/80/100. The
312
+ plot range runs −12..108 (matches matplotlib padding) so
313
+ the data points have headroom above 100 and below 0 for
314
+ labels; only the canonical 0..100 ticks are drawn. -->
315
+ <g class="pareto-axis pareto-axis--y">
316
+ <line x1="94" y1="61.3" x2="100" y2="61.3"/>
317
+ <line x1="94" y1="139.7" x2="100" y2="139.7"/>
318
+ <line x1="94" y1="218.0" x2="100" y2="218.0"/>
319
+ <line x1="94" y1="296.3" x2="100" y2="296.3"/>
320
+ <line x1="94" y1="374.7" x2="100" y2="374.7"/>
321
+ <line x1="94" y1="453.0" x2="100" y2="453.0"/>
322
+ <text x="86" y="61.3">100</text>
323
+ <text x="86" y="139.7">80</text>
324
+ <text x="86" y="218.0">60</text>
325
+ <text x="86" y="296.3">40</text>
326
+ <text x="86" y="374.7">20</text>
327
+ <text x="86" y="453.0">0</text>
328
+ </g>
329
+
330
+ <!-- X axis: log10 base pairs/s. x-range chosen to mirror the
331
+ matplotlib auto-padding (left_pad/right_pad in the source);
332
+ ticks drop at the 1-2-5 steps of each decade that fall
333
+ inside the range. -->
334
+ <g class="pareto-axis pareto-axis--x">
335
+ <line x1="163.4" y1="500" x2="163.4" y2="506"/>
336
+ <line x1="263.9" y1="500" x2="263.9" y2="506"/>
337
+ <line x1="339.9" y1="500" x2="339.9" y2="506"/>
338
+ <line x1="415.9" y1="500" x2="415.9" y2="506"/>
339
+ <line x1="516.4" y1="500" x2="516.4" y2="506"/>
340
+ <line x1="592.4" y1="500" x2="592.4" y2="506"/>
341
+ <line x1="668.5" y1="500" x2="668.5" y2="506"/>
342
+ <line x1="768.9" y1="500" x2="768.9" y2="506"/>
343
+ <line x1="844.9" y1="500" x2="844.9" y2="506"/>
344
+ <line x1="920.9" y1="500" x2="920.9" y2="506"/>
345
+ <text x="163.4" y="520">200</text>
346
+ <text x="263.9" y="520">500</text>
347
+ <text x="339.9" y="520">1k</text>
348
+ <text x="415.9" y="520">2k</text>
349
+ <text x="516.4" y="520">5k</text>
350
+ <text x="592.4" y="520">10k</text>
351
+ <text x="668.5" y="520">20k</text>
352
+ <text x="768.9" y="520">50k</text>
353
+ <text x="844.9" y="520">100k</text>
354
+ <text x="920.9" y="520">200k</text>
355
+ </g>
356
+
357
+ <!-- Plot frame drawn after the axis ticks so the hairline
358
+ border sits cleanly on top of the tick lines. -->
359
+ <rect class="pareto-frame" x="100" y="30" width="870" height="470"/>
360
+
361
+ <!-- Axes-of-improvement indicator: a small L-shaped pair of grey arrows in
362
+ the lower-left labelled "better"/"faster", same as the
363
+ matplotlib reference. Placed at the 0-winrate gridline,
364
+ just inside the y-axis. -->
365
+ <g class="pareto-indicator" transform="translate(170 450)">
366
+ <line x1="0" y1="0" x2="0" y2="-70"/>
367
+ <polygon points="0,-78 -7,-66 7,-66"/>
368
+ <text class="pareto-indicator-text" transform="translate(-14 -35) rotate(-90)">better</text>
369
+ <line x1="0" y1="0" x2="70" y2="0"/>
370
+ <polygon points="78,0 66,-7 66,7"/>
371
+ <text class="pareto-indicator-text" x="35" y="20">faster</text>
372
+ </g>
373
+
374
+ <!-- 275× speedup arrow: starts just right of the Evo2 7B label
375
+ and lands just left of the Carbon 3B logo. y placed
376
+ between the two points (Evo2 7B at 64.3%, Carbon 3B at
377
+ 59.5%) so it reads as level with both. -->
378
+ <g class="pareto-speedup">
379
+ <line x1="290" y1="215" x2="822" y2="215"/>
380
+ <polygon points="836,215 820,206 820,224"/>
381
+ <text class="pareto-speedup-label" x="556" y="200">275×</text>
382
+ </g>
383
+
384
+ <!-- Data points. Coordinates baked in from pareto_data.csv:
385
+ x = 100 + (log10(T) − 2.0499) / 3.4452 × 870
386
+ y = 500 − (win_rate + 12) × 3.9167
387
+ Logos sit centered on each point (32×32 for non-highlight,
388
+ 43×43 for Carbon). Labels are pinned below the logo. -->
389
+
390
+ <!-- Evo2 20B · 177.5 bp/s, 95.24% -->
391
+ <g class="pareto-point">
392
+ <image href="/img/arc.webp" x="134.3" y="64.0" width="32" height="32"/>
393
+ <text class="pareto-label" x="150.3" y="110">Evo2 20B</text>
394
+ </g>
395
+
396
+ <!-- Evo2 7B · 453.8 bp/s, 64.29% -->
397
+ <g class="pareto-point">
398
+ <image href="/img/arc.webp" x="237.3" y="185.2" width="32" height="32"/>
399
+ <text class="pareto-label" x="253.3" y="231">Evo2 7B</text>
400
+ </g>
401
+
402
+ <!-- Evo2 1B · 1342.5 bp/s, 2.38% -->
403
+ <g class="pareto-point">
404
+ <image href="/img/arc.webp" x="356.2" y="427.7" width="32" height="32"/>
405
+ <text class="pareto-label" x="372.2" y="473">Evo2 1B</text>
406
+ </g>
407
+
408
+ <!-- GENERator-v2 3B · 98494.4 bp/s, 35.71% -->
409
+ <g class="pareto-point">
410
+ <image href="/img/generator.webp" x="828.7" y="297.1" width="32" height="32"/>
411
+ <text class="pareto-label" x="844.7" y="343">GENERator-v2 3B</text>
412
+ </g>
413
+
414
+ <!-- GENERator-v2 1.2B · 123219.2 bp/s, 14.29% -->
415
+ <g class="pareto-point">
416
+ <image href="/img/generator.webp" x="853.3" y="381.0" width="32" height="32"/>
417
+ <text class="pareto-label" x="869.3" y="427">GENERator-v2 1.2B</text>
418
+ </g>
419
+
420
+ <!-- Carbon 8B · 76582.7 bp/s, 78.57% (highlighted) -->
421
+ <g class="pareto-point pareto-point--highlight">
422
+ <image href="/img/logo.svg" x="795.6" y="123.7" width="43" height="43"/>
423
+ <text class="pareto-label" x="817.1" y="180">Carbon 8B</text>
424
+ </g>
425
+
426
+ <!-- Carbon 3B · 125130.8 bp/s, 59.52% (highlighted) -->
427
+ <g class="pareto-point pareto-point--highlight">
428
+ <image href="/img/logo.svg" x="849.5" y="198.3" width="43" height="43"/>
429
+ <text class="pareto-label" x="871.0" y="255">Carbon 3B</text>
430
+ </g>
431
+
432
+ <!-- Axis titles. Y title rotated -90 along the left margin, X
433
+ title + italic "Base pairs per second" subtitle below. -->
434
+ <text class="pareto-axis-title" transform="translate(34 265) rotate(-90)">Win rate (%)</text>
435
+ <text class="pareto-axis-title" x="535" y="558">Throughput</text>
436
+ <text class="pareto-axis-subtitle" x="535" y="582">Base pairs per second</text>
437
+ </svg>
438
  <figcaption>Throughput (base pairs per second, log scale) vs win rate across open DNA foundation models. Carbon 3B matches Evo2 7B's win rate at roughly 275× the throughput.</figcaption>
439
  </figure>
440
  </div>
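
The baked-in coordinates in the hunk above follow the two formulas given in the "Data points" comment. This is a minimal sketch of that mapping, assuming the constants in the comment are exact; the function and constant names are illustrative and do not come from the source script.

```python
import math

# Constants taken from the "Data points" comment in the demo.html diff.
PLOT_LEFT = 100.0     # x of the plot's left edge, in viewBox units
PLOT_BOTTOM = 500.0   # y of the plot's bottom edge
PLOT_WIDTH = 870.0    # width of the plot interior
LOG_X_MIN = 2.0499    # log10 of the throughput at the left edge
LOG_X_SPAN = 3.4452   # width of the x-range in decades
Y_PAD = 12.0          # win-rate padding below 0 %
Y_SCALE = 3.9167      # viewBox units per win-rate point (470 / 120)


def x_of(throughput: float) -> float:
    """Map throughput (bp/s) to an SVG x coordinate on the log axis."""
    return PLOT_LEFT + (math.log10(throughput) - LOG_X_MIN) / LOG_X_SPAN * PLOT_WIDTH


def y_of(win_rate: float) -> float:
    """Map win rate (%) to an SVG y coordinate (y grows downwards)."""
    return PLOT_BOTTOM - (win_rate + Y_PAD) * Y_SCALE


def logo_origin(throughput: float, win_rate: float, size: float = 32.0) -> tuple[float, float]:
    """Top-left corner of a size x size logo centred on the data point."""
    return x_of(throughput) - size / 2, y_of(win_rate) - size / 2


if __name__ == "__main__":
    # Evo2 7B: 453.8 bp/s, 64.29 % win rate.
    x, y = logo_origin(453.8, 64.29)
    print(f"Evo2 7B logo origin: x={x:.1f}, y={y:.1f}")
```

For Evo2 7B this reproduces the `x="237.3" y="185.2"` on the `<image>` element to within rounding, and `x_of(1000)` lands on the 1k tick at 339.9.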
 
644
  <div class="section--two-col intro-subsection">
645
  <div class="section-narrative">
646
  <div class="section-num">§6 · Applications</div>
647
+ <div class="section-title">What can the model do in the real world?</div>
648
  <p class="lede">
649
  A model that understands and writes DNA is useful wherever DNA is the
650
  input or the output. There are three interesting use-cases for such
img/arc.webp ADDED
img/generator.webp ADDED
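
The x-axis tick values in the demo.html diff (200, 500, 1k, ... 200k) are the 1-2-5 steps of each decade that fall inside the plotted range. This is a minimal sketch of how such a tick list could be generated, assuming the range constants from the "Data points" comment; `ticks_125` and `tick_label` are hypothetical helpers, not functions from the source script.

```python
import math

LOG_X_MIN = 2.0499   # log10 of the left edge of the x-range (from the diff comment)
LOG_X_SPAN = 3.4452  # width of the x-range in decades (from the diff comment)


def ticks_125(log_min: float, log_span: float) -> list[int]:
    """Return the 1-2-5 tick values that lie inside the log-scale range."""
    lo = 10 ** log_min
    hi = 10 ** (log_min + log_span)
    ticks = []
    for exponent in range(math.floor(log_min), math.ceil(log_min + log_span) + 1):
        for mantissa in (1, 2, 5):
            value = mantissa * 10 ** exponent
            if lo <= value <= hi:
                ticks.append(value)
    return ticks


def tick_label(value: int) -> str:
    """Format a tick the way the chart does: 200, 500, 1k, 2k, ..."""
    return f"{value // 1000}k" if value >= 1000 else str(value)


if __name__ == "__main__":
    print([tick_label(v) for v in ticks_125(LOG_X_MIN, LOG_X_SPAN)])
```

With these constants the range runs from roughly 112 to roughly 313,000 bp/s, so 100 and 500k fall outside it and the list matches the ten ticks drawn in the SVG.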