Commit · 3e302cb
1 Parent(s): ee5b51e
change from mermaid diagram to d3 diagram for more flexibility
app/src/content/chapters/experiments.mdx
CHANGED
|
@@ -10,6 +10,7 @@ import FigRef from "../../components/FigRef.astro";
|
|
| 10 |
{/* TODO: Integrate decay experiment as another analysis for proxy */}
|
| 11 |
{/* TODO: share on a bunch of discords/slacks/hackernews/locallama */}
|
| 12 |
{/* TODO: brainstorm better banner, be artsy */}
|
|
|
|
| 13 |
{/* TODO: final configuration for finephrase at the end of infra section: visualization of how many pages (500 tokens) (use page emojis flying from left to right) we can generate (real time), user can configure with a slider the number of GPUs */}
|
| 14 |
{/* TODO: only explain datatrove additions when we need them (for generating the final finephrase) */}
|
| 15 |
{/* TODO: move infrastructure section after analyses as precursor and explanation for finephrase */}
|
|
|
|
| 10 |
{/* TODO: Integrate decay experiment as another analysis for proxy */}
|
| 11 |
{/* TODO: share on a bunch of discords/slacks/hackernews/locallama */}
|
| 12 |
{/* TODO: brainstorm better banner, be artsy */}
|
| 13 |
+
{/* TODO: improve the diagram for the infrastructure at the start of the section */}
|
| 14 |
{/* TODO: final configuration for finephrase at the end of infra section: visualization of how many pages (500 tokens) (use page emojis flying from left to right) we can generate (real time), user can configure with a slider the number of GPUs */}
|
| 15 |
{/* TODO: only explain datatrove additions when we need them (for generating the final finephrase) */}
|
| 16 |
{/* TODO: move infrastructure section after analyses as precursor and explanation for finephrase */}
|
app/src/content/chapters/infrastructure.mdx
CHANGED
|
@@ -16,68 +16,15 @@ We made major extensions to [DataTrove](https://github.com/huggingface/datatrove
|
|
| 16 |
|
| 17 |
In this section we show how DataTrove can be used to generate billions of tokens across several model scales, ranging from 100 million to 1 trillion parameters. <FigRef target="datatrove-pipeline" /> gives an overview of the pipeline. Let's dive in!
|
| 18 |
|
| 19 |
-
<
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
flowchart TB
|
| 25 |
-
subgraph Input["📥 Input"]
|
| 26 |
-
HF_IN["`HF Hub Dataset`"]
|
| 27 |
-
end
|
| 28 |
-
|
| 29 |
-
subgraph Pipeline["⚙️ DataTrove Pipeline"]
|
| 30 |
-
direction TB
|
| 31 |
-
READ["`**Read**
|
| 32 |
-
HuggingFaceDatasetReader`"]
|
| 33 |
-
TRANSFORM["`**Transform**
|
| 34 |
-
InferenceRunner`"]
|
| 35 |
-
WRITE["`**Write**
|
| 36 |
-
ParquetWriter`"]
|
| 37 |
-
READ --> TRANSFORM --> WRITE
|
| 38 |
-
end
|
| 39 |
-
|
| 40 |
-
subgraph Execution["🖥️ Execution Mode"]
|
| 41 |
-
direction TB
|
| 42 |
-
LOCAL["`**Local**
|
| 43 |
-
single node, multi-GPU`"]
|
| 44 |
-
SLURM["`**Slurm Cluster**
|
| 45 |
-
multi-node, auto-scaling`"]
|
| 46 |
-
end
|
| 47 |
-
|
| 48 |
-
subgraph Inference["🚀 Inference Engine"]
|
| 49 |
-
direction TB
|
| 50 |
-
ROLLOUT["`**Custom Rollout**
|
| 51 |
-
async callable`"]
|
| 52 |
-
VLLM["`**vLLM / SGLang**
|
| 53 |
-
Server`"]
|
| 54 |
-
ROLLOUT -- "generate(payload)" --> VLLM
|
| 55 |
-
end
|
| 56 |
-
|
| 57 |
-
subgraph Output["📤 Output"]
|
| 58 |
-
direction LR
|
| 59 |
-
HF_OUT["`HF Hub Dataset`"]
|
| 60 |
-
CARD["`**Dataset Card**
|
| 61 |
-
+ Metrics`"]
|
| 62 |
-
MONITOR["`Progress Monitor`"]
|
| 63 |
-
end
|
| 64 |
-
|
| 65 |
-
HF_IN --> READ
|
| 66 |
-
HF_IN ~~~ LOCAL
|
| 67 |
-
TRANSFORM --> ROLLOUT
|
| 68 |
-
Pipeline --> Execution
|
| 69 |
-
WRITE --> HF_OUT
|
| 70 |
-
WRITE --> CARD
|
| 71 |
-
WRITE --> MONITOR
|
| 72 |
-
```
|
| 73 |
-
|
| 74 |
-
<figcaption>Overview of the DataTrove synthetic data generation pipeline. Documents flow through a three-stage pipeline (Read, Transform, Write), with the InferenceRunner dispatching rollout functions to vLLM/SGLang. The system supports local and Slurm-based execution with automatic upload and progress monitoring.</figcaption>
|
| 75 |
-
</figure>
|
| 76 |
-
</Wide>
|
| 77 |
|
| 78 |
### Generating synthetic data at scale
|
| 79 |
|
| 80 |
-
At the core is `examples/inference/benchmark/generate_data.py`, a Typer-powered entry point that orchestrates the full synthetic data loop:
|
| 81 |
|
| 82 |
1. **Read**: pull any split/config from the Hugging Face Hub via `HuggingFaceDatasetReader`.
|
| 83 |
1. **Transform**: stream examples through `InferenceRunner`, which talks to vLLM (or another server type) and handles chunking, retries, and metric logging.
|
|
|
|
| 16 |
|
| 17 |
In this section we show how DataTrove can be used to generate billions of tokens across several model scales, ranging from 100 million to 1 trillion parameters. <FigRef target="datatrove-pipeline" /> gives an overview of the pipeline. Let's dive in!
|
| 18 |
|
| 19 |
+
<HtmlEmbed
|
| 20 |
+
id="datatrove-pipeline"
|
| 21 |
+
src="d3-pipeline.html"
|
| 22 |
+
caption="Overview of the DataTrove synthetic data generation pipeline. Documents flow through a three-stage pipeline (Read, Transform, Write), with the InferenceRunner dispatching rollout functions to vLLM/SGLang. The system supports local and Slurm-based execution with automatic upload and progress monitoring."
|
| 23 |
+
/>
|
| 24 |
|
| 25 |
### Generating synthetic data at scale
|
| 26 |
|
| 27 |
+
At the core is `examples/inference/benchmark/generate_data.py`, a [Typer](https://typer.tiangolo.com/)-powered entry point that orchestrates the full synthetic data loop:
|
| 28 |
|
| 29 |
1. **Read**: pull any split/config from the Hugging Face Hub via `HuggingFaceDatasetReader`.
|
| 30 |
1. **Transform**: stream examples through `InferenceRunner`, which talks to vLLM (or another server type) and handles chunking, retries, and metric logging.
|
app/src/content/embeds/d3-pipeline.html
ADDED
|
@@ -0,0 +1,341 @@
|
|
| 1 |
+
<div class="d3-pipeline"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-pipeline {
|
| 4 |
+
position: relative;
|
| 5 |
+
width: 100%;
|
| 6 |
+
margin: 0;
|
| 7 |
+
container-type: inline-size;
|
| 8 |
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
|
| 9 |
+
}
|
| 10 |
+
.d3-pipeline .node-group { cursor: default; }
|
| 11 |
+
.d3-pipeline .node-card { transition: filter .15s ease; }
|
| 12 |
+
.d3-pipeline .node-group:hover .node-card { filter: brightness(1.05); }
|
| 13 |
+
.d3-pipeline .node-title { font-weight: 700; fill: var(--text-color); }
|
| 14 |
+
.d3-pipeline .node-subtitle { fill: var(--muted-color); }
|
| 15 |
+
.d3-pipeline .group-label { font-weight: 700; fill: var(--muted-color); letter-spacing: 0.02em; }
|
| 16 |
+
.d3-pipeline .edge-path { fill: none; stroke-linecap: round; }
|
| 17 |
+
.d3-pipeline .d3-tooltip {
|
| 18 |
+
position: absolute; top: 0; left: 0;
|
| 19 |
+
transform: translate(-9999px, -9999px);
|
| 20 |
+
pointer-events: none; padding: 8px 12px; border-radius: 8px;
|
| 21 |
+
font-size: 12px; line-height: 1.4;
|
| 22 |
+
border: 1px solid var(--border-color); background: var(--surface-bg);
|
| 23 |
+
color: var(--text-color); box-shadow: 0 4px 20px rgba(0,0,0,.15);
|
| 24 |
+
opacity: 0; transition: opacity .12s ease; max-width: 260px; z-index: 100;
|
| 25 |
+
}
|
| 26 |
+
.d3-pipeline .d3-tooltip strong { display: block; margin-bottom: 2px; font-size: 13px; }
|
| 27 |
+
</style>
|
| 28 |
+
<script>
|
| 29 |
+
(() => {
|
| 30 |
+
const ensureD3 = (cb) => {
|
| 31 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 32 |
+
let s = document.getElementById('d3-cdn-script');
|
| 33 |
+
if (!s) { s = document.createElement('script'); s.id = 'd3-cdn-script'; s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js'; document.head.appendChild(s); }
|
| 34 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 35 |
+
s.addEventListener('load', onReady, { once: true });
|
| 36 |
+
if (window.d3) onReady();
|
| 37 |
+
};
|
| 38 |
+
|
| 39 |
+
const bootstrap = () => {
|
| 40 |
+
const scriptEl = document.currentScript;
|
| 41 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 42 |
+
if (!(container && container.classList && container.classList.contains('d3-pipeline'))) {
|
| 43 |
+
const cs = Array.from(document.querySelectorAll('.d3-pipeline')).filter(el => !(el.dataset && el.dataset.mounted === 'true'));
|
| 44 |
+
container = cs[cs.length - 1] || null;
|
| 45 |
+
}
|
| 46 |
+
if (!container) return;
|
| 47 |
+
if (container.dataset) { if (container.dataset.mounted === 'true') return; container.dataset.mounted = 'true'; }
|
| 48 |
+
container.style.position = container.style.position || 'relative';
|
| 49 |
+
|
| 50 |
+
const tip = document.createElement('div');
|
| 51 |
+
tip.className = 'd3-tooltip';
|
| 52 |
+
const tipInner = document.createElement('div');
|
| 53 |
+
tip.appendChild(tipInner);
|
| 54 |
+
container.appendChild(tip);
|
| 55 |
+
|
| 56 |
+
function showTip(ev, html) {
|
| 57 |
+
tipInner.innerHTML = html;
|
| 58 |
+
tip.style.opacity = '1';
|
| 59 |
+
const r = container.getBoundingClientRect();
|
| 60 |
+
const x = ev.clientX - r.left + 14, y = ev.clientY - r.top - 10;
|
| 61 |
+
tip.style.transform = `translate(${x}px, ${y}px)`;
|
| 62 |
+
}
|
| 63 |
+
function hideTip() { tip.style.opacity = '0'; tip.style.transform = 'translate(-9999px,-9999px)'; }
|
| 64 |
+
|
| 65 |
+
const svg = d3.select(container).append('svg').attr('width', '100%').style('display', 'block');
|
| 66 |
+
const defs = svg.append('defs');
|
| 67 |
+
defs.append('marker').attr('id', 'pl-arrow').attr('viewBox', '0 0 10 8')
|
| 68 |
+
.attr('refX', 9).attr('refY', 4).attr('markerWidth', 7).attr('markerHeight', 5.5)
|
| 69 |
+
.attr('orient', 'auto').append('path').attr('d', 'M0,1 L8,4 L0,7 Z');
|
| 70 |
+
|
| 71 |
+
const gRoot = svg.append('g');
|
| 72 |
+
const gGroups = gRoot.append('g');
|
| 73 |
+
const gEdges = gRoot.append('g');
|
| 74 |
+
const gNodes = gRoot.append('g');
|
| 75 |
+
|
| 76 |
+
const nodes = [
|
| 77 |
+
{ id: 'hf_in', label: 'HF Hub Dataset', sub: '', group: 'input', tip: 'Source dataset from the Hugging Face Hub. Any split or config.' },
|
| 78 |
+
{ id: 'read', label: 'Read', sub: 'HuggingFaceDatasetReader', group: 'pipeline', tip: 'Reads documents from the Hub and streams them into the pipeline.' },
|
| 79 |
+
{ id: 'transform', label: 'Transform', sub: 'InferenceRunner', group: 'pipeline', tip: 'Orchestrates LLM inference: batching, retries, metric logging.' },
|
| 80 |
+
{ id: 'write', label: 'Write', sub: 'ParquetWriter', group: 'pipeline', tip: 'Writes generated outputs as Parquet files with checkpointing.' },
|
| 81 |
+
{ id: 'local', label: 'Local', sub: 'single node, multi-GPU', group: 'execution', tip: 'Run on a single machine with multiple workers for development.' },
|
| 82 |
+
{ id: 'slurm', label: 'Slurm Cluster', sub: 'multi-node, auto-scaling', group: 'execution', tip: 'Distribute across nodes for large-scale production workloads.' },
|
| 83 |
+
{ id: 'rollout', label: 'Custom Rollout', sub: 'async callable', group: 'inference', tip: 'Your rollout function: orchestrates one or many generate() calls.' },
|
| 84 |
+
{ id: 'vllm', label: 'vLLM / SGLang', sub: 'Server', group: 'inference', tip: 'High-throughput inference engine with prefix caching and batching.' },
|
| 85 |
+
{ id: 'hf_out', label: 'HF Hub Dataset', sub: '', group: 'output', tip: 'Generated dataset uploaded continuously to the Hugging Face Hub.' },
|
| 86 |
+
{ id: 'card', label: 'Dataset Card', sub: '+ Metrics', group: 'output', tip: 'Auto-generated dataset card with throughput stats.' },
|
| 87 |
+
{ id: 'monitor', label: 'Progress Monitor', sub: '', group: 'output', tip: 'Live progress bar and ETA on the dataset card during inference.' },
|
| 88 |
+
];
|
| 89 |
+
|
| 90 |
+
const groups = [
|
| 91 |
+
{ id: 'input', label: 'Input', icon: '📥' },
|
| 92 |
+
{ id: 'pipeline', label: 'DataTrove Pipeline', icon: '⚙️' },
|
| 93 |
+
{ id: 'execution', label: 'Execution Mode', icon: '🖥️' },
|
| 94 |
+
{ id: 'inference', label: 'Inference Engine', icon: '🚀' },
|
| 95 |
+
{ id: 'output', label: 'Output', icon: '📤' },
|
| 96 |
+
];
|
| 97 |
+
|
| 98 |
+
const edges = [
|
| 99 |
+
{ from: 'hf_in', to: 'read' },
|
| 100 |
+
{ from: 'read', to: 'transform' },
|
| 101 |
+
{ from: 'transform', to: 'write' },
|
| 102 |
+
{ from: 'transform', to: 'rollout' },
|
| 103 |
+
{ from: 'rollout', to: 'vllm' },
|
| 104 |
+
{ from: 'write', to: 'hf_out' },
|
| 105 |
+
{ from: 'write', to: 'card' },
|
| 106 |
+
{ from: 'write', to: 'monitor' },
|
| 107 |
+
];
|
| 108 |
+
|
| 109 |
+
function isDark() { return document.documentElement.getAttribute('data-theme') === 'dark'; }
|
| 110 |
+
|
| 111 |
+
function colors() {
|
| 112 |
+
const dk = isDark();
|
| 113 |
+
const primary = window.ColorPalettes ? window.ColorPalettes.getPrimary() : (dk ? '#7c6ff7' : '#6366f1');
|
| 114 |
+
return {
|
| 115 |
+
nodeBg: dk ? 'rgba(255,255,255,0.055)' : 'rgba(255,255,255,0.92)',
|
| 116 |
+
nodeBd: dk ? 'rgba(255,255,255,0.10)' : 'rgba(0,0,0,0.09)',
|
| 117 |
+
groupBg: dk ? 'rgba(255,255,255,0.025)' : 'rgba(0,0,0,0.022)',
|
| 118 |
+
groupBd: dk ? 'rgba(255,255,255,0.07)' : 'rgba(0,0,0,0.055)',
|
| 119 |
+
pipeBg: dk ? 'rgba(99,102,241,0.055)' : 'rgba(99,102,241,0.04)',
|
| 120 |
+
pipeBd: dk ? 'rgba(99,102,241,0.14)' : 'rgba(99,102,241,0.11)',
|
| 121 |
+
edge: dk ? 'rgba(255,255,255,0.22)' : 'rgba(0,0,0,0.18)',
|
| 122 |
+
arrow: dk ? 'rgba(255,255,255,0.30)' : 'rgba(0,0,0,0.25)',
|
| 123 |
+
primary,
|
| 124 |
+
};
|
| 125 |
+
}
|
| 126 |
+
|
| 127 |
+
// Compute layout positions for a given container width
|
| 128 |
+
function computeLayout() {
|
| 129 |
+
const W = container.clientWidth || 820;
|
| 130 |
+
const s = Math.min(1, W / 820);
|
| 131 |
+
|
| 132 |
+
const nw = Math.round(166 * s), nh = Math.round(48 * s);
|
| 133 |
+
const nr = Math.round(10 * s);
|
| 134 |
+
const gp = Math.round(10 * s); // group padding
|
| 135 |
+
const gr = Math.round(10 * s); // group corner radius
|
| 136 |
+
const glh = Math.round(22 * s); // group label height
|
| 137 |
+
const ng = Math.round(8 * s); // node gap within group
|
| 138 |
+
const cg = Math.round(28 * s); // column gap
|
| 139 |
+
const rg = Math.round(16 * s); // row gap between groups
|
| 140 |
+
|
| 141 |
+
// Three columns: left (exec + inference), center (input + pipeline), right (output)
|
| 142 |
+
const leftW = nw + gp * 2;
|
| 143 |
+
const centerW = nw + gp * 2;
|
| 144 |
+
const rightW = nw + gp * 2;
|
| 145 |
+
const totalW = leftW + centerW + rightW + cg * 2;
|
| 146 |
+
const offsetX = Math.max(0, (W - totalW) / 2);
|
| 147 |
+
|
| 148 |
+
const leftX = offsetX;
|
| 149 |
+
const centerX = offsetX + leftW + cg;
|
| 150 |
+
const rightX = offsetX + leftW + cg + centerW + cg;
|
| 151 |
+
|
| 152 |
+
// -- Center column: Input (1 node) + Pipeline (3 nodes)
|
| 153 |
+
let y = Math.round(6 * s);
|
| 154 |
+
const inputNode = nodes.find(n => n.id === 'hf_in');
|
| 155 |
+
inputNode._x = centerX + gp; inputNode._y = y + glh + gp;
|
| 156 |
+
inputNode._w = nw; inputNode._h = nh; inputNode._r = nr;
|
| 157 |
+
const inputGroup = groups.find(g => g.id === 'input');
|
| 158 |
+
inputGroup._x = centerX; inputGroup._y = y;
|
| 159 |
+
inputGroup._w = centerW; inputGroup._h = glh + gp * 2 + nh; inputGroup._r = gr;
|
| 160 |
+
|
| 161 |
+
y += inputGroup._h + rg;
|
| 162 |
+
const pipeTop = y;
|
| 163 |
+
const pipeNodes = ['read', 'transform', 'write'].map(id => nodes.find(n => n.id === id));
|
| 164 |
+
pipeNodes.forEach((n, i) => {
|
| 165 |
+
n._x = centerX + gp;
|
| 166 |
+
n._y = pipeTop + glh + gp + i * (nh + ng);
|
| 167 |
+
n._w = nw; n._h = nh; n._r = nr;
|
| 168 |
+
});
|
| 169 |
+
const pipeH = glh + gp * 2 + 3 * nh + 2 * ng;
|
| 170 |
+
const pipeGroup = groups.find(g => g.id === 'pipeline');
|
| 171 |
+
pipeGroup._x = centerX; pipeGroup._y = pipeTop;
|
| 172 |
+
pipeGroup._w = centerW; pipeGroup._h = pipeH; pipeGroup._r = gr;
|
| 173 |
+
|
| 174 |
+
// -- Left column: Execution + Inference
|
| 175 |
+
// Vertically center the left column with the pipeline
|
| 176 |
+
const execNodes = ['local', 'slurm'].map(id => nodes.find(n => n.id === id));
|
| 177 |
+
const execH = glh + gp * 2 + execNodes.length * nh + (execNodes.length - 1) * ng;
|
| 178 |
+
const inferNodes = ['rollout', 'vllm'].map(id => nodes.find(n => n.id === id));
|
| 179 |
+
const inferH = glh + gp * 2 + inferNodes.length * nh + (inferNodes.length - 1) * ng;
|
| 180 |
+
const leftTotalH = execH + rg + inferH;
|
| 181 |
+
const leftCenterY = pipeTop + pipeH / 2;
|
| 182 |
+
const leftTop = Math.max(pipeTop, leftCenterY - leftTotalH / 2);
|
| 183 |
+
|
| 184 |
+
const execTop = leftTop;
|
| 185 |
+
execNodes.forEach((n, i) => {
|
| 186 |
+
n._x = leftX + gp; n._y = execTop + glh + gp + i * (nh + ng);
|
| 187 |
+
n._w = nw; n._h = nh; n._r = nr;
|
| 188 |
+
});
|
| 189 |
+
const execGroup = groups.find(g => g.id === 'execution');
|
| 190 |
+
execGroup._x = leftX; execGroup._y = execTop;
|
| 191 |
+
execGroup._w = leftW; execGroup._h = execH; execGroup._r = gr;
|
| 192 |
+
|
| 193 |
+
const inferTop = execTop + execH + rg;
|
| 194 |
+
inferNodes.forEach((n, i) => {
|
| 195 |
+
n._x = leftX + gp; n._y = inferTop + glh + gp + i * (nh + ng);
|
| 196 |
+
n._w = nw; n._h = nh; n._r = nr;
|
| 197 |
+
});
|
| 198 |
+
const inferGroup = groups.find(g => g.id === 'inference');
|
| 199 |
+
inferGroup._x = leftX; inferGroup._y = inferTop;
|
| 200 |
+
inferGroup._w = leftW; inferGroup._h = inferH; inferGroup._r = gr;
|
| 201 |
+
|
| 202 |
+
// -- Right column: Output (vertically centered with pipeline)
|
| 203 |
+
const outNodes = ['hf_out', 'card', 'monitor'].map(id => nodes.find(n => n.id === id));
|
| 204 |
+
const outH = glh + gp * 2 + outNodes.length * nh + (outNodes.length - 1) * ng;
|
| 205 |
+
const outTop = pipeTop + (pipeH - outH) / 2;
|
| 206 |
+
outNodes.forEach((n, i) => {
|
| 207 |
+
n._x = rightX + gp; n._y = outTop + glh + gp + i * (nh + ng);
|
| 208 |
+
n._w = nw; n._h = nh; n._r = nr;
|
| 209 |
+
});
|
| 210 |
+
const outGroup = groups.find(g => g.id === 'output');
|
| 211 |
+
outGroup._x = rightX; outGroup._y = outTop;
|
| 212 |
+
outGroup._w = rightW; outGroup._h = outH; outGroup._r = gr;
|
| 213 |
+
|
| 214 |
+
const maxY = Math.max(
|
| 215 |
+
...nodes.map(n => n._y + n._h + gp),
|
| 216 |
+
...groups.map(g => g._y + g._h)
|
| 217 |
+
);
|
| 218 |
+
svg.attr('height', maxY + Math.round(8 * s));
|
| 219 |
+
|
| 220 |
+
return s;
|
| 221 |
+
}
|
| 222 |
+
|
| 223 |
+
function anchor(n, side) {
|
| 224 |
+
if (side === 'top') return { x: n._x + n._w / 2, y: n._y };
|
| 225 |
+
if (side === 'bottom') return { x: n._x + n._w / 2, y: n._y + n._h };
|
| 226 |
+
if (side === 'left') return { x: n._x, y: n._y + n._h / 2 };
|
| 227 |
+
if (side === 'right') return { x: n._x + n._w, y: n._y + n._h / 2 };
|
| 228 |
+
}
|
| 229 |
+
|
| 230 |
+
function bezier(a, b, orient) {
|
| 231 |
+
if (orient === 'v') {
|
| 232 |
+
const d = (b.y - a.y) * 0.45;
|
| 233 |
+
return `M${a.x},${a.y} C${a.x},${a.y + d} ${b.x},${b.y - d} ${b.x},${b.y}`;
|
| 234 |
+
}
|
| 235 |
+
const d = (b.x - a.x) * 0.4;
|
| 236 |
+
return `M${a.x},${a.y} C${a.x + d},${a.y} ${b.x - d},${b.y} ${b.x},${b.y}`;
|
| 237 |
+
}
|
| 238 |
+
|
| 239 |
+
function edgePath(e) {
|
| 240 |
+
const f = nodes.find(n => n.id === e.from);
|
| 241 |
+
const t = nodes.find(n => n.id === e.to);
|
| 242 |
+
if (!f || !t) return '';
|
| 243 |
+
|
| 244 |
+
// Explicit routing for each edge
|
| 245 |
+
if (e.from === 'hf_in' && e.to === 'read') return bezier(anchor(f, 'bottom'), anchor(t, 'top'), 'v');
|
| 246 |
+
if (e.from === 'read' && e.to === 'transform') return bezier(anchor(f, 'bottom'), anchor(t, 'top'), 'v');
|
| 247 |
+
if (e.from === 'transform' && e.to === 'write') return bezier(anchor(f, 'bottom'), anchor(t, 'top'), 'v');
|
| 248 |
+
if (e.from === 'transform' && e.to === 'rollout') return bezier(anchor(f, 'left'), anchor(t, 'right'), 'h');
|
| 249 |
+
if (e.from === 'rollout' && e.to === 'vllm') return bezier(anchor(f, 'bottom'), anchor(t, 'top'), 'v');
|
| 250 |
+
if (e.from === 'write' && e.to === 'hf_out') return bezier(anchor(f, 'right'), anchor(t, 'left'), 'h');
|
| 251 |
+
if (e.from === 'write' && e.to === 'card') return bezier(anchor(f, 'right'), anchor(t, 'left'), 'h');
|
| 252 |
+
if (e.from === 'write' && e.to === 'monitor') return bezier(anchor(f, 'right'), anchor(t, 'left'), 'h');
|
| 253 |
+
|
| 254 |
+
// Fallback
|
| 255 |
+
return bezier(anchor(f, 'right'), anchor(t, 'left'), 'h');
|
| 256 |
+
}
|
| 257 |
+
|
| 258 |
+
function render() {
|
| 259 |
+
const s = computeLayout();
|
| 260 |
+
const c = colors();
|
| 261 |
+
|
| 262 |
+
const fs = Math.max(11, Math.round(13 * s));
|
| 263 |
+
const fsSub = Math.max(10, Math.round(11 * s));
|
| 264 |
+
const fsGrp = Math.max(10, Math.round(11 * s));
|
| 265 |
+
const fsIcon = Math.max(12, Math.round(14 * s));
|
| 266 |
+
|
| 267 |
+
defs.select('#pl-arrow path').attr('fill', c.arrow);
|
| 268 |
+
|
| 269 |
+
// Groups
|
| 270 |
+
const gSel = gGroups.selectAll('g.grp').data(groups, d => d.id);
|
| 271 |
+
const gE = gSel.enter().append('g').attr('class', 'grp');
|
| 272 |
+
gE.append('rect');
|
| 273 |
+
gE.append('text').attr('class', 'grp-icon');
|
| 274 |
+
gE.append('text').attr('class', 'group-label');
|
| 275 |
+
const gM = gE.merge(gSel);
|
| 276 |
+
gM.select('rect')
|
| 277 |
+
.attr('x', d => d._x).attr('y', d => d._y)
|
| 278 |
+
.attr('width', d => d._w).attr('height', d => d._h)
|
| 279 |
+
.attr('rx', d => d._r).attr('ry', d => d._r)
|
| 280 |
+
.attr('fill', d => d.id === 'pipeline' ? c.pipeBg : c.groupBg)
|
| 281 |
+
.attr('stroke', d => d.id === 'pipeline' ? c.pipeBd : c.groupBd)
|
| 282 |
+
.attr('stroke-width', 1);
|
| 283 |
+
gM.select('.grp-icon')
|
| 284 |
+
.attr('x', d => d._x + Math.round(10 * s))
|
| 285 |
+
.attr('y', d => d._y + Math.round(19 * s))
|
| 286 |
+
.style('font-size', fsIcon + 'px')
|
| 287 |
+
.text(d => d.icon);
|
| 288 |
+
gM.select('.group-label')
|
| 289 |
+
.attr('x', d => d._x + Math.round(10 * s) + fsIcon + Math.round(3 * s))
|
| 290 |
+
.attr('y', d => d._y + Math.round(19 * s))
|
| 291 |
+
.style('font-size', fsGrp + 'px')
|
| 292 |
+
.text(d => d.label);
|
| 293 |
+
gSel.exit().remove();
|
| 294 |
+
|
| 295 |
+
// Edges
|
| 296 |
+
const eSel = gEdges.selectAll('path.edge-path').data(edges, d => d.from + d.to);
|
| 297 |
+
eSel.enter().append('path').attr('class', 'edge-path')
|
| 298 |
+
.attr('marker-end', 'url(#pl-arrow)')
|
| 299 |
+
.merge(eSel)
|
| 300 |
+
.attr('d', edgePath)
|
| 301 |
+
.attr('stroke', c.edge)
|
| 302 |
+
.attr('stroke-width', Math.max(1.5, 1.8 * s));
|
| 303 |
+
eSel.exit().remove();
|
| 304 |
+
|
| 305 |
+
// Nodes
|
| 306 |
+
const nSel = gNodes.selectAll('g.node-group').data(nodes, d => d.id);
|
| 307 |
+
const nE = nSel.enter().append('g').attr('class', 'node-group');
|
| 308 |
+
nE.append('rect').attr('class', 'node-card');
|
| 309 |
+
nE.append('text').attr('class', 'node-title');
|
| 310 |
+
nE.append('text').attr('class', 'node-subtitle');
|
| 311 |
+
const nM = nE.merge(nSel);
|
| 312 |
+
nM.attr('transform', d => `translate(${d._x},${d._y})`);
|
| 313 |
+
nM.select('.node-card')
|
| 314 |
+
.attr('width', d => d._w).attr('height', d => d._h)
|
| 315 |
+
.attr('rx', d => d._r).attr('ry', d => d._r)
|
| 316 |
+
.attr('fill', c.nodeBg).attr('stroke', c.nodeBd).attr('stroke-width', 1);
|
| 317 |
+
nM.select('.node-title')
|
| 318 |
+
.attr('x', d => d._w / 2).attr('y', d => d.sub ? d._h * 0.38 : d._h / 2)
|
| 319 |
+
.attr('text-anchor', 'middle').attr('dominant-baseline', 'middle')
|
| 320 |
+
.style('font-size', fs + 'px').text(d => d.label);
|
| 321 |
+
nM.select('.node-subtitle')
|
| 322 |
+
.attr('x', d => d._w / 2).attr('y', d => d._h * 0.68)
|
| 323 |
+
.attr('text-anchor', 'middle').attr('dominant-baseline', 'middle')
|
| 324 |
+
.style('font-size', fsSub + 'px').text(d => d.sub || '');
|
| 325 |
+
nM.on('mouseenter', (ev, d) => { if (d.tip) showTip(ev, `<strong>${d.label}</strong>${d.tip}`); })
|
| 326 |
+
.on('mousemove', (ev, d) => { if (d.tip) showTip(ev, `<strong>${d.label}</strong>${d.tip}`); })
|
| 327 |
+
.on('mouseleave', hideTip);
|
| 328 |
+
nSel.exit().remove();
|
| 329 |
+
}
|
| 330 |
+
|
| 331 |
+
render();
|
| 332 |
+
if (window.ResizeObserver) { new ResizeObserver(() => render()).observe(container); }
|
| 333 |
+
else { window.addEventListener('resize', render); }
|
| 334 |
+
new MutationObserver(() => render()).observe(document.documentElement, { attributes: true, attributeFilter: ['data-theme'] });
|
| 335 |
+
};
|
| 336 |
+
|
| 337 |
+
if (document.readyState === 'loading') {
|
| 338 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 339 |
+
} else { ensureD3(bootstrap); }
|
| 340 |
+
})();
|
| 341 |
+
</script>
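The sizing arithmetic in `computeLayout` can be checked without a browser. The sketch below is a minimal reimplementation, assuming the same constants as the embed (820 px design width, 48 px node height, 10 px group padding, and so on). `layoutMetrics` and `groupHeight` are hypothetical helper names used only for illustration; they do not exist in `d3-pipeline.html`.

```javascript
// Hypothetical helpers mirroring the arithmetic in computeLayout();
// they are NOT part of d3-pipeline.html.

// Scale every base dimension by s = min(1, W / 820), as the embed does.
function layoutMetrics(containerWidth) {
  const W = containerWidth || 820;  // same fallback as the embed
  const s = Math.min(1, W / 820);   // never scale above the design size
  return {
    s,
    nw: Math.round(166 * s),  // node width
    nh: Math.round(48 * s),   // node height
    gp: Math.round(10 * s),   // group padding
    glh: Math.round(22 * s),  // group label height
    ng: Math.round(8 * s),    // node gap within a group
  };
}

// Height of a group box holding `count` stacked nodes:
// label + top/bottom padding + node heights + gaps between nodes.
function groupHeight(m, count) {
  return m.glh + m.gp * 2 + count * m.nh + (count - 1) * m.ng;
}

const full = layoutMetrics(820);
const half = layoutMetrics(410);
console.log(groupHeight(full, 3)); // three-node pipeline group at design width
console.log(groupHeight(half, 3)); // same group in a container half as wide
```

At the 820 px design width the three-node pipeline group comes out 202 px tall, and at 410 px it shrinks to 101 px. Because `s` is capped at 1, containers wider than 820 px keep the design-size boxes and the diagram is centered through `offsetX` instead of being stretched.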
|
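The edge routing is easiest to see by evaluating the path helper on concrete points. The snippet below is a standalone copy of the embed's `bezier` function under the hypothetical name `bezierPath`, so it can run outside the page. It assumes anchor points are plain `{ x, y }` objects, as returned by `anchor`.

```javascript
// Standalone copy of the embed's bezier() helper under a hypothetical name.
// 'v' offsets the control points along y (vertical routing);
// any other value offsets them along x (horizontal routing).
function bezierPath(a, b, orient) {
  if (orient === 'v') {
    const d = (b.y - a.y) * 0.45;
    return `M${a.x},${a.y} C${a.x},${a.y + d} ${b.x},${b.y - d} ${b.x},${b.y}`;
  }
  const d = (b.x - a.x) * 0.4;
  return `M${a.x},${a.y} C${a.x + d},${a.y} ${b.x - d},${b.y} ${b.x},${b.y}`;
}

// Left-to-right edge, e.g. Write -> an Output node.
console.log(bezierPath({ x: 0, y: 0 }, { x: 100, y: 50 }, 'h'));
// Top-to-bottom edge, e.g. Read -> Transform.
console.log(bezierPath({ x: 10, y: 0 }, { x: 10, y: 100 }, 'v'));
// Right-to-left edge, e.g. Transform -> Custom Rollout in the left column.
console.log(bezierPath({ x: 100, y: 0 }, { x: 0, y: 0 }, 'h'));
```

The offset `d` is signed, so the same formula handles edges that run right to left: for the third call `d` is -40 and both control points move toward the target, which is why the Transform to Custom Rollout edge needs no special case beyond choosing the `left` and `right` anchors.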