joelniklaus (HF Staff) committed
Commit 3e302cb · 1 parent: ee5b51e

change from mermaid diagram to d3 diagram for more flexibility

app/src/content/chapters/experiments.mdx CHANGED
@@ -10,6 +10,7 @@ import FigRef from "../../components/FigRef.astro";
  {/* TODO: Integrate decay experiment as another analysis for proxy */}
  {/* TODO: share on a bunch of discords/slacks/hackernews/locallama */}
  {/* TODO: brainstorm better banner, be artsy */}
+ {/* TODO: improve the diagram for the infrastructure at the start of the section */}
  {/* TODO: final configuration for finephrase at the end of infra section: visualization of how many pages (500 tokens) (use page emojis flying from left to right) we can generate (real time), user can configure with a slider the number of GPUs */}
  {/* TODO: only explain datatrove additions when we need them (for generating the final finephrase) */}
  {/* TODO: move infrastructure section after analyses as precursor and explanation for finephrase */}
app/src/content/chapters/infrastructure.mdx CHANGED
@@ -16,68 +16,15 @@ We made major extensions to [DataTrove](https://github.com/huggingface/datatrove

  In this section we show how DataTrove can be used to generate billions of tokens across several model scales, ranging from 100 million to 1 trillion parameters. <FigRef target="datatrove-pipeline" /> gives an overview of the pipeline. Let's dive in!

- <Wide>
- <figure id="datatrove-pipeline">
-
- ```mermaid
- %%{init: {"flowchart": {"diagramPadding": 12, "padding": 12, "nodeSpacing": 22, "rankSpacing": 52, "subGraphTitleMargin": {"top": 8, "bottom": 24}}, "themeVariables": {"fontSize": "18px"}} }%%
- flowchart TB
- subgraph Input["📥 Input"]
- HF_IN["`HF Hub Dataset`"]
- end
-
- subgraph Pipeline["⚙️ DataTrove Pipeline"]
- direction TB
- READ["`**Read**
- HuggingFaceDatasetReader`"]
- TRANSFORM["`**Transform**
- InferenceRunner`"]
- WRITE["`**Write**
- ParquetWriter`"]
- READ --> TRANSFORM --> WRITE
- end
-
- subgraph Execution["🖥️ Execution Mode"]
- direction TB
- LOCAL["`**Local**
- single node, multi-GPU`"]
- SLURM["`**Slurm Cluster**
- multi-node, auto-scaling`"]
- end
-
- subgraph Inference["🚀 Inference Engine"]
- direction TB
- ROLLOUT["`**Custom Rollout**
- async callable`"]
- VLLM["`**vLLM / SGLang**
- Server`"]
- ROLLOUT -- "generate(payload)" --> VLLM
- end
-
- subgraph Output["📤 Output"]
- direction LR
- HF_OUT["`HF Hub Dataset`"]
- CARD["`**Dataset Card**
- + Metrics`"]
- MONITOR["`Progress Monitor`"]
- end
-
- HF_IN --> READ
- HF_IN ~~~ LOCAL
- TRANSFORM --> ROLLOUT
- Pipeline --> Execution
- WRITE --> HF_OUT
- WRITE --> CARD
- WRITE --> MONITOR
- ```
-
- <figcaption>Overview of the DataTrove synthetic data generation pipeline. Documents flow through a three-stage pipeline (Read, Transform, Write), with the InferenceRunner dispatching rollout functions to vLLM/SGLang. The system supports local and Slurm-based execution with automatic upload and progress monitoring.</figcaption>
- </figure>
- </Wide>
+ <HtmlEmbed
+ id="datatrove-pipeline"
+ src="d3-pipeline.html"
+ caption="Overview of the DataTrove synthetic data generation pipeline. Documents flow through a three-stage pipeline (Read, Transform, Write), with the InferenceRunner dispatching rollout functions to vLLM/SGLang. The system supports local and Slurm-based execution with automatic upload and progress monitoring."
+ />

  ### Generating synthetic data at scale

- At the core is `examples/inference/benchmark/generate_data.py`, a Typer-powered entry point that orchestrates the full synthetic data loop:
+ At the core is `examples/inference/benchmark/generate_data.py`, a [Typer](https://typer.tiangolo.com/)-powered entry point that orchestrates the full synthetic data loop:

  1. **Read**: pull any split/config from the Hugging Face Hub via `HuggingFaceDatasetReader`.
  1. **Transform**: stream examples through `InferenceRunner`, which talks to vLLM (or another server type) and handles chunking, retries, and metric logging.
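
Review note: one way to check that the D3 rewrite keeps the same topology as the removed Mermaid flowchart is to diff the two edge lists. The sketch below is not part of the commit. It transcribes the Mermaid edges by hand (IDs lowercased to match the `id` fields used in `d3-pipeline.html`, and the invisible `HF_IN ~~~ LOCAL` layout link left out) and copies the `edges` array from the new embed. The helper name `missingEdges` is made up for illustration.

```javascript
// Edges of the removed Mermaid flowchart, transcribed by hand from the diff.
// Assumption: IDs are lowercased so they line up with the D3 node ids.
const mermaidEdges = [
  ['read', 'transform'],
  ['transform', 'write'],
  ['rollout', 'vllm'],          // carried the label "generate(payload)"
  ['hf_in', 'read'],
  ['transform', 'rollout'],
  ['pipeline', 'execution'],    // subgraph-to-subgraph edge
  ['write', 'hf_out'],
  ['write', 'card'],
  ['write', 'monitor'],
];

// Edges declared in the `edges` array of d3-pipeline.html.
const d3Edges = [
  ['hf_in', 'read'],
  ['read', 'transform'],
  ['transform', 'write'],
  ['transform', 'rollout'],
  ['rollout', 'vllm'],
  ['write', 'hf_out'],
  ['write', 'card'],
  ['write', 'monitor'],
];

// Stable string key for an edge so it can be stored in a Set.
const key = ([from, to]) => `${from}->${to}`;

// Hypothetical helper: edges present in `before` but absent from `after`.
function missingEdges(before, after) {
  const kept = new Set(after.map(key));
  return before.filter((edge) => !kept.has(key(edge)));
}

const dropped = missingEdges(mermaidEdges, d3Edges);
const added = missingEdges(d3Edges, mermaidEdges);
console.log({ dropped, added });
```

Under this transcription, the only edge not carried over is the subgraph-level `Pipeline --> Execution` arrow. The `generate(payload)` label on the rollout-to-server edge also has no counterpart in the new `edges` array.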
app/src/content/embeds/d3-pipeline.html ADDED
@@ -0,0 +1,341 @@
+ <div class="d3-pipeline"></div>
+ <style>
+   .d3-pipeline {
+     position: relative;
+     width: 100%;
+     margin: 0;
+     container-type: inline-size;
+     font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
+   }
+   .d3-pipeline .node-group { cursor: default; }
+   .d3-pipeline .node-card { transition: filter .15s ease; }
+   .d3-pipeline .node-group:hover .node-card { filter: brightness(1.05); }
+   .d3-pipeline .node-title { font-weight: 700; fill: var(--text-color); }
+   .d3-pipeline .node-subtitle { fill: var(--muted-color); }
+   .d3-pipeline .group-label { font-weight: 700; fill: var(--muted-color); letter-spacing: 0.02em; }
+   .d3-pipeline .edge-path { fill: none; stroke-linecap: round; }
+   .d3-pipeline .d3-tooltip {
+     position: absolute; top: 0; left: 0;
+     transform: translate(-9999px, -9999px);
+     pointer-events: none; padding: 8px 12px; border-radius: 8px;
+     font-size: 12px; line-height: 1.4;
+     border: 1px solid var(--border-color); background: var(--surface-bg);
+     color: var(--text-color); box-shadow: 0 4px 20px rgba(0,0,0,.15);
+     opacity: 0; transition: opacity .12s ease; max-width: 260px; z-index: 100;
+   }
+   .d3-pipeline .d3-tooltip strong { display: block; margin-bottom: 2px; font-size: 13px; }
+ </style>
+ <script>
+   (() => {
+     const ensureD3 = (cb) => {
+       if (window.d3 && typeof window.d3.select === 'function') return cb();
+       let s = document.getElementById('d3-cdn-script');
+       if (!s) { s = document.createElement('script'); s.id = 'd3-cdn-script'; s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js'; document.head.appendChild(s); }
+       const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
+       s.addEventListener('load', onReady, { once: true });
+       if (window.d3) onReady();
+     };
+
+     const bootstrap = () => {
+       const scriptEl = document.currentScript;
+       let container = scriptEl ? scriptEl.previousElementSibling : null;
+       if (!(container && container.classList && container.classList.contains('d3-pipeline'))) {
+         const cs = Array.from(document.querySelectorAll('.d3-pipeline')).filter(el => !(el.dataset && el.dataset.mounted === 'true'));
+         container = cs[cs.length - 1] || null;
+       }
+       if (!container) return;
+       if (container.dataset) { if (container.dataset.mounted === 'true') return; container.dataset.mounted = 'true'; }
+       container.style.position = container.style.position || 'relative';
+
+       const tip = document.createElement('div');
+       tip.className = 'd3-tooltip';
+       const tipInner = document.createElement('div');
+       tip.appendChild(tipInner);
+       container.appendChild(tip);
+
+       function showTip(ev, html) {
+         tipInner.innerHTML = html;
+         tip.style.opacity = '1';
+         const r = container.getBoundingClientRect();
+         const x = ev.clientX - r.left + 14, y = ev.clientY - r.top - 10;
+         tip.style.transform = `translate(${x}px, ${y}px)`;
+       }
+       function hideTip() { tip.style.opacity = '0'; tip.style.transform = 'translate(-9999px,-9999px)'; }
+
+       const svg = d3.select(container).append('svg').attr('width', '100%').style('display', 'block');
+       const defs = svg.append('defs');
+       defs.append('marker').attr('id', 'pl-arrow').attr('viewBox', '0 0 10 8')
+         .attr('refX', 9).attr('refY', 4).attr('markerWidth', 7).attr('markerHeight', 5.5)
+         .attr('orient', 'auto').append('path').attr('d', 'M0,1 L8,4 L0,7 Z');
+
+       const gRoot = svg.append('g');
+       const gGroups = gRoot.append('g');
+       const gEdges = gRoot.append('g');
+       const gNodes = gRoot.append('g');
+
+       const nodes = [
+         { id: 'hf_in', label: 'HF Hub Dataset', sub: '', group: 'input', tip: 'Source dataset from the Hugging Face Hub. Any split or config.' },
+         { id: 'read', label: 'Read', sub: 'HuggingFaceDatasetReader', group: 'pipeline', tip: 'Reads documents from the Hub and streams them into the pipeline.' },
+         { id: 'transform', label: 'Transform', sub: 'InferenceRunner', group: 'pipeline', tip: 'Orchestrates LLM inference: batching, retries, metric logging.' },
+         { id: 'write', label: 'Write', sub: 'ParquetWriter', group: 'pipeline', tip: 'Writes generated outputs as Parquet files with checkpointing.' },
+         { id: 'local', label: 'Local', sub: 'single node, multi-GPU', group: 'execution', tip: 'Run on a single machine with multiple workers for development.' },
+         { id: 'slurm', label: 'Slurm Cluster', sub: 'multi-node, auto-scaling', group: 'execution', tip: 'Distribute across nodes for large-scale production workloads.' },
+         { id: 'rollout', label: 'Custom Rollout', sub: 'async callable', group: 'inference', tip: 'Your rollout function: orchestrates one or many generate() calls.' },
+         { id: 'vllm', label: 'vLLM / SGLang', sub: 'Server', group: 'inference', tip: 'High-throughput inference engine with prefix caching and batching.' },
+         { id: 'hf_out', label: 'HF Hub Dataset', sub: '', group: 'output', tip: 'Generated dataset uploaded continuously to the Hugging Face Hub.' },
+         { id: 'card', label: 'Dataset Card', sub: '+ Metrics', group: 'output', tip: 'Auto-generated dataset card with throughput stats.' },
+         { id: 'monitor', label: 'Progress Monitor', sub: '', group: 'output', tip: 'Live progress bar and ETA on the dataset card during inference.' },
+       ];
+
+       const groups = [
+         { id: 'input', label: 'Input', icon: '📥' },
+         { id: 'pipeline', label: 'DataTrove Pipeline', icon: '⚙️' },
+         { id: 'execution', label: 'Execution Mode', icon: '🖥️' },
+         { id: 'inference', label: 'Inference Engine', icon: '🚀' },
+         { id: 'output', label: 'Output', icon: '📤' },
+       ];
+
+       const edges = [
+         { from: 'hf_in', to: 'read' },
+         { from: 'read', to: 'transform' },
+         { from: 'transform', to: 'write' },
+         { from: 'transform', to: 'rollout' },
+         { from: 'rollout', to: 'vllm' },
+         { from: 'write', to: 'hf_out' },
+         { from: 'write', to: 'card' },
+         { from: 'write', to: 'monitor' },
+       ];
+
+       function isDark() { return document.documentElement.getAttribute('data-theme') === 'dark'; }
+
+       function colors() {
+         const dk = isDark();
+         const primary = window.ColorPalettes ? window.ColorPalettes.getPrimary() : (dk ? '#7c6ff7' : '#6366f1');
+         return {
+           nodeBg: dk ? 'rgba(255,255,255,0.055)' : 'rgba(255,255,255,0.92)',
+           nodeBd: dk ? 'rgba(255,255,255,0.10)' : 'rgba(0,0,0,0.09)',
+           groupBg: dk ? 'rgba(255,255,255,0.025)' : 'rgba(0,0,0,0.022)',
+           groupBd: dk ? 'rgba(255,255,255,0.07)' : 'rgba(0,0,0,0.055)',
+           pipeBg: dk ? 'rgba(99,102,241,0.055)' : 'rgba(99,102,241,0.04)',
+           pipeBd: dk ? 'rgba(99,102,241,0.14)' : 'rgba(99,102,241,0.11)',
+           edge: dk ? 'rgba(255,255,255,0.22)' : 'rgba(0,0,0,0.18)',
+           arrow: dk ? 'rgba(255,255,255,0.30)' : 'rgba(0,0,0,0.25)',
+           primary,
+         };
+       }
+
+       // Compute layout positions for a given container width
+       function computeLayout() {
+         const W = container.clientWidth || 820;
+         const s = Math.min(1, W / 820);
+
+         const nw = Math.round(166 * s), nh = Math.round(48 * s);
+         const nr = Math.round(10 * s);
+         const gp = Math.round(10 * s); // group padding
+         const gr = Math.round(10 * s); // group corner radius
+         const glh = Math.round(22 * s); // group label height
+         const ng = Math.round(8 * s); // node gap within group
+         const cg = Math.round(28 * s); // column gap
+         const rg = Math.round(16 * s); // row gap between groups
+
+         // Three columns: left (exec + inference), center (input + pipeline), right (output)
+         const leftW = nw + gp * 2;
+         const centerW = nw + gp * 2;
+         const rightW = nw + gp * 2;
+         const totalW = leftW + centerW + rightW + cg * 2;
+         const offsetX = Math.max(0, (W - totalW) / 2);
+
+         const leftX = offsetX;
+         const centerX = offsetX + leftW + cg;
+         const rightX = offsetX + leftW + cg + centerW + cg;
+
+         // -- Center column: Input (1 node) + Pipeline (3 nodes)
+         let y = Math.round(6 * s);
+         const inputNode = nodes.find(n => n.id === 'hf_in');
+         inputNode._x = centerX + gp; inputNode._y = y + glh + gp;
+         inputNode._w = nw; inputNode._h = nh; inputNode._r = nr;
+         const inputGroup = groups.find(g => g.id === 'input');
+         inputGroup._x = centerX; inputGroup._y = y;
+         inputGroup._w = centerW; inputGroup._h = glh + gp * 2 + nh; inputGroup._r = gr;
+
+         y += inputGroup._h + rg;
+         const pipeTop = y;
+         const pipeNodes = ['read', 'transform', 'write'].map(id => nodes.find(n => n.id === id));
+         pipeNodes.forEach((n, i) => {
+           n._x = centerX + gp;
+           n._y = pipeTop + glh + gp + i * (nh + ng);
+           n._w = nw; n._h = nh; n._r = nr;
+         });
+         const pipeH = glh + gp * 2 + 3 * nh + 2 * ng;
+         const pipeGroup = groups.find(g => g.id === 'pipeline');
+         pipeGroup._x = centerX; pipeGroup._y = pipeTop;
+         pipeGroup._w = centerW; pipeGroup._h = pipeH; pipeGroup._r = gr;
+
+         // -- Left column: Execution + Inference
+         // Vertically center the left column with the pipeline
+         const execNodes = ['local', 'slurm'].map(id => nodes.find(n => n.id === id));
+         const execH = glh + gp * 2 + execNodes.length * nh + (execNodes.length - 1) * ng;
+         const inferNodes = ['rollout', 'vllm'].map(id => nodes.find(n => n.id === id));
+         const inferH = glh + gp * 2 + inferNodes.length * nh + (inferNodes.length - 1) * ng;
+         const leftTotalH = execH + rg + inferH;
+         const leftCenterY = pipeTop + pipeH / 2;
+         const leftTop = Math.max(pipeTop, leftCenterY - leftTotalH / 2);
+
+         const execTop = leftTop;
+         execNodes.forEach((n, i) => {
+           n._x = leftX + gp; n._y = execTop + glh + gp + i * (nh + ng);
+           n._w = nw; n._h = nh; n._r = nr;
+         });
+         const execGroup = groups.find(g => g.id === 'execution');
+         execGroup._x = leftX; execGroup._y = execTop;
+         execGroup._w = leftW; execGroup._h = execH; execGroup._r = gr;
+
+         const inferTop = execTop + execH + rg;
+         inferNodes.forEach((n, i) => {
+           n._x = leftX + gp; n._y = inferTop + glh + gp + i * (nh + ng);
+           n._w = nw; n._h = nh; n._r = nr;
+         });
+         const inferGroup = groups.find(g => g.id === 'inference');
+         inferGroup._x = leftX; inferGroup._y = inferTop;
+         inferGroup._w = leftW; inferGroup._h = inferH; inferGroup._r = gr;
+
+         // -- Right column: Output (vertically centered with pipeline)
+         const outNodes = ['hf_out', 'card', 'monitor'].map(id => nodes.find(n => n.id === id));
+         const outH = glh + gp * 2 + outNodes.length * nh + (outNodes.length - 1) * ng;
+         const outTop = pipeTop + (pipeH - outH) / 2;
+         outNodes.forEach((n, i) => {
+           n._x = rightX + gp; n._y = outTop + glh + gp + i * (nh + ng);
+           n._w = nw; n._h = nh; n._r = nr;
+         });
+         const outGroup = groups.find(g => g.id === 'output');
+         outGroup._x = rightX; outGroup._y = outTop;
+         outGroup._w = rightW; outGroup._h = outH; outGroup._r = gr;
+
+         const maxY = Math.max(
+           ...nodes.map(n => n._y + n._h + gp),
+           ...groups.map(g => g._y + g._h)
+         );
+         svg.attr('height', maxY + Math.round(8 * s));
+
+         return s;
+       }
+
+       function anchor(n, side) {
+         if (side === 'top') return { x: n._x + n._w / 2, y: n._y };
+         if (side === 'bottom') return { x: n._x + n._w / 2, y: n._y + n._h };
+         if (side === 'left') return { x: n._x, y: n._y + n._h / 2 };
+         if (side === 'right') return { x: n._x + n._w, y: n._y + n._h / 2 };
+       }
+
+       function bezier(a, b, orient) {
+         if (orient === 'v') {
+           const d = (b.y - a.y) * 0.45;
+           return `M${a.x},${a.y} C${a.x},${a.y + d} ${b.x},${b.y - d} ${b.x},${b.y}`;
+         }
+         const d = (b.x - a.x) * 0.4;
+         return `M${a.x},${a.y} C${a.x + d},${a.y} ${b.x - d},${b.y} ${b.x},${b.y}`;
+       }
+
+       function edgePath(e) {
+         const f = nodes.find(n => n.id === e.from);
+         const t = nodes.find(n => n.id === e.to);
+         if (!f || !t) return '';
+
+         // Explicit routing for each edge
+         if (e.from === 'hf_in' && e.to === 'read') return bezier(anchor(f, 'bottom'), anchor(t, 'top'), 'v');
+         if (e.from === 'read' && e.to === 'transform') return bezier(anchor(f, 'bottom'), anchor(t, 'top'), 'v');
+         if (e.from === 'transform' && e.to === 'write') return bezier(anchor(f, 'bottom'), anchor(t, 'top'), 'v');
+         if (e.from === 'transform' && e.to === 'rollout') return bezier(anchor(f, 'left'), anchor(t, 'right'), 'h');
+         if (e.from === 'rollout' && e.to === 'vllm') return bezier(anchor(f, 'bottom'), anchor(t, 'top'), 'v');
+         if (e.from === 'write' && e.to === 'hf_out') return bezier(anchor(f, 'right'), anchor(t, 'left'), 'h');
+         if (e.from === 'write' && e.to === 'card') return bezier(anchor(f, 'right'), anchor(t, 'left'), 'h');
+         if (e.from === 'write' && e.to === 'monitor') return bezier(anchor(f, 'right'), anchor(t, 'left'), 'h');
+
+         // Fallback
+         return bezier(anchor(f, 'right'), anchor(t, 'left'), 'h');
+       }
+
+       function render() {
+         const s = computeLayout();
+         const c = colors();
+
+         const fs = Math.max(11, Math.round(13 * s));
+         const fsSub = Math.max(10, Math.round(11 * s));
+         const fsGrp = Math.max(10, Math.round(11 * s));
+         const fsIcon = Math.max(12, Math.round(14 * s));
+
+         defs.select('#pl-arrow path').attr('fill', c.arrow);
+
+         // Groups
+         const gSel = gGroups.selectAll('g.grp').data(groups, d => d.id);
+         const gE = gSel.enter().append('g').attr('class', 'grp');
+         gE.append('rect');
+         gE.append('text').attr('class', 'grp-icon');
+         gE.append('text').attr('class', 'group-label');
+         const gM = gE.merge(gSel);
+         gM.select('rect')
+           .attr('x', d => d._x).attr('y', d => d._y)
+           .attr('width', d => d._w).attr('height', d => d._h)
+           .attr('rx', d => d._r).attr('ry', d => d._r)
+           .attr('fill', d => d.id === 'pipeline' ? c.pipeBg : c.groupBg)
+           .attr('stroke', d => d.id === 'pipeline' ? c.pipeBd : c.groupBd)
+           .attr('stroke-width', 1);
+         gM.select('.grp-icon')
+           .attr('x', d => d._x + Math.round(10 * s))
+           .attr('y', d => d._y + Math.round(19 * s))
+           .style('font-size', fsIcon + 'px')
+           .text(d => d.icon);
+         gM.select('.group-label')
+           .attr('x', d => d._x + Math.round(10 * s) + fsIcon + Math.round(3 * s))
+           .attr('y', d => d._y + Math.round(19 * s))
+           .style('font-size', fsGrp + 'px')
+           .text(d => d.label);
+         gSel.exit().remove();
+
+         // Edges
+         const eSel = gEdges.selectAll('path.edge-path').data(edges, d => d.from + d.to);
+         eSel.enter().append('path').attr('class', 'edge-path')
+           .attr('marker-end', 'url(#pl-arrow)')
+           .merge(eSel)
+           .attr('d', edgePath)
+           .attr('stroke', c.edge)
+           .attr('stroke-width', Math.max(1.5, 1.8 * s));
+         eSel.exit().remove();
+
+         // Nodes
+         const nSel = gNodes.selectAll('g.node-group').data(nodes, d => d.id);
+         const nE = nSel.enter().append('g').attr('class', 'node-group');
+         nE.append('rect').attr('class', 'node-card');
+         nE.append('text').attr('class', 'node-title');
+         nE.append('text').attr('class', 'node-subtitle');
+         const nM = nE.merge(nSel);
+         nM.attr('transform', d => `translate(${d._x},${d._y})`);
+         nM.select('.node-card')
+           .attr('width', d => d._w).attr('height', d => d._h)
+           .attr('rx', d => d._r).attr('ry', d => d._r)
+           .attr('fill', c.nodeBg).attr('stroke', c.nodeBd).attr('stroke-width', 1);
+         nM.select('.node-title')
+           .attr('x', d => d._w / 2).attr('y', d => d.sub ? d._h * 0.38 : d._h / 2)
+           .attr('text-anchor', 'middle').attr('dominant-baseline', 'middle')
+           .style('font-size', fs + 'px').text(d => d.label);
+         nM.select('.node-subtitle')
+           .attr('x', d => d._w / 2).attr('y', d => d._h * 0.68)
+           .attr('text-anchor', 'middle').attr('dominant-baseline', 'middle')
+           .style('font-size', fsSub + 'px').text(d => d.sub || '');
+         nM.on('mouseenter', (ev, d) => { if (d.tip) showTip(ev, `<strong>${d.label}</strong>${d.tip}`); })
+           .on('mousemove', (ev, d) => { if (d.tip) showTip(ev, `<strong>${d.label}</strong>${d.tip}`); })
+           .on('mouseleave', hideTip);
+         nSel.exit().remove();
+       }
+
+       render();
+       if (window.ResizeObserver) { new ResizeObserver(() => render()).observe(container); }
+       else { window.addEventListener('resize', render); }
+       new MutationObserver(() => render()).observe(document.documentElement, { attributes: true, attributeFilter: ['data-theme'] });
+     };
+
+     if (document.readyState === 'loading') {
+       document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
+     } else { ensureD3(bootstrap); }
+   })();
+ </script>
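
Review note: the layout arithmetic in `computeLayout()` and the path construction in `bezier()` depend only on a handful of numbers, so they can be sanity-checked outside the browser. The following is a minimal standalone sketch, not part of the commit. `metrics` and `groupHeight` are hypothetical helper names that restate the constants and the group-height formula from the embed, and `bezier` is copied from it.

```javascript
// Scale factor and rounded sizes, mirroring the constants in computeLayout().
// `metrics` is a hypothetical helper; the embed computes these inline.
function metrics(width) {
  const s = Math.min(1, width / 820); // never scale up past the 820px design width
  const px = (v) => Math.round(v * s);
  return {
    s,
    nw: px(166), // node width
    nh: px(48),  // node height
    gp: px(10),  // group padding
    glh: px(22), // group label height
    ng: px(8),   // node gap within a group
    cg: px(28),  // column gap
    rg: px(16),  // row gap between groups
  };
}

// Group box height: label strip + padding above and below + stacked nodes and gaps.
function groupHeight(count, m) {
  return m.glh + m.gp * 2 + count * m.nh + (count - 1) * m.ng;
}

// Cubic Bezier path between two anchor points, using the embed's control-point rule:
// control points sit 45% (vertical) or 40% (horizontal) of the way along the main axis.
function bezier(a, b, orient) {
  if (orient === 'v') {
    const d = (b.y - a.y) * 0.45;
    return `M${a.x},${a.y} C${a.x},${a.y + d} ${b.x},${b.y - d} ${b.x},${b.y}`;
  }
  const d = (b.x - a.x) * 0.4;
  return `M${a.x},${a.y} C${a.x + d},${a.y} ${b.x - d},${b.y} ${b.x},${b.y}`;
}

const m = metrics(820);
const pipeH = groupHeight(3, m);            // pipeline group: 3 nodes -> 202
const leftH = groupHeight(2, m) * 2 + m.rg; // execution + inference groups -> 308
const vertical = bezier({ x: 0, y: 0 }, { x: 0, y: 100 }, 'v');
const horizontal = bezier({ x: 0, y: 0 }, { x: 100, y: 20 }, 'h');

console.log({ pipeH, leftH, vertical, horizontal });
```

At the 820px design width the stacked Execution and Inference groups (308px) are taller than the pipeline group (202px), so the `Math.max(pipeTop, ...)` clamp pins the left column to the pipeline's top edge rather than centering it.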