WaveCut committed
Commit 98ad5d3 · verified · 1 Parent(s): f3d279e

Add RTX 4090 SDNQ vs NF4 follow-up benchmark

.gitattributes CHANGED
@@ -42,3 +42,4 @@ assets/benchmarks/opensource.png filter=lfs diff=lfs merge=lfs -text
  assets/benchmarks/opensource2.png filter=lfs diff=lfs merge=lfs -text
  assets/samples/collage_landscape.jpg filter=lfs diff=lfs merge=lfs -text
  assets/comparison_matrix.webp filter=lfs diff=lfs merge=lfs -text
+ assets/sdnq_vs_nf4_4090_side_by_side.webp filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -110,9 +110,19 @@ The matrix below contains the 10 original FP8 generations followed by the 10 SDN
  - `quantization_manifest.json`: component-level quantization timings, storage, and VRAM peaks.
  - `ideogram4_sdnq_pipeline.py`: loader helper for the SDNQ custom transformer components.
 
- ## Follow-up
+ ## RTX 4090 Follow-up: SDNQ UInt4 vs Official NF4
+
+ Hardware: RunPod NVIDIA GeForce RTX 4090, 24 GB VRAM, single process, concurrency 1. Both variants used the same 10 structured captions from `prompts.json`, 1024x1024, `V4_DEFAULT_20`, and no magic-prompt expansion. `nf4` uses the official `ideogram-ai/ideogram-4-nf4` checkpoint through the upstream `ideogram4` loader.
+
+ | Variant | Cases | Load s | Load peak reserved MB | Load peak nvidia MB | Cold request s | Hot mean s | Hot max s | Gen peak reserved MB | Gen peak nvidia MB |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | sdnq | 10.00 | 211.61 | 14124.00 | 14466.00 | 59.65 | 37.05 | 37.57 | 19768.00 | 20521.00 |
+ | nf4 | 10.00 | 269.31 | 15370.00 | 15766.00 | 36.57 | 36.31 | 36.77 | 21012.00 | 21801.00 |
+
+ ![SDNQ vs official NF4 on RTX 4090](assets/sdnq_vs_nf4_4090_side_by_side.webp)
+
+ Raw follow-up metrics are in `benchmark/summary_4090_sdnq_vs_nf4.json`, `benchmark/sdnq_4090_metrics.*`, and `benchmark/nf4_4090_metrics.*`. The exact runner used for the follow-up is `benchmark/followup_runner.py`.
 
- A separate follow-up run will compare this SDNQ UInt4 checkpoint against the official `ideogram-ai/ideogram-4-nf4` checkpoint on an RTX 3090/4090-class pod and append the full-pipeline results here.
 
  ## License
 
assets/sdnq_vs_nf4_4090_side_by_side.webp ADDED

Git LFS Details

  • SHA256: a5d1510d45e62ea94c8ab9ee3b2b61e7808642d6b4710a28e05db6ffaf833a82
  • Pointer size: 132 Bytes
  • Size of remote file: 4.22 MB
benchmark/followup_runner.py ADDED
@@ -0,0 +1,432 @@
+ from __future__ import annotations
+
+ import argparse
+ import csv
+ import gc
+ import json
+ import os
+ import shutil
+ import subprocess
+ import sys
+ import threading
+ import time
+ from pathlib import Path
+ from typing import Any, Callable
+
+ import torch
+ from huggingface_hub import hf_hub_download, snapshot_download
+ from PIL import Image, ImageDraw, ImageFont
+
+ from ideogram4 import Ideogram4Pipeline, Ideogram4PipelineConfig, PRESETS
+
+
+ SDNQ_REPO = "WaveCut/ideogram-4-sdnq-uint4"
+ NF4_REPO = "ideogram-ai/ideogram-4-nf4"
+ DTYPE = torch.bfloat16
+
+
+ def read_json(path: Path) -> Any:
+     with path.open("r", encoding="utf-8") as f:
+         return json.load(f)
+
+
+ def write_json(path: Path, payload: Any) -> None:
+     path.parent.mkdir(parents=True, exist_ok=True)
+     with path.open("w", encoding="utf-8") as f:
+         json.dump(payload, f, ensure_ascii=False, indent=2)
+         f.write("\n")
+
+
+ def prompt_to_string(prompt_case: dict[str, Any]) -> str:
+     return json.dumps(prompt_case["caption"], ensure_ascii=False, separators=(",", ":"))
+
+
+ def current_gpu_mb() -> int | None:
+     try:
+         output = subprocess.check_output(
+             ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
+             text=True,
+             timeout=5,
+         )
+         return max(int(line.strip()) for line in output.splitlines() if line.strip())
+     except Exception:
+         return None
+
+
+ class GpuPeakMonitor:
+     def __init__(self, interval: float = 0.05) -> None:
+         self.interval = interval
+         self.samples: list[int] = []
+         self._stop = threading.Event()
+         self._thread: threading.Thread | None = None
+
+     def start(self) -> None:
+         self.samples = []
+         self._stop.clear()
+         self._thread = threading.Thread(target=self._run, daemon=True)
+         self._thread.start()
+
+     def stop(self) -> int | None:
+         self._stop.set()
+         if self._thread is not None:
+             self._thread.join(timeout=2)
+         return max(self.samples) if self.samples else None
+
+     def _run(self) -> None:
+         while not self._stop.is_set():
+             value = current_gpu_mb()
+             if value is not None:
+                 self.samples.append(value)
+             time.sleep(self.interval)
+
+
+ def cuda_cleanup() -> None:
+     gc.collect()
+     if torch.cuda.is_available():
+         torch.cuda.empty_cache()
+         torch.cuda.reset_peak_memory_stats()
+         torch.cuda.synchronize()
+
+
+ def measure(name: str, fn: Callable[[], Any], extra: dict[str, Any] | None = None) -> tuple[Any, dict[str, Any]]:
+     cuda_cleanup()
+     before = current_gpu_mb()
+     monitor = GpuPeakMonitor()
+     monitor.start()
+     start = time.perf_counter()
+     result = fn()
+     if torch.cuda.is_available():
+         torch.cuda.synchronize()
+     elapsed = time.perf_counter() - start
+     nvidia_peak = monitor.stop()
+     after = current_gpu_mb()
+     row = {
+         "name": name,
+         "elapsed_seconds": elapsed,
+         "gpu_before_mb": before,
+         "gpu_after_mb": after,
+         "gpu_peak_mb": nvidia_peak,
+         "torch_peak_allocated_mb": (
+             torch.cuda.max_memory_allocated() / 1024 / 1024 if torch.cuda.is_available() else None
+         ),
+         "torch_peak_reserved_mb": (
+             torch.cuda.max_memory_reserved() / 1024 / 1024 if torch.cuda.is_available() else None
+         ),
+     }
+     if extra:
+         row.update(extra)
+     return result, row
+
+
+ def append_jsonl(path: Path, row: dict[str, Any]) -> None:
+     path.parent.mkdir(parents=True, exist_ok=True)
+     with path.open("a", encoding="utf-8") as f:
+         f.write(json.dumps(row, ensure_ascii=False, default=str) + "\n")
+
+
+ def write_csv(path: Path, rows: list[dict[str, Any]]) -> None:
+     if not rows:
+         return
+     path.parent.mkdir(parents=True, exist_ok=True)
+     keys: list[str] = []
+     for row in rows:
+         for key in row:
+             if key not in keys:
+                 keys.append(key)
+     with path.open("w", encoding="utf-8", newline="") as f:
+         writer = csv.DictWriter(f, fieldnames=keys)
+         writer.writeheader()
+         writer.writerows(rows)
+
+
+ def load_prompts(path: Path) -> list[dict[str, Any]]:
+     if path.exists():
+         return read_json(path)
+     downloaded = Path(hf_hub_download(SDNQ_REPO, filename="prompts.json"))
+     return read_json(downloaded)
+
+
+ def ensure_sdnq_helper() -> None:
+     helper = Path(hf_hub_download(SDNQ_REPO, filename="ideogram4_sdnq_pipeline.py"))
+     sys.path.insert(0, str(helper.parent))
+
+
+ def load_pipeline(variant: str, device: str):
+     if variant == "sdnq":
+         ensure_sdnq_helper()
+         from ideogram4_sdnq_pipeline import Ideogram4SDNQPipeline
+
+         return Ideogram4SDNQPipeline.from_pretrained(
+             SDNQ_REPO,
+             device=device,
+             dtype=DTYPE,
+             use_quantized_matmul=False,
+             dequantize_fp32=False,
+         )
+     if variant == "nf4":
+         return Ideogram4Pipeline.from_pretrained(
+             config=Ideogram4PipelineConfig(weights_repo=NF4_REPO),
+             device=device,
+             dtype=DTYPE,
+         )
+     raise ValueError(f"unknown variant: {variant}")
+
+
+ def command_generate(args: argparse.Namespace) -> None:
+     output_dir = Path(args.output_dir)
+     image_dir = output_dir / "images"
+     image_dir.mkdir(parents=True, exist_ok=True)
+     metrics_path = output_dir / f"{args.variant}_metrics.jsonl"
+     if metrics_path.exists():
+         metrics_path.unlink()
+     prompts = load_prompts(Path(args.prompts))
+     preset = PRESETS[args.preset]
+
+     pipe, load_row = measure(
+         f"{args.variant}_load",
+         lambda: load_pipeline(args.variant, args.device),
+         {"variant": args.variant, "hardware": args.hardware, "preset": args.preset},
+     )
+     append_jsonl(metrics_path, load_row)
+     rows = [load_row]
+
+     for idx, case in enumerate(prompts):
+         prompt = prompt_to_string(case)
+         seed = int(case.get("seed", idx))
+         height = int(case.get("height", args.height))
+         width = int(case.get("width", args.width))
+
+         def run_case() -> Image.Image:
+             return pipe(
+                 prompt,
+                 height=height,
+                 width=width,
+                 num_steps=preset.num_steps,
+                 guidance_schedule=preset.guidance_schedule,
+                 mu=preset.mu,
+                 std=preset.std,
+                 seed=seed,
+                 raise_on_caption_issues=False,
+             )[0]
+
+         image, row = measure(
+             f"{args.variant}_generate",
+             run_case,
+             {
+                 "variant": args.variant,
+                 "hardware": args.hardware,
+                 "case_id": case["id"],
+                 "case_index": idx,
+                 "seed": seed,
+                 "height": height,
+                 "width": width,
+                 "preset": args.preset,
+                 "request_temperature": "cold" if idx == 0 else "hot",
+             },
+         )
+         out_path = image_dir / f"{idx + 1:02d}_{case['id']}_{args.variant}.png"
+         image.save(out_path)
+         row["image"] = str(out_path)
+         append_jsonl(metrics_path, row)
+         rows.append(row)
+         print(json.dumps(row, ensure_ascii=False, default=str), flush=True)
+
+     write_csv(output_dir / f"{args.variant}_metrics.csv", rows)
+
+
+ def read_jsonl(path: Path) -> list[dict[str, Any]]:
+     if not path.exists():
+         return []
+     return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]
+
+
+ def summarize_variant(rows: list[dict[str, Any]], variant: str) -> dict[str, Any]:
+     load = next((r for r in rows if r.get("name") == f"{variant}_load"), {})
+     gens = [r for r in rows if r.get("name") == f"{variant}_generate"]
+     cold = next((r for r in gens if r.get("request_temperature") == "cold"), {})
+     hot = [r for r in gens if r.get("request_temperature") == "hot"]
+
+     def mean(key: str, items: list[dict[str, Any]]) -> float | None:
+         vals = [float(x[key]) for x in items if x.get(key) not in (None, "")]
+         return sum(vals) / len(vals) if vals else None
+
+     def maxv(key: str, items: list[dict[str, Any]]) -> float | None:
+         vals = [float(x[key]) for x in items if x.get(key) not in (None, "")]
+         return max(vals) if vals else None
+
+     return {
+         "variant": variant,
+         "load_seconds": load.get("elapsed_seconds"),
+         "load_peak_reserved_mb": load.get("torch_peak_reserved_mb"),
+         "load_peak_nvidia_mb": load.get("gpu_peak_mb"),
+         "cold_request_seconds": cold.get("elapsed_seconds"),
+         "cold_request_peak_reserved_mb": cold.get("torch_peak_reserved_mb"),
+         "cold_request_peak_nvidia_mb": cold.get("gpu_peak_mb"),
+         "hot_request_mean_seconds": mean("elapsed_seconds", hot),
+         "hot_request_max_seconds": maxv("elapsed_seconds", hot),
+         "generation_peak_reserved_mb": maxv("torch_peak_reserved_mb", gens),
+         "generation_peak_nvidia_mb": maxv("gpu_peak_mb", gens),
+         "cases": len(gens),
+     }
+
+
+ def fmt(value: Any) -> str:
+     if value is None or value == "":
+         return ""
+     if isinstance(value, str):
+         return value
+     return f"{float(value):.2f}"
+
+
+ def markdown_table(rows: list[dict[str, Any]], keys: list[tuple[str, str]]) -> str:
+     header = "| " + " | ".join(label for label, _ in keys) + " |"
+     sep = "| " + " | ".join("---" for _ in keys) + " |"
+     body = ["| " + " | ".join(fmt(row.get(key)) for _, key in keys) + " |" for row in rows]
+     return "\n".join([header, sep, *body])
+
+
+ def load_font(size: int) -> ImageFont.ImageFont:
+     for path in [
+         "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
+         "/usr/share/fonts/truetype/liberation2/LiberationSans-Regular.ttf",
+     ]:
+         try:
+             return ImageFont.truetype(path, size)
+         except Exception:
+             pass
+     return ImageFont.load_default()
+
+
+ def draw_centered(draw: ImageDraw.ImageDraw, xy: tuple[int, int, int, int], text: str, font: ImageFont.ImageFont, fill: tuple[int, int, int]) -> None:
+     left, top, right, bottom = xy
+     bbox = draw.textbbox((0, 0), text, font=font)
+     x = left + (right - left - (bbox[2] - bbox[0])) // 2
+     y = top + (bottom - top - (bbox[3] - bbox[1])) // 2
+     draw.text((x, y), text, font=font, fill=fill)
+
+
+ def make_side_by_side_matrix(
+     left_images: list[Path],
+     right_images: list[Path],
+     left_label: str,
+     right_label: str,
+     output_path: Path,
+ ) -> None:
+     if len(left_images) != len(right_images):
+         raise ValueError("left and right image counts differ")
+     count = len(left_images)
+     canvas_size = 8192
+     header_h = 160
+     row_h = (canvas_size - header_h) // count
+     col_w = canvas_size // 2
+     tile = min(col_w, row_h) - 18
+     bg = (18, 18, 18)
+     line = (58, 58, 58)
+     canvas = Image.new("RGB", (canvas_size, canvas_size), bg)
+     draw = ImageDraw.Draw(canvas)
+     header_font = load_font(82)
+     label_font = load_font(36)
+     draw.rectangle((0, 0, canvas_size, header_h), fill=(28, 28, 28))
+     draw_centered(draw, (0, 0, col_w, header_h), left_label, header_font, (245, 245, 245))
+     draw_centered(draw, (col_w, 0, canvas_size, header_h), right_label, header_font, (245, 245, 245))
+     draw.line((col_w, 0, col_w, canvas_size), fill=line, width=3)
+
+     for idx, (left_path, right_path) in enumerate(zip(left_images, right_images)):
+         y = header_h + idx * row_h
+         draw.line((0, y, canvas_size, y), fill=line, width=1)
+         for col, path in enumerate([left_path, right_path]):
+             with Image.open(path) as img:
+                 img = img.convert("RGB")
+                 img.thumbnail((tile, tile), Image.Resampling.LANCZOS)
+                 x0 = col * col_w
+                 px = x0 + (col_w - img.width) // 2
+                 py = y + (row_h - img.height) // 2
+                 canvas.paste(img, (px, py))
+             label = path.stem.split("_", 1)[-1].rsplit("_", 1)[0]
+             draw.text((col * col_w + 28, y + 16), f"{idx + 1:02d} {label}", font=label_font, fill=(230, 230, 230))
+
+     output_path.parent.mkdir(parents=True, exist_ok=True)
+     canvas.save(output_path, "WEBP", quality=95, method=6)
+
+
+ def command_collect(args: argparse.Namespace) -> None:
+     results_dir = Path(args.results_dir)
+     publish_dir = Path(args.publish_dir)
+     publish_dir.mkdir(parents=True, exist_ok=True)
+     sdnq_rows = read_jsonl(results_dir / "sdnq" / "sdnq_metrics.jsonl")
+     nf4_rows = read_jsonl(results_dir / "nf4" / "nf4_metrics.jsonl")
+     summaries = [summarize_variant(sdnq_rows, "sdnq"), summarize_variant(nf4_rows, "nf4")]
+     write_json(publish_dir / "summary_4090_sdnq_vs_nf4.json", summaries)
+
+     sdnq_images = sorted((results_dir / "sdnq" / "images").glob("*_sdnq.png"))
+     nf4_images = sorted((results_dir / "nf4" / "images").glob("*_nf4.png"))
+     matrix_path = publish_dir / "sdnq_vs_nf4_4090_side_by_side.webp"
+     make_side_by_side_matrix(sdnq_images, nf4_images, "SDNQ UInt4", "Official NF4", matrix_path)
+
+     for rel in [
+         "sdnq/sdnq_metrics.jsonl",
+         "sdnq/sdnq_metrics.csv",
+         "nf4/nf4_metrics.jsonl",
+         "nf4/nf4_metrics.csv",
+     ]:
+         src = results_dir / rel
+         if src.exists():
+             shutil.copy2(src, publish_dir / src.name.replace("_metrics", "_4090_metrics"))
+
+     table = markdown_table(
+         summaries,
+         [
+             ("Variant", "variant"),
+             ("Cases", "cases"),
+             ("Load s", "load_seconds"),
+             ("Load peak reserved MB", "load_peak_reserved_mb"),
+             ("Load peak nvidia MB", "load_peak_nvidia_mb"),
+             ("Cold request s", "cold_request_seconds"),
+             ("Hot mean s", "hot_request_mean_seconds"),
+             ("Hot max s", "hot_request_max_seconds"),
+             ("Gen peak reserved MB", "generation_peak_reserved_mb"),
+             ("Gen peak nvidia MB", "generation_peak_nvidia_mb"),
+         ],
+     )
+     (publish_dir / "README_APPEND.md").write_text(
+         f"""## RTX 4090 Follow-up: SDNQ UInt4 vs Official NF4
+
+ Hardware: RunPod NVIDIA GeForce RTX 4090, 24 GB VRAM, single process, concurrency 1. Both variants used the same 10 structured captions from `prompts.json`, 1024x1024, `V4_DEFAULT_20`, and no magic-prompt expansion. `nf4` uses the official `ideogram-ai/ideogram-4-nf4` checkpoint through the upstream `ideogram4` loader.
+
+ {table}
+
+ ![SDNQ vs official NF4 on RTX 4090](assets/sdnq_vs_nf4_4090_side_by_side.webp)
+ """,
+         encoding="utf-8",
+     )
+     print(table)
+     print(matrix_path)
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser()
+     sub = parser.add_subparsers(dest="command", required=True)
+
+     gen = sub.add_parser("generate")
+     gen.add_argument("--variant", choices=["sdnq", "nf4"], required=True)
+     gen.add_argument("--prompts", default="/workspace/ideogram4_followup/prompts.json")
+     gen.add_argument("--output-dir", required=True)
+     gen.add_argument("--device", default="cuda")
+     gen.add_argument("--height", type=int, default=1024)
+     gen.add_argument("--width", type=int, default=1024)
+     gen.add_argument("--preset", default="V4_DEFAULT_20", choices=sorted(PRESETS))
+     gen.add_argument("--hardware", default="NVIDIA GeForce RTX 4090")
+     gen.set_defaults(func=command_generate)
+
+     collect = sub.add_parser("collect")
+     collect.add_argument("--results-dir", default="/workspace/ideogram4_followup/results")
+     collect.add_argument("--publish-dir", default="/workspace/ideogram4_followup/publish")
+     collect.set_defaults(func=command_collect)
+
+     args = parser.parse_args()
+     os.environ.setdefault("HF_XET_HIGH_PERFORMANCE", "1")
+     args.func(args)
+
+
+ if __name__ == "__main__":
+     main()
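Reviewer note: `GpuPeakMonitor` polls `nvidia-smi` on a background thread, so it can only report what it happens to sample. In the SDNQ load row the sampled peak (14466 MB) is lower than the post-load reading (14522 MB), which suggests the poller can miss the true maximum. The sketch below is not the runner's code. It assumes a hypothetical `PeakSampler` with an injectable reader (so it runs without a GPU), waits on the stop event instead of sleeping, and adds a hypothetical `peak_with_endpoints` helper that folds the before/after readings into the reported peak.

```python
import threading
from typing import Callable, Optional


class PeakSampler:
    """Poll `read` on a background thread and keep every non-None reading."""

    def __init__(self, read: Callable[[], Optional[int]], interval: float = 0.05) -> None:
        self.read = read
        self.interval = interval
        self.samples: list = []
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        self.samples = []
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> Optional[int]:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        return max(self.samples) if self.samples else None

    def _run(self) -> None:
        while not self._stop.is_set():
            value = self.read()
            if value is not None:
                self.samples.append(value)
            # Event.wait returns as soon as stop() is called, unlike time.sleep.
            self._stop.wait(self.interval)


def peak_with_endpoints(sampled: Optional[int], before: Optional[int], after: Optional[int]) -> Optional[int]:
    """Return the largest known reading so the peak is never below before/after."""
    known = [v for v in (sampled, before, after) if v is not None]
    return max(known) if known else None


def run_demo() -> Optional[int]:
    """Drive the sampler with a scripted reader instead of nvidia-smi."""
    scripted = [100, 300, 200]
    exhausted = threading.Event()

    def fake_read() -> Optional[int]:
        if scripted:
            return scripted.pop(0)
        exhausted.set()
        return None

    sampler = PeakSampler(fake_read, interval=0.001)
    sampler.start()
    exhausted.wait(timeout=5)
    return sampler.stop()
```

With the SDNQ load readings from this commit, `peak_with_endpoints(14466, 396, 14522)` yields 14522. Whether the published table should use that adjusted value is a judgment call for the author; the committed numbers are left as they are.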
benchmark/nf4_4090_metrics.csv ADDED
@@ -0,0 +1,12 @@
+ name,elapsed_seconds,gpu_before_mb,gpu_after_mb,gpu_peak_mb,torch_peak_allocated_mb,torch_peak_reserved_mb,variant,hardware,preset,case_id,case_index,seed,height,width,request_temperature,image
+ nf4_load,269.30941787501797,396,15766,15766,15349.59521484375,15370.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,,,,,,,
+ nf4_generate,36.57375315600075,15766,21430,21430,20293.41552734375,20952.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,editorial_watch_photo,0,4101,1024,1024,cold,/workspace/ideogram4_followup/results/nf4/images/01_editorial_watch_photo_nf4.png
+ nf4_generate,36.26296863902826,15888,21370,21761,20256.4404296875,20892.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,risograph_botanical_poster,1,4102,1024,1024,hot,/workspace/ideogram4_followup/results/nf4/images/02_risograph_botanical_poster_nf4.png
+ nf4_generate,36.768314866989385,15888,21490,21490,20344.2578125,21012.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,cyrillic_cafe_menu,2,4103,1024,1024,hot,/workspace/ideogram4_followup/results/nf4/images/03_cyrillic_cafe_menu_nf4.png
+ nf4_generate,36.241010975965764,15888,21410,21410,20290.642578125,20932.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,brutalist_architecture,3,4104,1024,1024,hot,/workspace/ideogram4_followup/results/nf4/images/04_brutalist_architecture_nf4.png
+ nf4_generate,36.19913812598679,15888,21370,21370,20256.4404296875,20892.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,ink_manga_rain,4,4105,1024,1024,hot,/workspace/ideogram4_followup/results/nf4/images/05_ink_manga_rain_nf4.png
+ nf4_generate,36.216044905013405,15888,21390,21390,20269.380859375,20912.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,museum_clay_render,5,4106,1024,1024,hot,/workspace/ideogram4_followup/results/nf4/images/06_museum_clay_render_nf4.png
+ nf4_generate,36.23377947497647,15888,21370,21370,20262.91064453125,20892.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,food_packaging_label,6,4107,1024,1024,hot,/workspace/ideogram4_followup/results/nf4/images/07_food_packaging_label_nf4.png
+ nf4_generate,36.32639682298759,15888,21430,21430,20303.583984375,20952.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,fantasy_map_typography,7,4108,1024,1024,hot,/workspace/ideogram4_followup/results/nf4/images/08_fantasy_map_typography_nf4.png
+ nf4_generate,36.178082400991116,15888,21350,21350,20248.12060546875,20872.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,streetwear_lookbook,8,4109,1024,1024,hot,/workspace/ideogram4_followup/results/nf4/images/09_streetwear_lookbook_nf4.png
+ nf4_generate,36.34287546604173,15888,21410,21801,20292.4912109375,20932.0,nf4,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,scientific_cutaway,9,4110,1024,1024,hot,/workspace/ideogram4_followup/results/nf4/images/10_scientific_cutaway_nf4.png
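Reviewer note: the "Cold request s", "Hot mean s", and "Hot max s" columns in the README can be re-derived from this CSV. The sketch below is a simplified stand-in for `summarize_variant`, not the runner's code. It keeps only three columns from `benchmark/nf4_4090_metrics.csv` (values copied verbatim) and uses a hypothetical `summarize` helper.

```python
import csv
import io

# Three columns trimmed from benchmark/nf4_4090_metrics.csv; values copied verbatim.
SAMPLE_CSV = """\
name,elapsed_seconds,request_temperature
nf4_load,269.30941787501797,
nf4_generate,36.57375315600075,cold
nf4_generate,36.26296863902826,hot
nf4_generate,36.768314866989385,hot
nf4_generate,36.241010975965764,hot
nf4_generate,36.19913812598679,hot
nf4_generate,36.216044905013405,hot
nf4_generate,36.23377947497647,hot
nf4_generate,36.32639682298759,hot
nf4_generate,36.178082400991116,hot
nf4_generate,36.34287546604173,hot
"""


def summarize(csv_text: str, variant: str) -> dict:
    """Split generation rows into cold/hot and aggregate their elapsed seconds."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    gens = [r for r in rows if r["name"] == f"{variant}_generate"]
    cold = [float(r["elapsed_seconds"]) for r in gens if r["request_temperature"] == "cold"]
    hot = [float(r["elapsed_seconds"]) for r in gens if r["request_temperature"] == "hot"]
    return {
        "cases": len(gens),
        "cold_request_seconds": cold[0] if cold else None,
        "hot_request_mean_seconds": sum(hot) / len(hot) if hot else None,
        "hot_request_max_seconds": max(hot) if hot else None,
    }


summary = summarize(SAMPLE_CSV, "nf4")
print({key: round(value, 2) for key, value in summary.items()})
```

Rounded to two decimals, the result matches the `nf4` row of the README table: 10 cases, 36.57 s cold, 36.31 s hot mean, 36.77 s hot max.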
benchmark/nf4_4090_metrics.jsonl ADDED
@@ -0,0 +1,11 @@
+ {"name": "nf4_load", "elapsed_seconds": 269.30941787501797, "gpu_before_mb": 396, "gpu_after_mb": 15766, "gpu_peak_mb": 15766, "torch_peak_allocated_mb": 15349.59521484375, "torch_peak_reserved_mb": 15370.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "preset": "V4_DEFAULT_20"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.57375315600075, "gpu_before_mb": 15766, "gpu_after_mb": 21430, "gpu_peak_mb": 21430, "torch_peak_allocated_mb": 20293.41552734375, "torch_peak_reserved_mb": 20952.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "editorial_watch_photo", "case_index": 0, "seed": 4101, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "cold", "image": "/workspace/ideogram4_followup/results/nf4/images/01_editorial_watch_photo_nf4.png"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.26296863902826, "gpu_before_mb": 15888, "gpu_after_mb": 21370, "gpu_peak_mb": 21761, "torch_peak_allocated_mb": 20256.4404296875, "torch_peak_reserved_mb": 20892.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "risograph_botanical_poster", "case_index": 1, "seed": 4102, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/nf4/images/02_risograph_botanical_poster_nf4.png"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.768314866989385, "gpu_before_mb": 15888, "gpu_after_mb": 21490, "gpu_peak_mb": 21490, "torch_peak_allocated_mb": 20344.2578125, "torch_peak_reserved_mb": 21012.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "cyrillic_cafe_menu", "case_index": 2, "seed": 4103, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/nf4/images/03_cyrillic_cafe_menu_nf4.png"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.241010975965764, "gpu_before_mb": 15888, "gpu_after_mb": 21410, "gpu_peak_mb": 21410, "torch_peak_allocated_mb": 20290.642578125, "torch_peak_reserved_mb": 20932.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "brutalist_architecture", "case_index": 3, "seed": 4104, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/nf4/images/04_brutalist_architecture_nf4.png"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.19913812598679, "gpu_before_mb": 15888, "gpu_after_mb": 21370, "gpu_peak_mb": 21370, "torch_peak_allocated_mb": 20256.4404296875, "torch_peak_reserved_mb": 20892.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "ink_manga_rain", "case_index": 4, "seed": 4105, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/nf4/images/05_ink_manga_rain_nf4.png"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.216044905013405, "gpu_before_mb": 15888, "gpu_after_mb": 21390, "gpu_peak_mb": 21390, "torch_peak_allocated_mb": 20269.380859375, "torch_peak_reserved_mb": 20912.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "museum_clay_render", "case_index": 5, "seed": 4106, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/nf4/images/06_museum_clay_render_nf4.png"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.23377947497647, "gpu_before_mb": 15888, "gpu_after_mb": 21370, "gpu_peak_mb": 21370, "torch_peak_allocated_mb": 20262.91064453125, "torch_peak_reserved_mb": 20892.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "food_packaging_label", "case_index": 6, "seed": 4107, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/nf4/images/07_food_packaging_label_nf4.png"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.32639682298759, "gpu_before_mb": 15888, "gpu_after_mb": 21430, "gpu_peak_mb": 21430, "torch_peak_allocated_mb": 20303.583984375, "torch_peak_reserved_mb": 20952.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "fantasy_map_typography", "case_index": 7, "seed": 4108, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/nf4/images/08_fantasy_map_typography_nf4.png"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.178082400991116, "gpu_before_mb": 15888, "gpu_after_mb": 21350, "gpu_peak_mb": 21350, "torch_peak_allocated_mb": 20248.12060546875, "torch_peak_reserved_mb": 20872.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "streetwear_lookbook", "case_index": 8, "seed": 4109, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/nf4/images/09_streetwear_lookbook_nf4.png"}
+ {"name": "nf4_generate", "elapsed_seconds": 36.34287546604173, "gpu_before_mb": 15888, "gpu_after_mb": 21410, "gpu_peak_mb": 21801, "torch_peak_allocated_mb": 20292.4912109375, "torch_peak_reserved_mb": 20932.0, "variant": "nf4", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "scientific_cutaway", "case_index": 9, "seed": 4110, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/nf4/images/10_scientific_cutaway_nf4.png"}
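Reviewer note: each record carries two memory views, `gpu_peak_mb` (sampled from `nvidia-smi`) and `torch_peak_reserved_mb` (PyTorch's caching allocator). The sketch below reads JSONL the same line-by-line way `read_jsonl` does and prints the gap between the two. The three sample records are trimmed from `benchmark/nf4_4090_metrics.jsonl`, and `reserved_gaps` is a hypothetical helper. Reading the gap as CUDA context plus allocations outside PyTorch's allocator is an interpretation, not something the commit states.

```python
import json

# Three records trimmed from benchmark/nf4_4090_metrics.jsonl (other fields dropped).
SAMPLE_JSONL = """\
{"name": "nf4_load", "gpu_peak_mb": 15766, "torch_peak_reserved_mb": 15370.0}
{"name": "nf4_generate", "case_index": 0, "gpu_peak_mb": 21430, "torch_peak_reserved_mb": 20952.0}
{"name": "nf4_generate", "case_index": 1, "gpu_peak_mb": 21761, "torch_peak_reserved_mb": 20892.0}
"""


def reserved_gaps(jsonl_text: str) -> list:
    """Return gpu_peak_mb - torch_peak_reserved_mb for every non-empty JSONL line."""
    gaps = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, as read_jsonl does
        row = json.loads(line)
        gaps.append(row["gpu_peak_mb"] - row["torch_peak_reserved_mb"])
    return gaps


print(reserved_gaps(SAMPLE_JSONL))
```

For the load record the gap is 396 MB, the same value as that row's `gpu_before_mb` baseline. The generation gaps are larger and vary between requests.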
benchmark/sdnq_4090_metrics.csv ADDED
@@ -0,0 +1,12 @@
+ name,elapsed_seconds,gpu_before_mb,gpu_after_mb,gpu_peak_mb,torch_peak_allocated_mb,torch_peak_reserved_mb,variant,hardware,preset,case_id,case_index,seed,height,width,request_temperature,image
+ sdnq_load,211.60528413800057,396,14522,14466,14107.06298828125,14124.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,,,,,,,
+ sdnq_generate,59.65369569603354,14522,20186,20186,19050.88330078125,19708.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,editorial_watch_photo,0,4101,1024,1024,cold,/workspace/ideogram4_followup/results/sdnq/images/01_editorial_watch_photo_sdnq.png
+ sdnq_generate,36.95279458502773,14622,20126,20126,19013.908203125,19648.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,risograph_botanical_poster,1,4102,1024,1024,hot,/workspace/ideogram4_followup/results/sdnq/images/02_risograph_botanical_poster_sdnq.png
+ sdnq_generate,37.568486024974845,14622,20246,20246,19101.7255859375,19768.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,cyrillic_cafe_menu,2,4103,1024,1024,hot,/workspace/ideogram4_followup/results/sdnq/images/03_cyrillic_cafe_menu_sdnq.png
+ sdnq_generate,37.06334384600632,14622,20186,20186,19048.1103515625,19708.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,brutalist_architecture,3,4104,1024,1024,hot,/workspace/ideogram4_followup/results/sdnq/images/04_brutalist_architecture_sdnq.png
+ sdnq_generate,36.373742469004355,14622,20126,20521,19013.908203125,19648.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,ink_manga_rain,4,4105,1024,1024,hot,/workspace/ideogram4_followup/results/sdnq/images/05_ink_manga_rain_sdnq.png
+ sdnq_generate,37.08211989700794,14622,20146,20146,19026.8486328125,19668.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,museum_clay_render,5,4106,1024,1024,hot,/workspace/ideogram4_followup/results/sdnq/images/06_museum_clay_render_sdnq.png
+ sdnq_generate,37.078365966968704,14622,20146,20146,19020.37841796875,19668.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,food_packaging_label,6,4107,1024,1024,hot,/workspace/ideogram4_followup/results/sdnq/images/07_food_packaging_label_sdnq.png
+ sdnq_generate,37.32429828296881,14622,20186,20186,19061.0517578125,19708.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,fantasy_map_typography,7,4108,1024,1024,hot,/workspace/ideogram4_followup/results/sdnq/images/08_fantasy_map_typography_sdnq.png
+ sdnq_generate,36.95170207798947,14622,20126,20126,19005.58837890625,19648.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,streetwear_lookbook,8,4109,1024,1024,hot,/workspace/ideogram4_followup/results/sdnq/images/09_streetwear_lookbook_sdnq.png
+ sdnq_generate,37.0877975319745,14622,20186,20186,19049.958984375,19708.0,sdnq,NVIDIA GeForce RTX 4090,V4_DEFAULT_20,scientific_cutaway,9,4110,1024,1024,hot,/workspace/ideogram4_followup/results/sdnq/images/10_scientific_cutaway_sdnq.png
benchmark/sdnq_4090_metrics.jsonl ADDED
@@ -0,0 +1,11 @@
+ {"name": "sdnq_load", "elapsed_seconds": 211.60528413800057, "gpu_before_mb": 396, "gpu_after_mb": 14522, "gpu_peak_mb": 14466, "torch_peak_allocated_mb": 14107.06298828125, "torch_peak_reserved_mb": 14124.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "preset": "V4_DEFAULT_20"}
+ {"name": "sdnq_generate", "elapsed_seconds": 59.65369569603354, "gpu_before_mb": 14522, "gpu_after_mb": 20186, "gpu_peak_mb": 20186, "torch_peak_allocated_mb": 19050.88330078125, "torch_peak_reserved_mb": 19708.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "editorial_watch_photo", "case_index": 0, "seed": 4101, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "cold", "image": "/workspace/ideogram4_followup/results/sdnq/images/01_editorial_watch_photo_sdnq.png"}
+ {"name": "sdnq_generate", "elapsed_seconds": 36.95279458502773, "gpu_before_mb": 14622, "gpu_after_mb": 20126, "gpu_peak_mb": 20126, "torch_peak_allocated_mb": 19013.908203125, "torch_peak_reserved_mb": 19648.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "risograph_botanical_poster", "case_index": 1, "seed": 4102, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/sdnq/images/02_risograph_botanical_poster_sdnq.png"}
+ {"name": "sdnq_generate", "elapsed_seconds": 37.568486024974845, "gpu_before_mb": 14622, "gpu_after_mb": 20246, "gpu_peak_mb": 20246, "torch_peak_allocated_mb": 19101.7255859375, "torch_peak_reserved_mb": 19768.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "cyrillic_cafe_menu", "case_index": 2, "seed": 4103, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/sdnq/images/03_cyrillic_cafe_menu_sdnq.png"}
+ {"name": "sdnq_generate", "elapsed_seconds": 37.06334384600632, "gpu_before_mb": 14622, "gpu_after_mb": 20186, "gpu_peak_mb": 20186, "torch_peak_allocated_mb": 19048.1103515625, "torch_peak_reserved_mb": 19708.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "brutalist_architecture", "case_index": 3, "seed": 4104, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/sdnq/images/04_brutalist_architecture_sdnq.png"}
+ {"name": "sdnq_generate", "elapsed_seconds": 36.373742469004355, "gpu_before_mb": 14622, "gpu_after_mb": 20126, "gpu_peak_mb": 20521, "torch_peak_allocated_mb": 19013.908203125, "torch_peak_reserved_mb": 19648.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "ink_manga_rain", "case_index": 4, "seed": 4105, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/sdnq/images/05_ink_manga_rain_sdnq.png"}
+ {"name": "sdnq_generate", "elapsed_seconds": 37.08211989700794, "gpu_before_mb": 14622, "gpu_after_mb": 20146, "gpu_peak_mb": 20146, "torch_peak_allocated_mb": 19026.8486328125, "torch_peak_reserved_mb": 19668.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "museum_clay_render", "case_index": 5, "seed": 4106, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/sdnq/images/06_museum_clay_render_sdnq.png"}
+ {"name": "sdnq_generate", "elapsed_seconds": 37.078365966968704, "gpu_before_mb": 14622, "gpu_after_mb": 20146, "gpu_peak_mb": 20146, "torch_peak_allocated_mb": 19020.37841796875, "torch_peak_reserved_mb": 19668.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "food_packaging_label", "case_index": 6, "seed": 4107, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/sdnq/images/07_food_packaging_label_sdnq.png"}
+ {"name": "sdnq_generate", "elapsed_seconds": 37.32429828296881, "gpu_before_mb": 14622, "gpu_after_mb": 20186, "gpu_peak_mb": 20186, "torch_peak_allocated_mb": 19061.0517578125, "torch_peak_reserved_mb": 19708.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "fantasy_map_typography", "case_index": 7, "seed": 4108, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/sdnq/images/08_fantasy_map_typography_sdnq.png"}
+ {"name": "sdnq_generate", "elapsed_seconds": 36.95170207798947, "gpu_before_mb": 14622, "gpu_after_mb": 20126, "gpu_peak_mb": 20126, "torch_peak_allocated_mb": 19005.58837890625, "torch_peak_reserved_mb": 19648.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "streetwear_lookbook", "case_index": 8, "seed": 4109, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/sdnq/images/09_streetwear_lookbook_sdnq.png"}
+ {"name": "sdnq_generate", "elapsed_seconds": 37.0877975319745, "gpu_before_mb": 14622, "gpu_after_mb": 20186, "gpu_peak_mb": 20186, "torch_peak_allocated_mb": 19049.958984375, "torch_peak_reserved_mb": 19708.0, "variant": "sdnq", "hardware": "NVIDIA GeForce RTX 4090", "case_id": "scientific_cutaway", "case_index": 9, "seed": 4110, "height": 1024, "width": 1024, "preset": "V4_DEFAULT_20", "request_temperature": "hot", "image": "/workspace/ideogram4_followup/results/sdnq/images/10_scientific_cutaway_sdnq.png"}
benchmark/summary_4090_sdnq_vs_nf4.json ADDED
@@ -0,0 +1,30 @@
+ [
+   {
+     "variant": "sdnq",
+     "load_seconds": 211.60528413800057,
+     "load_peak_reserved_mb": 14124.0,
+     "load_peak_nvidia_mb": 14466,
+     "cold_request_seconds": 59.65369569603354,
+     "cold_request_peak_reserved_mb": 19708.0,
+     "cold_request_peak_nvidia_mb": 20186,
+     "hot_request_mean_seconds": 37.05362785354696,
+     "hot_request_max_seconds": 37.568486024974845,
+     "generation_peak_reserved_mb": 19768.0,
+     "generation_peak_nvidia_mb": 20521.0,
+     "cases": 10
+   },
+   {
+     "variant": "nf4",
+     "load_seconds": 269.30941787501797,
+     "load_peak_reserved_mb": 15370.0,
+     "load_peak_nvidia_mb": 15766,
+     "cold_request_seconds": 36.57375315600075,
+     "cold_request_peak_reserved_mb": 20952.0,
+     "cold_request_peak_nvidia_mb": 21430,
+     "hot_request_mean_seconds": 36.30762351977561,
+     "hot_request_max_seconds": 36.768314866989385,
+     "generation_peak_reserved_mb": 21012.0,
+     "generation_peak_nvidia_mb": 21801.0,
+     "cases": 10
+   }
+ ]
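
The `sdnq` summary values are consistent with a simple aggregation of `benchmark/sdnq_4090_metrics.jsonl`: the hot mean and max come from the nine `"request_temperature": "hot"` rows, and the generation peaks are maxima over all ten generate rows. The aggregation script itself is not part of this commit, so the following is only a minimal sketch of how such a summary could be derived. The `summarize` function, the mapping from JSONL fields to summary fields, and the trimmed `demo_records` list are assumptions made for illustration.

```python
from statistics import fmean


def summarize(records, variant):
    """Collapse per-request metric rows into one summary dict (assumed field mapping)."""
    load = next(r for r in records if r["name"] == f"{variant}_load")
    gens = [r for r in records if r["name"] == f"{variant}_generate"]
    cold = [r for r in gens if r["request_temperature"] == "cold"]
    hot = [r for r in gens if r["request_temperature"] == "hot"]
    return {
        "variant": variant,
        "load_seconds": load["elapsed_seconds"],
        "load_peak_reserved_mb": load["torch_peak_reserved_mb"],
        "load_peak_nvidia_mb": load["gpu_peak_mb"],
        # The first (cold) request is reported separately from the hot ones.
        "cold_request_seconds": cold[0]["elapsed_seconds"],
        "cold_request_peak_reserved_mb": cold[0]["torch_peak_reserved_mb"],
        "cold_request_peak_nvidia_mb": cold[0]["gpu_peak_mb"],
        "hot_request_mean_seconds": fmean(r["elapsed_seconds"] for r in hot),
        "hot_request_max_seconds": max(r["elapsed_seconds"] for r in hot),
        # Peaks are taken over every generate row, cold and hot alike.
        "generation_peak_reserved_mb": max(r["torch_peak_reserved_mb"] for r in gens),
        "generation_peak_nvidia_mb": max(r["gpu_peak_mb"] for r in gens),
        "cases": len(gens),
    }


# Trimmed demo input: the load row plus the first three generate rows,
# keeping only the fields the sketch reads.
demo_records = [
    {"name": "sdnq_load", "elapsed_seconds": 211.60528413800057,
     "gpu_peak_mb": 14466, "torch_peak_reserved_mb": 14124.0},
    {"name": "sdnq_generate", "elapsed_seconds": 59.65369569603354,
     "gpu_peak_mb": 20186, "torch_peak_reserved_mb": 19708.0,
     "request_temperature": "cold"},
    {"name": "sdnq_generate", "elapsed_seconds": 36.95279458502773,
     "gpu_peak_mb": 20126, "torch_peak_reserved_mb": 19648.0,
     "request_temperature": "hot"},
    {"name": "sdnq_generate", "elapsed_seconds": 37.568486024974845,
     "gpu_peak_mb": 20246, "torch_peak_reserved_mb": 19768.0,
     "request_temperature": "hot"},
]

summary = summarize(demo_records, "sdnq")
```

To run it over the full file, parse each line of the JSONL with `json.loads` and pass the resulting list to `summarize`. Because the demo input is truncated to three generate rows, its mean and peak values will differ from those in `summary_4090_sdnq_vs_nf4.json`.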