	---
	license: other
	license_name: ideogram-4-non-commercial
	base_model: ideogram-ai/ideogram-4-fp8
	pipeline_tag: text-to-image
	tags:
	- ideogram
	- text-to-image
	- sdnq
	- uint4
	- diffusion
	- typography
	---

	# Ideogram 4 FP8 -> SDNQ UInt4

This is an experimental SDNQ UInt4 conversion of `ideogram-ai/ideogram-4-fp8`, intended for local research and non-commercial use under the upstream Ideogram 4 license. It was made from the FP8 checkpoint: the FP8 linear layers were first materialized back to bf16, and static SDNQ `uint4` quantization was then applied one component at a time.
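
To make "uint4" concrete, the sketch below shows generic asymmetric min-max quantization of one weight group to 4-bit codes, plus packing two codes per byte. It is an illustration under stated assumptions, not SDNQ's implementation: the helper names are hypothetical, and SDNQ's actual group size, zero-point handling, and packing order may differ.

```python
from typing import List, Tuple


def quantize_uint4(group: List[float]) -> Tuple[List[int], float, float]:
    """Generic asymmetric min-max quantization of one weight group to codes 0..15.

    Illustrative only; this is not SDNQ's actual algorithm.
    """
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 15 if hi > lo else 1.0  # 16 levels = 15 steps between min and max
    codes = [min(15, max(0, round((w - lo) / scale))) for w in group]
    return codes, scale, lo  # lo acts as the zero point


def dequantize_uint4(codes: List[int], scale: float, zero: float) -> List[float]:
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale + zero for c in codes]


def pack_nibbles(codes: List[int]) -> bytes:
    """Pack two 4-bit codes per byte, low nibble first (odd lengths padded with 0)."""
    padded = codes + [0] * (len(codes) % 2)
    return bytes(padded[i] | (padded[i + 1] << 4) for i in range(0, len(padded), 2))


weights = [-0.5, -0.1, 0.0, 0.25, 0.5]
codes, scale, zero = quantize_uint4(weights)
restored = dequantize_uint4(codes, scale, zero)
print(codes, pack_nibbles(codes).hex())
# The rounding error per weight is at most about scale / 2.
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Storing 4-bit codes instead of 16-bit bf16 values is where the roughly 4x size reduction comes from; the per-group scale and zero point add some overhead on top of the packed codes.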

The repository contains SDNQ-compressed `text_encoder`, `transformer`, `unconditional_transformer`, and `vae` components. The official `ideogram4` loader cannot instantiate SDNQ-packed custom transformers, so this repository ships a loader helper, `ideogram4_sdnq_pipeline.py`.

	## Usage

```python
import torch
from ideogram4 import PRESETS
from ideogram4_sdnq_pipeline import Ideogram4SDNQPipeline

# Load the SDNQ UInt4 components through the helper shipped in this repository.
pipe = Ideogram4SDNQPipeline.from_pretrained(
    "WaveCut/ideogram-4-sdnq-uint4",
    device="cuda",
    dtype=torch.bfloat16,
)

# Same preset as the benchmarks below.
preset = PRESETS["V4_DEFAULT_20"]
image = pipe(
    "a typographic poster reading HELLO WORLD",
    height=1024,
    width=1024,
    num_steps=preset.num_steps,
    guidance_schedule=preset.guidance_schedule,
    mu=preset.mu,
    std=preset.std,
    seed=4101,
    raise_on_caption_issues=False,
)[0]  # take the first generated image
image.save("out.png")
```

	Install requirements:

	```bash
	pip install git+https://github.com/ideogram-oss/ideogram4 sdnq safetensors transformers accelerate pillow
	```
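
Before running the usage example, a small standard-library check can confirm that the packages are importable. The import names are assumptions: `ideogram4` matches the usage example, Pillow imports as `PIL`, `sdnq` is assumed to import under its distribution name, and `torch` is listed because the usage example needs it even though the pip command above does not install it.

```python
import importlib.util
from typing import Dict, Sequence

# Assumed import names for the packages used by this model card.
REQUIRED_MODULES = ["ideogram4", "sdnq", "safetensors", "transformers", "accelerate", "PIL", "torch"]


def check_requirements(modules: Sequence[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Return {module_name: importable?} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:14s} {'ok' if ok else 'MISSING'}")
```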

	## Component Structure

	Upstream FP8 structure:

	- `text_encoder`: Qwen3-VL text path used in text-only mode. Hidden states from 13 layers are concatenated for the DiT.
	- `transformer`: conditional 34-layer single-stream DiT.
	- `unconditional_transformer`: image-only negative branch used for asymmetric CFG.
	- `vae`: Flux2-style KL autoencoder decoder.
	- `tokenizer` and `scheduler`: copied from upstream.
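
The role of `unconditional_transformer` can be illustrated with the standard classifier-free guidance combination, where the negative prediction comes from a separate network instead of the conditional model run on an empty prompt. This is a hedged sketch: the toy models and the per-step list of guidance scales are hypothetical stand-ins, and the exact form of Ideogram 4's asymmetric CFG and of `preset.guidance_schedule` is not documented here.

```python
from typing import Callable, List, Sequence

Prediction = List[float]
Model = Callable[[Sequence[float], int], Prediction]


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], scale: float) -> Prediction:
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def guided_prediction(x: Sequence[float], step: int, cond_model: Model,
                      uncond_model: Model, guidance_schedule: Sequence[float]) -> Prediction:
    """One guided step using two separate networks (hypothetical interface)."""
    cond = cond_model(x, step)      # text-conditioned branch, like `transformer`
    uncond = uncond_model(x, step)  # image-only negative branch, like `unconditional_transformer`
    return cfg_combine(cond, uncond, guidance_schedule[step])


def toy_cond_model(x: Sequence[float], step: int) -> Prediction:
    return [v + 1.0 for v in x]


def toy_uncond_model(x: Sequence[float], step: int) -> Prediction:
    return list(x)


print(guided_prediction([0.0, 2.0], 0, toy_cond_model, toy_uncond_model, [3.0]))
```

With a scale of 1 the result equals the conditional prediction; larger scales extrapolate further away from the negative branch.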

	## Quantization

| Component | Materialized source (MB) | SDNQ state (MB) | Quantization time (s) | Quantization peak, nvidia (MB) |
| --- | --- | --- | --- | --- |
| transformer | 17698.84 | 4979.66 | 112.64 | 36525.00 |
| unconditional_transformer | 17698.84 | 4979.66 | 108.68 | 36525.00 |
| text_encoder | 14435.59 | 4097.53 | 102.32 | 24477.00 |
| vae | 160.31 | 50.19 | 2.68 | 861.00 |
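
The two size columns give the effective compression per component. The sketch below recomputes it from the values in the table; the dictionary simply copies the table, and the function name is illustrative.

```python
# (materialized source MB, SDNQ state MB), copied from the quantization table
SIZES_MB = {
    "transformer": (17698.84, 4979.66),
    "unconditional_transformer": (17698.84, 4979.66),
    "text_encoder": (14435.59, 4097.53),
    "vae": (160.31, 50.19),
}


def compression_ratio(source_mb: float, quantized_mb: float) -> float:
    """How many times smaller the SDNQ state is than the materialized source."""
    return source_mb / quantized_mb


for name, (src, q) in SIZES_MB.items():
    print(f"{name:26s} {compression_ratio(src, q):.2f}x")

total_src = sum(src for src, _ in SIZES_MB.values())
total_q = sum(q for _, q in SIZES_MB.values())
print(f"{'total':26s} {compression_ratio(total_src, total_q):.2f}x "
      f"({total_src:.2f} MB -> {total_q:.2f} MB)")
```

The ratios land near 3.5x rather than the 4x that 16-bit to 4-bit storage alone would suggest; plausibly per-group scales and any tensors left unquantized account for the gap.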

	## Benchmark

	Hardware: RunPod NVIDIA RTX PRO 6000 Blackwell Server Edition, single process, concurrency 1. Generation used 10 structured JSON prompts at 1024x1024 with `V4_DEFAULT_20`.
	The FP8 baseline was loaded through the upstream `ideogram4` `Ideogram4Pipeline.from_pretrained` recipe with `weights_repo="ideogram-ai/ideogram-4-fp8"`; magic-prompt expansion was disabled because the prompts are already structured captions.

| Variant | Load time (s) | Load peak reserved (MB) | Load peak nvidia (MB) | Cold request (s) | Hot mean (s) | Generation peak reserved (MB) | Generation peak nvidia (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| original | 267.83 | 28198.00 | 28759.00 | 17.90 | 17.51 | 34430.00 | 35099.00 |
| sdnq | 239.46 | 14558.00 | 15109.00 | 18.56 | 16.52 | 21650.00 | 22321.00 |
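
The relative differences between the two rows can be computed directly from the table. This is a small sketch over copied values; the metric keys and the helper name are illustrative.

```python
# Values copied from the RTX PRO 6000 benchmark table (original FP8 vs SDNQ UInt4).
ORIGINAL = {"load_peak_nvidia_mb": 28759.0, "gen_peak_nvidia_mb": 35099.0, "hot_mean_s": 17.51}
SDNQ = {"load_peak_nvidia_mb": 15109.0, "gen_peak_nvidia_mb": 22321.0, "hot_mean_s": 16.52}


def pct_change(baseline: float, variant: float) -> float:
    """Relative change of variant vs baseline, in percent (negative means lower)."""
    return 100.0 * (variant - baseline) / baseline


for metric in ORIGINAL:
    delta = pct_change(ORIGINAL[metric], SDNQ[metric])
    print(f"{metric:20s} {ORIGINAL[metric]:>9.2f} -> {SDNQ[metric]:>9.2f} ({delta:+.1f}%)")
```

On these numbers SDNQ lowers the load peak by about 47% and the generation peak by about 36%, with a hot request about 6% faster; all of this comes from a single hardware configuration.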

	## Example Matrix

The matrix below shows the original FP8 and SDNQ UInt4 outputs side by side in narrow vertical columns, encoded as WebP at quality 95.

	![Original FP8 vs SDNQ UInt4 vertical comparison](assets/original_vs_sdnq_vertical.webp)

	## Prompt Set

| # | id | summary |
| --- | --- | --- |
| 1 | `editorial_watch_photo` | A photorealistic editorial product photograph of a transparent mechanical wristwatch resting on a wet black stone slab, with tiny engraved labels visible on the dial. |
| 2 | `risograph_botanical_poster` | A layered risograph botanical exhibition poster with bold overprint textures and clean typographic hierarchy. |
| 3 | `cyrillic_cafe_menu` | A cozy Moscow cafe menu board photographed straight-on, testing clean Cyrillic typography in chalk and printed labels. |
| 4 | `brutalist_architecture` | A cinematic architectural photograph of a brutalist library atrium with tiny wayfinding signs and people for scale. |
| 5 | `ink_manga_rain` | A dramatic black-and-white manga splash page of a courier cycling through rain, with sound effects and shop signage. |
| 6 | `museum_clay_render` | A polished 3D clay render of a museum diorama showing a future Arctic research station with labeled miniature modules. |
| 7 | `food_packaging_label` | A realistic premium chocolate bar packaging mockup with layered foil, embossed typography, and ingredient microcopy. |
| 8 | `fantasy_map_typography` | A hand-painted fantasy map on parchment with readable place names, compass ornament, and coastal illustrations. |
| 9 | `streetwear_lookbook` | A fashion lookbook cover photograph for a streetwear collection, with crisp cover typography and realistic fabric textures. |
| 10 | `scientific_cutaway` | A detailed scientific cutaway illustration of a compact fusion battery prototype with annotated parts and clean technical typography. |

	## Files

	- `prompts.json`: the 10 structured prompts used for the comparison.
	- `assets/original_vs_sdnq_vertical.webp`: vertical side-by-side WebP comparison matrix for original FP8 vs SDNQ UInt4, quality 95.
	- `assets/sdnq_vs_nf4_4090_vertical.webp`: vertical side-by-side WebP comparison matrix for the RTX 4090 SDNQ vs official NF4 follow-up, quality 95.
	- `benchmark/`: raw benchmark JSONL/CSV files and `summary.json`.
	- `quantization_manifest.json`: component-level quantization timings, storage, and VRAM peaks.
	- `ideogram4_sdnq_pipeline.py`: loader helper for the SDNQ custom transformer components.

	## RTX 4090 Follow-up: SDNQ UInt4 vs Official NF4

	Hardware: RunPod NVIDIA GeForce RTX 4090, 24 GB VRAM, single process, concurrency 1. Both variants used the same 10 structured captions from `prompts.json`, 1024x1024, `V4_DEFAULT_20`, and no magic-prompt expansion. `nf4` uses the official `ideogram-ai/ideogram-4-nf4` checkpoint through the upstream `ideogram4` loader.

| Variant | Cases | Load time (s) | Load peak reserved (MB) | Load peak nvidia (MB) | Cold request (s) | Hot mean (s) | Hot max (s) | Generation peak reserved (MB) | Generation peak nvidia (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sdnq | 10 | 211.61 | 14124.00 | 14466.00 | 59.65 | 37.05 | 37.57 | 19768.00 | 20521.00 |
| nf4 | 10 | 269.31 | 15370.00 | 15766.00 | 36.57 | 36.31 | 36.77 | 21012.00 | 21801.00 |
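
The cold-start gap and the memory difference on the RTX 4090 can be read off the table with a few subtractions. This sketch copies the relevant values; the dictionary layout and helper name are illustrative.

```python
# Values copied from the RTX 4090 follow-up table.
RESULTS = {
    "sdnq": {"cold_s": 59.65, "hot_mean_s": 37.05, "gen_peak_nvidia_mb": 20521.0},
    "nf4": {"cold_s": 36.57, "hot_mean_s": 36.31, "gen_peak_nvidia_mb": 21801.0},
}


def cold_start_overhead(row: dict) -> float:
    """Extra seconds the first request takes compared with the hot mean."""
    return row["cold_s"] - row["hot_mean_s"]


for variant, row in RESULTS.items():
    print(f"{variant:5s} cold overhead {cold_start_overhead(row):6.2f} s, "
          f"gen peak {row['gen_peak_nvidia_mb']:.0f} MB")

saved_mb = RESULTS["nf4"]["gen_peak_nvidia_mb"] - RESULTS["sdnq"]["gen_peak_nvidia_mb"]
print(f"SDNQ generation peak is {saved_mb:.0f} MB below NF4")
```

On these numbers the two variants are close on hot latency and SDNQ peaks about 1.3 GB lower, but its first request takes about 22.6 s longer than its hot mean, while NF4's first request is nearly as fast as its hot requests.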

	![SDNQ vs official NF4 on RTX 4090](assets/sdnq_vs_nf4_4090_vertical.webp)

	Raw follow-up metrics are in `benchmark/summary_4090_sdnq_vs_nf4.json`, `benchmark/sdnq_4090_metrics.`, and `benchmark/nf4_4090_metrics.`. The exact runner used for the follow-up is `benchmark/followup_runner.py`.


	## License

	This checkpoint is derived from `ideogram-ai/ideogram-4-fp8` and follows the upstream Ideogram 4 non-commercial license. See `LICENSE.md`.