---
license: other
license_name: ideogram-4-non-commercial
base_model: ideogram-ai/ideogram-4-fp8
pipeline_tag: text-to-image
tags:
- ideogram
- text-to-image
- sdnq
- uint4
- diffusion
- typography
---

# Ideogram 4 FP8 -> SDNQ UInt4

This is an experimental SDNQ UInt4 conversion of `ideogram-ai/ideogram-4-fp8`, intended for local research and non-commercial use under the upstream Ideogram 4 license. It was produced from the FP8 checkpoint by first materializing the FP8 linear layers back to bf16 and then applying static SDNQ `uint4` quantization one component at a time.
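
For intuition about what a `uint4` weight state holds, here is a minimal sketch of generic asymmetric 4-bit quantization with one scale and zero point per group, plus packing two 4-bit codes into each byte. It is not SDNQ's actual algorithm or storage format; `GROUP_SIZE`, `quantize_uint4`, `dequantize_uint4`, and `pack_nibbles` are hypothetical names chosen for illustration.

```python
# Generic asymmetric uint4 quantization sketch (NOT the SDNQ implementation).
# GROUP_SIZE is a hypothetical choice for illustration.
GROUP_SIZE = 4
LEVELS = 15  # uint4 codes range over 0..15


def quantize_uint4(values, group_size=GROUP_SIZE):
    """Quantize floats to uint4 codes with one (scale, zero) pair per group."""
    codes, params = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / LEVELS or 1.0  # a constant group would give scale 0
        params.append((scale, lo))
        for v in group:
            code = round((v - lo) / scale)
            codes.append(min(LEVELS, max(0, code)))  # clamp against float rounding
    return codes, params


def dequantize_uint4(codes, params, group_size=GROUP_SIZE):
    """Reconstruct approximate floats: value = code * scale + zero."""
    return [
        code * params[i // group_size][0] + params[i // group_size][1]
        for i, code in enumerate(codes)
    ]


def pack_nibbles(codes):
    """Store two 4-bit codes per byte (low nibble first), padding an odd tail with 0."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
```

With two codes per byte, each weight costs 4 bits instead of bf16's 16, so packed storage is about a quarter of the bf16 size before the per-group scale and zero-point metadata is counted.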

The repository contains SDNQ-compressed `text_encoder`, `transformer`, `unconditional_transformer`, and `vae` components. Because the official `ideogram4` loader cannot instantiate SDNQ-packed custom transformers, the repository also ships a loader helper, `ideogram4_sdnq_pipeline.py`.

## Usage

```python
import torch
from ideogram4 import PRESETS
from ideogram4_sdnq_pipeline import Ideogram4SDNQPipeline

pipe = Ideogram4SDNQPipeline.from_pretrained(
    "WaveCut/ideogram-4-sdnq-uint4",
    device="cuda",
    dtype=torch.bfloat16,
)

preset = PRESETS["V4_DEFAULT_20"]
image = pipe(
    "a typographic poster reading HELLO WORLD",
    height=1024,
    width=1024,
    num_steps=preset.num_steps,
    guidance_schedule=preset.guidance_schedule,
    mu=preset.mu,
    std=preset.std,
    seed=4101,
    raise_on_caption_issues=False,
)[0]
image.save("out.png")
```

Install requirements:

```bash
pip install git+https://github.com/ideogram-oss/ideogram4 sdnq safetensors transformers accelerate pillow
```
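
The usage snippet imports `ideogram4_sdnq_pipeline` as a plain Python module, so the helper file has to be importable. One possible way, assuming `huggingface_hub` is available (it is a dependency of `transformers`), is to download just that file from the repository and put its folder on `sys.path`. `make_loader_importable` is a hypothetical helper, not part of this repository.

```python
import sys
from pathlib import Path


def make_loader_importable(repo_id: str = "WaveCut/ideogram-4-sdnq-uint4") -> Path:
    """Download ideogram4_sdnq_pipeline.py and add its folder to sys.path (hypothetical helper)."""
    # Imported lazily so the function can be defined without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_file = Path(hf_hub_download(repo_id=repo_id, filename="ideogram4_sdnq_pipeline.py"))
    folder = str(local_file.parent)
    if folder not in sys.path:
        sys.path.insert(0, folder)  # make `import ideogram4_sdnq_pipeline` resolve here
    return local_file
```

Call `make_loader_importable()` before the `from ideogram4_sdnq_pipeline import Ideogram4SDNQPipeline` line. Running the usage script from a local clone of the repository achieves the same thing.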

## Component Structure

Upstream FP8 structure:

- `text_encoder`: Qwen3-VL text path, used in text-only mode. Hidden states from 13 of its layers are concatenated and passed to the DiT.
- `transformer`: conditional 34-layer single-stream DiT.
- `unconditional_transformer`: image-only negative branch used for asymmetric CFG.
- `vae`: Flux2-style KL autoencoder decoder.
- `tokenizer` and `scheduler`: copied from upstream.
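
In this layout the negative prediction for classifier-free guidance comes from the separate, image-only `unconditional_transformer` rather than from the conditional model. As a rough sketch, assuming the standard CFG combination rule and a schedule with one guidance scale per step (the exact formula and `guidance_schedule` format inside `ideogram4` are not documented here), the per-step combination looks like this. `cfg_combine` and `guided_steps` are hypothetical names.

```python
def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance, elementwise: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def guided_steps(cond_preds, uncond_preds, guidance_schedule):
    """Apply one guidance scale per denoising step (assumed schedule format: one float per step)."""
    if not (len(cond_preds) == len(uncond_preds) == len(guidance_schedule)):
        raise ValueError("need one cond/uncond prediction pair and one scale per step")
    return [
        cfg_combine(c, u, g)
        for c, u, g in zip(cond_preds, uncond_preds, guidance_schedule)
    ]
```

A scale of 1.0 returns the conditional prediction unchanged, and larger scales push further away from the unconditional branch.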

## Quantization

| Component | Materialized source (MB) | SDNQ state (MB) | Quantization time (s) | Quantization peak nvidia (MB) |
| --- | --- | --- | --- | --- |
| transformer | 17698.84 | 4979.66 | 112.64 | 36525.00 |
| unconditional_transformer | 17698.84 | 4979.66 | 108.68 | 36525.00 |
| text_encoder | 14435.59 | 4097.53 | 102.32 | 24477.00 |
| vae | 160.31 | 50.19 | 2.68 | 861.00 |
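
The table implies roughly a 3.5x size reduction for the large components rather than the 4x that 16-bit to 4-bit storage alone would give; the gap is plausibly quantization metadata and any tensors left unquantized, though the table does not break this down. This snippet simply re-derives the ratios and totals from the numbers above.

```python
# (materialized source MB, SDNQ state MB), copied from the quantization table
SIZES_MB = {
    "transformer": (17698.84, 4979.66),
    "unconditional_transformer": (17698.84, 4979.66),
    "text_encoder": (14435.59, 4097.53),
    "vae": (160.31, 50.19),
}


def compression_report(sizes):
    """Return per-component compression ratios and total sizes before/after quantization."""
    ratios = {name: src / q for name, (src, q) in sizes.items()}
    total_src = sum(src for src, _ in sizes.values())
    total_q = sum(q for _, q in sizes.values())
    return ratios, total_src, total_q


ratios, total_src, total_q = compression_report(SIZES_MB)
for name, r in ratios.items():
    print(f"{name:>26}: {r:.2f}x")
print(f"total: {total_src:.2f} MB -> {total_q:.2f} MB ({total_src / total_q:.2f}x)")
```

This prints about 3.55x for each transformer, 3.52x for the text encoder, and 3.19x for the VAE, for a total of 49993.58 MB reduced to 14107.04 MB (3.54x).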

## Benchmark

Hardware: RunPod NVIDIA RTX PRO 6000 Blackwell Server Edition, single process, concurrency 1. Generation used 10 structured JSON prompts at 1024x1024 with `V4_DEFAULT_20`.
The FP8 baseline was loaded through the upstream `ideogram4` `Ideogram4Pipeline.from_pretrained` recipe with `weights_repo="ideogram-ai/ideogram-4-fp8"`; magic-prompt expansion was disabled because the prompts are already structured captions.

| Variant | Load (s) | Load peak reserved (MB) | Load peak nvidia (MB) | Cold request (s) | Hot mean (s) | Gen peak reserved (MB) | Gen peak nvidia (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| original | 267.83 | 28198.00 | 28759.00 | 17.90 | 17.51 | 34430.00 | 35099.00 |
| sdnq | 239.46 | 14558.00 | 15109.00 | 18.56 | 16.52 | 21650.00 | 22321.00 |
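
The headline differences are easier to read as relative changes. This small helper re-derives them from the table above (single runs on one machine, so treat them as indicative); `relative_change` is a name introduced here for illustration.

```python
# (original FP8, SDNQ UInt4) values from the RTX PRO 6000 table
METRICS = {
    "load_peak_nvidia_mb": (28759.00, 15109.00),
    "gen_peak_nvidia_mb": (35099.00, 22321.00),
    "hot_mean_s": (17.51, 16.52),
}


def relative_change(baseline, candidate):
    """Fractional change of candidate vs baseline (negative means lower than baseline)."""
    return (candidate - baseline) / baseline


for name, (fp8, sdnq) in METRICS.items():
    print(f"{name}: {fp8:.2f} -> {sdnq:.2f} ({relative_change(fp8, sdnq):+.1%})")
```

With these values it reports about -47.5% load peak memory, -36.4% generation peak memory, and -5.7% hot mean time for SDNQ relative to the FP8 original.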

## Example Matrix

The matrix below places the original FP8 and SDNQ UInt4 outputs side by side in narrow vertical columns. It is saved as a WebP image at quality 95.

![Original FP8 vs SDNQ UInt4 vertical comparison](assets/original_vs_sdnq_vertical.webp)

## Prompt Set

| # | id | summary |
| --- | --- | --- |
| 1 | `editorial_watch_photo` | A photorealistic editorial product photograph of a transparent mechanical wristwatch resting on a wet black stone slab, with tiny engraved labels visible on the dial. |
| 2 | `risograph_botanical_poster` | A layered risograph botanical exhibition poster with bold overprint textures and clean typographic hierarchy. |
| 3 | `cyrillic_cafe_menu` | A cozy Moscow cafe menu board photographed straight-on, testing clean Cyrillic typography in chalk and printed labels. |
| 4 | `brutalist_architecture` | A cinematic architectural photograph of a brutalist library atrium with tiny wayfinding signs and people for scale. |
| 5 | `ink_manga_rain` | A dramatic black-and-white manga splash page of a courier cycling through rain, with sound effects and shop signage. |
| 6 | `museum_clay_render` | A polished 3D clay render of a museum diorama showing a future Arctic research station with labeled miniature modules. |
| 7 | `food_packaging_label` | A realistic premium chocolate bar packaging mockup with layered foil, embossed typography, and ingredient microcopy. |
| 8 | `fantasy_map_typography` | A hand-painted fantasy map on parchment with readable place names, compass ornament, and coastal illustrations. |
| 9 | `streetwear_lookbook` | A fashion lookbook cover photograph for a streetwear collection, with crisp cover typography and realistic fabric textures. |
| 10 | `scientific_cutaway` | A detailed scientific cutaway illustration of a compact fusion battery prototype with annotated parts and clean technical typography. |

## Files

- `prompts.json`: the 10 structured prompts used for the comparison.
- `assets/original_vs_sdnq_vertical.webp`: vertical side-by-side WebP comparison matrix for original FP8 vs SDNQ UInt4, quality 95.
- `assets/sdnq_vs_nf4_4090_vertical.webp`: vertical side-by-side WebP comparison matrix for the RTX 4090 SDNQ vs official NF4 follow-up, quality 95.
- `benchmark/`: raw benchmark JSONL/CSV files and `summary.json`.
- `quantization_manifest.json`: component-level quantization timings, storage, and VRAM peaks.
- `ideogram4_sdnq_pipeline.py`: loader helper for the SDNQ custom transformer components.

## RTX 4090 Follow-up: SDNQ UInt4 vs Official NF4

Hardware: RunPod NVIDIA GeForce RTX 4090, 24 GB VRAM, single process, concurrency 1. Both variants used the same 10 structured captions from `prompts.json` at 1024x1024 with `V4_DEFAULT_20` and no magic-prompt expansion. The `nf4` variant is the official `ideogram-ai/ideogram-4-nf4` checkpoint, loaded through the upstream `ideogram4` loader.

| Variant | Cases | Load (s) | Load peak reserved (MB) | Load peak nvidia (MB) | Cold request (s) | Hot mean (s) | Hot max (s) | Gen peak reserved (MB) | Gen peak nvidia (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sdnq | 10 | 211.61 | 14124.00 | 14466.00 | 59.65 | 37.05 | 37.57 | 19768.00 | 20521.00 |
| nf4 | 10 | 269.31 | 15370.00 | 15766.00 | 36.57 | 36.31 | 36.77 | 21012.00 | 21801.00 |
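
To compare the two 4-bit variants directly, this sketch computes SDNQ's absolute and relative difference against the NF4 baseline for a few columns of the RTX 4090 table. `ROWS` and `compare` are illustrative names; the numbers are copied from the table.

```python
# (sdnq, nf4) values from the RTX 4090 table
ROWS = {
    "load_s": (211.61, 269.31),
    "cold_request_s": (59.65, 36.57),
    "hot_mean_s": (37.05, 36.31),
    "gen_peak_nvidia_mb": (20521.00, 21801.00),
}


def compare(rows):
    """Return (absolute, relative) difference of SDNQ against the NF4 baseline per metric."""
    return {
        name: (sdnq - nf4, (sdnq - nf4) / nf4)
        for name, (sdnq, nf4) in rows.items()
    }


for name, (delta, rel) in compare(ROWS).items():
    print(f"{name}: {delta:+.2f} ({rel:+.1%})")
```

On these numbers SDNQ loads about 21% faster and peaks 1280 MB (about 5.9%) lower during generation, while its hot mean is about 2% slower and its cold request is about 63% slower than NF4.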

![SDNQ vs official NF4 on RTX 4090](assets/sdnq_vs_nf4_4090_vertical.webp)

Raw follow-up metrics are in `benchmark/summary_4090_sdnq_vs_nf4.json`, `benchmark/sdnq_4090_metrics.*`, and `benchmark/nf4_4090_metrics.*`. The exact runner used for the follow-up is `benchmark/followup_runner.py`.


## License

This checkpoint is derived from `ideogram-ai/ideogram-4-fp8` and follows the upstream Ideogram 4 non-commercial license. See `LICENSE.md`.