---
license: other
license_name: ideogram-4-non-commercial
base_model: ideogram-ai/ideogram-4-fp8
pipeline_tag: text-to-image
tags:
- ideogram
- text-to-image
- sdnq
- uint4
- diffusion
- typography
---
# Ideogram 4 FP8 -> SDNQ UInt4
This is an experimental SDNQ UInt4 conversion of `ideogram-ai/ideogram-4-fp8`, intended for local research and non-commercial use under the upstream Ideogram 4 license. The conversion starts from the FP8 checkpoint: FP8 linear layers are materialized back to bf16, and static SDNQ `uint4` quantization is then applied component by component.
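To make the idea of 4-bit weight storage concrete, here is a minimal sketch of generic asymmetric min/max quantization to integer codes in the range 0 to 15, with two codes packed per byte. It is an illustration only: SDNQ's actual algorithm, grouping, and packing layout are not described in this card and may differ, and the function names below are hypothetical.

```python
def quantize_uint4(values):
    """Map a group of floats to integer codes in [0, 15] (asymmetric, min/max)."""
    lo, hi = min(values), max(values)
    # One step per code; fall back to 1.0 so a constant group does not divide by zero.
    scale = (hi - lo) / 15 or 1.0
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_uint4(codes, scale, zero):
    """Recover approximate floats from codes plus the stored scale and offset."""
    return [c * scale + zero for c in codes]


def pack_nibbles(codes):
    """Store two 4-bit codes per byte, low nibble first."""
    padded = codes + [0] * (len(codes) % 2)
    return bytes(padded[i] | (padded[i + 1] << 4) for i in range(0, len(padded), 2))


# Toy weight group (hypothetical values, not taken from the checkpoint).
weights = [-0.5, 0.0, 0.25, 1.0]
codes, scale, zero = quantize_uint4(weights)
restored = dequantize_uint4(codes, scale, zero)
print(codes, pack_nibbles(codes).hex(), restored)
```

Each restored value differs from the original by at most half a quantization step, and the scale and offset have to be stored alongside the packed codes.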
The repository includes SDNQ-compressed `text_encoder`, `transformer`, `unconditional_transformer`, and `vae` components. The official `ideogram4` loader cannot instantiate SDNQ-packed custom transformers, so this repository ships its own loader, `ideogram4_sdnq_pipeline.py`.
## Usage
```python
import torch
from ideogram4 import PRESETS
from ideogram4_sdnq_pipeline import Ideogram4SDNQPipeline
pipe = Ideogram4SDNQPipeline.from_pretrained(
"WaveCut/ideogram-4-sdnq-uint4",
device="cuda",
dtype=torch.bfloat16,
)
preset = PRESETS["V4_DEFAULT_20"]
image = pipe(
"a typographic poster reading HELLO WORLD",
height=1024,
width=1024,
num_steps=preset.num_steps,
guidance_schedule=preset.guidance_schedule,
mu=preset.mu,
std=preset.std,
seed=4101,
raise_on_caption_issues=False,
)[0]
image.save("out.png")
```
Install the requirements before running the snippet above:
```bash
pip install git+https://github.com/ideogram-oss/ideogram4 sdnq safetensors transformers accelerate pillow
```
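After installing, a quick import check can catch a missing package before the slow model load. This is a small sketch using only the standard library. The module names are assumptions inferred from the install line and the usage snippet (for example, the `pillow` package is imported as `PIL`), and `missing_modules` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec

# Top-level import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = [
    "torch",
    "ideogram4",
    "sdnq",
    "safetensors",
    "transformers",
    "accelerate",
    "PIL",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


print("missing:", missing_modules() or "none")
```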
## Component Structure
Upstream FP8 structure:
- `text_encoder`: Qwen3-VL text path used in text-only mode. Hidden states from 13 layers are concatenated for the DiT.
- `transformer`: conditional 34-layer single-stream DiT.
- `unconditional_transformer`: image-only negative branch used for asymmetric CFG.
- `vae`: Flux2-style KL autoencoder decoder.
- `tokenizer` and `scheduler`: copied from upstream.
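For orientation, the asymmetric CFG mentioned above pairs a text-conditioned prediction from `transformer` with an image-only prediction from `unconditional_transformer`. The sketch below shows the standard classifier-free guidance combination on plain Python lists. It assumes the two predictions are blended in the usual way; the actual blending and the per-step `guidance_schedule` live in the upstream `ideogram4` code and may differ. `cfg_combine` is a hypothetical name.

```python
def cfg_combine(cond, uncond, scale):
    """Standard CFG: start at the unconditional prediction and move toward
    the conditional one by `scale` (scale=1.0 returns the conditional)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy predictions (hypothetical values) standing in for the two model outputs.
cond_pred = [1.0, 2.0]      # would come from `transformer`
uncond_pred = [0.5, 1.0]    # would come from `unconditional_transformer`
print(cfg_combine(cond_pred, uncond_pred, scale=3.0))
```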
## Quantization
| Component | Source, materialized (MB) | SDNQ state (MB) | Quantization time (s) | Quantization peak nvidia (MB) |
| --- | --- | --- | --- | --- |
| transformer | 17698.84 | 4979.66 | 112.64 | 36525.00 |
| unconditional_transformer | 17698.84 | 4979.66 | 108.68 | 36525.00 |
| text_encoder | 14435.59 | 4097.53 | 102.32 | 24477.00 |
| vae | 160.31 | 50.19 | 2.68 | 861.00 |
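The compression ratios implied by the table can be computed directly. The sketch below copies the sizes from the table; the helper name `compression_ratio` is illustrative.

```python
# (source materialized MB, SDNQ state MB), copied from the table above.
SIZES_MB = {
    "transformer": (17698.84, 4979.66),
    "unconditional_transformer": (17698.84, 4979.66),
    "text_encoder": (14435.59, 4097.53),
    "vae": (160.31, 50.19),
}


def compression_ratio(source_mb, quantized_mb):
    """How many times smaller the quantized state is than the source."""
    return source_mb / quantized_mb


for name, (source_mb, quantized_mb) in SIZES_MB.items():
    print(f"{name}: {compression_ratio(source_mb, quantized_mb):.2f}x")

total_source = sum(src for src, _ in SIZES_MB.values())
total_quantized = sum(q for _, q in SIZES_MB.values())
print(f"total: {compression_ratio(total_source, total_quantized):.2f}x")
```

An ideal 16-bit to 4-bit packing would give 4x. The ratios here are lower, presumably because of stored scale and offset metadata and any tensors left unquantized, although this card does not break that down.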
## Benchmark
Hardware: RunPod NVIDIA RTX PRO 6000 Blackwell Server Edition, single process, concurrency 1. Generation used 10 structured JSON prompts at 1024x1024 with `V4_DEFAULT_20`.
The FP8 baseline was loaded through the upstream `ideogram4` `Ideogram4Pipeline.from_pretrained` recipe with `weights_repo="ideogram-ai/ideogram-4-fp8"`; magic-prompt expansion was disabled because the prompts are already structured captions.
| Variant | Load (s) | Load peak reserved (MB) | Load peak nvidia (MB) | Cold request (s) | Hot mean (s) | Gen peak reserved (MB) | Gen peak nvidia (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| original | 267.83 | 28198.00 | 28759.00 | 17.90 | 17.51 | 34430.00 | 35099.00 |
| sdnq | 239.46 | 14558.00 | 15109.00 | 18.56 | 16.52 | 21650.00 | 22321.00 |
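To read the table as relative savings, the percentage reduction of SDNQ against the FP8 baseline can be computed as follows. The numbers are copied from the table; the dictionary keys and the `percent_reduction` helper are illustrative names. These are single-run measurements, so small differences should be treated with caution.

```python
# Selected columns copied from the benchmark table above.
ORIGINAL = {"load_peak_reserved_mb": 28198.0, "gen_peak_reserved_mb": 34430.0, "hot_mean_s": 17.51}
SDNQ = {"load_peak_reserved_mb": 14558.0, "gen_peak_reserved_mb": 21650.0, "hot_mean_s": 16.52}


def percent_reduction(baseline, candidate):
    """Reduction of `candidate` relative to `baseline`, in percent."""
    return 100.0 * (baseline - candidate) / baseline


for key in ORIGINAL:
    print(f"{key}: {percent_reduction(ORIGINAL[key], SDNQ[key]):.1f}% lower")
```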
## Example Matrix
The matrix below keeps the original FP8 and SDNQ UInt4 outputs side by side in narrow vertical columns. It is a WebP at quality 95.
![Original FP8 vs SDNQ UInt4 vertical comparison](assets/original_vs_sdnq_vertical.webp)
## Prompt Set
| # | id | summary |
| --- | --- | --- |
| 1 | `editorial_watch_photo` | A photorealistic editorial product photograph of a transparent mechanical wristwatch resting on a wet black stone slab, with tiny engraved labels visible on the dial. |
| 2 | `risograph_botanical_poster` | A layered risograph botanical exhibition poster with bold overprint textures and clean typographic hierarchy. |
| 3 | `cyrillic_cafe_menu` | A cozy Moscow cafe menu board photographed straight-on, testing clean Cyrillic typography in chalk and printed labels. |
| 4 | `brutalist_architecture` | A cinematic architectural photograph of a brutalist library atrium with tiny wayfinding signs and people for scale. |
| 5 | `ink_manga_rain` | A dramatic black-and-white manga splash page of a courier cycling through rain, with sound effects and shop signage. |
| 6 | `museum_clay_render` | A polished 3D clay render of a museum diorama showing a future Arctic research station with labeled miniature modules. |
| 7 | `food_packaging_label` | A realistic premium chocolate bar packaging mockup with layered foil, embossed typography, and ingredient microcopy. |
| 8 | `fantasy_map_typography` | A hand-painted fantasy map on parchment with readable place names, compass ornament, and coastal illustrations. |
| 9 | `streetwear_lookbook` | A fashion lookbook cover photograph for a streetwear collection, with crisp cover typography and realistic fabric textures. |
| 10 | `scientific_cutaway` | A detailed scientific cutaway illustration of a compact fusion battery prototype with annotated parts and clean technical typography. |
## Files
- `prompts.json`: the 10 structured prompts used for the comparison.
- `assets/original_vs_sdnq_vertical.webp`: vertical side-by-side WebP comparison matrix for original FP8 vs SDNQ UInt4, quality 95.
- `assets/sdnq_vs_nf4_4090_vertical.webp`: vertical side-by-side WebP comparison matrix for the RTX 4090 SDNQ vs official NF4 follow-up, quality 95.
- `benchmark/`: raw benchmark JSONL/CSV files and `summary.json`.
- `quantization_manifest.json`: component-level quantization timings, storage, and VRAM peaks.
- `ideogram4_sdnq_pipeline.py`: loader helper for the SDNQ custom transformer components.
## RTX 4090 Follow-up: SDNQ UInt4 vs Official NF4
Hardware: RunPod NVIDIA GeForce RTX 4090, 24 GB VRAM, single process, concurrency 1. Both variants used the same 10 structured captions from `prompts.json`, 1024x1024, `V4_DEFAULT_20`, and no magic-prompt expansion. `nf4` uses the official `ideogram-ai/ideogram-4-nf4` checkpoint through the upstream `ideogram4` loader.
| Variant | Cases | Load (s) | Load peak reserved (MB) | Load peak nvidia (MB) | Cold request (s) | Hot mean (s) | Hot max (s) | Gen peak reserved (MB) | Gen peak nvidia (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sdnq | 10 | 211.61 | 14124.00 | 14466.00 | 59.65 | 37.05 | 37.57 | 19768.00 | 20521.00 |
| nf4 | 10 | 269.31 | 15370.00 | 15766.00 | 36.57 | 36.31 | 36.77 | 21012.00 | 21801.00 |
![SDNQ vs official NF4 on RTX 4090](assets/sdnq_vs_nf4_4090_vertical.webp)
Raw follow-up metrics are in `benchmark/summary_4090_sdnq_vs_nf4.json`, `benchmark/sdnq_4090_metrics.*`, and `benchmark/nf4_4090_metrics.*`. The exact runner used for the follow-up is `benchmark/followup_runner.py`.
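Because SDNQ loads faster but has a slower cold request in this run, the end-to-end wall time depends on how many images are generated per session. The sketch below estimates it from the table above under a simplifying assumption: one load, one cold request, and every later request costing exactly the hot mean. That is a rough model of single-run measurements, not a benchmark result, and `total_seconds` is a hypothetical helper.

```python
# Timings copied from the RTX 4090 table above.
RUNS = {
    "sdnq": {"load_s": 211.61, "cold_s": 59.65, "hot_mean_s": 37.05},
    "nf4": {"load_s": 269.31, "cold_s": 36.57, "hot_mean_s": 36.31},
}


def total_seconds(run, n_images):
    """Estimated wall time: load + first (cold) request + remaining hot requests."""
    if n_images < 1:
        raise ValueError("n_images must be at least 1")
    return run["load_s"] + run["cold_s"] + (n_images - 1) * run["hot_mean_s"]


for n in (1, 10, 100):
    estimates = {name: round(total_seconds(run, n), 2) for name, run in RUNS.items()}
    print(n, estimates)
```

Under this model the two variants end up within seconds of each other for short sessions, so the lower peak VRAM of SDNQ is likely the more meaningful difference on a 24 GB card.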
## License
This checkpoint is derived from `ideogram-ai/ideogram-4-fp8` and follows the upstream Ideogram 4 non-commercial license. See `LICENSE.md`.