---
license: other
license_name: ideogram-4-non-commercial
base_model: ideogram-ai/ideogram-4-fp8
pipeline_tag: text-to-image
tags:
- ideogram
- text-to-image
- sdnq
- uint4
- diffusion
- typography
---

# Ideogram 4 FP8 -> SDNQ UInt4

This is an experimental SDNQ UInt4 conversion of `ideogram-ai/ideogram-4-fp8`. It is intended for local research and non-commercial use under the upstream Ideogram 4 license.

The conversion was made from the FP8 checkpoint, materializing FP8 linears back to bf16 and then applying static SDNQ `uint4` component-by-component. The model includes SDNQ-compressed `text_encoder`, `transformer`, `unconditional_transformer`, and `vae` components.

The official `ideogram4` loader does not know how to instantiate SDNQ-packed custom transformers, so this repository includes `ideogram4_sdnq_pipeline.py`.

## Usage

```python
import torch
from ideogram4 import PRESETS
from ideogram4_sdnq_pipeline import Ideogram4SDNQPipeline

pipe = Ideogram4SDNQPipeline.from_pretrained(
    "WaveCut/ideogram-4-sdnq-uint4",
    device="cuda",
    dtype=torch.bfloat16,
)

preset = PRESETS["V4_DEFAULT_20"]
image = pipe(
    "a typographic poster reading HELLO WORLD",
    height=1024,
    width=1024,
    num_steps=preset.num_steps,
    guidance_schedule=preset.guidance_schedule,
    mu=preset.mu,
    std=preset.std,
    seed=4101,
    raise_on_caption_issues=False,
)[0]
image.save("out.png")
```

Install requirements:

```bash
pip install git+https://github.com/ideogram-oss/ideogram4 sdnq safetensors transformers accelerate pillow
```

## Component Structure

Upstream FP8 structure:

- `text_encoder`: Qwen3-VL text path used in text-only mode. Hidden states from 13 layers are concatenated for the DiT.
- `transformer`: conditional 34-layer single-stream DiT.
- `unconditional_transformer`: image-only negative branch used for asymmetric CFG.
- `vae`: Flux2-style KL autoencoder decoder.
- `tokenizer` and `scheduler`: copied from upstream.

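The component list says the `unconditional_transformer` serves as the negative branch for CFG, but it does not spell out how the two predictions are combined. As a rough illustration only, the sketch below applies the textbook classifier-free guidance rule, `uncond + scale * (cond - uncond)`, to toy vectors. The function `cfg_combine`, the toy predictions, and the two-entry schedule are all hypothetical; the real pipeline may combine the branches differently, and the real per-step scales come from the preset's `guidance_schedule`.

```python
from typing import Sequence


def cfg_combine(
    cond: Sequence[float], uncond: Sequence[float], scale: float
) -> list[float]:
    """Textbook classifier-free guidance: uncond + scale * (cond - uncond).

    This is an assumed formula for illustration, not the upstream implementation.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Hypothetical toy predictions standing in for the two model outputs.
cond_pred = [1.0, 2.0, 3.0]    # stand-in for the `transformer` output
uncond_pred = [0.5, 1.0, 1.5]  # stand-in for the `unconditional_transformer` output

# Hypothetical per-step guidance scales, one entry per denoising step.
toy_schedule = [1.0, 3.0]

for step, scale in enumerate(toy_schedule):
    print(step, cfg_combine(cond_pred, uncond_pred, scale))
```

With a scale of 1.0 the result equals the conditional prediction, and larger scales push the output further away from the unconditional one. Because the two branches here are separate networks, both must be resident during generation, which is consistent with both transformers appearing in the quantization table.
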
## Quantization

| Component | Source materialized MB | SDNQ state MB | Quantize s | Quant peak nvidia MB |
| --- | --- | --- | --- | --- |
| transformer | 17698.84 | 4979.66 | 112.64 | 36525.00 |
| unconditional_transformer | 17698.84 | 4979.66 | 108.68 | 36525.00 |
| text_encoder | 14435.59 | 4097.53 | 102.32 | 24477.00 |
| vae | 160.31 | 50.19 | 2.68 | 861.00 |

## Benchmark

Hardware: RunPod NVIDIA RTX PRO 6000 Blackwell Server Edition, single process, concurrency 1.

Generation used 10 structured JSON prompts at 1024x1024 with `V4_DEFAULT_20`. The FP8 baseline was loaded through the upstream `ideogram4` `Ideogram4Pipeline.from_pretrained` recipe with `weights_repo="ideogram-ai/ideogram-4-fp8"`; magic-prompt expansion was disabled because the prompts are already structured captions.

| Variant | Load s | Load peak reserved MB | Load peak nvidia MB | Cold request s | Hot mean s | Gen peak reserved MB | Gen peak nvidia MB |
| --- | --- | --- | --- | --- | --- | --- | --- |
| original | 267.83 | 28198.00 | 28759.00 | 17.90 | 17.51 | 34430.00 | 35099.00 |
| sdnq | 239.46 | 14558.00 | 15109.00 | 18.56 | 16.52 | 21650.00 | 22321.00 |

## Example Matrix

The matrix below keeps the original FP8 and SDNQ UInt4 outputs side by side in narrow vertical columns. It is a WebP at quality 95.

![Original FP8 vs SDNQ UInt4 vertical comparison](assets/original_vs_sdnq_vertical.webp)

## Prompt Set

| # | id | summary |
| --- | --- | --- |
| 1 | `editorial_watch_photo` | A photorealistic editorial product photograph of a transparent mechanical wristwatch resting on a wet black stone slab, with tiny engraved labels visible on the dial. |
| 2 | `risograph_botanical_poster` | A layered risograph botanical exhibition poster with bold overprint textures and clean typographic hierarchy. |
| 3 | `cyrillic_cafe_menu` | A cozy Moscow cafe menu board photographed straight-on, testing clean Cyrillic typography in chalk and printed labels. |
| 4 | `brutalist_architecture` | A cinematic architectural photograph of a brutalist library atrium with tiny wayfinding signs and people for scale. |
| 5 | `ink_manga_rain` | A dramatic black-and-white manga splash page of a courier cycling through rain, with sound effects and shop signage. |
| 6 | `museum_clay_render` | A polished 3D clay render of a museum diorama showing a future Arctic research station with labeled miniature modules. |
| 7 | `food_packaging_label` | A realistic premium chocolate bar packaging mockup with layered foil, embossed typography, and ingredient microcopy. |
| 8 | `fantasy_map_typography` | A hand-painted fantasy map on parchment with readable place names, compass ornament, and coastal illustrations. |
| 9 | `streetwear_lookbook` | A fashion lookbook cover photograph for a streetwear collection, with crisp cover typography and realistic fabric textures. |
| 10 | `scientific_cutaway` | A detailed scientific cutaway illustration of a compact fusion battery prototype with annotated parts and clean technical typography. |

## Files

- `prompts.json`: the 10 structured prompts used for the comparison.
- `assets/original_vs_sdnq_vertical.webp`: vertical side-by-side WebP comparison matrix for original FP8 vs SDNQ UInt4, quality 95.
- `assets/sdnq_vs_nf4_4090_vertical.webp`: vertical side-by-side WebP comparison matrix for the RTX 4090 SDNQ vs official NF4 follow-up, quality 95.
- `benchmark/`: raw benchmark JSONL/CSV files and `summary.json`.
- `quantization_manifest.json`: component-level quantization timings, storage, and VRAM peaks.
- `ideogram4_sdnq_pipeline.py`: loader helper for the SDNQ custom transformer components.

## RTX 4090 Follow-up: SDNQ UInt4 vs Official NF4

Hardware: RunPod NVIDIA GeForce RTX 4090, 24 GB VRAM, single process, concurrency 1.

Both variants used the same 10 structured captions from `prompts.json`, 1024x1024, `V4_DEFAULT_20`, and no magic-prompt expansion.
`nf4` uses the official `ideogram-ai/ideogram-4-nf4` checkpoint through the upstream `ideogram4` loader.

| Variant | Cases | Load s | Load peak reserved MB | Load peak nvidia MB | Cold request s | Hot mean s | Hot max s | Gen peak reserved MB | Gen peak nvidia MB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sdnq | 10.00 | 211.61 | 14124.00 | 14466.00 | 59.65 | 37.05 | 37.57 | 19768.00 | 20521.00 |
| nf4 | 10.00 | 269.31 | 15370.00 | 15766.00 | 36.57 | 36.31 | 36.77 | 21012.00 | 21801.00 |

![SDNQ vs official NF4 on RTX 4090](assets/sdnq_vs_nf4_4090_vertical.webp)

Raw follow-up metrics are in `benchmark/summary_4090_sdnq_vs_nf4.json`, `benchmark/sdnq_4090_metrics.*`, and `benchmark/nf4_4090_metrics.*`. The exact runner used for the follow-up is `benchmark/followup_runner.py`.

## License

This checkpoint is derived from `ideogram-ai/ideogram-4-fp8` and follows the upstream Ideogram 4 non-commercial license. See `LICENSE.md`.