nf4 build: visible gradient banding + vertical streaks in smooth skies
Running ideogram-ai/ideogram-4-nf4 locally via Ideogram4Pipeline (diffusers, bf16, weights kept nf4-quantized, eager - no AOTI), I'm seeing color banding and faint vertical streaks in large smooth gradient regions (e.g. golden-hour skies). They're present in the raw saved PNG at native 1024x1024, not just on-screen scaling.
- Model: ideogram-ai/ideogram-4-nf4
- diffusers: PR #13860 head (04b197e...), transformers 5.8.0, bitsandbytes (nf4)
- GPU: NVIDIA (CUDA), Windows
- Steps: Default 20 (also tried higher); guidance schedule 7.0 -> 3.0
Questions:
- Is this banding expected from the nf4 quantization, and does the fp8 build improve smooth-gradient fidelity?
- Any recommended settings (steps, guidance, dtype) to reduce banding?
- Is output dithering / higher-precision VAE decode advisable?
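To make the dithering question concrete, here is a minimal NumPy sketch of what I mean by output dithering: adding about half an 8-bit step of noise before rounding to uint8. The function name quantize_with_dither is hypothetical, and it assumes the decoded image is available as a float array in [0, 1] before the pipeline converts it to 8-bit, which I have not confirmed for Ideogram4Pipeline. It can only help if the banding is introduced at the final 8-bit rounding step, not if it is already present in the decoded float image.

```python
import numpy as np

def quantize_with_dither(image, rng=None):
    """Convert a float image in [0, 1] to uint8, adding uniform noise of
    +/- 0.5 of one 8-bit step before rounding (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(image, dtype=np.float64) * 255.0
    noise = rng.uniform(-0.5, 0.5, size=scaled.shape)
    return np.clip(np.rint(scaled + noise), 0, 255).astype(np.uint8)

# A flat region whose true value falls between two 8-bit levels.
flat = np.full((256, 256), 100.4 / 255.0)

# Plain rounding collapses the whole region onto a single level.
plain = np.rint(flat * 255.0).astype(np.uint8)

# Dithering mixes the two neighbouring levels so the local average is preserved.
dithered = quantize_with_dither(flat, np.random.default_rng(0))

print("plain mean:", plain.mean())
print("dithered mean:", dithered.mean())
```

In a real gradient the same effect turns hard band edges into fine noise, at the cost of a slightly grainier image.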
Repro prompt (JSON caption via local Qwen upsampler): "a movie poster for 'THE LAST SUMMER' with dramatic golden-hour backlighting".
Update: root cause likely matches the diffusers MRoPE inv_freq float32->bfloat16 downcast (see ideogram4 issue #24).
I verified on the installed diffusers Ideogram4Pipeline (nf4, torch_dtype=torch.bfloat16, CUDA/Windows):
- In diffusers/models/transformers/transformer_ideogram4.py, inv_freq is built in float32 (persistent=False), but Ideogram4MRoPE(...).to(torch.bfloat16) downcasts it to bfloat16. from_pretrained(torch_dtype=torch.bfloat16) triggers exactly this.
- forward() re-casts to float32, but only after precision was already lost, so it does not help.
- IMAGE_POSITION_OFFSET = 65536 (vs 10000 in the ideogram4 inference repo), so the error is larger here.
Measured impact at position 65536 with bf16 inv_freq:
- max phase error ~58.8 rad (>9 full rotations)
- 12/64 frequencies have >pi rad error; 28/64 have >0.1 rad error
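For anyone who wants to check the order of magnitude without loading the model, here is a rough NumPy sketch of this kind of measurement. It emulates the float32 -> bfloat16 rounding in software and assumes a standard RoPE frequency table (theta = 10000, 64 frequencies); those parameters and the helper names are my assumptions, not values read from Ideogram4MRoPE, so the numbers it prints will not necessarily match the ones above.

```python
import numpy as np

def to_bfloat16(x):
    """Round float32 values to the nearest bfloat16 (ties to even) and
    return them as float32. Software emulation; no NaN/inf handling."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    bias = ((bits >> 16) & 1) + np.uint32(0x7FFF)
    return ((bits + bias) & np.uint32(0xFFFF0000)).view(np.float32)

def phase_error(position, num_freqs=64, theta=10000.0):
    """Absolute phase error (radians) at `position` caused by storing
    inv_freq in bfloat16 instead of float32. Assumed RoPE layout:
    inv_freq[i] = theta ** (-i / num_freqs)."""
    exponents = np.arange(num_freqs, dtype=np.float64) / num_freqs
    inv_freq = (theta ** -exponents).astype(np.float32)
    inv_freq_bf16 = to_bfloat16(inv_freq)
    delta = inv_freq_bf16.astype(np.float64) - inv_freq.astype(np.float64)
    return position * np.abs(delta)

err = phase_error(65536)
print("max phase error (rad):", err.max())
print("frequencies with error > pi:", int((err > np.pi).sum()))
print("frequencies with error > 0.1:", int((err > 0.1).sum()))
```

The error scales linearly with position, which is why an offset of 65536 is hit harder than 10000: phase_error(10000) gives the same per-frequency pattern at roughly 15% of the magnitude.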
So part of what I reported here is likely this positional-encoding corruption, fixable by restoring inv_freq in float32 after load. (The smooth-gradient banding/streaks may be a separate 4-bit quantization effect.)
