--- base_model: - ideogram-ai/ideogram-4-fp8 base_model_relation: quantized pipeline_tag: text-to-image language: - en - zh tags: - text-to-image - diffusion - quantized - quantfunc - ideogram - precision-config license: apache-2.0 --- # QuantFunc

🤗 Hugging Face | 🤖 ModelScope | 💻 GitHub | 💬 WeChat (微信) | 🎮 Discord

# Ideogram-4-Series > ⚠️ **Config-only repository — no model weights.** > This repo contains **only** a QuantFunc per-layer **precision config** (`precision-config/ideogram4_a4w4.json`). > It does **not** contain, mirror, or redistribute any Ideogram model weights. **You bring your own** officially-obtained Ideogram 4 model; this config only tells the QuantFunc engine how to quantize it **at load time, on your own machine**. **Powered by the [QuantFunc ComfyUI plugin](https://github.com/RealJonathanYip/ComfyUI-QuantFunc) — the fastest diffusion inference engine:** - 🚀 **2x–11x speedup** over standard BF16/FP16 Python pipelines. - ⚙️ **Native C++/CUDA** (`libquantfunc.so` / `quantfunc.dll`), **zero Python model dependencies**. - 🧩 **Universal format adapter** — loads **diffusers / BFL (Flux) / HF / nunchaku SVDQ** layouts directly, no manual conversion. - 🟢 **Full GPU coverage** — RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native **FP4** on Blackwell. 👉 **Install the plugin:** **https://github.com/RealJonathanYip/ComfyUI-QuantFunc** ## What this repository provides Just the precision config — **no weights**: ``` Ideogram-4-Series/ ├── config.json # canonical per-layer precision map (W4A4) └── precision-config/ └── ideogram4_a4w4.json # identical copy, named for manual / plugin use ``` > `config.json` and `precision-config/ideogram4_a4w4.json` are **identical**. Both are the W4A4 precision map — pick whichever your workflow expects. We deliberately **do not host Ideogram 4 weights**. The QuantFunc **Lighting** backend does **runtime** quantization: you load the *official* weights and they are quantized **in-memory at load**, so no pre-quantized checkpoint is ever distributed. ## How to use 1. **Obtain the official Ideogram 4 model yourself** in any QuantFunc-supported layout (**diffusers**, BFL/Flux-style, or HF). Follow Ideogram's official distribution channels and license terms. 2. **Install the QuantFunc ComfyUI plugin:** https://github.com/RealJonathanYip/ComfyUI-QuantFunc 3. **Load the official model** through the **Build Pipeline** node (universal format adapter). 4. **Precision config** — leave the node on **`auto detect`** (it recognizes Ideogram 4 and applies `ideogram4_a4w4.json` automatically), or point it at this file manually. The Lighting engine then runtime-quantizes the transformer to **W4A4** (4-bit heavy GEMMs + 8-bit sensitive projections). ## Precision config — `ideogram4_a4w4.json` Per-layer precision map (mirrors the Klein-style configs). **Measured** on a dual-transformer **24 GB** card (`cuda_overhead` 399 MB) to fit and render a coherent, prompt-matching image — with sharper detail than FP16-non-block. | Layer group | Precision | Why | |---|---|---| | `layers.attention.qkv` · `layers.attention.o` | **4-bit** (AUTO_4 → INT4 on SM89, FP4 on SM120) | self-attention projections; large K/N, quant-robust | | `layers.feed_forward.w1/w2/w3` | **4-bit** | SwiGLU MLP — largest matrices, primary memory target | | `input_proj` · `llm_cond_proj` · `t_embedding.mlp_in/out` · `adaln_proj` · `final_layer.linear` | **8-bit** (AUTO_8 → FP8 on SM89+, INT8 older; W8A8) | sensitive non-block projection GEMMs | | `layers.adaln_modulation` · `final_layer.adaln_modulation` | **FP16** | M=1 modulation GEMVs — per-token activation quant collapses conditioning; engine skips them | **Net:** 170 block GEMMs @ 4-bit · 5 non-block projection GEMMs @ AUTO_8 (FP8 on SM89) · 2 adaLN-modulation GEMVs @ FP16. Verified coherent on SM89 (INT8 dashboard-run + FP8 CLI-run, each `cuda_overhead` 399 MB). AUTO_8 picks **FP8** on SM89 for better dynamic range on these sensitive projections. ## Hardware - NVIDIA **RTX 20-series and above** (CUDA 12 & 13). Native **FP4** on Blackwell (SM120); INT4 on SM89. - Fits a **24 GB** card with the a4w4 map (measured `cuda_overhead` 399 MB). ## Legal / Attribution - This repository distributes **only** the QuantFunc precision-config JSON — our own work, Apache-2.0. - It contains **no Ideogram weights** and is **not affiliated with, nor endorsed by, Ideogram**. - "Ideogram" is a trademark of its respective owner. You are solely responsible for obtaining the official model and complying with its license and terms of use. ## Community - 🎮 [Discord server](https://discord.gg/jCp9TpFWcn) - 💬 Scan the QR code below to join our WeChat group: