---
base_model:
- ideogram-ai/ideogram-4-fp8
base_model_relation: quantized
pipeline_tag: text-to-image
language:
- en
- zh
tags:
- text-to-image
- diffusion
- quantized
- quantfunc
- ideogram
- precision-config
license: apache-2.0
---
# QuantFunc
<div align="center" style="margin-top: 50px;">
<img src="https://raw.githubusercontent.com/RealJonathanYip/ComfyUI-QuantFunc/main/assets/logo.webp" width="300" alt="Logo">
</div>
<p align="center">
🤗 <a href="https://huggingface.co/QuantFunc">Hugging Face</a> &nbsp;|&nbsp;
🤖 <a href="https://www.modelscope.cn/profile/QuantFunc">ModelScope</a> &nbsp;|&nbsp;
💻 <a href="https://github.com/RealJonathanYip/ComfyUI-QuantFunc">GitHub</a> &nbsp;|&nbsp;
💬 <a href="#wechat">WeChat (微信)</a> &nbsp;|&nbsp;
🎮 <a href="https://discord.gg/jCp9TpFWcn">Discord</a>
</p>
# Ideogram-4-Series
> ⚠️ **Config-only repository: no model weights.**
> This repo contains **only** a QuantFunc per-layer **precision config** (`config.json`, with an identical copy at `precision-config/ideogram4_a4w4.json`).
> It does **not** contain, mirror, or redistribute any Ideogram model weights. **You bring your own** officially obtained Ideogram 4 model; this config only tells the QuantFunc engine how to quantize it **at load time, on your own machine**.
**Powered by the [QuantFunc ComfyUI plugin](https://github.com/RealJonathanYip/ComfyUI-QuantFunc), the fastest diffusion inference engine:**
- 🚀 **2x–11x speedup** over standard BF16/FP16 Python pipelines.
- ⚙️ **Native C++/CUDA** (`libquantfunc.so` / `quantfunc.dll`), **zero Python model dependencies**.
- 🧩 **Universal format adapter**: loads **diffusers / BFL (Flux) / HF / nunchaku SVDQ** layouts directly, with no manual conversion.
- 🟢 **Full GPU coverage**: RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native **FP4** on Blackwell.
👉 **Install the plugin:** **https://github.com/RealJonathanYip/ComfyUI-QuantFunc**
## What this repository provides
Just the precision config โ€” **no weights**:
```
Ideogram-4-Series/
├── config.json                 # canonical per-layer precision map (W4A4)
└── precision-config/
    └── ideogram4_a4w4.json     # identical copy, named for manual / plugin use
```
> `config.json` and `precision-config/ideogram4_a4w4.json` are **identical**. Both are the W4A4 precision map โ€” pick whichever your workflow expects.
We deliberately **do not host Ideogram 4 weights**. The QuantFunc **Lighting** backend does **runtime** quantization: you load the *official* weights and they are quantized **in-memory at load**, so no pre-quantized checkpoint is ever distributed.
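To check locally that `config.json` and `precision-config/ideogram4_a4w4.json` really are byte-identical, here is a minimal standard-library Python sketch. The helper names `sha256_of` and `same_file_contents` are ours for illustration and are not part of the QuantFunc plugin:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(a: Path, b: Path) -> bool:
    """True if both files hash to the same SHA-256 digest."""
    return sha256_of(a) == sha256_of(b)
```

Run from the repository root, `same_file_contents(Path("config.json"), Path("precision-config/ideogram4_a4w4.json"))` should return `True` if the two files match as stated.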
## How to use
1. **Obtain the official Ideogram 4 model yourself** in any QuantFunc-supported layout (**diffusers**, BFL/Flux-style, or HF). Follow Ideogram's official distribution channels and license terms.
2. **Install the QuantFunc ComfyUI plugin:** https://github.com/RealJonathanYip/ComfyUI-QuantFunc
3. **Load the official model** through the **Build Pipeline** node (universal format adapter).
4. **Precision config:** leave the node on **`auto detect`** (it recognizes Ideogram 4 and applies `ideogram4_a4w4.json` automatically), or point it at this file manually. The Lighting engine then runtime-quantizes the transformer to **W4A4**: 4-bit heavy GEMMs and 8-bit sensitive projections, with the adaLN modulation GEMVs kept in FP16.
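The plugin's real `auto detect` logic is not documented in this README. Purely as an illustration of the idea, a detector could check a checkpoint's key names for modules that the precision table attributes to Ideogram 4. The marker list, the substring matching, and the names `looks_like_ideogram4` / `pick_precision_config` are hypothetical assumptions, not the plugin's API:

```python
from typing import Iterable, Optional

# Hypothetical marker substrings, taken from layer names in this README's
# precision table. Real checkpoint keys may carry extra prefixes or indices.
IDEOGRAM4_MARKERS = ("llm_cond_proj", "adaln_proj", "feed_forward.w1")


def looks_like_ideogram4(state_dict_keys: Iterable[str]) -> bool:
    """Illustrative heuristic: every marker must appear in at least one key."""
    keys = list(state_dict_keys)
    return all(any(marker in key for key in keys) for marker in IDEOGRAM4_MARKERS)


def pick_precision_config(state_dict_keys: Iterable[str]) -> Optional[str]:
    """Return the config path to apply, or None to fall back to a manual choice."""
    if looks_like_ideogram4(state_dict_keys):
        return "precision-config/ideogram4_a4w4.json"
    return None
```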
## Precision config โ€” `ideogram4_a4w4.json`
Per-layer precision map (mirrors the Klein-style configs). **Measured** on a dual-transformer **24 GB** card (`cuda_overhead` 399 MB): it fits in memory and renders a coherent, prompt-matching image, with sharper detail than the FP16-non-block variant.
| Layer group | Precision | Why |
|---|---|---|
| `layers.attention.qkv` · `layers.attention.o` | **4-bit** (AUTO_4 → INT4 on SM89, FP4 on SM120) | self-attention projections; large K/N, quant-robust |
| `layers.feed_forward.w1/w2/w3` | **4-bit** | SwiGLU MLP — largest matrices, primary memory target |
| `input_proj` · `llm_cond_proj` · `t_embedding.mlp_in/out` · `adaln_proj` · `final_layer.linear` | **8-bit** (AUTO_8 → FP8 on SM89+, INT8 on older GPUs; W8A8) | sensitive non-block projection GEMMs |
| `layers.adaln_modulation` · `final_layer.adaln_modulation` | **FP16** | M=1 modulation GEMVs; per-token activation quantization collapses the conditioning, so the engine skips them |
**Net:** 170 block GEMMs @ 4-bit · 5 non-block projection GEMMs @ AUTO_8 (FP8 on SM89) · 2 adaLN-modulation GEMVs @ FP16.
Verified to render coherently on SM89 in two runs, an INT8 run from the dashboard and an FP8 run from the CLI, each at `cuda_overhead` 399 MB. On SM89, AUTO_8 picks **FP8** for its better dynamic range on these sensitive projections.
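The table above can be read as an ordered list of name rules. This sketch assumes module names follow the table's dotted paths with an optional numeric block index (for example `layers.7.attention.qkv`); `RULES`, `precision_for`, and `summarize` are hypothetical helpers, not the engine's code, and the real engine may match layers differently:

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Ordered (pattern, precision tag) rules mirroring the README table.
# Assumption: block modules look like "layers.<idx>.<sub>"; the index is stripped first.
RULES = [
    (re.compile(r"^layers\.attention\.(qkv|o)$"), "AUTO_4"),
    (re.compile(r"^layers\.feed_forward\.(w1|w2|w3)$"), "AUTO_4"),
    (re.compile(r"^(input_proj|llm_cond_proj|t_embedding\.mlp_(in|out)"
                r"|adaln_proj|final_layer\.linear)$"), "AUTO_8"),
    (re.compile(r"^(layers|final_layer)\.adaln_modulation$"), "FP16"),
]


def normalize(name: str) -> str:
    """Drop numeric path parts: 'layers.7.attention.qkv' -> 'layers.attention.qkv'."""
    return ".".join(part for part in name.split(".") if not part.isdigit())


def precision_for(name: str) -> Optional[str]:
    """Return the precision tag for a module name, or None if no rule matches."""
    key = normalize(name)
    for pattern, precision in RULES:
        if pattern.match(key):
            return precision
    return None


def summarize(names: Iterable[str]) -> Counter:
    """Count matched modules per precision tag; unmatched names are ignored."""
    return Counter(p for p in map(precision_for, names) if p is not None)
```

Running `summarize` over every module name of a loaded transformer would give per-tag counts comparable to the **Net** line.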
## Hardware
- NVIDIA **RTX 20-series and above** (CUDA 12 & 13). Native **FP4** on Blackwell (SM120); INT4 on SM89.
- Fits a **24 GB** card with the a4w4 map (measured `cuda_overhead` 399 MB).
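The `AUTO_4` / `AUTO_8` tags are resolved per GPU architecture. This sketch encodes the mappings stated in this README (`AUTO_8`: FP8 on SM89+, INT8 on older GPUs; `AUTO_4`: INT4 on SM89, FP4 on Blackwell). Three parts are our assumptions: INT4 on every pre-Blackwell GPU, FP4 on all Blackwell parts (SM100 and up, not only SM120), and SM75 (RTX 20-series) as the floor. `resolve_auto` is a hypothetical name:

```python
def resolve_auto(tag: str, sm: int) -> str:
    """Map an AUTO_* precision tag to a concrete format for compute capability `sm`.

    `sm` is major * 10 + minor, e.g. 89 for Ada (RTX 40), 120 for RTX 50.
    """
    if sm < 75:
        # Assumption: RTX 20-series (SM75) is the documented minimum.
        raise ValueError(f"SM{sm} is below the supported minimum (SM75)")
    if tag == "AUTO_8":
        # Per the table: FP8 on SM89 and newer, INT8 on older GPUs.
        return "FP8" if sm >= 89 else "INT8"
    if tag == "AUTO_4":
        # Assumption: FP4 on all Blackwell parts (SM100/SM120), INT4 otherwise.
        return "FP4" if sm >= 100 else "INT4"
    # Non-AUTO tags such as "FP16" pass through unchanged.
    return tag
```

On a live machine, `sm` could be computed with PyTorch as `major, minor = torch.cuda.get_device_capability()` followed by `sm = major * 10 + minor`.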
## Legal / Attribution
- This repository distributes **only** the QuantFunc precision-config JSON, which is our own work, licensed Apache-2.0.
- It contains **no Ideogram weights** and is **not affiliated with, nor endorsed by, Ideogram**.
- "Ideogram" is a trademark of its respective owner. You are solely responsible for obtaining the official model and complying with its license and terms of use.
## Community
- 🎮 [Discord server](https://discord.gg/jCp9TpFWcn)
- 💬 Scan the QR code below to join our WeChat group:
<div align="center" id="wechat">
<img src="https://raw.githubusercontent.com/RealJonathanYip/ComfyUI-QuantFunc/main/assets/WeChat.jpg" alt="WeChat Group" width="300">
</div>