	---
	base_model:
	- ideogram-ai/ideogram-4-fp8
	base_model_relation: quantized
	pipeline_tag: text-to-image
	language:
	- en
	- zh
	tags:
	- text-to-image
	- diffusion
	- quantized
	- quantfunc
	- ideogram
	- precision-config
	license: apache-2.0
	---

	# QuantFunc

	<div align="center" style="margin-top: 50px;">
	<img src="https://raw.githubusercontent.com/RealJonathanYip/ComfyUI-QuantFunc/main/assets/logo.webp" width="300" alt="Logo">
	</div>

	<p align="center">
	🤗 <a href="https://huggingface.co/QuantFunc">Hugging Face</a>  |
	🤖 <a href="https://www.modelscope.cn/profile/QuantFunc">ModelScope</a>  |
	💻 <a href="https://github.com/RealJonathanYip/ComfyUI-QuantFunc">GitHub</a>  |
	💬 <a href="#wechat">WeChat (微信)</a>  |
	🎮 <a href="https://discord.gg/jCp9TpFWcn">Discord</a>
	</p>

	# Ideogram-4-Series

	> ⚠️ Config-only repository — no model weights.
	> This repo contains only a QuantFunc per-layer precision config (`precision-config/ideogram4_a4w4.json`).
	> It does not contain, mirror, or redistribute any Ideogram model weights. You bring your own officially-obtained Ideogram 4 model; this config only tells the QuantFunc engine how to quantize it at load time, on your own machine.

	Powered by the [QuantFunc ComfyUI plugin](https://github.com/RealJonathanYip/ComfyUI-QuantFunc) — the fastest diffusion inference engine:

	- 🚀 2x–11x speedup over standard BF16/FP16 Python pipelines.
	- ⚙️ Native C++/CUDA (`libquantfunc.so` / `quantfunc.dll`), zero Python model dependencies.
	- 🧩 Universal format adapter — loads diffusers / BFL (Flux) / HF / nunchaku SVDQ layouts directly, no manual conversion.
	- 🟢 Full GPU coverage — RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native FP4 on Blackwell.

	👉 Install the plugin: https://github.com/RealJonathanYip/ComfyUI-QuantFunc

	## What this repository provides

	Just the precision config — no weights:

	```
	Ideogram-4-Series/
	├── config.json                 # canonical per-layer precision map (W4A4)
	└── precision-config/
	    └── ideogram4_a4w4.json     # identical copy, named for manual / plugin use
	```

	> `config.json` and `precision-config/ideogram4_a4w4.json` are identical. Both are the W4A4 precision map — pick whichever your workflow expects.
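
If you want to confirm that the two files really are byte-identical in your local copy, a hash comparison is enough. The following is a minimal sketch, not part of the plugin; the helper names `sha256_of` and `configs_match` are made up for illustration, and `repo_root` is assumed to be the directory where you cloned or downloaded this repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def configs_match(repo_root: Path) -> bool:
    """True if config.json and precision-config/ideogram4_a4w4.json are byte-identical."""
    canonical = repo_root / "config.json"
    named_copy = repo_root / "precision-config" / "ideogram4_a4w4.json"
    return sha256_of(canonical) == sha256_of(named_copy)
```

Call it as `configs_match(Path("Ideogram-4-Series"))`, adjusting the path to your local checkout.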

	We deliberately do not host Ideogram 4 weights. The QuantFunc Lighting backend does runtime quantization: you load the official weights and they are quantized in-memory at load, so no pre-quantized checkpoint is ever distributed.

	## How to use

	1. Obtain the official Ideogram 4 model yourself in any QuantFunc-supported layout (diffusers, BFL/Flux-style, or HF). Follow Ideogram's official distribution channels and license terms.
	2. Install the QuantFunc ComfyUI plugin: https://github.com/RealJonathanYip/ComfyUI-QuantFunc
	3. Load the official model through the Build Pipeline node (universal format adapter).
	4. Set the precision config: leave the node on `auto detect` (it recognizes Ideogram 4 and applies `ideogram4_a4w4.json` automatically), or point it at this file manually. The Lighting engine then runtime-quantizes the transformer to W4A4 (4-bit heavy GEMMs + 8-bit sensitive projections).

	## Precision config — `ideogram4_a4w4.json`

	Per-layer precision map (mirrors the Klein-style configs). Measured with the dual-transformer setup on a 24 GB card (`cuda_overhead` 399 MB): the model fits and renders a coherent, prompt-matching image, with sharper detail than FP16-non-block.

	| Layer group | Precision | Why |
	|---|---|---|
	| `layers.attention.qkv` · `layers.attention.o` | 4-bit (AUTO_4 → INT4 on SM89, FP4 on SM120) | self-attention projections; large K/N, quant-robust |
	| `layers.feed_forward.w1/w2/w3` | 4-bit | SwiGLU MLP: largest matrices, primary memory target |
	| `input_proj` · `llm_cond_proj` · `t_embedding.mlp_in/out` · `adaln_proj` · `final_layer.linear` | 8-bit (AUTO_8 → FP8 on SM89+, INT8 older; W8A8) | sensitive non-block projection GEMMs |
	| `layers.adaln_modulation` · `final_layer.adaln_modulation` | FP16 | M=1 modulation GEMVs: per-token activation quant collapses conditioning; engine skips them |

	Net: 170 block GEMMs @ 4-bit · 5 non-block projection GEMMs @ AUTO_8 (FP8 on SM89) · 2 adaLN-modulation GEMVs @ FP16.
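
To make the grouping concrete, the table can be read as a lookup from layer group to precision label. The sketch below is illustrative only and does not reproduce the actual JSON schema of `ideogram4_a4w4.json`, which is not shown here. It assumes module names carry a numeric block index (for example `layers.12.attention.qkv`), and it assumes the feed-forward row uses the same `AUTO_4` label as the attention row; the table itself only says "4-bit". The names `PRECISION_BY_GROUP`, `group_of`, and `precision_for` are hypothetical.

```python
# Group -> precision label, transcribed from the table above.
# ASSUMPTION: "AUTO_4" for feed_forward.* (the table only says "4-bit").
PRECISION_BY_GROUP = {
    "layers.attention.qkv": "AUTO_4",
    "layers.attention.o": "AUTO_4",
    "layers.feed_forward.w1": "AUTO_4",
    "layers.feed_forward.w2": "AUTO_4",
    "layers.feed_forward.w3": "AUTO_4",
    "input_proj": "AUTO_8",
    "llm_cond_proj": "AUTO_8",
    "t_embedding.mlp_in": "AUTO_8",
    "t_embedding.mlp_out": "AUTO_8",
    "adaln_proj": "AUTO_8",
    "final_layer.linear": "AUTO_8",
    "layers.adaln_modulation": "FP16",
    "final_layer.adaln_modulation": "FP16",
}


def group_of(module_name: str) -> str:
    """Drop numeric block indices: 'layers.12.attention.qkv' -> 'layers.attention.qkv'."""
    # ASSUMPTION: block indices appear as purely numeric dotted components.
    return ".".join(part for part in module_name.split(".") if not part.isdigit())


def precision_for(module_name: str) -> str:
    """Look up the precision label for a module; raises KeyError if the group is unlisted."""
    return PRECISION_BY_GROUP[group_of(module_name)]
```

Raising `KeyError` for unlisted modules is a deliberate choice in this sketch: a silent default would hide a typo in a layer name.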

	Verified coherent on SM89 (INT8 dashboard-run + FP8 CLI-run, each `cuda_overhead` 399 MB). AUTO_8 picks FP8 on SM89 for better dynamic range on these sensitive projections.
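
The `AUTO_4` / `AUTO_8` selection described above can be sketched as a small function of the GPU's SM number. This is not the engine's code, just a restatement of the rules given in this card. One point is an assumption: the card states INT4 on SM89 and FP4 on SM120, and "native FP4 on Blackwell", so the sketch treats every SM number of 100 or higher as Blackwell-class. The function name `resolve_precision` is hypothetical.

```python
def resolve_precision(label: str, sm: int) -> str:
    """Map an AUTO_* label to a concrete format for compute capability `sm` (e.g. 89)."""
    if label == "AUTO_4":
        # Stated in the card: INT4 on SM89, FP4 on SM120.
        # ASSUMPTION: the cutoff sits at SM100 (all Blackwell parts get FP4).
        return "FP4" if sm >= 100 else "INT4"
    if label == "AUTO_8":
        # Stated in the card: FP8 on SM89+, INT8 on older GPUs.
        return "FP8" if sm >= 89 else "INT8"
    # Fixed formats such as FP16 pass through unchanged.
    return label
```

For an SM89 card, `resolve_precision("AUTO_4", 89)` gives `"INT4"` and `resolve_precision("AUTO_8", 89)` gives `"FP8"`, matching the verified configuration noted above.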

	## Hardware

	- NVIDIA RTX 20-series and above (CUDA 12 & 13). Native FP4 on Blackwell (SM120); INT4 on SM89.
	- Fits a 24 GB card with the a4w4 map (measured `cuda_overhead` 399 MB).
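
If you are unsure which SM number your GPU reports, one way to check is to query `nvidia-smi`. The sketch below assumes a driver recent enough to support the `compute_cap` query field and that `nvidia-smi` is on your `PATH`; the helper names are hypothetical and unrelated to the plugin.

```python
import subprocess


def parse_compute_cap(text: str) -> int:
    """Convert an 'X.Y' compute-capability string to an SM number, e.g. '8.9' -> 89."""
    major, minor = text.strip().split(".")
    return int(major) * 10 + int(minor)


def first_gpu_sm() -> int:
    """Ask nvidia-smi for the compute capability of the first listed GPU."""
    # ASSUMPTION: the installed driver supports the `compute_cap` query field.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_compute_cap(out.splitlines()[0])
```

An RTX 20-series card reports compute capability 7.5, which `parse_compute_cap` turns into 75.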

	## Legal / Attribution

	- This repository distributes only the QuantFunc precision-config JSON — our own work, Apache-2.0.
	- It contains no Ideogram weights and is not affiliated with, nor endorsed by, Ideogram.
	- "Ideogram" is a trademark of its respective owner. You are solely responsible for obtaining the official model and complying with its license and terms of use.

	## Community

	- 🎮 [Discord server](https://discord.gg/jCp9TpFWcn)
	- 💬 Scan the QR code below to join our WeChat group:

	<div align="center" id="wechat">
	<img src="https://raw.githubusercontent.com/RealJonathanYip/ComfyUI-QuantFunc/main/assets/WeChat.jpg" alt="WeChat Group" width="300">
	</div>