Model Card (SVDQuant)


Model name

Model repo: tonera/Beyond_Reality_Zimage_v2_svdq
Base (Diffusers weights path): tonera/Beyond_Reality_Zimage_v2_svdq (repo root)
Quantized Transformer weights: tonera/Beyond_Reality_Zimage_v2_svdq/svdq-<precision>_r32-Beyond_Reality_Zimage_v2_svdq.safetensors
Original model: Hugging Face / ModelScope

Quantization / inference tech

Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)

Nunchaku is a high-performance inference engine for 4-bit (FP4/INT4) neural networks. It aims to cut VRAM usage and speed up inference while preserving as much generation quality as possible. It implements post-training quantization methods such as SVDQuant in production-ready form, and reduces the overhead of the low-rank branches through operator/kernel fusion and other optimizations.

The Z-Image quantized weights in this repo (e.g. svdq-*_r32-*.safetensors) are designed to be used with Nunchaku for efficient inference on supported GPUs.
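
As a minimal sketch of the naming pattern above, the weight file name can be assembled from a precision string. The helper name svdq_weight_filename is hypothetical and not part of Nunchaku; the accepted values "int4" and "fp4" are an assumption based on the two precisions this card reports.

```python
MODEL = "Beyond_Reality_Zimage_v2_svdq"


def svdq_weight_filename(precision: str, model: str = MODEL) -> str:
    """Build the quantized-weight file name for a given precision.

    Hypothetical helper: it only mirrors the pattern
    svdq-<precision>_r32-<model>.safetensors used in this repo.
    """
    # Assumption: only these two precisions are published for this model.
    if precision not in ("int4", "fp4"):
        raise ValueError(f"unsupported precision: {precision!r}")
    return f"svdq-{precision}_r32-{model}.safetensors"


if __name__ == "__main__":
    print(svdq_weight_filename("fp4"))
    print(svdq_weight_filename("int4"))
```

In the usage example of this card, the precision string comes from nunchaku.utils.get_precision() rather than being hard-coded.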

Quantization quality (fp4)

PSNR: mean=15.0697 p50=14.8491 p90=17.1213 best=18.3484 worst=11.7532 (N=15)
SSIM: mean=0.604458 p50=0.594962 p90=0.724261 best=0.739746 worst=0.436558 (N=15)
LPIPS: mean=0.317187 p50=0.30015 p90=0.407988 best=0.191258 worst=0.477386 (N=15)
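
For readers unfamiliar with these figures, the sketch below shows the standard PSNR formula and one way to produce mean/p50/p90 summaries. It is illustrative only: the card does not state which tool, image pairs, or percentile method produced the numbers above, so the flat pixel lists, the 0-255 value range, and the linear-interpolation percentile are all assumptions.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def percentile(values, q):
    """q-th percentile with linear interpolation (assumed method)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(values, higher_is_better=True):
    """Summary in the same shape as the lines above (mean, p50, p90, best, worst, N)."""
    best, worst = (max, min) if higher_is_better else (min, max)
    return {
        "mean": sum(values) / len(values),
        "p50": percentile(values, 50),
        "p90": percentile(values, 90),
        "best": best(values),
        "worst": worst(values),
        "N": len(values),
    }
```

PSNR and SSIM are higher-is-better, while LPIPS is lower-is-better, which is why the LPIPS "best" value above is the smallest one; pass higher_is_better=False to summarize for that case.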

Performance

Config: bf16 / steps=9 / guidance_scale=0.0
Resolutions (5 images): 1024x1024, 1216x832, 1344x768, 832x1216, 768x1344

Cold start (end-to-end for the first image)

GPU	precision	metric	Diffusers	Nunchaku	speedup	gain
RTX 5090	fp4	load	4.911s	13.500s	0.36x	-174.9%
RTX 5090	fp4	cold_infer	3.945s	2.275s	1.73x	+42.3%
RTX 5090	fp4	cold_e2e	8.856s	15.775s	0.56x	-78.1%
RTX 3090	int4	load	6.934s	15.971s	0.43x	-130.3%
RTX 3090	int4	cold_infer	10.203s	5.178s	1.97x	+49.3%
RTX 3090	int4	cold_e2e	17.137s	21.149s	0.81x	-23.4%

After warmup (5 consecutive images)

GPU	precision	metric	Diffusers	Nunchaku	speedup	gain
RTX 5090	fp4	total (5 images)	17.416s	9.266s	1.88x	+46.8%
RTX 5090	fp4	avg (per image)	3.483s	1.853s	1.88x	+46.8%
RTX 3090	int4	total (5 images)	48.863s	24.114s	2.03x	+50.6%
RTX 3090	int4	avg (per image)	9.773s	4.823s	2.03x	+50.6%
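
The split between a cold first image and post-warmup images can be sketched as follows. This is not the benchmark script behind the tables; benchmark_cold_and_warm and its generate argument are hypothetical, and the card does not describe the exact measurement procedure.

```python
import time


def benchmark_cold_and_warm(generate, cold_input, warm_inputs):
    """Time one cold call, then time each subsequent (warm) call.

    generate is any callable taking one input, for example a function that
    runs the pipeline at a given resolution. For GPU work, a real benchmark
    would also call torch.cuda.synchronize() before reading the clock so
    that asynchronous kernels are included in the measurement.
    """
    if not warm_inputs:
        raise ValueError("warm_inputs must not be empty")

    start = time.perf_counter()
    generate(cold_input)
    cold = time.perf_counter() - start

    warm = []
    for item in warm_inputs:
        start = time.perf_counter()
        generate(item)
        warm.append(time.perf_counter() - start)

    total = sum(warm)
    return {"cold_infer": cold, "warm_total": total, "warm_avg": total / len(warm)}
```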

Notes:

On both GPUs, Nunchaku is faster at inference: 1.73x to 1.97x on cold_infer and 1.88x to 2.03x on the post-warmup runs.
In this benchmark, Nunchaku is slower to load (0.36x to 0.43x), which also makes cold_e2e slower overall; for repeated generation, post-warmup throughput is the more meaningful figure.
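
The speedup and gain columns in the tables above are consistent with the following definitions: speedup is the Diffusers time divided by the Nunchaku time, and gain is the time saved as a percentage of the Diffusers time. The function name is hypothetical; the formulas are inferred from the reported values rather than stated by the card.

```python
def speedup_and_gain(diffusers_s: float, nunchaku_s: float):
    """Return (speedup, gain_percent) for a baseline time and a candidate time."""
    speedup = diffusers_s / nunchaku_s
    gain_percent = (diffusers_s - nunchaku_s) / diffusers_s * 100.0
    return speedup, gain_percent


if __name__ == "__main__":
    # RTX 5090, fp4, load row: slower with Nunchaku, so gain is negative.
    s, g = speedup_and_gain(4.911, 13.500)
    print(f"{s:.2f}x {g:+.1f}%")
    # RTX 3090, int4, total (5 images) row.
    s, g = speedup_and_gain(48.863, 24.114)
    print(f"{s:.2f}x {g:+.1f}%")
```

A speedup below 1x always pairs with a negative gain, which is why the load and cold_e2e rows show both.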

Nunchaku is required

Official installation docs (recommended source of truth): https://nunchaku.tech/docs/nunchaku/installation/installation.html

(Recommended) Install the official prebuilt wheel

Prerequisite: PyTorch >= 2.5 (follow the wheel requirements)
Install a matching nunchaku wheel from GitHub Releases / HuggingFace / ModelScope (note: cp311 means Python 3.11):
- https://github.com/nunchaku-ai/nunchaku/releases

# Example (pick the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl

Tip (RTX 50 series): CUDA >= 12.8 is often recommended, and FP4 models are usually preferred for better compatibility/performance (follow official docs).
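
Before picking a wheel, it can help to print the values that appear in the wheel file name. The sketch below is a convenience under assumptions: the function names are hypothetical, and it only reports the local Python tag plus the PyTorch and CUDA versions if PyTorch is installed. The official installation docs remain the reference for which wheel matches.

```python
import sys


def python_wheel_tag(version_info=sys.version_info) -> str:
    """CPython tag as used in wheel names, e.g. cp311 for Python 3.11."""
    return f"cp{version_info[0]}{version_info[1]}"


def describe_environment() -> dict:
    """Collect the version details needed to choose a matching wheel."""
    info = {"python_tag": python_wheel_tag(), "torch": None, "cuda": None}
    try:
        import torch  # optional: only reported when PyTorch is installed

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```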

Usage (Diffusers + Nunchaku Transformer)

The following example is taken from models/Beyond_Reality_Zimage_v2_svdq/infer.py.

import torch

from diffusers import ZImagePipeline
from nunchaku import NunchakuZImageTransformer2DModel
from nunchaku.utils import get_precision

MODEL = "Beyond_Reality_Zimage_v2_svdq"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the quantized transformer; get_precision() selects "int4" or "fp4"
    # based on the detected GPU.
    transformer = NunchakuZImageTransformer2DModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors",
        torch_dtype=torch.bfloat16,
    )

    # Build the Diffusers pipeline from the repo root, swapping in the
    # quantized transformer.
    pipe = ZImagePipeline.from_pretrained(
        f"{REPO_ID}",
        torch_dtype=torch.bfloat16,
        transformer=transformer,
    ).to("cuda")

    # Same settings as the benchmark config: 9 steps, guidance_scale 0.
    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=0, num_inference_steps=9).images[0]
    image.save("beyond-reality-zimage-v2-svdq.png")
