Model Card (SVDQuant)


Model name

  • Model repo: tonera/Beyond_Reality_Zimage_v2_svdq
  • Base (Diffusers weights path): tonera/Beyond_Reality_Zimage_v2_svdq (repo root)
  • Quantized Transformer weights: tonera/Beyond_Reality_Zimage_v2_svdq/svdq-<precision>_r32-Beyond_Reality_Zimage_v2_svdq.safetensors
  • Original model: Hugging Face / ModelScope

vitoom

Quantization / inference tech

  • Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)

Nunchaku is a high-performance inference engine for 4-bit (FP4/INT4) neural networks. It aims to cut VRAM usage and speed up inference while preserving generation quality as far as possible. It implements and productionizes post-training quantization methods such as SVDQuant, and it reduces the overhead of the low-rank branches through operator/kernel fusion and other optimizations.

The Z-Image quantized weights in this repo (e.g. svdq-*_r32-*.safetensors) are designed to be used with Nunchaku for efficient inference on supported GPUs.
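As a small illustration of the naming pattern above, the sketch below builds the weights path for a given precision. The helper weights_path is hypothetical (it is not part of Nunchaku), and it assumes the only precisions in this repo are fp4 and int4, the two that appear in the benchmarks in this card.

```python
# Hypothetical helper, not part of Nunchaku: it only reproduces the file
# naming pattern documented in this model card.
MODEL = "Beyond_Reality_Zimage_v2_svdq"
REPO_ID = f"tonera/{MODEL}"

# Assumption: these are the two precisions shipped in the repo.
SUPPORTED_PRECISIONS = ("fp4", "int4")


def weights_path(precision: str) -> str:
    """Return the repo-qualified path of the rank-32 quantized transformer."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    return f"{REPO_ID}/svdq-{precision}_r32-{MODEL}.safetensors"


if __name__ == "__main__":
    for precision in SUPPORTED_PRECISIONS:
        print(weights_path(precision))
```

In the usage example later in this card, the precision string comes from nunchaku.utils.get_precision() rather than being hard-coded.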

Quantization quality (fp4)

PSNR:  mean=15.0697 p50=14.8491 p90=17.1213 best=18.3484 worst=11.7532 (N=15)
SSIM:  mean=0.604458 p50=0.594962 p90=0.724261 best=0.739746 worst=0.436558 (N=15)
LPIPS: mean=0.317187 p50=0.30015 p90=0.407988 best=0.191258 worst=0.477386 (N=15)
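For readers unfamiliar with these metrics: higher PSNR and SSIM, and lower LPIPS, indicate closer agreement between two images. This card does not state how the numbers were computed, so the following is only a minimal sketch of the standard PSNR formula, assuming 8-bit pixel values (peak 255) and two images flattened to equal-length sequences. The function name psnr is chosen for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    reference = [0.0, 0.0, 0.0, 0.0]
    candidate = [25.5, 25.5, 25.5, 25.5]  # uniform error of 10% of the peak
    print(f"PSNR: {psnr(reference, candidate):.2f} dB")
```

SSIM and LPIPS are more involved (LPIPS requires a pretrained network) and are best taken from an established implementation.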

Performance

  • Config: bf16 / steps=9 / guidance_scale=0.0
  • Resolutions (5 images): 1024x1024, 1216x832, 1344x768, 832x1216, 768x1344
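A timing loop over these five resolutions could be sketched as follows. This is not the benchmark script used for this card. The generate callable is a hypothetical stand-in for whatever runs the pipeline, and reading each resolution as width x height is an assumption.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Resolutions from the list above, read here as (width, height); that order is
# an assumption.
RESOLUTIONS: List[Tuple[int, int]] = [
    (1024, 1024), (1216, 832), (1344, 768), (832, 1216), (768, 1344),
]


def time_runs(generate: Callable[[int, int], object],
              resolutions: Sequence[Tuple[int, int]] = RESOLUTIONS) -> Dict[str, float]:
    """Time one call of `generate` per resolution; return total and average seconds."""
    per_image = []
    for width, height in resolutions:
        start = time.perf_counter()
        # On a GPU, `generate` should synchronize (e.g. torch.cuda.synchronize())
        # before returning, because CUDA kernels launch asynchronously.
        generate(width, height)
        per_image.append(time.perf_counter() - start)
    total = sum(per_image)
    return {"total": total, "avg": total / len(per_image)}


if __name__ == "__main__":
    # Placeholder workload so the sketch runs without a GPU.
    result = time_runs(lambda w, h: sum(range(w * h // 100)))
    print(f"total={result['total']:.4f}s avg={result['avg']:.4f}s")
```

To separate cold-start from post-warmup numbers, the first call would be timed on its own before running this loop.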

Cold start (end-to-end for the first image)

GPU       precision  metric      Diffusers  Nunchaku  speedup  gain
RTX 5090  fp4        load        4.911s     13.500s   0.36x    -174.9%
RTX 5090  fp4        cold_infer  3.945s     2.275s    1.73x    +42.3%
RTX 5090  fp4        cold_e2e    8.856s     15.775s   0.56x    -78.1%
RTX 3090  int4       load        6.934s     15.971s   0.43x    -130.3%
RTX 3090  int4       cold_infer  10.203s    5.178s    1.97x    +49.3%
RTX 3090  int4       cold_e2e    17.137s    21.149s   0.81x    -23.4%

After warmup (5 consecutive images)

GPU       precision  metric            Diffusers  Nunchaku  speedup  gain
RTX 5090  fp4        total (5 images)  17.416s    9.266s    1.88x    +46.8%
RTX 5090  fp4        avg (per image)   3.483s     1.853s    1.88x    +46.8%
RTX 3090  int4       total (5 images)  48.863s    24.114s   2.03x    +50.6%
RTX 3090  int4       avg (per image)   9.773s     4.823s    2.03x    +50.6%

Notes:

  • On both GPUs, Nunchaku provides clear speedups during inference (cold_infer and the post-warmup runs).
  • In this benchmark, Nunchaku is slower to load, which makes its cold end-to-end time worse than Diffusers on both GPUs. For repeated generation, post-warmup throughput is the more meaningful comparison.
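The speedup and gain columns are not defined in this card. The formulas in the sketch below are inferred from the reported numbers, which they reproduce: speedup is the Diffusers time divided by the Nunchaku time, and gain is the relative time saved versus Diffusers. The function name speedup_and_gain is chosen for illustration.

```python
from typing import Tuple


def speedup_and_gain(diffusers_s: float, nunchaku_s: float) -> Tuple[float, float]:
    """Return (speedup factor, gain in percent) for a pair of timings in seconds."""
    speedup = diffusers_s / nunchaku_s
    # Positive gain means Nunchaku took less time than Diffusers.
    gain_pct = (diffusers_s - nunchaku_s) / diffusers_s * 100.0
    return speedup, gain_pct


if __name__ == "__main__":
    # RTX 5090 rows from the cold-start table.
    for metric, diffusers_s, nunchaku_s in [
        ("load", 4.911, 13.500),
        ("cold_infer", 3.945, 2.275),
    ]:
        speedup, gain = speedup_and_gain(diffusers_s, nunchaku_s)
        print(f"{metric}: {speedup:.2f}x {gain:+.1f}%")
```

A negative gain, as in the load rows, means Nunchaku took longer than Diffusers for that step.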

Nunchaku is required

  • Official installation docs (recommended source of truth): https://nunchaku.tech/docs/nunchaku/installation/installation.html

(Recommended) Install the official prebuilt wheel

  • Prerequisite: PyTorch >= 2.5 (follow the wheel requirements)
  • Install a matching nunchaku wheel from GitHub Releases / HuggingFace / ModelScope (note: cp311 means Python 3.11):
    • https://github.com/nunchaku-ai/nunchaku/releases
# Example (pick the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
  • Tip (RTX 50 series): CUDA >= 12.8 is often recommended, and FP4 models are usually preferred for better compatibility/performance (follow official docs).
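To pick a matching wheel, it helps to know the tags for the local environment. The sketch below derives the Python tag (such as cp311) and the torch tag (such as torch2.9) seen in the example wheel name above. The helpers python_tag and torch_tag are hypothetical, and the assumption that the wheel name uses only the major.minor torch version is based solely on that one example.

```python
import sys
from typing import Sequence


def python_tag(version_info: Sequence[int] = sys.version_info) -> str:
    """CPython tag as used in wheel names, e.g. (3, 11) -> 'cp311'."""
    return f"cp{version_info[0]}{version_info[1]}"


def torch_tag(torch_version: str) -> str:
    """Torch tag from a version string, e.g. '2.9.0+cu128' -> 'torch2.9'."""
    # Drop the local build suffix after '+', then keep major.minor only.
    major, minor = torch_version.split("+")[0].split(".")[:2]
    return f"torch{major}.{minor}"


if __name__ == "__main__":
    print("Python tag:", python_tag())
    try:
        import torch
        print("Torch tag:", torch_tag(torch.__version__))
        print("CUDA version torch was built with:", torch.version.cuda)
    except ImportError:
        print("PyTorch is not installed; install it before choosing a wheel.")
```

The official installation docs remain the source of truth for which combinations are actually published.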

Usage (Diffusers + Nunchaku Transformer)

The following example is from: models/Beyond_Reality_Zimage_v2_svdq/infer.py.

import torch

from diffusers import ZImagePipeline
from nunchaku import NunchakuZImageTransformer2DModel
from nunchaku.utils import get_precision

MODEL = "Beyond_Reality_Zimage_v2_svdq"
REPO_ID = f"tonera/{MODEL}"

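if __name__ == "__main__":
    # Load the quantized transformer; get_precision() selects the precision
    # string ("fp4" or "int4") for the current GPU.
    transformer = NunchakuZImageTransformer2DModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors",
        torch_dtype=torch.bfloat16,
    )

    # Build the Diffusers pipeline from the repo root, swapping in the
    # quantized transformer.
    pipe = ZImagePipeline.from_pretrained(
        f"{REPO_ID}",
        torch_dtype=torch.bfloat16,
        transformer=transformer,
    ).to("cuda")

    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    # Same settings as the benchmark config: 9 steps, guidance disabled.
    image = pipe(prompt=prompt, guidance_scale=0.0, num_inference_steps=9).images[0]
    image.save("beyond-reality-zimage-v2-svdq.png")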