Model Card (SVDQuant)
Model name
- Model repo: tonera/Beyond_Reality_Zimage_v2_svdq
- Base (Diffusers weights path): tonera/Beyond_Reality_Zimage_v2_svdq (repo root)
- Quantized Transformer weights: tonera/Beyond_Reality_Zimage_v2_svdq/svdq-<precision>_r32-Beyond_Reality_Zimage_v2_svdq.safetensors
- Original model: HuggingFace / ModelScope
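The `<precision>` placeholder in the weights file name can be filled in programmatically. The following is a minimal sketch, assuming the only precisions are the two that appear in this card's benchmarks (`fp4` and `int4`); `quantized_weights_path` is a hypothetical helper for illustration and is not part of Nunchaku.

```python
REPO_ID = "tonera/Beyond_Reality_Zimage_v2_svdq"
MODEL = REPO_ID.split("/")[1]

# Assumption: only the precisions benchmarked in this card are published.
KNOWN_PRECISIONS = ("fp4", "int4")


def quantized_weights_path(precision: str) -> str:
    """Build the repo-relative path of the quantized transformer weights."""
    if precision not in KNOWN_PRECISIONS:
        raise ValueError(f"unexpected precision {precision!r}, expected one of {KNOWN_PRECISIONS}")
    return f"{REPO_ID}/svdq-{precision}_r32-{MODEL}.safetensors"


if __name__ == "__main__":
    for precision in KNOWN_PRECISIONS:
        print(quantized_weights_path(precision))
```

Rejecting unknown precisions early gives a clearer error than a failed download of a file that does not exist.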
Quantization / inference tech
- Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)
Nunchaku is a high-performance inference engine for 4-bit (FP4/INT4) neural networks. Its goal is to significantly reduce VRAM usage and accelerate inference while preserving as much generation quality as possible. It implements and productionizes post-training quantization methods such as SVDQuant, and reduces the overhead of the low-rank branches through operator/kernel fusion and other optimizations.
The Z-Image quantized weights in this repo (e.g. svdq-*_r32-*.safetensors) are designed to be used with Nunchaku for efficient inference on supported GPUs.
Quantization quality (fp4)
| metric | mean | p50 | p90 | best | worst | N |
|---|---|---|---|---|---|---|
| PSNR | 15.0697 | 14.8491 | 17.1213 | 18.3484 | 11.7532 | 15 |
| SSIM | 0.604458 | 0.594962 | 0.724261 | 0.739746 | 0.436558 | 15 |
| LPIPS | 0.317187 | 0.30015 | 0.407988 | 0.191258 | 0.477386 | 15 |
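For readers unfamiliar with the PSNR figures above: higher PSNR and SSIM mean closer agreement with a reference image, while lower LPIPS does. This card does not say which reference images were used or how the metrics were computed, so the following is only a generic sketch of the standard PSNR formula for 8-bit pixel values, written in pure Python on flat pixel sequences; the function name and inputs are illustrative assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) == 0 or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and have the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Two tiny hypothetical "images" that differ by 10 levels per pixel.
    print(f"{psnr([100, 100], [110, 90]):.2f} dB")
```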
Performance
- Config: bf16 / steps=9 / guidance_scale=0.0
- Resolutions (5 images): 1024x1024, 1216x832, 1344x768, 832x1216, 768x1344
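The card does not describe the benchmark harness, so the sketch below is only one plausible way to separate a cold first call from post-warmup runs. It assumes the first image doubles as the warmup and that `generate` is a hypothetical callable taking the two numbers of a resolution in the order listed above (the card does not say whether that is width x height).

```python
import time
from typing import Callable, Dict, List, Tuple

RESOLUTIONS = "1024x1024,1216x832,1344x768,832x1216,768x1344"


def parse_resolutions(spec: str) -> List[Tuple[int, ...]]:
    """Turn '1024x1024,1216x832' into [(1024, 1024), (1216, 832)]."""
    return [tuple(int(v) for v in item.split("x")) for item in spec.split(",")]


def time_cold_and_warm(generate: Callable[[int, int], object],
                       sizes: List[Tuple[int, ...]]) -> Dict[str, float]:
    # Cold run: the very first call, which also serves as the warmup (assumption).
    start = time.perf_counter()
    generate(*sizes[0])
    cold = time.perf_counter() - start

    # Warm runs: one call per resolution, timed together.
    # Caveat: with asynchronous GPU work, make sure the call has finished
    # (for example by returning a decoded image) before the clock is read.
    start = time.perf_counter()
    for size in sizes:
        generate(*size)
    total = time.perf_counter() - start
    return {"cold_infer": cold, "warm_total": total, "warm_avg": total / len(sizes)}


if __name__ == "__main__":
    # Stand-in for a real pipeline call so the sketch runs anywhere.
    fake_generate = lambda a, b: time.sleep(0.01)
    print(time_cold_and_warm(fake_generate, parse_resolutions(RESOLUTIONS)))
```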
Cold start (end-to-end for the first image)
| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|---|---|---|---|---|---|---|
| RTX 5090 | fp4 | load | 4.911s | 13.500s | 0.36x | -174.9% |
| RTX 5090 | fp4 | cold_infer | 3.945s | 2.275s | 1.73x | +42.3% |
| RTX 5090 | fp4 | cold_e2e | 8.856s | 15.775s | 0.56x | -78.1% |
| RTX 3090 | int4 | load | 6.934s | 15.971s | 0.43x | -130.3% |
| RTX 3090 | int4 | cold_infer | 10.203s | 5.178s | 1.97x | +49.3% |
| RTX 3090 | int4 | cold_e2e | 17.137s | 21.149s | 0.81x | -23.4% |
After warmup (5 consecutive images)
| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|---|---|---|---|---|---|---|
| RTX 5090 | fp4 | total (5 images) | 17.416s | 9.266s | 1.88x | +46.8% |
| RTX 5090 | fp4 | avg (per image) | 3.483s | 1.853s | 1.88x | +46.8% |
| RTX 3090 | int4 | total (5 images) | 48.863s | 24.114s | 2.03x | +50.6% |
| RTX 3090 | int4 | avg (per image) | 9.773s | 4.823s | 2.03x | +50.6% |
Notes:
- On both GPUs, Nunchaku is roughly 1.7x to 2.0x faster during inference (cold_infer and the post-warmup runs).
- In this benchmark, Nunchaku is slower for load, which also makes cold_e2e slower; post-warmup throughput is the more meaningful figure for repeated generation.
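The speedup and gain columns in the tables above are consistent with speedup = Diffusers time / Nunchaku time and gain = (Diffusers time - Nunchaku time) / Diffusers time. The card does not state these formulas, so the sketch below is an inference from the numbers; it recomputes a few rows for checking.

```python
from typing import Tuple


def speedup_and_gain(diffusers_s: float, nunchaku_s: float) -> Tuple[float, float]:
    """Return (speedup factor, gain in percent) relative to the Diffusers time."""
    speedup = diffusers_s / nunchaku_s
    gain_pct = (diffusers_s - nunchaku_s) / diffusers_s * 100.0
    return speedup, gain_pct


# (Diffusers seconds, Nunchaku seconds), copied from the tables above.
ROWS = {
    "RTX 5090 fp4 load": (4.911, 13.500),
    "RTX 5090 fp4 cold_infer": (3.945, 2.275),
    "RTX 5090 fp4 total (5 images)": (17.416, 9.266),
    "RTX 3090 int4 total (5 images)": (48.863, 24.114),
}

if __name__ == "__main__":
    for name, (diffusers_s, nunchaku_s) in ROWS.items():
        speedup, gain_pct = speedup_and_gain(diffusers_s, nunchaku_s)
        print(f"{name}: {speedup:.2f}x, {gain_pct:+.1f}%")
```

A speedup below 1.0x and a negative gain both mean Nunchaku took longer, as in the load rows.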
Nunchaku is required
- Official installation docs (recommended source of truth): https://nunchaku.tech/docs/nunchaku/installation/installation.html
(Recommended) Install the official prebuilt wheel
- Prerequisite: PyTorch >= 2.5 (follow the wheel requirements)
- Install a matching nunchaku wheel from GitHub Releases / HuggingFace / ModelScope (note: cp311 means Python 3.11): https://github.com/nunchaku-ai/nunchaku/releases
# Example (pick the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
- Tip (RTX 50 series): CUDA >= 12.8 is often recommended, and FP4 models are usually preferred for better compatibility/performance (follow official docs).
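To pick a wheel whose `cpXY` tag matches the running interpreter, the tag can be derived from the Python version. This is a small standard-library sketch; both helper names are hypothetical, and the check only covers the Python tag, not the torch or CUDA parts of the wheel name.

```python
import sys
from typing import Optional, Sequence


def cpython_wheel_tag(version: Optional[Sequence[int]] = None) -> str:
    """Return the CPython tag for a version, e.g. (3, 11) -> 'cp311'."""
    major, minor = tuple(version or sys.version_info)[:2]
    return f"cp{major}{minor}"


def wheel_matches_python(wheel_name: str, version: Optional[Sequence[int]] = None) -> bool:
    """Check whether a wheel file name carries the tag of the given Python version."""
    return f"-{cpython_wheel_tag(version)}-" in wheel_name


if __name__ == "__main__":
    # The placeholder version X.Y.Z is kept as in the install example above.
    name = "nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl"
    print(cpython_wheel_tag(), wheel_matches_python(name))
```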
Usage (Diffusers + Nunchaku Transformer)
The following example is from: models/Beyond_Reality_Zimage_v2_svdq/infer.py.
import torch
from diffusers import ZImagePipeline
from nunchaku import NunchakuZImageTransformer2DModel
from nunchaku.utils import get_precision

MODEL = "Beyond_Reality_Zimage_v2_svdq"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # get_precision() picks the weights variant (fp4 or int4) for the current GPU.
    transformer = NunchakuZImageTransformer2DModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors",
        torch_dtype=torch.bfloat16,
    )
    # Load the remaining pipeline components from the repo root and
    # swap in the quantized transformer.
    pipe = ZImagePipeline.from_pretrained(
        REPO_ID,
        torch_dtype=torch.bfloat16,
        transformer=transformer,
    ).to("cuda")

    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    # Same settings as the benchmark config: 9 steps, guidance_scale 0.
    image = pipe(prompt=prompt, guidance_scale=0, num_inference_steps=9).images[0]
    image.save("beyond-reality-zimage-v2-svdq.png")
