metadata
pipeline_tag: text-to-image
library_name: diffusers
tags:
- Z-Image
- quantization
- svdquant
- nunchaku
- fp4
- int4
base_model: tonera/Beyond_Reality_Zimage_v3_svdq
base_model_relation: quantized
license: apache-2.0
Model Description (SVDQuant)
Model name
- Model repository: tonera/Beyond_Reality_Zimage_v3_svdq
- Base (Diffusers weights path): tonera/Beyond_Reality_Zimage_v3_svdq (root directory of this repository)
- Quantized Transformer weights: tonera/Beyond_Reality_Zimage_v3_svdq/svdq-<precision>_r32-Beyond_Reality_Zimage_v3_svdq.safetensors
- Original model: Hugging Face, ModelScope
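The `<precision>` placeholder in the weights path can be filled in programmatically. The sketch below is only an illustration of the naming pattern: `quantized_weight_path` is a hypothetical helper, not part of Nunchaku or Diffusers, and it assumes that `fp4` and `int4` are the only precision values, as the tags and benchmarks on this card suggest.

```python
def quantized_weight_path(repo_id: str, precision: str, rank: int = 32) -> str:
    """Build the path of the quantized transformer weights.

    Hypothetical helper: it only mirrors the naming pattern
    svdq-<precision>_r32-<model>.safetensors used in this repository.
    """
    if precision not in ("fp4", "int4"):
        raise ValueError(f"unsupported precision: {precision!r}")
    model = repo_id.split("/", 1)[1]  # "tonera/<model>" -> "<model>"
    return f"{repo_id}/svdq-{precision}_r{rank}-{model}.safetensors"


REPO_ID = "tonera/Beyond_Reality_Zimage_v3_svdq"
print(quantized_weight_path(REPO_ID, "fp4"))
print(quantized_weight_path(REPO_ID, "int4"))
```

Which of the two files is appropriate depends on the GPU; the benchmarks on this card use fp4 on an RTX 5090 and int4 on an RTX 3090.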
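Quantization / inference technology
- Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)
Nunchaku is a high-performance inference engine for 4-bit (FP4/INT4) low-bit neural networks. Its core goal is to significantly reduce VRAM usage and speed up inference while preserving generation quality as much as possible. It implements and productionizes post-training quantization schemes such as SVDQuant, and uses optimizations such as operator/kernel fusion to reduce the extra overhead introduced by the low-rank branch.
The quantized Z-Image weights in this repository (for example svdq-*_r32-*.safetensors) are meant to be used with Nunchaku for efficient inference on supported GPUs.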
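As a rough sketch of what the "low-rank branch" refers to, the published SVDQuant formulation can be summarized as follows. This is a paraphrase of the method, not a description of how these particular weights were produced. The symbols ($\lambda$ for per-channel smoothing factors, $Q$ for the 4-bit quantizer, $r$ for the rank) are notation introduced here, and reading `r32` in the file name as $r = 32$ is an assumption.

```latex
\begin{align}
  \hat{X} &= X \,\mathrm{diag}(\lambda)^{-1}, &
  \hat{W} &= \mathrm{diag}(\lambda)\, W
  && \text{(move activation outliers into the weights)} \\
  \hat{W} &= L_1 L_2 + R, &
  L_1 &\in \mathbb{R}^{m \times r},\; L_2 \in \mathbb{R}^{r \times n}
  && \text{(rank-$r$ factors from an SVD of $\hat{W}$)} \\
  X W &= \hat{X} \hat{W} \approx
  \underbrace{\hat{X} L_1 L_2}_{\text{16-bit low-rank branch}}
  + \underbrace{Q(\hat{X})\, Q(R)}_{\text{4-bit branch}}
\end{align}
```

The low-rank branch absorbs the hard-to-quantize part of the weights, so the residual $R$ tolerates 4-bit quantization better. The cost is an extra pair of small matrix multiplications, which is the overhead that kernel fusion is meant to reduce.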
Quantization quality (fp4)
| Metric | mean | p50 | p90 | best | worst | N |
|---|---|---|---|---|---|---|
| PSNR | 15.0697 | 14.8491 | 17.1213 | 18.3484 | 11.7532 | 15 |
| SSIM | 0.604458 | 0.594962 | 0.724261 | 0.739746 | 0.436558 | 15 |
| LPIPS | 0.317187 | 0.30015 | 0.407988 | 0.191258 | 0.477386 | 15 |
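The card does not say which reference images or tooling produced these figures. As a minimal sketch of how numbers of this kind can be computed, the snippet below implements PSNR and the mean/p50/p90/best/worst summary in plain Python. The function names, the 8-bit pixel range and the linear-interpolation percentile are all assumptions made for illustration. For PSNR and SSIM a higher score is better; for LPIPS a lower score is better, which matches the best/worst values reported above.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(candidate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def percentile(values, q):
    """Linearly interpolated percentile, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(scores, higher_is_better=True):
    """Summary statistics in the same shape as the figures reported above."""
    best, worst = (max, min) if higher_is_better else (min, max)
    return {
        "mean": sum(scores) / len(scores),
        "p50": percentile(scores, 50),
        "p90": percentile(scores, 90),
        "best": best(scores),
        "worst": worst(scores),
        "N": len(scores),
    }


# Toy data only: flattened 2x2 grayscale "images", not the benchmark images.
reference = [10, 50, 200, 255]
outputs = [[12, 48, 190, 250], [10, 60, 180, 255], [0, 50, 200, 240]]
scores = [psnr(reference, out) for out in outputs]
print(summarize(scores, higher_is_better=True))
```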
Performance improvement
- Inference config: bf16 / steps=9 / guidance_scale=0.0
- Resolutions (5 images in total): 1024x1024, 1216x832, 1344x768, 832x1216, 768x1344
Cold-start performance comparison (first image, end to end)
| GPU | precision | Metric | Diffusers | Nunchaku | Speedup | Improvement |
|---|---|---|---|---|---|---|
| RTX 5090 | fp4 | load | 4.911s | 13.500s | 0.36x | -174.9% |
| RTX 5090 | fp4 | cold_infer | 3.945s | 2.275s | 1.73x | +42.3% |
| RTX 5090 | fp4 | cold_e2e | 8.856s | 15.775s | 0.56x | -78.1% |
| RTX 3090 | int4 | load | 6.934s | 15.971s | 0.43x | -130.3% |
| RTX 3090 | int4 | cold_infer | 10.203s | 5.178s | 1.97x | +49.3% |
| RTX 3090 | int4 | cold_e2e | 17.137s | 21.149s | 0.81x | -23.4% |
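The last two columns are not defined on the card, but the values are consistent with the usual definitions: speedup as the Diffusers time divided by the Nunchaku time, and improvement as the relative time saved. The sketch below recomputes them for a few rows of the table; the function names are illustrative.

```python
def speedup(diffusers_s: float, nunchaku_s: float) -> float:
    """How many times faster Nunchaku is (values below 1 mean it is slower)."""
    return diffusers_s / nunchaku_s


def improvement_pct(diffusers_s: float, nunchaku_s: float) -> float:
    """Relative time saved versus Diffusers, in percent (negative means slower)."""
    return (diffusers_s - nunchaku_s) / diffusers_s * 100.0


# (GPU, metric, Diffusers seconds, Nunchaku seconds), copied from the table above.
rows = [
    ("RTX 5090", "load", 4.911, 13.500),
    ("RTX 5090", "cold_infer", 3.945, 2.275),
    ("RTX 3090", "cold_infer", 10.203, 5.178),
    ("RTX 3090", "cold_e2e", 17.137, 21.149),
]
for gpu, metric, t_diffusers, t_nunchaku in rows:
    print(
        f"{gpu} {metric}: "
        f"{speedup(t_diffusers, t_nunchaku):.2f}x, "
        f"{improvement_pct(t_diffusers, t_nunchaku):+.1f}%"
    )
```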
Performance comparison over 5 consecutive images after warmup
| GPU | precision | Metric | Diffusers | Nunchaku | Speedup | Improvement |
|---|---|---|---|---|---|---|
| RTX 5090 | fp4 | total (5 images) | 17.416s | 9.266s | 1.88x | +46.8% |
| RTX 5090 | fp4 | avg (per image) | 3.483s | 1.853s | 1.88x | +46.8% |
| RTX 3090 | int4 | total (5 images) | 48.863s | 24.114s | 2.03x | +50.6% |
| RTX 3090 | int4 | avg (per image) | 9.773s | 4.823s | 2.03x | +50.6% |
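Notes:
- On both GPUs, Nunchaku is clearly faster in the inference stage (both cold_infer and after warmup).
- In this set of tests the load stage is slower with Nunchaku, so throughput for consecutive image generation after warmup is the more relevant figure.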
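One way to read the trade-off between slower loading and faster inference is to estimate how many images it takes before the extra load time is paid back. The sketch below is a simplification: it combines the load times from the cold-start table with the warmed-up per-image averages and ignores the slower first image. `break_even_images` is a hypothetical helper.

```python
import math


def break_even_images(load_diffusers, load_nunchaku, per_image_diffusers, per_image_nunchaku):
    """Smallest number of images after which Nunchaku's total time is lower.

    Returns 0 if Nunchaku does not load slower, and None if it never catches up.
    """
    extra_load = load_nunchaku - load_diffusers
    saved_per_image = per_image_diffusers - per_image_nunchaku
    if extra_load <= 0:
        return 0
    if saved_per_image <= 0:
        return None
    return math.ceil(extra_load / saved_per_image)


# Load times and warmed-up per-image averages from the tables above (seconds).
print("RTX 5090 (fp4):", break_even_images(4.911, 13.500, 3.483, 1.853))
print("RTX 3090 (int4):", break_even_images(6.934, 15.971, 9.773, 4.823))
```

Under these assumptions the break-even point is a handful of images on both cards, so the load penalty mainly matters for one-off generations.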
Nunchaku must be installed before use
- Official installation docs (treat these as authoritative): https://nunchaku.tech/docs/nunchaku/installation/installation.html
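(Recommended) Install the official prebuilt wheel
- Prerequisite: install PyTorch >= 2.5 (the actual requirement is whatever the chosen wheel specifies)
- Install the nunchaku wheel: pick a wheel that matches your environment from GitHub Releases / HuggingFace / ModelScope (note that cp311 means Python 3.11): https://github.com/nunchaku-ai/nunchaku/releases
# Example (choose the wheel URL that matches your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
- Tip (50-series GPUs): CUDA >= 12.8 is generally recommended, and FP4 models should be preferred for better compatibility and performance (defer to the official docs).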
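Choosing the right wheel means matching the Python tag, the torch version and the platform in the file name. The following sketch prints those values for the current environment. It assumes a CPython interpreter (hence the `cp` prefix) and reports torch and CUDA only if torch is already installed; `python_wheel_tag` and `describe_environment` are illustrative helpers, not part of Nunchaku.

```python
import platform
import sys


def python_wheel_tag() -> str:
    """CPython tag as it appears in wheel names, e.g. cp311 for Python 3.11."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def describe_environment() -> dict:
    """Collect the values that need to match the wheel file name."""
    info = {
        "python_tag": python_wheel_tag(),
        "os": platform.system().lower(),
        "arch": platform.machine(),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch  # optional: only reported if already installed

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch` prints as `None`, install PyTorch first, then pick the wheel whose `torchX.Y` and `cpXY` parts match the reported values.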
Usage example (Diffusers + Nunchaku Transformer)
import torch
from diffusers import ZImagePipeline
from nunchaku import NunchakuZImageTransformer2DModel
from nunchaku.utils import get_precision

MODEL = "Beyond_Reality_Zimage_v3_svdq"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the quantized transformer; get_precision() supplies the precision
    # string (fp4 or int4) that appears in the weights file name.
    transformer = NunchakuZImageTransformer2DModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors",
        torch_dtype=torch.bfloat16,
    )
    # Load the remaining pipeline components from the repository root and
    # swap in the quantized transformer.
    pipe = ZImagePipeline.from_pretrained(
        REPO_ID,
        torch_dtype=torch.bfloat16,
        transformer=transformer,
    ).to("cuda")
    prompt = "a cat holding a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    # Same settings as the benchmarks above: 9 steps with guidance disabled.
    image = pipe(prompt=prompt, guidance_scale=0, num_inference_steps=9).images[0]
    image.save("beyond-reality-zimage-v3-svdq.png")
