metadata
pipeline_tag: text-to-image
library_name: diffusers
tags:
  - Z-Image
  - quantization
  - svdquant
  - nunchaku
  - fp4
  - int4
base_model: tonera/Beyond_Reality_Zimage_v3_svdq
base_model_relation: quantized
license: apache-2.0

Model Card (SVDQuant)

Documentation language: Chinese | English

Model name

  • Model repository: tonera/Beyond_Reality_Zimage_v3_svdq
  • Base (Diffusers weights path): tonera/Beyond_Reality_Zimage_v3_svdq (root of this repository)
  • Quantized Transformer weights: tonera/Beyond_Reality_Zimage_v3_svdq/svdq-<precision>_r32-Beyond_Reality_Zimage_v3_svdq.safetensors
  • Original model: huggingface, modelscope
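The quantized-weights path above follows a fixed naming pattern, so it can be assembled programmatically. The helper below is a minimal sketch: `svdq_weight_path` is a hypothetical name that does not come from Nunchaku, and the accepted precision values (`fp4`, `int4`) are taken from the tags and benchmark tables in this card.

```python
def svdq_weight_path(repo_id: str, model: str, precision: str, rank: int = 32) -> str:
    """Build the quantized-weights path following the pattern listed in this card.

    Hypothetical helper for illustration; it only formats a string.
    """
    if precision not in ("fp4", "int4"):
        raise ValueError(f"unsupported precision: {precision!r}")
    return f"{repo_id}/svdq-{precision}_r{rank}-{model}.safetensors"


MODEL = "Beyond_Reality_Zimage_v3_svdq"
print(svdq_weight_path(f"tonera/{MODEL}", MODEL, "fp4"))
```

The usage example in this card builds the same string inline with `get_precision()` in place of a hard-coded precision.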

vitoom

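Quantization / inference technology

  • Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)

Nunchaku is a high-performance inference engine for 4-bit (FP4/INT4) low-bit neural networks. Its core goal is to significantly reduce VRAM usage and speed up inference while preserving generation quality as far as possible. It implements and productionizes post-training quantization schemes such as SVDQuant, and uses optimizations such as operator/kernel fusion to reduce the extra overhead introduced by the low-rank branch.

The Z-Image quantized weights in this repository (for example svdq-*_r32-*.safetensors) are meant to be used with Nunchaku for efficient inference on supported GPUs.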
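For readers unfamiliar with the "low-rank branch" mentioned above, the idea behind SVDQuant can be summarized roughly as follows. This is a simplified sketch and not a statement of how this particular checkpoint was produced: $W$ is a layer's weight matrix (after any outlier smoothing), $X$ its input, $Q(\cdot)$ a 4-bit quantizer, and $r$ the rank of the low-rank factors. Reading the `r32` in the file name as $r = 32$ is an assumption.

```latex
W = L_1 L_2 + R, \qquad L_1 \in \mathbb{R}^{m \times r},\; L_2 \in \mathbb{R}^{r \times n}

X W \;\approx\; \underbrace{X L_1 L_2}_{\text{low-rank branch, 16-bit}} \;+\; \underbrace{Q(X)\, Q(R)}_{\text{residual branch, 4-bit}}
```

The low-rank branch absorbs the part of the weights that is hardest to quantize, at the cost of an extra pair of small matrix multiplications. That extra cost is the overhead the kernel fusion is meant to hide.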

Quantization quality (fp4)

PSNR: mean=15.0697 p50=14.8491 p90=17.1213 best=18.3484 worst=11.7532 (N=15)
SSIM: mean=0.604458 p50=0.594962 p90=0.724261 best=0.739746 worst=0.436558 (N=15)
LPIPS: mean=0.317187 p50=0.30015 p90=0.407988 best=0.191258 worst=0.477386 (N=15)
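For orientation, higher PSNR and SSIM mean the two images are closer, while lower LPIPS means they are perceptually closer. The card does not say how these metrics were computed or which reference images were used. The snippet below is only a sketch of the standard PSNR definition, assuming 8-bit pixel values (peak 255) and flat sequences of pixels; `psnr` is an illustrative function, not the evaluation script behind the numbers above.

```python
import math


def psnr(reference, candidate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)


ref = [0, 64, 128, 255]
out = [10, 54, 138, 245]  # every pixel is off by exactly 10, so MSE = 100
print(round(psnr(ref, out), 2))  # 28.13
```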

Performance gains

  • Inference config: bf16 / steps=9 / guidance_scale=0.0
  • Resolutions (5 images in total): 1024x1024, 1216x832, 1344x768, 832x1216, 768x1344

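Cold-start performance comparison (first image, end to end)

| GPU | Precision | Metric | Diffusers | Nunchaku | Speedup | Improvement |
| --- | --- | --- | --- | --- | --- | --- |
| RTX 5090 | fp4 | load | 4.911s | 13.500s | 0.36x | -174.9% |
| RTX 5090 | fp4 | cold_infer | 3.945s | 2.275s | 1.73x | +42.3% |
| RTX 5090 | fp4 | cold_e2e | 8.856s | 15.775s | 0.56x | -78.1% |
| RTX 3090 | int4 | load | 6.934s | 15.971s | 0.43x | -130.3% |
| RTX 3090 | int4 | cold_infer | 10.203s | 5.178s | 1.97x | +49.3% |
| RTX 3090 | int4 | cold_e2e | 17.137s | 21.149s | 0.81x | -23.4% |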
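The card does not define the last two columns, but the values are consistent with speedup = Diffusers time / Nunchaku time and improvement = (Diffusers time - Nunchaku time) / Diffusers time. In the same way, cold_e2e matches load + cold_infer. The sketch below recomputes the RTX 5090 rows under that assumption; the function names are illustrative.

```python
def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def improvement_pct(baseline_s: float, candidate_s: float) -> float:
    """Time saved relative to the baseline, in percent (negative means slower)."""
    return (baseline_s - candidate_s) / baseline_s * 100.0


# (Diffusers seconds, Nunchaku seconds) for the RTX 5090 / fp4 rows.
rows = {
    "load": (4.911, 13.500),
    "cold_infer": (3.945, 2.275),
    "cold_e2e": (8.856, 15.775),
}
for name, (diffusers_s, nunchaku_s) in rows.items():
    print(f"{name}: {speedup(diffusers_s, nunchaku_s):.2f}x "
          f"{improvement_pct(diffusers_s, nunchaku_s):+.1f}%")
```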

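Performance over 5 consecutive images after warmup

| GPU | Precision | Metric | Diffusers | Nunchaku | Speedup | Improvement |
| --- | --- | --- | --- | --- | --- | --- |
| RTX 5090 | fp4 | total (5 images) | 17.416s | 9.266s | 1.88x | +46.8% |
| RTX 5090 | fp4 | avg (per image) | 3.483s | 1.853s | 1.88x | +46.8% |
| RTX 3090 | int4 | total (5 images) | 48.863s | 24.114s | 2.03x | +50.6% |
| RTX 3090 | int4 | avg (per image) | 9.773s | 4.823s | 2.03x | +50.6% |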
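The benchmark script behind these numbers is not part of this card. A measurement of this shape (untimed warmup, then a fixed number of timed consecutive runs) can be sketched as below. `benchmark` is a hypothetical helper, and the `time.sleep` call is a stand-in for a real pipeline invocation.

```python
import time


def benchmark(generate, warmup: int = 1, runs: int = 5) -> dict:
    """Time `runs` consecutive calls to `generate` after `warmup` untimed calls."""
    for _ in range(warmup):
        generate()  # warmup calls are executed but not timed
    per_run = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        per_run.append(time.perf_counter() - start)
    total = sum(per_run)
    return {"total": total, "avg": total / runs, "per_run": per_run}


# Stand-in workload; in practice this would wrap one image generation.
result = benchmark(lambda: time.sleep(0.01))
print(f"total={result['total']:.3f}s avg={result['avg']:.3f}s")
```

When timing GPU work with PyTorch, the callable should end with `torch.cuda.synchronize()` (or otherwise block on the result), because CUDA kernels launch asynchronously and the clock would otherwise be read too early.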

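Notes

  • On both GPUs, Nunchaku is clearly faster in the inference stage, both for cold_infer and after warmup (roughly 1.7x to 2.0x).
  • In this set of tests, the load stage is slower with Nunchaku (0.36x on the RTX 5090, 0.43x on the RTX 3090), so the figures that matter most are the throughput numbers for consecutive images after warmup.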
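One way to read the trade-off between slower loading and faster inference is to estimate how many images must be generated in a single session before Nunchaku comes out ahead overall. The estimate below is a back-of-the-envelope sketch, not a measured result. It assumes the model is loaded once and that every image costs the warmed-up average from the tables, which ignores the slower first image. `break_even_images` is an illustrative helper.

```python
import math


def break_even_images(load_base, per_image_base, load_cand, per_image_cand):
    """Smallest image count at which the candidate's total time is no worse.

    Returns None if the candidate is not faster per image.
    """
    saving = per_image_base - per_image_cand
    if saving <= 0:
        return None
    extra_load = load_cand - load_base
    return max(1, math.ceil(extra_load / saving))


# Arguments: Diffusers load, Diffusers avg, Nunchaku load, Nunchaku avg (seconds).
print("RTX 5090 fp4:", break_even_images(4.911, 3.483, 13.500, 1.853))   # 6
print("RTX 3090 int4:", break_even_images(6.934, 9.773, 15.971, 4.823))  # 2
```

Under these assumptions the extra load time is recovered after about 6 images on the RTX 5090 and about 2 on the RTX 3090, which is consistent with the cold_e2e rows showing Nunchaku behind for a single image.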

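Nunchaku must be installed before use

  • Official installation guide (treat it as authoritative): https://nunchaku.tech/docs/nunchaku/installation/installation.html

(Recommended) Method: install the official prebuilt wheel

  • Prerequisite: install PyTorch >= 2.5 (the actual requirement is whatever the chosen wheel specifies)
  • Install the nunchaku wheel: pick the wheel that matches your environment from GitHub Releases / HuggingFace / ModelScope (note that cp311 means Python 3.11):
    • https://github.com/nunchaku-ai/nunchaku/releases
# Example (choose the wheel URL that matches your torch/CUDA/Python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
  • Note (50-series GPUs): CUDA >= 12.8 is generally recommended, and FP4 models should be preferred for better compatibility and performance (defer to the official documentation).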
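Choosing the right wheel requires knowing the local Python tag, PyTorch version, and CUDA version. The script below is a small sketch for collecting them. `python_wheel_tag` and `describe_environment` are hypothetical helpers, and the torch lookup is skipped gracefully if PyTorch is not installed yet.

```python
import platform
import sys


def python_wheel_tag() -> str:
    """Return the CPython tag used in wheel file names, e.g. 'cp311' for Python 3.11."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def describe_environment() -> dict:
    """Collect the values that determine which prebuilt wheel matches this machine."""
    info = {
        "python_tag": python_wheel_tag(),
        "system": platform.system(),
        "machine": platform.machine(),
    }
    try:
        import torch  # optional: only reported if already installed

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        info["torch"] = None
        info["cuda"] = None
    return info


print(describe_environment())
```

The reported values can then be compared with the components of the wheel file name (`torch2.9`, `cp311`, `linux_x86_64` in the example above).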

Usage example (Diffusers + Nunchaku Transformer)

import torch

from diffusers import ZImagePipeline
from nunchaku import NunchakuZImageTransformer2DModel
from nunchaku.utils import get_precision

MODEL = "Beyond_Reality_Zimage_v3_svdq"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # get_precision() selects "fp4" or "int4" for the detected GPU,
    # so the matching quantized checkpoint is chosen automatically.
    transformer = NunchakuZImageTransformer2DModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors",
        torch_dtype=torch.bfloat16,
    )

    # Load the rest of the pipeline from the repository root and
    # swap in the quantized transformer.
    pipe = ZImagePipeline.from_pretrained(
        f"{REPO_ID}",
        torch_dtype=torch.bfloat16,
        transformer=transformer,
    ).to("cuda")

    # Same settings as the benchmark configuration: 9 steps, guidance disabled.
    prompt = "a cat hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=0, num_inference_steps=9).images[0]
    image.save("beyond-reality-zimage-v3-svdq.png")