Model Card (SVDQuant)


Model Name

  • Model repo: tonera/protovisionXLHighFidelity3D_V660
  • Base (Diffusers weights path): tonera/protovisionXLHighFidelity3D_V660 (repo root)
  • Quantized UNet weights: tonera/protovisionXLHighFidelity3D_V660/svdq-<precision>_r32-protovisionXLHighFidelity3D_V660.safetensors

Quantization / Inference Tech

  • Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)

Nunchaku is a high-performance inference engine for low-bit (4-bit FP4/INT4) neural networks. It aims to significantly reduce VRAM usage and improve inference speed while preserving generation quality as much as possible. It implements and productionizes post-training quantization methods such as SVDQuant, and reduces the overhead introduced by the low-rank branch via operator/kernel fusion and other optimizations.

The SDXL quantized weights in this repository (e.g. svdq-*_r32-*.safetensors) are intended to be used with Nunchaku for efficient inference on supported GPUs.
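Conceptually, SVDQuant splits each weight matrix into a small high-precision low-rank branch (the `r32` in the filename is its rank) plus a low-bit quantized residual. The NumPy sketch below illustrates only the idea; the actual method also migrates activation outliers into the weights, and Nunchaku's fused GPU kernels bear no resemblance to this reference code.

```python
import numpy as np

def svdquant_decompose(W, rank=32, bits=4):
    """Approximate W as L1 @ L2 (high-precision, low-rank) + dequant(Q) (low-bit residual)."""
    # Low-rank branch via truncated SVD absorbs the dominant directions of W.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]          # (m, rank)
    L2 = Vt[:rank, :]                    # (rank, n)
    R = W - L1 @ L2                      # residual, quantized to `bits` bits
    # Symmetric per-tensor quantization of the residual.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(R).max() / qmax
    Q = np.clip(np.round(R / scale), -qmax - 1, qmax).astype(np.int8)
    return L1, L2, Q, scale

def svdquant_reconstruct(L1, L2, Q, scale):
    return L1 @ L2 + Q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128)).astype(np.float32)
L1, L2, Q, scale = svdquant_decompose(W, rank=32)
err = np.abs(W - svdquant_reconstruct(L1, L2, Q, scale)).max()
print(f"max abs reconstruction error: {err:.4f}")
```

Because the low-rank branch is stored in high precision, only the (smaller-magnitude) residual suffers 4-bit rounding, which is why the error stays below half a quantization step.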

Quantization Quality (fp8)

| Metric | Mean | p50 | p90 | Best | Worst | N |
|---|---|---|---|---|---|---|
| PSNR (dB, higher is better) | 19.5746 | 19.7498 | 24.2246 | 26.2204 | 12.8562 | 25 |
| SSIM (higher is better) | 0.747737 | 0.757871 | 0.867436 | 0.885265 | 0.538331 | 25 |
| LPIPS (lower is better) | 0.283842 | 0.286661 | 0.431429 | 0.0962036 | 0.612014 | 25 |
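For reference, PSNR compares the quantized model's output against the full-precision baseline pixel by pixel; SSIM and LPIPS require extra packages (e.g. scikit-image and lpips), but PSNR is a one-liner. A minimal sketch, assuming 8-bit images (MAX = 255):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two uint8 images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
noisy = np.clip(ref.astype(np.int16) + rng.integers(-10, 11, size=ref.shape),
                0, 255).astype(np.uint8)
print(f"PSNR vs noisy copy: {psnr(ref, noisy):.2f} dB")
```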

Performance

Below is the inference performance comparison (Diffusers vs Nunchaku-UNet).

  • Inference config: bf16 / steps=30 / guidance_scale=5.0
  • Resolutions (5 images each, batch=5): 1024x1024, 1024x768, 768x1024, 832x1216, 1216x832
  • Software versions: torch 2.9 / cuda 12.8 / nunchaku 1.1.0+torch2.9 / diffusers 0.37.0.dev0
  • Optimization switches: no torch.compile, no explicit cudnn tuning flags

Cold-start performance (end-to-end for the first image)

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | load | 3.505s | 3.432s | 1.02x | +2.1% |
| RTX 5090 | cold_infer | 2.944s | 2.447s | 1.20x | +16.9% |
| RTX 5090 | cold_e2e | 6.449s | 5.880s | 1.10x | +8.8% |
| RTX 3090 | load | 3.787s | 3.442s | 1.10x | +9.1% |
| RTX 3090 | cold_infer | 7.503s | 5.231s | 1.43x | +30.3% |
| RTX 3090 | cold_e2e | 11.290s | 8.673s | 1.30x | +23.2% |

Steady-state performance (5 consecutive images after warmup)

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | total (5 images) | 12.937s | 9.813s | 1.32x | +24.2% |
| RTX 5090 | avg (per image) | 2.587s | 1.963s | 1.32x | +24.2% |
| RTX 3090 | total (5 images) | 33.413s | 22.975s | 1.45x | +31.2% |
| RTX 3090 | avg (per image) | 6.683s | 4.595s | 1.45x | +31.2% |

Notes:

  • Loading the quantized weights involves extra one-time processing, which is most noticeable in the RTX 3090 load times.
  • During inference (cold_infer and steady-state), Nunchaku shows clear speedups on both GPUs.
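The cold vs. steady-state split above can be reproduced with a simple timing harness. This is a sketch, not the harness used for the numbers above; on a GPU you would also call torch.cuda.synchronize() before each reading, and the stand-in workload would be replaced by a call to the pipeline:

```python
import time

def benchmark(infer, n_steady=5):
    """Time a callable: cold first call, then average of n_steady warm calls."""
    t0 = time.perf_counter()
    infer()
    cold = time.perf_counter() - t0  # includes one-time compilation/caching

    t0 = time.perf_counter()
    for _ in range(n_steady):
        infer()
    total = time.perf_counter() - t0
    return {"cold_infer": cold, "steady_total": total, "steady_avg": total / n_steady}

# Stand-in workload for demonstration (replace with e.g. lambda: pipe(prompt=...)).
stats = benchmark(lambda: sum(i * i for i in range(100_000)))
print(stats)
```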

Nunchaku Installation Required

  • Official installation docs (recommended source of truth): https://nunchaku.tech/docs/nunchaku/installation/installation.html

(Recommended) Install the official prebuilt wheel

  • Prerequisite: PyTorch >= 2.5 (follow the wheel requirements)
  • Install Nunchaku wheel: choose a wheel matching your torch/cuda/python versions from GitHub Releases / HuggingFace / ModelScope (note cp311 means Python 3.11):
    • https://github.com/nunchaku-ai/nunchaku/releases
```shell
# Example (select the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```
  • Tip (RTX 50 series): typically prefer CUDA >= 12.8, and prefer FP4 models for compatibility/performance (follow official docs).
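The FP4-vs-INT4 choice follows the GPU architecture: FP4 (NVFP4) targets Blackwell (RTX 50 series) tensor cores, while earlier GPUs use INT4. In practice you would just call nunchaku.utils.get_precision(), which inspects the active GPU; the helper below (choose_precision is our name, and the exact compute-capability cutoff is an assumption) only sketches that logic:

```python
def choose_precision(sm_major: int) -> str:
    """Pick the quantized-weight flavor from the CUDA compute-capability major version.

    Blackwell (RTX 50 series, sm_120) has native FP4 tensor cores;
    older architectures fall back to INT4.
    """
    return "fp4" if sm_major >= 10 else "int4"

# RTX 5090 is compute capability 12.0; RTX 3090 is 8.6.
print(choose_precision(12))  # fp4
print(choose_precision(8))   # int4
```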

Usage Example (Diffusers + Nunchaku UNet)

```python
import torch
from diffusers import StableDiffusionXLPipeline

from nunchaku.models.unets.unet_sdxl import NunchakuSDXLUNet2DConditionModel
from nunchaku.utils import get_precision

MODEL = "protovisionXLHighFidelity3D_V660"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the quantized UNet; get_precision() returns "fp4" or "int4" for the current GPU.
    unet = NunchakuSDXLUNet2DConditionModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors"
    )

    pipe = StableDiffusionXLPipeline.from_pretrained(
        REPO_ID,
        unet=unet,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
    ).to("cuda")

    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=5.0, num_inference_steps=30).images[0]
    image.save("sdxl.png")
```