Model Card (SVDQuant)


Model Name

  • Model repo: tonera/protovisionXLHighFidelity3D_V660
  • Base (Diffusers weights path): tonera/protovisionXLHighFidelity3D_V660 (repo root)
  • Quantized UNet weights: tonera/protovisionXLHighFidelity3D_V660/svdq-<precision>_r32-protovisionXLHighFidelity3D_V660.safetensors

Quantization / Inference Tech

  • Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)

Nunchaku is a high-performance inference engine for low-bit (4-bit FP4/INT4) neural networks. It aims to significantly reduce VRAM usage and improve inference speed while preserving generation quality as much as possible. It implements and productionizes post-training quantization methods such as SVDQuant, and reduces the overhead introduced by the low-rank branch via operator/kernel fusion and other optimizations.

The SDXL quantized weights in this repository (e.g. svdq-*_r32-*.safetensors) are intended to be used with Nunchaku for efficient inference on supported GPUs.
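Conceptually, SVDQuant splits each weight matrix into a small high-precision low-rank branch (the `r32` in the filename is its rank) plus a low-bit quantized residual. The NumPy sketch below illustrates only the idea; the actual method also migrates activation outliers into the weights, and Nunchaku's fused GPU kernels bear no resemblance to this reference code.

```python
import numpy as np

def svdquant_decompose(W, rank=32, bits=4):
    """Approximate W as L1 @ L2 (high-precision, low-rank) + dequant(Q) (low-bit residual)."""
    # Low-rank branch via truncated SVD absorbs the dominant directions of W.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]          # (m, rank)
    L2 = Vt[:rank, :]                    # (rank, n)
    R = W - L1 @ L2                      # residual, quantized to `bits` bits
    # Symmetric per-tensor quantization of the residual.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(R).max() / qmax
    Q = np.clip(np.round(R / scale), -qmax - 1, qmax).astype(np.int8)
    return L1, L2, Q, scale

def svdquant_reconstruct(L1, L2, Q, scale):
    return L1 @ L2 + Q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128)).astype(np.float32)
L1, L2, Q, scale = svdquant_decompose(W, rank=32)
err = np.abs(W - svdquant_reconstruct(L1, L2, Q, scale)).max()
print(f"max abs reconstruction error: {err:.4f}")
```

Because the low-rank branch is stored in high precision, only the (smaller-magnitude) residual suffers 4-bit rounding, which is why the error stays below half a quantization step.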

Quantization Quality (fp8)

| Metric | Mean | p50 | p90 | Best | Worst | N |
|---|---|---|---|---|---|---|
| PSNR (dB, higher is better) | 19.5746 | 19.7498 | 24.2246 | 26.2204 | 12.8562 | 25 |
| SSIM (higher is better) | 0.747737 | 0.757871 | 0.867436 | 0.885265 | 0.538331 | 25 |
| LPIPS (lower is better) | 0.283842 | 0.286661 | 0.431429 | 0.0962036 | 0.612014 | 25 |
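For reference, PSNR compares the quantized model's output against the full-precision baseline pixel by pixel; SSIM and LPIPS require extra packages (e.g. scikit-image and lpips), but PSNR is a one-liner. A minimal sketch, assuming 8-bit images (MAX = 255):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two uint8 images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
noisy = np.clip(ref.astype(np.int16) + rng.integers(-10, 11, size=ref.shape),
                0, 255).astype(np.uint8)
print(f"PSNR vs noisy copy: {psnr(ref, noisy):.2f} dB")
```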

Performance

Below is the inference performance comparison (Diffusers vs Nunchaku-UNet).

  • Inference config: bf16 / steps=30 / guidance_scale=5.0
  • Resolutions (5 images each, batch=5): 1024x1024, 1024x768, 768x1024, 832x1216, 1216x832
  • Software versions: torch 2.9 / cuda 12.8 / nunchaku 1.1.0+torch2.9 / diffusers 0.37.0.dev0
  • Optimization switches: no torch.compile, no explicit cudnn tuning flags

Cold-start performance (end-to-end for the first image)

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | load | 3.505s | 3.432s | 1.02x | +2.1% |
| RTX 5090 | cold_infer | 2.944s | 2.447s | 1.20x | +16.9% |
| RTX 5090 | cold_e2e | 6.449s | 5.880s | 1.10x | +8.8% |
| RTX 3090 | load | 3.787s | 3.442s | 1.10x | +9.1% |
| RTX 3090 | cold_infer | 7.503s | 5.231s | 1.43x | +30.3% |
| RTX 3090 | cold_e2e | 11.290s | 8.673s | 1.30x | +23.2% |

Steady-state performance (5 consecutive images after warmup)

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | total (5 images) | 12.937s | 9.813s | 1.32x | +24.2% |
| RTX 5090 | avg (per image) | 2.587s | 1.963s | 1.32x | +24.2% |
| RTX 3090 | total (5 images) | 33.413s | 22.975s | 1.45x | +31.2% |
| RTX 3090 | avg (per image) | 6.683s | 4.595s | 1.45x | +31.2% |

Notes:

  • Loading the quantized weights involves extra one-time processing, which is most noticeable in the RTX 3090 load times.
  • During inference (cold_infer and steady-state), Nunchaku shows clear speedups on both GPUs.
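The cold vs. steady-state split above can be reproduced with a simple timing harness. This is a sketch, not the harness used for the numbers above; on a GPU you would also call torch.cuda.synchronize() before each reading, and the stand-in workload would be replaced by a call to the pipeline:

```python
import time

def benchmark(infer, n_steady=5):
    """Time a callable: cold first call, then average of n_steady warm calls."""
    t0 = time.perf_counter()
    infer()
    cold = time.perf_counter() - t0  # includes one-time compilation/caching

    t0 = time.perf_counter()
    for _ in range(n_steady):
        infer()
    total = time.perf_counter() - t0
    return {"cold_infer": cold, "steady_total": total, "steady_avg": total / n_steady}

# Stand-in workload for demonstration (replace with e.g. lambda: pipe(prompt=...)).
stats = benchmark(lambda: sum(i * i for i in range(100_000)))
print(stats)
```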

Nunchaku Installation Required

  • Official installation docs (recommended source of truth): https://nunchaku.tech/docs/nunchaku/installation/installation.html

(Recommended) Install the official prebuilt wheel

  • Prerequisite: PyTorch >= 2.5 (follow the wheel requirements)
  • Install Nunchaku wheel: choose a wheel matching your torch/cuda/python versions from GitHub Releases / HuggingFace / ModelScope (note cp311 means Python 3.11):
    • https://github.com/nunchaku-ai/nunchaku/releases
```shell
# Example (select the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```
  • Tip (RTX 50 series): typically prefer CUDA >= 12.8, and prefer FP4 models for compatibility/performance (follow official docs).
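The FP4-vs-INT4 choice follows the GPU architecture: FP4 (NVFP4) targets Blackwell (RTX 50 series) tensor cores, while earlier GPUs use INT4. In practice you would just call nunchaku.utils.get_precision(), which inspects the active GPU; the helper below (choose_precision is our name, and the exact compute-capability cutoff is an assumption) only sketches that logic:

```python
def choose_precision(sm_major: int) -> str:
    """Pick the quantized-weight flavor from the CUDA compute-capability major version.

    Blackwell (RTX 50 series, sm_120) has native FP4 tensor cores;
    older architectures fall back to INT4.
    """
    return "fp4" if sm_major >= 10 else "int4"

# RTX 5090 is compute capability 12.0; RTX 3090 is 8.6.
print(choose_precision(12))  # fp4
print(choose_precision(8))   # int4
```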

Usage Example (Diffusers + Nunchaku UNet)

```python
import torch
from diffusers import StableDiffusionXLPipeline

from nunchaku.models.unets.unet_sdxl import NunchakuSDXLUNet2DConditionModel
from nunchaku.utils import get_precision

MODEL = "protovisionXLHighFidelity3D_V660"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the quantized UNet; get_precision() returns "fp4" or "int4" for the current GPU.
    unet = NunchakuSDXLUNet2DConditionModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors"
    )

    pipe = StableDiffusionXLPipeline.from_pretrained(
        REPO_ID,
        unet=unet,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
    ).to("cuda")

    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=5.0, num_inference_steps=30).images[0]
    image.save("sdxl.png")
```