Model Card (SVDQuant)
Model Name
- Model repo: `tonera/protovisionXLHighFidelity3D_V660`
- Base (Diffusers weights path): `tonera/protovisionXLHighFidelity3D_V660` (repo root)
- Quantized UNet weights: `tonera/protovisionXLHighFidelity3D_V660/svdq-<precision>_r32-protovisionXLHighFidelity3D_V660.safetensors`
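The quantized weight path follows a fixed naming pattern, which can be sketched as a small helper. The function name `quantized_unet_path` is hypothetical, and the accepted precision tags (`int4`, `fp4`) are an assumption based on the 4-bit formats this card mentions; check the repository file list for the files that actually exist.

```python
def quantized_unet_path(model, precision, rank=32, owner="tonera"):
    """Build the path <owner>/<model>/svdq-<precision>_r<rank>-<model>.safetensors."""
    # Assumption: only these two precision tags are used for the 4-bit weights.
    if precision not in ("int4", "fp4"):
        raise ValueError(f"unexpected precision tag: {precision!r}")
    return f"{owner}/{model}/svdq-{precision}_r{rank}-{model}.safetensors"


path = quantized_unet_path("protovisionXLHighFidelity3D_V660", "int4")
print(path)
```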
Quantization / Inference Tech
- Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)
Nunchaku is a high-performance inference engine for 4-bit (FP4/INT4) neural networks. Its goal is to significantly reduce VRAM usage and improve inference speed while preserving generation quality as much as possible. It implements and productionizes post-training quantization methods such as SVDQuant, and reduces the overhead introduced by low-rank branches through operator/kernel fusion and other optimizations.
The SDXL quantized weights in this repository (e.g. svdq-*_r32-*.safetensors) are intended to be used with Nunchaku for efficient inference on supported GPUs.
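To give a feel for why a low-rank branch helps, here is a deliberately simplified sketch: a weight matrix is split into a rank-r part kept in full precision and a residual that is quantized to 4 bits. This is not Nunchaku's implementation. The real SVDQuant method also handles activation outliers and relies on fused kernels, both omitted here. The use of NumPy, the toy matrix, and the function names are all assumptions made for illustration.

```python
import numpy as np


def quantize_int4(x):
    """Symmetric per-tensor 4-bit quantization; returns the dequantized array."""
    max_abs = float(np.abs(x).max())
    if max_abs == 0.0:
        return np.zeros_like(x)
    scale = max_abs / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale


def low_rank_plus_quantized_residual(w, rank=32):
    """Keep the top-`rank` SVD components in full precision, quantize the rest."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return low_rank + quantize_int4(w - low_rank)


rng = np.random.default_rng(0)
# Toy weight: strong rank-4 structure (a stand-in for outliers) plus small noise.
w = 3.0 * rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64))
w += 0.1 * rng.normal(size=(64, 64))

direct_error = np.linalg.norm(w - quantize_int4(w))
svdq_error = np.linalg.norm(w - low_rank_plus_quantized_residual(w, rank=4))
print(f"direct 4-bit error: {direct_error:.3f}")
print(f"low-rank + 4-bit residual error: {svdq_error:.3f}")
```

On this toy matrix the residual has a much smaller dynamic range than the full weight, so the 4-bit grid is finer and the reconstruction error drops.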
Quantization Quality (fp8)
PSNR: mean=19.5746 p50=19.7498 p90=24.2246 best=26.2204 worst=12.8562 (N=25)
SSIM: mean=0.747737 p50=0.757871 p90=0.867436 best=0.885265 worst=0.538331 (N=25)
LPIPS: mean=0.283842 p50=0.286661 p90=0.431429 best=0.0962036 worst=0.612014 (N=25)
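For readers who want to reproduce this kind of summary, the sketch below computes PSNR between two flattened 8-bit images and aggregates per-image scores into mean/p50/p90/best/worst. It is an illustration only: this card does not state how the image pairs were produced or which percentile method was used, so the nearest-rank percentile and the helper names `psnr` and `summarize` are assumptions. Higher is better for PSNR and SSIM, lower is better for LPIPS, which is why `summarize` takes a direction flag.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """PSNR in dB between two equally sized flat pixel sequences."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def summarize(values, higher_is_better=True):
    """Aggregate per-image scores; percentiles use the nearest-rank method."""
    ordered = sorted(values)

    def percentile(p):
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "mean": sum(ordered) / len(ordered),
        "p50": percentile(50),
        "p90": percentile(90),
        "best": ordered[-1] if higher_is_better else ordered[0],
        "worst": ordered[0] if higher_is_better else ordered[-1],
        "n": len(ordered),
    }


# Hypothetical per-image scores, not values from this card.
print(summarize([18.2, 21.5, 19.9, 24.0, 16.7]))
```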
Performance
Below is the inference performance comparison (Diffusers vs Nunchaku-UNet).
- Inference config: `bf16 / steps=30 / guidance_scale=5.0`
- Resolutions (5 images each, batch=5): `1024x1024, 1024x768, 768x1024, 832x1216, 1216x832`
- Software versions: `torch 2.9 / cuda 12.8 / nunchaku 1.1.0+torch2.9 / diffusers 0.37.0.dev0`
- Optimization switches: no `torch.compile`, no explicit `cudnn` tuning flags
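The metrics reported in the tables (load, cold_infer, cold_e2e, steady-state total and average) could be collected with a harness along these lines. This is a sketch rather than the script used for this card: `benchmark`, `load_fn`, and `infer_fn` are hypothetical names, and it assumes the first inference doubles as the warmup run and that `infer_fn` only returns once GPU work has finished.

```python
import time


def benchmark(load_fn, infer_fn, steady_runs=5):
    """Time model loading, the first (cold) inference, and steady-state runs."""
    start = time.perf_counter()
    pipe = load_fn()
    load = time.perf_counter() - start

    # Cold run: first inference after loading (assumed to act as the warmup).
    start = time.perf_counter()
    infer_fn(pipe)
    cold_infer = time.perf_counter() - start

    # Steady state: consecutive runs after the cold run.
    start = time.perf_counter()
    for _ in range(steady_runs):
        infer_fn(pipe)
    steady_total = time.perf_counter() - start

    return {
        "load": load,
        "cold_infer": cold_infer,
        "cold_e2e": load + cold_infer,
        "steady_total": steady_total,
        "steady_avg": steady_total / steady_runs,
    }


# Stand-in callables so the sketch runs without a GPU.
print(benchmark(lambda: "pipeline", lambda pipe: sum(range(10_000))))
```

With a real pipeline, `load_fn` would build the Diffusers or Nunchaku pipeline and `infer_fn` would call it with the fixed config listed above.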
Cold-start performance (end-to-end for the first image)
| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | load | 3.505s | 3.432s | 1.02x | +2.1% |
| RTX 5090 | cold_infer | 2.944s | 2.447s | 1.20x | +16.9% |
| RTX 5090 | cold_e2e | 6.449s | 5.880s | 1.10x | +8.8% |
| RTX 3090 | load | 3.787s | 3.442s | 1.10x | +9.1% |
| RTX 3090 | cold_infer | 7.503s | 5.231s | 1.43x | +30.3% |
| RTX 3090 | cold_e2e | 11.290s | 8.673s | 1.30x | +23.2% |
Steady-state performance (5 consecutive images after warmup)
| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | total (5 images) | 12.937s | 9.813s | 1.32x | +24.2% |
| RTX 5090 | avg (per image) | 2.587s | 1.963s | 1.32x | +24.2% |
| RTX 3090 | total (5 images) | 33.413s | 22.975s | 1.45x | +31.2% |
| RTX 3090 | avg (per image) | 6.683s | 4.595s | 1.45x | +31.2% |
Notes:
- Load time includes extra one-time processing when loading quantized weights; in these runs it was still slightly shorter with Nunchaku than with Diffusers on both GPUs.
- During inference (cold_infer and steady-state), Nunchaku is 1.20x to 1.45x faster than Diffusers across both GPUs.
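The Speedup and Gain columns appear to follow the usual definitions, inferred here from the table values rather than stated by the card: speedup is the Diffusers time divided by the Nunchaku time, and gain is the fraction of the Diffusers time that was saved. A quick check against the RTX 5090 cold_infer row:

```python
def speedup_and_gain(baseline_s, optimized_s):
    """Return (speedup factor, percentage of baseline time saved)."""
    speedup = baseline_s / optimized_s
    gain_pct = (baseline_s - optimized_s) / baseline_s * 100.0
    return speedup, gain_pct


# RTX 5090, cold_infer: Diffusers 2.944s vs Nunchaku 2.447s
speedup, gain = speedup_and_gain(2.944, 2.447)
print(f"{speedup:.2f}x, +{gain:.1f}%")  # prints "1.20x, +16.9%"
```

Because the tables show timings rounded to milliseconds, recomputing from the displayed values can differ from the published figure in the last digit.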
Nunchaku Installation Required
- Official installation docs (recommended source of truth):
https://nunchaku.tech/docs/nunchaku/installation/installation.html
(Recommended) Install the official prebuilt wheel
- Prerequisite: `PyTorch >= 2.5` (follow the wheel requirements)
- Install Nunchaku wheel: choose a wheel matching your torch/cuda/python versions from GitHub Releases / HuggingFace / ModelScope (note `cp311` means Python 3.11): https://github.com/nunchaku-ai/nunchaku/releases
```bash
# Example (select the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```
- Tip (RTX 50 series): typically prefer `CUDA >= 12.8`, and prefer FP4 models for compatibility/performance (follow official docs).
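When picking a wheel, it helps to know which tags your environment corresponds to. The sketch below prints the CPython tag (such as `cp311`) and, if PyTorch is installed, a `torchX.Y` tag in the style of the example filename above. The helper names are hypothetical, and the tag format is inferred from that single example filename, so confirm against the actual release assets.

```python
import sys
from importlib import metadata


def python_tag():
    """CPython wheel tag for the running interpreter, e.g. cp311 for Python 3.11."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def torch_tag():
    """Return 'torchX.Y' for the installed PyTorch, or None if it is not installed."""
    try:
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None
    # Drop any local build suffix such as "+cu128" and keep major.minor only.
    major, minor = version.split("+")[0].split(".")[:2]
    return f"torch{major}.{minor}"


print("python tag:", python_tag())
print("torch tag:", torch_tag())
```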
Usage Example (Diffusers + Nunchaku UNet)
```python
import torch
from diffusers import StableDiffusionXLPipeline

from nunchaku.models.unets.unet_sdxl import NunchakuSDXLUNet2DConditionModel
from nunchaku.utils import get_precision

MODEL = "protovisionXLHighFidelity3D_V660"  # Model name; also part of the quantized weight filename
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the quantized UNet; get_precision() picks the precision tag for the current GPU
    unet = NunchakuSDXLUNet2DConditionModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors"
    )
    # Load the remaining pipeline components from the repo root and plug in the quantized UNet
    pipe = StableDiffusionXLPipeline.from_pretrained(
        REPO_ID,
        unet=unet,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
    ).to("cuda")

    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=5.0, num_inference_steps=30).images[0]
    image.save("sdxl.png")
```