# SVDQuant-SDXL

A collection of SVDQuant SDXL models (20 items).
Model repo: tonera/animagineXL40_v4Opt (repo root)
Quantized weights: tonera/animagineXL40_v4Opt/svdq-<precision>_r32-animagineXL40_v4Opt.safetensors

[Nunchaku](https://github.com/nunchaku-ai/nunchaku) is a high-performance inference engine for 4-bit (FP4/INT4) low-bit neural networks. Its goal is to significantly reduce VRAM usage and improve inference speed while preserving generation quality as much as possible. It implements and productionizes post-training quantization methods such as SVDQuant, and reduces the overhead introduced by low-rank branches via operator/kernel fusion and other optimizations.
The SDXL quantized weights in this repository (e.g. svdq-*_r32-*.safetensors) are intended to be used with Nunchaku for efficient inference on supported GPUs.
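As a minimal sketch, the weight filename in this repo is derived from the model name and the target precision (the `int4`/`fp4` values are an assumption based on Nunchaku's FP4/INT4 support; the `r32` suffix denotes the low-rank branch rank):

```python
# Sketch: deriving the weight filename used in this repo.
# Assumption: <precision> is "int4" or "fp4", matching Nunchaku's naming.
MODEL = "animagineXL40_v4Opt"

def weight_filename(precision: str) -> str:
    return f"svdq-{precision}_r32-{MODEL}.safetensors"

print(weight_filename("int4"))  # svdq-int4_r32-animagineXL40_v4Opt.safetensors
```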
Quality metrics (N=25):

| Metric | Mean | p50 | p90 | Best | Worst |
|---|---|---|---|---|---|
| PSNR ↑ | 19.3208 | 19.194 | 21.65 | 23.9493 | 15.9134 |
| SSIM ↑ | 0.670886 | 0.681078 | 0.768993 | 0.806146 | 0.374165 |
| LPIPS ↓ | 0.288783 | 0.272914 | 0.395079 | 0.105951 | 0.54417 |
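For reference, a minimal sketch of how such aggregate statistics can be computed from per-image scores. The score values below are placeholders, not the actual evaluation data:

```python
import statistics

# Placeholder per-image PSNR scores; the real evaluation used N=25 images.
scores = [16.0 + 8.0 * i / 24 for i in range(25)]

def percentile(xs, q):
    """Linear-interpolated percentile (numpy's default convention)."""
    xs = sorted(xs)
    pos = (len(xs) - 1) * q / 100
    lo, hi = int(pos), min(int(pos) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

# Note: for LPIPS (lower is better), "best" would be min() and "worst" max().
print(f"mean={statistics.fmean(scores):.4f} p50={percentile(scores, 50):.4f} "
      f"p90={percentile(scores, 90):.4f} best={max(scores):.4f} worst={min(scores):.4f}")
```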
Below is the inference performance comparison (Diffusers vs Nunchaku-UNet).
Benchmark setup:

- bf16 / steps=30 / guidance_scale=5.0
- Resolutions: 1024x1024, 1024x768, 768x1024, 832x1216, 1216x832
- torch 2.9 / cuda 12.8 / nunchaku 1.1.0+torch2.9 / diffusers 0.37.0.dev0
- torch.compile, no explicit cudnn tuning flags

Cold start (model load + first inference):

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | load | 3.505s | 3.432s | 1.02x | +2.1% |
| RTX 5090 | cold_infer | 2.944s | 2.447s | 1.20x | +16.9% |
| RTX 5090 | cold_e2e | 6.449s | 5.880s | 1.10x | +8.8% |
| RTX 3090 | load | 3.787s | 3.442s | 1.10x | +9.1% |
| RTX 3090 | cold_infer | 7.503s | 5.231s | 1.43x | +30.3% |
| RTX 3090 | cold_e2e | 11.290s | 8.673s | 1.30x | +23.2% |

Throughput (5 images):

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | total (5 images) | 12.937s | 9.813s | 1.32x | +24.2% |
| RTX 5090 | avg (per image) | 2.587s | 1.963s | 1.32x | +24.2% |
| RTX 3090 | total (5 images) | 33.413s | 22.975s | 1.45x | +31.2% |
| RTX 3090 | avg (per image) | 6.683s | 4.595s | 1.45x | +31.2% |
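The load / cold_infer / cold_e2e breakdown above can be reproduced by timing the pipeline-load and first-inference steps separately. A minimal sketch with stand-in functions (the lambdas are placeholders; substitute the real pipeline construction and generation calls):

```python
import time

def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0

# Stand-ins for the real steps; replace with pipeline load and pipe(...).
load_pipeline = lambda: "pipe"
first_inference = lambda: "image"

_, load_s = timed(load_pipeline)
_, infer_s = timed(first_inference)
print(f"load={load_s:.3f}s cold_infer={infer_s:.3f}s cold_e2e={load_s + infer_s:.3f}s")
```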
Notes:
- Installation guide: https://nunchaku.tech/docs/nunchaku/installation/installation.html
- Requires PyTorch >= 2.5 (follow the wheel requirements).
- Install a prebuilt wheel matching your torch/cuda/python versions (cp311 means Python 3.11) from https://github.com/nunchaku-ai/nunchaku/releases:

```shell
# Example (select the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```

- CUDA >= 12.8; prefer FP4 models for compatibility/performance (follow the official docs).

Usage example:

```python
import torch
from diffusers import StableDiffusionXLPipeline

from nunchaku.models.unets.unet_sdxl import NunchakuSDXLUNet2DConditionModel
from nunchaku.utils import get_precision

MODEL = "animagineXL40_v4Opt"  # Replace with the actual model name before publishing (e.g. zavychromaxl_v100)
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # get_precision() returns the quantization precision suited to the GPU.
    unet = NunchakuSDXLUNet2DConditionModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors"
    )
    pipe = StableDiffusionXLPipeline.from_pretrained(
        REPO_ID,
        unet=unet,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
    ).to("cuda")
    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=5.0, num_inference_steps=30).images[0]
    image.save("sdxl.png")
```
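For illustration, the precision choice that `nunchaku.utils.get_precision` automates can be sketched from the CUDA compute capability. The SM >= 10 threshold below is an assumption (FP4 kernels target Blackwell-class GPUs; earlier architectures fall back to INT4), and `pick_precision` is a hypothetical helper, not the library's API:

```python
# Hypothetical sketch of a precision picker; use nunchaku.utils.get_precision
# in real code. The SM >= 10 threshold for FP4 is an assumption.
def pick_precision(capability: tuple) -> str:
    major, _minor = capability
    return "fp4" if major >= 10 else "int4"

print(pick_precision((8, 6)))   # e.g. RTX 3090 (Ampere)  -> int4
print(pick_precision((12, 0)))  # e.g. RTX 5090 (Blackwell) -> fp4
```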