---
pipeline_tag: text-to-image
library_name: diffusers
tags:
  - sdxl
  - quantization
  - svdquant
  - nunchaku
  - fp4
  - int4
base_model: tonera/oneObsession_v19
base_model_relation: quantized
license: apache-2.0
---

# Model Card (SVDQuant)

> **Language**: English | [中文](README_CN.md)

## Model Name

- **Model repo**: `tonera/oneObsession_v19`
- **Base (Diffusers weights path)**: `tonera/oneObsession_v19` (repo root)
- **Quantized UNet weights**: `tonera/oneObsession_v19/svdq-<precision>_r32-oneObsession_v19.safetensors`

## Quantization / Inference Tech

- **Inference engine**: Nunchaku (`https://github.com/nunchaku-ai/nunchaku`)

Nunchaku is a high-performance inference engine for **4-bit (FP4/INT4) low-bit neural networks**. Its goal is to significantly reduce VRAM usage and improve inference speed while preserving generation quality as much as possible. It implements and productionizes post-training quantization methods such as **SVDQuant**, and reduces the overhead introduced by low-rank branches via operator/kernel fusion and other optimizations.

The SDXL quantized weights in this repository (e.g. `svdq-*_r32-*.safetensors`) are intended to be used with Nunchaku for efficient inference on supported GPUs.
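The core idea behind SVDQuant can be sketched in a few lines of NumPy: split a weight matrix into a small high-precision low-rank branch (rank 32 here, matching the `r32` in the weight filenames) plus a low-bit residual. This is a toy illustration only, not Nunchaku's actual kernels; `quant4` is a deliberately crude per-tensor 4-bit quantizer introduced for the example.

```python
import numpy as np

def quant4(x: np.ndarray) -> np.ndarray:
    """Crude symmetric 4-bit quantize/dequantize with one scale per tensor."""
    scale = np.abs(x).max() / 7
    return np.clip(np.round(x / scale), -8, 7) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)

# Low-rank branch: the top-32 singular directions stay in full precision.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
L = (U[:, :32] * S[:32]) @ Vt[:32]
R = W - L  # residual, handed to the 4-bit quantizer

W_svdq = L + quant4(R)   # SVDQuant-style: fp low-rank branch + 4-bit residual
W_naive = quant4(W)      # naive baseline: quantize the whole weight directly

err_svdq = np.abs(W - W_svdq).mean()
err_naive = np.abs(W - W_naive).mean()
print(f"naive 4-bit error: {err_naive:.4f}  svdquant-style error: {err_svdq:.4f}")
```

Because the low-rank branch absorbs the largest components, the residual's quantization scale shrinks, which is the mechanism SVDQuant exploits (Nunchaku's kernel fusion then hides the extra cost of the branch).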

## Quantization Quality (fp8)

Similarity metrics over 25 image pairs; higher PSNR/SSIM and lower LPIPS indicate closer agreement:

```text
PSNR: mean=18.1617 p50=17.5738 p90=22.6608 best=26.7009 worst=13.9146 (N=25)
SSIM: mean=0.708654 p50=0.733115 p90=0.821183 best=0.90688 worst=0.544626 (N=25)
LPIPS: mean=0.292463 p50=0.269715 p90=0.475591 best=0.0675235 worst=0.561108 (N=25)
```
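For reference, PSNR (the first metric above) is just a log-scaled mean squared error. A minimal NumPy implementation (`psnr` is a helper name introduced here; images are assumed to be in the `[0, 1]` range):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((data_range ** 2) / mse)

# Images differing by a constant 0.1 give MSE = 0.01, hence PSNR = 20 dB.
a = np.zeros((64, 64, 3))
b = np.full((64, 64, 3), 0.1)
print(round(psnr(a, b), 2))  # 20.0
```

SSIM and LPIPS are structural and perceptual metrics, respectively; in practice they are computed with libraries such as `scikit-image` and `lpips` rather than by hand.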

## Performance

Below is the inference performance comparison (Diffusers vs Nunchaku-UNet).

- **Inference config**: `bf16 / steps=30 / guidance_scale=5.0`
- **Resolutions (5 images each, batch=5)**: `1024x1024`, `1024x768`, `768x1024`, `832x1216`, `1216x832`
- **Software versions**: `torch 2.9` / `cuda 12.8` / `nunchaku 1.1.0+torch2.9` / `diffusers 0.37.0.dev0`
- **Optimization switches**: no `torch.compile`, no explicit `cudnn` tuning flags

### Cold-start performance (end-to-end for the first image)

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|-----|--------|-----------|----------|---------|------|
| RTX 5090 | load | 3.505s | 3.432s | 1.02x | +2.1% |
| RTX 5090 | cold_infer | 2.944s | 2.447s | 1.20x | +16.9% |
| RTX 5090 | cold_e2e | 6.449s | 5.880s | 1.10x | +8.8% |
| RTX 3090 | load | 3.787s | 3.442s | 1.10x | +9.1% |
| RTX 3090 | cold_infer | 7.503s | 5.231s | 1.43x | +30.3% |
| RTX 3090 | cold_e2e | 11.290s | 8.673s | 1.30x | +23.2% |

### Steady-state performance (5 consecutive images after warmup)

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|-----|--------|-----------|----------|---------|------|
| RTX 5090 | total (5 images) | 12.937s | 9.813s | 1.32x | +24.2% |
| RTX 5090 | avg (per image) | 2.587s | 1.963s | 1.32x | +24.2% |
| RTX 3090 | total (5 images) | 33.413s | 22.975s | 1.45x | +31.2% |
| RTX 3090 | avg (per image) | 6.683s | 4.595s | 1.45x | +31.2% |

**Notes**:
- The longer load time on the RTX 3090 comes from extra one-time processing when the quantized weights are loaded.
- During inference (both cold_infer and steady-state), Nunchaku delivers clear speedups on both GPUs.
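The cold-start / steady-state split above can be reproduced with a small timing harness like the following sketch (`time_calls` is a helper introduced here; when benchmarking on a GPU, wrap `fn` so it ends with `torch.cuda.synchronize()`, otherwise the timer stops before the kernels finish):

```python
import time

def time_calls(fn, n_warmup: int = 1, n_runs: int = 5):
    """Time a callable: return (cold_s, steady_avg_s).

    cold_s       -- wall time of the very first call (includes lazy init)
    steady_avg_s -- average over n_runs calls after n_warmup warmup calls
    """
    t0 = time.perf_counter()
    fn()                                  # cold call (counts as warmup #1)
    cold_s = time.perf_counter() - t0
    for _ in range(n_warmup - 1):         # remaining warmup calls, if any
        fn()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        fn()
    steady_avg_s = (time.perf_counter() - t0) / n_runs
    return cold_s, steady_avg_s
```

With `fn` set to a single pipeline call (30 steps, guidance 5.0), `cold_s` corresponds to the cold_infer rows and `steady_avg_s` to the per-image averages in the tables above.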

## Installing Nunchaku (Required)

- **Official installation docs** (recommended source of truth): `https://nunchaku.tech/docs/nunchaku/installation/installation.html`

### (Recommended) Install the official prebuilt wheel

- **Prerequisite**: `PyTorch >= 2.5` (follow the wheel requirements)
- **Install Nunchaku wheel**: choose a wheel matching your torch/cuda/python versions from GitHub Releases / HuggingFace / ModelScope (note `cp311` means Python 3.11):
  - `https://github.com/nunchaku-ai/nunchaku/releases`

```bash
# Example (select the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```
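The wheel filename encodes the Nunchaku version, the torch version it was built against, and the CPython tag. A tiny helper (introduced here purely for illustration) that reproduces the naming pattern in the URL above can make it easier to script the download:

```python
def nunchaku_wheel_name(version: str, torch_ver: str, cp_tag: str) -> str:
    """Build the expected Linux x86_64 wheel filename for a given Nunchaku
    release, torch version, and CPython tag (e.g. 'cp311' == Python 3.11)."""
    return f"nunchaku-{version}+torch{torch_ver}-{cp_tag}-{cp_tag}-linux_x86_64.whl"

print(nunchaku_wheel_name("1.1.0", "2.9", "cp311"))
# nunchaku-1.1.0+torch2.9-cp311-cp311-linux_x86_64.whl
```

Always cross-check the exact filename against the release page, since build tags can differ between releases.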

- **Tip (RTX 50 series)**: use `CUDA >= 12.8` and prefer the FP4 weights for best compatibility and performance (see the official docs).

## Usage Example (Diffusers + Nunchaku UNet)

```python
import torch
from diffusers import StableDiffusionXLPipeline

from nunchaku.models.unets.unet_sdxl import NunchakuSDXLUNet2DConditionModel
from nunchaku.utils import get_precision

MODEL = "oneObsession_v19"  # model name used in both the repo id and the weights filename
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    unet = NunchakuSDXLUNet2DConditionModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors"
    )

    pipe = StableDiffusionXLPipeline.from_pretrained(
        f"{REPO_ID}",
        unet=unet,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
    ).to("cuda")

    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=5.0, num_inference_steps=30).images[0]
    image.save("sdxl.png")
```