tonera committed on
Commit 906c235 · verified · 1 Parent(s): 0580234

Add files using upload-large-folder tool

Files changed (5):
  1. .DS_Store +0 -0
  2. .gitattributes +1 -0
  3. README.md +120 -0
  4. README_CN.md +120 -0
  5. sample.png +3 -0

.DS_Store ADDED
Binary file (6.15 kB).
 
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+sample.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,120 @@
---
pipeline_tag: text-to-image
library_name: diffusers
tags:
- Z-Image
- quantization
- svdquant
- nunchaku
- fp4
- int4
base_model: tonera/Beyond_Reality_Zimage_v2_svdq
base_model_relation: quantized
license: apache-2.0
---

# Model Card (SVDQuant)

> **Language**: English | [中文](README_CN.md)

## Model name

- **Model repo**: `tonera/Beyond_Reality_Zimage_v2_svdq`
- **Base (Diffusers weights path)**: `tonera/Beyond_Reality_Zimage_v2_svdq` (repo root)
- **Quantized Transformer weights**: `tonera/Beyond_Reality_Zimage_v2_svdq/svdq-<precision>_r32-Beyond_Reality_Zimage_v2_svdq.safetensors`
- **Original model**:
  [huggingface](https://huggingface.co/Nurburgring/BEYOND_REALITY_Z_IMAGE)
  [modelscope](https://modelscope.cn/models/Nurburgring/BEYOND_REALITY_Z_IMAGE)

![vitoom](sample.png)

## Quantization / inference tech

- **Inference engine**: Nunchaku (`https://github.com/nunchaku-ai/nunchaku`)

Nunchaku is a high-performance inference engine for **4-bit (FP4/INT4) low-bit neural networks**. Its goal is to significantly reduce VRAM usage and accelerate inference while preserving generation quality as far as possible. It implements and productionizes post-training quantization methods such as **SVDQuant**, and reduces the overhead of the low-rank branches via operator/kernel fusion and other optimizations.

The Z-Image quantized weights in this repo (e.g. `svdq-*_r32-*.safetensors`) are designed to be used with Nunchaku for efficient inference on supported GPUs.

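The core idea of SVDQuant can be illustrated with a few lines of NumPy. This is a hedged toy sketch, not Nunchaku's implementation (the real method also migrates activation outliers via smoothing and relies on fused kernels): a rank-32 branch is kept in high precision (the `r32` in the weight filenames refers to such a rank), and only the residual is quantized to 4 bits.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))  # stand-in for a linear layer's weight

# 1) Peel off a rank-32 component via SVD; this branch stays in high precision.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 32
L = (U[:, :r] * S[:r]) @ Vt[:r]

# 2) Quantize the residual to symmetric int4 with a per-tensor scale.
R = W - L
scale = np.abs(R).max() / 7.0
Q = np.clip(np.round(R / scale), -8, 7)

# 3) Reconstruct: high-precision low-rank branch + dequantized 4-bit residual.
W_hat = L + Q * scale
err_svdq = np.linalg.norm(W - W_hat)

# Baseline: quantizing W directly to int4 loses more precision, because the
# largest entries force a coarser quantization step for the whole tensor.
scale_w = np.abs(W).max() / 7.0
W_naive = np.clip(np.round(W / scale_w), -8, 7) * scale_w
err_naive = np.linalg.norm(W - W_naive)

print(err_svdq < err_naive)
```

On this random example the low-rank branch absorbs the dominant singular directions, so the residual quantizes with a finer step and lower reconstruction error.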
## Quantization quality (fp4)

| Metric | mean | p50 | p90 | best | worst | N |
|--------|------|-----|-----|------|-------|---|
| PSNR ↑ | 15.0697 | 14.8491 | 17.1213 | 18.3484 | 11.7532 | 15 |
| SSIM ↑ | 0.604458 | 0.594962 | 0.724261 | 0.739746 | 0.436558 | 15 |
| LPIPS ↓ | 0.317187 | 0.30015 | 0.407988 | 0.191258 | 0.477386 | 15 |

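For reference, PSNR (the first metric above) compares a quantized model's output image against the full-precision baseline pixel by pixel. A minimal sketch of the computation (my own helper, not the script that produced the numbers above):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range**2 / mse)

# Toy check: an identical copy scores infinity, a noisy copy a finite value.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float64)
noisy = np.clip(ref + rng.normal(0.0, 25.0, ref.shape), 0, 255)
print(psnr(ref, ref))   # inf
print(f"{psnr(ref, noisy):.1f} dB")
```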
## Performance

- **Config**: `bf16 / steps=9 / guidance_scale=0.0`
- **Resolutions (5 images)**: `1024x1024`, `1216x832`, `1344x768`, `832x1216`, `768x1344`

### Cold start (end-to-end for the first image)

| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|-----|-----------|--------|-----------|----------|---------|------|
| RTX 5090 | fp4 | load | 4.911s | 13.500s | 0.36x | -174.9% |
| RTX 5090 | fp4 | cold_infer | 3.945s | 2.275s | 1.73x | +42.3% |
| RTX 5090 | fp4 | cold_e2e | 8.856s | 15.775s | 0.56x | -78.1% |
| RTX 3090 | int4 | load | 6.934s | 15.971s | 0.43x | -130.3% |
| RTX 3090 | int4 | cold_infer | 10.203s | 5.178s | 1.97x | +49.3% |
| RTX 3090 | int4 | cold_e2e | 17.137s | 21.149s | 0.81x | -23.4% |

### After warmup (5 consecutive images)

| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|-----|-----------|--------|-----------|----------|---------|------|
| RTX 5090 | fp4 | total (5 images) | 17.416s | 9.266s | 1.88x | +46.8% |
| RTX 5090 | fp4 | avg (per image) | 3.483s | 1.853s | 1.88x | +46.8% |
| RTX 3090 | int4 | total (5 images) | 48.863s | 24.114s | 2.03x | +50.6% |
| RTX 3090 | int4 | avg (per image) | 9.773s | 4.823s | 2.03x | +50.6% |

**Notes**:
- On both GPUs, Nunchaku provides clear speedups during inference (`cold_infer` and the post-warmup runs).
- In this benchmark, Nunchaku is slower for `load`; post-warmup throughput is the more meaningful number.

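For clarity, the `speedup` and `gain` columns are derived from the two timing columns as follows (recomputing two rows of the tables above):

```python
def speedup(baseline_s: float, optimized_s: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s

def gain_pct(baseline_s: float, optimized_s: float) -> float:
    """Relative time saved, as a percentage (negative = slower)."""
    return (1.0 - optimized_s / baseline_s) * 100.0

# RTX 5090 / fp4 / total (5 images): Diffusers 17.416s vs Nunchaku 9.266s
print(f"{speedup(17.416, 9.266):.2f}x")     # 1.88x
print(f"{gain_pct(17.416, 9.266):+.1f}%")   # +46.8%

# RTX 5090 / fp4 / load: Diffusers 4.911s vs Nunchaku 13.500s
print(f"{speedup(4.911, 13.500):.2f}x")     # 0.36x (loading is slower)
print(f"{gain_pct(4.911, 13.500):+.1f}%")   # -174.9%
```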
## Nunchaku is required

- **Official installation docs** (recommended source of truth): `https://nunchaku.tech/docs/nunchaku/installation/installation.html`

### (Recommended) Install the official prebuilt wheel

- **Prerequisite**: `PyTorch >= 2.5` (follow the wheel's requirements)
- **Install a matching nunchaku wheel** from GitHub Releases / HuggingFace / ModelScope (note: `cp311` means Python 3.11):
  - `https://github.com/nunchaku-ai/nunchaku/releases`

```bash
# Example (pick the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```

- **Tip (RTX 50 series)**: CUDA `>= 12.8` is often recommended, and FP4 models are usually preferred for better compatibility/performance (follow the official docs).

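To see which wheel tags match your machine, you can query the interpreter directly (a generic sketch; the exact wheel filename must still come from the release page):

```shell
# Print the Python ABI tag (cp311 = Python 3.11) and the platform,
# which must match the corresponding parts of the wheel filename.
python3 - <<'PY'
import sys, platform
print(f"cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. cp311
print(sys.platform, platform.machine())                       # e.g. linux x86_64
PY
```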
## Usage (Diffusers + Nunchaku Transformer)

```python
import torch

from diffusers import ZImagePipeline
from nunchaku import NunchakuZImageTransformer2DModel
from nunchaku.utils import get_precision

MODEL = "Beyond_Reality_Zimage_v2_svdq"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the 4-bit quantized transformer (fp4 or int4, chosen by get_precision()).
    transformer = NunchakuZImageTransformer2DModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors",
        torch_dtype=torch.bfloat16,
    )

    # Build the full pipeline around the quantized transformer.
    pipe = ZImagePipeline.from_pretrained(
        REPO_ID,
        torch_dtype=torch.bfloat16,
        transformer=transformer,
    ).to("cuda")

    prompt = "a cat holding a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=0.0, num_inference_steps=9).images[0]
    image.save("beyond-reality-zimage-v2-svdq.png")
```
README_CN.md ADDED
@@ -0,0 +1,120 @@
---
pipeline_tag: text-to-image
library_name: diffusers
tags:
- Z-Image
- quantization
- svdquant
- nunchaku
- fp4
- int4
base_model: tonera/Beyond_Reality_Zimage_v2_svdq
base_model_relation: quantized
license: apache-2.0
---

# Model Card (SVDQuant)

> **Language**: Chinese | [English](README.md)

## Model name

- **Model repo**: `tonera/Beyond_Reality_Zimage_v2_svdq`
- **Base (Diffusers weights path)**: `tonera/Beyond_Reality_Zimage_v2_svdq` (repo root)
- **Quantized Transformer weights**: `tonera/Beyond_Reality_Zimage_v2_svdq/svdq-<precision>_r32-Beyond_Reality_Zimage_v2_svdq.safetensors`
- **Original model**:
  [huggingface](https://huggingface.co/Nurburgring/BEYOND_REALITY_Z_IMAGE)
  [modelscope](https://modelscope.cn/models/Nurburgring/BEYOND_REALITY_Z_IMAGE)

![vitoom](sample.png)

## Quantization / inference tech

- **Inference engine**: Nunchaku (`https://github.com/nunchaku-ai/nunchaku`)

Nunchaku is a high-performance inference engine for **4-bit (FP4/INT4) low-bit neural networks**. Its core goal is to significantly reduce VRAM usage and speed up inference while preserving generation quality as far as possible. It implements and productionizes post-training quantization schemes such as **SVDQuant**, and reduces the extra overhead of the low-rank branches via operator/kernel fusion and other optimizations.

The Z-Image quantized weights in this repo (e.g. `svdq-*_r32-*.safetensors`) are meant to be used with Nunchaku for efficient inference on supported GPUs.

## Quantization quality (fp4)

| Metric | mean | p50 | p90 | best | worst | N |
|--------|------|-----|-----|------|-------|---|
| PSNR ↑ | 15.0697 | 14.8491 | 17.1213 | 18.3484 | 11.7532 | 15 |
| SSIM ↑ | 0.604458 | 0.594962 | 0.724261 | 0.739746 | 0.436558 | 15 |
| LPIPS ↓ | 0.317187 | 0.30015 | 0.407988 | 0.191258 | 0.477386 | 15 |

## Performance

- **Config**: `bf16 / steps=9 / guidance_scale=0.0`
- **Resolutions (5 images)**: `1024x1024`, `1216x832`, `1344x768`, `832x1216`, `768x1344`

### Cold start (end-to-end for the first image)

| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|-----|-----------|--------|-----------|----------|---------|------|
| RTX 5090 | fp4 | load | 4.911s | 13.500s | 0.36x | -174.9% |
| RTX 5090 | fp4 | cold_infer | 3.945s | 2.275s | 1.73x | +42.3% |
| RTX 5090 | fp4 | cold_e2e | 8.856s | 15.775s | 0.56x | -78.1% |
| RTX 3090 | int4 | load | 6.934s | 15.971s | 0.43x | -130.3% |
| RTX 3090 | int4 | cold_infer | 10.203s | 5.178s | 1.97x | +49.3% |
| RTX 3090 | int4 | cold_e2e | 17.137s | 21.149s | 0.81x | -23.4% |

### After warmup (5 consecutive images)

| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|-----|-----------|--------|-----------|----------|---------|------|
| RTX 5090 | fp4 | total (5 images) | 17.416s | 9.266s | 1.88x | +46.8% |
| RTX 5090 | fp4 | avg (per image) | 3.483s | 1.853s | 1.88x | +46.8% |
| RTX 3090 | int4 | total (5 images) | 48.863s | 24.114s | 2.03x | +50.6% |
| RTX 3090 | int4 | avg (per image) | 9.773s | 4.823s | 2.03x | +50.6% |

**Notes**:
- On both GPUs, Nunchaku shows clear speedups during inference (`cold_infer` and the post-warmup runs).
- In this benchmark, Nunchaku is slower in the `load` stage; post-warmup throughput is the more meaningful number.

## Nunchaku is required

- **Official installation docs** (recommended source of truth): `https://nunchaku.tech/docs/nunchaku/installation/installation.html`

### (Recommended) Install the official prebuilt wheel

- **Prerequisite**: `PyTorch >= 2.5` (follow the requirements of the specific wheel)
- **Install a matching nunchaku wheel** from GitHub Releases / HuggingFace / ModelScope, picking the one that matches your environment (note: `cp311` means Python 3.11):
  - `https://github.com/nunchaku-ai/nunchaku/releases`

```bash
# Example (pick the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```

- **Tip (RTX 50 series)**: CUDA `>= 12.8` is usually recommended, and FP4 models are preferred for better compatibility and performance (follow the official docs).

## Usage (Diffusers + Nunchaku Transformer)

```python
import torch

from diffusers import ZImagePipeline
from nunchaku import NunchakuZImageTransformer2DModel
from nunchaku.utils import get_precision

MODEL = "Beyond_Reality_Zimage_v2_svdq"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the 4-bit quantized transformer (fp4 or int4, chosen by get_precision()).
    transformer = NunchakuZImageTransformer2DModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors",
        torch_dtype=torch.bfloat16,
    )

    # Build the full pipeline around the quantized transformer.
    pipe = ZImagePipeline.from_pretrained(
        REPO_ID,
        torch_dtype=torch.bfloat16,
        transformer=transformer,
    ).to("cuda")

    prompt = "a cat holding a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=0.0, num_inference_steps=9).images[0]
    image.save("beyond-reality-zimage-v2-svdq.png")
```
sample.png ADDED

Git LFS Details

  • SHA256: 2780dac02d3c43a63af215e1e386bb1573c2737f843a24403732fb7c8006d559
  • Pointer size: 131 Bytes
  • Size of remote file: 188 kB