tonera committed on
Commit 906c235 · verified · 1 Parent(s): 0580234

Add files using upload-large-folder tool

Files changed (5):
  1. .DS_Store +0 -0
  2. .gitattributes +1 -0
  3. README.md +120 -0
  4. README_CN.md +120 -0
  5. sample.png +3 -0

.DS_Store ADDED
Binary file (6.15 kB).
 
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+sample.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,120 @@
---
pipeline_tag: text-to-image
library_name: diffusers
tags:
- Z-Image
- quantization
- svdquant
- nunchaku
- fp4
- int4
base_model: tonera/Beyond_Reality_Zimage_v2_svdq
base_model_relation: quantized
license: apache-2.0
---

# Model Card (SVDQuant)

> **Language**: English | [中文](README_CN.md)

## Model name

- **Model repo**: `tonera/Beyond_Reality_Zimage_v2_svdq`
- **Base (Diffusers weights path)**: `tonera/Beyond_Reality_Zimage_v2_svdq` (repo root)
- **Quantized Transformer weights**: `tonera/Beyond_Reality_Zimage_v2_svdq/svdq-<precision>_r32-Beyond_Reality_Zimage_v2_svdq.safetensors`
- **Original model**:
  [huggingface](https://huggingface.co/Nurburgring/BEYOND_REALITY_Z_IMAGE)
  [modelscope](https://modelscope.cn/models/Nurburgring/BEYOND_REALITY_Z_IMAGE)

![vitoom](sample.png)

## Quantization / inference tech

- **Inference engine**: Nunchaku (`https://github.com/nunchaku-ai/nunchaku`)

Nunchaku is a high-performance inference engine for **4-bit (FP4/INT4) low-bit neural networks**. Its goal is to significantly reduce VRAM usage and accelerate inference while preserving generation quality as far as possible. It implements and productionizes post-training quantization methods such as **SVDQuant**, and reduces the overhead of the low-rank branches via operator/kernel fusion and other optimizations.

The Z-Image quantized weights in this repo (e.g. `svdq-*_r32-*.safetensors`) are designed to be used with Nunchaku for efficient inference on supported GPUs.

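The core idea of SVDQuant can be illustrated with a few lines of NumPy. This is a hedged toy sketch, not Nunchaku's implementation (the real method also migrates activation outliers via smoothing and relies on fused kernels): a rank-32 branch is kept in high precision (the `r32` in the weight filenames refers to such a rank), and only the residual is quantized to 4 bits.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))  # stand-in for a linear layer's weight

# 1) Peel off a rank-32 component via SVD; this branch stays in high precision.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 32
L = (U[:, :r] * S[:r]) @ Vt[:r]

# 2) Quantize the residual to symmetric int4 with a per-tensor scale.
R = W - L
scale = np.abs(R).max() / 7.0
Q = np.clip(np.round(R / scale), -8, 7)

# 3) Reconstruct: high-precision low-rank branch + dequantized 4-bit residual.
W_hat = L + Q * scale
err_svdq = np.linalg.norm(W - W_hat)

# Baseline: quantizing W directly to int4 loses more precision, because the
# largest entries force a coarser quantization step for the whole tensor.
scale_w = np.abs(W).max() / 7.0
W_naive = np.clip(np.round(W / scale_w), -8, 7) * scale_w
err_naive = np.linalg.norm(W - W_naive)

print(err_svdq < err_naive)
```

On this random example the low-rank branch absorbs the dominant singular directions, so the residual quantizes with a finer step and lower reconstruction error.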
## Quantization quality (fp4)

| Metric | mean | p50 | p90 | best | worst | N |
|--------|------|-----|-----|------|-------|---|
| PSNR ↑ | 15.0697 | 14.8491 | 17.1213 | 18.3484 | 11.7532 | 15 |
| SSIM ↑ | 0.604458 | 0.594962 | 0.724261 | 0.739746 | 0.436558 | 15 |
| LPIPS ↓ | 0.317187 | 0.30015 | 0.407988 | 0.191258 | 0.477386 | 15 |

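For reference, PSNR (the first metric above) compares a quantized model's output image against the full-precision baseline pixel by pixel. A minimal sketch of the computation (my own helper, not the script that produced the numbers above):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range**2 / mse)

# Toy check: an identical copy scores infinity, a noisy copy a finite value.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float64)
noisy = np.clip(ref + rng.normal(0.0, 25.0, ref.shape), 0, 255)
print(psnr(ref, ref))   # inf
print(f"{psnr(ref, noisy):.1f} dB")
```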
## Performance

- **Config**: `bf16 / steps=9 / guidance_scale=0.0`
- **Resolutions (5 images)**: `1024x1024`, `1216x832`, `1344x768`, `832x1216`, `768x1344`

### Cold start (end-to-end for the first image)

| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|-----|-----------|--------|-----------|----------|---------|------|
| RTX 5090 | fp4 | load | 4.911s | 13.500s | 0.36x | -174.9% |
| RTX 5090 | fp4 | cold_infer | 3.945s | 2.275s | 1.73x | +42.3% |
| RTX 5090 | fp4 | cold_e2e | 8.856s | 15.775s | 0.56x | -78.1% |
| RTX 3090 | int4 | load | 6.934s | 15.971s | 0.43x | -130.3% |
| RTX 3090 | int4 | cold_infer | 10.203s | 5.178s | 1.97x | +49.3% |
| RTX 3090 | int4 | cold_e2e | 17.137s | 21.149s | 0.81x | -23.4% |

### After warmup (5 consecutive images)

| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|-----|-----------|--------|-----------|----------|---------|------|
| RTX 5090 | fp4 | total (5 images) | 17.416s | 9.266s | 1.88x | +46.8% |
| RTX 5090 | fp4 | avg (per image) | 3.483s | 1.853s | 1.88x | +46.8% |
| RTX 3090 | int4 | total (5 images) | 48.863s | 24.114s | 2.03x | +50.6% |
| RTX 3090 | int4 | avg (per image) | 9.773s | 4.823s | 2.03x | +50.6% |

**Notes**:
- On both GPUs, Nunchaku provides clear speedups during inference (`cold_infer` and the post-warmup runs).
- In this benchmark, Nunchaku is slower for `load`; post-warmup throughput is the more meaningful number.

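For clarity, the `speedup` and `gain` columns are derived from the two timing columns as follows (recomputing two rows of the tables above):

```python
def speedup(baseline_s: float, optimized_s: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s

def gain_pct(baseline_s: float, optimized_s: float) -> float:
    """Relative time saved, as a percentage (negative = slower)."""
    return (1.0 - optimized_s / baseline_s) * 100.0

# RTX 5090 / fp4 / total (5 images): Diffusers 17.416s vs Nunchaku 9.266s
print(f"{speedup(17.416, 9.266):.2f}x")     # 1.88x
print(f"{gain_pct(17.416, 9.266):+.1f}%")   # +46.8%

# RTX 5090 / fp4 / load: Diffusers 4.911s vs Nunchaku 13.500s
print(f"{speedup(4.911, 13.500):.2f}x")     # 0.36x (loading is slower)
print(f"{gain_pct(4.911, 13.500):+.1f}%")   # -174.9%
```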
## Nunchaku is required

- **Official installation docs** (recommended source of truth): `https://nunchaku.tech/docs/nunchaku/installation/installation.html`

### (Recommended) Install the official prebuilt wheel

- **Prerequisite**: `PyTorch >= 2.5` (follow the wheel's requirements)
- **Install a matching nunchaku wheel** from GitHub Releases / HuggingFace / ModelScope (note: `cp311` means Python 3.11):
  - `https://github.com/nunchaku-ai/nunchaku/releases`

```bash
# Example (pick the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```

- **Tip (RTX 50 series)**: CUDA `>= 12.8` is often recommended, and FP4 models are usually preferred for better compatibility/performance (follow the official docs).

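To see which wheel tags match your machine, you can query the interpreter directly (a generic sketch; the exact wheel filename must still come from the release page):

```shell
# Print the Python ABI tag (cp311 = Python 3.11) and the platform,
# which must match the corresponding parts of the wheel filename.
python3 - <<'PY'
import sys, platform
print(f"cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. cp311
print(sys.platform, platform.machine())                       # e.g. linux x86_64
PY
```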
## Usage (Diffusers + Nunchaku Transformer)

```python
import torch

from diffusers import ZImagePipeline
from nunchaku import NunchakuZImageTransformer2DModel
from nunchaku.utils import get_precision

MODEL = "Beyond_Reality_Zimage_v2_svdq"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the 4-bit quantized transformer (fp4 or int4, chosen by get_precision()).
    transformer = NunchakuZImageTransformer2DModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors",
        torch_dtype=torch.bfloat16,
    )

    # Build the full pipeline around the quantized transformer.
    pipe = ZImagePipeline.from_pretrained(
        REPO_ID,
        torch_dtype=torch.bfloat16,
        transformer=transformer,
    ).to("cuda")

    prompt = "a cat holding a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=0.0, num_inference_steps=9).images[0]
    image.save("beyond-reality-zimage-v2-svdq.png")
```
README_CN.md ADDED
@@ -0,0 +1,120 @@
---
pipeline_tag: text-to-image
library_name: diffusers
tags:
- Z-Image
- quantization
- svdquant
- nunchaku
- fp4
- int4
base_model: tonera/Beyond_Reality_Zimage_v2_svdq
base_model_relation: quantized
license: apache-2.0
---

# Model Card (SVDQuant)

> **Language**: Chinese | [English](README.md)

## Model name

- **Model repo**: `tonera/Beyond_Reality_Zimage_v2_svdq`
- **Base (Diffusers weights path)**: `tonera/Beyond_Reality_Zimage_v2_svdq` (repo root)
- **Quantized Transformer weights**: `tonera/Beyond_Reality_Zimage_v2_svdq/svdq-<precision>_r32-Beyond_Reality_Zimage_v2_svdq.safetensors`
- **Original model**:
  [huggingface](https://huggingface.co/Nurburgring/BEYOND_REALITY_Z_IMAGE)
  [modelscope](https://modelscope.cn/models/Nurburgring/BEYOND_REALITY_Z_IMAGE)

![vitoom](sample.png)

## Quantization / inference tech

- **Inference engine**: Nunchaku (`https://github.com/nunchaku-ai/nunchaku`)

Nunchaku is a high-performance inference engine for **4-bit (FP4/INT4) low-bit neural networks**. Its core goal is to significantly reduce VRAM usage and speed up inference while preserving generation quality as far as possible. It implements and productionizes post-training quantization schemes such as **SVDQuant**, and reduces the extra overhead of the low-rank branches via operator/kernel fusion and other optimizations.

The Z-Image quantized weights in this repo (e.g. `svdq-*_r32-*.safetensors`) are meant to be used with Nunchaku for efficient inference on supported GPUs.

## Quantization quality (fp4)

| Metric | mean | p50 | p90 | best | worst | N |
|--------|------|-----|-----|------|-------|---|
| PSNR ↑ | 15.0697 | 14.8491 | 17.1213 | 18.3484 | 11.7532 | 15 |
| SSIM ↑ | 0.604458 | 0.594962 | 0.724261 | 0.739746 | 0.436558 | 15 |
| LPIPS ↓ | 0.317187 | 0.30015 | 0.407988 | 0.191258 | 0.477386 | 15 |

## Performance

- **Config**: `bf16 / steps=9 / guidance_scale=0.0`
- **Resolutions (5 images)**: `1024x1024`, `1216x832`, `1344x768`, `832x1216`, `768x1344`

### Cold start (end-to-end for the first image)

| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|-----|-----------|--------|-----------|----------|---------|------|
| RTX 5090 | fp4 | load | 4.911s | 13.500s | 0.36x | -174.9% |
| RTX 5090 | fp4 | cold_infer | 3.945s | 2.275s | 1.73x | +42.3% |
| RTX 5090 | fp4 | cold_e2e | 8.856s | 15.775s | 0.56x | -78.1% |
| RTX 3090 | int4 | load | 6.934s | 15.971s | 0.43x | -130.3% |
| RTX 3090 | int4 | cold_infer | 10.203s | 5.178s | 1.97x | +49.3% |
| RTX 3090 | int4 | cold_e2e | 17.137s | 21.149s | 0.81x | -23.4% |

### After warmup (5 consecutive images)

| GPU | precision | metric | Diffusers | Nunchaku | speedup | gain |
|-----|-----------|--------|-----------|----------|---------|------|
| RTX 5090 | fp4 | total (5 images) | 17.416s | 9.266s | 1.88x | +46.8% |
| RTX 5090 | fp4 | avg (per image) | 3.483s | 1.853s | 1.88x | +46.8% |
| RTX 3090 | int4 | total (5 images) | 48.863s | 24.114s | 2.03x | +50.6% |
| RTX 3090 | int4 | avg (per image) | 9.773s | 4.823s | 2.03x | +50.6% |

**Notes**:
- On both GPUs, Nunchaku shows clear speedups during inference (`cold_infer` and the post-warmup runs).
- In this benchmark, Nunchaku is slower in the `load` stage; post-warmup throughput is the more meaningful number.

## Nunchaku is required

- **Official installation docs** (recommended source of truth): `https://nunchaku.tech/docs/nunchaku/installation/installation.html`

### (Recommended) Install the official prebuilt wheel

- **Prerequisite**: `PyTorch >= 2.5` (follow the requirements of the specific wheel)
- **Install a matching nunchaku wheel** from GitHub Releases / HuggingFace / ModelScope, picking the one that matches your environment (note: `cp311` means Python 3.11):
  - `https://github.com/nunchaku-ai/nunchaku/releases`

```bash
# Example (pick the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```

- **Tip (RTX 50 series)**: CUDA `>= 12.8` is usually recommended, and FP4 models are preferred for better compatibility and performance (follow the official docs).

## Usage (Diffusers + Nunchaku Transformer)

```python
import torch

from diffusers import ZImagePipeline
from nunchaku import NunchakuZImageTransformer2DModel
from nunchaku.utils import get_precision

MODEL = "Beyond_Reality_Zimage_v2_svdq"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the 4-bit quantized transformer (fp4 or int4, chosen by get_precision()).
    transformer = NunchakuZImageTransformer2DModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors",
        torch_dtype=torch.bfloat16,
    )

    # Build the full pipeline around the quantized transformer.
    pipe = ZImagePipeline.from_pretrained(
        REPO_ID,
        torch_dtype=torch.bfloat16,
        transformer=transformer,
    ).to("cuda")

    prompt = "a cat holding a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=0.0, num_inference_steps=9).images[0]
    image.save("beyond-reality-zimage-v2-svdq.png")
```
sample.png ADDED

Git LFS Details

  • SHA256: 2780dac02d3c43a63af215e1e386bb1573c2737f843a24403732fb7c8006d559
  • Pointer size: 131 Bytes
  • Size of remote file: 188 kB