Vitoom Nunchaku (Extended)
Community-maintained fork of Nunchaku. Nunchaku is an SVDQuant inference engine for 4-bit diffusion models (paper: SVDQuant). The upstream project has not been actively maintained for a long time. This fork keeps the original capabilities while extending model and quantization support, and distributes prebuilt wheels on Hugging Face.
Official docs remain useful as a baseline: Nunchaku Docs · DeepCompressor
Extensions Over Upstream
| Category | Description |
|---|---|
| New model: Chroma | Quantized Chroma diffusion Transformer inference (NunchakuChromaTransformer2dModel) |
| New model: FLUX.2 Klein | SVDQuant inference for the FLUX.2 Klein family (4B / 9B, etc.) (NunchakuFlux2Transformer2DModel) |
| Qwen-Image 8-bit quantization | W8A16 / int8 inference for the Qwen-Image backbone, in addition to the existing 4-bit path |
| Text encoder quantization | Quantized text encoder runtimes for Qwen2.5-VL and Qwen3 (NunchakuQwen2VLTextEncoderModel, NunchakuQwen3TextEncoderModel, etc.) |
Prebuilt Wheels
Current releases: 1.3.0.dev20260622 through 1.3.0.dev20260629, built for PyTorch 2.11 and 2.12.
Wheels are published on Hugging Face:
https://huggingface.co/tonera/vitoom-nunchaku
x86_64 (Linux)
For typical x86 servers and workstations (consumer RTX 30 / 40 / 50 GPUs and compatible datacenter cards).
| File | Python | CUDA / Torch |
|---|---|---|
| nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp310-cp310-linux_x86_64.whl | 3.10 | cu128 + torch 2.11 |
| nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl | 3.11 | cu128 + torch 2.11 |
| nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp310-cp310-linux_x86_64.whl | 3.10 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_x86_64.whl | 3.11 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260624+cu13.0torch2.12-cp313-cp313-linux_x86_64.whl | 3.13 | cu130 + torch 2.12 |
cu128 wheels: sm_75 / 80 / 86 / 89 / 120a — suitable for RTX 30 / 40 / 50 series.
cu130 wheels: all of the above plus sm_121a (GB10 / DGX Spark, etc.), while remaining compatible with RTX 50 series.
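The architecture notes above can be turned into a quick lookup. The following is a minimal sketch, not part of the nunchaku package: the `SM_TARGETS` table and the `cuda_variants_for` helper are hypothetical names, and the architecture lists are copied from the text above rather than read from the wheels. It only checks for an exact match against the listed targets and does not model any forward compatibility between architectures.

```python
# Hypothetical helper: SM lists are transcribed from the notes above,
# not extracted from the wheel binaries.
SM_TARGETS = {
    "cu128": {"75", "80", "86", "89", "120a"},
    "cu130": {"75", "80", "86", "89", "120a", "121a"},
}


def cuda_variants_for(capability):
    """Return the CUDA variants whose listed targets include (major, minor)."""
    major, minor = capability
    sm = f"{major}{minor}"
    # Entries such as "120a" carry an architecture-specific suffix;
    # strip it so that (12, 0) compares equal to "120a".
    return sorted(
        variant
        for variant, targets in SM_TARGETS.items()
        if sm in {target.rstrip("a") for target in targets}
    )


if __name__ == "__main__":
    try:
        import torch  # optional: only needed to query the local GPU
    except ImportError:
        print("torch is not installed; pass a capability tuple manually")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print(cap, "->", cuda_variants_for(cap))
```

An empty result means the GPU's compute capability is not among the listed targets, in which case neither wheel family is documented as supporting it.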
aarch64 (Linux ARM64)
| File | Python | CUDA / Torch |
|---|---|---|
| nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_aarch64.whl | 3.11 | cu130 + torch 2.11 |
For ARM64 + CUDA 13.0 environments (e.g. GB10 / DGX Spark clusters). This wheel cannot be installed on x86 machines.
Windows (win_amd64)
For Windows workstations (consumer RTX 30 / 40 / 50 GPUs). Requires CUDA 13.0.
| File | Python | CUDA / Torch |
|---|---|---|
| nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp311-cp311-win_amd64.whl | 3.11 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp312-cp312-win_amd64.whl | 3.12 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp313-cp313-win_amd64.whl | 3.13 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp311-cp311-win_amd64.whl | 3.11 | cu130 + torch 2.12 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp312-cp312-win_amd64.whl | 3.12 | cu130 + torch 2.12 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp313-cp313-win_amd64.whl | 3.13 | cu130 + torch 2.12 |
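The wheel filenames in the tables above all follow one pattern: release, then a local version tag of the form `+cu<CUDA>torch<Torch>`, then the Python tag, ABI tag, and platform tag. As an illustration only, the sketch below splits a filename into those parts. `WHEEL_RE` and `parse_wheel_name` are hypothetical names, and the pattern is inferred from the filenames listed here, so it may not cover future naming changes.

```python
import re

# Pattern inferred from the filenames listed in this README (an assumption,
# not an official naming specification).
WHEEL_RE = re.compile(
    r"^nunchaku-(?P<release>[^+]+)"
    r"\+cu(?P<cuda>\d+\.\d+)torch(?P<torch>\d+\.\d+)"
    r"-(?P<python>cp\d+)-(?P<abi>cp\d+)"
    r"-(?P<platform>[A-Za-z0-9_]+)\.whl$"
)


def parse_wheel_name(filename):
    """Split a vitoom-nunchaku wheel filename into its tagged components."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel filename: {filename!r}")
    return match.groupdict()


if __name__ == "__main__":
    name = "nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp313-cp313-win_amd64.whl"
    for key, value in parse_wheel_name(name).items():
        print(f"{key:9s} {value}")
```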
Use via Vitoom (Recommended)
If you only need image generation acceleration in production and prefer not to manage Python/CUDA/wheel versions yourself, use Vitoom instead of installing these wheels manually.
Vitoom is a Docker-based, locally deployable AIGC platform. Through a browser you can run text, image, audio, and video inference on your own hardware (RTX 30 / 40 / 50, DGX Spark, GB10, etc.). Its visual inference module ships with vitoom-nunchaku pre-integrated, so Chroma, FLUX.2 Klein, Qwen-Image quantization, and related features are available out of the box in the Web UI after deployment.
Quick start (see the Vitoom README for full details):
git clone https://github.com/tonera/vitoom.git
cd vitoom
python scripts/setup_vitoom.py
python scripts/load_vitoom_images.py
docker compose up -d backend
docker compose -f docker-compose.inference.release.yml --profile visual up -d
Then open http://<LAN-IP>:8888 in your browser. Download models from the Web UI or via python scripts/download_initial_models.py.
| Approach | Best for |
|---|---|
| Install Vitoom | End users who want a ready-to-use AIGC platform; no manual pip install nunchaku |
| Install wheel from this repo | Developers integrating vitoom-nunchaku into their own Python environment, scripts, or custom pipelines |
Deployment guides: docker-usage-en.md · docker-usage-cn.md
Installation (Manual Wheel)
After downloading the wheel that matches your platform and Python version from Hugging Face:
pip install nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl
Or use the Hugging Face CLI:
hf download tonera/vitoom-nunchaku \
nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl \
--local-dir ./wheels
pip install ./wheels/nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl
Verify
python -c "import nunchaku; print(nunchaku.__version__)"
Which Wheel to Choose
| Scenario | Recommended wheel |
|---|---|
| x86 Linux + RTX 4090 / 5090, PyTorch cu128 | linux_x86_64 + +cu12.8torch2.11 + matching cp tag |
| x86 Linux + GB10, or sm_121a required | linux_x86_64 + +cu13.0torch2.11 + matching cp tag |
| x86 Linux + Python 3.13 + PyTorch 2.12 | dev20260624 + +cu13.0torch2.12-cp313 + linux_x86_64 |
| ARM64 GB10 cluster, Python 3.11 | linux_aarch64 + +cu13.0torch2.11-cp311 |
| Windows + RTX 30 / 40 / 50, Python 3.11–3.13 | win_amd64 + +cu13.0torch2.11 or +cu13.0torch2.12 + matching cp tag |
Notes:
- Python tag (cp310 / cp311 / cp312 / cp313) must match your runtime.
- PyTorch version (torch2.11 / torch2.12) must match your installed torch version.
- CUDA variant (cu128 / cu130) must match torch.version.cuda.
- ARM wheels and x86 / Windows wheels are not interchangeable.
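The matching rules above can be checked mechanically by building the filename your environment would need and comparing it with the published list. This is a rough sketch under stated assumptions: `expected_wheel_name` is a hypothetical helper, the release string and platform tag must be supplied by hand, and the filename layout is inferred from the wheels listed in this README.

```python
import sys


def expected_wheel_name(release, torch_version, cuda_version,
                        py=None, platform_tag="linux_x86_64"):
    """Build the wheel filename that would match a given environment.

    torch_version: a string like torch.__version__, e.g. "2.11.0+cu128"
    cuda_version:  a string like torch.version.cuda, e.g. "12.8"
    py:            (major, minor) tuple; defaults to the running interpreter
    """
    major, minor = py if py is not None else sys.version_info[:2]
    # Keep only major.minor of the torch version and drop any "+cuXXX" suffix.
    torch_mm = ".".join(torch_version.split("+")[0].split(".")[:2])
    cp = f"cp{major}{minor}"
    return (
        f"nunchaku-{release}+cu{cuda_version}torch{torch_mm}"
        f"-{cp}-{cp}-{platform_tag}.whl"
    )


if __name__ == "__main__":
    # Example values; with torch installed you could pass
    # torch.__version__ and torch.version.cuda instead.
    print(expected_wheel_name("1.3.0.dev20260622", "2.11.0+cu128", "12.8", py=(3, 11)))
```

If the generated name does not appear in the wheel tables, no prebuilt wheel is listed for that combination of Python, PyTorch, CUDA, and platform.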
Citation
If the inference acceleration in this project helps your research, please cite the original SVDQuant paper:
@inproceedings{
li2024svdquant,
title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}
Disclaimer
This is a community-maintained extension of upstream Nunchaku. Wheels are distributed independently by the Vitoom team and are not an official release from MIT Han Lab / nunchaku-ai. For support, please open an issue on the corresponding Hugging Face repository or project tracker.