Vitoom Nunchaku (Extended)


Vitoom Nunchaku is a community-maintained fork of Nunchaku, an inference engine for 4-bit diffusion models quantized with SVDQuant (see the paper in the Citation section). The upstream project has not been actively maintained for some time. This fork keeps the original capabilities, extends model and quantization support, and distributes prebuilt wheels on Hugging Face.

Official docs remain useful as a baseline: Nunchaku Docs · DeepCompressor

Extensions Over Upstream

Category	Description
New model: Chroma	Quantized Chroma diffusion Transformer inference (`NunchakuChromaTransformer2dModel`)
New model: FLUX.2 Klein	SVDQuant inference for the FLUX.2 Klein family (4B / 9B, etc.) (`NunchakuFlux2Transformer2DModel`)
Qwen-Image 8-bit quantization	W8A16 / int8 inference for the Qwen-Image backbone, in addition to the existing 4-bit path
Text encoder quantization	Quantized text encoder runtimes for Qwen2.5-VL and Qwen3 (`NunchakuQwen2VLTextEncoderModel`, `NunchakuQwen3TextEncoderModel`, etc.)
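
After installing a wheel, one way to confirm that the fork (rather than upstream Nunchaku) is active is to look for the extension classes named in the table. The sketch below assumes these classes are exported from the top-level `nunchaku` package, which this README does not state; `missing_extensions` is a hypothetical helper, not part of the library.

```python
# Class names copied from the table above.
EXTENSION_CLASSES = [
    "NunchakuChromaTransformer2dModel",
    "NunchakuFlux2Transformer2DModel",
    "NunchakuQwen2VLTextEncoderModel",
    "NunchakuQwen3TextEncoderModel",
]


def missing_extensions(module) -> list[str]:
    """Return the extension class names that `module` does not expose.

    ASSUMPTION: the classes live at the top level of the package. If the fork
    places them in submodules, pass that submodule instead.
    """
    return [name for name in EXTENSION_CLASSES if not hasattr(module, name)]
```

Typical use would be `import nunchaku; print(missing_extensions(nunchaku))`. An empty list means every name was found at that level.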

Prebuilt Wheels

Current releases: 1.3.0.dev20260622 through 1.3.0.dev20260629, built against PyTorch 2.11 and 2.12.

Wheels are published on Hugging Face:

https://huggingface.co/tonera/vitoom-nunchaku

x86_64 (Linux)

For typical x86 servers and workstations (consumer RTX 30 / 40 / 50 GPUs and compatible datacenter cards).

File	Python	CUDA / Torch
`nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp310-cp310-linux_x86_64.whl`	3.10	cu128 + torch 2.11
`nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl`	3.11	cu128 + torch 2.11
`nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp310-cp310-linux_x86_64.whl`	3.10	cu130 + torch 2.11
`nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_x86_64.whl`	3.11	cu130 + torch 2.11
`nunchaku-1.3.0.dev20260624+cu13.0torch2.12-cp313-cp313-linux_x86_64.whl`	3.13	cu130 + torch 2.12

cu128 wheels: built for sm_75, sm_80, sm_86, sm_89, and sm_120a. Suitable for RTX 30 / 40 / 50 series.
cu130 wheels: all of the above plus sm_121a (GB10 / DGX Spark, etc.). These remain compatible with RTX 50 series.
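
The architecture lists above can be turned into a quick lookup. This is a minimal sketch, not part of the project: `wheel_covers_gpu` is a hypothetical helper, the arch-specific `a` suffix is ignored, and the check is a conservative exact match against the listed architectures only.

```python
# Compute capabilities listed for each CUDA variant, taken from the notes above.
# The arch-specific "a" suffix (sm_120a, sm_121a) is dropped for the comparison.
CU128_ARCHS = {75, 80, 86, 89, 120}
WHEEL_ARCHS = {
    "cu128": CU128_ARCHS,
    "cu130": CU128_ARCHS | {121},
}


def wheel_covers_gpu(cuda_variant: str, capability: tuple[int, int]) -> bool:
    """Return True if the wheel family lists this (major, minor) capability.

    Exact-match only: a capability that is not listed is reported as not
    covered, even if a listed architecture's kernels might run on it.
    """
    major, minor = capability
    return major * 10 + minor in WHEEL_ARCHS[cuda_variant]
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current GPU, so `wheel_covers_gpu("cu128", torch.cuda.get_device_capability())` answers the question for the machine at hand.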

aarch64 (Linux ARM64)

File	Python	CUDA / Torch
`nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_aarch64.whl`	3.11	cu130 + torch 2.11

For ARM64 + CUDA 13.0 environments (e.g. GB10 / DGX Spark clusters). This wheel cannot be installed on x86 machines.

Windows (win_amd64)

For Windows workstations (consumer RTX 30 / 40 / 50 GPUs). Requires CUDA 13.0.

File	Python	CUDA / Torch
`nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp311-cp311-win_amd64.whl`	3.11	cu130 + torch 2.11
`nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp312-cp312-win_amd64.whl`	3.12	cu130 + torch 2.11
`nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp313-cp313-win_amd64.whl`	3.13	cu130 + torch 2.11
`nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp311-cp311-win_amd64.whl`	3.11	cu130 + torch 2.12
`nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp312-cp312-win_amd64.whl`	3.12	cu130 + torch 2.12
`nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp313-cp313-win_amd64.whl`	3.13	cu130 + torch 2.12

Use via Vitoom (Recommended)

If you only need image generation acceleration in production and prefer not to manage Python/CUDA/wheel versions yourself, use Vitoom instead of installing these wheels manually.

Vitoom is a Docker-based, locally deployable AIGC platform. Through a browser you can run text, image, audio, and video inference on your own hardware (RTX 30 / 40 / 50, DGX Spark, GB10, etc.). Its visual inference module ships with vitoom-nunchaku pre-integrated, so Chroma, FLUX.2 Klein, Qwen-Image quantization, and related features are available in the Web UI as soon as deployment finishes.

Quick start (see the Vitoom README for full details):

git clone https://github.com/tonera/vitoom.git
cd vitoom
python scripts/setup_vitoom.py
python scripts/load_vitoom_images.py
docker compose up -d backend
docker compose -f docker-compose.inference.release.yml --profile visual up -d

Then open http://<LAN-IP>:8888 in your browser, replacing <LAN-IP> with the address of the machine running Vitoom. Download models from the Web UI or by running `python scripts/download_initial_models.py`.
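
If the page does not load, it can help to check whether anything is listening on port 8888 before debugging the containers. The following is a small sketch using only the standard library; `is_port_open` is a hypothetical helper and only tests TCP reachability, not whether the Web UI itself is healthy.

```python
import socket


def is_port_open(host: str, port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, `is_port_open("127.0.0.1")` run on the Vitoom host checks the default port from the quick start.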

Approach	Best for
Install Vitoom	End users who want a ready-to-use AIGC platform; no manual `pip install nunchaku`
Install wheel from this repo	Developers integrating vitoom-nunchaku into their own Python environment, scripts, or custom pipelines

Deployment guides: docker-usage-en.md · docker-usage-cn.md

Installation (Manual Wheel)

After downloading the wheel that matches your platform and Python version from Hugging Face:

pip install nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl

Or use the Hugging Face CLI:

hf download tonera/vitoom-nunchaku \
  nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl \
  --local-dir ./wheels

pip install ./wheels/nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl
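
The same download can be scripted from Python with the `huggingface_hub` library. This sketch assumes `huggingface_hub` is installed and that the repository is a model repo (the default `repo_type`); `download_wheel` is a hypothetical wrapper name.

```python
REPO_ID = "tonera/vitoom-nunchaku"
WHEEL = "nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl"


def download_wheel(filename: str = WHEEL, local_dir: str = "./wheels") -> str:
    """Download one wheel from the Hugging Face repo and return its local path."""
    # Imported lazily so this module still loads when huggingface_hub is absent.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)
```

Calling `download_wheel()` returns the path of the downloaded file, which can then be passed to `pip install`.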

Verify

python -c "import nunchaku; print(nunchaku.__version__)"
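
As a complementary check, the installed version can be read from package metadata without importing `nunchaku`, which is useful when the import itself fails. This is a sketch with hypothetical helper names; it assumes the distribution is registered under the name `nunchaku`, as the wheel file names suggest.

```python
from importlib import metadata


def installed_version(distribution: str = "nunchaku"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def is_listed_dev_build(version) -> bool:
    """True for 1.3.0.dev* builds, with or without a local +cu... suffix."""
    return version is not None and version.split("+")[0].startswith("1.3.0.dev")
```

For example, `is_listed_dev_build(installed_version())` is `False` when the package is missing or when a different release line is installed.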

Which Wheel to Choose

Scenario	Recommended wheel
x86 Linux + RTX 4090 / 5090, PyTorch cu128	`linux_x86_64` + `+cu12.8torch2.11` + matching cp tag
x86 Linux + GB10, or sm_121a required	`linux_x86_64` + `+cu13.0torch2.11` + matching cp tag
x86 Linux + Python 3.13 + PyTorch 2.12	`dev20260624` + `+cu13.0torch2.12-cp313` + `linux_x86_64`
ARM64 GB10 cluster, Python 3.11	`linux_aarch64` + `+cu13.0torch2.11-cp311`
Windows + RTX 30 / 40 / 50, Python 3.11–3.13	`win_amd64` + `+cu13.0torch2.11` or `+cu13.0torch2.12` + matching cp tag
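
The table above amounts to filtering the published file names by four tags. A minimal sketch of that lookup follows; the file list is copied from this README, and `matching_wheels` is a hypothetical helper rather than anything the project ships.

```python
# File names copied from the wheel tables in this README.
WHEELS = [
    "nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp310-cp310-linux_x86_64.whl",
    "nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl",
    "nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp310-cp310-linux_x86_64.whl",
    "nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_x86_64.whl",
    "nunchaku-1.3.0.dev20260624+cu13.0torch2.12-cp313-cp313-linux_x86_64.whl",
    "nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_aarch64.whl",
    "nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp311-cp311-win_amd64.whl",
    "nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp312-cp312-win_amd64.whl",
    "nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp313-cp313-win_amd64.whl",
    "nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp311-cp311-win_amd64.whl",
    "nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp312-cp312-win_amd64.whl",
    "nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp313-cp313-win_amd64.whl",
]


def matching_wheels(platform: str, python_tag: str, cuda: str, torch: str) -> list[str]:
    """Return the listed wheels whose tags match the requested environment.

    Example arguments: platform="linux_x86_64", python_tag="cp311",
    cuda="12.8", torch="2.11".
    """
    suffix = f"+cu{cuda}torch{torch}-{python_tag}-{python_tag}-{platform}.whl"
    return [name for name in WHEELS if name.endswith(suffix)]
```

An empty result means no prebuilt wheel is listed for that combination, for instance Python 3.12 on `linux_aarch64`.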

Notes:

Python tag (cp310 / cp311 / cp312 / cp313) must match the Python version of your runtime.
PyTorch version (torch2.11 / torch2.12) must match the major.minor version of your installed torch.
CUDA variant (cu128 / cu130) must match `torch.version.cuda` (12.8 / 13.0).
Wheels are platform-specific: linux_aarch64, linux_x86_64, and win_amd64 builds are not interchangeable.
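
These rules can be checked mechanically before running `pip install`. The sketch below parses the tags out of a wheel file name and compares them with values describing the target environment. The regular expression is written for the file names listed in this README only, and `parse_wheel` and `mismatches` are hypothetical helper names.

```python
import re

# Matches names such as
# nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl
WHEEL_RE = re.compile(
    r"^nunchaku-(?P<version>[^+]+)\+cu(?P<cuda>\d+\.\d+)torch(?P<torch>\d+\.\d+)"
    r"-(?P<py>cp\d+)-(?P=py)-(?P<platform>\w+)\.whl$"
)


def parse_wheel(filename: str) -> dict[str, str]:
    """Split a wheel file name into version, cuda, torch, py, and platform tags."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.groupdict()


def mismatches(filename, python_tag, torch_version, torch_cuda, platform):
    """Return a list of human-readable problems; empty means all tags agree."""
    tags = parse_wheel(filename)
    # Reduce e.g. "2.11.0+cu128" to "2.11" before comparing.
    torch_mm = ".".join(torch_version.split(".")[:2])
    checks = [
        ("python tag", tags["py"], python_tag),
        ("torch version", tags["torch"], torch_mm),
        ("CUDA version", tags["cuda"], torch_cuda),
        ("platform", tags["platform"], platform),
    ]
    return [
        f"{label}: wheel has {wheel_value}, environment has {env_value}"
        for label, wheel_value, env_value in checks
        if wheel_value != env_value
    ]
```

In a live environment the arguments would typically come from `f"cp{sys.version_info.major}{sys.version_info.minor}"`, `torch.__version__`, and `torch.version.cuda`. The platform string is passed in explicitly here to keep the sketch free of platform-detection logic.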

Citation

If the inference acceleration in this project helps your research, please cite the original SVDQuant paper:

@inproceedings{
  li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}

Disclaimer

This is a community-maintained extension of upstream Nunchaku. Wheels are distributed independently by the Vitoom team and are not an official release from MIT Han Lab / nunchaku-ai. For support, please open an issue on the corresponding Hugging Face repository or project tracker.

