Vitoom Nunchaku (Extended)


Community-maintained fork of Nunchaku. Nunchaku is an SVDQuant inference engine for 4-bit diffusion models (paper: SVDQuant). The upstream project has not been actively maintained for a long time. This fork keeps the original capabilities while extending model and quantization support, and distributes prebuilt wheels on Hugging Face.

Official docs remain useful as a baseline: Nunchaku Docs · DeepCompressor


Extensions Over Upstream

| Category | Description |
| --- | --- |
| New model: Chroma | Quantized Chroma diffusion Transformer inference (NunchakuChromaTransformer2dModel) |
| New model: FLUX.2 Klein | SVDQuant inference for the FLUX.2 Klein family (4B / 9B, etc.) (NunchakuFlux2Transformer2DModel) |
| Qwen-Image 8-bit quantization | W8A16 / int8 inference for the Qwen-Image backbone, in addition to the existing 4-bit path |
| Text encoder quantization | Quantized text encoder runtimes for Qwen2.5-VL and Qwen3 (NunchakuQwen2VLTextEncoderModel, NunchakuQwen3TextEncoderModel, etc.) |

Prebuilt Wheels

Current releases: 1.3.0.dev20260622 through 1.3.0.dev20260629, built against PyTorch 2.11 and 2.12.

Wheels are published on Hugging Face:

https://huggingface.co/tonera/vitoom-nunchaku

x86_64 (Linux)

For typical x86 servers and workstations (consumer RTX 30 / 40 / 50 GPUs and compatible datacenter cards).

| File | Python | CUDA / Torch |
| --- | --- | --- |
| nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp310-cp310-linux_x86_64.whl | 3.10 | cu128 + torch 2.11 |
| nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl | 3.11 | cu128 + torch 2.11 |
| nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp310-cp310-linux_x86_64.whl | 3.10 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_x86_64.whl | 3.11 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260624+cu13.0torch2.12-cp313-cp313-linux_x86_64.whl | 3.13 | cu130 + torch 2.12 |

cu128 wheels: sm_75 / 80 / 86 / 89 / 120a — suitable for RTX 30 / 40 / 50 series.
cu130 wheels: all of the above plus sm_121a (GB10 / DGX Spark, etc.), while remaining compatible with RTX 50 series.
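
The architecture lists above can be turned into a small lookup. The following is only a sketch: the helper names are hypothetical, and it matches a GPU strictly against the architectures named in this README, so a GPU that is not listed returns no variant even if a wheel might still run on it.

```python
# Architectures listed in this README for each CUDA build of the wheels.
CU128_ARCHS = {"sm_75", "sm_80", "sm_86", "sm_89", "sm_120a"}
CU130_ARCHS = CU128_ARCHS | {"sm_121a"}

# Assumption: only these two capabilities carry the "a" suffix in the lists above.
_ARCH_SUFFIX = {(12, 0): "a", (12, 1): "a"}


def arch_name(major: int, minor: int) -> str:
    """Turn a (major, minor) compute capability into an sm_XY[a] label."""
    return f"sm_{major}{minor}{_ARCH_SUFFIX.get((major, minor), '')}"


def wheel_variants_for(major: int, minor: int) -> list[str]:
    """Return the CUDA variants whose listed architectures include this GPU."""
    arch = arch_name(major, minor)
    variants = []
    if arch in CU128_ARCHS:
        variants.append("cu128")
    if arch in CU130_ARCHS:
        variants.append("cu130")
    return variants


if __name__ == "__main__":
    # With PyTorch installed, the capability can be read from the active GPU:
    #   import torch
    #   major, minor = torch.cuda.get_device_capability()
    for cap in [(8, 6), (8, 9), (12, 0), (12, 1)]:
        print(cap, arch_name(*cap), wheel_variants_for(*cap))
```

When both variants are returned, the choice still depends on which CUDA build of PyTorch is installed.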

aarch64 (Linux ARM64)

| File | Python | CUDA / Torch |
| --- | --- | --- |
| nunchaku-1.3.0.dev20260622+cu13.0torch2.11-cp311-cp311-linux_aarch64.whl | 3.11 | cu130 + torch 2.11 |

For ARM64 + CUDA 13.0 environments (e.g. GB10 / DGX Spark clusters). Cannot be installed on x86 machines.

Windows (win_amd64)

For Windows workstations (consumer RTX 30 / 40 / 50 GPUs). Requires CUDA 13.0.

| File | Python | CUDA / Torch |
| --- | --- | --- |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp311-cp311-win_amd64.whl | 3.11 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp312-cp312-win_amd64.whl | 3.12 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.11-cp313-cp313-win_amd64.whl | 3.13 | cu130 + torch 2.11 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp311-cp311-win_amd64.whl | 3.11 | cu130 + torch 2.12 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp312-cp312-win_amd64.whl | 3.12 | cu130 + torch 2.12 |
| nunchaku-1.3.0.dev20260629+cu13.0torch2.12-cp313-cp313-win_amd64.whl | 3.13 | cu130 + torch 2.12 |

Use via Vitoom (Recommended)

If you only need image generation acceleration in production and prefer not to manage Python/CUDA/wheel versions yourself, use Vitoom instead of installing these wheels manually.

Vitoom is a Docker-based, locally deployable AIGC platform. Through a browser you can run text, image, audio, and video inference on your own hardware (RTX 30 / 40 / 50, DGX Spark, GB10, etc.). Its visual inference module ships with vitoom-nunchaku pre-integrated, so Chroma, FLUX.2 Klein, Qwen-Image quantization, and related features are available in the Web UI as soon as deployment finishes.

Quick start (see the Vitoom README for full details):

git clone https://github.com/tonera/vitoom.git
cd vitoom
python scripts/setup_vitoom.py
python scripts/load_vitoom_images.py
docker compose up -d backend
docker compose -f docker-compose.inference.release.yml --profile visual up -d

Then open http://<LAN-IP>:8888 in your browser. Download models from the Web UI or via python scripts/download_initial_models.py.

| Approach | Best for |
| --- | --- |
| Install Vitoom | End users who want a ready-to-use AIGC platform; no manual pip install nunchaku |
| Install wheel from this repo | Developers integrating vitoom-nunchaku into their own Python environment, scripts, or custom pipelines |

Deployment guides: docker-usage-en.md · docker-usage-cn.md


Installation (Manual Wheel)

After downloading the wheel that matches your platform and Python version from Hugging Face:

pip install nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl

Or use the Hugging Face CLI:

hf download tonera/vitoom-nunchaku \
  nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl \
  --local-dir ./wheels

pip install ./wheels/nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl
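
The same download can be scripted from Python. This is a minimal sketch that assumes the huggingface_hub package is installed; download_wheel is a hypothetical helper name, while hf_hub_download is the library's own function.

```python
from pathlib import Path

REPO_ID = "tonera/vitoom-nunchaku"
WHEEL = "nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl"


def download_wheel(filename: str = WHEEL, local_dir: str = "./wheels") -> Path:
    """Download one wheel from the Hugging Face repo and return its local path."""
    # Imported lazily so this module loads even where huggingface_hub is absent.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)
    return Path(path)
```

Calling download_wheel() fetches the file into ./wheels; the returned path can then be passed to pip install as in the command above.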

Verify

python -c "import nunchaku; print(nunchaku.__version__)"
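
For scripted environments, a slightly more defensive check may be useful. The sketch below uses only the standard library and a hypothetical helper name; it reports whether the distribution is installed without importing it, assuming the import name equals the distribution name (nunchaku).

```python
from importlib import metadata, util


def describe_install(dist: str = "nunchaku") -> dict:
    """Collect basic facts about an installed distribution without importing it."""
    info = {"installed": False, "version": None, "importable": False}
    try:
        info["version"] = metadata.version(dist)
        info["installed"] = True
    except metadata.PackageNotFoundError:
        return info
    # find_spec locates the top-level module without executing it.
    info["importable"] = util.find_spec(dist) is not None
    return info


if __name__ == "__main__":
    print(describe_install())
```

This only confirms that the package metadata and module are present; the import command above remains the real test that the compiled extension loads.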

Which Wheel to Choose

| Scenario | Recommended wheel |
| --- | --- |
| x86 Linux + RTX 4090 / 5090, PyTorch cu128 | linux_x86_64 + +cu12.8torch2.11 + matching cp tag |
| x86 Linux + GB10, or sm_121a required | linux_x86_64 + +cu13.0torch2.11 + matching cp tag |
| x86 Linux + Python 3.13 + PyTorch 2.12 | dev20260624 + +cu13.0torch2.12-cp313 + linux_x86_64 |
| ARM64 GB10 cluster, Python 3.11 | linux_aarch64 + +cu13.0torch2.11-cp311 |
| Windows + RTX 30 / 40 / 50, Python 3.11–3.13 | win_amd64 + +cu13.0torch2.11 or +cu13.0torch2.12 + matching cp tag |

Notes:

  • Python tag (cp310 / cp311 / cp312 / cp313) must match your runtime.
  • PyTorch version (torch2.11 / torch2.12) must match your installed torch version.
  • CUDA variant (cu128 / cu130) must match torch.version.cuda (12.8 / 13.0 respectively).
  • ARM wheels and x86 / Windows wheels are not interchangeable.
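
The matching rules in these notes can be checked mechanically before installing. The sketch below parses the wheel names used in this README and compares each tag with values you supply; the function names are hypothetical, and the regular expression is written only for the naming pattern shown here.

```python
import re
import sys

# Pattern for the wheel names listed in this README, e.g.
# nunchaku-1.3.0.dev20260622+cu12.8torch2.11-cp311-cp311-linux_x86_64.whl
WHEEL_RE = re.compile(
    r"^nunchaku-(?P<version>[^+]+)\+cu(?P<cuda>\d+\.\d+)torch(?P<torch>\d+\.\d+)"
    r"-(?P<py>cp\d+)-(?P<abi>cp\d+)-(?P<platform>[a-z0-9_]+)\.whl$"
)


def parse_wheel(filename: str) -> dict:
    """Split a wheel filename into version, CUDA, torch, Python and platform tags."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.groupdict()


def mismatches(filename: str, py_tag: str, torch_version: str,
               cuda_version: str, platform_tag: str) -> list[str]:
    """Return the names of the tags that do not match the given environment."""
    tags = parse_wheel(filename)
    # torch.__version__ can look like "2.11.0+cu128"; compare major.minor only.
    torch_mm = ".".join(torch_version.split("+")[0].split(".")[:2])
    checks = {
        "python": tags["py"] == py_tag,
        "torch": tags["torch"] == torch_mm,
        "cuda": tags["cuda"] == cuda_version,
        "platform": tags["platform"] == platform_tag,
    }
    return [name for name, ok in checks.items() if not ok]


def current_py_tag() -> str:
    """Build the CPython tag (e.g. cp311) for the running interpreter."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"
```

In a live environment, the arguments would typically come from current_py_tag(), torch.__version__, and torch.version.cuda; an empty list from mismatches means every tag lines up.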

Citation

If the inference acceleration in this project helps your research, please cite the original SVDQuant paper:

@inproceedings{
  li2024svdquant,
  title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
  author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}

Disclaimer

This is a community-maintained extension of upstream Nunchaku. Wheels are distributed independently by the Vitoom team and are not an official release from MIT Han Lab / nunchaku-ai. For support, please open an issue on the corresponding Hugging Face repository or project tracker.
