ERNIE Image Turbo — Nunchaku W4A4 Quantized Inference


Introduction

This fork of Nunchaku adds W4A4 (4-bit weights, 4-bit activations) quantized inference support for ERNIE Image Turbo. In the reference benchmark below it runs 1.74x faster than the original BF16 model, and it reduces memory use with minimal quality loss.

Built on Nunchaku. We gratefully acknowledge their excellent work on efficient diffusion model inference.
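
To give a rough sense of where the memory reduction comes from, the sketch below estimates weight storage for one linear layer in BF16 versus a 4-bit layer that keeps a 16-bit low-rank branch of rank r, which is how SVDQuant-style (svdq) checkpoints are commonly described. The 4096×4096 layer size is a hypothetical example, not a figure from ERNIE Image Turbo. The helper names are illustrative, and quantization scales and other metadata are ignored.

```python
def bf16_weight_bytes(d_in: int, d_out: int) -> int:
    """Bytes for a dense BF16 weight matrix (2 bytes per element)."""
    return d_in * d_out * 2


def w4_lowrank_weight_bytes(d_in: int, d_out: int, rank: int) -> int:
    """Bytes for packed 4-bit weights plus a BF16 low-rank branch.

    Assumption: scales and zero points are ignored, so this is a lower bound.
    """
    packed_4bit = d_in * d_out // 2        # two 4-bit values per byte
    low_rank = rank * (d_in + d_out) * 2   # two BF16 factors: d_in x r and r x d_out
    return packed_4bit + low_rank


# Hypothetical 4096 x 4096 layer; rank 64 matches the Quick Start setting.
dense = bf16_weight_bytes(4096, 4096)
quant = w4_lowrank_weight_bytes(4096, 4096, rank=64)
print(f"BF16: {dense / 2**20:.1f} MiB")
print(f"W4 + low-rank: {quant / 2**20:.1f} MiB")
print(f"ratio: {dense / quant:.2f}x")
```

Under these assumptions the low-rank branch adds only a small overhead on top of the 4-bit weights, so per-layer weight storage shrinks by somewhat less than the ideal 4x.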

Installation

# This fork adds ERNIE Image support to Nunchaku
git clone https://github.com/Hzj199/nunchaku.git
cd nunchaku
git submodule update --init --recursive

pip install build
NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
pip install dist/nunchaku-*.whl
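
After installing the wheel, a quick check that the required packages are importable can save a confusing failure later. This is a minimal sketch using only the standard library. The helper name is_installed is hypothetical, and the check confirms only that each package can be located, not that its CUDA kernels load.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Packages imported by the Quick Start example.
for name in ("torch", "diffusers", "nunchaku"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```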

Quick Start

import torch
from diffusers.pipelines.ernie_image.pipeline_ernie_image import ErnieImagePipeline
from nunchaku import NunchakuErnieImageTransformer2DModel
from nunchaku.utils import get_precision

precision = get_precision()  # auto-detect: "int4" or "fp4"
rank = 64

transformer = NunchakuErnieImageTransformer2DModel.from_pretrained(
    f"ZJMuYun97/ERNIE-Image-Nunchaku/svdq-{precision}_r{rank}-ernie-image.safetensors",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

pipe = ErnieImagePipeline.from_pretrained(
    "baidu/ERNIE-Image-Turbo",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    pe=None, pe_tokenizer=None,
).to("cuda")  # move the remaining pipeline components to the GPU

image = pipe(
    prompt="a cute orange cat sitting on a sunlit windowsill",
    height=1024, width=1024,
    num_inference_steps=8,
    guidance_scale=1.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("ernie-image.png")
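
The checkpoint name in the Quick Start is assembled from the detected precision and the chosen rank. The sketch below isolates that naming pattern in a small helper so that a typo in the precision fails early. The helper checkpoint_path and its validation are hypothetical additions, and the source does not confirm which precision and rank combinations are actually published in the repository.

```python
REPO_ID = "ZJMuYun97/ERNIE-Image-Nunchaku"
# The two precisions named in the Quick Start comment.
SUPPORTED_PRECISIONS = ("int4", "fp4")


def checkpoint_path(precision: str, rank: int = 64) -> str:
    """Build the repo-relative checkpoint path used in the Quick Start."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(
            f"unsupported precision {precision!r}; expected one of {SUPPORTED_PRECISIONS}"
        )
    return f"{REPO_ID}/svdq-{precision}_r{rank}-ernie-image.safetensors"


print(checkpoint_path("int4", 64))
```

The returned string can be passed to NunchakuErnieImageTransformer2DModel.from_pretrained in place of the inline f-string.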

Performance (Reference)

Tested on a single A800 GPU, 1024×1024 resolution, 8 inference steps:

| Model | Avg. latency | Speedup |
|---|---|---|
| Original BF16 | 4.89 s | 1.0x |
| Nunchaku W4A4 | 2.81 s | 1.74x |
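
The speedup column is the ratio of the two average latencies. The source does not describe the measurement procedure, so the timing helper below is only one plausible way to collect such averages. The names average_latency and speedup, the warmup and run counts, and the optional sync hook are all assumptions. For GPU work, sync would typically be torch.cuda.synchronize so that asynchronous kernels are included in the timing.

```python
import time
from statistics import mean
from typing import Callable, Optional


def average_latency(
    fn: Callable[[], object],
    warmup: int = 1,
    runs: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Average wall-clock seconds per call of fn over `runs` timed calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):  # untimed calls to absorb one-time setup cost
        fn()
    samples = []
    for _ in range(runs):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for pending device work before stopping the clock
        samples.append(time.perf_counter() - start)
    return mean(samples)


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Ratio of baseline latency to optimized latency."""
    return baseline_s / optimized_s


# Reproduce the table's speedup from its two latencies.
print(f"{speedup(4.89, 2.81):.2f}x")
```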

Notes

  • Only batch_size=1 is supported, which covers the typical single-image inference use case.
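
Because of the batch-size limit, several images have to be produced with one pipeline call each. A minimal sketch of that loop follows. The helper generate_sequentially and the stand-in fake_generate are hypothetical, and the stand-in exists only so the example runs without a GPU.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def generate_sequentially(
    generate_one: Callable[[str, int], T],
    prompts: Sequence[str],
    base_seed: int = 42,
) -> List[T]:
    """Call generate_one(prompt, seed) once per prompt, one image at a time."""
    return [generate_one(prompt, base_seed + i) for i, prompt in enumerate(prompts)]


def fake_generate(prompt: str, seed: int) -> str:
    """Stand-in for a real pipeline call; returns a description instead of an image."""
    return f"{prompt} (seed={seed})"


results = generate_sequentially(fake_generate, ["a cat", "a dog"])
print(results)
```

With the Quick Start pipeline, generate_one could wrap a single pipe(...) call that passes prompt and a torch.Generator seeded with the given seed, then returns .images[0].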

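Introduction

This fork of Nunchaku adds W4A4 quantized inference support for ERNIE Image Turbo, significantly speeding up inference and reducing GPU memory usage while preserving image quality.

This implementation is built on Nunchaku. We thank the Nunchaku authors for their excellent work on efficient diffusion model inference.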

Installation

# This fork adds ERNIE Image support to Nunchaku
git clone https://github.com/Hzj199/nunchaku.git
cd nunchaku
git submodule update --init --recursive

pip install build
NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
pip install dist/nunchaku-*.whl

Quick Start

import torch
from diffusers.pipelines.ernie_image.pipeline_ernie_image import ErnieImagePipeline
from nunchaku import NunchakuErnieImageTransformer2DModel
from nunchaku.utils import get_precision

precision = get_precision()  # auto-detect: "int4" or "fp4"
rank = 64

transformer = NunchakuErnieImageTransformer2DModel.from_pretrained(
    f"ZJMuYun97/ERNIE-Image-Nunchaku/svdq-{precision}_r{rank}-ernie-image.safetensors",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

pipe = ErnieImagePipeline.from_pretrained(
    "baidu/ERNIE-Image-Turbo",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    pe=None, pe_tokenizer=None,
).to("cuda")  # move the remaining pipeline components to the GPU

image = pipe(
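    # Chinese prompt; in English: "a cute orange cat sitting on a sunlit windowsill, next to a potted green plant"
    prompt="一只可爱的橘色猫咪坐在阳光照射的窗台上,旁边放着一盆绿色植物",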
    height=1024, width=1024,
    num_inference_steps=8,
    guidance_scale=1.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("ernie-image.png")

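Performance (Reference)

Tested on a single A800 GPU at 1024×1024 resolution with 8 inference steps:

| Model | Avg. latency | Speedup |
|---|---|---|
| Original BF16 | 4.89 s | 1.0x |
| Nunchaku W4A4 | 2.81 s | 1.74x |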

Notes

  • Only batch_size=1 is supported, which covers the typical single-image inference use case.