Krea 2 Turbo SDNQ UINT4

An SDNQ UINT4 quantization of krea/Krea-2-Turbo for use with the Diffusers Krea2Pipeline.

[Figure: Original vs SDNQ comparison]

What Is Quantized

Selected recipe: uint4-static-transformer-only.

Quantized components: transformer. Tokenizer, scheduler, and non-selected pipeline components are copied from the original Diffusers pipeline.

The initial smoke sweep also tried SDNQ packing for the text encoder, but standard Diffusers/Transformers loading rejected the packed Qwen3VLModel text-encoder weights. This release therefore keeps the text encoder loadable in bf16 and quantizes the Krea transformer only.

Benchmark Setup

  • Pipeline: Krea2Pipeline
  • Resolution: 2048x2048
  • Steps: 8
  • Guidance scale: 0.0
  • Seed base: 61000
  • Distilled mode: true
  • Torch dtype: bfloat16
  • Attention backend: chunked-native query attention, chunk_size=1024
  • Prompt set: 10 prompts covering simple scenes, public-domain style stress tests, tricky composition, long Latin text, long Cyrillic text, and mixed Latin/Cyrillic diagrams
  • Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition on a disposable RunPod pod with local container disk

Benchmark Summary

Model                          Load     First gen  Hot mean  Hot max   Load GPU peak  Gen GPU peak  Torch peak
original                       7.019 s  91.147 s   90.487 s  90.712 s  33553 MB       52313 MB      50135.29 MB
uint4-static-transformer-only  4.792 s  90.243 s   88.561 s  88.776 s  16189 MB       34865 MB      32690.51 MB
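
The relative savings implied by the summary figures can be worked out with a few lines of arithmetic. The sketch below copies three columns from the benchmark summary; the dictionary keys and the helper name `relative_reduction` are illustrative and are not part of the benchmark files.

```python
# Figures copied from the benchmark summary (MB for memory, seconds for latency).
# The key names are illustrative, not taken from benchmark/summary.json.
original = {"load_gpu_peak_mb": 33553, "gen_gpu_peak_mb": 52313, "hot_mean_s": 90.487}
quantized = {"load_gpu_peak_mb": 16189, "gen_gpu_peak_mb": 34865, "hot_mean_s": 88.561}


def relative_reduction(before, after):
    """Return how much lower `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


reductions = {key: relative_reduction(original[key], quantized[key]) for key in original}

for key, percent in reductions.items():
    print(f"{key}: {percent:.1f}% lower than original")
```

On these figures, the quantized transformer cuts the load-time GPU peak by roughly half and the generation-time GPU peak by about a third, while hot mean latency changes by only about 2%.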

Storage size of this release directory: 15.46 GB. Quantized local checkpoint size before packaging: 15.36 GB.

Raw per-prompt metrics are available in benchmark/*.csv and benchmark/*.jsonl. The combined benchmark summary is in benchmark/summary.json.
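
A minimal sketch for reading these files with the Python standard library follows. The column names and JSON keys are not documented here, so the hypothetical helper `load_benchmark` makes no assumption about them and simply returns whatever each file contains.

```python
import csv
import json
from pathlib import Path


def load_benchmark(directory):
    """Read every CSV and JSONL file in `directory`, plus summary.json if present.

    Hypothetical helper: it assumes nothing about the schema of the files.
    """
    root = Path(directory)

    # CSV rows become dicts keyed by the header line; values stay as strings.
    rows = []
    for csv_path in sorted(root.glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = csv_path.name
                rows.append(row)

    # JSONL files hold one JSON object per non-empty line.
    records = []
    for jsonl_path in sorted(root.glob("*.jsonl")):
        with jsonl_path.open(encoding="utf-8") as handle:
            records.extend(json.loads(line) for line in handle if line.strip())

    summary_path = root / "summary.json"
    summary = None
    if summary_path.exists():
        summary = json.loads(summary_path.read_text(encoding="utf-8"))

    return rows, records, summary
```

Calling `load_benchmark("benchmark")` from the root of a local copy of this repository returns the CSV rows, the JSONL records, and the parsed summary (or `None` if the summary file is missing).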

Usage

pip install -U git+https://github.com/huggingface/diffusers.git transformers accelerate safetensors huggingface_hub sdnq

import torch
from diffusers import Krea2Pipeline
from sdnq.loader import apply_sdnq_options_to_model

repo_id = "WaveCut/Krea-2-Turbo-SDNQ-uint4"
device = "cuda"

pipe = Krea2Pipeline.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    is_distilled=True,
)

# Apply SDNQ runtime options to each quantized component (only the transformer in this release).
for name in ['transformer']:
    module = getattr(pipe, name, None)
    if module is not None:
        setattr(
            pipe,
            name,
            apply_sdnq_options_to_model(module, dtype=torch.bfloat16, use_quantized_matmul=True),
        )

pipe.to(device)
image = pipe(
    prompt="A clean technical poster with readable labels",
    height=2048,
    width=2048,
    num_inference_steps=8,
    guidance_scale=0.0,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
image.save("krea2-sdnq.png")

Quantization Recipe

{
  "dynamic_loss_threshold": null,
  "modules": [
    "transformer"
  ],
  "name": "uint4-static-transformer-only",
  "quant_conv": false,
  "quant_embedding": false,
  "svd_rank": 32,
  "svd_steps": 32,
  "use_dynamic_quantization": false,
  "use_svd": false,
  "weights_dtype": "uint4"
}

The checkpoint was produced by loading the original Diffusers pipeline, applying sdnq_post_load_quant only to the listed pipeline components, and saving with save_sdnq_model(..., is_pipeline=True).
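
To make the `"weights_dtype": "uint4"` setting concrete, here is a toy, pure-Python illustration of asymmetric 4-bit quantization with two values packed per byte. It is not SDNQ's implementation: the per-row scale and offset, the rounding, and the nibble order are all assumptions chosen for clarity, and every function name is hypothetical.

```python
def quantize_uint4(values):
    """Map floats to integer codes in [0, 15] using one scale and offset per row."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 when every value is identical
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_uint4(codes, scale, lo):
    """Reconstruct approximate floats from the 4-bit codes."""
    return [lo + code * scale for code in codes]


def pack_uint4(codes):
    """Pack two 4-bit codes into each byte, low nibble first (assumed layout)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_uint4(packed, count):
    """Inverse of pack_uint4; `count` drops any padding code."""
    codes = []
    for byte in packed:
        codes.extend((byte & 0x0F, byte >> 4))
    return codes[:count]


weights = [-0.42, -0.10, 0.0, 0.07, 0.31, 0.55, -0.25, 0.18]
codes, scale, lo = quantize_uint4(weights)
packed = pack_uint4(codes)
restored = dequantize_uint4(unpack_uint4(packed, len(weights)), scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"{len(weights)} weights stored in {len(packed)} bytes, max error {max_error:.4f}")
```

Each weight takes half a byte instead of the two bytes bf16 uses, at the cost of a rounding error bounded by half the scale. Only the transformer is quantized in this release, so the text encoder and other components remain in bf16.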

Limitations

  • This is a quantized derivative and inherits the base model behavior, limits, and license terms.
  • The comparison set is a deployment smoke benchmark, not a preference study or FID evaluation.
  • Long text, small labels, and mixed Cyrillic/Latin diagrams should be inspected manually before production use.
  • Benchmark numbers depend on GPU, driver, PyTorch, Diffusers, SDNQ, and CUDA versions.

License

This repository contains a quantized derivative of krea/Krea-2-Turbo. Upstream license material copied during packaging: LICENSE.pdf. Review the upstream Krea model card and license before use or redistribution.
