Krea 2 Turbo SDNQ UINT4
An SDNQ UINT4 quantization of krea/Krea-2-Turbo for use with the Diffusers Krea2Pipeline.
What Is Quantized
Selected recipe: uint4-static-transformer-only.
Quantized components: transformer.
Tokenizer, scheduler, and non-selected pipeline components are copied from the original Diffusers pipeline.
The initial smoke sweep also tried SDNQ packing for the text encoder, but standard Diffusers/Transformers loading rejected the packed Qwen3VLModel text-encoder weights. This release therefore keeps the text encoder loadable in bf16 and quantizes the Krea transformer only.
Benchmark Setup
- Pipeline: Krea2Pipeline
- Resolution: 2048x2048
- Steps: 8
- Guidance scale: 0.0
- Seed base: 61000
- Distilled mode: true
- Torch dtype: bfloat16
- Attention backend: chunked-native query attention, chunk_size=1024
- Prompt set: 10 prompts covering simple scenes, public-domain style stress tests, tricky composition, long Latin text, long Cyrillic text, and mixed Latin/Cyrillic diagrams
- Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition on a disposable RunPod pod with local container disk
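The setup above can be approximated with a small timing harness. The following is a minimal sketch, not the script used for this release: it assumes that "first gen" is the first call, that every later call counts as a "hot" run, and that each prompt uses `seed_base + index` as its seed. The names `run_benchmark` and `fake_generate` are hypothetical; in a real run `generate` would wrap a `Krea2Pipeline` call.

```python
import time
from statistics import mean


def run_benchmark(generate, prompts, seed_base=61000, clock=time.perf_counter):
    """Time generate(prompt, seed) once per prompt.

    Assumptions (not confirmed by the model card): the first call is the
    "first gen" figure, later calls are "hot" runs, and seeds are
    seed_base + prompt index.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")
    durations = []
    for index, prompt in enumerate(prompts):
        start = clock()
        generate(prompt, seed_base + index)
        durations.append(clock() - start)
    hot = durations[1:]
    return {
        "first_gen_s": durations[0],
        "hot_mean_s": mean(hot) if hot else None,
        "hot_max_s": max(hot) if hot else None,
    }


# Stand-in for a real pipeline call so the sketch runs without a GPU.
calls = []


def fake_generate(prompt, seed):
    calls.append((prompt, seed))


stats = run_benchmark(fake_generate, ["prompt a", "prompt b", "prompt c"])
print(stats)
```

Separating the first call from the rest keeps one-time costs such as kernel compilation and cache warm-up out of the hot-run statistics.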
Benchmark Summary
| Model | Load | First gen | Hot mean | Hot max | Load GPU peak | Gen GPU peak | Torch peak |
|---|---|---|---|---|---|---|---|
| original | 7.019 s | 91.147 s | 90.487 s | 90.712 s | 33553 MB | 52313 MB | 50135.29 MB |
| uint4-static-transformer-only | 4.792 s | 90.243 s | 88.561 s | 88.776 s | 16189 MB | 34865 MB | 32690.51 MB |
Storage size of this release directory: 15.46 GB. Quantized local checkpoint size before packaging: 15.36 GB.
Raw per-prompt metrics are available in benchmark/*.csv and benchmark/*.jsonl. The combined benchmark summary is in benchmark/summary.json.
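To put the summary table in relative terms, the short calculation below derives percentage reductions from the table values. It is only arithmetic on the published numbers; the dictionary keys and the `reduction_percent` helper are illustrative names, not fields from benchmark/summary.json.

```python
# Values copied from the benchmark summary table (seconds and MB).
original = {"load_s": 7.019, "hot_mean_s": 90.487, "load_gpu_mb": 33553, "gen_gpu_mb": 52313}
quantized = {"load_s": 4.792, "hot_mean_s": 88.561, "load_gpu_mb": 16189, "gen_gpu_mb": 34865}


def reduction_percent(before, after):
    """Relative reduction of `after` versus `before`, in percent."""
    return 100.0 * (before - after) / before


reductions = {key: reduction_percent(original[key], quantized[key]) for key in original}
for key, value in reductions.items():
    print(f"{key}: {value:.1f}% lower")
```

On these figures the quantized checkpoint roughly halves the GPU memory peak at load time and lowers the generation peak by about a third, while hot generation time changes by only about 2%. The main benefit in this benchmark is therefore memory, not speed.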
Usage
pip install -U git+https://github.com/huggingface/diffusers.git transformers accelerate safetensors huggingface_hub sdnq
import torch
from diffusers import Krea2Pipeline
from sdnq.loader import apply_sdnq_options_to_model

repo_id = "WaveCut/Krea-2-Turbo-SDNQ-uint4"
device = "cuda"

# Load the pipeline; the transformer weights are stored in SDNQ uint4 format.
pipe = Krea2Pipeline.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    is_distilled=True,
)

# Apply SDNQ runtime options to each quantized component (transformer only).
for name in ["transformer"]:
    module = getattr(pipe, name, None)
    if module is not None:
        setattr(
            pipe,
            name,
            apply_sdnq_options_to_model(module, dtype=torch.bfloat16, use_quantized_matmul=True),
        )

pipe.to(device)

# Generate with the same resolution, step count, and guidance as the benchmark.
image = pipe(
    prompt="A clean technical poster with readable labels",
    height=2048,
    width=2048,
    num_inference_steps=8,
    guidance_scale=0.0,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
image.save("krea2-sdnq.png")
Quantization Recipe
{
  "dynamic_loss_threshold": null,
  "modules": [
    "transformer"
  ],
  "name": "uint4-static-transformer-only",
  "quant_conv": false,
  "quant_embedding": false,
  "svd_rank": 32,
  "svd_steps": 32,
  "use_dynamic_quantization": false,
  "use_svd": false,
  "weights_dtype": "uint4"
}
The checkpoint was produced by loading the original Diffusers pipeline, applying sdnq_post_load_quant only to the listed pipeline components, and saving with save_sdnq_model(..., is_pipeline=True).
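For readers unfamiliar with 4-bit weight storage, the sketch below illustrates the general idea behind a `uint4` weight dtype: each weight is mapped to one of 16 integer codes, and two codes are packed into one byte. This is a generic, pure-Python illustration under simplifying assumptions (one scale and offset per row, round-to-nearest); it is not SDNQ's implementation, whose grouping, packing layout, and kernels may differ. All function names here are hypothetical.

```python
def quantize_uint4(weights):
    """Map floats to integer codes in [0, 15] with one scale and offset per row."""
    low, high = min(weights), max(weights)
    scale = (high - low) / 15 or 1.0  # fall back to 1.0 for a constant row
    codes = [min(15, max(0, round((w - low) / scale))) for w in weights]
    return codes, scale, low


def dequantize_uint4(codes, scale, low):
    """Recover approximate float weights from the 4-bit codes."""
    return [code * scale + low for code in codes]


def pack_nibbles(codes):
    """Store two 4-bit codes per byte, low nibble first."""
    padded = codes + [0] * (len(codes) % 2)
    return bytes(padded[i] | (padded[i + 1] << 4) for i in range(0, len(padded), 2))


row = [-0.82, -0.31, 0.0, 0.07, 0.44, 0.9, 1.25]
codes, scale, low = quantize_uint4(row)
restored = dequantize_uint4(codes, scale, low)
packed = pack_nibbles(codes)
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(codes, len(packed), max_error)
```

With round-to-nearest, the reconstruction error per weight stays within half a quantization step, which is the trade-off accepted in exchange for storing each weight in 4 bits instead of 16.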
Limitations
- This is a quantized derivative and inherits the base model behavior, limits, and license terms.
- The comparison set is a deployment smoke benchmark, not a preference study or FID evaluation.
- Long text, small labels, and mixed Cyrillic/Latin diagrams should be inspected manually before production use.
- Benchmark numbers depend on GPU, driver, PyTorch, Diffusers, SDNQ, and CUDA versions.
License
This repository contains a quantized derivative of krea/Krea-2-Turbo. Upstream license material copied during packaging: LICENSE.pdf. Review the upstream Krea model card and license before use or redistribution.
