mingyi456 committed fe6849f (verified; parent: b11f28b): Update README.md

Files changed (1): README.md (+95 −1)
README.md CHANGED (front matter unchanged):

pipeline_tag: text-to-image
library_name: diffusers
tags:
- diffusion-single-file
---
For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11

Feel free to request compression of other models as well (whether for the `diffusers` library, ComfyUI, or anything else), although models with architectures that are unfamiliar to me may be more difficult.
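As background on why a lossless format can still shrink BF16 checkpoints: DFloat11 entropy-codes the 8-bit exponent field, whose distribution in trained weights carries far fewer than 8 bits of information. A stdlib-only sketch of that measurement on synthetic Gaussian weights (the numbers are illustrative, not measured on Z-Image):

```python
import math
import random
import struct
from collections import Counter

# Toy stand-in for a trained weight tensor (synthetic, for illustration only)
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

# BF16 keeps float32's sign bit, all 8 exponent bits, and the top 7 mantissa
# bits; extract the exponent field of each value
def exponent(x):
    return (struct.unpack("<I", struct.pack("<f", x))[0] >> 23) & 0xFF

counts = Counter(exponent(w) for w in weights)
n = len(weights)
entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
print(f"exponent entropy: {entropy:.2f} bits (vs. 8 bits stored)")
```

In real checkpoints this entropy is low enough that sign + 7 mantissa bits + an entropy-coded exponent averages roughly 11 bits per weight, hence the ~70% size and the name DFloat11.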
### How to Use

#### `diffusers`

```python
import torch
from diffusers import ZImagePipeline, ZImageTransformer2DModel
from dfloat11 import DFloat11Model
from transformers.modeling_utils import no_init_weights

# Load the DF11-compressed text encoder
text_encoder = DFloat11Model.from_pretrained("DFloat11/Qwen3-4B-DF11", device="cpu")

# Build the transformer skeleton without initializing weights, then fill it
# with the DF11-compressed weights
with no_init_weights():
    transformer = ZImageTransformer2DModel.from_config(
        ZImageTransformer2DModel.load_config(
            "Tongyi-MAI/Z-Image-Turbo", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16,
    ).to(torch.bfloat16)
DFloat11Model.from_pretrained("mingyi456/Z-Image-Turbo-DF11", device="cpu", bfloat16_model=transformer)

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    text_encoder=text_encoder,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# Generate the image
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,  # Guidance should be 0 for the Turbo models
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")
```
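The losslessness comes from the exponent field being prefix-codeable: a Huffman code over the skewed exponent distribution round-trips exactly while averaging far fewer than 8 bits per symbol. A minimal stdlib sketch of the idea on synthetic weights (the actual dfloat11 on-disk format and GPU decode kernels are more involved than this):

```python
import heapq
import random
import struct
from collections import Counter

# Synthetic stand-in for a BF16 weight tensor
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(5000)]
exponents = [(struct.unpack("<I", struct.pack("<f", w))[0] >> 23) & 0xFF
             for w in weights]

def huffman_code(symbols):
    """Build a prefix-free code table {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)  # unique tiebreaker so tuples never compare tree nodes
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tick, (a, b)))
        tick += 1
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"
    walk(heap[0][2], "")
    return code

code = huffman_code(exponents)
bits = "".join(code[e] for e in exponents)

# Decode: scan the bitstream, emitting a symbol at every complete codeword
inv = {v: k for k, v in code.items()}
decoded, buf = [], ""
for bit in bits:
    buf += bit
    if buf in inv:
        decoded.append(inv[buf])
        buf = ""

assert decoded == exponents  # lossless round-trip
print(f"{len(bits) / len(exponents):.2f} bits per exponent (vs. 8 uncompressed)")
```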

#### ComfyUI

Refer to [mingyi456/Z-Image-Turbo-DF11-ComfyUI](https://huggingface.co/mingyi456/Z-Image-Turbo-DF11-ComfyUI) instead.

### Compression details

This is the `pattern_dict` used for compression:

```python
pattern_dict = {
    r"noise_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0",
    ),
    r"context_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
    ),
    r"layers\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0",
    ),
    r"cap_embedder": (
        "1",
    ),
}
```
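The keys read as regular expressions over submodule paths, with each value tuple naming the linear layers inside a matched block whose weights get compressed. A sketch of that matching semantics on a trimmed-down dict (this is my reading of the structure, not code taken from the dfloat11 source; the module names are hypothetical):

```python
import re

# Trimmed-down version of the pattern_dict above, for illustration only
pattern_dict = {
    r"noise_refiner\.\d+": ("attention.to_q", "feed_forward.w1"),
    r"layers\.\d+": ("attention.to_q", "adaLN_modulation.0"),
}

def compressed_weight_paths(module_names, patterns):
    """For each module path fully matching a pattern key, yield the full
    paths of the sub-layers named in that pattern's value tuple."""
    out = []
    for name in module_names:
        for pat, leaves in patterns.items():
            if re.fullmatch(pat, name):
                out.extend(f"{name}.{leaf}" for leaf in leaves)
    return out

# Hypothetical module paths, shaped like the blocks the patterns target
names = ["noise_refiner.0", "layers.7", "cap_embedder"]
paths = compressed_weight_paths(names, pattern_dict)
print(paths)
```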