---
license: apache-2.0
language:
- en
- zh
base_model:
- Tongyi-MAI/Z-Image
base_model_relation: quantized
pipeline_tag: text-to-image
library_name: diffusers
tags:
- diffusion-single-file
---
For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11

Feel free to request compression of other models as well (whether for the `diffusers` library, ComfyUI, or any other framework), although models with architectures I am unfamiliar with may be more difficult.

### How to Use

#### `diffusers`

```python
import torch
from diffusers import ZImagePipeline, ZImageTransformer2DModel
from dfloat11 import DFloat11Model
from transformers.modeling_utils import no_init_weights

# Load the DFloat11-compressed Qwen3-4B text encoder
text_encoder = DFloat11Model.from_pretrained("DFloat11/Qwen3-4B-DF11", device="cpu")

# Build the transformer skeleton without initializing weights,
# then fill it with the DFloat11-compressed weights
with no_init_weights():
    transformer = ZImageTransformer2DModel.from_config(
        ZImageTransformer2DModel.load_config(
            "Tongyi-MAI/Z-Image", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16,
    ).to(torch.bfloat16)
DFloat11Model.from_pretrained("mingyi456/Z-Image-DF11", device="cpu", bfloat16_model=transformer)
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    text_encoder=text_encoder,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# Example prompt in Chinese (Z-Image supports both English and Chinese)
prompt = "两名年轻亚裔女性紧密站在一起,背景为朴素的灰色纹理墙面,可能是室内地毯地面。左侧女性留着长卷发,身穿藏青色毛衣,左袖有奶油色褶皱装饰,内搭白色立领衬衫,下身白色裤子;佩戴小巧金色耳钉,双臂交叉于背后。右侧女性留直肩长发,身穿奶油色卫衣,胸前印有“Tun the tables”字样,下方为“New ideas”,搭配白色裤子;佩戴银色小环耳环,双臂交叉于胸前。两人均面带微笑直视镜头。照片,自然光照明,柔和阴影,以藏青、奶油白为主的中性色调,休闲时尚摄影,中等景深,面部和上半身对焦清晰,姿态放松,表情友好,室内环境,地毯地面,纯色背景。"
negative_prompt = ""  # Optional, but useful for suppressing unwanted content
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1280,
    width=720,
    cfg_normalization=False,
    num_inference_steps=50,
    guidance_scale=4,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")
```
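As a rough guide to the expected memory savings: DFloat11 losslessly re-encodes BFloat16 weights at roughly 11 bits per parameter instead of 16. A minimal sketch of the arithmetic (the 6B parameter count below is a hypothetical placeholder for illustration, not the exact size of this model):

```python
def weights_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of a weight tensor in gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

# Hypothetical 6B-parameter transformer (placeholder, not the exact count)
n = 6e9
bf16 = weights_size_gb(n, 16)  # size in plain BFloat16
df11 = weights_size_gb(n, 11)  # size at ~11 bits per weight
print(f"bf16: {bf16:.2f} GB, DF11: {df11:.2f} GB ({df11 / bf16:.0%} of original)")
# → bf16: 12.00 GB, DF11: 8.25 GB (69% of original)
```

Actual file sizes differ slightly because the per-weight bit count varies with the weight distribution, and non-compressed tensors (norms, embeddings) are stored as-is.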

#### ComfyUI
Refer to this [model](https://huggingface.co/mingyi456/Z-Image-DF11-ComfyUI) instead.

### Compression details

This is the `pattern_dict` used for compression, which specifies the linear submodules to compress within each matched module:

```python
pattern_dict = {
    r"noise_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0"
    ),
    r"context_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
    ),
    r"layers\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0"
    ),
    r"cap_embedder": (
        "1",
    )
}
```
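To illustrate how such a dictionary selects modules: each key is a regular expression matched against module paths in the transformer, and each tuple lists the submodules compressed under every match. A minimal standalone sketch (the matching helper and the anchored `re.fullmatch` behavior are assumptions for illustration, not DFloat11's actual internals):

```python
import re

# Abbreviated version of the pattern_dict above
pattern_dict = {
    r"noise_refiner\.\d+": ("attention.to_q", "feed_forward.w1"),
    r"layers\.\d+": ("attention.to_q",),
}

def compressed_submodules(module_path: str) -> tuple:
    """Return the submodule suffixes to compress for a given module path,
    or an empty tuple if no pattern matches."""
    for pattern, suffixes in pattern_dict.items():
        if re.fullmatch(pattern, module_path):
            return suffixes
    return ()

print(compressed_submodules("noise_refiner.3"))  # matched by the first pattern
print(compressed_submodules("time_embedder"))    # no pattern matches
```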