---
license: apache-2.0
language:
- en
- zh
base_model:
- Tongyi-MAI/Z-Image
base_model_relation: quantized
pipeline_tag: text-to-image
library_name: diffusers
tags:
- diffusion-single-file
---
|
|
This is a losslessly compressed version of [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image): DFloat11 re-encodes the model's BFloat16 weights to roughly 70% of their original size while producing bit-identical outputs. For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11.
|
|
|
|
|
Feel free to request compression of other models as well (whether for the `diffusers` library, ComfyUI, or anything else), although models with architectures that are unfamiliar to me may be more difficult.
|
|
|
|
|
### How to Use |
|
|
|
|
|
#### `diffusers` |
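The example below assumes a CUDA-capable GPU and that `torch`, `diffusers`, and the `dfloat11` package are installed; see the DFloat11 GitHub repository linked above for installation instructions.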
|
|
|
|
|
```python
import torch
from diffusers import ZImagePipeline, ZImageTransformer2DModel
from dfloat11 import DFloat11Model
from transformers.modeling_utils import no_init_weights

# Load the DFloat11-compressed Qwen3-4B text encoder onto the CPU.
text_encoder = DFloat11Model.from_pretrained("DFloat11/Qwen3-4B-DF11", device="cpu")

# Build the transformer from its config without initializing weights;
# the compressed weights are injected by DFloat11Model below.
with no_init_weights():
    transformer = ZImageTransformer2DModel.from_config(
        ZImageTransformer2DModel.load_config(
            "Tongyi-MAI/Z-Image", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16,
    ).to(torch.bfloat16)

# Inject the DFloat11-compressed weights into the transformer.
DFloat11Model.from_pretrained("mingyi456/Z-Image-DF11", device="cpu", bfloat16_model=transformer)

# Assemble the pipeline around the compressed text encoder and transformer.
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    text_encoder=text_encoder,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,  # load the remaining components eagerly
)
pipe.to("cuda")

# Example prompt in Chinese (Z-Image accepts both English and Chinese prompts).
prompt = "两名年轻亚裔女性紧密站在一起,背景为朴素的灰色纹理墙面,可能是室内地毯地面。左侧女性留着长卷发,身穿藏青色毛衣,左袖有奶油色褶皱装饰,内搭白色立领衬衫,下身白色裤子;佩戴小巧金色耳钉,双臂交叉于背后。右侧女性留直肩长发,身穿奶油色卫衣,胸前印有“Tun the tables”字样,下方为“New ideas”,搭配白色裤子;佩戴银色小环耳环,双臂交叉于胸前。两人均面带微笑直视镜头。照片,自然光照明,柔和阴影,以藏青、奶油白为主的中性色调,休闲时尚摄影,中等景深,面部和上半身对焦清晰,姿态放松,表情友好,室内环境,地毯地面,纯色背景。"
negative_prompt = ""  # Optional, but useful for suppressing unwanted content

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1280,
    width=720,
    cfg_normalization=False,
    num_inference_steps=50,
    guidance_scale=4,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")
```
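As an optional sanity check (standard PyTorch, not part of the DFloat11 API), you can print the peak GPU memory allocated during generation to verify the savings from the compressed weights:

```python
# Report peak GPU memory allocated on the current device during generation.
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gib:.2f} GiB")
```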
|
|
|
|
|
#### ComfyUI |
|
|
Refer to this [model](https://huggingface.co/mingyi456/Z-Image-DF11-ComfyUI) instead. |
|
|
|
|
|
### Compression details |
|
|
|
|
|
This is the `pattern_dict` used for compression:
|
|
|
|
|
```python
pattern_dict = {
    r"noise_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
        "adaLN_modulation.0",
    ),
    r"context_refiner\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
    ),
    r"layers\.\d+": (
        "attention.to_q",
        "attention.to_k",
        "attention.to_v",
        "attention.to_out.0",
        "feed_forward.w1",
        "feed_forward.w2",
        "feed_forward.w3",
    ),
    r"cap_embedder": (
        "1",
    ),
}
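As a rough illustration of how these patterns relate to module names (this is an assumption about the matching semantics, not the actual DFloat11 internals): each regex key selects a family of transformer blocks, and the tuple lists the linear submodules inside each matched block whose BFloat16 weights get losslessly compressed. Below is a minimal sketch using only the standard library and the `pattern_dict` above; `is_compressed` is a hypothetical helper, not part of the DFloat11 API:

```python
import re

# Hypothetical helper, for illustration only: decide whether a submodule path
# such as "layers.11.attention.to_q" is covered by pattern_dict above.
def is_compressed(module_path: str, pattern_dict: dict) -> bool:
    for block_pattern, leaf_names in pattern_dict.items():
        match = re.match(block_pattern, module_path)
        if match is None:
            continue
        # Strip the matched block prefix, then check the remaining leaf name.
        rest = module_path[match.end():].lstrip(".")
        if rest in leaf_names:
            return True
    return False

print(is_compressed("layers.11.attention.to_q", pattern_dict))   # True
print(is_compressed("layers.11.attention.norm_q", pattern_dict)) # False: left uncompressed
```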
|
|
|