---
license: apache-2.0
language:
- en
- zh
base_model:
- meituan-longcat/LongCat-Image
base_model_relation: quantized
pipeline_tag: text-to-image
library_name: diffusers
tags:
- diffusion-single-file
---

For more information (including how to compress models yourself), see https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11.

Feel free to request other models for compression as well (for the `diffusers` library, ComfyUI, or anything else), although models with architectures I am unfamiliar with may be more difficult.

### How to Use

#### `diffusers`

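```python
import torch
from diffusers import LongCatImagePipeline, LongCatImageTransformer2DModel
from dfloat11 import DFloat11Model  # requires the `dfloat11` package (see the DFloat11 GitHub repo)
from transformers.modeling_utils import no_init_weights

# Build an empty BF16 transformer from the base model's config (skipping weight
# initialization), then load the DFloat11-compressed weights into it.
with no_init_weights():
    transformer = LongCatImageTransformer2DModel.from_config(
        LongCatImageTransformer2DModel.load_config(
            "meituan-longcat/LongCat-Image", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16,
    ).to(torch.bfloat16)

DFloat11Model.from_pretrained(
    "mingyi456/LongCat-Image-DF11",
    device="cpu",
    bfloat16_model=transformer,
)

# Load the rest of the pipeline from the base model, reusing the DF11 transformer.
pipe = LongCatImagePipeline.from_pretrained(
    "meituan-longcat/LongCat-Image",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Load the DFloat11-compressed Qwen2.5-VL-7B-Instruct weights into the text encoder.
DFloat11Model.from_pretrained(
    "mingyi456/Qwen2.5-VL-7B-Instruct-DF11",
    device="cpu",
    bfloat16_model=pipe.text_encoder,
)

# Keep components on the CPU and move each one to the GPU only while it runs.
pipe.enable_model_cpu_offload()

# English translation of the prompt: "A young Asian woman wearing a yellow knit sweater
# with a white necklace. Her hands rest on her knees and her expression is serene. The
# background is a rough brick wall, with warm afternoon sunlight falling on her, creating
# a calm, cozy atmosphere. The shot uses a medium-distance view that highlights her
# demeanor and the details of her clothing. Soft light falls on her face, emphasizing
# the texture of her features and accessories and adding depth and warmth to the image.
# The composition is simple overall, with the brick wall's texture and the interplay of
# sunlight and shadow complementing each other, bringing out her elegance and composure."
prompt = '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。'

image = pipe(
    prompt,
    height=768,
    width=1344,
    guidance_scale=4.0,
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=torch.Generator("cpu").manual_seed(43),
    enable_cfg_renorm=True,
    enable_prompt_rewrite=True,
).images[0]

image.save("longcat-image.png")
```
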
#### ComfyUI

This model is not currently supported natively in ComfyUI. Let me know if it gains native support, and I will work on supporting it.

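### Compression details

This is the `pattern_dict` used for compression. Each key is a regular expression matching block names in the transformer (for example, `transformer_blocks.0`), and each value lists the layers within every matching block whose weights are compressed:
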
```python
pattern_dict = {
    r"transformer_blocks\.\d+": (
        "norm1.linear",
        "norm1_context.linear",
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
        "attn.to_out.0",
        "attn.add_q_proj",
        "attn.add_k_proj",
        "attn.add_v_proj",
        "attn.to_add_out",
        "ff.net.0.proj",
        "ff.net.2",
        "ff_context.net.0.proj",
        "ff_context.net.2",
    ),
    r"single_transformer_blocks\.\d+": (
        "norm.linear",
        "proj_mlp",
        "proj_out",
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
    ),
}
```
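
To illustrate how these entries select layers, here is a small, self-contained sketch that expands a block name into the full names of the layers it targets. It assumes each key is matched against the whole block name with `re.fullmatch`; the actual matching logic inside DFloat11 may differ. The helper `targeted_submodules` and the block names passed to it (`transformer_blocks.0`, `single_transformer_blocks.3`, `norm_out`) are hypothetical examples, not part of the DFloat11 API.

```python
import re

# The pattern_dict above, repeated so this sketch runs on its own.
pattern_dict = {
    r"transformer_blocks\.\d+": (
        "norm1.linear", "norm1_context.linear",
        "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
        "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
        "ff.net.0.proj", "ff.net.2",
        "ff_context.net.0.proj", "ff_context.net.2",
    ),
    r"single_transformer_blocks\.\d+": (
        "norm.linear", "proj_mlp", "proj_out",
        "attn.to_q", "attn.to_k", "attn.to_v",
    ),
}


def targeted_submodules(block_name: str) -> list[str]:
    """Hypothetical helper: return the full names of the layers selected for
    `block_name`, assuming re.fullmatch semantics (DFloat11 may match differently)."""
    for pattern, submodules in pattern_dict.items():
        if re.fullmatch(pattern, block_name):
            return [f"{block_name}.{sub}" for sub in submodules]
    return []  # block name not covered by any pattern: nothing is compressed


# Hypothetical block names following the naming scheme in the patterns.
for name in ["transformer_blocks.0", "single_transformer_blocks.3", "norm_out"]:
    print(name, "->", len(targeted_submodules(name)), "submodules")
# transformer_blocks.0 -> 14 submodules
# single_transformer_blocks.3 -> 6 submodules
# norm_out -> 0 submodules
```

Matching the whole name matters here: with an unanchored `re.search`, the pattern `transformer_blocks\.\d+` would also match inside `single_transformer_blocks.3`.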