---
license: apache-2.0
language:
- en
- zh
base_model:
- meituan-longcat/LongCat-Image-Edit
base_model_relation: quantized
pipeline_tag: image-text-to-image
library_name: diffusers
tags:
- diffusion-single-file
---
For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11

Feel free to request compression of other models as well (whether for the `diffusers` library, ComfyUI, or any other framework), although models that use architectures unfamiliar to me may be more difficult.

### How to Use

#### `diffusers`

```python
import torch
from PIL import Image
from diffusers import LongCatImageEditPipeline, LongCatImageTransformer2DModel
from dfloat11 import DFloat11Model

# for transformers version >= 5.0.0
# from transformers.initialization import no_init_weights

# otherwise
from transformers.modeling_utils import no_init_weights

with no_init_weights():
    transformer = LongCatImageTransformer2DModel.from_config(
        LongCatImageTransformer2DModel.load_config(
            "meituan-longcat/LongCat-Image-Edit", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16
    ).to(torch.bfloat16)
DFloat11Model.from_pretrained(
    "mingyi456/LongCat-Image-Edit-DF11",
    device="cpu",
    bfloat16_model=transformer,
)
pipe = LongCatImageEditPipeline.from_pretrained(
    "meituan-longcat/LongCat-Image-Edit",
    transformer=transformer, 
    torch_dtype=torch.bfloat16
)
DFloat11Model.from_pretrained(
    "mingyi456/Qwen2.5-VL-7B-Instruct-DF11",
    device="cpu",
    bfloat16_model=pipe.text_encoder,
)
pipe.enable_model_cpu_offload()

img = Image.open('assets/test.png').convert('RGB')
prompt = '将猫变成狗'  # "Turn the cat into a dog"
image = pipe(
    img,
    prompt,
    negative_prompt='',
    guidance_scale=4.5,
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=torch.Generator("cpu").manual_seed(43)
).images[0]

image.save('longcat-image-edit.png')
```
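To verify the memory savings from DFloat11 compression, you can optionally check peak GPU memory after generation. This is a small helper sketch of my own, not part of the pipeline API:

```python
import torch

def peak_gpu_memory_gib() -> float:
    """Return the peak GPU memory allocated so far, in GiB (0.0 without CUDA)."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.max_memory_allocated() / 1024**3

# Call this after running the pipeline:
print(f"Peak GPU memory: {peak_gpu_memory_gib():.2f} GiB")
```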

#### ComfyUI
Currently, this model is not natively supported in ComfyUI. Let me know if it gains native support, and I will look into supporting it.

### Compression details

This is the `pattern_dict` used for compression:

```python
pattern_dict = {
    r"transformer_blocks\.\d+": (
        "norm1.linear",
        "norm1_context.linear",
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
        "attn.to_out.0",
        "attn.add_q_proj",
        "attn.add_k_proj",
        "attn.add_v_proj",
        "attn.to_add_out",
        "ff.net.0.proj",
        "ff.net.2",
        "ff_context.net.0.proj",
        "ff_context.net.2",
    ),
    r"single_transformer_blocks\.\d+": (
        "norm.linear",
        "proj_mlp",
        "proj_out",
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
    ),
}
```
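As a rough sketch of how such a dictionary is applied (the matching helper below is hypothetical, not DFloat11's actual implementation): each key is a regular expression matched against a block's module path, and the associated tuple lists the linear submodules within that block whose bfloat16 weights get losslessly compressed.

```python
import re

# Trimmed copy of the pattern_dict above, for illustration only.
pattern_dict = {
    r"transformer_blocks\.\d+": ("attn.to_q", "ff.net.2"),
    r"single_transformer_blocks\.\d+": ("proj_mlp",),
}

def is_compressed(param_name: str) -> bool:
    """Check whether a parameter path falls under one of the patterns."""
    for block_pattern, submodules in pattern_dict.items():
        m = re.match(block_pattern, param_name)
        if m is None:
            continue
        # Strip the matched block prefix, then compare against submodule names.
        rest = param_name[m.end():].lstrip(".")
        if any(rest.startswith(sub) for sub in submodules):
            return True
    return False

print(is_compressed("transformer_blocks.12.attn.to_q.weight"))        # True
print(is_compressed("transformer_blocks.12.norm_out.linear.weight"))  # False
```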