mingyi456/LongCat-Image-Edit-Turbo-DF11

Image-Text-to-Image

Diffusion Single File


mingyi456 committed on Feb 7

Commit e7306f6 · verified · 1 Parent(s): 6694446

Update README.md

Files changed (1)

README.md +88 -1


@@ -10,4 +10,91 @@ pipeline_tag: image-text-to-image
 library_name: diffusers
 tags:
 - diffusion-single-file
----

+---
+For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11.
+Feel free to request compression of other models as well (for the `diffusers` library, ComfyUI, or anything else), although models with architectures unfamiliar to me may be more difficult.
+### How to Use
+#### `diffusers`
+```python
+import torch
+from PIL import Image
+from dfloat11 import DFloat11Model
+from diffusers import LongCatImageEditPipeline, LongCatImageTransformer2DModel
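+# Newer versions of `transformers` seem to require importing `no_init_weights`
+# from `transformers.initialization` instead, so fall back to that location.
+try:
+    from transformers.modeling_utils import no_init_weights
+except ImportError:
+    from transformers.initialization import no_init_weights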
+with no_init_weights():
+    transformer = LongCatImageTransformer2DModel.from_config(
+        LongCatImageTransformer2DModel.load_config(
+            "meituan-longcat/LongCat-Image-Edit-Turbo", subfolder="transformer"
+        ),
+        torch_dtype=torch.bfloat16
+    ).to(torch.bfloat16)
+DFloat11Model.from_pretrained(
+    "mingyi456/LongCat-Image-Edit-Turbo-DF11",
+    device="cpu",
+    bfloat16_model=transformer,
+)
+pipe = LongCatImageEditPipeline.from_pretrained(
+    "meituan-longcat/LongCat-Image-Edit-Turbo",
+    transformer=transformer,
+    torch_dtype=torch.bfloat16
+)
+DFloat11Model.from_pretrained(
+    "mingyi456/Qwen2.5-VL-7B-Instruct-DF11",
+    device="cpu",
+    bfloat16_model=pipe.text_encoder,
+)
+pipe.enable_model_cpu_offload()
+img = Image.open('assets/test.png').convert('RGB')
+prompt = '将猫变成狗'  # "Turn the cat into a dog"
+image = pipe(
+    img,
+    prompt,
+    negative_prompt='',
+    guidance_scale=1.0,
+    num_inference_steps=8,
+    num_images_per_prompt=1,
+    generator=torch.Generator("cpu").manual_seed(43)
+).images[0]
+image.save('image longcat-image-edit.png')
+```
+#### ComfyUI
+Currently, this model is not supported natively in ComfyUI. Do let me know if it receives native support, and I will work on supporting it.
+### Compression details
+This is the `pattern_dict` for compression:
+```python
+pattern_dict = {
+    r"transformer_blocks\.\d+": (
+        "norm1.linear",
+        "norm1_context.linear",
+        "attn.to_q",
+        "attn.to_k",
+        "attn.to_v",
+        "attn.to_out.0",
+        "attn.add_q_proj",
+        "attn.add_k_proj",
+        "attn.add_v_proj",
+        "attn.to_add_out",
+        "ff.net.0.proj",
+        "ff.net.2",
+        "ff_context.net.0.proj",
+        "ff_context.net.2",
+    ),
+    r"single_transformer_blocks\.\d+": (
+        "norm.linear",
+        "proj_mlp",
+        "proj_out",
+        "attn.to_q",
+        "attn.to_k",
+        "attn.to_v",
+    ),
+}
+```
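
To make the `pattern_dict` above easier to read, here is a minimal sketch of how such a dictionary can be expanded into fully qualified layer names. It assumes each key is a regular expression matched against a whole block name and each value lists the layers inside that block to compress. The `expand_patterns` helper and the example module names are hypothetical and not part of the DFloat11 API, and the dictionary is abbreviated to two layers per pattern.

```python
import re

# Abbreviated copy of the pattern_dict above (two layers per pattern for brevity).
pattern_dict = {
    r"transformer_blocks\.\d+": ("attn.to_q", "ff.net.2"),
    r"single_transformer_blocks\.\d+": ("proj_mlp", "proj_out"),
}


def expand_patterns(module_names, pattern_dict):
    """Hypothetical helper: list the fully qualified layers each pattern selects."""
    selected = []
    for name in module_names:
        for pattern, layer_names in pattern_dict.items():
            # fullmatch stops the "transformer_blocks" pattern from also
            # matching names that start with "single_transformer_blocks".
            if re.fullmatch(pattern, name):
                selected.extend(f"{name}.{layer}" for layer in layer_names)
    return selected


# Made-up module names for illustration, not read from the real model.
example_modules = ["transformer_blocks.0", "single_transformer_blocks.3", "x_embedder"]
targets = expand_patterns(example_modules, pattern_dict)
print(targets)
```

Under this assumption, any module whose name matches neither pattern (such as the made-up `x_embedder` entry) is left out of the result, which would correspond to a layer that stays uncompressed.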