---
license: apache-2.0
language:
- en
- zh
base_model:
- meituan-longcat/LongCat-Image-Edit-Turbo
base_model_relation: quantized
pipeline_tag: image-text-to-image
library_name: diffusers
tags:
- diffusion-single-file
---
For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11

Feel free to request other models for compression as well (whether for the `diffusers` library, ComfyUI, or any other framework), although models built on architectures I am unfamiliar with may be more difficult.

### How to Use

#### `diffusers`

```python
import torch
from PIL import Image
from diffusers import LongCatImageEditPipeline, LongCatImageTransformer2DModel
from dfloat11 import DFloat11Model

# For newer versions of `transformers`, it seems that
# `from transformers.initialization import no_init_weights` is required instead
from transformers.modeling_utils import no_init_weights

# Build the transformer skeleton without allocating/initializing real weights
with no_init_weights():
    transformer = LongCatImageTransformer2DModel.from_config(
        LongCatImageTransformer2DModel.load_config(
            "meituan-longcat/LongCat-Image-Edit-Turbo", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16,
    ).to(torch.bfloat16)

# Fill the transformer with the DFloat11-compressed weights
DFloat11Model.from_pretrained(
    "mingyi456/LongCat-Image-Edit-Turbo-DF11",
    device="cpu",
    bfloat16_model=transformer,
)

pipe = LongCatImageEditPipeline.from_pretrained(
    "meituan-longcat/LongCat-Image-Edit-Turbo",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# The Qwen2.5-VL text encoder is also available in DFloat11 form
DFloat11Model.from_pretrained(
    "mingyi456/Qwen2.5-VL-7B-Instruct-DF11",
    device="cpu",
    bfloat16_model=pipe.text_encoder,
)
pipe.enable_model_cpu_offload()

img = Image.open("assets/test.png").convert("RGB")
prompt = "将猫变成狗"  # "Turn the cat into a dog"
image = pipe(
    img,
    prompt,
    negative_prompt="",
    guidance_scale=1.0,
    num_inference_steps=8,
    num_images_per_prompt=1,
    generator=torch.Generator("cpu").manual_seed(43),
).images[0]
image.save("image longcat-image-edit.png")
```
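DFloat11 is a lossless compression format for BFloat16 weights, so the decompressed outputs match the original model exactly while the stored weights take roughly 30% less space. To sanity-check memory savings, you can compare a module's parameter byte counts before and after swapping in compressed weights. A minimal helper sketch (the toy `nn.Linear` below stands in for the real transformer; it is illustrative, not part of the DFloat11 API):

```python
import torch
from torch import nn


def param_bytes(model: nn.Module) -> int:
    """Total bytes occupied by a module's parameters."""
    return sum(p.numel() * p.element_size() for p in model.parameters())


# Toy stand-in module in bfloat16 (2 bytes per parameter)
toy = nn.Linear(1024, 1024, bias=False).to(torch.bfloat16)
print(f"{param_bytes(toy) / 1e6:.1f} MB")  # 1024 * 1024 params * 2 bytes ≈ 2.1 MB
```

Calling `param_bytes(transformer)` after loading gives the in-memory (decompressed) footprint; the on-disk savings come from the DFloat11 encoding of the checkpoint itself.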

#### ComfyUI
Currently, this model is not supported natively in ComfyUI. Do let me know if it receives native support, and I will work on providing a compatible version.

### Compression details

This is the `pattern_dict` used for compression:

```python
pattern_dict = {
    r"transformer_blocks\.\d+": (
        "norm1.linear",
        "norm1_context.linear",
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
        "attn.to_out.0",
        "attn.add_q_proj",
        "attn.add_k_proj",
        "attn.add_v_proj",
        "attn.to_add_out",
        "ff.net.0.proj",
        "ff.net.2",
        "ff_context.net.0.proj",
        "ff_context.net.2",
    ),
    r"single_transformer_blocks\.\d+": (
        "norm.linear",
        "proj_mlp",
        "proj_out",
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
    ),
}
```
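For intuition: each key is a regular expression matched against module paths in the model, and the associated tuple names the linear submodules under every matching block whose weights are compressed. A minimal sketch of that matching logic (the submodule names come from the dict above, but the helper function is an illustration of the idea, not the DFloat11 implementation):

```python
import re

# Keys: regex patterns over module paths. Values: linear submodules
# under each matching block that get compressed (abridged from above).
pattern_dict = {
    r"transformer_blocks\.\d+": ("attn.to_q", "attn.to_k", "attn.to_v"),
    r"single_transformer_blocks\.\d+": ("proj_mlp", "proj_out"),
}


def matching_submodules(module_path: str) -> list[str]:
    """Return fully qualified names of submodules selected for compression."""
    for pattern, submodules in pattern_dict.items():
        if re.fullmatch(pattern, module_path):
            return [f"{module_path}.{name}" for name in submodules]
    return []


print(matching_submodules("transformer_blocks.3"))
# -> ['transformer_blocks.3.attn.to_q', 'transformer_blocks.3.attn.to_k',
#     'transformer_blocks.3.attn.to_v']
```

Modules whose paths match no pattern (embeddings, norms outside the blocks, the final projection) are left in plain BFloat16.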