---
license: apache-2.0
language:
- en
- zh
base_model:
- meituan-longcat/LongCat-Image-Edit-Turbo
base_model_relation: quantized
pipeline_tag: image-text-to-image
library_name: diffusers
tags:
- diffusion-single-file
---
For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11

Feel free to request other models for compression as well (whether for the `diffusers` library, ComfyUI, or any other framework), although models built on architectures I am unfamiliar with may be more difficult.

### How to Use

#### `diffusers`

```python
import torch
from PIL import Image
from diffusers import LongCatImageEditPipeline, LongCatImageTransformer2DModel
from dfloat11 import DFloat11Model

# For newer versions of `transformers`, it seems that
# `from transformers.initialization import no_init_weights` is required instead
from transformers.modeling_utils import no_init_weights

# Build the transformer skeleton without allocating/initializing real weights
with no_init_weights():
    transformer = LongCatImageTransformer2DModel.from_config(
        LongCatImageTransformer2DModel.load_config(
            "meituan-longcat/LongCat-Image-Edit-Turbo", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16,
    ).to(torch.bfloat16)

# Fill the transformer with the DFloat11-compressed weights
DFloat11Model.from_pretrained(
    "mingyi456/LongCat-Image-Edit-Turbo-DF11",
    device="cpu",
    bfloat16_model=transformer,
)

pipe = LongCatImageEditPipeline.from_pretrained(
    "meituan-longcat/LongCat-Image-Edit-Turbo",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# The Qwen2.5-VL text encoder is also available in DFloat11 form
DFloat11Model.from_pretrained(
    "mingyi456/Qwen2.5-VL-7B-Instruct-DF11",
    device="cpu",
    bfloat16_model=pipe.text_encoder,
)
pipe.enable_model_cpu_offload()

img = Image.open("assets/test.png").convert("RGB")
prompt = "将猫变成狗"  # "Turn the cat into a dog"
image = pipe(
    img,
    prompt,
    negative_prompt="",
    guidance_scale=1.0,
    num_inference_steps=8,
    num_images_per_prompt=1,
    generator=torch.Generator("cpu").manual_seed(43),
).images[0]
image.save("image longcat-image-edit.png")
```
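DFloat11 is a lossless compression format for BFloat16 weights, so the decompressed outputs match the original model exactly while the stored weights take roughly 30% less space. To sanity-check memory savings, you can compare a module's parameter byte counts before and after swapping in compressed weights. A minimal helper sketch (the toy `nn.Linear` below stands in for the real transformer; it is illustrative, not part of the DFloat11 API):

```python
import torch
from torch import nn


def param_bytes(model: nn.Module) -> int:
    """Total bytes occupied by a module's parameters."""
    return sum(p.numel() * p.element_size() for p in model.parameters())


# Toy stand-in module in bfloat16 (2 bytes per parameter)
toy = nn.Linear(1024, 1024, bias=False).to(torch.bfloat16)
print(f"{param_bytes(toy) / 1e6:.1f} MB")  # 1024 * 1024 params * 2 bytes ≈ 2.1 MB
```

Calling `param_bytes(transformer)` after loading gives the in-memory (decompressed) footprint; the on-disk savings come from the DFloat11 encoding of the checkpoint itself.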

#### ComfyUI
Currently, this model is not supported natively in ComfyUI. Do let me know if it receives native support, and I will work on providing a compatible version.

### Compression details

This is the `pattern_dict` used for compression:

```python
pattern_dict = {
    r"transformer_blocks\.\d+": (
        "norm1.linear",
        "norm1_context.linear",
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
        "attn.to_out.0",
        "attn.add_q_proj",
        "attn.add_k_proj",
        "attn.add_v_proj",
        "attn.to_add_out",
        "ff.net.0.proj",
        "ff.net.2",
        "ff_context.net.0.proj",
        "ff_context.net.2",
    ),
    r"single_transformer_blocks\.\d+": (
        "norm.linear",
        "proj_mlp",
        "proj_out",
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
    ),
}
```
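For intuition: each key is a regular expression matched against module paths in the model, and the associated tuple names the linear submodules under every matching block whose weights are compressed. A minimal sketch of that matching logic (the submodule names come from the dict above, but the helper function is an illustration of the idea, not the DFloat11 implementation):

```python
import re

# Keys: regex patterns over module paths. Values: linear submodules
# under each matching block that get compressed (abridged from above).
pattern_dict = {
    r"transformer_blocks\.\d+": ("attn.to_q", "attn.to_k", "attn.to_v"),
    r"single_transformer_blocks\.\d+": ("proj_mlp", "proj_out"),
}


def matching_submodules(module_path: str) -> list[str]:
    """Return fully qualified names of submodules selected for compression."""
    for pattern, submodules in pattern_dict.items():
        if re.fullmatch(pattern, module_path):
            return [f"{module_path}.{name}" for name in submodules]
    return []


print(matching_submodules("transformer_blocks.3"))
# -> ['transformer_blocks.3.attn.to_q', 'transformer_blocks.3.attn.to_k',
#     'transformer_blocks.3.attn.to_v']
```

Modules whose paths match no pattern (embeddings, norms outside the blocks, the final projection) are left in plain BFloat16.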