Introduction
We introduce LongCat-Image-Edit-Turbo, a distilled version of LongCat-Image-Edit. It achieves high-quality image editing with only 8 NFEs (number of function evaluations), offering extremely low inference latency.
Installation
pip install git+https://github.com/huggingface/diffusers
Run Image Editing
Special Handling for Text Rendering
For both Text-to-Image and Image Editing tasks involving text generation, you must enclose the target text in single or double quotation marks (both English '...' / "..." and Chinese full-width “…” / ‘…’ styles are supported).
Reasoning: the model applies a specialized character-level encoding strategy to quoted content. Without explicit quotation marks this mechanism is not triggered, which severely degrades the text rendering quality.
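To make the quoting rule concrete, here is a small, hypothetical helper (not part of the model or diffusers; an assumption for illustration) that extracts the quoted spans a prompt contains, treating the four supported quote styles as delimiters. A prompt with no quoted span would leave the character-level mechanism untriggered:

```python
import re

# Matches text enclosed in English '...' / "..." or Chinese full-width “...” / ‘...’ quotes.
QUOTED = re.compile(r"'([^']*)'|\"([^\"]*)\"|“([^”]*)”|‘([^’]*)’")

def quoted_spans(prompt: str) -> list[str]:
    """Return the quoted target-text spans found in a prompt."""
    return [next(g for g in m.groups() if g is not None) for m in QUOTED.finditer(prompt)]

print(quoted_spans('Replace the sign text with "OPEN 24/7"'))  # ['OPEN 24/7']
print(quoted_spans('Add a banner reading OPEN'))               # [] -> quoting rule not triggered
```

If the second prompt were used as-is, the target word OPEN would not be routed through the character-level encoder, so its rendering quality would suffer.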
import torch
from PIL import Image
from diffusers import LongCatImageEditPipeline

if __name__ == '__main__':
    device = torch.device('cuda')

    pipe = LongCatImageEditPipeline.from_pretrained(
        "meituan-longcat/LongCat-Image-Edit-Turbo",
        torch_dtype=torch.bfloat16,
    )
    # pipe.to(device, torch.bfloat16)  # Uncomment on high-VRAM devices for faster inference
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (needs ~18 GB); slower but prevents OOM

    img = Image.open('assets/test.png').convert('RGB')
    prompt = 'ε°η«εζη'  # Chinese editing instruction (garbled in this copy of the page)

    image = pipe(
        img,
        prompt,
        negative_prompt='',
        guidance_scale=1,
        num_inference_steps=8,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
    ).images[0]
    image.save('./edit_example.png')
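The generator argument pins the sampling noise to a fixed seed, so repeating a run with the same seed reproduces the same edit. A minimal CPU-only sketch of this behavior (no model download needed), assuming only standard torch semantics:

```python
import torch

# Two generators seeded identically produce identical noise draws,
# so pipeline calls that share a seed are reproducible.
g1 = torch.Generator("cpu").manual_seed(43)
g2 = torch.Generator("cpu").manual_seed(43)

a = torch.randn(4, generator=g1)
b = torch.randn(4, generator=g2)
print(torch.equal(a, b))  # True

# A different seed yields a different edit.
g3 = torch.Generator("cpu").manual_seed(44)
print(torch.equal(a, torch.randn(4, generator=g3)))  # False
```

Varying the seed while keeping the prompt fixed is a cheap way to sample alternative edits of the same image.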
Paper: arXiv 2512.07584