LongCat-Image

Introduction

We introduce LongCat-Image-Edit-Turbo, the distilled version of LongCat-Image-Edit. It achieves high-quality image editing with only 8 NFEs (Number of Function Evaluations) , offering extremely low inference latency.

LongCat-Image-Edit model

Installation

pip install git+https://github.com/huggingface/diffusers

Run Image Editing

πŸ“ Special Handling for Text Rendering

For both Text-to-Image and Image Editing tasks involving text generation, you must enclose the target text within single or double quotation marks (both English '...' / "..." and Chinese β€˜...’ / β€œ...” styles are supported).

Reasoning: The model utilizes a specialized character-level encoding strategy specifically for quoted content. Failure to use explicit quotation marks prevents this mechanism from triggering, which will severely compromise the text rendering capability.

import torch
from PIL import Image
from diffusers import LongCatImageEditPipeline

if __name__ == '__main__':
    device = torch.device('cuda')
    pipe = LongCatImageEditPipeline.from_pretrained("meituan-longcat/LongCat-Image-Edit-Turbo", torch_dtype= torch.bfloat16 )
    # pipe.to(device, torch.bfloat16)  # Uncomment for high VRAM devices (Faster inference)
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (Required ~18 GB); slower but prevents OOM
    img = Image.open('assets/test.png').convert('RGB')
    prompt = 'ε°†ηŒ«ε˜ζˆη‹—'
    image = pipe(
        img,
        prompt,
        negative_prompt='',
        guidance_scale=1,
        num_inference_steps=8,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43)
    ).images[0]
    image.save('./edit_example.png')
Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Paper for meituan-longcat/LongCat-Image-Edit-Turbo