---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-to-image
library_name: transformers
---
<div align="center">
<img src="assets/longcat-image_logo.svg" width="45%" alt="LongCat-Image" />
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href='https://arxiv.org/pdf/2512.07584'><img src='https://img.shields.io/badge/Technical-Report-red'></a>
<a href='https://github.com/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/GitHub-Code-black'></a>
<a href='https://github.com/meituan-longcat/LongCat-Flash-Chat/blob/main/figures/wechat_official_accounts.png'><img src='https://img.shields.io/badge/WeChat-LongCat-brightgreen?logo=wechat&logoColor=white'></a>
<a href='https://x.com/Meituan_LongCat'><img src='https://img.shields.io/badge/Twitter-LongCat-white?logo=x&logoColor=white'></a>
</div>
<div align="center" style="line-height: 1;">
[//]: # ( <a href='https://meituan-longcat.github.io/LongCat-Image/'><img src='https://img.shields.io/badge/Project-Page-green'></a>)
<a href='https://huggingface.co/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image-blue'></a>
<a href='https://huggingface.co/meituan-longcat/LongCat-Image-Dev'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Dev-blue'></a>
<a href='https://huggingface.co/meituan-longcat/LongCat-Image-Edit'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Edit-blue'></a>
</div>
## Introduction
We introduce **LongCat-Image-Edit**, the image-editing variant of LongCat-Image. LongCat-Image-Edit supports bilingual (Chinese-English) editing and achieves state-of-the-art performance among open-source image editing models, delivering leading instruction following and image quality with superior visual consistency.
<div align="center">
<img src="assets/model_struct_edit.png" width="90%" alt="LongCat-Image-Edit model" />
</div>
### Key Features
- **Superior Precise Editing**: LongCat-Image-Edit supports a wide range of editing tasks, including global editing, local editing, text modification, and reference-guided editing. Its strong semantic understanding enables precise edits that faithfully follow the instruction.
- **Consistency Preservation**: LongCat-Image-Edit has strong consistency-preservation capabilities: attributes in non-edited regions, such as layout, texture, color tone, and subject identity, remain unchanged unless targeted by the instruction. This is especially evident in multi-turn editing.
- **Strong Benchmark Performance**: LongCat-Image-Edit achieves state-of-the-art (SOTA) results on image editing benchmarks, particularly among open-source image editing models, while significantly improving inference efficiency.
## 🎨 Showcase
<div align="center">
<img src="assets/image_edit_gallery.jpg" width="90%" alt="LongCat-Image-Edit gallery." />
</div>
## Quick Start
Try the online demo: [Hugging Face Space](https://huggingface.co/spaces/anycoderapps/LongCat-Image-Edit)
### Installation
```shell
pip install git+https://github.com/huggingface/diffusers
```
### Run Image Editing
> [!CAUTION]
> **Special Handling for Text Rendering**
>
> For both Text-to-Image and Image Editing tasks involving text generation, **you must enclose the target text within single or double quotation marks** (both English '...' / "..." and Chinese ‘...’ / “...” styles are supported).
>
> **Reasoning:** The model uses a specialized **character-level encoding** strategy for quoted content. Without explicit quotation marks this mechanism is never triggered, which severely degrades text rendering quality.
>
```python
import torch
from PIL import Image
from diffusers import LongCatImageEditPipeline

if __name__ == '__main__':
    device = torch.device('cuda')

    pipe = LongCatImageEditPipeline.from_pretrained(
        "meituan-longcat/LongCat-Image-Edit",
        torch_dtype=torch.bfloat16,
    )
    # pipe.to(device, torch.bfloat16)  # Uncomment on high-VRAM devices for faster inference
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (~18 GB required); slower but prevents OOM

    img = Image.open('assets/test.png').convert('RGB')
    prompt = '小猫变成狗'  # "Turn the kitten into a dog"

    image = pipe(
        img,
        prompt,
        negative_prompt='',
        guidance_scale=4.5,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
    ).images[0]
    image.save('./edit_example.png')
```
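Per the caution above, any prompt that renders or modifies text must wrap the target string in quotation marks. A minimal sketch of building such a prompt (the `quote_text` helper is illustrative, not part of diffusers or LongCat-Image-Edit):

```python
def quote_text(instruction_template: str, target_text: str) -> str:
    """Wrap the target text in double quotation marks so the model's
    character-level encoding for quoted content is triggered."""
    return instruction_template.format(text=f'"{target_text}"')

# Example: ask the model to replace the sign text in the input image.
prompt = quote_text('Change the text on the sign to {text}', 'Grand Opening')
print(prompt)  # Change the text on the sign to "Grand Opening"
```

The resulting prompt can be passed to the pipeline exactly like the plain prompt in the example above; only the quotation marks around the target text are essential.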