|
|
---
license: apache-2.0
datasets:
- WeiChow/CrispEdit-2M
language:
- en
pipeline_tag: image-to-image
tags:
- image-edit
base_model:
- google/gemma-2-2b-it
- MeissonFlow/Meissonic
---
|
|
# EditMGT
|
|
|
|
|
<div align="center">

[Paper](https://arxiv.org/abs/2512.11715)
[Dataset](https://huggingface.co/datasets/WeiChow/CrispEdit-2M)
[Model](https://huggingface.co/WeiChow/EditMGT)
[Code](https://github.com/weichow23/EditMGT/tree/main)
[Project Page](https://weichow23.github.io/EditMGT/)
[Python 3.9.2](https://www.python.org/downloads/release/python-392/)

</div>
|
|
|
|
|
## 🔍 Overview
|
|
|
|
|
This is the official repository for **EditMGT: Unleashing the Potential of Masked Generative Transformer in Image Editing** ✨.
|
|
|
|
|
EditMGT is a novel framework that leverages Masked Generative Transformers for image editing. It enables precise, controllable modifications while preserving the integrity of the original content.
|
|
|
|
|
<p align="center">
  <img src="asset/editmgt.png" alt="EditMGT Architecture" width="800px">
</p>
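For intuition: unlike diffusion models, which denoise over many steps, masked generative transformers synthesize a token grid by iteratively committing their most confident token predictions over a handful of steps. The sketch below is a generic, MaskGIT-style decoding loop for illustration only; it is **not** EditMGT's actual implementation, and `model`, `codebook_size`, and the cosine schedule here are assumptions.

```python
import math
import torch

@torch.no_grad()
def masked_generative_decode(model, seq_len, codebook_size, steps=16):
    """Generic MaskGIT-style decoding (illustrative; not EditMGT's code).

    `model` is assumed to map a token sequence containing mask entries to
    per-position logits over the codebook; EditMGT additionally conditions
    on the text instruction and the reference image.
    """
    mask_id = codebook_size  # reserve one extra id as the mask token
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for step in range(1, steps + 1):
        logits = model(tokens)                   # (1, seq_len, codebook_size)
        conf, pred = logits.softmax(-1).max(-1)  # per-position confidence / best token
        # Tokens committed in earlier steps are kept and never re-masked.
        conf = torch.where(tokens == mask_id, conf, torch.full_like(conf, float('inf')))
        pred = torch.where(tokens == mask_id, pred, tokens)
        # Cosine schedule: how many positions stay masked after this step.
        n_masked = int(seq_len * math.cos(math.pi / 2 * step / steps))
        # Re-mask the least-confident positions; commit the rest.
        remask = conf.argsort(dim=-1)[:, :n_masked]
        tokens = pred.scatter(1, remask, mask_id)
    return tokens  # fully decoded after the final step (n_masked == 0)
```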
|
|
|
|
|
## ✨ Features
|
|
|
|
|
- 🎨 Strong style-transfer capabilities
- 🔍 Attention-based control over editing regions
- ⚡ A compact 960M-parameter backbone, enabling fast inference
- 📊 Trained on the [CrispEdit-2M](https://huggingface.co/datasets/WeiChow/CrispEdit-2M) dataset
|
|
|
|
|
## ⚡ Quick Start
|
|
|
|
|
First, clone the repository and navigate to the project root:
|
|
```shell
git clone https://github.com/weichow23/editmgt
cd editmgt
```
|
|
|
|
|
## 🔧 Environment Setup
|
|
|
|
|
```bash
# Create and activate the conda environment
conda create --name editmgt python=3.9.2
conda activate editmgt

# Optional: install system dependencies
sudo apt-get install libgl1-mesa-glx libglib2.0-0 -y

# Install Python dependencies
pip3 install git+https://github.com/openai/CLIP
pip3 install -r requirements.txt
```
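After installation, a quick import check can confirm that the core dependencies resolved correctly (a minimal sketch; `clip` here is the package installed from GitHub above):

```python
# Sanity-check the environment (illustrative).
import torch
import clip  # installed above from github.com/openai/CLIP

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("CLIP models:", clip.available_models())
```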
|
|
|
|
|
⚠️ **Note**: If you encounter unexpected dependency or version errors, please refer to [this Meissonic issue](https://github.com/viiika/Meissonic/issues/14) to find library versions that are known to work.
|
|
|
|
|
## 🚀 Inference
|
|
|
|
|
Run the following script in the `editmgt` directory:
|
|
|
|
|
```python
import os
import sys
sys.path.append("./")
from PIL import Image

from src.editmgt import init_edit_mgt
from src.v2_model import negative_prompt

if __name__ == "__main__":
    pipe = init_edit_mgt(device='cuda:0')
    # Forcing bf16 improves speed, but it incurs a quality penalty:
    # we noticed GEdit-Bench dropped by about 0.8. To disable it:
    # pipe = init_edit_mgt(device='cuda:0', enable_bf16=False)

    # pipe.local_guidance = 0.01  # Enable the local guidance-scale auxiliary mode.
    # pipe.local_query_text = 'owl'  # Use specific words as attention queries.
    # pipe.attention_enable_blocks = [i for i in range(28, 37)]  # Attention layers to use.
    input_image = Image.open('assets/case_5.jpg')
    result = pipe(
        prompt=['Make it into Ghibli style'],
        height=1024,
        width=1024,
        num_inference_steps=36,  # For some simple tasks, 16 steps are enough!
        guidance_scale=6,
        reference_strength=1.1,
        reference_image=[input_image.resize((1024, 1024))],
        negative_prompt=negative_prompt or None,
    )

    output_dir = "./output"
    os.makedirs(output_dir, exist_ok=True)

    # Resize the edited result back to the input resolution and save it.
    file_path = os.path.join(output_dir, "edited_case_5.png")
    w, h = input_image.size
    result.images[0].resize((w, h)).save(file_path)
```
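The commented-out lines above expose the attention-control knobs from the Features section. Below is a usage sketch reusing `pipe` and `input_image` from the script above; the prompt and values here are made-up examples, not recommended settings.

```python
# Optional: localize the edit with attention guidance.
# These attributes mirror the commented options in the script above.
pipe.local_guidance = 0.01                          # enable the local guidance-scale auxiliary mode
pipe.local_query_text = 'owl'                       # word whose attention map localizes the edit
pipe.attention_enable_blocks = list(range(28, 37))  # attention layers to use

result = pipe(
    prompt=['Turn the owl golden'],  # hypothetical instruction
    height=1024,
    width=1024,
    num_inference_steps=36,
    guidance_scale=6,
    reference_strength=1.1,
    reference_image=[input_image.resize((1024, 1024))],
)
result.images[0].save('./output/edited_local.png')
```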
|
|
|
|
|
## 📖 Citation
|
|
|
|
|
```bibtex
@article{chow2025editmgt,
  title={EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing},
  author={Chow, Wei and Li, Linfeng and Kong, Lingdong and Li, Zefeng and Xu, Qi and Song, Hang and Ye, Tian and Wang, Xian and Bai, Jinbin and Xu, Shilin and others},
  journal={arXiv preprint arXiv:2512.11715},
  year={2025}
}
```
|
|
|
|
|
## 🙏 Acknowledgements
|
|
|
|
|
We extend our sincere gratitude to all contributors and the research community for their valuable feedback and support in the development of this project.