|
|
---
license: apache-2.0
datasets:
- WeiChow/CrispEdit-2M
language:
- en
pipeline_tag: image-to-image
tags:
- image-edit
base_model:
- google/gemma-2-2b-it
- MeissonFlow/Meissonic
---
|
|
# EditMGT
|
|
|
|
|
<div align="center">

[Paper](https://arxiv.org/abs/2512.11715)
[Dataset](https://huggingface.co/datasets/WeiChow/CrispEdit-2M)
[Model](https://huggingface.co/WeiChow/EditMGT)
[Code](https://github.com/weichow23/EditMGT/tree/main)
[Project Page](https://weichow23.github.io/EditMGT/)
[Python 3.9.2](https://www.python.org/downloads/release/python-392/)

</div>
|
|
|
|
|
## 🔍 Overview
|
|
|
|
|
This is the official repository for **EditMGT: Unleashing the Potential of Masked Generative Transformer in Image Editing** ✨.
|
|
|
|
|
EditMGT is a novel framework that leverages Masked Generative Transformers for image editing. It enables precise, controllable modifications while preserving the integrity of the original content.
|
|
|
|
|
<p align="center">
  <img src="asset/editmgt.png" alt="EditMGT Architecture" width="800px">
</p>
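For intuition: unlike diffusion models, which denoise over many steps, masked generative transformers synthesize a token grid by iteratively committing their most confident token predictions over a handful of steps. The sketch below is a generic, MaskGIT-style decoding loop for illustration only; it is **not** EditMGT's actual implementation, and `model`, `codebook_size`, and the cosine schedule here are assumptions.

```python
import math
import torch

@torch.no_grad()
def masked_generative_decode(model, seq_len, codebook_size, steps=16):
    """Generic MaskGIT-style decoding (illustrative; not EditMGT's code).

    `model` is assumed to map a token sequence containing mask entries to
    per-position logits over the codebook; EditMGT additionally conditions
    on the text instruction and the reference image.
    """
    mask_id = codebook_size  # reserve one extra id as the mask token
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for step in range(1, steps + 1):
        logits = model(tokens)                   # (1, seq_len, codebook_size)
        conf, pred = logits.softmax(-1).max(-1)  # per-position confidence / best token
        # Tokens committed in earlier steps are kept and never re-masked.
        conf = torch.where(tokens == mask_id, conf, torch.full_like(conf, float('inf')))
        pred = torch.where(tokens == mask_id, pred, tokens)
        # Cosine schedule: how many positions stay masked after this step.
        n_masked = int(seq_len * math.cos(math.pi / 2 * step / steps))
        # Re-mask the least-confident positions; commit the rest.
        remask = conf.argsort(dim=-1)[:, :n_masked]
        tokens = pred.scatter(1, remask, mask_id)
    return tokens  # fully decoded after the final step (n_masked == 0)
```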
|
|
|
|
|
## ✨ Features
|
|
|
|
|
- 🎨 Strong style-transfer capabilities
- 🔍 Attention-based control over editing regions
- ⚡ A compact 960M-parameter backbone, enabling fast inference
- 📊 Trained on the [CrispEdit-2M](https://huggingface.co/datasets/WeiChow/CrispEdit-2M) dataset
|
|
|
|
|
## ⚡ Quick Start
|
|
|
|
|
First, clone the repository and navigate to the project root:
|
|
```shell
git clone https://github.com/weichow23/editmgt
cd editmgt
```
|
|
|
|
|
## 🔧 Environment Setup
|
|
|
|
|
```bash
# Create and activate the conda environment
conda create --name editmgt python=3.9.2
conda activate editmgt

# Optional: install system dependencies
sudo apt-get install libgl1-mesa-glx libglib2.0-0 -y

# Install Python dependencies
pip3 install git+https://github.com/openai/CLIP
pip3 install -r requirements.txt
```
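After installation, a quick import check can confirm that the core dependencies resolved correctly (a minimal sketch; `clip` here is the package installed from GitHub above):

```python
# Sanity-check the environment (illustrative).
import torch
import clip  # installed above from github.com/openai/CLIP

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("CLIP models:", clip.available_models())
```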
|
|
|
|
|
⚠️ **Note**: If you encounter unexpected dependency or version errors, please refer to [this Meissonic issue](https://github.com/viiika/Meissonic/issues/14) to find library versions that are known to work.
|
|
|
|
|
## 🚀 Inference
|
|
|
|
|
Run the following script in the `editmgt` directory:
|
|
|
|
|
```python
import os
import sys
sys.path.append("./")
from PIL import Image

from src.editmgt import init_edit_mgt
from src.v2_model import negative_prompt

if __name__ == "__main__":
    pipe = init_edit_mgt(device='cuda:0')
    # Forcing bf16 improves speed, but it incurs a quality penalty:
    # we noticed GEdit-Bench dropped by about 0.8. To disable it:
    # pipe = init_edit_mgt(device='cuda:0', enable_bf16=False)

    # pipe.local_guidance = 0.01  # Enable the local guidance-scale auxiliary mode.
    # pipe.local_query_text = 'owl'  # Use specific words as attention queries.
    # pipe.attention_enable_blocks = [i for i in range(28, 37)]  # Attention layers to use.
    input_image = Image.open('assets/case_5.jpg')
    result = pipe(
        prompt=['Make it into Ghibli style'],
        height=1024,
        width=1024,
        num_inference_steps=36,  # For some simple tasks, 16 steps are enough!
        guidance_scale=6,
        reference_strength=1.1,
        reference_image=[input_image.resize((1024, 1024))],
        negative_prompt=negative_prompt or None,
    )

    output_dir = "./output"
    os.makedirs(output_dir, exist_ok=True)

    # Resize the edited result back to the input resolution and save it.
    file_path = os.path.join(output_dir, "edited_case_5.png")
    w, h = input_image.size
    result.images[0].resize((w, h)).save(file_path)
```
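The commented-out lines above expose the attention-control knobs from the Features section. Below is a usage sketch reusing `pipe` and `input_image` from the script above; the prompt and values here are made-up examples, not recommended settings.

```python
# Optional: localize the edit with attention guidance.
# These attributes mirror the commented options in the script above.
pipe.local_guidance = 0.01                          # enable the local guidance-scale auxiliary mode
pipe.local_query_text = 'owl'                       # word whose attention map localizes the edit
pipe.attention_enable_blocks = list(range(28, 37))  # attention layers to use

result = pipe(
    prompt=['Turn the owl golden'],  # hypothetical instruction
    height=1024,
    width=1024,
    num_inference_steps=36,
    guidance_scale=6,
    reference_strength=1.1,
    reference_image=[input_image.resize((1024, 1024))],
)
result.images[0].save('./output/edited_local.png')
```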
|
|
|
|
|
## 📖 Citation
|
|
|
|
|
```bibtex
@article{chow2025editmgt,
  title={EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing},
  author={Chow, Wei and Li, Linfeng and Kong, Lingdong and Li, Zefeng and Xu, Qi and Song, Hang and Ye, Tian and Wang, Xian and Bai, Jinbin and Xu, Shilin and others},
  journal={arXiv preprint arXiv:2512.11715},
  year={2025}
}
```
|
|
|
|
|
## 🙏 Acknowledgements
|
|
|
|
|
We extend our sincere gratitude to all contributors and the research community for their valuable feedback and support in the development of this project.