WeiChow committed on
Commit a6bbf6e · verified · 1 Parent(s): 1361f95

Update README.md

Files changed (1):
  1. README.md +89 -43

README.md CHANGED
@@ -10,61 +10,107 @@ tags:
  base_model:
  - google/gemma-2-2b-it
  ---
- # EditMGT Model for HuggingFace Transformers
-
- This repository contains a HuggingFace Transformers-compatible implementation of the EditMGT model for image editing based on text instructions.
-
- ## Installation
-
  ```bash
- pip install transformers pillow
- # Install other required dependencies from the original EditMGT repository
  ```
-
- ## Usage
-
  ```python
- from eval.utils import init_edit_mgt
- from src.v2_model import negative_prompt
  from PIL import Image
-
- # Initialize the pipeline
- pipe = init_edit_mgt(
-     ckpt_path='./runs/editmgt-ct/checkpoint',
-     enable_fp16=False,
-     device='cuda:0',
-     use_ema=False
- )
-
- # Load your image
- reference_image = Image.open("your_image.jpg").resize((1024, 1024))
-
- # Generate edited image
- result = pipe(
-     prompt=["Make the sky more blue and add clouds"],
-     negative_prompt=[negative_prompt],
-     height=1024,
-     width=1024,
-     num_inference_steps=36,
-     guidance_scale=6,
-     num_images_per_prompt=1,
-     reference_image=[reference_image],
-     reference_strength=1.1,
- )
-
- # Save the result
- result.images[0].save("edited_image.png")
  ```
-
- ## Citation
-
- If you use this model, please cite the original work:
-
  ```
- @article{editmgt2023,
-   title={EditMGT: Text-guided Image Editing with Masked Generative Transformer},
-   author={...},
-   journal={...},
-   year={2023}
- }
- ```
 
  base_model:
  - google/gemma-2-2b-it
  ---
+ # EditMGT
+
+ <div align="center">
+
+ [![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg)]()
+ [![Dataset](https://img.shields.io/badge/🤗%20CrispEdit2M-Dataset-yellow)](https://huggingface.co/datasets/WeiChow/CrispEdit-2M)
+ [![Checkpoint](https://img.shields.io/badge/🧨%20EditMGT-CKPT-blue)](https://huggingface.co/WeiChow/EditMGT)
+ [![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/weichow23/editmgt/tree/main)
+ [![Page](https://img.shields.io/badge/🏠%20Home-Page-b3.svg)](https://weichow23.github.io/editmgt/)
+ [![Python 3.9](https://img.shields.io/badge/Python-3.9-blue.svg?logo=python)](https://www.python.org/downloads/release/python-392/)
+
+ </div>
+
+ ## 🌟 Overview
+
+ This is the official repository for **EditMGT: Unleashing the Potential of Masked Generative Transformer in Image Editing** ✨.
+
+ EditMGT is a novel framework that leverages Masked Generative Transformers for advanced image editing. It enables precise, controllable modifications while preserving the integrity of the original content.
+
+ <p align="center">
+ <img src="asset/editmgt.png" alt="EditMGT Architecture" width="800px">
+ </p>
+
+ ## ✨ Features
+
+ - 🎨 Strong style-transfer capabilities
+ - 🔍 Attention control over editing regions
+ - ⚡ A backbone of only 960M parameters, enabling fast inference
+ - 📊 Trained on the [CrispEdit-2M](https://huggingface.co/datasets/WeiChow/CrispEdit-2M) dataset
+
+ ## ⚡ Quick Start
+
+ First, clone the repository and navigate to the project root:
+ ```shell
+ git clone https://github.com/weichow23/editmgt
+ cd editmgt
+ ```
+
+ ## 🔧 Environment Setup

  ```bash
+ # Create and activate the conda environment
+ conda create --name editmgt python=3.9.2
+ conda activate editmgt
+
+ # Optional: install system dependencies
+ sudo apt-get install libgl1-mesa-glx libglib2.0-0 -y
+
+ # Install Python dependencies
+ pip3 install git+https://github.com/openai/CLIP
+ pip3 install -r requirements.txt
  ```

+ ⚠️ **Note**: If you encounter unexpected dependency errors, please refer to [Issues](https://github.com/viiika/Meissonic/issues/14) to find the library versions that may fix them.
+
+ ## 🚀 Inference
+
+ Run the following script from the `editmgt` directory:

  ```python
+ import os
+ import sys
+ sys.path.append("./")
  from PIL import Image
+ from src.editmgt import init_edit_mgt
+ from src.v2_model import negative_prompt
+
+ if __name__ == "__main__":
+     pipe = init_edit_mgt(device='cuda:0')
+     # Forcing the use of bf16 can improve speed, but it incurs a performance
+     # penalty: we observed a drop of about 0.8 on GEditBench.
+     # pipe = init_edit_mgt(device, enable_bf16=False)
+
+     # pipe.local_guidance = 0.01  # enable the local guidance-scale auxiliary mode
+     # pipe.local_query_text = 'owl'  # use specific words as attention queries
+     # pipe.attention_enable_blocks = [i for i in range(28, 37)]  # attention layers used
+     input_image = Image.open('assets/case_5.jpg')
+     result = pipe(
+         prompt=['Make it into Ghibli style'],
+         height=1024,
+         width=1024,
+         num_inference_steps=36,  # for some simple tasks, 16 steps are enough!
+         guidance_scale=6,
+         reference_strength=1.1,
+         reference_image=[input_image.resize((1024, 1024))],
+         negative_prompt=negative_prompt or None,
+     )
+     output_dir = "./output"
+     os.makedirs(output_dir, exist_ok=True)
+
+     file_path = os.path.join(output_dir, "edited_case_5.png")
+     w, h = input_image.size
+     result.images[0].resize((w, h)).save(file_path)
  ```
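The script above upsamples the reference image to the model's 1024×1024 working resolution and resizes the edited output back to the source dimensions. That round-trip can be sketched with PIL alone, no EditMGT checkpoint required; `run_edit` here is a hypothetical stand-in for the pipeline call:

```python
from PIL import Image

MODEL_SIZE = 1024  # EditMGT's working resolution, per the script above

def edit_at_model_resolution(image: Image.Image, run_edit) -> Image.Image:
    """Upsample to the model resolution, edit, then restore the original size."""
    model_input = image.resize((MODEL_SIZE, MODEL_SIZE))
    edited = run_edit(model_input)  # stand-in for pipe(..., reference_image=[model_input])
    w, h = image.size
    return edited.resize((w, h))

# Identity "edit", just to demonstrate the size round-trip
source = Image.new("RGB", (640, 480), "skyblue")
result = edit_at_model_resolution(source, run_edit=lambda im: im)
print(result.size)  # (640, 480)
```

Note that non-square inputs are stretched to the square working resolution and stretched back afterwards, so extreme aspect ratios may show distortion in intermediate results.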
 
+ ## 📑 Citation
+
+ ```bibtex
+
  ```
+
+ ## 🙏 Acknowledgements
+
+ We extend our sincere gratitude to all contributors and the research community for their valuable feedback and support in the development of this project.