Enhance Model Card with Detailed Information, Usage, and Dataset Link
#2
by nielsr (HF Staff) - opened

README.md CHANGED
---
language:
- en
library_name: transformers, diffusers
license: agpl-3.0
pipeline_tag: image-to-image
datasets:
- EurekaTian/MIGEBench
---

# MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and Editing

Welcome to MIGE, a unified model for instruction-based editing and subject-driven generation. This model was introduced in the paper [**MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and Editing**](https://huggingface.co/papers/2502.21291).

- [**Paper**](https://huggingface.co/papers/2502.21291)
- [**GitHub Repository**](https://github.com/Eureka-Maggie/MIGE)
- [**Hugging Face Dataset (MIGEBench)**](https://huggingface.co/datasets/EurekaTian/MIGEBench)

The file `model.pth` is the final trained MIGE model, used for benchmarking the various tasks. Users can also refer to our paper and download the original component models from their respective sources; we gather them here only for convenience, and they are **for research purposes only**. For detailed evaluation scripts, please refer to our [GitHub repository](https://github.com/Eureka-Maggie/MIGE).

## Abstract

Despite significant progress in diffusion-based image generation, subject-driven generation and instruction-based editing remain challenging. Existing methods typically treat them separately, struggling with limited high-quality data and poor generalization. Recognizing that both tasks require capturing complex visual variations while maintaining consistency between inputs and outputs, we propose MIGE, a unified framework that standardizes task representations using multimodal instructions.

MIGE first treats subject-driven generation as creation on a blank canvas and instruction-based editing as modification of an existing image, establishing a shared input-output formulation. It then introduces a novel multimodal encoder that maps free-form multimodal instructions into a unified vision-language space, integrating visual and semantic features through a feature fusion mechanism. This unification enables joint training of both tasks, providing two key advantages: (1) **Cross-Task Enhancement**: By leveraging shared visual and semantic representations, joint training improves instruction adherence and visual consistency in both subject-driven generation and instruction-based editing. (2) **Generalization**: Learning in a unified format facilitates cross-task knowledge transfer, enabling MIGE to generalize to novel compositional tasks, including instruction-based subject-driven editing. Experiments show that MIGE excels in both subject-driven generation and instruction-based editing while setting a SOTA in the new task of instruction-based subject-driven editing.

## ✨ Key Highlights

MIGE offers a unified framework for both subject-driven image generation and instruction-based image editing, with several key advantages:

* **Mutual Enhancement**: MIGE is the first framework to demonstrate mutual enhancement between subject-driven generation and instruction-based editing, treating them as complementary tasks and improving both through unified training.
* **Compositional Power**: Unlocks new capabilities such as instruction-based subject-driven editing, a challenging and novel task.
* **Strong Results**: Achieves state-of-the-art performance on multiple benchmarks (including the newly proposed MIGEBench) with just 2.28M training samples.

## ⚡ Quick Start & Inference

For comprehensive setup, installation, and detailed inference instructions, including specific scripts for subject-driven image generation, instruction-based image editing, and instruction-based subject-driven image editing, please refer to the [MIGE GitHub repository](https://github.com/Eureka-Maggie/MIGE).

### 1. Set Up Python Environment

We recommend Python 3.9.2 and a virtual environment:

```bash
python3 -m venv mige
source mige/bin/activate
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
cd MIGE  # navigate to your cloned MIGE repository
pip install -r requirements.txt
```

If you encounter system library errors (e.g., `libGL.so.1`, `libcudart.so.11.0`), refer to the "Install System Libraries" section in the [GitHub README](https://github.com/Eureka-Maggie/MIGE#2-install-system-libraries) for solutions.

### 2. Inference Example (Conceptual)

MIGE employs a custom model architecture. For actual runnable inference code, please clone the official GitHub repository and navigate to the `MIGE/infer_scripts` directory. The scripts there provide a complete guide to running the model for the various tasks.

```python
# Conceptual outline only; for a functional setup and runnable code,
# please refer to the official repository:
# https://github.com/Eureka-Maggie/MIGE
#
# git clone https://github.com/Eureka-Maggie/MIGE.git
# cd MIGE
# pip install -r requirements.txt
#
# Then run the scripts in the 'infer_scripts' directory. Model loading and
# inference rely on custom classes and functions from the repository.

print("Please refer to the official GitHub repository for detailed usage and runnable examples:")
print("https://github.com/Eureka-Maggie/MIGE")
```
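Since `model.pth` is a torch-serialized checkpoint, its contents can be inspected with plain PyTorch even before setting up the full pipeline. The sketch below is illustrative only and is not the official loading code; the `state_dict` wrapper handling is an assumption about common checkpoint layouts, not a documented property of this file.

```python
import torch

def summarize_checkpoint(path: str) -> dict:
    """Peek at a torch-serialized checkpoint such as model.pth.

    Illustrative sketch only: actually running MIGE requires the custom
    model classes from the official repository.
    """
    state = torch.load(path, map_location="cpu")
    # Checkpoints may be a raw state_dict or a wrapper dict containing one
    # (an assumption about common conventions, not specific to MIGE).
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    return {
        "num_tensors": len(state),
        "total_params": sum(t.numel() for t in state.values() if hasattr(t, "numel")),
    }
```

This kind of quick inspection is useful for confirming that a multi-gigabyte download completed intact before wiring it into the inference scripts.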

## 📊 MIGEBench

The authors introduce **MIGEBench**, a new benchmark designed for evaluating instruction-based subject-driven editing. The benchmark files are publicly available on [Hugging Face Datasets](https://huggingface.co/datasets/EurekaTian/MIGEBench).

### Data Structure Examples

* **Add task**:
  * `add_entity`
  * `add_mask`
  * `add_source`
  * `add_target`

  

* **Replace task**:
  * `replace_entity`
  * `replace_mask`
  * `replace_source`
  * `replace_target`

  
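The folder names above suggest that each task stores corresponding files across parallel directories. As an unofficial sketch, assuming source and target images share file names (a layout guess, not documented behavior of MIGEBench), evaluation pairs could be collected like this:

```python
from pathlib import Path

def pair_examples(root: str, task: str = "add") -> list:
    """Pair source/target images for one benchmark task by file name.

    Assumes a layout of <root>/<task>_source and <root>/<task>_target with
    matching file names; this is an illustrative helper, not an official
    MIGEBench loader.
    """
    source_dir = Path(root) / f"{task}_source"
    target_dir = Path(root) / f"{task}_target"
    pairs = []
    for src in sorted(source_dir.iterdir()):
        tgt = target_dir / src.name
        if tgt.exists():  # keep only examples present on both sides
            pairs.append((src, tgt))
    return pairs
```

For the actual evaluation protocol and metrics, use the scripts in the official GitHub repository rather than a hand-rolled loader like this.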

## 🖼️ More Examples

 

 

## ✍️ Citation

If you find MIGE useful for your research or applications, please consider citing our paper:

```bibtex
@article{wu2025mige,
  title={MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and Editing},
  author={Wu, Zhiyong and Wu, Zhenyu and Xu, Fangzhi and Wang, Yian and Sun, Qiushi and Jia, Chengyou and Cheng, Kanzhi and Ding, Zichen and Chen, Liheng and Liang, Paul Pu},
  journal={arXiv preprint arXiv:2502.21291},
  year={2025},
  url={https://arxiv.org/abs/2502.21291}
}
```