Enhance Model Card with Detailed Information, Usage, and Dataset Link

#2
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +90 -5
README.md CHANGED
@@ -1,15 +1,100 @@
  ---
  language:
  - en
- license: agpl-3.0
  library_name: transformers, diffusers
  pipeline_tag: image-to-image
  ---

- Welcome to MIGE, a unified model for instruction-based editing and subject-driven generation.

- [Paper](https://hf.co/papers/2502.21291)

- Here, we provide all models used for training and inference. Users can also refer to our paper and download the original models from their respective sources. We gather them here only for convenience, and they are **for research purposes only**.

- The file 'model.pth' is the final trained MIGE model, used for benchmarking various tasks. For detailed evaluation scripts, please refer to our [GitHub repository](https://github.com/Eureka-Maggie/MIGE).
  ---
  language:
  - en
  library_name: transformers, diffusers
+ license: agpl-3.0
  pipeline_tag: image-to-image
+ datasets:
+ - EurekaTian/MIGEBench
  ---

+ # MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and Editing
+
+ Welcome to MIGE, a unified model for instruction-based editing and subject-driven generation. This model was introduced in the paper [**MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and Editing**](https://huggingface.co/papers/2502.21291).
+
+ - [**Paper**](https://huggingface.co/papers/2502.21291)
+ - [**GitHub Repository**](https://github.com/Eureka-Maggie/MIGE)
+ - [**Hugging Face Dataset (MIGEBench)**](https://huggingface.co/datasets/EurekaTian/MIGEBench)
+
+ The file `model.pth` is the final trained MIGE model, used for benchmarking various tasks. Users can also refer to our paper and download the original component models from their respective sources. We gather them here only for convenience, and they are **for research purposes only**. For detailed evaluation scripts, please refer to our [GitHub repository](https://github.com/Eureka-Maggie/MIGE).
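+
+ If you prefer to fetch the checkpoint programmatically rather than through the web UI, `huggingface_hub` works. A small sketch; the `repo_id` below is a placeholder, so substitute the id of the repository that actually hosts `model.pth`:
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Placeholder repo id; replace with the repository hosting model.pth.
+ ckpt_path = hf_hub_download(repo_id="<model-repo-id>", filename="model.pth")
+ print(ckpt_path)  # local cache path of the downloaded checkpoint
+ ```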
+
+ ## Abstract
+
+ Despite significant progress in diffusion-based image generation, subject-driven generation and instruction-based editing remain challenging. Existing methods typically treat them separately, struggling with limited high-quality data and poor generalization. Recognizing that both tasks require capturing complex visual variations while maintaining consistency between inputs and outputs, we propose MIGE, a unified framework that standardizes task representations using multimodal instructions.
+
+ MIGE first treats subject-driven generation as creation on a blank canvas and instruction-based editing as modification of an existing image, establishing a shared input-output formulation. It then introduces a novel multimodal encoder that maps free-form multimodal instructions into a unified vision-language space, integrating visual and semantic features through a feature fusion mechanism. This unification enables joint training of both tasks, providing two key advantages: (1) **Cross-Task Enhancement**: By leveraging shared visual and semantic representations, joint training improves instruction adherence and visual consistency in both subject-driven generation and instruction-based editing. (2) **Generalization**: Learning in a unified format facilitates cross-task knowledge transfer, enabling MIGE to generalize to novel compositional tasks, including instruction-based subject-driven editing. Experiments show that MIGE excels in both subject-driven generation and instruction-based editing while setting a SOTA in the new task of instruction-based subject-driven editing.
+
+ ## ✨ Key Highlights
+ MIGE offers a unified framework for both subject-driven image generation and instruction-based image editing, with several key advantages:
+
+ * **Mutual Enhancement**: MIGE is the first framework to demonstrate mutual enhancement between subject-driven generation and instruction-based editing, treating them as complementary tasks and improving both through unified learning.
+ * **Compositional Power**: Unlocks new capabilities such as instruction-based subject-driven editing, a challenging and novel task.
+ * **Strong Results**: Achieves state-of-the-art performance on multiple benchmarks (including the newly proposed MIGEBench) with just 2.28M training samples.
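+
+ To make the shared input-output formulation from the abstract concrete: both tasks reduce to one conditional mapping from a multimodal instruction plus an optional source image to an output image. A purely illustrative sketch; these names are hypothetical and are not part of the MIGE codebase:
+
+ ```python
+ from dataclasses import dataclass
+ from typing import List, Optional, Union
+
+ # Hypothetical illustration only: not the repository's API.
+ @dataclass
+ class MultimodalInstruction:
+     # Free-form instruction: interleaved text segments and encoded reference images.
+     segments: List[Union[str, bytes]]
+
+ def mige_generate(instruction: MultimodalInstruction,
+                   source_image: Optional[bytes] = None) -> bytes:
+     # Subject-driven generation: source_image is None (a "blank canvas").
+     # Instruction-based editing: source_image is the image to be modified.
+     # Both tasks share one (instruction, canvas) -> image interface,
+     # which is what lets MIGE train them jointly.
+     raise NotImplementedError("See the MIGE repository for the real model code.")
+ ```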
+
+ ## ⚡ Quick Start & Inference
+ For comprehensive setup, installation, and detailed inference instructions, including specific scripts for subject-driven image generation, instruction-based image editing, and instruction-based subject-driven image editing, please refer to the [MIGE GitHub repository](https://github.com/Eureka-Maggie/MIGE).
+
+ ### 1. Set Up Python Environment
+ We recommend Python 3.9.2 and a virtual environment:
+
+ ```bash
+ python3 -m venv mige
+ source mige/bin/activate
+ pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
+ git clone https://github.com/Eureka-Maggie/MIGE.git
+ cd MIGE
+ pip install -r requirements.txt
+ ```
+ If you encounter system library errors (e.g., `libGL.so.1`, `libcudart.so.11.0`), refer to the "Install System Libraries" section in the [GitHub README](https://github.com/Eureka-Maggie/MIGE#2-install-system-libraries) for solutions.
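+
+ After installing, a quick sanity check (plain PyTorch calls, not a script from the repository) confirms that the pinned CUDA 11.8 build can see your GPU:
+
+ ```python
+ import torch
+
+ # With the pinned wheels above, this should print 2.1.1+cu118.
+ print(torch.__version__)
+ # True only if a compatible NVIDIA driver / CUDA 11.8 runtime is visible.
+ print(torch.cuda.is_available())
+ ```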
+
+ ### 2. Inference Example (Conceptual)
+ MIGE employs a custom model architecture. For actual runnable inference code, please clone the official GitHub repository and navigate to the `MIGE/infer_scripts` directory. The scripts there provide a complete guide to running the model for various tasks.
+
+ ```python
+ # This is a conceptual example to illustrate the type of interaction with MIGE.
+ # For a functional setup and runnable code, please refer to the official repository:
+ # https://github.com/Eureka-Maggie/MIGE
+ #
+ # git clone https://github.com/Eureka-Maggie/MIGE.git
+ # cd MIGE
+ # pip install -r requirements.txt
+ #
+ # Then, refer to the 'infer_scripts' directory in the cloned repository for concrete examples.
+ # The model loading and inference typically involve custom classes and functions from the repo.
+
+ print("Please refer to the official GitHub repository for detailed usage and runnable examples:")
+ print("https://github.com/Eureka-Maggie/MIGE")
+ ```
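+
+ Before wiring `model.pth` into the repository's scripts, you can sanity-check the download. A minimal sketch assuming the file is an ordinary PyTorch checkpoint; the key layout is not documented here, so treat the output as exploratory:
+
+ ```python
+ import torch
+
+ # Load on CPU purely for inspection; actual inference requires the MIGE code.
+ ckpt = torch.load("model.pth", map_location="cpu")
+
+ # A .pth file may hold a raw state dict or a dict wrapping one (e.g. under
+ # a "state_dict" key); list a few top-level entries to see what was saved.
+ if isinstance(ckpt, dict):
+     for name, value in list(ckpt.items())[:10]:
+         desc = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
+         print(name, desc)
+ ```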
+
+ ## 📊 MIGEBench
+ The authors introduce **MIGEBench**, a new benchmark designed for evaluating instruction-based subject-driven editing. The benchmark files are publicly available on [Hugging Face Datasets](https://huggingface.co/datasets/EurekaTian/MIGEBench); a download sketch follows the examples below.
+
+ ### Data Structure Examples
+
+ * **Add task**:
+   * `add_entity`
+   * `add_mask`
+   * `add_source`
+   * `add_target`
+
+   ![Subject addition examples in MIGEBench.](https://github.com/Eureka-Maggie/MIGE/raw/main/showcases/benchmark_add_case.jpg)
+
+ * **Replace task**:
+   * `replace_entity`
+   * `replace_mask`
+   * `replace_source`
+   * `replace_target`
+
+   ![Subject replacement examples in MIGEBench.](https://github.com/Eureka-Maggie/MIGE/raw/main/showcases/benchmark_replace_case.jpg)
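+
+ To fetch the benchmark locally, the standard Hub tooling is enough. A small sketch using `huggingface_hub`; the local directory mirrors the files of the dataset repository:
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download the full MIGEBench dataset repository from the Hugging Face Hub.
+ local_dir = snapshot_download(repo_id="EurekaTian/MIGEBench", repo_type="dataset")
+ print(local_dir)  # path containing the downloaded benchmark files
+ ```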
+
+ ## 🖼️ More Examples
+ ![Demonstrating the comprehensive capabilities of MIGE.](https://github.com/Eureka-Maggie/MIGE/raw/main/showcases/show_new.jpg)
+ ![Qualitative results of subject-driven image generation (top), instruction-based image editing (middle), and instruction-based subject-driven image editing (bottom).](https://github.com/Eureka-Maggie/MIGE/raw/main/showcases/fig.jpg)
+
+ ## ✍️ Citation
+ If you find MIGE useful for your research or applications, please consider citing our paper:
+
+ ```bibtex
+ @article{tian2025mige,
+   title={MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and Editing},
+   author={Tian, Xueyun and Li, Wei and Xu, Bingbing and Yuan, Yige and Wang, Yuanzhuo and Shen, Huawei},
+   journal={arXiv preprint arXiv:2502.21291},
+   year={2025},
+   url={https://arxiv.org/abs/2502.21291}
+ }
+ ```