Add pipeline tag and library name to model card, and include citation

#1
by nielsr HF Staff - opened
Files changed (1)
README.md +17 -13
README.md CHANGED
@@ -3,6 +3,8 @@ license: cc-by-4.0
 tags:
 - image-editing
 - diffusion
+pipeline_tag: image-to-image
+library_name: transformers
 ---
 
 # Draw-In-Mind: Learning Precise Image Editing via Chain-of-Thought Imagination
@@ -18,18 +20,7 @@ tags:
 
 Unified models achieve strong results in text-to-image generation but remain weak in precise editing. This limitation
 arises from an *imbalanced division of responsibilities*. The understanding module is usually treated as a translator
-that encodes instructions into conditions, while the generation module must act as both designer and painter. The result
-is that the generation module carries too much responsibility, even though it is not optimized for complex reasoning.
-
-To address this, we introduce **Draw-In-Mind (DIM)**, a dataset with two complementary parts:
-
-- **DIM-T2I**: 14M long-context image–text pairs that strengthen instruction comprehension.
-- **DIM-Edit**: 233K chain-of-thought imaginations from GPT-4o that provide explicit design blueprints.
-
-We connect a frozen **Qwen2.5-VL-3B** with a trainable **SANA1.5-1.6B** via a lightweight MLP, forming
-**DIM-4.6B-T2I/Edit**. With this setup, the understanding module takes on the *designer responsibility*, while the
-generation module focuses on rendering. Despite its modest size, DIM-4.6B-Edit achieves SOTA or competitive results on
-ImgEdit and GEdit-Bench, outperforming much larger models.
+that encodes user instructions into semantic conditions, while the generation module must simultaneously act as designer and painter, inferring the original layout, identifying the target editing region, and rendering the new content. This imbalance is counterintuitive because the understanding module is typically trained with several times more data on complex reasoning tasks than the generation module. To address this issue, we introduce Draw-In-Mind (DIM), a dataset comprising two complementary subsets: (i) DIM-T2I, containing 14M long-context image-text pairs to enhance complex instruction comprehension; and (ii) DIM-Edit, consisting of 233K chain-of-thought imaginations generated by GPT-4o, serving as explicit design blueprints for image edits. We connect a frozen Qwen2.5-VL-3B with a trainable SANA1.5-1.6B via a lightweight two-layer MLP, and train it on the proposed DIM dataset, resulting in DIM-4.6B-T2I/Edit. Despite its modest parameter scale, DIM-4.6B-Edit achieves SOTA or competitive performance on the ImgEdit and GEdit-Bench benchmarks, outperforming much larger models such as UniWorld-V1 and Step1X-Edit. These findings demonstrate that explicitly assigning the design responsibility to the understanding module provides significant benefits for image editing. Our dataset and models will be available at this https URL.
 
 ## Performance
@@ -350,4 +341,17 @@ bash scripts/eval_gedit_bench.sh
 The generated images will be saved to `cache/inference/DIM-4.6B-Edit/GEdit-Bench`. Please follow the guide
 in [GEdit-Bench](https://github.com/stepfun-ai/Step1X-Edit) official repo for metrics calculation.
 
-</details>
+</details>
+
+## Citation
+If you find our work useful for your research, please feel free to cite our paper as below.
+```bibtex
+@article{zhou2025drawinmind,
+  title={Draw-In-Mind: Learning Precise Image Editing via Chain-of-Thought Imagination},
+  author={Yifei Zhou and Haozhe Liu and Songhua Liu and Peng Gao and Hongsheng Li and Yu Qiao},
+  year={2025},
+  journal={arXiv preprint arXiv:2509.01986},
+  archivePrefix={arXiv},
+  eprint={2509.01986},
+}
+```
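
The abstract added above describes bridging a frozen Qwen2.5-VL-3B and a trainable SANA1.5-1.6B with a lightweight two-layer MLP. As a rough illustration of that connector idea (not the authors' implementation — the hidden sizes, token count, GELU activation, and initialization below are all assumptions for the sketch):

```python
import numpy as np

# Illustrative two-layer MLP "connector": projects frozen-VLM hidden states
# into the diffusion generator's condition space. All dimensions are
# hypothetical placeholders, not taken from the DIM paper or checkpoints.
D_VLM, D_GEN, N_TOKENS = 2048, 1536, 77

rng = np.random.default_rng(0)
W1 = rng.standard_normal((D_VLM, D_GEN)) * 0.02
b1 = np.zeros(D_GEN)
W2 = rng.standard_normal((D_GEN, D_GEN)) * 0.02
b2 = np.zeros(D_GEN)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def connector(h):
    """Map VLM hidden states (n_tokens, D_VLM) -> generator conditions (n_tokens, D_GEN)."""
    return gelu(h @ W1 + b1) @ W2 + b2

# Stand-in for last-layer hidden states of the frozen understanding module.
h = rng.standard_normal((N_TOKENS, D_VLM))
cond = connector(h)
print(cond.shape)  # (77, 1536)
```

In this setup only the connector (and, per the card, the generator) would receive gradients, which matches the PR's framing of the understanding module as a frozen "designer" whose outputs are adapted for rendering.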