---
base_model:
- black-forest-labs/FLUX.1-Fill-dev
language:
- en
license: other
license_name: community-license-agreement
license_link: LICENSE
pipeline_tag: image-to-image
tags:
- image customization
---

<div align="center">
  <a href="https://github.com/TencentARC/IC-Custom">
    <img src='https://github.com/TencentARC/IC-Custom/blob/main/assets/IC-Custom-logo.png?raw=true' width='120px'>
  </a>
</div>

<p align="center"> 
  <b> IC-Custom: Diverse Image Customization via In-Context Learning </b> 
</p>

<p align="center">
  <a href='https://liyaowei-stu.github.io/project/IC_Custom/'><img src='https://img.shields.io/badge/IC_Custom-Page-4B9E9E'></a>
  &nbsp;
  <a href="https://arxiv.org/abs/2507.01926"><img src="https://img.shields.io/badge/IC_Custom-Paper-D94C40"></a>
  &nbsp;
  <a href="https://github.com/TencentARC/IC-Custom"><img src="https://img.shields.io/badge/IC_Custom-Github-1A7C7C"></a>
  &nbsp;
  <a href='https://huggingface.co/TencentARC/IC-Custom'><img src='https://img.shields.io/badge/IC_Custom-Model-0076B6'></a>
  &nbsp;
  <a href="https://huggingface.co/spaces/TencentARC/IC-Custom"><img src="https://img.shields.io/badge/IC_Custom-Demo-00C0A5"></a>
  &nbsp;
  <a href="https://www.youtube.com/watch?v=uaiZA3H5RVY"><img src="https://img.shields.io/badge/IC_Custom-Video-F7C600"></a>
</p>

<div align="center">
  <a href="https://github.com/TencentARC/IC-Custom">
    <img src='https://github.com/TencentARC/IC-Custom/blob/main/assets/teaser-github.jpeg?raw=true' width='680px'>
  </a>
</div>

### Abstract
Image customization, a crucial technique for industrial media production, aims to generate content that is consistent with reference images. However, current approaches conventionally separate image customization into position-aware and position-free paradigms and lack a universal framework for diverse customization, limiting their applications across various scenarios. To overcome these limitations, we propose IC-Custom, a unified framework that seamlessly integrates position-aware and position-free image customization through in-context learning. IC-Custom concatenates reference images with target images into a polyptych, leveraging DiT's multi-modal attention mechanism for fine-grained token-level interactions. We introduce the In-context Multi-Modal Attention (ICMA) mechanism with learnable task-oriented register tokens and boundary-aware positional embeddings to enable the model to correctly handle different task types and distinguish various inputs in polyptych configurations. To bridge the data gap, we curate a high-quality dataset of 12k identity-consistent samples, with 8k from real-world sources and 4k from high-quality synthetic data, avoiding the overly glossy and over-saturated synthetic appearance. IC-Custom supports various industrial applications, including try-on, accessory placement, furniture arrangement, and creative IP customization. Extensive evaluations on our proposed ProductBench and the publicly available DreamBench demonstrate that IC-Custom significantly outperforms community workflows, closed-source models, and state-of-the-art open-source approaches. IC-Custom achieves approximately 73% higher human preference across identity consistency, harmonicity, and text alignment metrics, while training only 0.4% of the original model parameters.

<p align="center">
  IC-Custom is designed for diverse image customization scenarios, including:
</p>

- **Position-aware**: Provide a reference image and a target background, and specify the customization location (via segmentation or drawing).  
  *Examples*: Product placement, virtual try-on.

- **Position-free**: Provide a reference image and a target description to generate a new image that preserves the identity (ID) of the reference subject.  
  *Examples*: IP customization, character creation.
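
To make the two input modes concrete, the following is a minimal sketch of how a reference image and a target could be laid out as a two-panel polyptych (a diptych) with an inpainting-style mask. It is not the IC-Custom API: the helper name `build_diptych`, the left/right panel layout, the equal-size panels, and the mask convention (1 = region to generate) are all assumptions made for illustration. See the GitHub repository for the actual preprocessing and inference code.

```python
import numpy as np


def build_diptych(reference, target=None, target_mask=None):
    """Hypothetical helper (not part of IC-Custom): place the reference
    panel on the left and the target panel on the right, and return the
    combined image together with a mask marking the region to generate.
    """
    reference = np.asarray(reference, dtype=np.uint8)
    h, w, _ = reference.shape

    if target is None:
        # Position-free (assumed layout): blank target panel, fully masked,
        # so the whole right panel is generated from the text description.
        target = np.zeros_like(reference)
        target_mask = np.ones((h, w), dtype=np.uint8)
    else:
        # Position-aware: the user supplies a background and a location mask.
        target = np.asarray(target, dtype=np.uint8)
        if target.shape != reference.shape:
            raise ValueError("this sketch assumes panels of equal size")
        if target_mask is None:
            raise ValueError("position-aware mode needs a target mask")
        target_mask = (np.asarray(target_mask) > 0).astype(np.uint8)
        if target_mask.shape != (h, w):
            raise ValueError("mask must match the target height and width")

    # The reference panel is never regenerated, so its mask is all zeros.
    reference_mask = np.zeros((h, w), dtype=np.uint8)

    image = np.concatenate([reference, target], axis=1)
    mask = np.concatenate([reference_mask, target_mask], axis=1)
    return image, mask


# Position-aware example with toy 4x4 panels and a 2x2 customization region.
ref = np.full((4, 4, 3), 200, dtype=np.uint8)
background = np.full((4, 4, 3), 50, dtype=np.uint8)
location = np.zeros((4, 4), dtype=np.uint8)
location[1:3, 1:3] = 1
image, mask = build_diptych(ref, background, location)
print(image.shape, mask.shape, int(mask.sum()))

# Position-free example: only the reference is given.
image_free, mask_free = build_diptych(ref)
print(image_free.shape, int(mask_free.sum()))
```

Under these assumptions, the only difference between the two modes is the mask: position-aware customization masks just the user-specified region of the target panel, while position-free customization masks the entire target panel.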

### Citation

```bibtex
@article{li2025iccustom,
  title={IC-Custom: Diverse Image Customization via In-Context Learning},
  author={Li, Yaowei and Zhu, Yu and Wu, Xu and Liu, Bo and Li, Jia and Lu, Yong and Zhang, Song and Luo, Yujun},
  journal={arXiv preprint arXiv:2507.01926},
  year={2025},
  url={https://arxiv.org/abs/2507.01926}
}
```