---
license: apache-2.0
language:
- en
tags:
- text-to-image
- image-customization
- diffusion-transformer
- position-control
- multi-subject
- safetensors
---
<h3 align="center">
PositionIC: Unified Position and Identity Consistency for Image Customization
</h3>
<p align="center">
<a href="https://arxiv.org/abs/2507.13861"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.13861-b31b1b.svg"></a>
<a href="https://arxiv.org/abs/2507.13861"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a>
</p>
<p align="center">
<span style="font-family: Gill Sans">Junjie Hu,</span>
<span style="font-family: Gill Sans">Tianyang Han,</span>
<span style="font-family: Gill Sans">Kai Ma,</span>
<span style="font-family: Gill Sans">Jialin Gao,</span>
<span style="font-family: Gill Sans">Song Yang</span>
<br>
<span style="font-family: Gill Sans">Xianhua He,</span>
<span style="font-family: Gill Sans">Junfeng Luo,</span>
<span style="font-family: Gill Sans">Xiaoming Wei,</span>
<span style="font-family: Gill Sans">Wenqiang Zhang</span>
</p>
---
### 🔥 News
- ✅ **[2026.01.12]** We have released our **PositionIC model for FLUX** on Hugging Face and [GitHub](https://github.com/MeiGen-AI/PositionIC)!
- ✅ **[2025.07.18]** Our paper is now available on [arXiv](https://arxiv.org/abs/2507.13861).
- ⬜ Datasets and a PositionIC-v2 model with enhanced generation capabilities are coming soon.
---
## Introduction
**PositionIC** is a unified framework for high-fidelity, spatially controllable multi-subject image customization. While recent methods excel in fidelity, fine-grained instance-level spatial control remains a challenge due to the entanglement of identity and layout.
To address this, we introduce:
1. **BMPDS**: The first automatic data-synthesis pipeline for position-annotated multi-subject datasets, providing crucial spatial supervision.
2. **Lightweight Layout-Aware Diffusion**: A framework integrating a novel visibility-aware attention mechanism that explicitly models spatial relationships via NeRF-inspired volumetric weight regulation.
Our experiments show that **PositionIC** achieves state-of-the-art spatial precision and identity consistency in multi-subject scenarios.
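For intuition about the "NeRF-inspired volumetric weight regulation" mentioned above, here is a minimal sketch of the standard NeRF compositing rule, in which each layer's weight is its opacity times the transmittance left over by the layers in front of it. This README does not spell out how PositionIC applies such weights inside attention, so the function name `volumetric_weights` and the front-to-back ordering of subjects are assumptions made purely for illustration.

```python
def volumetric_weights(alphas):
    """Return NeRF-style compositing weights for layers ordered front to back.

    alphas[i] is the opacity of layer i in [0, 1]. Layer i receives weight
    T_i * alphas[i], where T_i = prod_{j < i} (1 - alphas[j]) is the
    transmittance that survives all layers in front of it.
    """
    weights = []
    transmittance = 1.0  # nothing is occluded before the first layer
    for alpha in alphas:
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # light remaining for the layers behind
    return weights


# Hypothetical example: two half-transparent subjects in front of an opaque one.
print(volumetric_weights([0.5, 0.5, 1.0]))  # [0.5, 0.25, 0.25]
```

Under this rule a fully opaque front layer drives the weights of everything behind it to zero, which is one plausible way to express occlusion between overlapping subjects.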
---
## ⚡️ Quick Start
### 🔧 Requirements and Installation
Follow these steps to set up your environment:
```bash
# 1. Create and activate a new conda environment
conda create -n PositionIC python=3.10 -y
conda activate PositionIC
# 2. Install PyTorch (adjust according to your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# 3. Install project dependencies
pip install -r requirements.txt
```
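After installation, a quick sanity check can confirm that PyTorch is importable and whether it can see a GPU. The helper below is a sketch that is not part of the repository; `describe_environment` is a hypothetical name, and it relies only on the standard library plus `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib
import importlib.util
import sys


def describe_environment() -> dict:
    """Collect basic facts about the Python and PyTorch installation."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,            # stays None if PyTorch is not installed
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` on a GPU machine, the wheel index used in step 2 probably does not match the installed CUDA version.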
---
## Inference
To generate images with precise position and identity control, run the following command:
```bash
python inference_.py \
--eval_json_path "path/to/your/val_config.json" \
--dit_lora_path "ScottHan/PositionIC" \
--saved_dir "./res" \
--width 1024 \
--height 1024 \
--ref_size 512 \
--seed 3074 \
--rope_type "uno" \
--a 5
```
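To try several seeds without retyping the command, the invocation above can be assembled programmatically. The sketch below reuses only the flags and values shown in the command; `build_inference_command` and the per-seed output directories are hypothetical, and the meaning of `--a` is not documented here, so it is passed through unchanged.

```python
import shlex
import sys


def build_inference_command(eval_json_path, seed=3074, saved_dir="./res"):
    """Build the argument list for inference_.py with the flags shown above."""
    return [
        sys.executable, "inference_.py",
        "--eval_json_path", eval_json_path,
        "--dit_lora_path", "ScottHan/PositionIC",
        "--saved_dir", saved_dir,
        "--width", "1024",
        "--height", "1024",
        "--ref_size", "512",
        "--seed", str(seed),
        "--rope_type", "uno",
        "--a", "5",  # value copied from the command above; semantics undocumented
    ]


if __name__ == "__main__":
    for seed in (3074, 3075, 3076):
        cmd = build_inference_command(
            "path/to/your/val_config.json", seed, f"./res/seed_{seed}"
        )
        # Print the command; replace with subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Writing each seed to its own directory keeps the outputs from overwriting one another.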
---
## Acknowledgments
Our code is built upon the [UNO](https://github.com/bytedance/UNO) framework. We sincerely thank the authors for their excellent work and open-source contributions.
---
## Citation
If you find our work helpful for your research, please consider giving us a star ⭐ and citing our paper:
```bibtex
@article{hu2025positionic,
title={PositionIC: Unified Position and Identity Consistency for Image Customization},
author={Hu, Junjie and Han, Tianyang and Ma, Kai and Gao, Jialin and Yang, Song and He, Xianhua and Luo, Junfeng and Wei, Xiaoming and Zhang, Wenqiang},
journal={arXiv preprint arXiv:2507.13861},
year={2025}
}
```
---
## License
This project is licensed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0).