---
license: apache-2.0
language:
- en
tags:
- text-to-image
- image-customization
- diffusion-transformer
- position-control
- multi-subject
- safetensors
---
<h3 align="center">
PositionIC: Unified Position and Identity Consistency for Image Customization
</h3>

<p align="center">
<a href="https://arxiv.org/abs/2507.13861"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.13861-b31b1b.svg"></a>
<a href="https://arxiv.org/abs/2507.13861"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a>
</p>

<p align="center">
<span style="font-family: Gill Sans">Junjie Hu,</span>
<span style="font-family: Gill Sans">Tianyang Han,</span>
<span style="font-family: Gill Sans">Kai Ma,</span>
<span style="font-family: Gill Sans">Jialin Gao,</span>
<span style="font-family: Gill Sans">Song Yang</span>
<br>
<span style="font-family: Gill Sans">Xianhua He,</span>
<span style="font-family: Gill Sans">Junfeng Luo,</span>
<span style="font-family: Gill Sans">Xiaoming Wei,</span>
<span style="font-family: Gill Sans">Wenqiang Zhang</span>
</p>

---
### 🔥 News

- ✅ **[2026.01.12]** We have released our **PositionIC model for FLUX** on Hugging Face and [GitHub](https://github.com/MeiGen-AI/PositionIC)!
- ✅ **[2025.07.18]** Our paper is now available on [arXiv](https://arxiv.org/abs/2507.13861).
- ⬜ Datasets and the PositionIC-v2 model with enhanced generation capabilities are coming soon.

---
## Introduction

**PositionIC** is a unified framework for high-fidelity, spatially controllable multi-subject image customization. While recent methods excel in fidelity, fine-grained instance-level spatial control remains a challenge because identity and layout are entangled.

To address this, we introduce:

1. **BMPDS**: The first automatic data-synthesis pipeline for position-annotated multi-subject datasets, providing crucial spatial supervision.
2. **Lightweight Layout-Aware Diffusion**: A framework integrating a novel visibility-aware attention mechanism that explicitly models spatial relationships via NeRF-inspired volumetric weight regulation.

Our experiments show that **PositionIC** achieves state-of-the-art spatial precision and identity consistency in multi-entity scenarios.

---
## ⚡️ Quick Start

### 🔧 Requirements and Installation

Follow these steps to set up your environment:

```bash
# 1. Create and activate a new conda environment
conda create -n PositionIC python=3.10 -y
conda activate PositionIC

# 2. Install PyTorch (adjust according to your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# 3. Install project dependencies
pip install -r requirements.txt
```
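After installation, a quick sanity check can confirm that the interpreter and PyTorch are set up as expected. The snippet below is a minimal sketch and not part of the repository: the helper name `check_environment` is hypothetical, and treating Python 3.10 as a minimum (rather than the exact pinned version used above) is an assumption. Run it inside the activated `PositionIC` environment; `cuda_available` should be `True` on a correctly configured GPU machine.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10)):
    """Return a small report on the Python and PyTorch setup.

    `min_python` is an assumption based on the conda command above,
    which pins python=3.10.
    """
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    # Only import torch if it is actually installed.
    if report["torch_installed"]:
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```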
---

## Inference

To generate images with precise position and identity control, run the following command:

```bash
python inference_.py \
  --eval_json_path "path/to/your/val_config.json" \
  --dit_lora_path "ScottHan/PositionIC" \
  --saved_dir "./res" \
  --width 1024 \
  --height 1024 \
  --ref_size 512 \
  --seed 3074 \
  --rope_type "uno" \
  --a 5
```
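To compare results across several seeds, the same command can be assembled programmatically. The following is a minimal sketch that uses only the flags shown above; the helper name `build_inference_command`, the extra seed values, and the per-seed output directories are hypothetical choices for illustration. It prints the commands rather than running them; pass each list to `subprocess.run(cmd, check=True)` from the repository root to execute it.

```python
import shlex
import sys


def build_inference_command(eval_json_path, saved_dir="./res", seed=3074,
                            width=1024, height=1024, ref_size=512,
                            dit_lora_path="ScottHan/PositionIC",
                            rope_type="uno", a=5):
    """Assemble the argv list for inference_.py.

    Defaults mirror the example command above; no other flags are assumed.
    """
    return [
        sys.executable, "inference_.py",
        "--eval_json_path", str(eval_json_path),
        "--dit_lora_path", str(dit_lora_path),
        "--saved_dir", str(saved_dir),
        "--width", str(width),
        "--height", str(height),
        "--ref_size", str(ref_size),
        "--seed", str(seed),
        "--rope_type", str(rope_type),
        "--a", str(a),
    ]


if __name__ == "__main__":
    # Hypothetical seed sweep: one output directory per seed.
    for seed in (3074, 3075, 3076):
        cmd = build_inference_command(
            "path/to/your/val_config.json",
            saved_dir=f"./res/seed_{seed}",
            seed=seed,
        )
        print(shlex.join(cmd))
```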
---

## Acknowledgments

Our code is built upon the [UNO](https://github.com/bytedance/UNO) framework. We sincerely thank the authors for their excellent work and open-source contributions.

---
## Citation

If you find our work helpful for your research, please consider giving us a star ⭐ and citing our paper:

```bibtex
@article{hu2025positionic,
  title={PositionIC: Unified Position and Identity Consistency for Image Customization},
  author={Hu, Junjie and Han, Tianyang and Ma, Kai and Gao, Jialin and Yang, Song and He, Xianhua and Luo, Junfeng and Wei, Xiaoming and Zhang, Wenqiang},
  journal={arXiv preprint arXiv:2507.13861},
  year={2025}
}
```
---

## License

This project is licensed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

---