---
license: apache-2.0
language:
- en
tags:
- text-to-image
- image-customization
- diffusion-transformer
- position-control
- multi-subject
- safetensors
---

<h3 align="center">
PositionIC: Unified Position and Identity Consistency for Image Customization
</h3>

<p align="center">
<a href="https://arxiv.org/abs/2507.13861"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.13861-b31b1b.svg"></a>
<a href="https://arxiv.org/abs/2507.13861"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a>
</p>

<p align="center">
<span style="font-family: Gill Sans">Junjie Hu,</span>
<span style="font-family: Gill Sans">Tianyang Han,</span>
<span style="font-family: Gill Sans">Kai Ma,</span>
<span style="font-family: Gill Sans">Jialin Gao,</span>
<span style="font-family: Gill Sans">Song Yang</span>
<br>
<span style="font-family: Gill Sans">Xianhua He,</span>
<span style="font-family: Gill Sans">Junfeng Luo,</span>
<span style="font-family: Gill Sans">Xiaoming Wei,</span>
<span style="font-family: Gill Sans">Wenqiang Zhang</span>
</p>

---

### 🔥 News
- ✅ **[2026.01.12]** We have released our **PositionIC model for FLUX** on Hugging Face and [GitHub](https://github.com/MeiGen-AI/PositionIC)!
- ✅ **[2025.07.18]** Our paper is now available on [arXiv](https://arxiv.org/abs/2507.13861).
- ⬜ Datasets and a PositionIC-v2 model with enhanced generation capabilities are coming soon.

---

## 📖 Introduction
**PositionIC** is a unified framework for high-fidelity, spatially controllable multi-subject image customization. While recent methods excel at fidelity, fine-grained instance-level spatial control remains a challenge because identity and layout are entangled.

To address this, we introduce:
1. **BMPDS**: the first automatic data-synthesis pipeline for position-annotated multi-subject datasets, providing crucial spatial supervision.
2. **Lightweight layout-aware diffusion**: a framework integrating a novel visibility-aware attention mechanism that explicitly models spatial relationships via NeRF-inspired volumetric weight regulation.

Our experiments demonstrate that **PositionIC** achieves state-of-the-art spatial precision and identity consistency in multi-entity scenarios.

---

## ⚡️ Quick Start

### 🔧 Requirements and Installation
Follow these steps to set up your environment:

```bash
# 1. Create and activate a new conda environment
conda create -n PositionIC python=3.10 -y
conda activate PositionIC

# 2. Install PyTorch (adjust the index URL to your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# 3. Install project dependencies
pip install -r requirements.txt
```

### 📥 Checkpoints Download
You can download the `.safetensors` weights (e.g., `dit_lora.safetensors`) with `huggingface-cli`:

```bash
pip install huggingface_hub

# Replace the placeholder with your actual Hugging Face repository path
repo_name="[YOUR_USERNAME]/PositionIC"
local_dir="models/"$repo_name

huggingface-cli download $repo_name --local-dir $local_dir
```
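The same download can also be scripted from Python with `huggingface_hub.snapshot_download` (a minimal sketch; `weights_dir` and `fetch_weights` are illustrative helpers, not part of this repo, and the repo path is a placeholder as in the shell example):

```python
def weights_dir(repo_name: str) -> str:
    # Same layout as the shell example above: models/<repo_name>
    return f"models/{repo_name}"

def fetch_weights(repo_name: str) -> str:
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    local_dir = weights_dir(repo_name)
    # snapshot_download mirrors `huggingface-cli download`: it fetches every
    # file in the repo into local_dir
    snapshot_download(repo_id=repo_name, local_dir=local_dir)
    return local_dir
```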

---

## ✈️ Inference
To generate images with precise position and identity control, run the following command:

```bash
python inference_.py \
    --eval_json_path "path/to/your/val_config.json" \
    --dit_lora_path "models/PositionIC/dit_lora.safetensors" \
    --saved_dir "./res" \
    --width 1024 \
    --height 1024 \
    --ref_size 512 \
    --seed 3074 \
    --rope_type "uno" \
    --a 5
```
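For sweeps over seeds or configs it can help to assemble this command programmatically. A small hypothetical wrapper (`build_inference_cmd` is not part of the repo) that builds the argument list for `subprocess`:

```python
import subprocess

def build_inference_cmd(eval_json, lora_path, out_dir="./res",
                        width=1024, height=1024, ref_size=512, seed=3074):
    # Mirrors the CLI invocation above, one flag per argument.
    return [
        "python", "inference_.py",
        "--eval_json_path", eval_json,
        "--dit_lora_path", lora_path,
        "--saved_dir", out_dir,
        "--width", str(width),
        "--height", str(height),
        "--ref_size", str(ref_size),
        "--seed", str(seed),
        "--rope_type", "uno",
        "--a", "5",
    ]

# To execute: subprocess.run(build_inference_cmd(cfg, lora), check=True)
```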

---

## 🙏 Acknowledgments
Our code is built upon the [UNO](https://github.com/bytedance/UNO) framework. We sincerely thank the authors for their excellent work and open-source contributions.

---

## 📜 Citation
If you find our work helpful for your research, please consider giving us a star ⭐ and citing our paper:

```bibtex
@article{hu2025positionic,
  title={PositionIC: Unified Position and Identity Consistency for Image Customization},
  author={Hu, Junjie and Han, Tianyang and Ma, Kai and Gao, Jialin and Yang, Song and He, Xianhua and Luo, Junfeng and Wei, Xiaoming and Zhang, Wenqiang},
  journal={arXiv preprint arXiv:2507.13861},
  year={2025}
}
```

---

## 📄 License
This project is licensed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0).