---
license: apache-2.0
language:
- en
tags:
- text-to-image
- image-customization
- diffusion-transformer
- position-control
- multi-subject
- safetensors
---

<h3 align="center">
PositionIC: Unified Position and Identity Consistency for Image Customization
</h3>

<p align="center">
<a href="https://arxiv.org/abs/2507.13861"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.13861-b31b1b.svg"></a>
<a href="https://arxiv.org/abs/2507.13861"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a>
</p>

<p align="center">
<span style="font-family: Gill Sans">Junjie Hu,</span>
<span style="font-family: Gill Sans">Tianyang Han,</span>
<span style="font-family: Gill Sans">Kai Ma,</span>
<span style="font-family: Gill Sans">Jialin Gao,</span>
<span style="font-family: Gill Sans">Song Yang</span>
<br>
<span style="font-family: Gill Sans">Xianhua He,</span>
<span style="font-family: Gill Sans">Junfeng Luo,</span>
<span style="font-family: Gill Sans">Xiaoming Wei,</span>
<span style="font-family: Gill Sans">Wenqiang Zhang</span>
</p>

---

### 🔥 News
- ✅ **[2026.01.12]** We have released our **PositionIC model for FLUX** on Hugging Face and [GitHub](https://github.com/MeiGen-AI/PositionIC)!
- ✅ **[2025.07.18]** Our paper is now available on [arXiv](https://arxiv.org/abs/2507.13861).
- ⬜ Datasets and a PositionIC-v2 model with enhanced generation capabilities are coming soon.

---

## 📖 Introduction
**PositionIC** is a unified framework for high-fidelity, spatially controllable multi-subject image customization. While recent methods excel at fidelity, fine-grained instance-level spatial control remains a challenge because identity and layout are entangled.

To address this, we introduce:
1. **BMPDS**: the first automatic data-synthesis pipeline for position-annotated multi-subject datasets, providing crucial spatial supervision.
2. **Lightweight layout-aware diffusion**: a framework integrating a novel visibility-aware attention mechanism that explicitly models spatial relationships via NeRF-inspired volumetric weight regulation.

Our experiments demonstrate that **PositionIC** achieves state-of-the-art spatial precision and identity consistency in multi-entity scenarios.

---

## ⚡️ Quick Start

### 🔧 Requirements and Installation
Follow these steps to set up your environment:

```bash
# 1. Create and activate a new conda environment
conda create -n PositionIC python=3.10 -y
conda activate PositionIC

# 2. Install PyTorch (adjust the index URL to your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# 3. Install project dependencies
pip install -r requirements.txt
```

### 📥 Checkpoints Download
You can download the `.safetensors` weights (e.g., `dit_lora.safetensors`) with `huggingface-cli`:

```bash
pip install huggingface_hub

# Replace the placeholder with your actual Hugging Face repository path
repo_name="[YOUR_USERNAME]/PositionIC"
local_dir="models/"$repo_name

huggingface-cli download $repo_name --local-dir $local_dir
```
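The same download can also be scripted from Python with `huggingface_hub.snapshot_download` (a minimal sketch; `weights_dir` and `fetch_weights` are illustrative helpers, not part of this repo, and the repo path is a placeholder as in the shell example):

```python
def weights_dir(repo_name: str) -> str:
    # Same layout as the shell example above: models/<repo_name>
    return f"models/{repo_name}"

def fetch_weights(repo_name: str) -> str:
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    local_dir = weights_dir(repo_name)
    # snapshot_download mirrors `huggingface-cli download`: it fetches every
    # file in the repo into local_dir
    snapshot_download(repo_id=repo_name, local_dir=local_dir)
    return local_dir
```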

---

## ✈️ Inference
To generate images with precise position and identity control, run the following command:

```bash
python inference_.py \
    --eval_json_path "path/to/your/val_config.json" \
    --dit_lora_path "models/PositionIC/dit_lora.safetensors" \
    --saved_dir "./res" \
    --width 1024 \
    --height 1024 \
    --ref_size 512 \
    --seed 3074 \
    --rope_type "uno" \
    --a 5
```
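For sweeps over seeds or configs it can help to assemble this command programmatically. A small hypothetical wrapper (`build_inference_cmd` is not part of the repo) that builds the argument list for `subprocess`:

```python
import subprocess

def build_inference_cmd(eval_json, lora_path, out_dir="./res",
                        width=1024, height=1024, ref_size=512, seed=3074):
    # Mirrors the CLI invocation above, one flag per argument.
    return [
        "python", "inference_.py",
        "--eval_json_path", eval_json,
        "--dit_lora_path", lora_path,
        "--saved_dir", out_dir,
        "--width", str(width),
        "--height", str(height),
        "--ref_size", str(ref_size),
        "--seed", str(seed),
        "--rope_type", "uno",
        "--a", "5",
    ]

# To execute: subprocess.run(build_inference_cmd(cfg, lora), check=True)
```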

---

## 🙏 Acknowledgments
Our code is built upon the [UNO](https://github.com/bytedance/UNO) framework. We sincerely thank the authors for their excellent work and open-source contributions.

---

## 📜 Citation
If you find our work helpful for your research, please consider giving us a star ⭐ and citing our paper:

```bibtex
@article{hu2025positionic,
  title={PositionIC: Unified Position and Identity Consistency for Image Customization},
  author={Hu, Junjie and Han, Tianyang and Ma, Kai and Gao, Jialin and Yang, Song and He, Xianhua and Luo, Junfeng and Wei, Xiaoming and Zhang, Wenqiang},
  journal={arXiv preprint arXiv:2507.13861},
  year={2025}
}
```

---

## 📄 License
This project is licensed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0).