ScottHan committed
Commit 14b53d8 · verified · 1 Parent(s): 5ab4f42

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +129 -0
README.md ADDED
@@ -0,0 +1,129 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - text-to-image
+ - image-customization
+ - diffusion-transformer
+ - position-control
+ - multi-subject
+ - safetensors
+ ---
+
+ <h3 align="center">
+ PositionIC: Unified Position and Identity Consistency for Image Customization
+ </h3>
+
+ <p align="center">
+ <a href="https://arxiv.org/abs/2507.13861"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.13861-b31b1b.svg"></a>
+ <a href="https://arxiv.org/abs/2507.13861"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a>
+ </p>
+
+ <p align="center">
+ <span style="font-family: Gill Sans">Junjie Hu,</span>
+ <span style="font-family: Gill Sans">Tianyang Han,</span>
+ <span style="font-family: Gill Sans">Kai Ma,</span>
+ <span style="font-family: Gill Sans">Jialin Gao,</span>
+ <span style="font-family: Gill Sans">Song Yang</span>
+ <br>
+ <span style="font-family: Gill Sans">Xianhua He,</span>
+ <span style="font-family: Gill Sans">Junfeng Luo,</span>
+ <span style="font-family: Gill Sans">Xiaoming Wei,</span>
+ <span style="font-family: Gill Sans">Wenqiang Zhang</span>
+ </p>
+
+ ---
+
+ ### 🔥 News
+ - ✅ **[2026.01.12]** We have released our **PositionIC model for FLUX** on Hugging Face and [GitHub](https://github.com/MeiGen-AI/PositionIC)!
+ - ✅ **[2025.07.18]** Our paper is now available on [arXiv](https://arxiv.org/abs/2507.13861).
+ - ⬜ Datasets and a PositionIC-v2 model with enhanced generation capabilities are coming soon.
+
+ ---
+
+ ## 📖 Introduction
+ **PositionIC** is a unified framework for high-fidelity, spatially controllable multi-subject image customization. Recent methods achieve high subject fidelity, but fine-grained, instance-level spatial control remains difficult because identity and layout are entangled.
+
+ To address this, we introduce:
+ 1. **BMPDS**: The first automatic data-synthesis pipeline for position-annotated multi-subject datasets, providing crucial spatial supervision.
+ 2. **Lightweight Layout-Aware Diffusion**: A framework integrating a novel visibility-aware attention mechanism that explicitly models spatial relationships via NeRF-inspired volumetric weight regulation.
+
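For readers unfamiliar with the NeRF reference in item 2, the textbook volume-rendering rule weights each sample by its own opacity times the transmittance left over from the samples in front of it: `w_i = T_i * alpha_i`, with `T_i` the product of `(1 - alpha_j)` over all `j < i`. The sketch below implements only that standard rule. The function name is hypothetical, and how PositionIC maps such weights onto attention is not described in this README, so treat it as background rather than the authors' mechanism.

```python
def compositing_weights(alphas):
    """Standard NeRF-style compositing weights for front-to-back samples.

    alphas: per-sample opacities in [0, 1], ordered nearest first.
    Returns w_i = T_i * alpha_i, where T_i = prod_{j<i} (1 - alpha_j).
    """
    weights = []
    transmittance = 1.0  # fraction of "visibility" not yet absorbed
    for alpha in alphas:
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # later samples see less
    return weights


if __name__ == "__main__":
    # A fully opaque sample at the back only receives what the front two let through.
    print(compositing_weights([0.5, 0.5, 1.0]))
```

Samples that sit behind opaque ones receive small weights, which is the occlusion behavior a visibility-aware scheme would presumably want to borrow.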
+ Our experiments demonstrate that **PositionIC** achieves state-of-the-art spatial precision and identity consistency in multi-entity scenarios.
+
+ ---
56
+ ## ⚑️ Quick Start
57
+
58
+ ### πŸ”§ Requirements and Installation
59
+ Follow these steps to set up your environment:
60
+
61
+ ```bash
62
+ # 1. Create and activate a new conda environment
63
+ conda create -n PositionIC python=3.10 -y
64
+ conda activate PositionIC
65
+
66
+ # 2. Install PyTorch (adjust according to your CUDA version)
67
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
68
+
69
+ # 3. Install project dependencies
70
+ pip install -r requirements.txt
71
+ ```
72
+
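After installation, a quick check that the active interpreter and the key packages match the steps above can save a failed first run. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and it only tests that the packages are importable, not that their CUDA build matches your driver.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "torchvision")):
    """Report whether the interpreter and packages match the setup steps."""
    # The conda command above pins Python 3.10.
    report = {"python_3_10": sys.version_info[:2] == (3, 10)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing or different'}")
```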
+ ### 📥 Checkpoints Download
+ You can download the `.safetensors` weights (e.g., `dit_lora.safetensors`) using `huggingface-cli`:
+
+ ```bash
+ pip install huggingface_hub
+
+ # Replace [YOUR_USERNAME] with the Hugging Face account that hosts the repository
+ repo_name="[YOUR_USERNAME]/PositionIC"
+ local_dir="models/$repo_name"
+
+ huggingface-cli download "$repo_name" --local-dir "$local_dir"
+ ```
+
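The same download can be done from Python with `huggingface_hub.snapshot_download`, which is convenient inside a setup script. This is a sketch under the same assumptions as the shell snippet: the helper names `local_dir_for` and `download_checkpoint` are hypothetical, and the repository id still has to be filled in by you.

```python
from pathlib import Path


def local_dir_for(repo_id, root="models"):
    """Mirror the shell snippet's layout: <root>/<user>/<repo>."""
    return Path(root) / repo_id


def download_checkpoint(repo_id, root="models"):
    """Download every file of the repo into local_dir_for(repo_id, root)."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    # Placeholder left as-is on purpose; replace it with the real account name.
    print(download_checkpoint("[YOUR_USERNAME]/PositionIC"))
```

Note that this layout places the weights under `models/[YOUR_USERNAME]/PositionIC/`, so any path passed to the inference script needs to point there.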
+ ---
+
+ ## ✍️ Inference
+ To generate images with precise position and identity control, run the following command, pointing `--dit_lora_path` at the checkpoint you downloaded:
+
+ ```bash
+ python inference_.py \
+   --eval_json_path "path/to/your/val_config.json" \
+   --dit_lora_path "models/PositionIC/dit_lora.safetensors" \
+   --saved_dir "./res" \
+   --width 1024 \
+   --height 1024 \
+   --ref_size 512 \
+   --seed 3074 \
+   --rope_type "uno" \
+   --a 5
+ ```
+
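To compare several seeds without retyping the command, the invocation can be assembled programmatically. The sketch below is not part of the repository: `DEFAULTS`, `build_command`, and `run_seeds` are hypothetical names, the per-seed output folder is an assumption, and the flags and values are copied verbatim from the command above.

```python
import subprocess
import sys

# Flag names and default values taken from the README's inference command.
DEFAULTS = {
    "eval_json_path": "path/to/your/val_config.json",
    "dit_lora_path": "models/PositionIC/dit_lora.safetensors",
    "saved_dir": "./res",
    "width": 1024,
    "height": 1024,
    "ref_size": 512,
    "seed": 3074,
    "rope_type": "uno",
    "a": 5,
}


def build_command(script="inference_.py", **overrides):
    """Return the argv list for one run, with overrides applied to DEFAULTS."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    options = {**DEFAULTS, **overrides}
    argv = [sys.executable, script]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


def run_seeds(seeds):
    """Run the script once per seed, writing each run to its own folder."""
    for seed in seeds:
        out_dir = f"./res/seed_{seed}"  # assumed layout, one folder per seed
        subprocess.run(build_command(seed=seed, saved_dir=out_dir), check=True)


if __name__ == "__main__":
    print(" ".join(build_command(seed=1)))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` stops the sweep at the first failing run.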
+ ---
+
+ ## 🙏 Acknowledgments
+ Our code is built upon the [UNO](https://github.com/bytedance/UNO) framework. We sincerely thank the authors for their excellent work and open-source contributions.
+
+ ---
+
+ ## 🌟 Citation
+ If you find our work helpful for your research, please consider giving us a star ⭐ and citing our paper:
+
+ ```bibtex
+ @article{hu2025positionic,
+   title={PositionIC: Unified Position and Identity Consistency for Image Customization},
+   author={Hu, Junjie and Han, Tianyang and Ma, Kai and Gao, Jialin and Yang, Song and He, Xianhua and Luo, Junfeng and Wei, Xiaoming and Zhang, Wenqiang},
+   journal={arXiv preprint arXiv:2507.13861},
+   year={2025}
+ }
+ ```
+
+ ---
+
+ ## 📄 License
+ This project is licensed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
+
+ ---