<h1 align="center">PICS: Pairwise Image Compositing with Spatial Interactions</h1>
<p align="center"><img src="assets/figure.jpg" width="100%"></p>

***Check out our [Project Page](https://ryanhangzhou.github.io/pics/) for more visual demos!***

<!-- Updates -->
## ⏩ Updates

**02/08/2026**
- Release training and inference code.
- Release training data.

**03/01/2025**
- Release checkpoints. 

<!-- TODO List -->
## 🚧 TODO List
- [x] Release training and inference code for pairwise image compositing
- [x] Release datasets (LVIS, Objects365, etc. in WebDataset format)
- [x] Release pretrained models
- [ ] Release any-object compositing code

<!-- Installation -->
## πŸ“¦ Installation

### Prerequisites
- **OS**: Linux (Tested on Ubuntu 20.04/22.04).
- **Python**: 3.10 or higher.
- **Package Manager**: [Conda](https://docs.anaconda.com/miniconda/install/#quick-command-line-install) is recommended.   

**Hardware Requirements**
| Stage | GPU (VRAM) | System RAM | Batch Size |
| --- | --- | --- | --- |
| Training | NVIDIA H100 (80GB) | 120GB | 16 |
| Inference | NVIDIA RTX A6000 (48GB) | 64GB | 1 |

### Environment setup
Create a new conda environment named `PICS` and install the dependencies:
```
conda env create --file=PICS.yml
conda activate PICS
```

### Weights preparation
***DINOv2***: Download [ViT-g/14](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_pretrain.pth) and place it at `checkpoints/dinov2_vitg14_pretrain.pth`.
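
For convenience, the download can be scripted; a minimal sketch (the URL and target path come from the link above, and `fetch_dinov2` is our own helper name, wrapped as a function so you can source it first and run the large download when ready):

```shell
# Sketch: download the DINOv2 ViT-g/14 checkpoint into checkpoints/.
# Run `fetch_dinov2` to start; this is a large download, and `wget -c` lets you resume it.
fetch_dinov2() {
    mkdir -p checkpoints
    wget -c -O checkpoints/dinov2_vitg14_pretrain.pth \
        "https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_pretrain.pth"
}
```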

<!-- Pretrained Models -->
## πŸ€– Pretrained Models
<!-- Coming soon! We are currently finalizing the model weights for public release. -->
We provide the following pretrained models (place them in the same `checkpoints/` directory as the DINOv2 weights):

| Model | Description | Size | Download |
| --- | --- | --- | --- |
| PICS | Full model | 18.45GB | [Download](https://drive.google.com/file/d/17JpvhRvHFjfqQDiV9RFfgjGa0iLropXK/view?usp=sharing) |


## Minimal Example for Inference

Here is an [example](run_test.py) of how to use the pretrained models for pairwise image compositing.
Run it in two-object compositing mode:
```
python run_test.py \
    --input "sample" \
    --output "results/sample" \
    --obj_thr 2
```


<!-- Dataset -->
## πŸ“š Dataset
Our training set is a mixture of [LVIS](https://www.lvisdataset.org/), [VITON-HD](https://www.kaggle.com/datasets/marquis03/high-resolution-viton-zalando-dataset), [Objects365](https://www.objects365.org/overview.html), [Cityscapes](https://www.cityscapes-dataset.com/), [Mapillary Vistas](https://www.mapillary.com/dataset/vistas) and [BDD100K](https://bair.berkeley.edu/blog/2018/05/30/bdd/). 
We provide the processed ***two-object compositing data*** in WebDataset format (.tar shards) below:
| Dataset | #Samples | Size | Download |
| --- | --- | --- | --- |
| LVIS | 34,160 | 7.98GB | [Download](https://drive.google.com/drive/folders/1Ir1cwR7K8HALNJiS6kTTlMgKIn8f18XX?usp=sharing) |
| VITON-HD | 11,647 | 2.53GB | [Download](https://drive.google.com/drive/folders/1317fJvvc7J1OTdbiM_Rst0C9AewIcNr2?usp=sharing) |
| Objects365 | 940,764 | 243GB | [Download](https://drive.google.com/drive/folders/1xKLoGv8e5wkGkjdxEGpz5i9TH08vd1AA?usp=sharing) |
| Cityscapes | 536 | 1.21GB | [Download](https://drive.google.com/drive/folders/1HYgEgZcknvEMbK2XZf2isY0pYcluGoKU?usp=sharing) |
| Mapillary Vistas | 603 | 582MB | [Download](https://drive.google.com/drive/folders/1a0756wc2bvvHJ_8a01N0tZ_Kb_BkRZv1?usp=sharing) |
| BDD100K | 1,012 | 204MB | [Download](https://drive.google.com/drive/folders/1zS60KPfZioU4tW1ngDK1KahE7T-TeIim?usp=sharing) |

### Data organization
```
PICS/
└── data/
    └── train/
        ├── LVIS/
        │   ├── 00000.tar
        │   └── ...
        ├── VITONHD/
        ├── Objects365/
        ├── Cityscapes/
        ├── MapillaryVistas/
        └── BDD100K/
```
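
Each `.tar` shard follows the WebDataset convention: all files belonging to one sample share a common basename prefix (key). As a dependency-free sketch for sanity-checking a downloaded shard (the exact extensions inside the shards are an assumption), Python's standard `tarfile` module is enough:

```python
import tarfile
from collections import defaultdict

def list_samples(shard_path: str) -> dict[str, list[str]]:
    """Group the members of a WebDataset shard by sample key
    (the basename before the first dot)."""
    samples = defaultdict(list)
    with tarfile.open(shard_path) as tar:
        for member in tar.getmembers():
            if member.isfile():
                key, _, ext = member.name.partition(".")
                samples[key].append(ext)
    return dict(samples)
```

During training you would normally stream these shards with the `webdataset` library rather than unpacking them; the sketch above is only for inspecting what a shard contains.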

### Data preparation instructions
We provide a script using SAM to extract high-quality object silhouettes for the Objects365 dataset.
To process a specific range of data shards, run:
```
python scripts/annotate_sam.py --is_train --index_low 00000 --index_high 10000
```
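
To cover a larger index range in fixed-size chunks (e.g. one job per chunk), a small shell loop can generate the invocations. The chunk size and five-digit zero-padding below are assumptions based on the example command:

```shell
# Print one annotate_sam.py command per 10,000-shard chunk (dry run; pipe to `sh` to execute).
for lo in $(seq 0 10000 30000); do
    hi=$((lo + 10000))
    printf 'python scripts/annotate_sam.py --is_train --index_low %05d --index_high %05d\n' "$lo" "$hi"
done
```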
To process raw data (e.g., LVIS), run the following command. Replace `/path/to/raw_data` with your actual local data path:
```
python -m datasets.lvis \
    --dataset_dir "/path/to/raw_data" \
    --construct_dataset_dir "data/train/LVIS" \
    --area_ratio 0.02 \
    --is_build_data \
    --is_train
```
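
The `--area_ratio` flag presumably discards objects whose segmentation mask covers too small a fraction of the image. As an illustrative sketch of that criterion (our own reading of the flag, not the repo's actual filtering code):

```python
def passes_area_ratio(mask_area: float, img_w: int, img_h: int, ratio: float = 0.02) -> bool:
    """Keep an object only if its mask area is at least `ratio` of the full image area."""
    return mask_area >= ratio * img_w * img_h
```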

## Training

To train a model on the whole dataset:
```
python run_train.py \
    --root_dir 'LOGS/whole_data' \
    --batch_size 16 \
    --logger_freq 1000 \
    --is_joint
```


<!-- License -->
## βš–οΈ License

This project is licensed under the terms of the MIT license.



<!-- Citation -->
<!-- ## πŸ“œ Citation -->

<!-- If you find this work helpful, please consider citing our paper: -->

<!-- ```bibtex
@inproceedings{zhou2025bootplace,
  title={BOOTPLACE: Bootstrapped Object Placement with Detection Transformers},
  author={Zhou, Hang and Zuo, Xinxin and Ma, Rui and Cheng, Li},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={19294--19303},
  year={2025}
}
``` -->

## πŸ™Œ Acknowledgements
We thank the contributors of the [AnyDoor](https://huggingface.co/papers/2307.09481) repository for open-sourcing their research.

## Contact Us
For any inquiries, feel free to open a GitHub issue or reach out via email.