# PICS: Pairwise Image Compositing with Spatial Interactions

Check out our Project Page for more visual demos!
## ⏩ Updates
**02/08/2026**
- Release training and inference code.
- Release training data.

**03/01/2025**
- Release checkpoints.
## 🚧 TODO List
- Release training and inference code for pairwise image compositing
- Release datasets (LVIS, Objects365, etc.) in WebDataset format
- Release pretrained models
- Release any-object compositing code
## 📦 Installation
### Prerequisites
- OS: Linux (Tested on Ubuntu 20.04/22.04).
- Python: 3.10 or higher.
- Package Manager: Conda is recommended.
### Hardware Requirements
| Stage | GPU (VRAM) | System RAM | Batch Size |
|---|---|---|---|
| Training | NVIDIA H100 (80GB) | 120GB | 16 |
| Inference | NVIDIA RTX A6000 (48GB) | 64GB | 1 |
### Environment setup
Create a new conda environment named `PICS` and install the dependencies:
```bash
conda env create --file=PICS.yml
conda activate PICS
```
### Weights preparation
DINOv2: Download the ViT-g/14 checkpoint and place it at `checkpoints/dinov2_vitg14_pretrain.pth`.
## 🤗 Pretrained Models
We provide the following pretrained models (place them in the same directory as the DINOv2 weights):
| Model | Description | Size | Download |
|---|---|---|---|
| PICS | Full model | 18.45GB | Download |
### Minimal Example for Inference
Here is an example of how to use the pretrained model for pairwise image compositing. To run the two-object compositing mode:
```bash
python run_test.py \
    --input "sample" \
    --output "results/sample" \
    --obj_thr 2
```
## 📊 Dataset
Our training set is a mixture of LVIS, VITON-HD, Objects365, Cityscapes, Mapillary Vistas, and BDD100K. We provide the processed two-object compositing data in WebDataset format (.tar shards) below:
| Dataset | #Samples | Size | Download |
|---|---|---|---|
| LVIS | 34,160 | 7.98GB | Download |
| VITON-HD | 11,647 | 2.53GB | Download |
| Objects365 | 940,764 | 243GB | Download |
| Cityscapes | 536 | 1.21GB | Download |
| Mapillary Vistas | 603 | 582MB | Download |
| BDD100K | 1,012 | 204MB | Download |
### Data organization
```
PICS/
└── data/
    └── train/
        ├── LVIS/
        │   ├── 00000.tar
        │   └── ...
        ├── VITONHD/
        ├── Objects365/
        ├── Cityscapes/
        ├── MapillaryVistas/
        └── BDD100K/
```
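Each `.tar` shard above follows the WebDataset convention: files that belong to one training sample share a basename (the part before the first dot) and differ only in extension. A minimal stdlib sketch for inspecting a shard, useful for sanity-checking a download (the exact per-sample extensions, e.g. `.jpg`/`.json`, are an assumption and may differ in the released shards):

```python
import tarfile
from collections import defaultdict

def list_samples(shard_path):
    """Group tar members by basename, which is the WebDataset sample key.

    Returns {sample_key: {extension: file_size_in_bytes}}.
    """
    samples = defaultdict(dict)
    with tarfile.open(shard_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # WebDataset splits names at the first dot: "000123.mask.png"
            # belongs to sample "000123" under key "mask.png".
            key, _, ext = member.name.partition(".")
            samples[key][ext] = member.size
    return dict(samples)
```

For example, `list_samples("data/train/LVIS/00000.tar")` should return one dictionary entry per composited sample.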
### Data preparation instructions
We provide a script that uses SAM to extract high-quality object silhouettes for the Objects365 dataset. To process a specific range of data shards, run:
```bash
python scripts/annotate_sam.py --is_train --index_low 00000 --index_high 10000
```
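The `--index_low`/`--index_high` arguments suggest that shards are addressed by zero-padded indices matching the `00000.tar` naming in the data layout. A small sketch of how such a range could map to shard filenames (the naming scheme is our assumption from the directory layout, not the script's documented behavior):

```python
def shard_names(index_low: str, index_high: str):
    """Expand a zero-padded, half-open index range into shard filenames.

    The padding width is taken from index_low (e.g. "00000" -> 5 digits).
    """
    width = len(index_low)
    return [f"{i:0{width}d}.tar" for i in range(int(index_low), int(index_high))]
```

For instance, `shard_names("00000", "00003")` yields `["00000.tar", "00001.tar", "00002.tar"]`.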
To process raw data (e.g., LVIS), run the following command, replacing `/path/to/raw_data` with your actual local data path:
```bash
python -m datasets.lvis \
    --dataset_dir "/path/to/raw_data" \
    --construct_dataset_dir "data/train/LVIS" \
    --area_ratio 0.02 \
    --is_build_data \
    --is_train
```
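The `--area_ratio 0.02` flag presumably drops objects whose mask covers less than 2% of the image, so that tiny objects never enter the compositing set. A minimal sketch of that filter under this assumption (the function name and threshold semantics are ours, not the repository's actual implementation):

```python
def keep_object(mask_area: int, image_width: int, image_height: int,
                area_ratio: float = 0.02) -> bool:
    """Keep an object only if its mask covers at least `area_ratio`
    of the total image area (assumed semantics of --area_ratio)."""
    return mask_area / (image_width * image_height) >= area_ratio
```

For example, a 100x100 image with a 100-pixel mask (1% coverage) would be discarded at the default threshold, while a 2,000-pixel mask (20%) would be kept.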
## Training
To train a model on the whole dataset:
```bash
python run_train.py \
    --root_dir 'LOGS/whole_data' \
    --batch_size 16 \
    --logger_freq 1000 \
    --is_joint
```
## ⚖️ License
This project is licensed under the terms of the MIT license.
## 🙏 Acknowledgements
We would like to thank the contributors to the AnyDoor repository for their open research.
## Contact Us
For any inquiries, feel free to open a GitHub issue or reach out via email.