Hang Zhou

PICS: Pairwise Image Compositing with Spatial Interactions

Check out our Project Page for more visual demos!

⏩ Updates

02/08/2026

  • Released training and inference code.
  • Released training data.

03/01/2025

  • Released checkpoints.

🚧 TODO List

  • Release training and inference code for pairwise image compositing
  • Release datasets (LVIS, Objects365, etc. in WebDataset format)
  • Release pretrained models
  • Release any-object compositing code

📦 Installation

Prerequisites

  • OS: Linux (Tested on Ubuntu 20.04/22.04).
  • Python: 3.10 or higher.
  • Package Manager: Conda is recommended.

Hardware Requirements

| Stage     | GPU (VRAM)              | System RAM | Batch Size |
|-----------|-------------------------|------------|------------|
| Training  | NVIDIA H100 (80GB)      | 120GB      | 16         |
| Inference | NVIDIA RTX A6000 (48GB) | 64GB       | 1          |

Environment setup

Create a new conda environment named PICS and install the dependencies:

conda env create --file=PICS.yml
conda activate PICS

Weights preparation

DINOv2: Download ViT-g/14 and place it at: checkpoints/dinov2_vitg14_pretrain.pth
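If helpful, the expected layout can be prepared as follows. The download URL is our assumption of the official DINOv2 release location and should be verified against the DINOv2 repository before use:

```shell
# Create the directory layout the repo expects.
mkdir -p checkpoints
# Assumed official DINOv2 ViT-g/14 URL -- verify before downloading:
# wget -O checkpoints/dinov2_vitg14_pretrain.pth \
#   https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_pretrain.pth
```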

🤖 Pretrained Models

We provide the following pretrained models. Place them in the same directory as the DINOv2 weights (checkpoints/):

| Model | Description | Size    | Download |
|-------|-------------|---------|----------|
| PICS  | Full model  | 18.45GB | Download |

Minimal Example for Inference

Here is an example of how to use the pretrained models for pairwise image compositing. Run the two-object compositing mode:

python run_test.py \
    --input "sample" \
    --output "results/sample" \
    --obj_thr 2

📚 Dataset

Our training set is a mixture of LVIS, VITON-HD, Objects365, Cityscapes, Mapillary Vistas and BDD100K. We provide the processed two-object compositing data in WebDataset format (.tar shards) below:

| Dataset          | #Samples | Size   | Download |
|------------------|----------|--------|----------|
| LVIS             | 34,160   | 7.98GB | Download |
| VITON-HD         | 11,647   | 2.53GB | Download |
| Objects365       | 940,764  | 243GB  | Download |
| Cityscapes       | 536      | 1.21GB | Download |
| Mapillary Vistas | 603      | 582MB  | Download |
| BDD100K          | 1,012    | 204MB  | Download |
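In the WebDataset convention, files inside a `.tar` shard that share a basename stem belong to one sample. A minimal stdlib-only sketch of that grouping (the keys and extensions here are hypothetical; the real shards may use different ones):

```python
import io
import json
import os
import tarfile
import tempfile
from collections import defaultdict

def group_samples(tar_path):
    """Group tar members by basename stem, per the WebDataset convention:
    files sharing a stem (e.g. 000.jpg + 000.json) form one sample."""
    samples = defaultdict(dict)
    with tarfile.open(tar_path) as tar:
        for m in tar.getmembers():
            stem, ext = os.path.splitext(m.name)
            samples[stem][ext.lstrip(".")] = tar.extractfile(m).read()
    return dict(samples)

# Build a tiny demonstration shard with made-up contents.
tmp = tempfile.mkdtemp()
shard = os.path.join(tmp, "00000.tar")
with tarfile.open(shard, "w") as tar:
    for name, payload in [("000.jpg", b"\xff\xd8fake"),
                          ("000.json", json.dumps({"bbox": [0, 0, 8, 8]}).encode())]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

samples = group_samples(shard)
print(sorted(samples["000"].keys()))  # ['jpg', 'json']
```

In practice the `webdataset` library handles this grouping (and decoding) for you; the sketch only illustrates the shard layout.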

Data organization

PICS/
├── data/
    ├── train/
        ├── LVIS/
            ├── 00000.tar
            ├── ...
        ├── VITONHD/
        ├── Objects365/
        ├── Cityscapes/
        ├── MapillaryVistas/
        ├── BDD100K/

Data preparation instructions

We provide a script using SAM to extract high-quality object silhouettes for the Objects365 dataset. To process a specific range of data shards, run:

python scripts/annotate_sam.py --is_train --index_low 00000 --index_high 10000
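How `--index_low`/`--index_high` map to shard files is not documented here; a plausible reading, assuming the zero-padded shard names shown in the data tree above (this is an illustration, not the repo's actual logic):

```python
# Hypothetical mapping from an index range to shard filenames,
# assuming five-digit zero-padded names like data/train/LVIS/00000.tar.
def shard_names(index_low: str, index_high: str, width: int = 5):
    lo, hi = int(index_low), int(index_high)
    return [f"{i:0{width}d}.tar" for i in range(lo, hi)]

print(shard_names("00000", "00003"))  # ['00000.tar', '00001.tar', '00002.tar']
```

Splitting the full range across several invocations of the script would then let you annotate shards in parallel.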

To process raw data (e.g., LVIS), run the following command. Replace /path/to/raw_data with your actual local data path:

python -m datasets.lvis \
    --dataset_dir "/path/to/raw_data" \
    --construct_dataset_dir "data/train/LVIS" \
    --area_ratio 0.02 \
    --is_build_data \
    --is_train
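The `--area_ratio 0.02` flag presumably filters out objects that are too small relative to the image. A sketch of that kind of filter (names are illustrative, not the repo's actual API):

```python
# Keep an object only if its mask covers at least `area_ratio` of the image,
# mirroring the likely meaning of --area_ratio 0.02.
def keep_object(mask_area: int, img_w: int, img_h: int,
                area_ratio: float = 0.02) -> bool:
    return mask_area / (img_w * img_h) >= area_ratio

print(keep_object(10000, 640, 480))  # True  (~3.3% of the image)
print(keep_object(5000, 640, 480))   # False (~1.6% of the image)
```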

Training

To train a model on the whole dataset:

python run_train.py \
    --root_dir 'LOGS/whole_data' \
    --batch_size 16 \
    --logger_freq 1000 \
    --is_joint
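Based on its name, `--logger_freq 1000` likely means "log every 1000 steps"; a one-line check consistent with that reading (an assumption about the flag, not the repo's code):

```python
# Hypothetical periodic-logging predicate matching --logger_freq 1000.
def should_log(step: int, logger_freq: int = 1000) -> bool:
    return step > 0 and step % logger_freq == 0

print([s for s in range(0, 4001) if should_log(s)])  # [1000, 2000, 3000, 4000]
```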

βš–οΈ License

This project is licensed under the terms of the MIT license.

🙌 Acknowledgements

We would like to thank the contributors to the AnyDoor repository for their open research.

Contact Us

For any inquiries, feel free to open a GitHub issue or reach out via email.