PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval
Zhiheng Fu1 Zixu Li1 Zhiwei Chen1 Chunxiao Wang3 Xuemeng Song2 Yupeng Hu1 Liqiang Nie4
1School of Software, Shandong University
2School of Computer Science and Technology, Shandong University
3Qilu University of Technology (Shandong Academy of Sciences)
4Harbin Institute of Technology (Shenzhen)
These are the official pre-trained model weights for PAIR, a novel framework designed for Composed Image Retrieval (CIR) via complementarity-guided disentanglement.
Paper: Accepted by ICASSP 2025
GitHub Repository: ZhihFu/PAIR
Project Website: PAIR Webpage
Model Information
1. Model Name
PAIR (Complementarity-guided Disentanglement for Composed Image Retrieval) Checkpoints.
2. Task Type & Applicable Tasks
- Task Type: Composed Image Retrieval (CIR) / Vision-Language / Multimodal Alignment
- Applicable Tasks: Retrieving target images based on a reference image combined with a relative text modification.
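As a rough illustration of the task (not of PAIR's method), the sketch below ranks candidate images by cosine similarity to a composed query. The `compose_query` function is a hypothetical stand-in that simply sums the two embeddings, and the toy vectors are invented for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compose_query(ref_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    # Hypothetical placeholder fusion (element-wise sum), NOT PAIR's module.
    return [r + t for r, t in zip(ref_emb, text_emb)]


def rank_candidates(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery ids sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda k: cosine(query, gallery[k]), reverse=True)


# Toy embeddings, invented for illustration only.
reference_image = [1.0, 0.0, 0.0]    # stands in for the reference image
modification_text = [0.0, 1.0, 0.0]  # stands in for the relative text
gallery = {
    "target": [1.0, 1.0, 0.0],
    "reference_only": [1.0, 0.0, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}

ranking = rank_candidates(compose_query(reference_image, modification_text), gallery)
print(ranking)
```

In this toy setup the candidate that reflects both the reference image and the text modification ranks first, which is the behaviour a CIR system is meant to produce.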
3. Project Introduction
Existing methods for Composed Image Retrieval (CIR) often suffer from semantic entanglement between multimodal queries and target images.
PAIR addresses this limitation by exploring the inherent relationships between the visual and textual modalities. Guided by their complementarity, PAIR disentangles the visual and textual representations, achieving more precise multimodal alignment and boosting retrieval performance.
4. Training Data Source
The pre-trained checkpoints are primarily trained and evaluated on three standard CIR datasets:
- CIRR (Open Domain)
- FashionIQ (Fashion Domain)
- Shoes (Fashion Domain)
Usage & Basic Inference
These weights are designed to be used directly with the official PAIR GitHub repository.
Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies (evaluated on Python 3.8.10 and PyTorch 2.0.0):
git clone https://github.com/ZhihFu/PAIR
cd PAIR
conda create -n pair python=3.8.10 -y
conda activate pair
# Install PyTorch
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install core dependencies
pip install -r requirements.txt
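After installation, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `environment_report` are illustrative and not part of the PAIR repository.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Collect interpreter and PyTorch-related versions for comparison."""
    return {
        "python": ".".join(str(p) for p in sys.version_info[:3]),
        "torch": installed_version("torch"),
        "torchvision": installed_version("torchvision"),
    }


report = environment_report()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

The reported PyTorch version may carry a build suffix such as `+cu118` when installed from the CUDA wheel index.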
Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., PAIR_CIRR.pt) from this Hugging Face repository and place them in the checkpoints/ directory of your cloned GitHub repo. Ensure you also download and structure the dataset images as specified in the GitHub repo's Data Preparation section.
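Before running evaluation, a quick check that the checkpoint landed in the expected place can save a failed run. This is a minimal sketch that assumes it is executed from the root of the cloned repository; `missing_checkpoints` is a hypothetical helper, and `PAIR_CIRR.pt` is the only file name taken from the text above.

```python
from pathlib import Path
from typing import Iterable, List


def missing_checkpoints(repo_root: str, names: Iterable[str]) -> List[str]:
    """Return the checkpoint file names not found under <repo_root>/checkpoints."""
    ckpt_dir = Path(repo_root) / "checkpoints"
    return [name for name in names if not (ckpt_dir / name).is_file()]


# "." assumes the script is run from the root of the cloned PAIR repository.
missing = missing_checkpoints(".", ["PAIR_CIRR.pt"])
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All expected checkpoints are in place.")
```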
Step 3: Run Testing / Inference
To evaluate the model or generate prediction files using the downloaded checkpoint (for example, on the CIRR dataset), run:
python src/cirr_test_submission.py checkpoints/PAIR_CIRR.pt
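If you prefer to launch the evaluation from a Python script (for example, inside a larger pipeline), the command above can be wrapped as follows. This is a sketch under the assumption that it runs from the repository root; `build_test_command` and `run_cirr_test` are hypothetical names, and only the script path and checkpoint path come from the command above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_test_command(checkpoint: str) -> List[str]:
    """Build the argument list for the CIRR test-submission script."""
    return [sys.executable, "src/cirr_test_submission.py", checkpoint]


def run_cirr_test(checkpoint: str = "checkpoints/PAIR_CIRR.pt") -> int:
    """Run the script if the checkpoint exists and return its exit code."""
    if not Path(checkpoint).is_file():
        raise FileNotFoundError(f"Checkpoint not found: {checkpoint}")
    # check=False so the caller can inspect the exit code instead of catching.
    return subprocess.run(build_test_command(checkpoint), check=False).returncode


print(" ".join(build_test_command("checkpoints/PAIR_CIRR.pt")))
```

Using `sys.executable` keeps the subprocess in the same interpreter (and therefore the same conda environment) as the calling script.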
To train from scratch, please refer to the train.py instructions in the official repository.
Limitations & Notes
Disclaimer: This framework and its pre-trained weights are intended for academic research and multimodal evaluation.
- The model requires access to the original source datasets (CIRR, FashionIQ, Shoes) for full evaluation. Users must comply with the original licenses of those respective datasets.
Citation
If you find our work or these model weights useful in your research, please consider leaving a Star ⭐️ on our GitHub repo and citing our paper:
@article{PAIR2025,
title={PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval},
author={Fu, Zhiheng and Li, Zixu and Chen, Zhiwei and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
journal={IEEE},
year={2025}
}