---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- disentanglement
- pytorch
---
<a id="top"></a>
<div align="center">
<h1>PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval</h1>
<p>
<b>Zhiheng Fu</b><sup>1</sup>&nbsp;
<b>Zixu Li</b><sup>1</sup>&nbsp;
<b>Zhiwei Chen</b><sup>1</sup>&nbsp;
<b>Chunxiao Wang</b><sup>3</sup>&nbsp;
<b>Xuemeng Song</b><sup>2</sup>&nbsp;
<b>Yupeng Hu</b><sup>1βœ‰</sup>&nbsp;
<b>Liqiang Nie</b><sup>4</sup>
</p>
<p>
<sup>1</sup>School of Software, Shandong University&nbsp;
<sup>2</sup>School of Computer Science and Technology, Shandong University<br>
<sup>3</sup>Qilu University of Technology (Shandong Academy of Sciences)&nbsp;
<sup>4</sup>Harbin Institute of Technology (Shenzhen)
</p>
</div>
These are the official pre-trained model weights for **PAIR**, a framework for Composed Image Retrieval (CIR) built on complementarity-guided disentanglement.

πŸ”— **Paper:** Accepted at ICASSP 2025
πŸ”— **GitHub Repository:** [ZhihFu/PAIR](https://github.com/ZhihFu/PAIR)
πŸ”— **Project Website:** [PAIR Webpage](https://zhihfu.github.io/PAIR.github.io/)
---
## πŸ“Œ Model Information
### 1. Model Name
**PAIR** (Complementarity-guided Disentanglement for Composed Image Retrieval) Checkpoints.
### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Vision-Language / Multimodal Alignment
- **Applicable Tasks:** Retrieving target images based on a reference image combined with a relative text modification (e.g., "the same dress but in red"); a minimal query sketch is shown below.
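For illustration only, here is a minimal sketch of what a composed query looks like. The names below are hypothetical and are **not** the PAIR repository's API:

```python
# Hypothetical illustration of a composed-image-retrieval query; not the PAIR API.
from dataclasses import dataclass

@dataclass
class ComposedQuery:
    reference_image: str    # path to the reference image
    modification_text: str  # relative change requested by the user

query = ComposedQuery("images/dress_001.jpg", "the same dress but in red")
# A CIR model ranks gallery images by how well they match reference image + text.
```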
### 3. Project Introduction
Existing methods for Composed Image Retrieval (CIR) often suffer from semantic entanglement between multimodal queries and target images.
**PAIR** addresses this limitation by exploring the inherent relationships between these modalities. Guided by their complementarity, PAIR effectively **disentangles the visual and textual representations**, achieving more precise multimodal alignment and significantly boosting retrieval performance.
### 4. Training Data Source
The pre-trained checkpoints are primarily trained and evaluated on three standard CIR datasets:
- **CIRR** (Open Domain)
- **FashionIQ** (Fashion Domain)
- **Shoes** (Fashion Domain)
---
## πŸš€ Usage & Basic Inference
These weights are designed to be used directly with the official PAIR GitHub repository.
### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies (tested with Python 3.8.10 and PyTorch 2.0.0):
```bash
git clone https://github.com/ZhihFu/PAIR
cd PAIR
conda create -n pair python=3.8.10 -y
conda activate pair
# Install PyTorch
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install core dependencies
pip install -r requirements.txt
```
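After installing, an optional sanity check confirms the interpreter and PyTorch build match the tested configuration; a minimal sketch:

```python
# Optional sanity check: verify the environment matches the tested setup
# (Python 3.8.x, PyTorch 2.0.0 with CUDA). Adjust expectations to your hardware.
import sys
import torch

print("Python:", sys.version.split()[0])    # expect 3.8.10
print("PyTorch:", torch.__version__)        # expect 2.0.0 (+cu118)
print("CUDA available:", torch.cuda.is_available())
```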
### Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., `PAIR_CIRR.pt`) from this Hugging Face repository and place them in the `checkpoints/` directory of your cloned GitHub repo. Ensure you also download and structure the dataset images as specified in the [GitHub repo's Data Preparation section](https://github.com/ZhihFu/PAIR).
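If you prefer to fetch the weights programmatically, here is a minimal sketch using `huggingface_hub`. The `repo_id` below is an assumption; replace it with this repository's actual ID and pick the checkpoint file you need:

```python
# Minimal sketch: download a PAIR checkpoint into the cloned repo's checkpoints/ folder.
# repo_id is an assumption -- replace it with this Hugging Face repo's actual ID.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="zhihfu/ICASSP25-PAIR",   # assumed repo ID
    filename="PAIR_CIRR.pt",          # e.g., the CIRR checkpoint
    local_dir="checkpoints",
)
print("Checkpoint saved to:", ckpt_path)
```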
### Step 3: Run Testing / Inference
To evaluate the model or generate prediction files using the downloaded checkpoint (for example, on the CIRR dataset), run:
```bash
python src/cirr_test_submission.py checkpoints/PAIR_CIRR.pt
```
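Before running the full evaluation, you can optionally inspect the checkpoint contents. A hedged sketch (the exact key layout depends on how the repository serializes its state dict):

```python
# Optional: peek at the checkpoint structure before evaluation.
# The exact keys depend on how the PAIR repo saves its state; this just lists them.
import torch

state = torch.load("checkpoints/PAIR_CIRR.pt", map_location="cpu")
if isinstance(state, dict):
    for key in list(state)[:10]:   # print the first few top-level keys
        print(key)
```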
To train from scratch, please refer to the `train.py` instructions in the official repository.
---
## ⚠️ Limitations & Notes
**Disclaimer:** This framework and its pre-trained weights are intended for **academic research and multimodal evaluation**.
- The model requires access to the original source datasets (CIRR, FashionIQ, Shoes) for full evaluation. Users must comply with the original licenses of those respective datasets.
---
## πŸ“β­οΈ Citation
If you find our work or these model weights useful in your research, please consider leaving a **Star** ⭐️ on our GitHub repo and citing our paper:
```bibtex
@inproceedings{PAIR2025,
  title     = {PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval},
  author    = {Fu, Zhiheng and Li, Zixu and Chen, Zhiwei and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
  booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2025}
}
```