---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- disentanglement
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval</h1>

<p>
<b>Zhiheng Fu</b><sup>1</sup>
<b>Zixu Li</b><sup>1</sup>
<b>Zhiwei Chen</b><sup>1</sup>
<b>Chunxiao Wang</b><sup>3</sup>
<b>Xuemeng Song</b><sup>2</sup>
<b>Yupeng Hu</b><sup>1†</sup>
<b>Liqiang Nie</b><sup>4</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University
<sup>2</sup>School of Computer Science and Technology, Shandong University<br>
<sup>3</sup>Qilu University of Technology (Shandong Academy of Sciences)
<sup>4</sup>Harbin Institute of Technology (Shenzhen)
</p>
</div>

These are the official pre-trained model weights for **PAIR**, a novel framework designed for Composed Image Retrieval (CIR) via complementarity-guided disentanglement.

📄 **Paper:** Accepted at ICASSP 2025
💻 **GitHub Repository:** [ZhihFu/PAIR](https://github.com/ZhihFu/PAIR)
🌐 **Project Website:** [PAIR Webpage](https://zhihfu.github.io/PAIR.github.io/)

---

## 📌 Model Information

### 1. Model Name
**PAIR** (Complementarity-guided Disentanglement for Composed Image Retrieval) checkpoints.

### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Vision-Language / Multimodal Alignment
- **Applicable Tasks:** Retrieving target images based on a reference image combined with a relative text modification.

### 3. Project Introduction
Existing methods for Composed Image Retrieval (CIR) often suffer from semantic entanglement between multimodal queries and target images.

**PAIR** addresses this limitation by exploring the inherent relationships between these modalities. Guided by their complementarity, PAIR effectively **disentangles the visual and textual representations**, achieving more precise multimodal alignment and significantly boosting retrieval performance.

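For intuition only, here is a minimal, self-contained sketch of what complementarity-guided composition can look like: the reference-image feature is split into a text-relevant part (rewritten by the modification text) and a text-irrelevant part (preserved), and the two are recombined into a query embedding used to rank candidate images. The `ToyDisentangledComposer` module, its gating scheme, and all dimensions below are illustrative placeholders, **not** PAIR's actual architecture; please refer to the paper and the GitHub repository for the real model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only -- NOT PAIR's actual architecture.
# It shows the general pattern of splitting a reference-image feature into a
# text-relevant part (rewritten by the modification text) and a text-irrelevant
# part (preserved), then re-composing a single query embedding.
class ToyDisentangledComposer(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # Gate decides, per dimension, how much of the image feature the text should override.
        g = self.gate(torch.cat([img_feat, txt_feat], dim=-1))
        text_relevant = g * img_feat          # portion the modification text talks about
        text_irrelevant = (1 - g) * img_feat  # portion to preserve from the reference image
        modified = self.fuse(torch.cat([text_relevant, txt_feat], dim=-1))
        return F.normalize(text_irrelevant + modified, dim=-1)

# Rank candidate target images by cosine similarity against the composed query.
composer = ToyDisentangledComposer(dim=512)
img_feat = torch.randn(1, 512)   # reference-image embedding (e.g., from a CLIP-like encoder)
txt_feat = torch.randn(1, 512)   # modification-text embedding
gallery = F.normalize(torch.randn(100, 512), dim=-1)  # candidate target-image embeddings
query = composer(img_feat, txt_feat)
scores = query @ gallery.T       # cosine similarity (both sides L2-normalized)
print(scores.topk(k=5, dim=-1).indices)
```
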
### 4. Training Data Source
The pre-trained checkpoints are primarily trained and evaluated on three standard CIR datasets:
- **CIRR** (Open Domain)
- **FashionIQ** (Fashion Domain)
- **Shoes** (Fashion Domain)

---

## 🚀 Usage & Basic Inference

These weights are designed to be used directly with the official PAIR GitHub repository.

### Step 1: Prepare the Environment
Clone the GitHub repository and install the dependencies (evaluated with Python 3.8.10 and PyTorch 2.0.0):
```bash
git clone https://github.com/ZhihFu/PAIR
cd PAIR
conda create -n pair python=3.8.10 -y
conda activate pair

# Install PyTorch (CUDA 11.8 build)
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install core dependencies
pip install -r requirements.txt
```
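
(Optional) A quick sanity check that the new environment picked up the intended PyTorch build and can see a GPU:

```python
# Optional sanity check for the freshly created environment.
import torch

print("PyTorch version:", torch.__version__)        # expected: 2.0.0+cu118 when installed from the index above
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```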

### Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., `PAIR_CIRR.pt`) from this Hugging Face repository and place them in the `checkpoints/` directory of your cloned GitHub repo. Ensure you also download and structure the dataset images as specified in the [GitHub repo's Data Preparation section](https://github.com/ZhihFu/PAIR).

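Alternatively, the weights can be fetched programmatically with the `huggingface_hub` client. The `repo_id` below is a placeholder for this model repository's actual ID, and the filename assumes the `PAIR_CIRR.pt` naming used above:

```python
# Hypothetical download helper -- repo_id is a placeholder, not the confirmed repo ID.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<user-or-org>/PAIR",   # placeholder: replace with this model repo's real ID
    filename="PAIR_CIRR.pt",        # checkpoint name as referenced in this README
    local_dir="checkpoints",        # matches the checkpoints/ layout expected by the repo
)
print("Checkpoint saved to:", ckpt_path)
```
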
### Step 3: Run Testing / Inference
To evaluate the model or generate prediction files using the downloaded checkpoint (for example, on the CIRR dataset), run:
```bash
python src/cirr_test_submission.py checkpoints/PAIR_CIRR.pt
```

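Before launching the evaluation script, you can optionally confirm that the checkpoint file loads cleanly; the snippet below only inspects the file and makes no assumptions about its internal key layout:

```python
# Quick, structure-agnostic integrity check for the downloaded checkpoint.
import torch

state = torch.load("checkpoints/PAIR_CIRR.pt", map_location="cpu")
print(type(state))
if isinstance(state, dict):
    # Show the top-level keys (e.g., model weights, optimizer state, epoch, ...).
    print(list(state.keys())[:10])
```
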
To train from scratch, please refer to the `train.py` instructions in the official repository.

---

## ⚠️ Limitations & Notes

**Disclaimer:** This framework and its pre-trained weights are intended for **academic research and multimodal evaluation**.
- The model requires access to the original source datasets (CIRR, FashionIQ, Shoes) for full evaluation. Users must comply with the original licenses of those datasets.

---

## 📝⭐️ Citation

If you find our work or these model weights useful in your research, please consider leaving a **Star** ⭐️ on our GitHub repo and citing our paper:

```bibtex
@inproceedings{PAIR2025,
  title     = {PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval},
  author    = {Fu, Zhiheng and Li, Zixu and Chen, Zhiwei and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2025}
}
```