---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- disentanglement
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval</h1>

<p>
<b>Zhiheng Fu</b><sup>1</sup>
<b>Zixu Li</b><sup>1</sup>
<b>Zhiwei Chen</b><sup>1</sup>
<b>Chunxiao Wang</b><sup>3</sup>
<b>Xuemeng Song</b><sup>2</sup>
<b>Yupeng Hu</b><sup>1†</sup>
<b>Liqiang Nie</b><sup>4</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University
<sup>2</sup>School of Computer Science and Technology, Shandong University<br>
<sup>3</sup>Qilu University of Technology (Shandong Academy of Sciences)
<sup>4</sup>Harbin Institute of Technology (Shenzhen)
</p>
</div>
These are the official pre-trained model weights for **PAIR**, a framework for Composed Image Retrieval (CIR) built on complementarity-guided disentanglement.

- **Paper:** Accepted at ICASSP 2025
- **GitHub Repository:** [ZhihFu/PAIR](https://github.com/ZhihFu/PAIR)
- **Project Website:** [PAIR Webpage](https://zhihfu.github.io/PAIR.github.io/)

---

## Model Information

### 1. Model Name
**PAIR** (Complementarity-guided Disentanglement for Composed Image Retrieval) checkpoints.

### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Vision-Language / Multimodal Alignment
- **Applicable Tasks:** Retrieving target images based on a reference image combined with a relative text describing the desired modification.

### 3. Project Introduction
Existing methods for Composed Image Retrieval (CIR) often suffer from semantic entanglement between multimodal queries and target images.

**PAIR** addresses this limitation by exploiting the inherent relationships between these modalities. Guided by their complementarity, PAIR **disentangles the visual and textual representations**, achieving more precise multimodal alignment and substantially improved retrieval performance.
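
To make the task concrete, here is a minimal sketch of generic CIR scoring, not the PAIR architecture itself; `encode_image`, `encode_text`, and `fuse` are placeholders for whatever backbone and fusion module a given model uses:

```python
import torch
import torch.nn.functional as F

# Illustrative only: encode_image, encode_text, and fuse stand in for the
# actual encoders and fusion module of a CIR model such as PAIR.
def retrieve(encode_image, encode_text, fuse, ref_image, mod_text, gallery_images):
    ref_feat = encode_image(ref_image)           # (1, d) reference image feature
    txt_feat = encode_text(mod_text)             # (1, d) modification text feature
    query = fuse(ref_feat, txt_feat)             # (1, d) composed query
    gallery = encode_image(gallery_images)       # (N, d) candidate image features

    # Rank candidates by cosine similarity to the composed query.
    sims = F.cosine_similarity(query, gallery)   # (N,) via broadcasting
    return torch.argsort(sims, descending=True)  # indices, best match first
```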

### 4. Training Data Source
The pre-trained checkpoints are primarily trained and evaluated on three standard CIR datasets (all triplet-annotated; see the sketch after this list):
- **CIRR** (Open Domain)
- **FashionIQ** (Fashion Domain)
- **Shoes** (Fashion Domain)
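
All three datasets share the same basic supervision unit: a reference image, a relative caption describing the desired change, and a target image. A schematic triplet, with field names that are illustrative rather than any dataset's actual annotation schema:

```python
# Illustrative only: each dataset ships its own annotation files and field
# names; this shows the information a single CIR training example carries.
triplet = {
    "reference_image": "images/dress_001.jpg",  # image the user starts from
    "relative_caption": "is sleeveless and has a floral print",
    "target_image": "images/dress_042.jpg",     # image the query should retrieve
}
```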

---

## Usage & Basic Inference

These weights are designed to be used directly with the official PAIR GitHub repository.

### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies (evaluated on Python 3.8.10 and PyTorch 2.0.0):
```bash
git clone https://github.com/ZhihFu/PAIR
cd PAIR
conda create -n pair python=3.8.10 -y
conda activate pair

# Install PyTorch (CUDA 11.8 build)
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install core dependencies
pip install -r requirements.txt
```

### Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., `PAIR_CIRR.pt`) from this Hugging Face repository and place them in the `checkpoints/` directory of your cloned GitHub repo. Ensure you also download and structure the dataset images as specified in the [GitHub repo's Data Preparation section](https://github.com/ZhihFu/PAIR).
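
You can also fetch the checkpoint programmatically with the `huggingface_hub` client; a minimal sketch, assuming `REPO_ID` is replaced with this repository's actual id:

```python
from huggingface_hub import hf_hub_download

# Placeholder: substitute the id of this model repository.
REPO_ID = "<namespace>/<repo-name>"

# Downloads PAIR_CIRR.pt into the local checkpoints/ directory.
ckpt_path = hf_hub_download(repo_id=REPO_ID, filename="PAIR_CIRR.pt", local_dir="checkpoints")
print(ckpt_path)
```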

### Step 3: Run Testing / Inference
To evaluate the model or generate prediction files using the downloaded checkpoint (for example, on the CIRR dataset), run:
```bash
python src/cirr_test_submission.py checkpoints/PAIR_CIRR.pt
```
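
The script above consumes the checkpoint directly. If you only want to verify the downloaded file first, a quick inspection with plain PyTorch is possible (illustrative; the checkpoint's internal layout is defined by the PAIR training code):

```python
import torch

# Load on CPU so no GPU is needed just to inspect the file.
state = torch.load("checkpoints/PAIR_CIRR.pt", map_location="cpu")

# The file may be a bare state_dict or a dict wrapping weights plus metadata;
# printing the top-level keys shows which layout this checkpoint uses.
if isinstance(state, dict):
    print(list(state.keys())[:10])
```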

To train from scratch, please refer to the `train.py` instructions in the official repository.

---

## Limitations & Notes

**Disclaimer:** This framework and its pre-trained weights are intended for **academic research and multimodal evaluation**.
- The model requires access to the original source datasets (CIRR, FashionIQ, Shoes) for full evaluation. Users must comply with the original licenses of those respective datasets.

---

## Citation

If you find our work or these model weights useful in your research, please consider leaving a **Star** on our GitHub repo and citing our paper:

```bibtex
@inproceedings{PAIR2025,
  title={PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval},
  author={Fu, Zhiheng and Li, Zixu and Chen, Zhiwei and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2025}
}
```
|