---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- disentanglement
- pytorch
---

<a id="top"></a>
<div align="center">
  <h1>PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval</h1>

  <p>
    <b>Zhiheng Fu</b><sup>1</sup>&nbsp;
    <b>Zixu Li</b><sup>1</sup>&nbsp;
    <b>Zhiwei Chen</b><sup>1</sup>&nbsp;
    <b>Chunxiao Wang</b><sup>3</sup>&nbsp;
    <b>Xuemeng Song</b><sup>2</sup>&nbsp;
    <b>Yupeng Hu</b><sup>1✉</sup>&nbsp;
    <b>Liqiang Nie</b><sup>4</sup>
  </p>

  <p>
    <sup>1</sup>School of Software, Shandong University&nbsp;
    <sup>2</sup>School of Computer Science and Technology, Shandong University<br>
    <sup>3</sup>Qilu University of Technology (Shandong Academy of Sciences)&nbsp;
    <sup>4</sup>Harbin Institute of Technology (Shenzhen)
  </p>
</div>

These are the official pre-trained model weights for **PAIR**, a novel framework designed for Composed Image Retrieval (CIR) via complementarity-guided disentanglement. 

🔗 **Paper:** [Accepted by ICASSP 2025] 
🔗 **GitHub Repository:** [ZhihFu/PAIR](https://github.com/ZhihFu/PAIR)
🔗 **Project Website:** [PAIR Webpage](https://zhihfu.github.io/PAIR.github.io/)

---

## 📌 Model Information

### 1. Model Name
**PAIR** (Complementarity-guided Disentanglement for Composed Image Retrieval) Checkpoints.

### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Vision-Language / Multimodal Alignment
- **Applicable Tasks:** Retrieving target images based on a reference image combined with a relative text modification.
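
To make the task concrete, here is a minimal, illustrative sketch of composed retrieval as nearest-neighbor ranking. It is not PAIR's method: the element-wise sum in `compose_query` is a placeholder fusion chosen only for illustration, and the three-dimensional embeddings and gallery names are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compose_query(ref_emb, text_emb):
    """Fuse reference-image and modification-text embeddings.

    ASSUMPTION: element-wise sum is a stand-in, not PAIR's composition.
    """
    return [r + t for r, t in zip(ref_emb, text_emb)]


def rank_gallery(ref_emb, text_emb, gallery):
    """Return (name, score) pairs sorted from most to least similar."""
    query = compose_query(ref_emb, text_emb)
    scored = [(name, cosine(query, emb)) for name, emb in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy axes (hypothetical): [is_dress, is_red, other]
    reference_image = [1.0, 0.0, 0.0]    # a dress
    modification_text = [0.0, 1.0, 0.0]  # "make it red"
    gallery = {
        "red_dress": [1.0, 1.0, 0.0],
        "blue_dress": [1.0, 0.0, 1.0],
        "red_shoe": [0.0, 1.0, 1.0],
    }
    for name, score in rank_gallery(reference_image, modification_text, gallery):
        print(f"{name}: {score:.3f}")
```

The candidate that matches both the reference content and the requested change ranks first; candidates matching only one of the two score lower.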

### 3. Project Introduction
Existing methods for Composed Image Retrieval (CIR) often suffer from semantic entanglement between multimodal queries and target images. 

**PAIR** addresses this limitation by exploring the inherent relationships between these modalities. Guided by their complementarity, PAIR effectively **disentangles the visual and textual representations**, achieving more precise multimodal alignment and significantly boosting retrieval performance.

### 4. Training Data Source
The pre-trained checkpoints are primarily trained and evaluated on three standard CIR datasets:
- **CIRR** (Open Domain)
- **FashionIQ** (Fashion Domain)
- **Shoes** (Fashion Domain)

---

## 🚀 Usage & Basic Inference

These weights are designed to be used directly with the official PAIR GitHub repository.

### Step 1: Prepare the Environment
Clone the GitHub repository and install the dependencies (tested with Python 3.8.10 and PyTorch 2.0.0):
```bash
git clone https://github.com/ZhihFu/PAIR
cd PAIR
conda create -n pair python=3.8.10 -y
conda activate pair

# Install PyTorch
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install core dependencies
pip install -r requirements.txt
```

### Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., `PAIR_CIRR.pt`) from this Hugging Face repository and place them in the `checkpoints/` directory of your cloned GitHub repo. Ensure you also download and structure the dataset images as specified in the [GitHub repo's Data Preparation section](https://github.com/ZhihFu/PAIR).
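
Before running evaluation, it can help to confirm that the downloaded files ended up where the scripts expect them. The snippet below is a small sketch using only the Python standard library; the helper name `missing_checkpoints` is hypothetical and not part of the PAIR repository, and it assumes the `checkpoints/` directory sits at the root of the cloned repo as described above.

```python
from pathlib import Path


def missing_checkpoints(repo_root, names=("PAIR_CIRR.pt",)):
    """Return the checkpoint file names not found in <repo_root>/checkpoints.

    ASSUMPTION: the default name comes from the example in this card;
    pass the names of the files you actually downloaded.
    """
    ckpt_dir = Path(repo_root) / "checkpoints"
    return [name for name in names if not (ckpt_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the root of the cloned PAIR repository.
    missing = missing_checkpoints(".")
    if missing:
        print("Missing from checkpoints/:", ", ".join(missing))
    else:
        print("All expected checkpoints are in place.")
```

An empty result means every expected file is present; otherwise the list names what still needs to be downloaded or moved.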

### Step 3: Run Testing / Inference
To evaluate the model or generate prediction files using the downloaded checkpoint (for example, on the CIRR dataset), run:
```bash
python src/cirr_test_submission.py checkpoints/PAIR_CIRR.pt
```

To train from scratch, please refer to the `train.py` instructions in the official repository.

---

## ⚠️ Limitations & Notes

**Disclaimer:** This framework and its pre-trained weights are intended for **academic research and multimodal evaluation**.
- The model requires access to the original source datasets (CIRR, FashionIQ, Shoes) for full evaluation. Users must comply with the original licenses of those respective datasets.

---

## 📝⭐️ Citation

If you find our work or these model weights useful in your research, please consider leaving a **Star** ⭐️ on our GitHub repo and citing our paper:

```bibtex
@article{PAIR2025,
    title={PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval},
    author={Fu, Zhiheng and Li, Zixu and Chen, Zhiwei and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
    journal={IEEE},
    year={2025}
}
```