zhihfu committed on
Commit
e7a4b16
·
verified ·
1 Parent(s): 2de465d

Update README.md

Files changed (1)
  1. README.md +113 -3
README.md CHANGED
@@ -1,3 +1,113 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - composed-image-retrieval
5
+ - vision-language
6
+ - multimodal
7
+ - disentanglement
8
+ - pytorch
9
+ ---
10
+
11
+ <a id="top"></a>
12
+ <div align="center">
13
+ <h1>PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval</h1>
14
+
15
+ <p>
16
+ <b>Zhiheng Fu</b><sup>1</sup>&nbsp;
17
+ <b>Zixu Li</b><sup>1</sup>&nbsp;
18
+ <b>Zhiwei Chen</b><sup>1</sup>&nbsp;
19
+ <b>Chunxiao Wang</b><sup>3</sup>&nbsp;
20
+ <b>Xuemeng Song</b><sup>2</sup>&nbsp;
21
+ <b>Yupeng Hu</b><sup>1✉</sup>&nbsp;
22
+ <b>Liqiang Nie</b><sup>4</sup>
23
+ </p>
24
+
25
+ <p>
26
+ <sup>1</sup>School of Software, Shandong University&nbsp;
27
+ <sup>2</sup>School of Computer Science and Technology, Shandong University<br>
28
+ <sup>3</sup>Qilu University of Technology (Shandong Academy of Sciences)&nbsp;
29
+ <sup>4</sup>Harbin Institute of Technology (Shenzhen)
30
+ </p>
31
+ </div>
32
+
33
+ These are the official pre-trained model weights for **PAIR**, a novel framework designed for Composed Image Retrieval (CIR) via complementarity-guided disentanglement.
34
+
35
+ 🔗 **Paper:** [Accepted by ICASSP 2025]
36
+ 🔗 **GitHub Repository:** [ZhihFu/PAIR](https://github.com/ZhihFu/PAIR)
37
+ 🔗 **Project Website:** [PAIR Webpage](https://zhihfu.github.io/PAIR.github.io/)
38
+
39
+ ---
40
+
41
+ ## 📌 Model Information
42
+
43
+ ### 1. Model Name
44
+ **PAIR** (Complementarity-guided Disentanglement for Composed Image Retrieval) Checkpoints.
45
+
46
+ ### 2. Task Type & Applicable Tasks
47
+ - **Task Type:** Composed Image Retrieval (CIR) / Vision-Language / Multimodal Alignment
48
+ - **Applicable Tasks:** Retrieving target images based on a reference image combined with a relative text modification.
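
To make the task concrete, here is a minimal, self-contained sketch of the CIR retrieval loop: a reference image feature and a modification text feature are fused into one query, and gallery images are ranked by cosine similarity. The additive fusion in `compose_query`, the toy 3-dimensional features, and all function names are assumptions made for illustration; they are not PAIR's disentanglement method or the repository's API.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compose_query(ref_image_feat: Sequence[float], text_feat: Sequence[float]) -> List[float]:
    """Placeholder fusion (assumption): element-wise sum, then L2-normalize.

    A real CIR model such as PAIR learns this composition; the sum only
    stands in for it so the ranking step can be demonstrated.
    """
    return l2_normalize([a + b for a, b in zip(ref_image_feat, text_feat)])


def rank_gallery(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery image names sorted by cosine similarity to the query."""
    scores = {
        name: sum(q * g for q, g in zip(query, l2_normalize(feat)))
        for name, feat in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy features (hypothetical): axis 0 = "content of the reference image",
# axis 1 = "attribute requested by the text", axis 2 = "something unrelated".
reference_image = [1.0, 0.0, 0.0]
modification_text = [0.0, 1.0, 0.0]
gallery = {
    "target": [1.0, 1.0, 0.0],             # keeps the content, adds the attribute
    "same_as_reference": [1.0, 0.0, 0.0],  # ignores the text modification
    "unrelated": [0.0, 0.0, 1.0],
}

ranking = rank_gallery(compose_query(reference_image, modification_text), gallery)
print(ranking)  # ['target', 'same_as_reference', 'unrelated']
```

The image that satisfies both the reference content and the text modification ranks first, which is the behaviour a CIR model is trained to produce on real features.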
49
+
50
+ ### 3. Project Introduction
51
+ Existing methods for Composed Image Retrieval (CIR) often suffer from semantic entanglement between multimodal queries and target images.
52
+
53
+ **PAIR** addresses this limitation by exploring the inherent relationships between these modalities. Guided by their complementarity, PAIR **disentangles the visual and textual representations**, which yields more precise multimodal alignment and improves retrieval performance.
54
+
55
+ ### 4. Training Data Source
56
+ The pre-trained checkpoints are primarily trained and evaluated on three standard CIR datasets:
57
+ - **CIRR** (Open Domain)
58
+ - **FashionIQ** (Fashion Domain)
59
+ - **Shoes** (Fashion Domain)
60
+
61
+ ---
62
+
63
+ ## 🚀 Usage & Basic Inference
64
+
65
+ These weights are designed to be used directly with the official PAIR GitHub repository.
66
+
67
+ ### Step 1: Prepare the Environment
68
+ Clone the GitHub repository and install dependencies (evaluated on Python 3.8.10 and PyTorch 2.0.0):
69
+ ```bash
70
+ git clone https://github.com/ZhihFu/PAIR
71
+ cd PAIR
72
+ conda create -n pair python=3.8.10 -y
73
+ conda activate pair
74
+
75
+ # Install PyTorch
76
+ pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
77
+
78
+ # Install core dependencies
79
+ pip install -r requirements.txt
80
+ ```
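
After installation, a quick check of the active environment against the versions named above (Python 3.8.10, PyTorch 2.0.0) can catch mismatches early. The sketch below uses only the standard library and reads installed package versions from package metadata, so it runs even when PyTorch is missing. The helper names `installed_version`, `environment_report`, and `matches` are hypothetical and not part of the PAIR repository.

```python
import importlib.metadata
import platform
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the interpreter version and the versions of the key packages."""
    return {
        "python": platform.python_version(),
        "torch": installed_version("torch"),
        "torchvision": installed_version("torchvision"),
    }


def matches(actual: Optional[str], expected_prefix: str) -> bool:
    """True if a version string starts with the expected prefix (e.g. '2.0.0+cu118')."""
    return actual is not None and actual.startswith(expected_prefix)


report = environment_report()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")

# The README's reference environment; other versions may still work.
same_as_readme = matches(report["python"], "3.8.") and matches(report["torch"], "2.0.0")
print("matches README environment:", same_as_readme)
```

A `False` result is only a hint that the environment differs from the one the authors report, not proof that the code will fail.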
81
+
82
+ ### Step 2: Download Model Weights & Data
83
+ Download the checkpoint files (e.g., `PAIR_CIRR.pt`) from this Hugging Face repository and place them in the `checkpoints/` directory of your cloned GitHub repo. Ensure you also download and structure the dataset images as specified in the [GitHub repo's Data Preparation section](https://github.com/ZhihFu/PAIR).
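
Before running inference, it can help to confirm that the checkpoint landed where the test script expects it. The following is a small sketch under the layout described above (`checkpoints/PAIR_CIRR.pt` inside the cloned repository); the helper `resolve_checkpoint` is hypothetical and not part of the PAIR codebase.

```python
from pathlib import Path


def resolve_checkpoint(repo_root: str, filename: str = "PAIR_CIRR.pt") -> Path:
    """Return the checkpoint path under <repo_root>/checkpoints, or raise a clear error."""
    path = Path(repo_root) / "checkpoints" / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected checkpoint at {path}; download it from the Hugging Face "
            "repository and place it in the checkpoints/ directory."
        )
    if path.stat().st_size == 0:
        # An empty file usually means an interrupted or failed download.
        raise ValueError(f"Checkpoint at {path} is empty; try downloading it again.")
    return path


if __name__ == "__main__":
    # Run from the root of the cloned PAIR repository.
    try:
        print("Found checkpoint:", resolve_checkpoint("."))
    except (FileNotFoundError, ValueError) as err:
        print("Checkpoint problem:", err)
```

The returned path can then be passed as the argument to the test script in Step 3.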
84
+
85
+ ### Step 3: Run Testing / Inference
86
+ To evaluate the model or generate prediction files using the downloaded checkpoint (for example, on the CIRR dataset), run:
87
+ ```bash
88
+ python src/cirr_test_submission.py checkpoints/PAIR_CIRR.pt
89
+ ```
90
+
91
+ To train from scratch, please refer to the `train.py` instructions in the official repository.
92
+
93
+ ---
94
+
95
+ ## ⚠️ Limitations & Notes
96
+
97
+ **Disclaimer:** This framework and its pre-trained weights are intended for **academic research and multimodal evaluation**.
98
+ - The model requires access to the original source datasets (CIRR, FashionIQ, Shoes) for full evaluation. Users must comply with the original licenses of those respective datasets.
99
+
100
+ ---
101
+
102
+ ## 📝⭐️ Citation
103
+
104
+ If you find our work or these model weights useful in your research, please consider leaving a **Star** ⭐️ on our GitHub repo and citing our paper:
105
+
106
+ ```bibtex
107
+ @article{PAIR2025,
108
+ title={PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval},
109
+ author={Fu, Zhiheng and Li, Zixu and Chen, Zhiwei and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
110
+ journal={IEEE},
111
+ year={2025}
112
+ }
113
+ ```