---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- disentanglement
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval</h1>

<p>
<b>Zhiheng Fu</b><sup>1</sup>
<b>Zixu Li</b><sup>1</sup>
<b>Zhiwei Chen</b><sup>1</sup>
<b>Chunxiao Wang</b><sup>3</sup>
<b>Xuemeng Song</b><sup>2</sup>
<b>Yupeng Hu</b><sup>1†</sup>
<b>Liqiang Nie</b><sup>4</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University
<sup>2</sup>School of Computer Science and Technology, Shandong University<br>
<sup>3</sup>Qilu University of Technology (Shandong Academy of Sciences)
<sup>4</sup>Harbin Institute of Technology (Shenzhen)
</p>
</div>
These are the official pre-trained model weights for **PAIR**, a framework for Composed Image Retrieval (CIR) built on complementarity-guided disentanglement.

- **Paper:** Accepted at ICASSP 2025
- **GitHub Repository:** [ZhihFu/PAIR](https://github.com/ZhihFu/PAIR)
- **Project Website:** [PAIR Webpage](https://zhihfu.github.io/PAIR.github.io/)

---

## Model Information

### 1. Model Name
**PAIR** (Complementarity-guided Disentanglement for Composed Image Retrieval) checkpoints.

### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Vision-Language / Multimodal Alignment
- **Applicable Tasks:** Retrieving target images based on a reference image combined with a relative text describing the desired modification.

### 3. Project Introduction
Existing methods for Composed Image Retrieval (CIR) often suffer from semantic entanglement between multimodal queries and target images.

**PAIR** addresses this limitation by exploiting the inherent relationships between these modalities. Guided by their complementarity, PAIR **disentangles the visual and textual representations**, achieving more precise multimodal alignment and substantially improved retrieval performance.
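
To make the task concrete, here is a minimal sketch of generic CIR scoring, not the PAIR architecture itself; `encode_image`, `encode_text`, and `fuse` are placeholders for whatever backbone and fusion module a given model uses:

```python
import torch
import torch.nn.functional as F

# Illustrative only: encode_image, encode_text, and fuse stand in for the
# actual encoders and fusion module of a CIR model such as PAIR.
def retrieve(encode_image, encode_text, fuse, ref_image, mod_text, gallery_images):
    ref_feat = encode_image(ref_image)           # (1, d) reference image feature
    txt_feat = encode_text(mod_text)             # (1, d) modification text feature
    query = fuse(ref_feat, txt_feat)             # (1, d) composed query
    gallery = encode_image(gallery_images)       # (N, d) candidate image features

    # Rank candidates by cosine similarity to the composed query.
    sims = F.cosine_similarity(query, gallery)   # (N,) via broadcasting
    return torch.argsort(sims, descending=True)  # indices, best match first
```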

### 4. Training Data Source
The pre-trained checkpoints are primarily trained and evaluated on three standard CIR datasets (all triplet-annotated; see the sketch after this list):
- **CIRR** (Open Domain)
- **FashionIQ** (Fashion Domain)
- **Shoes** (Fashion Domain)
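
All three datasets share the same basic supervision unit: a reference image, a relative caption describing the desired change, and a target image. A schematic triplet, with field names that are illustrative rather than any dataset's actual annotation schema:

```python
# Illustrative only: each dataset ships its own annotation files and field
# names; this shows the information a single CIR training example carries.
triplet = {
    "reference_image": "images/dress_001.jpg",  # image the user starts from
    "relative_caption": "is sleeveless and has a floral print",
    "target_image": "images/dress_042.jpg",     # image the query should retrieve
}
```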

---

## Usage & Basic Inference

These weights are designed to be used directly with the official PAIR GitHub repository.

### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies (evaluated on Python 3.8.10 and PyTorch 2.0.0):
```bash
git clone https://github.com/ZhihFu/PAIR
cd PAIR
conda create -n pair python=3.8.10 -y
conda activate pair

# Install PyTorch (CUDA 11.8 build)
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install core dependencies
pip install -r requirements.txt
```

### Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., `PAIR_CIRR.pt`) from this Hugging Face repository and place them in the `checkpoints/` directory of your cloned GitHub repo. Ensure you also download and structure the dataset images as specified in the [GitHub repo's Data Preparation section](https://github.com/ZhihFu/PAIR).
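
You can also fetch the checkpoint programmatically with the `huggingface_hub` client; a minimal sketch, assuming `REPO_ID` is replaced with this repository's actual id:

```python
from huggingface_hub import hf_hub_download

# Placeholder: substitute the id of this model repository.
REPO_ID = "<namespace>/<repo-name>"

# Downloads PAIR_CIRR.pt into the local checkpoints/ directory.
ckpt_path = hf_hub_download(repo_id=REPO_ID, filename="PAIR_CIRR.pt", local_dir="checkpoints")
print(ckpt_path)
```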

### Step 3: Run Testing / Inference
To evaluate the model or generate prediction files using the downloaded checkpoint (for example, on the CIRR dataset), run:
```bash
python src/cirr_test_submission.py checkpoints/PAIR_CIRR.pt
```
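
The script above consumes the checkpoint directly. If you only want to verify the downloaded file first, a quick inspection with plain PyTorch is possible (illustrative; the checkpoint's internal layout is defined by the PAIR training code):

```python
import torch

# Load on CPU so no GPU is needed just to inspect the file.
state = torch.load("checkpoints/PAIR_CIRR.pt", map_location="cpu")

# The file may be a bare state_dict or a dict wrapping weights plus metadata;
# printing the top-level keys shows which layout this checkpoint uses.
if isinstance(state, dict):
    print(list(state.keys())[:10])
```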

To train from scratch, please refer to the `train.py` instructions in the official repository.

---

## Limitations & Notes

**Disclaimer:** This framework and its pre-trained weights are intended for **academic research and multimodal evaluation**.
- The model requires access to the original source datasets (CIRR, FashionIQ, Shoes) for full evaluation. Users must comply with the original licenses of those respective datasets.

---

## Citation

If you find our work or these model weights useful in your research, please consider leaving a **Star** on our GitHub repo and citing our paper:

```bibtex
@inproceedings{PAIR2025,
  title={PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval},
  author={Fu, Zhiheng and Li, Zixu and Chen, Zhiwei and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2025}
}
```
|