---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- noise-mitigation
- blip-2
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval</h1>

<p>
<b>Zhiwei Chen</b><sup>1</sup>
<b>Yupeng Hu</b><sup>1*</sup>
<b>Zhiheng Fu</b><sup>1</sup>
<b>Zixu Li</b><sup>1</sup>
<b>Jiale Huang</b><sup>1</sup>
<b>Qinlei Huang</b><sup>1</sup>
<b>Yinwei Wei</b><sup>1</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University
</p>
</div>

These are the official pre-trained model weights and configuration files for **INTENT**, a novel approach to Composed Image Retrieval (CIR) with Noisy Correspondence, built on the BLIP-2 architecture.

**Paper:** Accepted by AAAI 2026
**GitHub Repository:** [ZivChen-Ty/INTENT](https://github.com/ZivChen-Ty/INTENT)
**Project Website:** [INTENT Webpage](https://zivchen-ty.github.io/INTENT.github.io/)

---


## Model Information

### 1. Model Name
**INTENT** (Invariance and Discrimination-aware Noise Mitigation) checkpoints.

### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Vision-Language / Multimodal Learning
- **Applicable Tasks:** Robust image retrieval conditioned on a reference image and a modification text, specifically designed to handle **Noisy Correspondence (NC)** in training data while maintaining state-of-the-art performance in the fully-supervised (0% noise) setting.

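The CIR task above amounts to ranking a gallery of candidate images by their similarity to a fused (reference image + modification text) query embedding. A minimal, framework-agnostic sketch in plain Python — the embeddings and names here are illustrative toys, not the INTENT pipeline:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_candidates(query_emb, gallery):
    """Rank gallery images by cosine similarity to the composed query.

    gallery: dict mapping image name -> embedding vector.
    Returns image names, most similar first.
    """
    return sorted(gallery,
                  key=lambda name: cosine(query_emb, gallery[name]),
                  reverse=True)

# Toy example: the composed query points mostly along the first axis.
query = [1.0, 0.2, 0.0]
gallery = {
    "img_a": [0.9, 0.1, 0.0],   # nearly parallel to the query
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [-1.0, 0.0, 0.5],
}
print(rank_candidates(query, gallery))  # img_a ranks first
```

In practice the embeddings come from the vision-language backbone (BLIP-2 here) and ranking is done in batched tensor form; the sorting logic is the same.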

### 3. Project Introduction
Dataset biases and noisy correspondences significantly degrade multimodal alignment. **INTENT** introduces an Invariance and Discrimination-aware Noise Mitigation framework: from a causal perspective, it explicitly aligns intervened images with the originals and blocks potential backdoor paths, thereby mitigating spurious correlations and decoupling the true modification intent from inherent background noise.

> 💡 **Note for fully-supervised CIR benchmarking:** The **0% noise** setting of our framework is equivalent to the traditional fully-supervised CIR paradigm; even without injected noise, INTENT remains highly competitive with conventional supervised methods.
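For intuition on what "noise mitigation" means operationally, one common ingredient of noise-robust training is the small-loss criterion: triplets whose alignment loss is conspicuously high are likely mismatched and get down-weighted. This is a generic illustration of that idea, not INTENT's actual mechanism:

```python
def small_loss_weights(losses, keep_ratio=0.7):
    """Down-weight likely-noisy samples via the small-loss criterion.

    Samples whose loss falls within the lowest `keep_ratio` fraction are
    kept (weight 1.0, treated as clean); the rest are dropped (weight 0.0).
    Generic noise-robust-training illustration, not INTENT's mechanism.
    """
    n_keep = max(1, int(len(losses) * keep_ratio))
    threshold = sorted(losses)[n_keep - 1]
    return [1.0 if l <= threshold else 0.0 for l in losses]

# Toy batch: two triplets with conspicuously large losses are flagged as noisy.
batch_losses = [0.3, 0.5, 4.2, 0.4, 3.9, 0.6]
weights = small_loss_weights(batch_losses, keep_ratio=0.7)
print(weights)  # the two large-loss triplets get weight 0.0
```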

### 4. Training Data Source
The model was primarily trained and evaluated on standard CIR datasets under various noise ratios:
- **CIRR** (open domain)
- **FashionIQ** (fashion domain)
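Each training sample in these datasets is an annotated triplet of reference image, modification text, and target image. A hypothetical record to show the shape of the data — the field names here are illustrative only; consult each dataset's release for the exact schema:

```python
import json

# Hypothetical CIR triplet annotation. The real CIRR and FashionIQ
# annotation files use their own field names and layouts.
triplet = {
    "reference": "dev-001-0-img0",        # reference image id
    "caption": "make the dog face left",  # modification text
    "target": "dev-001-1-img1",           # target image id
}
print(json.dumps(triplet, indent=2))
```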

---


## Usage & Basic Inference

These weights are designed to be used directly with the official INTENT GitHub repository, which is built on the [LAVIS](https://github.com/salesforce/LAVIS) library.

### Step 1: Prepare the Environment
Clone the GitHub repository and install the dependencies (evaluated with Python 3.9 and PyTorch 2.1.0):
```bash
git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT
conda create -n intent_env python=3.9 -y
conda activate intent_env

# Install PyTorch (CUDA 12.1 builds)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
pip install -r requirements.txt
```

### Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., `best_model.pth`) from this Hugging Face repository and place them in your local `checkpoints/intent_run/` directory.

Ensure you also download and structure the dataset images (CIRR and FashionIQ) as described in the [GitHub repo's Data Preparation section](https://github.com/ZivChen-Ty/INTENT).
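Before moving on to inference, it can help to sanity-check that the checkpoint landed where Step 3 expects it. A small sketch — the `checkpoints/intent_run/` path follows Step 2; adjust it if you keep weights elsewhere:

```python
from pathlib import Path

def checkpoint_ready(ckpt_dir="checkpoints/intent_run", name="best_model.pth"):
    """Return the checkpoint path if present, else None (creating the dir)."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)  # make the expected layout
    ckpt = ckpt_dir / name
    return ckpt if ckpt.is_file() else None

ckpt = checkpoint_ready()
if ckpt is None:
    print("best_model.pth not found -- download it from this repo first (Step 2)")
else:
    print(f"found checkpoint: {ckpt}")
```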

### Step 3: Run Testing / Inference
To generate the JSON submission files required by the CIRR test server from the downloaded checkpoint, run:
```bash
python cirr_sub_BLIP2.py \
    --checkpoint_path ./checkpoints/intent_run/best_model.pth \
    --output_file ./submission.json
```

To train the model from scratch, simply run `python train_INTENT.py`.

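The submission file is a JSON document mapping each test query to its ranked candidates. A hypothetical sketch of assembling such a file — the exact keys and the ranking itself are produced by the repo's `cirr_sub_BLIP2.py`, not by this snippet:

```python
import json

# Hypothetical submission payload: query id -> top-ranked gallery image
# names. The real field names are dictated by the CIRR test server.
submission = {
    "pair-7000": ["img_104", "img_332", "img_018"],
    "pair-7001": ["img_255", "img_071", "img_940"],
}

with open("submission.json", "w") as f:
    json.dump(submission, f, indent=2)

# Round-trip check: the file on disk parses back to the same mapping.
with open("submission.json") as f:
    assert json.load(f) == submission
```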
---

## ⚠️ Limitations & Notes

**Disclaimer:** This framework and its pre-trained weights are intended for **academic research purposes only**.
- Full evaluation requires access to the original source datasets (CIRR and FashionIQ).
- Although designed for noise mitigation, performance may still fluctuate under extreme domain shifts not covered by the training distribution.

---


## Citation

If you find our work or these model weights useful in your research, please consider leaving a **Star** ⭐ on our GitHub repo and citing our paper:

```bibtex
@inproceedings{INTENT,
  title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
  author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```