INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval
Zhiwei Chen1 Yupeng Hu1 Zhiheng Fu1 Zixu Li1 Jiale Huang1 Qinlei Huang1 Yinwei Wei1
1School of Software, Shandong University
These are the official pre-trained model weights and configuration files for INTENT, a novel approach designed for Composed Image Retrieval (CIR) with Noisy Correspondence, built upon the BLIP-2 architecture.
- Paper: Accepted by AAAI 2026
- GitHub Repository: ZivChen-Ty/INTENT
- Project Website: INTENT Webpage
Model Information
1. Model Name
INTENT (Invariance and Discrimination-aware Noise Mitigation) Checkpoints.
2. Task Type & Applicable Tasks
- Task Type: Composed Image Retrieval (CIR) / Vision-Language / Multimodal Learning
- Applicable Tasks: Robust image retrieval based on a reference image and modification text, specifically designed to handle Noisy Correspondence (NC) in training datasets while maintaining state-of-the-art performance in fully-supervised (0% noise) settings.
3. Project Introduction
Dataset biases and noisy correspondences significantly degrade the performance of multimodal alignment. INTENT introduces an Invariance and Discrimination-aware Noise Mitigation framework. By explicitly aligning intervened images with original ones (a causal perspective) and effectively blocking potential backdoor paths, INTENT mitigates spurious correlations and decouples true modification intent from inherent background noise.
Note for Fully-Supervised CIR Benchmarking: The 0% noise setting in our framework is equivalent to the traditional fully-supervised CIR paradigm. Even without injected noise, INTENT remains highly competitive with conventional fully-supervised methods.
4. Training Data Source
The model was primarily trained and evaluated on standard CIR datasets under various noise ratios:
- CIRR (Open Domain)
- FashionIQ (Fashion Domain)
Usage & Basic Inference
These weights are designed to be used directly with the official INTENT GitHub repository, which is based on the LAVIS library.
Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies (evaluated on Python 3.9 and PyTorch 2.1.0):
```bash
git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT
conda create -n intent_env python=3.9 -y
conda activate intent_env

# Install PyTorch (CUDA 12.1 builds)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
pip install -r requirements.txt
```
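To confirm the pinned versions took effect, a small sanity check can help. This is just a sketch (the package list mirrors the pins above and only reports what is actually installed in your environment):

```python
from importlib.metadata import version, PackageNotFoundError

# Versions pinned by the install commands above.
PINNED = {
    "torch": "2.1.0",
    "transformers": "4.25.0",
    "salesforce-lavis": "1.0.2",
    "timm": "0.9.16",
}

def matches_pin(installed: str, pinned: str) -> bool:
    """True if the installed version matches the pin, allowing local suffixes like +cu121."""
    return installed == pinned or installed.startswith(pinned + "+")

for pkg, want in PINNED.items():
    try:
        got = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED (expected {want})")
        continue
    print(f"{pkg}: {got}", "(ok)" if matches_pin(got, want) else f"(expected {want})")
```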
Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., best_model.pth) from this Hugging Face repository and place them in your local checkpoints/intent_run/ directory.
Ensure you also download and structure the dataset images (CIRR and FashionIQ) as specified in the GitHub repo's Data Preparation section.
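For orientation, a layout along these lines is what the scripts typically expect. The directory and file names below are illustrative assumptions only; the authoritative structure is the repo's Data Preparation section:

```
INTENT/
├── checkpoints/
│   └── intent_run/
│       └── best_model.pth
├── cirr/          # CIRR images and annotations
└── fashionIQ/     # FashionIQ images and annotations
```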
Step 3: Run Testing / Inference
To generate the required JSON submission files for the CIRR test server using the downloaded checkpoint, run:
```bash
python cirr_sub_BLIP2.py \
    --checkpoint_path ./checkpoints/intent_run/best_model.pth \
    --output_file ./submission.json
```
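For a sense of what the script produces, here is a minimal sketch of writing a ranked-retrieval submission file. The exact schema is determined by `cirr_sub_BLIP2.py` and the CIRR test server; the field names, pair ids, and image names below are placeholders, not the confirmed format:

```python
import json

# Hypothetical retrieval results: query pair id -> ranked candidate image names.
# The real script derives these rankings from the trained model.
results = {
    "pair-0": ["img_101", "img_042", "img_007"],
    "pair-1": ["img_314", "img_159", "img_265"],
}

# Assumed metadata fields; consult the CIRR server docs for the exact schema.
submission = {"version": "rc2", "metric": "recall", **results}

with open("submission.json", "w") as f:
    json.dump(submission, f, indent=2)

# Round-trip check: the file parses back into the same rankings.
with open("submission.json") as f:
    loaded = json.load(f)
```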
To train the model from scratch, run `python train_INTENT.py`.
Limitations & Notes
Disclaimer: This framework and its pre-trained weights are intended for academic research purposes only.
- The model requires access to the original source datasets (CIRR, FashionIQ) for full evaluation.
- While designed for noise mitigation, the performance may still fluctuate based on extreme domain shifts not covered by the training distribution.
Citation
If you find our work or these model weights useful in your research, please consider leaving a Star on our GitHub repo and citing our paper:
```bibtex
@inproceedings{INTENT,
  title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
  author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```