πŸš€ INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval

Zhiwei Chen1  Yupeng Hu1βœ‰  Zhiheng Fu1  Zixu Li1  Jiale Huang1  Qinlei Huang1  Yinwei Wei1

1School of Software, Shandong University

These are the official pre-trained model weights and configuration files for INTENT, a novel approach designed for Composed Image Retrieval (CIR) with Noisy Correspondence, built upon the BLIP-2 architecture.

πŸ”— Paper: Accepted by AAAI 2026
πŸ”— GitHub Repository: ZivChen-Ty/INTENT
πŸ”— Project Website: INTENT Webpage


πŸ“Œ Model Information

1. Model Name

INTENT (Invariance and Discrimination-aware Noise Mitigation) Checkpoints.

2. Task Type & Applicable Tasks

  • Task Type: Composed Image Retrieval (CIR) / Vision-Language / Multimodal Learning
  • Applicable Tasks: Robust image retrieval based on a reference image and modification text, specifically designed to handle Noisy Correspondence (NC) in training datasets while maintaining state-of-the-art performance in fully-supervised (0% noise) settings.
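The retrieval setup above can be sketched with toy embeddings: a composed query (reference image plus modification text) is compared against candidate-image embeddings by cosine similarity. The averaging fusion below is a stand-in for illustration only, not INTENT's learned BLIP-2-based composition.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def compose_query(ref_img_emb, text_emb):
    """Toy fusion of reference-image and modification-text embeddings.
    INTENT uses a learned fusion module; averaging is only a placeholder."""
    return l2_normalize(ref_img_emb + text_emb)

def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices sorted by descending cosine similarity."""
    sims = l2_normalize(candidate_embs) @ query_emb
    return np.argsort(-sims)

rng = np.random.default_rng(0)
ref = rng.normal(size=256)
text = rng.normal(size=256)
query = compose_query(ref, text)

# Make candidate 2 close to the composed query; the rest are random.
candidates = rng.normal(size=(5, 256))
candidates[2] = query + 0.05 * rng.normal(size=256)

ranking = rank_candidates(query, candidates)
print(ranking[0])  # candidate 2 ranks first
```

In the real pipeline the encoders and fusion are trained end to end; this sketch only shows the ranking interface a CIR model exposes at inference time.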

3. Project Introduction

Dataset biases and noisy correspondences significantly degrade the performance of multimodal alignment. INTENT introduces an Invariance and Discrimination-aware Noise Mitigation framework. By explicitly aligning intervened images with original ones (a causal perspective) and effectively blocking potential backdoor paths, INTENT mitigates spurious correlations and decouples true modification intent from inherent background noise.
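One common way to operationalize discrimination-aware noise handling (a generic sketch of the idea, not INTENT's actual criterion) is to down-weight training triplets whose query-target similarity falls well below the batch average, treating them as likely noisy correspondences.

```python
import numpy as np

def noise_weights(similarities, temperature=0.1):
    """Soft per-sample weights centered on the batch mean similarity:
    low-similarity (likely mismatched) pairs get weights near 0,
    high-similarity pairs near 1. Purely illustrative."""
    z = (similarities - similarities.mean()) / temperature
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid around the batch mean

# Four well-matched triplets and one mismatched (noisy) one.
sims = np.array([0.82, 0.79, 0.85, 0.12, 0.80])
w = noise_weights(sims)
print(w.round(3))
```

A weighted training loss would then multiply each triplet's loss term by its weight, so the noisy fourth pair contributes almost nothing to the gradient.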

πŸ’‘ Note for Fully-Supervised CIR Benchmarking: The 0% noise setting in our framework is equivalent to the traditional fully-supervised CIR paradigm, so INTENT remains highly competitive with conventional supervised methods even when no noise is injected.

4. Training Data Source

The model was primarily trained and evaluated on standard CIR datasets under various noise ratios:

  • CIRR (Open Domain)
  • FashionIQ (Fashion Domain)

πŸš€ Usage & Basic Inference

These weights are designed to be used directly with the official INTENT GitHub repository, which is based on the LAVIS library.

Step 1: Prepare the Environment

Clone the GitHub repository and install dependencies (evaluated on Python 3.9 and PyTorch 2.1.0):

git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT
conda create -n intent_env python=3.9 -y
conda activate intent_env

# Install PyTorch (CUDA 12.1 compatibility)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
pip install -r requirements.txt

Step 2: Download Model Weights & Data

Download the checkpoint files (e.g., best_model.pth) from this Hugging Face repository and place them in your local checkpoints/intent_run/ directory.

Ensure you also download and structure the dataset images (CIRR and FashionIQ) as specified in the GitHub repo's Data Preparation section.
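A small helper can confirm the expected layout before training. The directory names below are assumptions based on common CIRR/FashionIQ conventions; the repo's Data Preparation section is the authoritative reference.

```python
from pathlib import Path
import tempfile

# Hypothetical layout; verify against the repo's Data Preparation section.
EXPECTED_DIRS = [
    "checkpoints/intent_run",
    "data/CIRR/images",
    "data/FashionIQ/images",
]

def check_layout(root):
    """Return the subset of expected directories missing under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

# Example: create the layout in a temp dir and confirm nothing is missing.
with tempfile.TemporaryDirectory() as tmp:
    for d in EXPECTED_DIRS:
        (Path(tmp) / d).mkdir(parents=True)
    missing = check_layout(tmp)
    print(missing)  # []
```

Running this against your project root before launching training gives an early, readable failure instead of a mid-epoch file-not-found error.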

Step 3: Run Testing / Inference

To generate the required JSON submission files for the CIRR test server using the downloaded checkpoint, run:

python cirr_sub_BLIP2.py \
  --checkpoint_path ./checkpoints/intent_run/best_model.pth \
  --output_file ./submission.json
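Before uploading to the CIRR test server, it can help to sanity-check the generated file. The schema assumed here (each query id mapped to a ranked list of candidate ids) is a guess based on common CIRR submission formats, not the repo's documented spec.

```python
import json
import os
import tempfile

def validate_submission(path, top_k=50):
    """Lightweight check: the file parses as JSON and every entry maps a
    query id to a non-empty ranked list. Schema is assumed, not official."""
    with open(path) as f:
        sub = json.load(f)
    assert isinstance(sub, dict) and sub, "submission should be a non-empty dict"
    for qid, ranking in sub.items():
        assert isinstance(ranking, list) and 0 < len(ranking) <= top_k
    return len(sub)

# Example with a tiny fabricated submission file.
tmp = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
json.dump({"q1": ["img_3", "img_7"], "q2": ["img_1"]}, tmp)
tmp.close()
n = validate_submission(tmp.name)
print(n)  # 2
os.unlink(tmp.name)
```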

To train the model from scratch, simply run python train_INTENT.py.


⚠️ Limitations & Notes

Disclaimer: This framework and its pre-trained weights are intended for academic research purposes only.

  • The model requires access to the original source datasets (CIRR, FashionIQ) for full evaluation.
  • While designed for noise mitigation, the performance may still fluctuate based on extreme domain shifts not covered by the training distribution.

πŸ“β­οΈ Citation

If you find our work or these model weights useful in your research, please consider leaving a Star ⭐️ on our GitHub repo and citing our paper:

@inproceedings{INTENT,
  title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
  author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}