---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- noise-mitigation
- blip-2
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval</h1>

<p>
<b>Zhiwei Chen</b><sup>1</sup>
<b>Yupeng Hu</b><sup>1*</sup>
<b>Zhiheng Fu</b><sup>1</sup>
<b>Zixu Li</b><sup>1</sup>
<b>Jiale Huang</b><sup>1</sup>
<b>Qinlei Huang</b><sup>1</sup>
<b>Yinwei Wei</b><sup>1</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University
</p>
</div>

These are the official pre-trained model weights and configuration files for **INTENT**, a novel approach to Composed Image Retrieval (CIR) with Noisy Correspondence, built on the BLIP-2 architecture.

**Paper:** Accepted by AAAI 2026
**GitHub Repository:** [ZivChen-Ty/INTENT](https://github.com/ZivChen-Ty/INTENT)
**Project Website:** [INTENT Webpage](https://zivchen-ty.github.io/INTENT.github.io/)

---


## Model Information

### 1. Model Name
**INTENT** (Invariance and Discrimination-aware Noise Mitigation) checkpoints.

### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Vision-Language / Multimodal Learning
- **Applicable Tasks:** Robust image retrieval conditioned on a reference image and a modification text, specifically designed to handle **Noisy Correspondence (NC)** in training data while maintaining state-of-the-art performance in the fully-supervised (0% noise) setting.

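The CIR task above amounts to ranking a gallery of candidate images by their similarity to a fused (reference image + modification text) query embedding. A minimal, framework-agnostic sketch in plain Python — the embeddings and names here are illustrative toys, not the INTENT pipeline:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_candidates(query_emb, gallery):
    """Rank gallery images by cosine similarity to the composed query.

    gallery: dict mapping image name -> embedding vector.
    Returns image names, most similar first.
    """
    return sorted(gallery,
                  key=lambda name: cosine(query_emb, gallery[name]),
                  reverse=True)

# Toy example: the composed query points mostly along the first axis.
query = [1.0, 0.2, 0.0]
gallery = {
    "img_a": [0.9, 0.1, 0.0],   # nearly parallel to the query
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [-1.0, 0.0, 0.5],
}
print(rank_candidates(query, gallery))  # img_a ranks first
```

In practice the embeddings come from the vision-language backbone (BLIP-2 here) and ranking is done in batched tensor form; the sorting logic is the same.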

### 3. Project Introduction
Dataset biases and noisy correspondences significantly degrade multimodal alignment. **INTENT** introduces an Invariance and Discrimination-aware Noise Mitigation framework: from a causal perspective, it explicitly aligns intervened images with the originals and blocks potential backdoor paths, thereby mitigating spurious correlations and decoupling the true modification intent from inherent background noise.

> 💡 **Note for fully-supervised CIR benchmarking:** The **0% noise** setting of our framework is equivalent to the traditional fully-supervised CIR paradigm; even without injected noise, INTENT remains highly competitive with conventional supervised methods.
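For intuition on what "noise mitigation" means operationally, one common ingredient of noise-robust training is the small-loss criterion: triplets whose alignment loss is conspicuously high are likely mismatched and get down-weighted. This is a generic illustration of that idea, not INTENT's actual mechanism:

```python
def small_loss_weights(losses, keep_ratio=0.7):
    """Down-weight likely-noisy samples via the small-loss criterion.

    Samples whose loss falls within the lowest `keep_ratio` fraction are
    kept (weight 1.0, treated as clean); the rest are dropped (weight 0.0).
    Generic noise-robust-training illustration, not INTENT's mechanism.
    """
    n_keep = max(1, int(len(losses) * keep_ratio))
    threshold = sorted(losses)[n_keep - 1]
    return [1.0 if l <= threshold else 0.0 for l in losses]

# Toy batch: two triplets with conspicuously large losses are flagged as noisy.
batch_losses = [0.3, 0.5, 4.2, 0.4, 3.9, 0.6]
weights = small_loss_weights(batch_losses, keep_ratio=0.7)
print(weights)  # the two large-loss triplets get weight 0.0
```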

### 4. Training Data Source
The model was primarily trained and evaluated on standard CIR datasets under various noise ratios:
- **CIRR** (open domain)
- **FashionIQ** (fashion domain)
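Each training sample in these datasets is an annotated triplet of reference image, modification text, and target image. A hypothetical record to show the shape of the data — the field names here are illustrative only; consult each dataset's release for the exact schema:

```python
import json

# Hypothetical CIR triplet annotation. The real CIRR and FashionIQ
# annotation files use their own field names and layouts.
triplet = {
    "reference": "dev-001-0-img0",        # reference image id
    "caption": "make the dog face left",  # modification text
    "target": "dev-001-1-img1",           # target image id
}
print(json.dumps(triplet, indent=2))
```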

---


## Usage & Basic Inference

These weights are designed to be used directly with the official INTENT GitHub repository, which is built on the [LAVIS](https://github.com/salesforce/LAVIS) library.

### Step 1: Prepare the Environment
Clone the GitHub repository and install the dependencies (evaluated with Python 3.9 and PyTorch 2.1.0):
```bash
git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT
conda create -n intent_env python=3.9 -y
conda activate intent_env

# Install PyTorch (CUDA 12.1 builds)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
pip install -r requirements.txt
```

### Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., `best_model.pth`) from this Hugging Face repository and place them in your local `checkpoints/intent_run/` directory.

Ensure you also download and structure the dataset images (CIRR and FashionIQ) as described in the [GitHub repo's Data Preparation section](https://github.com/ZivChen-Ty/INTENT).
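Before moving on to inference, it can help to sanity-check that the checkpoint landed where Step 3 expects it. A small sketch — the `checkpoints/intent_run/` path follows Step 2; adjust it if you keep weights elsewhere:

```python
from pathlib import Path

def checkpoint_ready(ckpt_dir="checkpoints/intent_run", name="best_model.pth"):
    """Return the checkpoint path if present, else None (creating the dir)."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)  # make the expected layout
    ckpt = ckpt_dir / name
    return ckpt if ckpt.is_file() else None

ckpt = checkpoint_ready()
if ckpt is None:
    print("best_model.pth not found -- download it from this repo first (Step 2)")
else:
    print(f"found checkpoint: {ckpt}")
```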

### Step 3: Run Testing / Inference
To generate the JSON submission files required by the CIRR test server from the downloaded checkpoint, run:
```bash
python cirr_sub_BLIP2.py \
    --checkpoint_path ./checkpoints/intent_run/best_model.pth \
    --output_file ./submission.json
```

To train the model from scratch, simply run `python train_INTENT.py`.

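The submission file is a JSON document mapping each test query to its ranked candidates. A hypothetical sketch of assembling such a file — the exact keys and the ranking itself are produced by the repo's `cirr_sub_BLIP2.py`, not by this snippet:

```python
import json

# Hypothetical submission payload: query id -> top-ranked gallery image
# names. The real field names are dictated by the CIRR test server.
submission = {
    "pair-7000": ["img_104", "img_332", "img_018"],
    "pair-7001": ["img_255", "img_071", "img_940"],
}

with open("submission.json", "w") as f:
    json.dump(submission, f, indent=2)

# Round-trip check: the file on disk parses back to the same mapping.
with open("submission.json") as f:
    assert json.load(f) == submission
```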
---

## ⚠️ Limitations & Notes

**Disclaimer:** This framework and its pre-trained weights are intended for **academic research purposes only**.
- Full evaluation requires access to the original source datasets (CIRR and FashionIQ).
- Although designed for noise mitigation, performance may still fluctuate under extreme domain shifts not covered by the training distribution.

---


## Citation

If you find our work or these model weights useful in your research, please consider leaving a **Star** ⭐ on our GitHub repo and citing our paper:

```bibtex
@inproceedings{INTENT,
  title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
  author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```