---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- noise-mitigation
- blip-2
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>πŸš€ INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval</h1>

<p>
<b>Zhiwei Chen</b><sup>1</sup>&nbsp;
<b>Yupeng Hu</b><sup>1βœ‰</sup>&nbsp;
<b>Zhiheng Fu</b><sup>1</sup>&nbsp;
<b>Zixu Li</b><sup>1</sup>&nbsp;
<b>Jiale Huang</b><sup>1</sup>&nbsp;
<b>Qinlei Huang</b><sup>1</sup>&nbsp;
<b>Yinwei Wei</b><sup>1</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University
</p>
</div>

These are the official pre-trained model weights and configuration files for **INTENT**, an approach to Composed Image Retrieval (CIR) with Noisy Correspondence, built on the BLIP-2 architecture.

πŸ”— **Paper:** Accepted by AAAI 2026  
πŸ”— **GitHub Repository:** [ZivChen-Ty/INTENT](https://github.com/ZivChen-Ty/INTENT)  
πŸ”— **Project Website:** [INTENT Webpage](https://zivchen-ty.github.io/INTENT.github.io/)

---

## πŸ“Œ Model Information

### 1. Model Name
**INTENT** (Invariance and Discrimination-aware Noise Mitigation) checkpoints.

### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Vision-Language / Multimodal Learning
- **Applicable Tasks:** Robust image retrieval from a reference image plus a modification text, designed to handle **Noisy Correspondence (NC)** in training datasets while maintaining state-of-the-art performance in the fully-supervised (0% noise) setting.
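At inference time, CIR reduces to scoring a composed query embedding (derived from the reference image and the modification text) against candidate image embeddings and ranking by similarity. A minimal sketch of that ranking step, using random vectors in place of real BLIP-2 features (`rank_candidates` and all names here are illustrative, not the repository's API):

```python
import numpy as np

def rank_candidates(query_emb, candidate_embs):
    """Rank candidate images by cosine similarity to a composed query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per candidate
    return np.argsort(-scores), scores  # candidate indices, best first

# Toy gallery of 4 candidates; the query is a near-duplicate of candidate 2.
rng = np.random.default_rng(0)
candidates = rng.standard_normal((4, 8))
query = candidates[2] + 0.01 * rng.standard_normal(8)
order, scores = rank_candidates(query, candidates)
print(order[0])  # → 2
```

Recall@K metrics reported on CIRR/FashionIQ simply check whether the ground-truth target appears in the first K entries of `order`.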

### 3. Project Introduction
Dataset biases and noisy correspondences significantly degrade multimodal alignment. **INTENT** introduces an Invariance and Discrimination-aware Noise Mitigation framework: by explicitly aligning intervened images with the originals (a causal perspective) and blocking potential backdoor paths, it mitigates spurious correlations and decouples the true modification intent from inherent background noise.

> πŸ’‘ **Note for fully-supervised CIR benchmarking:** The **0% noise** setting of our framework is equivalent to the traditional fully-supervised CIR paradigm; INTENT remains highly competitive against conventional supervised methods even without injected noise.
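For intuition on what "noise mitigation" means in this setting: a common generic baseline (not INTENT's actual mechanism, which is described in the paper and repository) is the small-loss criterion, which treats the highest-loss training triplets as likely mismatched and down-weights them. A self-contained sketch:

```python
import numpy as np

def small_loss_weights(losses, keep_ratio=0.7):
    """Generic small-loss heuristic: keep the keep_ratio fraction of samples
    with the smallest loss and zero-weight the rest, since high-loss triplets
    are the most likely to be noisy (mismatched) correspondences.
    NOTE: an illustrative stand-in, not the INTENT method itself."""
    losses = np.asarray(losses, dtype=float)
    k = max(1, int(len(losses) * keep_ratio))
    threshold = np.sort(losses)[k - 1]          # k-th smallest loss
    return (losses <= threshold).astype(float)  # 1 = trusted, 0 = suspected noise

weights = small_loss_weights([0.2, 0.1, 3.5, 0.3, 4.1], keep_ratio=0.6)
print(weights)  # → [1. 1. 0. 1. 0.]
```

INTENT goes further than per-sample reweighting by modeling the causal structure of the triplet, but this illustrates the kind of corrupted supervision the 0%–noise note above contrasts against.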

### 4. Training Data Source
The model was trained and evaluated on standard CIR datasets under various noise ratios:
- **CIRR** (open domain)
- **FashionIQ** (fashion domain)

---

## πŸš€ Usage & Basic Inference

These weights are designed to be used directly with the official INTENT GitHub repository, which builds on the [LAVIS](https://github.com/salesforce/LAVIS) library.

### Step 1: Prepare the Environment
Clone the repository and install the dependencies (evaluated with Python 3.9 and PyTorch 2.1.0):
```bash
git clone https://github.com/ZivChen-Ty/INTENT.git
cd INTENT
conda create -n intent_env python=3.9 -y
conda activate intent_env

# Install PyTorch (CUDA 12.1 builds)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
pip install -r requirements.txt
```

### Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., `best_model.pth`) from this Hugging Face repository and place them in your local `checkpoints/intent_run/` directory.

Ensure you also download and structure the dataset images (CIRR and FashionIQ) as specified in the [GitHub repo's Data Preparation section](https://github.com/ZivChen-Ty/INTENT).
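Before running evaluation it can help to sanity-check the downloaded checkpoint by counting its tensors and parameters. A small sketch; the dummy dict and its key names are made up for illustration, and a real checkpoint would instead be loaded with `torch.load` as shown in the comment:

```python
import numpy as np

def summarize_state_dict(state):
    """Count tensors and total parameters in a checkpoint-style dict of arrays."""
    total = sum(int(np.prod(v.shape)) for v in state.values())
    return {"num_tensors": len(state), "num_params": total}

# Dummy stand-in for a real checkpoint; in practice:
#   state = torch.load("checkpoints/intent_run/best_model.pth", map_location="cpu")
dummy = {
    "qformer.layer0.weight": np.zeros((768, 768)),  # hypothetical key names
    "qformer.layer0.bias": np.zeros((768,)),
}
print(summarize_state_dict(dummy))  # → {'num_tensors': 2, 'num_params': 590592}
```

A parameter count far from the expected model size usually means a truncated download.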

### Step 3: Run Testing / Inference
To generate the JSON submission files for the CIRR test server from the downloaded checkpoint, run:
```bash
python cirr_sub_BLIP2.py \
    --checkpoint_path ./checkpoints/intent_run/best_model.pth \
    --output_file ./submission.json
```

To train the model from scratch, run `python train_INTENT.py`.

---

## ⚠️ Limitations & Notes

**Disclaimer:** This framework and its pre-trained weights are intended for **academic research purposes only**.
- The model requires access to the original source datasets (CIRR, FashionIQ) for full evaluation.
- Although designed for noise mitigation, performance may still degrade under extreme domain shifts not covered by the training distribution.

---

## πŸ“β­οΈ Citation

If you find our work or these model weights useful in your research, please consider leaving a **Star** ⭐️ on our GitHub repo and citing our paper:

```bibtex
@inproceedings{INTENT,
  title={INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval},
  author={Chen, Zhiwei and Hu, Yupeng and Fu, Zhiheng and Li, Zixu and Huang, Jiale and Huang, Qinlei and Wei, Yinwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```