---
license: apache-2.0
task_categories:
- image-retrieval
- vision-language-navigation
tags:
- composed-image-retrieval
- robust-learning
- blip-2
- pytorch
- icassp-2026
---
<a id="top"></a>
<div align="center">
<h1>(ICASSP 2026) HINT: Composed Image Retrieval with Dual-Path Compositional Contextualized Network (Model Weights)</h1>
<div>
<a target="_blank" href="https://zh-mingyu.github.io/">Mingyu&#160;Zhang</a><sup>1</sup>,
<a target="_blank" href="https://lee-zixu.github.io/">Zixu&#160;Li</a><sup>1</sup>,
<a target="_blank" href="https://zivchen-ty.github.io/">Zhiwei&#160;Chen</a><sup>1</sup>,
<a target="_blank" href="https://zhihfu.github.io/">Zhiheng&#160;Fu</a><sup>1</sup>,
Xiaowei&#160;Zhu<sup>1</sup>,
Jiajia&#160;Nie<sup>1</sup>,
<a target="_blank" href="https://faculty.sdu.edu.cn/weiyinwei1/zh_CN/index.htm">Yinwei&#160;Wei</a><sup>1</sup>,
<a target="_blank" href="https://faculty.sdu.edu.cn/huyupeng1/zh_CN/index.htm">Yupeng&#160;Hu</a><sup>1&#9993;</sup>
</div>
<sup>1</sup>School of Software, Shandong University
<br />
<sup>&#9993;</sup>Corresponding author
<br/>
<p>
<a href="https://2026.ieeeicassp.org/"><img src="https://img.shields.io/badge/ICASSP-2026-blue.svg?style=flat-square" alt="ICASSP 2026"></a>
<a href="https://arxiv.org/pdf/2603.26341v1"><img alt='Paper' src="https://img.shields.io/badge/Paper-ICASSP-green.svg"></a>
<a href="https://zh-mingyu.github.io/HINT.github.io"><img alt='page' src="https://img.shields.io/badge/Website-orange"></a>
<a href="https://github.com/iLearn-Lab/ICASSP26-HINT"><img alt='GitHub' src="https://img.shields.io/github/stars/iLearn-Lab/ICASSP26-HINT?style=social"></a>
</p>
</div>
This repository hosts the official pre-trained checkpoints for **HINT**, a novel framework designed to address two gaps in Composed Image Retrieval (CIR): the neglect of contextual information and the absence of a discrepancy-amplification mechanism.

---

## 📌 Model Information

### 1. Model Name
**HINT** (dual-patH composItional coNtextualized neTwork) checkpoints.

### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Vision-Language Retrieval.
- **Applicable Tasks:** Retrieving target images based on a reference image and a modification text.
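The retrieval step above amounts to scoring one composed query embedding against every gallery-image embedding and returning the gallery ranked best-first. A minimal stdlib sketch of that idea (illustrative only; HINT's actual embeddings come from its BLIP-2-based network, and all vectors below are toy values):

```python
# Illustrative sketch of the CIR retrieval step (not HINT's actual model):
# score a composed query embedding against each gallery embedding by
# cosine similarity and return gallery indices best-first.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_gallery(query_emb, gallery_embs):
    """Return gallery indices sorted from most to least similar."""
    scores = [cosine(query_emb, g) for g in gallery_embs]
    return sorted(range(len(gallery_embs)), key=lambda i: -scores[i])

# Toy example: the query is closest to gallery item 1.
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(rank_gallery(query, gallery))  # [1, 0, 2]
```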
### 3. Project Introduction
Existing Composed Image Retrieval (CIR) methods often neglect contextual information when discriminating matching samples, and struggle to understand complex modifications and implicit dependencies in real-world scenarios. HINT addresses this through:

- 🧩 **Dual Context Extraction (DCE):** Extracts both intra-modal and cross-modal context, enhancing the joint semantic representation by integrating multimodal contextual information.

- 📏 **Quantification of Contextual Relevance (QCR):** Measures the relevance between cross-modal contextual information and the target-image semantics, enabling quantification of the implicit dependencies.

- ⚖️ **Dual-Path Consistency Constraints (DPCC):** Optimizes training by constraining representation consistency, stably raising the similarity of matching instances while lowering it for non-matching ones.

Built on the BLIP-2 architecture, HINT achieves state-of-the-art (SOTA) retrieval performance on both open-domain and fashion-domain benchmarks.
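DPCC's goal of raising matched similarity while suppressing unmatched similarity belongs to the family of contrastive objectives. A generic softmax-based sketch of that principle (an illustration only, not the paper's exact loss; the `temperature` value is an arbitrary placeholder):

```python
# Generic contrastive objective in the spirit of consistency constraints:
# the loss is small when the matching pair's similarity dominates the rest.
import math

def contrastive_loss(sims, positive_idx, temperature=0.07):
    """-log softmax probability of the matching (positive) candidate."""
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[positive_idx] / sum(exps))

# The loss shrinks as the positive similarity pulls ahead of the negatives.
print(contrastive_loss([0.9, 0.1, 0.05], positive_idx=0) <
      contrastive_loss([0.4, 0.3, 0.35], positive_idx=0))  # True
```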
### 4. Training Data Source & Hosted Weights
The models were trained on the **FashionIQ** and **CIRR** datasets. This Hugging Face repository provides the corresponding `.pt` checkpoint files, organized by dataset:

* `fashioniq.pt` (trained on FashionIQ)
* `cirr.pt` (trained on CIRR)

---
## 🚀 Usage & Basic Inference

These weights are designed to be evaluated seamlessly with the official [HINT GitHub repository](https://github.com/iLearn-Lab/ICASSP26-HINT).

### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
```bash
git clone https://github.com/iLearn-Lab/ICASSP26-HINT
cd ICASSP26-HINT
conda create -n hint python=3.8 -y
conda activate hint
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
```
### Step 2: Download Model Weights
Download the `.pt` files you wish to evaluate from this Hugging Face repository and place them in a `checkpoints/` directory inside your cloned GitHub repo. For example, to evaluate the CIRR model:

```text
ICASSP26-HINT/
└── checkpoints/
    └── cirr.pt   <-- (rename to best_model.pt if required by your test script)
```
### Step 3: Run Testing / Evaluation
To generate prediction files on the CIRR dataset for the [CIRR Evaluation Server](https://cirr.cecs.anu.edu.au/), point the test script at the directory containing your downloaded checkpoint:

```bash
python src/cirr_test_submission.py checkpoints/
```

*(The script automatically outputs `.json` files for online evaluation based on the checkpoint.)*
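For context, the CIRR server scores submissions with Recall@K: the fraction of queries whose true target appears among the top-K retrieved gallery images. A minimal illustrative implementation (not the server's code; the toy rankings and targets below are made up):

```python
# Illustrative Recall@K: how often the true target lands in the top-K
# positions of each query's ranked gallery list.
def recall_at_k(rankings, targets, k):
    """rankings: per-query lists of gallery ids, best first."""
    hits = sum(1 for ranked, t in zip(rankings, targets) if t in ranked[:k])
    return hits / len(targets)

# Toy example: target 1 is in the top-2 for the first query only.
print(recall_at_k([[3, 1, 2], [0, 2, 1]], targets=[1, 1], k=2))  # 0.5
```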
---

## ⚠️ Limitations & Notes

- **Hardware Requirements:** Because HINT is built on the BLIP-2 architecture, inference and further fine-tuning require GPUs with sufficient memory (e.g., an NVIDIA A40 48 GB or V100 32 GB is recommended).
- **Intended Use:** These weights are provided for academic research and to facilitate reproducibility of the ICASSP 2026 paper.

---

## 📝⭐️ Citation

If you find our work, code, or these model weights useful in your research, please consider leaving a **Star** ⭐️ on our GitHub repository and citing our paper:

```bibtex
@inproceedings{HINT2026,
  title={HINT: Composed Image Retrieval with Dual-Path Compositional Contextualized Network},
  author={Zhang, Mingyu and Li, Zixu and Chen, Zhiwei and Fu, Zhiheng and Zhu, Xiaowei and Nie, Jiajia and Wei, Yinwei and Hu, Yupeng},
  booktitle={Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2026}
}
```