---
license: apache-2.0
task_categories:
- image-retrieval
tags:
- composed-image-retrieval
- pytorch
- icassp-2025
---

<div align="center">
  <h1>(ICASSP 2025) MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval</h1>
  <div>
    <a target="_blank" href="https://windlikeo.github.io/HQL.github.io/">Qinlei Huang</a><sup>1</sup>,
    <a target="_blank" href="https://zivchen-ty.github.io">Zhiwei Chen</a><sup>1</sup>,
    <a target="_blank" href="https://lee-zixu.github.io">Zixu Li</a><sup>1</sup>,
    Chunxiao Wang<sup>2</sup>,
    <a target="_blank" href="https://xuemengsong.github.io">Xuemeng Song</a><sup>3</sup>,
    <a target="_blank" href="https://faculty.sdu.edu.cn/huyupeng1/zh_CN/index.htm">Yupeng Hu</a><sup>1&#9993;</sup>,
    <a target="_blank" href="https://liqiangnie.github.io/index.html">Liqiang Nie</a><sup>4</sup>
  </div>
  <sup>1</sup>School of Software, Shandong University<br>
  <sup>2</sup>Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences)<br>
  <sup>3</sup>School of Computer Science and Technology, Shandong University<br>
  <sup>4</sup>School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)<br>
  <sup>&#9993;</sup>Corresponding author
  <br/>
  <p>
    <a href="https://ieeexplore.ieee.org/document/10890642"><img alt="Paper" src="https://img.shields.io/badge/Paper-IEEE-green.svg?style=flat-square"></a>
    <a href="https://windlikeo.github.io/MEDIAN.github.io/"><img alt="Project Page" src="https://img.shields.io/badge/Website-orange"></a>
    <a href="https://github.com/iLearn-Lab/ICASSP25-MEDIAN"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-Repository-black?style=flat-square&logo=github"></a>
  </p>
</div>

This repository hosts the official pre-trained checkpoints for **MEDIAN**, a composed image retrieval framework that adaptively aggregates intermediate-grained features and performs target-guided semantic alignment to better compose reference images and modification texts.

---

## 📌 Model Information

### 1. Model Name
**MEDIAN** (Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval).

### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR).
- **Applicable Tasks:** Retrieving a target image from a gallery based on a reference image together with a modification text.
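
As a rough illustration of the task setup (not MEDIAN's actual composition method), the sketch below ranks gallery embeddings by cosine similarity to a composed query. The element-wise sum used to fuse the reference-image and text embeddings is a placeholder assumption, and every function name and toy vector here is hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compose_query(ref_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    """Placeholder fusion: element-wise sum, then normalize. NOT MEDIAN's method."""
    return l2_normalize([r + t for r, t in zip(ref_emb, text_emb)])


def rank_gallery(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery image ids sorted by descending cosine similarity to the query."""
    def score(image_id: str) -> float:
        target = l2_normalize(gallery[image_id])
        return sum(q * t for q, t in zip(query, target))

    return sorted(gallery, key=score, reverse=True)


# Toy 2-D embeddings, purely illustrative.
reference = [1.0, 0.0]
modification = [0.0, 1.0]
gallery = {"img_a": [1.0, 1.0], "img_b": [1.0, 0.0], "img_c": [-1.0, 0.0]}
ranking = rank_gallery(compose_query(reference, modification), gallery)
```

In a real CIR system the embeddings would come from learned image and text encoders, and the fusion step is where a model such as MEDIAN does its work.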

### 3. Project Introduction
MEDIAN is designed to improve cross-modal composition in CIR by introducing adaptive intermediate-grained aggregation and target-guided semantic alignment. Instead of relying only on local and global granularity, it models **local-intermediate-global** feature composition to establish more precise correspondences between the reference image and the text query.

### 4. Training Data Source
According to the project README, MEDIAN is trained and evaluated on three standard CIR datasets:

- **CIRR**
- **FashionIQ**
- **Shoes**

### 5. Hosted Weights
This repository currently includes the following checkpoint files:

- `CIRR.pth`: MEDIAN checkpoint for CIRR
- `FashionIQ.pt`: MEDIAN checkpoint for FashionIQ
- `Shoes.pt`: MEDIAN checkpoint for Shoes
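
Note that the file extensions differ (`.pth` for CIRR, `.pt` for the other two). A minimal sketch of a lookup helper that resolves the right file for a dataset is shown below; the function name `checkpoint_path` and the lowercase dataset keys are assumptions for illustration and are not part of the official code.

```python
from pathlib import Path

# File names exactly as listed in this repository.
CHECKPOINT_FILES = {
    "cirr": "CIRR.pth",
    "fashioniq": "FashionIQ.pt",
    "shoes": "Shoes.pt",
}


def checkpoint_path(dataset: str, checkpoint_dir: str) -> Path:
    """Return the expected checkpoint location for a dataset (hypothetical helper)."""
    key = dataset.lower()
    if key not in CHECKPOINT_FILES:
        raise ValueError(
            f"Unknown dataset {dataset!r}; expected one of {sorted(CHECKPOINT_FILES)}"
        )
    return Path(checkpoint_dir) / CHECKPOINT_FILES[key]
```

For example, `checkpoint_path("CIRR", "./checkpoints")` resolves to `checkpoints/CIRR.pth`; the helper only builds the path and does not check that the file exists.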

---

## 🚀 Usage & Basic Inference

These checkpoints are intended to be used with the official [MEDIAN GitHub repository](https://github.com/iLearn-Lab/ICASSP25-MEDIAN).

### Step 1: Prepare the Environment
Set up the environment following the project README:

```bash
git clone https://github.com/iLearn-Lab/ICASSP25-MEDIAN
cd ICASSP25-MEDIAN
conda create -n pair python=3.8.10
conda activate pair
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```

### Step 2: Prepare Data and Weights
The original project README documents support for the following datasets:

- `CIRR`
- `FashionIQ`
- `Shoes`

Place the corresponding checkpoint file in your preferred checkpoint directory and provide the dataset paths when training or evaluating.

### Step 3: Training
The project README documents the following training command:

```bash
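# Pass exactly one dataset name: cirr, fashioniq, or shoes (cirr shown here).
# Fill in the path for the dataset you selected.
python3 train.py \
  --model_dir ./checkpoints/MEDIAN \
  --dataset cirr \
  --cirr_path "" \
  --fashioniq_path "" \
  --shoes_path ""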
```
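
If you launch training from a script, the same invocation can be assembled programmatically. The sketch below is a hypothetical wrapper, not part of the official repository: it uses only the flags shown in the command above and assumes that leaving the unused dataset paths empty is acceptable, mirroring the documented command.

```python
from typing import List

# Dataset names as documented for the --dataset flag.
DATASETS = ("cirr", "fashioniq", "shoes")


def build_train_command(
    dataset: str,
    dataset_path: str,
    model_dir: str = "./checkpoints/MEDIAN",
) -> List[str]:
    """Build the argument list for train.py (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"Unknown dataset {dataset!r}; expected one of {DATASETS}")
    cmd = ["python3", "train.py", "--model_dir", model_dir, "--dataset", dataset]
    # Only the selected dataset receives a real path; the others stay empty.
    for name in DATASETS:
        cmd += [f"--{name}_path", dataset_path if name == dataset else ""]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.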

### Step 4: Testing / Evaluation
To generate a CIRR test submission, the documented command is shown below; replace `model_path` with the path to your CIRR checkpoint:

```bash
python src/cirr_test_submission.py model_path
```

Example checkpoint path:

```text
model_path = /path/to/CIRR.pth
```

---

## ⚠️ Limitations & Notes

- These checkpoints are intended for **academic research** and for reproducing the MEDIAN results reported in the ICASSP 2025 paper.
- Dataset preparation is required before training or evaluation, and the supported datasets documented by the project are **CIRR**, **FashionIQ**, and **Shoes**.
- The usage commands above are adapted from the official project README. Please refer to the GitHub repository if you need the full training and evaluation workflow.

---

## 📝 Citation

If you find this work or these checkpoints useful in your research, please consider citing:

```bibtex
@inproceedings{MEDIAN,
  title={MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval},
  author={Huang, Qinlei and Chen, Zhiwei and Li, Zixu and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
  booktitle={Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
```