---
license: apache-2.0
task_categories:
- image-retrieval
tags:
- composed-image-retrieval
- pytorch
- icassp-2025
---
<div align="center">
<h1>(ICASSP 2025) MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval</h1>
<div>
<a target="_blank" href="https://windlikeo.github.io/HQL.github.io/">Qinlei Huang</a><sup>1</sup>,
<a target="_blank" href="https://zivchen-ty.github.io">Zhiwei Chen</a><sup>1</sup>,
<a target="_blank" href="https://lee-zixu.github.io">Zixu Li</a><sup>1</sup>,
Chunxiao Wang<sup>2</sup>,
<a target="_blank" href="https://xuemengsong.github.io">Xuemeng Song</a><sup>3</sup>,
<a target="_blank" href="https://faculty.sdu.edu.cn/huyupeng1/zh_CN/index.htm">Yupeng Hu</a><sup>1✉</sup>,
<a target="_blank" href="https://liqiangnie.github.io/index.html">Liqiang Nie</a><sup>4</sup>
</div>
<sup>1</sup>School of Software, Shandong University<br>
<sup>2</sup>Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences)<br>
<sup>3</sup>School of Computer Science and Technology, Shandong University<br>
<sup>4</sup>School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)<br>
<sup>✉</sup>Corresponding author
<br/>
<p>
<a href="https://ieeexplore.ieee.org/document/10890642"><img alt="Paper" src="https://img.shields.io/badge/Paper-IEEE-green.svg?style=flat-square"></a>
<a href="https://windlikeo.github.io/MEDIAN.github.io/"><img alt="Project Page" src="https://img.shields.io/badge/Website-orange"></a>
<a href="https://github.com/iLearn-Lab/ICASSP25-MEDIAN"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-Repository-black?style=flat-square&logo=github"></a>
</p>
</div>

This repository hosts the official pre-trained checkpoints for **MEDIAN**, a composed image retrieval framework that adaptively aggregates intermediate-grained features and performs target-guided semantic alignment to better compose reference images and modification texts.

---
## 📌 Model Information
### 1. Model Name
**MEDIAN** (Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval).
### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR).
- **Applicable Tasks:** Retrieving a target image from a gallery based on a reference image together with a modification text.
### 3. Project Introduction
MEDIAN is designed to improve cross-modal composition in CIR by introducing adaptive intermediate-grained aggregation and target-guided semantic alignment. Instead of relying only on local and global granularity, it models **local-intermediate-global** feature composition to establish more precise correspondences between the reference image and the text query.
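The architecture itself is defined in the official repository; purely as a loose illustration of the idea (not MEDIAN's actual implementation), adaptive aggregation across granularities can be sketched as a learned softmax weighting over local, intermediate, and global feature vectors:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def adaptive_aggregate(local, intermediate, glob, gate_scores):
    """Blend three same-length feature vectors with adaptive
    (softmax-normalised) weights, one scalar gate per granularity.
    Illustrative only: the real aggregation is in the official code."""
    w = softmax(gate_scores)
    return [
        w[0] * l + w[1] * i + w[2] * g
        for l, i, g in zip(local, intermediate, glob)
    ]

# Equal gate scores reduce to a plain average of the three granularities.
fused = adaptive_aggregate([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0, 0.0])
print(fused)  # both entries equal 2/3
```

In the actual model the gate scores would themselves be predicted from the composed query, so the network decides per example how much each granularity contributes.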
### 4. Training Data Source
According to the project README, MEDIAN is trained and evaluated on three standard CIR datasets:
- **CIRR**
- **FashionIQ**
- **Shoes**
### 5. Hosted Weights
This repository currently includes the following checkpoint files:
- `CIRR.pth` – MEDIAN checkpoint for CIRR
- `FashionIQ.pt` – MEDIAN checkpoint for FashionIQ
- `Shoes.pt` – MEDIAN checkpoint for Shoes

---
## 🚀 Usage & Basic Inference
These checkpoints are intended to be used with the official [MEDIAN GitHub repository](https://github.com/iLearn-Lab/ICASSP25-MEDIAN).
### Step 1: Prepare the Environment
Set up the environment following the project README:
```bash
git clone https://github.com/iLearn-Lab/ICASSP25-MEDIAN
cd ICASSP25-MEDIAN
conda create -n pair python=3.8.10
conda activate pair
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
### Step 2: Prepare Data and Weights
The original project README documents support for the following datasets:
- `CIRR`
- `FashionIQ`
- `Shoes`
Place the corresponding checkpoint file in your preferred checkpoint directory and provide the dataset paths when training or evaluating.
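The exact dataset layout is documented in the official repository; one plausible arrangement (directory names here are illustrative, not mandated by the project) is:

```text
ICASSP25-MEDIAN/
├── checkpoints/
│   └── MEDIAN/
│       ├── CIRR.pth
│       ├── FashionIQ.pt
│       └── Shoes.pt
└── data/
    ├── cirr/        # passed via --cirr_path
    ├── fashioniq/   # passed via --fashioniq_path
    └── shoes/       # passed via --shoes_path
```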
### Step 3: Training
The project README documents the following training command:
```bash
python3 train.py \
--model_dir ./checkpoints/MEDIAN \
--dataset {cirr,fashioniq,shoes} \
--cirr_path "" \
--fashioniq_path "" \
--shoes_path ""
```
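As a convenience, the template above can be concretized per dataset. The helper below is an illustrative wrapper (not part of the official project); `DATA_ROOT` is a placeholder for wherever you keep the datasets:

```shell
# Illustrative helper: print the full train.py command for one dataset.
DATA_ROOT=${DATA_ROOT:-/data}
build_train_cmd() {
  ds="$1"  # one of: cirr, fashioniq, shoes
  echo "python3 train.py --model_dir ./checkpoints/MEDIAN --dataset $ds" \
       "--cirr_path $DATA_ROOT/cirr --fashioniq_path $DATA_ROOT/fashioniq" \
       "--shoes_path $DATA_ROOT/shoes"
}
build_train_cmd cirr
```

Piping the output through `sh` (or copy-pasting it) launches the run once the environment and data are in place.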
### Step 4: Testing / Evaluation
For CIRR test submission generation, the documented command is:
```bash
python src/cirr_test_submission.py model_path
```
where `model_path` is the positional path to the checkpoint file, for example:
```bash
python src/cirr_test_submission.py /path/to/CIRR.pth
```
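CIR systems, including MEDIAN, are conventionally scored with Recall@K over the gallery. A minimal reference implementation of the metric (not the project's evaluation script) looks like:

```python
def recall_at_k(ranked_galleries, targets, k):
    """Fraction of queries whose ground-truth target appears among the
    top-k retrieved gallery items. ranked_galleries[i] is the ranked
    list of gallery ids for query i; targets[i] is its target id."""
    hits = sum(1 for ranks, t in zip(ranked_galleries, targets) if t in ranks[:k])
    return hits / len(targets)

# Two queries: the first target is ranked 1st, the second is ranked 3rd.
ranked = [["a", "b", "c"], ["x", "y", "z"]]
print(recall_at_k(ranked, ["a", "z"], 1))  # -> 0.5
print(recall_at_k(ranked, ["a", "z"], 3))  # -> 1.0
```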
---
## ⚠️ Limitations & Notes
- These checkpoints are intended for **academic research** and for reproducing the MEDIAN results reported in the ICASSP 2025 paper.
- Dataset preparation is required before training or evaluation, and the supported datasets documented by the project are **CIRR**, **FashionIQ**, and **Shoes**.
- The usage commands above are adapted from the official project README. Please refer to the GitHub repository if you need the full training and evaluation workflow.

---
## 📚 Citation
If you find this work or these checkpoints useful in your research, please consider citing:
```bibtex
@inproceedings{MEDIAN,
title={MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval},
author={Huang, Qinlei and Chen, Zhiwei and Li, Zixu and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
booktitle={Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing},
pages={1--5},
year={2025},
organization={IEEE}
}
```