---
license: apache-2.0
task_categories:
- image-retrieval
tags:
- composed-image-retrieval
- pytorch
- icassp-2025
---

# (ICASSP 2025) MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval

Qinlei Huang¹, Zhiwei Chen¹, Zixu Li¹, Chunxiao Wang², Xuemeng Song³, Yupeng Hu¹ ✉, Liqiang Nie⁴

¹ School of Software, Shandong University
² Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences)
³ School of Computer Science and Technology, Shandong University
⁴ School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)

✉ Corresponding author

Paper · Project Page · [GitHub](https://github.com/iLearn-Lab/ICASSP25-MEDIAN)

This repository hosts the official pre-trained checkpoints for **MEDIAN**, a composed image retrieval framework that adaptively aggregates intermediate-grained features and performs target-guided semantic alignment to better compose reference images and modification texts.

---

## 📌 Model Information

### 1. Model Name

**MEDIAN** (Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval).

### 2. Task Type & Applicable Tasks

- **Task Type:** Composed Image Retrieval (CIR).
- **Applicable Tasks:** Retrieving a target image from a gallery based on a reference image together with a modification text.

### 3. Project Introduction

MEDIAN is designed to improve cross-modal composition in CIR by introducing adaptive intermediate-grained aggregation and target-guided semantic alignment. Instead of relying only on local and global granularity, it models **local-intermediate-global** feature composition to establish more precise correspondences between the reference image and the text query.

### 4. Training Data Source

According to the project README, MEDIAN is evaluated on three standard CIR datasets:

- **CIRR**
- **FashionIQ**
- **Shoes**

### 5. Hosted Weights

This repository currently includes the following checkpoint files:

- `CIRR.pth`: MEDIAN checkpoint for CIRR
- `FashionIQ.pt`: MEDIAN checkpoint for FashionIQ
- `Shoes.pt`: MEDIAN checkpoint for Shoes

---

## 🚀 Usage & Basic Inference

These checkpoints are intended to be used with the official [MEDIAN GitHub repository](https://github.com/iLearn-Lab/ICASSP25-MEDIAN).

### Step 1: Prepare the Environment

Set up the environment following the project README:

```bash
git clone https://github.com/iLearn-Lab/ICASSP25-MEDIAN
cd ICASSP25-MEDIAN
conda create -n pair python=3.8.10
conda activate pair
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```

### Step 2: Prepare Data and Weights

The original project README documents support for the following datasets:

- `CIRR`
- `FashionIQ`
- `Shoes`

Place the corresponding checkpoint file in your preferred checkpoint directory and provide the dataset paths when training or evaluating.

### Step 3: Training

The project README documents the following training command:

```bash
python3 train.py \
  --model_dir ./checkpoints/MEDIAN \
  --dataset {cirr,fashioniq,shoes} \
  --cirr_path "" \
  --fashioniq_path "" \
  --shoes_path ""
```

### Step 4: Testing / Evaluation

For CIRR test submission generation, the documented command is:

```bash
python src/cirr_test_submission.py model_path
```

Example checkpoint path:

```text
model_path = /path/to/CIRR.pth
```

---

## ⚠️ Limitations & Notes

- These checkpoints are intended for **academic research** and for reproducing the MEDIAN results reported in the ICASSP 2025 paper.
- Dataset preparation is required before training or evaluation, and the supported datasets documented by the project are **CIRR**, **FashionIQ**, and **Shoes**.
- The usage commands above are adapted from the official project README. Please refer to the GitHub repository if you need the full training and evaluation workflow.
---

## 📝 Citation

If you find this work or these checkpoints useful in your research, please consider citing:

```bibtex
@inproceedings{MEDIAN,
  title={MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval},
  author={Huang, Qinlei and Chen, Zhiwei and Li, Zixu and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
  booktitle={Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
```