---
license: apache-2.0
task_categories:
- image-retrieval
tags:
- composed-image-retrieval
- pytorch
- icassp-2025
---

# (ICASSP 2025) MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval

Qinlei Huang¹, Zhiwei Chen¹, Zixu Li¹, Chunxiao Wang², Xuemeng Song³, Yupeng Hu¹ ✉, Liqiang Nie⁴

¹ School of Software, Shandong University
² Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences)
³ School of Computer Science and Technology, Shandong University
⁴ School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)

✉ Corresponding author

Paper · Project Page · [GitHub](https://github.com/iLearn-Lab/ICASSP25-MEDIAN)

This repository hosts the official pre-trained checkpoints for **MEDIAN**, a composed image retrieval framework that adaptively aggregates intermediate-grained features and performs target-guided semantic alignment to better compose reference images and modification texts.

---

## 📌 Model Information

### 1. Model Name

**MEDIAN** (Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval).

### 2. Task Type & Applicable Tasks

- **Task Type:** Composed Image Retrieval (CIR).
- **Applicable Tasks:** Retrieving a target image from a gallery based on a reference image together with a modification text.

### 3. Project Introduction

MEDIAN is designed to improve cross-modal composition in CIR by introducing adaptive intermediate-grained aggregation and target-guided semantic alignment. Instead of relying only on local and global granularity, it models **local-intermediate-global** feature composition to establish more precise correspondences between the reference image and the text query.

### 4. Training Data Source

According to the project README, MEDIAN is evaluated on three standard CIR datasets:

- **CIRR**
- **FashionIQ**
- **Shoes**

### 5. Hosted Weights

This repository currently includes the following checkpoint files:

- `CIRR.pth`: MEDIAN checkpoint for CIRR
- `FashionIQ.pt`: MEDIAN checkpoint for FashionIQ
- `Shoes.pt`: MEDIAN checkpoint for Shoes

---

## 🚀 Usage & Basic Inference

These checkpoints are intended to be used with the official [MEDIAN GitHub repository](https://github.com/iLearn-Lab/ICASSP25-MEDIAN).

### Step 1: Prepare the Environment

Set up the environment following the project README:

```bash
git clone https://github.com/iLearn-Lab/ICASSP25-MEDIAN
cd ICASSP25-MEDIAN
conda create -n pair python=3.8.10
conda activate pair
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```

### Step 2: Prepare Data and Weights

The original project README documents support for the following datasets:

- `CIRR`
- `FashionIQ`
- `Shoes`

Place the corresponding checkpoint file in your preferred checkpoint directory and provide the dataset paths when training or evaluating.

### Step 3: Training

The project README documents the following training command:

```bash
python3 train.py \
  --model_dir ./checkpoints/MEDIAN \
  --dataset {cirr,fashioniq,shoes} \
  --cirr_path "" \
  --fashioniq_path "" \
  --shoes_path ""
```

### Step 4: Testing / Evaluation

For CIRR test submission generation, the documented command is:

```bash
python src/cirr_test_submission.py model_path
```

Example checkpoint path:

```text
model_path = /path/to/CIRR.pth
```

---

## ⚠️ Limitations & Notes

- These checkpoints are intended for **academic research** and for reproducing the MEDIAN results reported in the ICASSP 2025 paper.
- Dataset preparation is required before training or evaluation, and the supported datasets documented by the project are **CIRR**, **FashionIQ**, and **Shoes**.
- The usage commands above are adapted from the official project README. Please refer to the GitHub repository if you need the full training and evaluation workflow.
---

## 📝 Citation

If you find this work or these checkpoints useful in your research, please consider citing:

```bibtex
@inproceedings{MEDIAN,
  title={MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval},
  author={Huang, Qinlei and Chen, Zhiwei and Li, Zixu and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
  booktitle={Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
```