---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- noisy-correspondence
- blip-2
- pytorch
---
<a id="top"></a>
<div align="center">
<h1>Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval</h1>
<p>
<b>Zhiheng Fu</b><sup>1</sup>
<b>Yupeng Hu</b><sup>1✉</sup>
<b>Qianyun Yang</b><sup>1</sup>
<b>Shiqi Zhang</b><sup>1</sup>
<b>Zhiwei Chen</b><sup>1</sup>
<b>Zixu Li</b><sup>1</sup>
</p>
<p>
<sup>1</sup>School of Software, Shandong University
</p>
</div>
These are the official pre-trained model weights and configuration files for **Air-Know**, a robust framework designed for Composed Image Retrieval (CIR) under Noisy Correspondence Learning (NCL) settings.
**Paper:** [Accepted by CVPR 2026]
**GitHub Repository:** [ZhihFu/Air-Know](https://github.com/ZhihFu/Air-Know)
**Project Website:** [Air-Know Webpage](https://zhihfu.github.io/Air-Know.github.io/)
---
## Model Information
### 1. Model Name
**Air-Know** (Arbiter-Calibrated Knowledge-Internalizing Robust Network) Checkpoints.
### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Noisy Correspondence Learning / Vision-Language
- **Applicable Tasks:** Robust multimodal retrieval that mitigates the impact of Noisy Triplet Correspondence (NTC) in training data while remaining competitive in the traditional fully supervised (0% noise) setting.
### 3. Project Introduction
**Air-Know** is built upon the BLIP-2/LAVIS framework and tackles the noisy correspondence problem in CIR through three primary modules:
- **External Prior Arbitration:** Leverages an offline multimodal expert to generate reliable arbitration priors, bypassing the often-unreliable "small-loss hypothesis".
- **Expert-Knowledge Internalization:** Transfers these priors into a lightweight proxy network to structurally prevent the memorization of ambiguous partial matches.
- **Dual-Stream Reconciliation:** Dynamically integrates the internalized knowledge to provide robust online feedback, guiding the final representation learning.
### 4. Training Data Source
The model was primarily trained and evaluated on standard CIR datasets under various simulated noise ratios (e.g., 0.0, 0.2, 0.5, 0.8):
- **FashionIQ** (Fashion Domain)
- **CIRR** (Open Domain)
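
As a rough illustration of what a simulated noise ratio means, the sketch below corrupts a fraction of (reference, text, target) triplets by swapping in a target taken from another triplet. This is an assumption about one common way to inject noisy correspondence, not necessarily the protocol used by the Air-Know training script; `inject_noise` and the toy triplets are hypothetical.

```python
import random

def inject_noise(triplets, noise_ratio, seed=0):
    """Return (noisy_triplets, noisy_indices) with ~noise_ratio of targets replaced.

    Each triplet is (reference_image, modification_text, target_image).
    Hypothetical helper: the real training script may corrupt data differently.
    """
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    rng = random.Random(seed)
    n_noisy = int(round(noise_ratio * len(triplets)))
    noisy_idx = set(rng.sample(range(len(triplets)), n_noisy))
    all_targets = [t[2] for t in triplets]
    out = []
    for i, (ref, text, tgt) in enumerate(triplets):
        if i in noisy_idx:
            # Pick a replacement target that differs from the true one
            candidates = [c for c in all_targets if c != tgt]
            if candidates:
                tgt = rng.choice(candidates)
        out.append((ref, text, tgt))
    return out, noisy_idx

# Toy data standing in for a real CIR dataset
triplets = [(f"ref_{i}.png", f"edit {i}", f"tgt_{i}.png") for i in range(10)]
noisy, idx = inject_noise(triplets, noise_ratio=0.8)
print(len(idx), "of", len(triplets), "triplets corrupted")
```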
---
## Usage & Basic Inference
These weights are designed to be used directly with the official Air-Know GitHub repository.
### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies (evaluated on Python 3.8.10 and PyTorch 2.1.0 with CUDA 12.1+):
```bash
git clone https://github.com/ZhihFu/Air-Know
cd Air-Know
conda create -n airknow python=3.8 -y
conda activate airknow
# Install PyTorch
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
# Install core dependencies
pip install scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
```
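
After installation, a quick check that the environment matches the pinned versions can save debugging time later. The sketch below uses only the standard library; the pins are copied from the install commands above, while `find_mismatches` is a hypothetical helper and not part of the Air-Know repository.

```python
from importlib import metadata

# Pins copied from the install commands above
PINS = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "scikit-learn": "1.3.2",
    "transformers": "4.25.0",
    "salesforce-lavis": "1.0.2",
    "timm": "0.9.16",
}

def find_mismatches(pins, get_version=metadata.version):
    """Map each package to (expected, installed) when they differ or are missing."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu121" when comparing
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches

if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```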
### Step 2: Download Model Weights & Data
Download the checkpoint folders (e.g., `cirr_noise0.8` or `fashioniq_noise0.8`) from this Hugging Face repository and place them in your local `checkpoints/` directory.
Ensure you also download and structure the base dataset images (CIRR and FashionIQ) as specified in the [GitHub repo's Data Preparation section](https://github.com/ZhihFu/Air-Know).
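
If you prefer to script the download, the following sketch fetches only selected checkpoint folders with `huggingface_hub.snapshot_download`. It assumes `huggingface_hub` is installed; the repository ID is a placeholder to replace with this repository's actual ID, and both helper functions are hypothetical rather than part of the Air-Know codebase.

```python
def checkpoint_patterns(folders):
    """Build glob patterns that match everything inside the given checkpoint folders."""
    return [f"{name.strip('/')}/*" for name in folders]

def download_checkpoints(repo_id, folders, local_dir="checkpoints"):
    """Download only the selected checkpoint folders from a Hugging Face model repo."""
    # Imported lazily so checkpoint_patterns works without huggingface_hub installed
    from huggingface_hub import snapshot_download
    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(folders),
        local_dir=local_dir,
    )

# Example (placeholder repo ID, replace with this repository's actual ID):
# download_checkpoints("<user>/<repo>", ["cirr_noise0.8"])
```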
### Step 3: Run Testing / Inference
To generate prediction files on the CIRR dataset for submission to the CIRR Evaluation Server using the downloaded checkpoint, run:
```bash
python src/cirr_test_submission.py checkpoints/cirr_noise0.8/
```
*(The script will automatically output a `.json` file based on the best checkpoint in the specified folder).*
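
Before uploading, it can help to sanity-check the generated file. The sketch below assumes a submission layout with `version` and `metric` metadata keys plus one ranked list of image names per query pair ID, capped at 50 candidates; these details are assumptions to verify against the CIRR Evaluation Server instructions, and `check_submission` is a hypothetical helper.

```python
import json

META_KEYS = ("version", "metric")  # assumed metadata keys, verify with the server docs

def check_submission(submission, top_k=50):
    """Return a list of problems found in a CIRR-style submission dict."""
    problems = []
    for key in META_KEYS:
        if key not in submission:
            problems.append(f"missing metadata key: {key}")
    for pair_id, ranked in submission.items():
        if pair_id in META_KEYS:
            continue
        if not isinstance(ranked, list) or not ranked:
            problems.append(f"{pair_id}: expected a non-empty list of image names")
        elif len(ranked) > top_k:
            problems.append(f"{pair_id}: {len(ranked)} candidates exceeds top_k={top_k}")
        elif len(set(ranked)) != len(ranked):
            problems.append(f"{pair_id}: duplicate candidates")
    return problems

def check_submission_file(path, top_k=50):
    """Load a submission JSON from disk and run the same checks."""
    with open(path, encoding="utf-8") as f:
        return check_submission(json.load(f), top_k=top_k)

# Toy example with illustrative values only
example = {"version": "rc2", "metric": "recall", "12": ["a.png", "b.png"]}
print(check_submission(example))
```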
To train the model under specific noise ratios (e.g., `0.8`), you can run:
```bash
python train_BLIP2.py \
--dataset cirr \
--cirr_path "/path/to/CIRR/" \
--model_dir "./checkpoints/cirr_noise0.8" \
--noise_ratio 0.8 \
--batch_size 256 \
--num_epochs 20 \
--lr 2e-5
```
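
To reproduce several noise settings in sequence, the command above can be wrapped in a small sweep script. This sketch reuses only the flags shown in the command; the per-ratio directory naming follows the `cirr_noise0.8` pattern, and `build_train_command` and `run_sweep` are hypothetical helpers. It defaults to a dry run that prints the commands without launching training.

```python
import shlex
import subprocess

def build_train_command(noise_ratio, cirr_path="/path/to/CIRR/",
                        batch_size=256, num_epochs=20, lr="2e-5"):
    """Assemble the train_BLIP2.py command line for one CIRR noise ratio."""
    return [
        "python", "train_BLIP2.py",
        "--dataset", "cirr",
        "--cirr_path", cirr_path,
        "--model_dir", f"./checkpoints/cirr_noise{noise_ratio}",
        "--noise_ratio", str(noise_ratio),
        "--batch_size", str(batch_size),
        "--num_epochs", str(num_epochs),
        "--lr", str(lr),
    ]

def run_sweep(noise_ratios=(0.0, 0.2, 0.5, 0.8), dry_run=True):
    """Print each training command, and execute it unless dry_run is set."""
    for ratio in noise_ratios:
        cmd = build_train_command(ratio)
        print(shlex.join(cmd))
        if not dry_run:
            # Stop the sweep if any run exits with a non-zero status
            subprocess.run(cmd, check=True)

run_sweep()  # dry run: prints the four commands only
```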
---
## Limitations & Notes
**Disclaimer:** This framework and its pre-trained weights are strictly intended for **academic research purposes**.
- The model requires access to the original source datasets (CIRR, FashionIQ) for full evaluation. Users must comply with the original licenses of those respective datasets.
- The `noise_ratio` parameter sets the level of simulated noise injected during training; performance under real-world, unstructured noise may vary.
---
## Citation
If you find our work or these model weights useful in your research, please consider leaving a **Star** on our GitHub repo and citing our paper:
```bibtex
@InProceedings{Air-Know,
title={Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval},
author={Fu, Zhiheng and Hu, Yupeng and Yang, Qianyun and Zhang, Shiqi and Chen, Zhiwei and Li, Zixu},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
year={2026}
}
```