---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- noisy-correspondence
- blip-2
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>✈️ Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval</h1>

<p>
<b>Zhiheng Fu</b><sup>1</sup>
<b>Yupeng Hu</b><sup>1†</sup>
<b>Qianyun Yang</b><sup>1</sup>
<b>Shiqi Zhang</b><sup>1</sup>
<b>Zhiwei Chen</b><sup>1</sup>
<b>Zixu Li</b><sup>1</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University
</p>
</div>

These are the official pre-trained model weights and configuration files for **Air-Know**, a robust framework designed for Composed Image Retrieval (CIR) under Noisy Correspondence Learning (NCL) settings.

📄 **Paper:** [Accepted by CVPR 2026]
💻 **GitHub Repository:** [ZhihFu/Air-Know](https://github.com/ZhihFu/Air-Know)
🌐 **Project Website:** [Air-Know Webpage](https://zhihfu.github.io/Air-Know.github.io/)

---


## 📋 Model Information


### 1. Model Name
**Air-Know** (Arbiter-Calibrated Knowledge-Internalizing Robust Network) checkpoints.


### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Noisy Correspondence Learning / Vision-Language
- **Applicable Tasks:** Robust multimodal retrieval that mitigates the impact of Noisy Triplet Correspondence (NTC) in training data while remaining competitive in conventional fully supervised (0% noise) settings.


### 3. Project Introduction
**Air-Know** is built on the BLIP-2/LAVIS framework and tackles the noisy correspondence problem in CIR through three primary modules:
- ⚖️ **External Prior Arbitration:** Leverages an offline multimodal expert to generate reliable arbitration priors, bypassing the often-unreliable "small-loss hypothesis".
- 🧠 **Expert-Knowledge Internalization:** Transfers these priors into a lightweight proxy network to structurally prevent the memorization of ambiguous partial matches.
- 🔄 **Dual-Stream Reconciliation:** Dynamically integrates the internalized knowledge to provide robust online feedback, guiding the final representation learning.
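
As a minimal illustration of the reconciliation idea (a sketch under our own assumptions, not the repository's implementation): per-triplet losses can be re-weighted by blending the offline expert prior with the online proxy score, so suspected noisy triplets contribute less to training.

```python
# Illustrative sketch only -- `reconciled_loss`, `expert_priors`, and
# `proxy_scores` are hypothetical names, not part of the Air-Know codebase.
def reconciled_loss(per_sample_losses, expert_priors, proxy_scores, alpha=0.5):
    """Blend offline expert priors with online proxy scores (both are
    confidences in [0, 1] that a triplet is a true correspondence) and
    use the blend to re-weight each triplet's loss before averaging."""
    weights = [alpha * p + (1 - alpha) * q
               for p, q in zip(expert_priors, proxy_scores)]
    weighted = sum(w * l for w, l in zip(weights, per_sample_losses))
    return weighted / (sum(weights) or 1.0)
```

A triplet both streams trust (weight near 1) dominates the average, while a triplet both streams flag as noisy (weight near 0) is effectively ignored.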


### 4. Training Data Source
The model was primarily trained and evaluated on standard CIR datasets under various simulated noise ratios (e.g., 0.0, 0.2, 0.5, 0.8):
- **FashionIQ** (fashion domain)
- **CIRR** (open domain)


---


## 🚀 Usage & Basic Inference


These weights are designed to be used directly with the official Air-Know GitHub repository.


### Step 1: Prepare the Environment
Clone the GitHub repository and install the dependencies (evaluated on Python 3.8.10 and PyTorch 2.1.0 with CUDA 12.1+):
```bash
git clone https://github.com/ZhihFu/Air-Know
cd Air-Know
conda create -n airknow python=3.8 -y
conda activate airknow

# Install PyTorch
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
```


### Step 2: Download Model Weights & Data
Download the checkpoint folders (e.g., `cirr_noise0.8` or `fashioniq_noise0.8`) from this Hugging Face repository and place them in your local `checkpoints/` directory.


Ensure you also download and structure the base dataset images (CIRR and FashionIQ) as specified in the [GitHub repo's Data Preparation section](https://github.com/ZhihFu/Air-Know).
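
If you script your own evaluation, a small helper along these lines can pick which checkpoint to load from such a folder (the function name and `*.pth` pattern are our assumptions; check the repository for the actual file layout):

```python
from pathlib import Path

def latest_checkpoint(ckpt_dir, pattern="*.pth"):
    """Return the most recently modified checkpoint file in `ckpt_dir`
    (e.g. checkpoints/cirr_noise0.8/), or None if none is found.
    Hypothetical helper -- not part of the official repository."""
    candidates = sorted(Path(ckpt_dir).glob(pattern),
                        key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None
```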


### Step 3: Run Testing / Inference
To generate prediction files on the CIRR dataset for submission to the CIRR Evaluation Server using the downloaded checkpoint, run:
```bash
python src/cirr_test_submission.py checkpoints/cirr_noise0.8/
```
*(The script automatically outputs a `.json` file based on the best checkpoint in the specified folder.)*
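
For local sanity checks before submitting (the server computes the official numbers), Recall@K can be computed generically as follows:

```python
def recall_at_k(ranked_lists, targets, k):
    """Fraction of queries whose ground-truth target appears in the top-k
    of its ranked candidate list. Generic retrieval metric, shown for
    illustration only; the CIRR server computes the official scores."""
    hits = sum(1 for ranking, target in zip(ranked_lists, targets)
               if target in ranking[:k])
    return hits / len(targets)
```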


To train the model under a specific noise ratio (e.g., `0.8`), run:
```bash
python train_BLIP2.py \
    --dataset cirr \
    --cirr_path "/path/to/CIRR/" \
    --model_dir "./checkpoints/cirr_noise0.8" \
    --noise_ratio 0.8 \
    --batch_size 256 \
    --num_epochs 20 \
    --lr 2e-5
```
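
The `--noise_ratio` flag injects synthetic mismatches into the training triplets. One common way to simulate this (an assumption on our part; the repository's exact corruption scheme may differ) is to shuffle the target images among a randomly chosen fraction of the triplets:

```python
import random

def inject_noise(target_ids, noise_ratio, seed=0):
    """Return a copy of `target_ids` in which roughly `noise_ratio` of the
    entries have been shuffled among themselves, creating mismatched
    (noisy) triplets. Illustrative sketch, not the repo's implementation.
    Note: the shuffle may leave a few chosen entries unchanged by chance."""
    rng = random.Random(seed)
    ids = list(target_ids)
    chosen = rng.sample(range(len(ids)), int(len(ids) * noise_ratio))
    shuffled = chosen[:]
    rng.shuffle(shuffled)
    for src, dst in zip(chosen, shuffled):
        ids[dst] = target_ids[src]
    return ids
```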


---


## ⚠️ Limitations & Notes


**Disclaimer:** This framework and its pre-trained weights are intended strictly for **academic research purposes**.
- Full evaluation requires access to the original source datasets (CIRR, FashionIQ); users must comply with those datasets' respective licenses.
- The `noise_ratio` parameter simulates interference during training; performance in wild, unstructured noisy environments may vary.


---


## 📝 Citation


If you find our work or these model weights useful in your research, please consider leaving a **Star** ⭐ on our GitHub repo and citing our paper:


```bibtex
@InProceedings{Air-Know,
  title     = {Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval},
  author    = {Fu, Zhiheng and Hu, Yupeng and Yang, Qianyun and Zhang, Shiqi and Chen, Zhiwei and Li, Zixu},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```

|