---
license: apache-2.0
tags:
- composed-image-retrieval
- vision-language
- multimodal
- noisy-correspondence
- blip-2
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>✈️ Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval</h1>

<p>
<b>Zhiheng Fu</b><sup>1</sup>
<b>Yupeng Hu</b><sup>1†</sup>
<b>Qianyun Yang</b><sup>1</sup>
<b>Shiqi Zhang</b><sup>1</sup>
<b>Zhiwei Chen</b><sup>1</sup>
<b>Zixu Li</b><sup>1</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University
</p>
</div>

These are the official pre-trained model weights and configuration files for **Air-Know**, a robust framework designed for Composed Image Retrieval (CIR) under Noisy Correspondence Learning (NCL) settings.

📄 **Paper:** [Accepted by CVPR 2026]
💻 **GitHub Repository:** [ZhihFu/Air-Know](https://github.com/ZhihFu/Air-Know)
🌐 **Project Website:** [Air-Know Webpage](https://zhihfu.github.io/Air-Know.github.io/)

---


## 📋 Model Information


### 1. Model Name
**Air-Know** (Arbiter-Calibrated Knowledge-Internalizing Robust Network) checkpoints.


### 2. Task Type & Applicable Tasks
- **Task Type:** Composed Image Retrieval (CIR) / Noisy Correspondence Learning / Vision-Language
- **Applicable Tasks:** Robust multimodal retrieval that mitigates the impact of Noisy Triplet Correspondence (NTC) in training data while remaining competitive in conventional fully supervised (0% noise) settings.


### 3. Project Introduction
**Air-Know** is built on the BLIP-2/LAVIS framework and tackles the noisy correspondence problem in CIR through three primary modules:
- ⚖️ **External Prior Arbitration:** Leverages an offline multimodal expert to generate reliable arbitration priors, bypassing the often-unreliable "small-loss hypothesis".
- 🧠 **Expert-Knowledge Internalization:** Transfers these priors into a lightweight proxy network to structurally prevent the memorization of ambiguous partial matches.
- 🔄 **Dual-Stream Reconciliation:** Dynamically integrates the internalized knowledge to provide robust online feedback, guiding the final representation learning.
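
As a minimal illustration of the reconciliation idea (a sketch under our own assumptions, not the repository's implementation): per-triplet losses can be re-weighted by blending the offline expert prior with the online proxy score, so suspected noisy triplets contribute less to training.

```python
# Illustrative sketch only -- `reconciled_loss`, `expert_priors`, and
# `proxy_scores` are hypothetical names, not part of the Air-Know codebase.
def reconciled_loss(per_sample_losses, expert_priors, proxy_scores, alpha=0.5):
    """Blend offline expert priors with online proxy scores (both are
    confidences in [0, 1] that a triplet is a true correspondence) and
    use the blend to re-weight each triplet's loss before averaging."""
    weights = [alpha * p + (1 - alpha) * q
               for p, q in zip(expert_priors, proxy_scores)]
    weighted = sum(w * l for w, l in zip(weights, per_sample_losses))
    return weighted / (sum(weights) or 1.0)
```

A triplet both streams trust (weight near 1) dominates the average, while a triplet both streams flag as noisy (weight near 0) is effectively ignored.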


### 4. Training Data Source
The model was primarily trained and evaluated on standard CIR datasets under various simulated noise ratios (e.g., 0.0, 0.2, 0.5, 0.8):
- **FashionIQ** (fashion domain)
- **CIRR** (open domain)


---


## 🚀 Usage & Basic Inference


These weights are designed to be used directly with the official Air-Know GitHub repository.


### Step 1: Prepare the Environment
Clone the GitHub repository and install the dependencies (evaluated on Python 3.8.10 and PyTorch 2.1.0 with CUDA 12.1+):
```bash
git clone https://github.com/ZhihFu/Air-Know
cd Air-Know
conda create -n airknow python=3.8 -y
conda activate airknow

# Install PyTorch
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core dependencies
pip install scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
```


### Step 2: Download Model Weights & Data
Download the checkpoint folders (e.g., `cirr_noise0.8` or `fashioniq_noise0.8`) from this Hugging Face repository and place them in your local `checkpoints/` directory.


Ensure you also download and structure the base dataset images (CIRR and FashionIQ) as specified in the [GitHub repo's Data Preparation section](https://github.com/ZhihFu/Air-Know).
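
If you script your own evaluation, a small helper along these lines can pick which checkpoint to load from such a folder (the function name and `*.pth` pattern are our assumptions; check the repository for the actual file layout):

```python
from pathlib import Path

def latest_checkpoint(ckpt_dir, pattern="*.pth"):
    """Return the most recently modified checkpoint file in `ckpt_dir`
    (e.g. checkpoints/cirr_noise0.8/), or None if none is found.
    Hypothetical helper -- not part of the official repository."""
    candidates = sorted(Path(ckpt_dir).glob(pattern),
                        key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None
```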


### Step 3: Run Testing / Inference
To generate prediction files on the CIRR dataset for submission to the CIRR Evaluation Server using the downloaded checkpoint, run:
```bash
python src/cirr_test_submission.py checkpoints/cirr_noise0.8/
```
*(The script automatically outputs a `.json` file based on the best checkpoint in the specified folder.)*
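
For local sanity checks before submitting (the server computes the official numbers), Recall@K can be computed generically as follows:

```python
def recall_at_k(ranked_lists, targets, k):
    """Fraction of queries whose ground-truth target appears in the top-k
    of its ranked candidate list. Generic retrieval metric, shown for
    illustration only; the CIRR server computes the official scores."""
    hits = sum(1 for ranking, target in zip(ranked_lists, targets)
               if target in ranking[:k])
    return hits / len(targets)
```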


To train the model under a specific noise ratio (e.g., `0.8`), run:
```bash
python train_BLIP2.py \
    --dataset cirr \
    --cirr_path "/path/to/CIRR/" \
    --model_dir "./checkpoints/cirr_noise0.8" \
    --noise_ratio 0.8 \
    --batch_size 256 \
    --num_epochs 20 \
    --lr 2e-5
```
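
The `--noise_ratio` flag injects synthetic mismatches into the training triplets. One common way to simulate this (an assumption on our part; the repository's exact corruption scheme may differ) is to shuffle the target images among a randomly chosen fraction of the triplets:

```python
import random

def inject_noise(target_ids, noise_ratio, seed=0):
    """Return a copy of `target_ids` in which roughly `noise_ratio` of the
    entries have been shuffled among themselves, creating mismatched
    (noisy) triplets. Illustrative sketch, not the repo's implementation.
    Note: the shuffle may leave a few chosen entries unchanged by chance."""
    rng = random.Random(seed)
    ids = list(target_ids)
    chosen = rng.sample(range(len(ids)), int(len(ids) * noise_ratio))
    shuffled = chosen[:]
    rng.shuffle(shuffled)
    for src, dst in zip(chosen, shuffled):
        ids[dst] = target_ids[src]
    return ids
```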


---


## ⚠️ Limitations & Notes


**Disclaimer:** This framework and its pre-trained weights are intended strictly for **academic research purposes**.
- Full evaluation requires access to the original source datasets (CIRR, FashionIQ); users must comply with those datasets' respective licenses.
- The `noise_ratio` parameter simulates interference during training; performance in wild, unstructured noisy environments may vary.


---


## 📝 Citation


If you find our work or these model weights useful in your research, please consider leaving a **Star** ⭐ on our GitHub repo and citing our paper:


```bibtex
@InProceedings{Air-Know,
  title     = {Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval},
  author    = {Fu, Zhiheng and Hu, Yupeng and Yang, Qianyun and Zhang, Shiqi and Chen, Zhiwei and Li, Zixu},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```

|