(AAAI 2026) HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval (Model Weights)

Zixu Li¹, Yupeng Hu¹✉, Zhiwei Chen¹, Shiqi Zhang¹, Qinlei Huang¹, Zhiheng Fu¹, Yinwei Wei¹

¹School of Software, Shandong University
✉ Corresponding author

This repository hosts the official pre-trained checkpoints for HABIT, a highly robust progressive learning framework designed to tackle the Noise Triplet Correspondence (NTC) problem in Composed Image Retrieval (CIR).

📌 Model Information

1. Model Name

HABIT (cHrono-synergiA roBust progressIve learning framework for composed image reTrieval) Checkpoints.

2. Task Type & Applicable Tasks

Task Type: Composed Image Retrieval (CIR) / Vision-Language Retrieval.
Applicable Tasks: Retrieving target images based on a reference image and a modification text. These weights are specifically robust against noisy training data (Noise Triplet Correspondence).

3. Project Introduction

Existing Composed Image Retrieval (CIR) methods often suffer from the "Noise Triplet Correspondence (NTC)" problem in real-world scenarios, struggling to precisely estimate composed semantic discrepancies. HABIT effectively addresses this through:

🧠 Mutual Knowledge Estimation (MKE): Quantifies sample cleanliness by computing the transition rate of mutual knowledge.
⏳ Dual-consistency Progressive Learning (DPL): A collaborative mechanism between historical and current models to simulate human habit formation (retaining good habits, calibrating bad ones).

Based on the BLIP-2 architecture, HABIT maintains State-of-the-Art (SOTA) retrieval performance under various noise ratios.

4. Training Data Source & Hosted Weights

The models were trained on the FashionIQ and CIRR datasets under varying simulated noise ratios ($N \in \{0.2, 0.5, 0.8\}$). This Hugging Face repository provides the corresponding .pt checkpoint files, organized by dataset:

📂 fiq/
- HABIT-FIQ_N0.2.pt (Trained on FashionIQ with 20% noise)
- HABIT-FIQ_N0.5.pt (Trained on FashionIQ with 50% noise)
- HABIT-FIQ_N0.8.pt (Trained on FashionIQ with 80% noise)
📂 cirr/
- HABIT-CIRR_N0.2.pt (Trained on CIRR with 20% noise)
- HABIT-CIRR_N0.5.pt (Trained on CIRR with 50% noise)
- HABIT-CIRR_N0.8.pt (Trained on CIRR with 80% noise)
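
Because the naming scheme above is regular, the in-repo path of a checkpoint can be derived from the dataset and noise ratio. The following is a minimal sketch, assuming the folder and file names are exactly as listed; the helper name `checkpoint_path` is hypothetical and not part of the HABIT codebase.

```python
# Hypothetical helper: not part of the official HABIT code.
# Folder and file names follow the listing in this model card.
_DATASETS = {"fiq": "FIQ", "cirr": "CIRR"}  # repo folder -> tag used in the file name
_NOISE_RATIOS = ("0.2", "0.5", "0.8")       # noise ratios with a hosted checkpoint


def checkpoint_path(dataset: str, noise: str) -> str:
    """Return the in-repo path of a checkpoint, e.g. 'cirr/HABIT-CIRR_N0.5.pt'."""
    key = dataset.lower()
    if key not in _DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(_DATASETS)}")
    if noise not in _NOISE_RATIOS:
        raise ValueError(f"no checkpoint for noise ratio {noise!r}; expected one of {_NOISE_RATIOS}")
    return f"{key}/HABIT-{_DATASETS[key]}_N{noise}.pt"


if __name__ == "__main__":
    # Print every path listed above.
    for ds in _DATASETS:
        for n in _NOISE_RATIOS:
            print(checkpoint_path(ds, n))
```

The noise ratio is passed as a string so that the file name is reproduced exactly, without relying on float formatting.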

🚀 Usage & Basic Inference

These weights are intended to be evaluated with the official HABIT GitHub repository.

Step 1: Prepare the Environment

Clone the GitHub repository and install dependencies:

git clone https://github.com/iLearn-Lab/AAAI26-HABIT HABIT
cd HABIT
conda create -n habit python=3.8 -y
conda activate habit
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16

Step 2: Download Model Weights

Download the specific .pt files you wish to evaluate from this Hugging Face repository. Place them into a checkpoints/ directory within your cloned GitHub repo. For example, to evaluate the CIRR model trained with 50% noise:

HABIT/
└── checkpoints/
    └── cirr_noise0.5/
        └── HABIT-CIRR_N0.5.pt  <-- (Rename to best_model.pt if required by your specific test script)
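
The download and placement described in this step could also be scripted. The sketch below is illustrative only: `REPO_ID` is a placeholder that must be replaced with the id of this Hugging Face repository, the helper names are hypothetical, and it assumes a `huggingface_hub` version whose `hf_hub_download` accepts `local_dir`.

```python
import shutil
from pathlib import Path
from typing import Optional

# Placeholder (assumption): replace with the id of this Hugging Face repository.
REPO_ID = "<namespace>/<this-repo>"


def download_checkpoint(filename: str, download_dir: str = "hf_downloads") -> Path:
    """Fetch one file from the Hub and return its local path."""
    # Imported lazily so the staging helper below works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=download_dir))


def stage_checkpoint(src: Path, repo_root: Path, run_name: str,
                     rename_to: Optional[str] = None) -> Path:
    """Copy a downloaded .pt file into <repo_root>/checkpoints/<run_name>/."""
    dest_dir = Path(repo_root) / "checkpoints" / run_name
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original file name unless the test script expects another one.
    dest = dest_dir / (rename_to or Path(src).name)
    shutil.copy2(src, dest)
    return dest


if __name__ == "__main__":
    pt_file = download_checkpoint("cirr/HABIT-CIRR_N0.5.pt")
    print(stage_checkpoint(pt_file, Path("HABIT"), "cirr_noise0.5"))
```

Pass `rename_to="best_model.pt"` to `stage_checkpoint` if your test script expects that file name.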

Step 3: Run Testing / Evaluation

To generate prediction files on the CIRR dataset for the CIRR Evaluation Server, point the test script to the directory containing your downloaded checkpoint:

# Example for testing the CIRR 50% noise model
python src/cirr_test_submission.py checkpoints/cirr_noise0.5/

(The script will automatically output .json files based on the checkpoint for online evaluation.)
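
A small pre-flight check can catch a missing checkpoint before the model is loaded. This is a hedged sketch rather than part of the official code: the helper names are hypothetical, and it assumes it is run from the root of the cloned repository so that `src/cirr_test_submission.py` resolves.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def find_checkpoints(ckpt_dir: Path) -> List[Path]:
    """List the .pt files directly inside ckpt_dir, sorted by name."""
    return sorted(Path(ckpt_dir).glob("*.pt"))


def build_command(ckpt_dir: Path) -> List[str]:
    """Mirror the documented call: python src/cirr_test_submission.py <dir>/"""
    return [sys.executable, "src/cirr_test_submission.py", f"{Path(ckpt_dir).as_posix()}/"]


def run_cirr_submission(ckpt_dir: Path) -> None:
    """Fail early if no checkpoint is present, then run the test script."""
    if not find_checkpoints(ckpt_dir):
        raise FileNotFoundError(f"no .pt file found in {ckpt_dir}")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_command(ckpt_dir), check=True)


if __name__ == "__main__":
    run_cirr_submission(Path("checkpoints/cirr_noise0.5"))
```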

⚠️ Limitations & Notes

Hardware Requirements: HABIT is built on the BLIP-2 architecture, so inference and further fine-tuning require GPUs with ample memory (an NVIDIA A40 48 GB or V100 32 GB is recommended).
Intended Use: These weights are provided for academic research and to facilitate reproducibility of the AAAI 2026 paper.
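
To compare the available hardware against the recommendation above, the visible GPUs and their total memory can be listed with PyTorch. This is a minimal sketch with hypothetical helper names. It only reports memory and does not encode a minimum, since this card recommends specific cards rather than stating a hard threshold; the total reported by the driver can be slightly below a card's nominal size.

```python
from typing import List, Tuple


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB, rounded to one decimal place."""
    return round(num_bytes / 1024**3, 1)


def gpu_report() -> List[Tuple[str, float]]:
    """Return (name, total memory in GiB) for each visible CUDA device."""
    import torch  # imported lazily so to_gib works without PyTorch installed

    if not torch.cuda.is_available():
        return []
    report = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        report.append((props.name, to_gib(props.total_memory)))
    return report


if __name__ == "__main__":
    for name, gib in gpu_report() or [("no CUDA device found", 0.0)]:
        print(f"{name}: {gib} GiB")
```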

📝⭐️ Citation

If you find our work, code, or these model weights useful in your research, please consider leaving a Star ⭐️ on our GitHub repository and citing our paper:

@inproceedings{HABIT,
  title={HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval},
  author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Zhang, Shiqi and Huang, Qinlei and Fu, Zhiheng and Wei, Yinwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
