(AAAI 2026) HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval (Model Weights)
¹School of Software, Shandong University. Corresponding author.
This repository hosts the official pre-trained checkpoints for HABIT, a highly robust progressive learning framework designed to tackle the Noise Triplet Correspondence (NTC) problem in Composed Image Retrieval (CIR).
Model Information
1. Model Name
HABIT (cHrono-synergiA roBust progressIve learning framework for composed image reTrieval) Checkpoints.
2. Task Type & Applicable Tasks
- Task Type: Composed Image Retrieval (CIR) / Vision-Language Retrieval.
- Applicable Tasks: Retrieving a target image given a reference image and a modification text. These weights were trained to remain robust when the training triplets are noisy (Noise Triplet Correspondence).
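At retrieval time, a CIR model typically encodes the reference image and the modification text into a single query embedding and ranks candidate images by similarity to it. The sketch below shows only that generic ranking step, using cosine similarity on toy vectors; the embeddings are hypothetical and the code does not call BLIP-2 or HABIT's actual encoders or scoring function.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query: Sequence[float], gallery: List[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Hypothetical 3-d embeddings: one fused (reference image + text) query
# and a gallery of three candidate target images.
query = [0.9, 0.1, 0.0]
gallery = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.7, 0.7, 0.0]]
print(rank_gallery(query, gallery))  # [1, 2, 0]
```

Recall@K then simply checks whether the ground-truth target index appears among the first K entries of this ranking.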
3. Project Introduction
Existing Composed Image Retrieval (CIR) methods often suffer from the "Noise Triplet Correspondence (NTC)" problem in real-world scenarios, struggling to precisely estimate composed semantic discrepancies. HABIT effectively addresses this through:
- Mutual Knowledge Estimation (MKE): Quantifies sample cleanliness by computing the transition rate of mutual knowledge.
- Dual-consistency Progressive Learning (DPL): A collaborative mechanism between historical and current models that mimics human habit formation, retaining good habits and calibrating bad ones.
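MKE and DPL are defined precisely in the paper and are not reproduced here. Purely to illustrate the general idea of letting a historical and a current model snapshot jointly decide which triplets to trust, the toy sketch below marks a triplet as clean only when both snapshots give it a high matching score. The scores, the threshold, the min-based rule, and the `split_by_consistency` helper are all assumptions chosen for readability, not HABIT's transition-rate formulation.

```python
from typing import List, Tuple


def split_by_consistency(
    hist_scores: List[float],
    curr_scores: List[float],
    threshold: float,
) -> Tuple[List[int], List[int]]:
    """Toy stand-in, NOT the paper's MKE/DPL formulation.

    hist_scores[i] and curr_scores[i] are hypothetical matching scores that a
    historical and the current model snapshot assign to training triplet i.
    A triplet is treated as 'clean' only if both snapshots score it at or
    above the threshold; everything else is flagged as 'suspect'.
    """
    clean, suspect = [], []
    for i, (h, c) in enumerate(zip(hist_scores, curr_scores)):
        if min(h, c) >= threshold:
            clean.append(i)
        else:
            suspect.append(i)
    return clean, suspect


clean, suspect = split_by_consistency([0.9, 0.2, 0.8, 0.6], [0.85, 0.3, 0.4, 0.7], 0.5)
print(clean, suspect)  # [0, 3] [1, 2]
```

Requiring agreement from both snapshots is a conservative choice: a triplet the current model suddenly likes but the historical model did not is held back rather than trusted.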
Built on the BLIP-2 architecture, HABIT achieves state-of-the-art (SOTA) retrieval performance across different noise ratios.
4. Training Data Source & Hosted Weights
The models were trained on the FashionIQ and CIRR datasets under simulated noise ratios $N \in \{0.2, 0.5, 0.8\}$. This Hugging Face repository provides the corresponding .pt checkpoint files, organized by dataset:
- fiq/
  - HABIT-FIQ_N0.2.pt (trained on FashionIQ with 20% noise)
  - HABIT-FIQ_N0.5.pt (trained on FashionIQ with 50% noise)
  - HABIT-FIQ_N0.8.pt (trained on FashionIQ with 80% noise)
- cirr/
  - HABIT-CIRR_N0.2.pt (trained on CIRR with 20% noise)
  - HABIT-CIRR_N0.5.pt (trained on CIRR with 50% noise)
  - HABIT-CIRR_N0.8.pt (trained on CIRR with 80% noise)
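Because the file names follow a fixed pattern, a small helper can map a dataset and noise ratio to the checkpoint's path in this repository. This is a minimal sketch based only on the file list above; the `checkpoint_path` function and the `"fashioniq"` / `"cirr"` keys are hypothetical. Noise ratios are passed as strings to avoid float-formatting surprises such as `0.5000000001`.

```python
from pathlib import PurePosixPath

# Naming scheme taken from the file list above: <folder>/HABIT-<TAG>_N<ratio>.pt
_DATASETS = {"fashioniq": ("fiq", "FIQ"), "cirr": ("cirr", "CIRR")}
_NOISE_RATIOS = ("0.2", "0.5", "0.8")


def checkpoint_path(dataset: str, noise: str) -> str:
    """Return the repo-relative path of the checkpoint for a dataset/noise pair."""
    key = dataset.lower()
    if key not in _DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(_DATASETS)}")
    if noise not in _NOISE_RATIOS:
        raise ValueError(f"unknown noise ratio {noise!r}; expected one of {_NOISE_RATIOS}")
    folder, tag = _DATASETS[key]
    return str(PurePosixPath(folder) / f"HABIT-{tag}_N{noise}.pt")


print(checkpoint_path("CIRR", "0.5"))  # cirr/HABIT-CIRR_N0.5.pt
```

The returned string could be passed as the `filename` argument of `huggingface_hub.hf_hub_download`, together with this repository's ID (not stated in this card).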
Usage & Basic Inference
These weights are intended to be evaluated with the official HABIT GitHub repository.
Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
git clone https://github.com/iLearn-Lab/AAAI26-HABIT
cd AAAI26-HABIT
conda create -n habit python=3.8 -y
conda activate habit
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install open-clip-torch==2.24.0 scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
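After installation, one way to confirm that the environment matches the pinned versions is to query the installed distribution metadata. This is a minimal sketch using only the standard library's `importlib.metadata` (available from Python 3.8); the `check_versions` helper is hypothetical and not part of the HABIT repository. Distribution-name lookup can differ slightly between Python versions, so a package reported as missing may need its underscore spelling (e.g. `open_clip_torch`).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Pins copied from the pip commands above (pip distribution names).
PINNED: Dict[str, str] = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "open-clip-torch": "2.24.0",
    "scikit-learn": "1.3.2",
    "transformers": "4.25.0",
    "salesforce-lavis": "1.0.2",
    "timm": "0.9.16",
}


def check_versions(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every pin that does not match.

    installed is None when the package is not installed at all. A local
    build suffix such as "+cu121" is ignored when comparing.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, expected in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


print(check_versions(PINNED) or "all pinned versions match")
```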
Step 2: Download Model Weights
Download the specific .pt files you wish to evaluate from this Hugging Face repository. Place them into a checkpoints/ directory within your cloned GitHub repo. For example, to evaluate the CIRR model trained with 50% noise:
AAAI26-HABIT/
└── checkpoints/
    └── cirr_noise0.5/
        └── HABIT-CIRR_N0.5.pt   <-- (Rename to best_model.pt if your test script requires it)
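The copy-and-rename step above can also be scripted. This is a minimal sketch assuming the .pt file has already been downloaded to a local path; the `install_checkpoint` helper and its parameters are hypothetical, and whether `best_model.pt` is needed depends on the test script you use.

```python
import shutil
from pathlib import Path


def install_checkpoint(downloaded: Path, repo_root: Path, run_name: str,
                       rename_to_best: bool = False) -> Path:
    """Copy a downloaded .pt file into <repo_root>/checkpoints/<run_name>/.

    If rename_to_best is True, the copy is saved as best_model.pt instead of
    keeping its original file name. Returns the path of the copied file.
    """
    target_dir = repo_root / "checkpoints" / run_name
    target_dir.mkdir(parents=True, exist_ok=True)  # create checkpoints/<run_name>/ if missing
    target_name = "best_model.pt" if rename_to_best else downloaded.name
    target = target_dir / target_name
    shutil.copy2(downloaded, target)  # copies contents and file metadata
    return target
```

For example, `install_checkpoint(Path("HABIT-CIRR_N0.5.pt"), Path("AAAI26-HABIT"), "cirr_noise0.5")` reproduces the layout shown above, assuming the downloaded file sits in the current directory.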
Step 3: Run Testing / Evaluation
To generate prediction files on the CIRR dataset for the CIRR Evaluation Server, point the test script to the directory containing your downloaded checkpoint:
# Example for testing the CIRR 50% noise model
python src/cirr_test_submission.py checkpoints/cirr_noise0.5/
(The script automatically writes .json prediction files, generated from the checkpoint, for submission to the online evaluation server.)
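To evaluate all three CIRR noise levels in one go, the same command can be built in a loop. This is a minimal sketch assuming it is run from the repository root and that each checkpoint was placed under `checkpoints/cirr_noise<ratio>/`; only the `cirr_noise0.5` directory appears in this card, so the other two names and the `run_all` helper are hypothetical. Using `sys.executable` runs the script with the currently active (e.g. `habit` conda) interpreter.

```python
import subprocess
import sys
from typing import List, Sequence


def build_test_command(checkpoint_dir: str) -> List[str]:
    """Command line for src/cirr_test_submission.py on one checkpoint directory."""
    return [sys.executable, "src/cirr_test_submission.py", checkpoint_dir]


def run_all(noise_ratios: Sequence[str] = ("0.2", "0.5", "0.8"),
            dry_run: bool = True) -> List[List[str]]:
    """Build one test command per noise ratio; execute them only if dry_run is False."""
    commands = [build_test_command(f"checkpoints/cirr_noise{n}/") for n in noise_ratios]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a run fails
    return commands


# Dry run: print the commands without executing them.
for cmd in run_all():
    print(" ".join(cmd[1:]))
```

Defaulting to a dry run makes it easy to check the directory names before launching several GPU-heavy jobs.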
Limitations & Notes
- Hardware Requirements: Because HABIT is built on the BLIP-2 architecture, inference and further fine-tuning require GPUs with substantial memory; an NVIDIA A40 (48 GB) or V100 (32 GB) is recommended.
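A rough way to reason about GPU memory is to multiply the parameter count by the bytes per parameter. This back-of-envelope sketch covers only the weights; the parameter count of HABIT's BLIP-2 configuration is not given in this card, so you would supply it yourself (for a loaded PyTorch model, `sum(p.numel() for p in model.parameters())`). The 1-billion figure below is purely illustrative.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(num_params: float, dtype: str = "fp32") -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes).

    Activations, gradients, optimizer state, and CUDA context are NOT
    included, so real usage (especially when fine-tuning) is higher.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Hypothetical 1-billion-parameter model, for illustration only.
print(weight_memory_gb(1e9, "fp32"))  # 4.0
print(weight_memory_gb(1e9, "fp16"))  # 2.0
```

For full fine-tuning with Adam in fp32, a common rule of thumb is about 16 bytes per parameter (weights, gradients, and two optimizer moments) before activations, which is why fine-tuning needs far more memory than inference.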
- Intended Use: These weights are provided for academic research and to facilitate reproducibility of the AAAI 2026 paper.
Citation
If you find our work, code, or these model weights useful in your research, please consider starring our GitHub repository and citing our paper:
@inproceedings{HABIT,
title={HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval},
author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Zhang, Shiqi and Huang, Qinlei and Fu, Zhiheng and Wei, Yinwei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}