	---
	license: apache-2.0
	task_categories:
	- image-retrieval
	- vision-language-navigation
	tags:
	- composed-image-retrieval
	- robust-learning
	- optimal-transport
	- blip-2
	- cvpr-2026
	---

	<a id="top"></a>
	<div align="center">
	<h1>(CVPR 2026) ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for CIR (Model Weights)</h1>
	<div>
	<a target="_blank" href="https://lee-zixu.github.io/">Zixu Li</a><sup>1</sup>,
	<a target="_blank" href="https://faculty.sdu.edu.cn/huyupeng1/zh_CN/index.htm">Yupeng Hu</a><sup>1&#9993;</sup>,
	<a target="_blank" href="https://zivchen-ty.github.io/">Zhiwei Chen</a><sup>1</sup>,
	<a target="_blank" href="https://zh-mingyu.github.io/">Mingyu Zhang</a><sup>1</sup>,
	<a target="_blank" href="https://zhihfu.github.io/">Zhiheng Fu</a><sup>1</sup>,
	<a target="_blank" href="https://liqiangnie.github.io">Liqiang Nie</a><sup>2</sup>
	</div>
	<sup>1</sup>School of Software, Shandong University <br>
	<sup>2</sup>School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
	<br />
	<sup>&#9993;</sup>Corresponding author
	<br/>
	<p>
	<a href="https://cvpr.thecvf.com/"><img src="https://img.shields.io/badge/CVPR-2026-blue.svg?style=flat-square" alt="CVPR 2026"></a>
	<a href="https://arxiv.org/abs/coming soon"><img alt='arXiv' src="https://img.shields.io/badge/arXiv-Coming.Soon-b31b1b.svg"></a>
	<a href="https://lee-zixu.github.io/ConeSep.github.io/"><img alt='Project Page' src="https://img.shields.io/badge/Website-orange"></a>
	<a href="https://github.com/Lee-zixu/ConeSep"><img alt='GitHub' src="https://img.shields.io/badge/GitHub-Repository-black?style=flat-square&logo=github"></a>
	</p>
	</div>

	This repository hosts the official pre-trained checkpoints for ConeSep, a robust noise-unlearning framework that leverages geometric boundary estimation and optimal transport to solve the Noisy Triplet Correspondence (NTC) problem in Composed Image Retrieval (CIR).

	---

	## 📌 Model Information

	### 1. Model Name
	ConeSep (Cone-based robust noisE-unlearning comPositional network) Checkpoints.

	### 2. Task Type & Applicable Tasks
	- Task Type: Composed Image Retrieval (CIR).
	- Applicable Tasks: Retrieving target images based on a reference image and a modification text. These weights are trained to stay robust when the training triplets contain varying ratios of noise (Noisy Triplet Correspondence).

	### 3. Project Introduction
	Existing Composed Image Retrieval methods struggle with the "Noisy Triplet Correspondence (NTC)" problem, leading to Modality Suppression, Negative Anchor Deficiency, and Unlearning Backlash. ConeSep actively perceives, structurally models, and precisely "unlearns" noise through three core modules:
	- 📐 Geometric Fidelity Quantization (GFQ): Estimates a noise boundary using cone space geometric separability to quantify sample fidelity.
	- 🛑 Negative Boundary Learning (NBL): Learns a "diagonal negative combination" for each query as an explicit semantic opposite-anchor.
	- 🎯 Boundary-based Targeted Unlearning (BTU): Models noise correction as an Optimal Transport (OT) problem to execute precise unlearning without backlash on clean samples.

	### 4. Training Data Source & Hosted Weights
	The models were trained on the FashionIQ and CIRR datasets across different simulated noise ratios ($N \in \{0.2, 0.5, 0.8\}$). This Hugging Face repository provides the corresponding `.pt` checkpoint files organized by dataset and noise ratio:

	* 📂 `fashioniq/`
	  * `ConeSep-FIQ_N0.2.pt` (Trained with 20% noise)
	  * `ConeSep-FIQ_N0.5.pt` (Trained with 50% noise)
	  * `ConeSep-FIQ_N0.8.pt` (Trained with 80% noise)
	* 📂 `cirr/`
	  * `ConeSep-CIRR_N0.2.pt` (Trained with 20% noise)
	  * `ConeSep-CIRR_N0.5.pt` (Trained with 50% noise)
	  * `ConeSep-CIRR_N0.8.pt` (Trained with 80% noise)
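
The naming scheme above is regular enough to build file paths from a dataset name and a noise ratio. The following is a minimal sketch of that mapping, assuming the files sit in the `fashioniq/` and `cirr/` folders exactly as listed; the function name `checkpoint_path` is hypothetical and not part of the ConeSep code.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of the ConeSep repo):
# print the repo-relative path of a checkpoint for a dataset and noise ratio.
checkpoint_path() {
  local dataset="$1"  # fashioniq or cirr
  local ratio="$2"    # 0.2, 0.5, or 0.8
  local tag
  case "$dataset" in
    fashioniq) tag="FIQ" ;;
    cirr)      tag="CIRR" ;;
    *) echo "unknown dataset: $dataset" >&2; return 1 ;;
  esac
  echo "${dataset}/ConeSep-${tag}_N${ratio}.pt"
}
```

For example, `checkpoint_path cirr 0.5` prints `cirr/ConeSep-CIRR_N0.5.pt`.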

	---

	## 🚀 Usage & Basic Inference

	These weights are designed to be evaluated out-of-the-box using the official [ConeSep GitHub repository](https://github.com/iLearn-Lab/CVPR26-ConeSep).

	### Step 1: Prepare the Environment
	Clone the GitHub repository and set up the environment:
	```bash
	# Clone into a directory named ConeSep so the paths used below match
	git clone https://github.com/iLearn-Lab/CVPR26-ConeSep ConeSep
	cd ConeSep
	conda create -n conesep python=3.8
	conda activate conesep
	pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
	pip install scikit-learn==1.3.2 transformers==4.25.0 salesforce-lavis==1.0.2 timm==0.9.16
	```

	### Step 2: Download Model Weights
	Download the specific `.pt` files you need from this Hugging Face repository and place them into a `checkpoints/` directory within your cloned repo. For example, to evaluate the CIRR model trained with 50% noise:

	```text
	ConeSep/
	└── checkpoints/
	    └── cirr_noise0.5/
	        └── best_model.pt  <-- (Rename the downloaded ConeSep-CIRR_N0.5.pt to best_model.pt)
	```
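
The copy-and-rename step can be scripted. The following is a minimal sketch, assuming the checkpoint has already been downloaded to a local path; the function name `place_checkpoint` is hypothetical and not part of the ConeSep repository.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of the ConeSep repo):
# copy a downloaded checkpoint into a run directory as best_model.pt.
place_checkpoint() {
  local src="$1"      # path to the downloaded .pt file
  local run_dir="$2"  # e.g. checkpoints/cirr_noise0.5
  if [ ! -f "$src" ]; then
    echo "checkpoint not found: $src" >&2
    return 1
  fi
  mkdir -p "$run_dir"
  cp "$src" "$run_dir/best_model.pt"
}
```

Run from the root of the cloned repo, `place_checkpoint ConeSep-CIRR_N0.5.pt checkpoints/cirr_noise0.5` would produce the layout shown above.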

	### Step 3: Run Testing / Evaluation
	To generate prediction files on the CIRR dataset for the [CIRR Evaluation Server](https://cirr.cecs.anu.edu.au/), run:

	```bash
	# Example for testing the CIRR 50% noise model
	python src/cirr_test_submission.py checkpoints/cirr_noise0.5/
	```
	(The script automatically generates the `.json` prediction files required for online evaluation from the given checkpoint.)
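
To evaluate all three CIRR noise ratios, the same command can be repeated per checkpoint directory. The following is a dry-run sketch that only prints the commands. It assumes the other checkpoints are placed in directories that follow the `cirr_noise<ratio>` pattern of the 50% example, which is a layout assumption rather than something the repository requires; `cirr_eval_cmd` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
set -eu

# Assumption: each CIRR checkpoint is stored as
# checkpoints/cirr_noise<ratio>/best_model.pt, mirroring the 50% example.
cirr_eval_cmd() {
  local ratio="$1"
  echo "python src/cirr_test_submission.py checkpoints/cirr_noise${ratio}/"
}

# Dry run: print one evaluation command per hosted noise ratio.
for ratio in 0.2 0.5 0.8; do
  cirr_eval_cmd "$ratio"
done
```

Printing the commands first makes it easy to check the paths before running them from the repository root.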

	---

	## ⚠️ Limitations & Notes

	- Hardware Requirements: ConeSep is built on the BLIP-2 architecture, so inference and training should be run on GPUs with ample memory (e.g., NVIDIA A40 48GB or V100 32GB).
	- Intended Use: These weights are intended for academic research, robustness evaluation, and reproducing the results reported in the CVPR 2026 paper.

	---

	## 📝⭐️ Citation

	If you find our framework, code, or these weights useful in your research, please consider leaving a Star ⭐️ on our GitHub repository and citing our CVPR 2026 paper:

	```bibtex
	@InProceedings{ConeSep,
	  title={ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval},
	  author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Zhang, Mingyu and Fu, Zhiheng and Nie, Liqiang},
	  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
	  year={2026}
	}
	```