---
license: cc-by-4.0
datasets:
- HaHaJun1101/OACIRR
base_model:
- Salesforce/blip2-itm-vit-g
- Salesforce/blip2-itm-vit-g-coco
library_name: pytorch
tags:
- composed-image-retrieval
- object-anchored
- image-retrieval
- vision-language
- multimodal
- cvpr2026
---

# **🔍 Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval (CVPR 2026)**

[**📖 Paper (arXiv)**](https://arxiv.org) | [**🌐 Homepage**](https://hahajun1101.github.io/OACIR/) | [**🐙 Code (GitHub)**](https://github.com/HaHaJun1101/OACIR) | [**🤗 Dataset (OACIRR)**](https://huggingface.co/datasets/HaHaJun1101/OACIRR) | <a href="#downloading-the-adafocal-weights" style="color: red;">**🛜 Download Weights Now 👇**</a>

---

## 🔔 News
- **🔥 [2026-04-07]: The *AdaFocal* model checkpoints are officially released and are now available for use!**
- **🔥 [2026-04-03]: The full training/evaluation code is officially released on GitHub!**
- **🔥 [2026-03-25]: The OACIRR benchmark is officially released on Hugging Face!**
- **🎉 [2026-02-21]: Our paper "Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval" has been accepted to CVPR 2026!**

---
## 🤖 Model Description

- **Architecture: ViT-G (EVA-CLIP) + BLIP-2 Q-Former + Context-Aware Attention Modulator (CAAM)**
- **Task: Fine-grained Composed Image Retrieval (CIR) with Instance-level Consistency**
- **Training Data: Exclusively trained on the [OACIRR Union Dataset](https://huggingface.co/datasets/HaHaJun1101/OACIRR)**

---
## ⚙️ AdaFocal Framework

To address the core challenges of the OACIR task, we propose **AdaFocal**, an effective framework that dynamically modulates visual attention for precise, instance-level retrieval. Our approach augments a multimodal fusion backbone with a lightweight **Context-Aware Attention Modulator (CAAM)**, enabling a nuanced balance between instance fidelity and compositional reasoning.

<p align="left">
  <img
    src="https://huggingface.co/datasets/HaHaJun1101/OACIRR/resolve/main/figures/AdaFocal_framework.png"
    width="90%"
    alt="AdaFocal Framework Overview"
  />
</p>

Specifically, **AdaFocal** employs a two-stage reasoning process: *Contextual Perception* and *Adaptive Focus*. It first perceives the query's compositional context to predict a modulation scalar (β). This learned signal then drives an attention activation mechanism, which explicitly and adaptively intensifies the visual focus on the user-specified instance region (provided via bounding box) during multimodal feature fusion.

By dynamically re-weighting the attention distribution, **AdaFocal** seamlessly synthesizes the anchored instance, the global visual scene, and the textual modification into a coherent representation, establishing a robust and flexible baseline for identity-preserving retrieval.

---
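To make the two-stage idea concrete, here is a minimal PyTorch sketch of a CAAM-style modulator: a small head predicts β from the pooled query context, and β additively boosts the cross-attention logits of the anchored-region tokens before the softmax. All layer names, shapes, and sizes are illustrative assumptions; the actual AdaFocal implementation lives in the GitHub codebase.

```python
import torch
import torch.nn as nn


class ContextAwareAttentionModulator(nn.Module):
    """Illustrative CAAM-style sketch (hypothetical layer names/sizes)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Stage 1 (Contextual Perception): predict the modulation scalar
        # beta from the pooled compositional-query representation.
        self.beta_head = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.GELU(),
            nn.Linear(dim // 2, 1),
            nn.Softplus(),  # keep beta strictly positive
        )

    def forward(self, attn_logits, query_ctx, region_mask):
        # attn_logits: (B, Q, N) cross-attention logits over N visual tokens
        # query_ctx:   (B, D) pooled query (reference image + text) context
        # region_mask: (B, N) 1.0 for tokens inside the anchored bounding box
        beta = self.beta_head(query_ctx)  # (B, 1)
        # Stage 2 (Adaptive Focus): boost the logits of the user-specified
        # instance region, then renormalize with a softmax.
        boosted = attn_logits + beta.unsqueeze(1) * region_mask.unsqueeze(1)
        return torch.softmax(boosted, dim=-1)
```

Because β is strictly positive, the softmax mass on the anchored-region tokens can only increase relative to the unmodulated attention, which matches the "adaptively intensifies the visual focus" behavior described above.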
## 🚀 How to Use

<a name="downloading-the-adafocal-weights"></a>

### 1. Download the AdaFocal Weights

You can download the checkpoints using Git LFS:
```bash
cd OACIR
git lfs install
git clone https://huggingface.co/HaHaJun1101/AdaFocal ./checkpoints
```

Alternatively, download them via the Hugging Face Python API:
```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="HaHaJun1101/AdaFocal", local_dir="OACIR/checkpoints", repo_type="model")
```
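After downloading, you can sanity-check a checkpoint with plain PyTorch. This sketch assumes the `.pt` files are standard state dicts, possibly wrapped under a `state_dict` or `model` key (an assumption, not confirmed here); the authoritative loading logic is in the GitHub codebase.

```python
import torch


def load_adafocal(path: str, device: str = "cpu") -> dict:
    # Load onto CPU first to avoid GPU memory pressure on large state dicts.
    state = torch.load(path, map_location=device, weights_only=True)
    # Unwrap common wrapper keys if present (hypothetical layout).
    for key in ("state_dict", "model"):
        if isinstance(state, dict) and key in state:
            state = state[key]
    return state


# Example: inspect parameter names before evaluation
# sd = load_adafocal("OACIR/checkpoints/adafocal_scalar.pt")
# print(sorted(sd.keys())[:5])
```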

### 2. Run Evaluation via the Official Codebase

Once downloaded, you can directly evaluate the models using the `evaluate.sh` script provided in our GitHub codebase. Open `evaluate.sh` and set the path to your downloaded weights:
```bash
# Inside evaluate.sh
DATASET="Fashion"
MODEL_NAME="oacir_adafocal"
MODEL_WEIGHT="./checkpoints/adafocal_scalar.pt" # or adafocal_vector.pt
```
Then execute the script:
```bash
bash evaluate.sh
```

---
## 🏆 Model Performance on OACIRR

We provide two variants of the **AdaFocal** weights. You can instantly reproduce the following results using our provided `evaluate.sh` script.

| Model Variant | Component Type | R<sub>ID</sub>@1 (Avg) | R@1 (Avg) | R@5 (Avg) | Overall Avg | Weights File |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| **AdaFocal (Scalar β)** | Default Configuration | 81.52 | 63.08 | 90.98 | **78.53** | [📥 `adafocal_scalar.pt`](https://huggingface.co/HaHaJun1101/AdaFocal/resolve/main/adafocal_scalar.pt) |
| **AdaFocal (Vector β)** | Vector Ablation | 81.99 | 63.06 | 91.35 | **78.80** | [📥 `adafocal_vector.pt`](https://huggingface.co/HaHaJun1101/AdaFocal/resolve/main/adafocal_vector.pt) |

*Detailed breakdowns across the 4 domains:*

| Variant | <font color=#990000>Fashion</font> (R<sub>ID</sub>@1 / R@1) | <font color=#CC3300>Car</font> (R<sub>ID</sub>@1 / R@1) | <font color=#003399>Product</font> (R<sub>ID</sub>@1 / R@1) | <font color=#006633>Landmark</font> (R<sub>ID</sub>@1 / R@1) |
|:---|:---:|:---:|:---:|:---:|
| **Scalar β** | 73.68 / 64.45 | 78.39 / 54.85 | 91.36 / 73.85 | 82.65 / 59.18 |
| **Vector β** | 75.71 / 65.97 | 77.97 / 54.35 | 91.39 / 73.30 | 82.90 / 58.63 |
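The R@K numbers above follow the standard Recall@K definition: the fraction of queries whose ground-truth target appears among the top-K retrieved gallery items (R<sub>ID</sub>@K applies the same rule at the instance-identity level). A generic NumPy sketch of the metric, not the official evaluation code (`sim` and `gt` are illustrative names):

```python
import numpy as np


def recall_at_k(sim: np.ndarray, gt: np.ndarray, k: int) -> float:
    """sim: (num_queries, num_gallery) similarity matrix.
    gt:  (num_queries,) index of each query's ground-truth gallery item.
    Returns the fraction of queries whose target is in the top-k results."""
    # Sort gallery items by descending similarity, keep the top k.
    topk = np.argsort(-sim, axis=1)[:, :k]
    hits = (topk == gt[:, None]).any(axis=1)
    return float(hits.mean())
```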

---

## ✒️ Citation

If you find our dataset, models, or code useful in your research, please consider citing our paper.