# Seeing Through Words: Controlling Visual Retrieval Quality with Language Models

<div align="center">

[![Paper](https://img.shields.io/badge/Paper-ICLR_2026-blue)](https://openreview.net/forum?id=yOEmEXmbV8)
[![HuggingFace](https://img.shields.io/badge/🤗_HuggingFace-Model-yellow)](https://huggingface.co/Johnny050407/QCQC/)

[Jianglin Lu](https://jianglin954.github.io/), [Simon Jenni](https://sjenni.github.io/), [Kushal Kafle](https://kushalkafle.com/), [Jing Shi](https://jshi31.github.io/jingshi/), [Handong Zhao](https://hdzhao.github.io/), [Yun Fu](https://www1.ece.neu.edu/~yunfu/)

</div>

---

## Why QCQC?

Text-to-image retrieval usually optimizes for **relevance** only. In practice you often care about **quality** too: more aesthetic photos, fewer blurry or low-IQA images, or a custom trade-off. We call this **Quality-Controllable Retrieval (QCR)**, a new setting where retrieval can be explicitly conditioned on user-defined quality requirements.

We propose **Quality-Conditioned Query Completion (QCQC)**, a query completion framework that leverages LLMs to enrich short queries with quality-aware descriptive details. Specify desired quality (e.g., aesthetic, relevance, image quality), and QCQC completes your query so retrieval returns results that match both meaning and quality.

- **Quality control** — Describe desired quality as the condition; no separate filters or post-hoc ranking.
- **Multi-dimensional quality** — Aesthetic, image quality (IQA), and relevance, composable in one framework and extensible to custom quality definitions.
- **Reproducible** — MS-COCO workflow, clear data pipeline, and training/inference scripts.
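To make the conditioning idea concrete, here is a minimal sketch of how a short query might be prefixed with quality-control tokens before completion. The token names (`<aes>`, `<iqa>`, `<rel>`) and the three-level bucketing are illustrative assumptions, not the repository's actual control vocabulary:

```python
# Hypothetical sketch of quality-conditioned query formatting.
# Token names and bucket boundaries are assumptions for illustration.

def format_conditioned_query(query: str, aesthetic: float, iqa: float, relevance: float) -> str:
    """Prefix a short query with discrete quality-control tokens.

    Scores are assumed to lie in [0, 1] and are bucketed into
    low / mid / high levels so the language model sees a small,
    learnable control vocabulary.
    """
    def bucket(score: float) -> str:
        if score < 1 / 3:
            return "low"
        if score < 2 / 3:
            return "mid"
        return "high"

    prefix = (
        f"<aes:{bucket(aesthetic)}>"
        f"<iqa:{bucket(iqa)}>"
        f"<rel:{bucket(relevance)}>"
    )
    return f"{prefix} {query}"

print(format_conditioned_query("a dog on a beach", aesthetic=0.9, iqa=0.5, relevance=0.8))
```

The language model then completes the conditioned query into a richer description, which is used for retrieval in place of the original short query.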

---

## Overview

We use **MS-COCO** and **GPT-2** as the running example: download data, build a search index, generate auxiliary quality scores (aesthetic, IQA, relevance), tokenize the data, train the QCQC model, and then run retrieval. The steps below walk through the full pipeline.

---

## Environment Installation

```bash
bash ./src/setup_envir.sh
conda activate QCQC
```

---

## Dataset Preparation

### Download MS-COCO dataset

```bash
python ./src/download_coco.py
unzip ./coco_data/train2017.zip -d ./coco_data/
unzip ./coco_data/annotations_trainval2017.zip -d ./coco_data/
```

### Build search index

```bash
CUDA_VISIBLE_DEVICES=0 python ./src/search_preparation.py
```
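Conceptually, the search index stores one embedding per image and answers queries by similarity. The sketch below shows the idea with brute-force cosine similarity over placeholder vectors; the actual script may use CLIP features and a dedicated nearest-neighbor library, so treat shapes and data here as assumptions:

```python
# Sketch of an embedding search index: brute-force cosine-similarity
# retrieval over placeholder vectors (real features would come from CLIP).
import numpy as np

def build_index(image_embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize rows so a dot product equals cosine similarity."""
    norms = np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    return image_embeddings / norms

def search(index: np.ndarray, query_embedding: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the top-k most similar images."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = index @ q
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
index = build_index(rng.normal(size=(100, 512)))  # 100 placeholder image embeddings
query = rng.normal(size=512)                      # placeholder text embedding
hits = search(index, query, k=5)
print(hits)
```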

---

## Auxiliary Data Generation

Quality conditioning relies on precomputed scores. Follow the steps below for each score type.

### Image Aesthetic Scores

Follow the setup in [improved-aesthetic-predictor](https://github.com/christophschuhmann/improved-aesthetic-predictor).

Install extra dependencies:

```bash
conda run -n QCQC pip install webdataset pytorch-lightning
```

Generate aesthetic scores:

```bash
CUDA_VISIBLE_DEVICES=0 python ./improved-aesthetic-predictor/simple_inference_coco.py
```
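The predictor emits one scalar score per image. Since the three quality dimensions come from different models with different ranges, it can help to put them on a common scale; the min-max scheme below is a sketch of one way to do that (the normalization choice and the placeholder filenames are assumptions, not the repository's exact pipeline):

```python
# Sketch: min-max normalizing raw per-image scores to [0, 1] so the
# quality dimensions share a common scale. Data values are placeholders.

def normalize_scores(scores):
    """Min-max normalize a list of raw scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

raw = {"000000000009.jpg": 5.2, "000000000025.jpg": 6.8, "000000000030.jpg": 4.1}
norm = dict(zip(raw, normalize_scores(list(raw.values()))))
print(norm)
```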

### IQA Scores

Follow the setup in [DeQA-Score](https://github.com/zhiyuanyou/DeQA-Score). Create a separate environment:

```bash
conda create -yn DeQA python=3.10
conda activate DeQA
cd DeQA-Score
pip install -e .
pip install pycocotools numpy==1.26.4 protobuf
```

Generate IQA scores:

```bash
CUDA_VISIBLE_DEVICES=0 python ./src/evaluate/scorer_coco.py
```

### Relevance Scores

Relevance scores are computed with CLIP. From the QCQC environment:

```bash
conda activate QCQC
CUDA_VISIBLE_DEVICES=0 python ./src/generate_relevance_scores.py
```
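CLIP relevance is the similarity between a caption embedding and its image embedding. A minimal sketch with placeholder vectors standing in for real CLIP features; the `(sim + 1) / 2` rescaling to [0, 1] is an assumption, not necessarily the repository's exact formula:

```python
# Sketch of CLIP-style relevance scoring with placeholder embeddings.
import numpy as np

def relevance(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """Cosine similarity between two embeddings, rescaled to [0, 1]."""
    t = text_emb / np.linalg.norm(text_emb)
    v = image_emb / np.linalg.norm(image_emb)
    sim = float(t @ v)           # cosine similarity in [-1, 1]
    return (sim + 1.0) / 2.0     # rescale to [0, 1]; this mapping is an assumption

rng = np.random.default_rng(1)
text_emb = rng.normal(size=512)   # placeholder caption embedding
image_emb = rng.normal(size=512)  # placeholder image embedding
print(relevance(text_emb, image_emb))
```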

---

## Training & Testing

### 1. Data tokenization

```bash
CUDA_VISIBLE_DEVICES=0 python ./src/run_tokenize.py
```

### 2. Model training

Multi-GPU example (8 GPUs):

```bash
torchrun --nproc_per_node=8 --master_port=1221 ./src/train.py \
    --lr 2e-3 --warmup 100 --epochs 20 --bs 256 \
    --logstep 100 --evalstep 100 --savestep 100 \
    --project_name GPT2_COCO --run_name prompt_gpt2coco
```

### 3. Model testing

```bash
bash ./src/inference.sh
```

### 4. Upload to Hugging Face

```bash
cd ..
hf upload Johnny050407/QCQC QCQC
```

---


## Pretrained Checkpoints and Processed Data

Pretrained checkpoints and preprocessed auxiliary data for MS-COCO are publicly available on Hugging Face:

https://huggingface.co/Johnny050407/QCQC/


---


## Results

Qualitative examples of quality-conditioned retrieval:

|  |  |
|:---:|:---:|
| ![Results 1](./imgs/results1.png) | ![Results 2](./imgs/results2.png) |
| *Quality-conditioned retrieval examples (1)* | *Quality-conditioned retrieval examples (2)* |

---

## Citation

If you use this code or idea in your work, please cite:

```bibtex
@inproceedings{JianglinQCQC2026,
  title     = {Seeing Through Words: Controlling Visual Retrieval Quality with Language Models},
  author    = {Jianglin Lu and Simon Jenni and Kushal Kafle and Jing Shi and Handong Zhao and Yun Fu},
  booktitle = {The Fourteenth International Conference on Learning Representations (ICLR)},
  year      = {2026},
  url       = {https://openreview.net/forum?id=yOEmEXmbV8},
}
```

---

## Acknowledgement

We use the following open-source projects and thank the authors:

- [improved-aesthetic-predictor](https://github.com/christophschuhmann/improved-aesthetic-predictor) for aesthetic quality evaluation
- [DeQA-Score](https://github.com/zhiyuanyou/DeQA-Score) for IQA score prediction