---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.5-0.8B
datasets:
- yuandaxia/FashionMV
---
# ProCIR — Multi-View Product-Level Composed Image Retrieval

[Paper (arXiv)] | [Code (GitHub)] | [Dataset]
## Model Description

ProCIR (0.8B) is a multi-view composed image retrieval (CIR) model built on Qwen3.5-0.8B and trained on the FashionMV dataset. It adopts a perception-reasoning decoupled dialogue architecture and uses image-text alignment to inject product knowledge, enabling effective product-level CIR across multiple views.
## Performance

Recall@K (%) on three fashion retrieval benchmarks:

| Dataset | R@5 | R@10 |
|---|---|---|
| DeepFashion | 89.2 | 94.9 |
| Fashion200K | 77.6 | 86.6 |
| FashionGen-val | 75.0 | 85.3 |
| Average | 80.6 | 88.9 |
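The Average row is the unweighted (macro) mean over the three datasets; a quick arithmetic check of the reported rounding:

```python
# Macro-average of the per-dataset recall scores reported in the table above.
r_at_5 = [89.2, 77.6, 75.0]   # DeepFashion, Fashion200K, FashionGen-val
r_at_10 = [94.9, 86.6, 85.3]

avg5 = round(sum(r_at_5) / len(r_at_5), 1)     # 80.6
avg10 = round(sum(r_at_10) / len(r_at_10), 1)  # 88.9
print(avg5, avg10)
```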
## Usage

See our GitHub repository for evaluation code and data preparation instructions.

```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

processor = AutoProcessor.from_pretrained("yuandaxia/ProCIR")
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "yuandaxia/ProCIR", torch_dtype="bfloat16"
)
```
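A composed-retrieval query pairs a reference product image with a modification text. As a minimal sketch of how such a query could be assembled (the message schema below mirrors the common Qwen-VL chat format and is an assumption, as is the image path; see the GitHub repository for the exact prompt format ProCIR expects):

```python
# Sketch: build a composed-retrieval query from a reference image plus a
# modification text. Both the schema and the path are illustrative assumptions.
reference_image = "path/to/reference.jpg"  # hypothetical local image path
modification = "same dress but in navy blue with long sleeves"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": reference_image},
            {"type": "text", "text": modification},
        ],
    }
]
print(messages[0]["content"][1]["text"])
```

The processor would then render this message list into model inputs (e.g. via its chat template) before generation.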
## Citation

```bibtex
@article{yuan2026fashionmv,
  title={FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data},
  author={Yuan, Peng and Mei, Bingyin and Zhang, Hui},
  year={2026}
}
```
## License

Model weights are released under the same license as the base model (Qwen3.5).