---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.5-0.8B
datasets:
- yuandaxia/FashionMV
---
# ProCIR — Multi-View Product-Level Composed Image Retrieval
[[Paper (arXiv)]](https://arxiv.org/abs/2604.10297) | [[Code (GitHub)]](https://github.com/yuandaxia2001/FashionMV) | [[Dataset]](https://huggingface.co/datasets/yuandaxia/FashionMV)
## Model Description
**ProCIR** (0.8B) is a multi-view composed image retrieval (CIR) model trained on the [FashionMV](https://huggingface.co/datasets/yuandaxia/FashionMV) dataset and built on [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B). It adopts a perception-reasoning decoupled dialogue architecture and leverages image-text alignment to inject product knowledge, enabling effective multi-view, product-level CIR.
## Performance
| Dataset | R@5 | R@10 |
|---------|-----|------|
| DeepFashion | 89.2 | 94.9 |
| Fashion200K | 77.6 | 86.6 |
| FashionGen-val | 75.0 | 85.3 |
| **Average** | **80.6** | **88.9** |
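R@K (Recall@K) counts a query as a hit when the ground-truth product appears among the top-K retrieved candidates. A minimal sketch of the metric (illustrative only, not the released evaluation code; `ranked_ids` and `gt_ids` are hypothetical names):

```python
def recall_at_k(ranked_ids, gt_ids, k):
    """Fraction of queries whose ground-truth item appears in the top-k results.

    ranked_ids: one ranked list of candidate IDs per query.
    gt_ids: the ground-truth item ID for each query.
    """
    hits = sum(gt in ranked[:k] for ranked, gt in zip(ranked_ids, gt_ids))
    return hits / len(gt_ids)

# Toy example: 2 queries with ranked candidates.
ranked = [["a", "b", "c"], ["x", "y", "z"]]
gt = ["b", "z"]
print(recall_at_k(ranked, gt, k=2))  # 0.5: "b" is in the top-2, "z" is not
```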
## Usage
See our [GitHub repository](https://github.com/yuandaxia2001/FashionMV) for evaluation code and data preparation instructions.
```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

# Load the processor (image + text preprocessing) and the model weights.
processor = AutoProcessor.from_pretrained("yuandaxia/ProCIR")
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "yuandaxia/ProCIR",
    torch_dtype="bfloat16",
)
```
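This model card does not spell out the retrieval interface. Assuming the model yields query and gallery embeddings (as is typical for CIR), gallery candidates would be ranked by cosine similarity; a minimal sketch with placeholder embeddings:

```python
import numpy as np

def rank_by_cosine(query_emb, gallery_embs):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity of each gallery item to the query
    return np.argsort(-sims)

# Toy 2-D embeddings: gallery item 1 points the same direction as the query.
query = np.array([1.0, 0.0])
gallery = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
print(rank_by_cosine(query, gallery))  # [1 2 0]
```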
## Citation
```bibtex
@article{yuan2026fashionmv,
  title={FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data},
  author={Yuan, Peng and Mei, Bingyin and Zhang, Hui},
  journal={arXiv preprint arXiv:2604.10297},
  year={2026}
}
```
## License
Model weights are released under the same license as the base model ([Qwen3.5](https://huggingface.co/Qwen/Qwen3.5-0.8B)).