---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.5-0.8B
datasets:
- yuandaxia/FashionMV
---

# ProCIR — Multi-View Product-Level Composed Image Retrieval

[[Paper (arXiv)]](https://arxiv.org/abs/2604.10297) | [[Code (GitHub)]](https://github.com/yuandaxia2001/FashionMV) | [[Dataset]](https://huggingface.co/datasets/yuandaxia/FashionMV)

## Model Description

**ProCIR** (0.8B) is a multi-view composed image retrieval model trained on the [FashionMV](https://huggingface.co/datasets/yuandaxia/FashionMV) dataset and built on [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B). It adopts a perception-reasoning decoupled dialogue architecture and leverages image-text alignment to inject product knowledge, enabling effective multi-view product-level composed image retrieval (CIR).

## Performance

| Dataset | R@5 | R@10 |
|---------|-----|------|
| DeepFashion | 89.2 | 94.9 |
| Fashion200K | 77.6 | 86.6 |
| FashionGen-val | 75.0 | 85.3 |
| **Average** | **80.6** | **88.9** |

## Usage

See our [GitHub repository](https://github.com/yuandaxia2001/FashionMV) for evaluation code and data preparation instructions.

```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

processor = AutoProcessor.from_pretrained("yuandaxia/ProCIR")
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "yuandaxia/ProCIR", torch_dtype="bfloat16"
)
```

## Citation

```bibtex
@article{yuan2026fashionmv,
  title={FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data},
  author={Yuan, Peng and Mei, Bingyin and Zhang, Hui},
  year={2026}
}
```

## License

Model weights are released under the same license as the base model ([Qwen3.5](https://huggingface.co/Qwen/Qwen3.5-0.8B)).
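
As a supplementary note, the R@K numbers in the Performance table are standard recall-at-K over ranked retrieval lists. Below is a minimal, self-contained sketch of how such metrics are typically computed; the function names and toy data are illustrative and are not part of the authors' evaluation code (see the GitHub repository for the official scripts).

```python
def recall_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold product appears in the top-k ranked results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

def mean_recall_at_k(all_rankings, gold_ids, k):
    """Average Recall@K (in percent) over a set of queries."""
    hits = [recall_at_k(r, g, k) for r, g in zip(all_rankings, gold_ids)]
    return 100.0 * sum(hits) / len(hits)

# Toy example: 4 queries, each with a ranked list of candidate product IDs.
rankings = [
    ["a", "b", "c", "d", "e", "f"],
    ["x", "gold2", "y", "z", "w", "v"],
    ["p", "q", "r", "s", "gold3", "t"],
    ["gold4", "m", "n", "o", "u", "k"],
]
golds = ["f", "gold2", "gold3", "gold4"]

print(mean_recall_at_k(rankings, golds, 5))   # 3 of 4 gold items in top-5 -> 75.0
print(mean_recall_at_k(rankings, golds, 10))  # every list contains its gold -> 100.0
```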