---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.5-0.8B
datasets:
- yuandaxia/FashionMV
---

# ProCIR — Multi-View Product-Level Composed Image Retrieval

[[Paper (arXiv)]](https://arxiv.org/abs/2604.10297) | [[Code (GitHub)]](https://github.com/yuandaxia2001/FashionMV) | [[Dataset]](https://huggingface.co/datasets/yuandaxia/FashionMV)

## Model Description

**ProCIR** (0.8B) is a multi-view composed image retrieval model trained on the [FashionMV](https://huggingface.co/datasets/yuandaxia/FashionMV) dataset and built on [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B). It adopts a perception-reasoning decoupled dialogue architecture and leverages image-text alignment to inject product knowledge, enabling effective multi-view product-level composed image retrieval (CIR).

## Performance

| Dataset | R@5 | R@10 |
|---------|-----|------|
| DeepFashion | 89.2 | 94.9 |
| Fashion200K | 77.6 | 86.6 |
| FashionGen-val | 75.0 | 85.3 |
| **Average** | **80.6** | **88.9** |

## Usage

See our [GitHub repository](https://github.com/yuandaxia2001/FashionMV) for evaluation code and data preparation instructions.

```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

processor = AutoProcessor.from_pretrained("yuandaxia/ProCIR")
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "yuandaxia/ProCIR", torch_dtype="bfloat16"
)
```

## Citation

```bibtex
@article{yuan2026fashionmv,
  title={FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data},
  author={Yuan, Peng and Mei, Bingyin and Zhang, Hui},
  year={2026}
}
```

## License

Model weights are released under the same license as the base model ([Qwen3.5](https://huggingface.co/Qwen/Qwen3.5-0.8B)).
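
As a supplementary note, the R@K numbers in the Performance table are standard recall-at-K over ranked retrieval lists. Below is a minimal, self-contained sketch of how such metrics are typically computed; the function names and toy data are illustrative and are not part of the authors' evaluation code (see the GitHub repository for the official scripts).

```python
def recall_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold product appears in the top-k ranked results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

def mean_recall_at_k(all_rankings, gold_ids, k):
    """Average Recall@K (in percent) over a set of queries."""
    hits = [recall_at_k(r, g, k) for r, g in zip(all_rankings, gold_ids)]
    return 100.0 * sum(hits) / len(hits)

# Toy example: 4 queries, each with a ranked list of candidate product IDs.
rankings = [
    ["a", "b", "c", "d", "e", "f"],
    ["x", "gold2", "y", "z", "w", "v"],
    ["p", "q", "r", "s", "gold3", "t"],
    ["gold4", "m", "n", "o", "u", "k"],
]
golds = ["f", "gold2", "gold3", "gold4"]

print(mean_recall_at_k(rankings, golds, 5))   # 3 of 4 gold items in top-5 -> 75.0
print(mean_recall_at_k(rankings, golds, 10))  # every list contains its gold -> 100.0
```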