---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.5-0.8B
datasets:
- yuandaxia/FashionMV
---

# ProCIR — Multi-View Product-Level Composed Image Retrieval

[[Paper (arXiv)]](https://arxiv.org/abs/2604.10297) | [[Code (GitHub)]](https://github.com/yuandaxia2001/FashionMV) | [[Dataset]](https://huggingface.co/datasets/yuandaxia/FashionMV)

## Model Description

**ProCIR** (0.8B parameters) is a multi-view composed image retrieval (CIR) model trained on the [FashionMV](https://huggingface.co/datasets/yuandaxia/FashionMV) dataset and built on [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B). It adopts a decoupled perception-reasoning dialogue architecture and uses image-text alignment to inject product knowledge, enabling effective multi-view, product-level CIR.

## Performance

| Dataset | R@5 (%) | R@10 (%) |
|---------|---------|----------|
| DeepFashion | 89.2 | 94.9 |
| Fashion200K | 77.6 | 86.6 |
| FashionGen-val | 75.0 | 85.3 |
| **Average** | **80.6** | **88.9** |
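
The **Average** row is the unweighted mean over the three benchmarks; a quick sketch to reproduce it from the per-dataset numbers above:

```python
# Per-dataset recall values from the table above (percentages).
recall_at_5 = {"DeepFashion": 89.2, "Fashion200K": 77.6, "FashionGen-val": 75.0}
recall_at_10 = {"DeepFashion": 94.9, "Fashion200K": 86.6, "FashionGen-val": 85.3}

# Unweighted mean across benchmarks, rounded to one decimal as in the table.
avg_r5 = round(sum(recall_at_5.values()) / len(recall_at_5), 1)
avg_r10 = round(sum(recall_at_10.values()) / len(recall_at_10), 1)
print(avg_r5, avg_r10)  # 80.6 88.9
```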

## Usage

See our [GitHub repository](https://github.com/yuandaxia2001/FashionMV) for evaluation code and data preparation instructions.

```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

# Load the processor (tokenizer + image preprocessor) and the model weights
# in bfloat16 to reduce memory usage.
processor = AutoProcessor.from_pretrained("yuandaxia/ProCIR")
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "yuandaxia/ProCIR", torch_dtype="bfloat16"
)
```

## Citation

```bibtex
@article{yuan2026fashionmv,
  title={FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data},
  author={Yuan, Peng and Mei, Bingyin and Zhang, Hui},
  journal={arXiv preprint arXiv:2604.10297},
  year={2026}
}
```

## License

Model weights are released under the same license as the base model ([Qwen3.5](https://huggingface.co/Qwen/Qwen3.5-0.8B)).