---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.5-0.8B
datasets:
  - yuandaxia/FashionMV
---

# ProCIR — Multi-View Product-Level Composed Image Retrieval

[Paper (arXiv)] | [Code (GitHub)] | [Dataset]

## Model Description

ProCIR (0.8B) is a multi-view composed image retrieval (CIR) model trained on the FashionMV dataset and built on Qwen3.5-0.8B. It adopts a perception-reasoning decoupled dialogue architecture and leverages image-text alignment to inject product knowledge, enabling effective multi-view, product-level CIR.

## Performance

| Dataset        | R@5  | R@10 |
|----------------|------|------|
| DeepFashion    | 89.2 | 94.9 |
| Fashion200K    | 77.6 | 86.6 |
| FashionGen-val | 75.0 | 85.3 |
| **Average**    | 80.6 | 88.9 |
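R@K (Recall at K) is the fraction of queries whose ground-truth product appears among the top-K retrieved gallery candidates. A minimal sketch of the metric over a similarity matrix (toy scores for illustration; this is not the evaluation code from the repository):

```python
import numpy as np

def recall_at_k(sims: np.ndarray, gt_idx: np.ndarray, k: int) -> float:
    """Fraction of queries whose ground-truth gallery index is in the top-k.

    sims:   (num_queries, gallery_size) similarity matrix
    gt_idx: (num_queries,) ground-truth gallery index per query
    """
    topk = np.argsort(-sims, axis=1)[:, :k]       # top-k gallery indices per query
    hits = (topk == gt_idx[:, None]).any(axis=1)  # ground truth among the top-k?
    return float(hits.mean())

# Toy example: 2 queries over a 3-item gallery
sims = np.array([
    [0.9, 0.2, 0.1],  # query 0: gallery item 0 scores highest
    [0.3, 0.1, 0.8],  # query 1: gallery item 2 scores highest
])
gt = np.array([0, 2])
print(recall_at_k(sims, gt, k=1))  # → 1.0 (both targets ranked first)
```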

## Usage

See our GitHub repository for evaluation code and data preparation instructions.

```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

# Load the processor (tokenizer + image preprocessing) and the model weights.
processor = AutoProcessor.from_pretrained("yuandaxia/ProCIR")
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "yuandaxia/ProCIR",
    torch_dtype="bfloat16",
    device_map="auto",  # place weights on the available GPU(s)
)
```
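A composed retrieval query pairs a reference product image with a modification text. Assuming the model follows the Qwen-style multimodal chat message format (an assumption; see the GitHub repository for the exact template), a query could be assembled like this:

```python
# Hypothetical query layout (assumed Qwen-style message format):
# one reference product image plus a free-text modification.
reference_image = "reference.jpg"  # path or URL of the reference product photo
modification = "the same dress, but in red with short sleeves"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": reference_image},
            {"type": "text", "text": modification},
        ],
    }
]

# The processor's chat template would turn `messages` into model inputs,
# e.g. processor.apply_chat_template(messages, tokenize=True, return_tensors="pt").
print(len(messages[0]["content"]))  # → 2 (one image part, one text part)
```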

## Citation

```bibtex
@article{yuan2026fashionmv,
  title={FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data},
  author={Yuan, Peng and Mei, Bingyin and Zhang, Hui},
  year={2026}
}
```

## License

Model weights are released under the same license as the base model (Qwen3.5).