---
license: other
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.5-0.8B
datasets:
- yuandaxia/FashionMV
---
# ProCIR — Multi-View Product-Level Composed Image Retrieval

[Paper (arXiv)] | [Code (GitHub)] | [Dataset]
## Model Description

ProCIR (0.8B) is a multi-view composed image retrieval (CIR) model built on Qwen3.5-0.8B and trained on the FashionMV dataset. It adopts a perception-reasoning decoupled dialogue architecture and uses image-text alignment to inject product knowledge, enabling effective product-level CIR across multiple views.
## Performance

Recall@K (%) on three fashion retrieval benchmarks:

| Dataset | R@5 | R@10 |
|---|---|---|
| DeepFashion | 89.2 | 94.9 |
| Fashion200K | 77.6 | 86.6 |
| FashionGen-val | 75.0 | 85.3 |
| Average | 80.6 | 88.9 |
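The Average row is the unweighted (macro) mean over the three datasets; a quick arithmetic check of the reported rounding:

```python
# Macro-average of the per-dataset recall scores reported in the table above.
r_at_5 = [89.2, 77.6, 75.0]   # DeepFashion, Fashion200K, FashionGen-val
r_at_10 = [94.9, 86.6, 85.3]

avg5 = round(sum(r_at_5) / len(r_at_5), 1)     # 80.6
avg10 = round(sum(r_at_10) / len(r_at_10), 1)  # 88.9
print(avg5, avg10)
```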
## Usage

See our GitHub repository for evaluation code and data preparation instructions.

```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

processor = AutoProcessor.from_pretrained("yuandaxia/ProCIR")
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "yuandaxia/ProCIR", torch_dtype="bfloat16"
)
```
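A composed-retrieval query pairs a reference product image with a modification text. As a minimal sketch of how such a query could be assembled (the message schema below mirrors the common Qwen-VL chat format and is an assumption, as is the image path; see the GitHub repository for the exact prompt format ProCIR expects):

```python
# Sketch: build a composed-retrieval query from a reference image plus a
# modification text. Both the schema and the path are illustrative assumptions.
reference_image = "path/to/reference.jpg"  # hypothetical local image path
modification = "same dress but in navy blue with long sleeves"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": reference_image},
            {"type": "text", "text": modification},
        ],
    }
]
print(messages[0]["content"][1]["text"])
```

The processor would then render this message list into model inputs (e.g. via its chat template) before generation.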
## Citation

```bibtex
@article{yuan2026fashionmv,
  title={FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data},
  author={Yuan, Peng and Mei, Bingyin and Zhang, Hui},
  year={2026}
}
```
## License

Model weights are released under the same license as the base model (Qwen3.5).