---
language: en
tags:
- reranker
- RAG
- multimodal
- vision-language
- Qwen
license: cc-by-4.0
pipeline_tag: visual-document-retrieval
---

# DocReRank: Multi-Modal Reranker

This is the official model from the paper:

📄 **[DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers](https://arxiv.org/abs/2505.22584)**

---

## ✅ Model Overview

- **Base model:** [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
- **Architecture:** vision-language reranker
- **Fine-tuning method:** PEFT (LoRA)
- **Training data:** generated by the **Single-Page Hard Negative Query Generation** pipeline
- **Purpose:** improves second-stage reranking for Retrieval-Augmented Generation (RAG) in multimodal settings

---

## ✅ How to Use

This adapter requires the base Qwen2-VL model.

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

# Load the base model
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# Load the DocReRank LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "DocReRank/DocReRank-Reranker").eval()

# Load the processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Example query and candidate page image
query = "What is the total revenue in the table?"
image = Image.open("sample_page.png")

# Qwen2-VL expects image placeholder tokens in the prompt, so build the
# input via the chat template rather than passing raw text to the processor
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": query},
        ],
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(
    "cuda", torch.bfloat16
)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=16)

print(processor.tokenizer.decode(outputs[0], skip_special_tokens=True))
```
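In a RAG pipeline, a second-stage reranker scores each retrieved page against the query and re-orders the candidates. Rerankers of this style often reduce the model output to the probability of a "yes"-type token versus a "no"-type token; the exact scoring head DocReRank uses is not specified in this card, so the snippet below is only a generic post-processing sketch with hypothetical logits and file names:

```python
import math

def relevance_score(logit_yes: float, logit_no: float) -> float:
    """Softmax probability of "yes" over the {yes, no} logit pair."""
    m = max(logit_yes, logit_no)  # subtract max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)

# Hypothetical (yes, no) logits for three candidate pages,
# as might be extracted from one forward pass per page
candidates = {
    "page_03.png": (2.1, -0.5),
    "page_07.png": (-1.2, 1.8),
    "page_11.png": (0.4, 0.3),
}

# Score each candidate and sort best-first
ranked = sorted(
    candidates,
    key=lambda page: relevance_score(*candidates[page]),
    reverse=True,
)
print(ranked)  # → ['page_03.png', 'page_11.png', 'page_07.png']
```

The sorted list (or its top-k prefix) is then what gets passed to the generator in place of the first-stage retrieval order.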