File size: 1,721 Bytes
d0adbeb | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 |
---
language: en
tags:
- reranker
- RAG
- multimodal
- vision-language
- Qwen
license: cc-by-4.0
pipeline_tag: visual-document-retrieval
---
# DocReRank: Multi-Modal Reranker
This is the official model from the paper:
📄 **[DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers](https://arxiv.org/abs/2505.22584)**
---
## ✅ Model Overview
- **Base model:** [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
- **Architecture:** Vision-Language reranker
- **Fine-tuning method:** PEFT (LoRA)
- **Training data:** Generated by **Single-Page Hard Negative Query Generation** Pipeline.
- **Purpose:** Improves second-stage reranking for Retrieval-Augmented Generation (RAG) in multimodal scenarios.
---
## ✅ How to Use
This adapter requires the base Qwen2-VL model.
```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image
# Load base model
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
"Qwen/Qwen2-VL-2B-Instruct",
torch_dtype=torch.bfloat16,
device_map="cuda"
)
# Load DocReRank adapter
model = PeftModel.from_pretrained(base_model, "DocReRank/DocReRank-Reranker").eval()
# Load processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
# Example query and image
query = "What is the total revenue in the table?"
image = Image.open("sample_page.png")
inputs = processor(text=query, images=image, return_tensors="pt").to("cuda", torch.bfloat16)
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=16)
print(processor.tokenizer.decode(outputs[0], skip_special_tokens=True))
|