| |
|
| | --- |
| | language: en |
| | tags: |
| | - reranker |
| | - RAG |
| | - multimodal |
| | - vision-language |
| | - Qwen |
| | license: cc-by-4.0 |
| | pipeline_tag: visual-document-retrieval |
| | --- |
| | |
| | # DocReRank: Multi-Modal Reranker |
| |
|
| | This is the official model from the paper: |
| |
|
| | ๐ **[DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers](https://arxiv.org/abs/2505.22584)** |
| |
|
| | --- |
| |
|
| | ## โ
Model Overview |
| | - **Base model:** [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) |
| | - **Architecture:** Vision-Language reranker |
| | - **Fine-tuning method:** PEFT (LoRA) |
| | - **Training data:** Generated by **Single-Page Hard Negative Query Generation** Pipeline. |
| | - **Purpose:** Improves second-stage reranking for Retrieval-Augmented Generation (RAG) in multimodal scenarios. |
| |
|
| | --- |
| |
|
| | ## โ
How to Use |
| |
|
| | This adapter requires the base Qwen2-VL model. |
| |
|
| | ```python |
| | from transformers import AutoProcessor, Qwen2VLForConditionalGeneration |
| | from peft import PeftModel |
| | import torch |
| | from PIL import Image |
| | |
| | # Load base model |
| | base_model = Qwen2VLForConditionalGeneration.from_pretrained( |
| | "Qwen/Qwen2-VL-2B-Instruct", |
| | torch_dtype=torch.bfloat16, |
| | device_map="cuda" |
| | ) |
| | |
| | # Load DocReRank adapter |
| | model = PeftModel.from_pretrained(base_model, "DocReRank/DocReRank-Reranker").eval() |
| | |
| | # Load processor |
| | processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct") |
| | |
| | # Example query and image |
| | query = "What is the total revenue in the table?" |
| | image = Image.open("sample_page.png") |
| | |
| | inputs = processor(text=query, images=image, return_tensors="pt").to("cuda", torch.bfloat16) |
| | |
| | with torch.no_grad(): |
| | outputs = model.generate(**inputs, max_new_tokens=16) |
| | |
| | print(processor.tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| | |