DocReRank
/

DocReRank-Reranker

Visual Document Retrieval

vision-language

Model card Files Files and versions

DocReRank-Reranker / README.md

navvew's picture

Update README.md

d0adbeb verified 7 months ago

|

1.72 kB


	---
	language: en
	tags:
	- reranker
	- RAG
	- multimodal
	- vision-language
	- Qwen
	license: cc-by-4.0
	pipeline_tag: visual-document-retrieval
	---

	# DocReRank: Multi-Modal Reranker

	This is the official model from the paper:

	📄 [DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers](https://arxiv.org/abs/2505.22584)

	---

	## ✅ Model Overview
	- Base model: [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
	- Architecture: Vision-Language reranker
	- Fine-tuning method: PEFT (LoRA)
	- Training data: Generated by Single-Page Hard Negative Query Generation Pipeline.
	- Purpose: Improves second-stage reranking for Retrieval-Augmented Generation (RAG) in multimodal scenarios.

	---

	## ✅ How to Use

	This adapter requires the base Qwen2-VL model.

	```python
	from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
	from peft import PeftModel
	import torch
	from PIL import Image

	# Load base model
	base_model = Qwen2VLForConditionalGeneration.from_pretrained(
	"Qwen/Qwen2-VL-2B-Instruct",
	torch_dtype=torch.bfloat16,
	device_map="cuda"
	)

	# Load DocReRank adapter
	model = PeftModel.from_pretrained(base_model, "DocReRank/DocReRank-Reranker").eval()

	# Load processor
	processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

	# Example query and image
	query = "What is the total revenue in the table?"
	image = Image.open("sample_page.png")

	inputs = processor(text=query, images=image, return_tensors="pt").to("cuda", torch.bfloat16)

	with torch.no_grad():
	outputs = model.generate(**inputs, max_new_tokens=16)

	print(processor.tokenizer.decode(outputs[0], skip_special_tokens=True))