---
language: en
tags:
- reranker
- RAG
- multimodal
- vision-language
- Qwen
license: cc-by-4.0
pipeline_tag: visual-document-retrieval
---
# DocReRank: Multi-Modal Reranker

This is the official model from the paper:

**DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers**
## Model Overview
- Base model: Qwen/Qwen2-VL-2B-Instruct
- Architecture: Vision-Language reranker
- Fine-tuning method: PEFT (LoRA)
- Training data: Generated by the Single-Page Hard Negative Query Generation pipeline.
- Purpose: Improves second-stage reranking for Retrieval-Augmented Generation (RAG) in multimodal scenarios.
## How to Use
This repository contains a PEFT (LoRA) adapter, so it must be loaded on top of the base Qwen2-VL model:
```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

# Load the base model
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# Load the DocReRank adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "DocReRank/DocReRank-Reranker").eval()

# Load the processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Example query and candidate document page
query = "What is the total revenue in the table?"
image = Image.open("sample_page.png")

inputs = processor(text=query, images=image, return_tensors="pt").to("cuda", torch.bfloat16)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=16)

print(processor.tokenizer.decode(outputs[0], skip_special_tokens=True))
```
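In a RAG pipeline, the reranker is applied as a second stage: each page returned by the first-stage retriever is scored against the query, and the candidates are re-sorted by that score. A minimal, self-contained sketch of this step is below. Note that `score_page` is a hypothetical stand-in for a per-page model call like the one above (it is not part of this repository's API); here it reads precomputed dummy scores so the sketch runs without the model.

```python
def rerank_pages(query, pages, score_page, top_k=3):
    """Sort candidate pages by reranker score, descending, and keep the top_k."""
    scored = [(page, score_page(query, page)) for page in pages]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Dummy relevance scores standing in for real model outputs (illustrative only)
dummy_scores = {"page_03.png": 0.91, "page_17.png": 0.42, "page_08.png": 0.77}

ranked = rerank_pages(
    "What is the total revenue in the table?",
    list(dummy_scores),
    lambda q, p: dummy_scores[p],  # replace with a real per-page model call
    top_k=2,
)
print(ranked)  # [('page_03.png', 0.91), ('page_08.png', 0.77)]
```

In a real deployment, `score_page` would run one forward pass per (query, page) pair, so batching the candidate pages is usually worthwhile.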