How to use xxwu/MoLoRAG-QwenVL-3B with Transformers:

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("xxwu/MoLoRAG-QwenVL-3B")
model = AutoModelForImageTextToText.from_pretrained("xxwu/MoLoRAG-QwenVL-3B")
```
MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval
This repository contains the MoLoRAG model, a logic-aware retrieval framework for multi-modal, multi-page document understanding, as presented in the paper MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval.
MoLoRAG introduces a novel approach to Document Question Answering (DocQA) by constructing a page graph to capture contextual and logical relationships between pages. A lightweight VLM performs graph traversal to retrieve relevant pages, combining both semantic and logical relevance for more accurate retrieval. The top-K retrieved pages are then fed into arbitrary Large Vision-Language Models (LVLMs) for question answering. The framework offers both a training-free solution for easy deployment and a fine-tuned version for enhanced logical relevance checking.
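The retrieval flow described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the similarity-threshold graph construction, the breadth-first traversal, the equal-weight averaging of the two scores, and every name in it (`build_page_graph`, `retrieve`, `logical_score`, `n_seeds`, `n_hops`, `threshold`) are assumptions made for illustration. In particular, `logical_score` is a hypothetical callback standing in for the lightweight VLM's relevance judgment.

```python
import math
from typing import Callable, Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_page_graph(
    page_vecs: List[Sequence[float]], threshold: float
) -> Dict[int, Set[int]]:
    """Assumed construction: link two pages when their embeddings are similar."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(page_vecs))}
    for i in range(len(page_vecs)):
        for j in range(i + 1, len(page_vecs)):
            if cosine(page_vecs[i], page_vecs[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def retrieve(
    query_vec: Sequence[float],
    page_vecs: List[Sequence[float]],
    logical_score: Callable[[int], float],  # hypothetical stand-in for the VLM
    top_k: int = 3,
    n_seeds: int = 1,
    n_hops: int = 2,
    threshold: float = 0.5,
) -> List[int]:
    """Traverse the page graph from the best semantic matches and rank pages."""
    graph = build_page_graph(page_vecs, threshold)
    semantic = [cosine(query_vec, vec) for vec in page_vecs]

    # Seed the traversal with the pages most semantically similar to the query.
    seeds = sorted(range(len(page_vecs)), key=lambda i: semantic[i], reverse=True)
    frontier = seeds[:n_seeds]
    visited = set(frontier)
    combined: Dict[int, float] = {}

    for _ in range(n_hops + 1):
        next_frontier: List[int] = []
        for page in frontier:
            # Assumed combination rule: plain average of the two relevance scores.
            combined[page] = 0.5 * semantic[page] + 0.5 * logical_score(page)
            for neighbour in sorted(graph[page]):
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier

    ranked = sorted(combined, key=lambda p: combined[p], reverse=True)
    return ranked[:top_k]
```

In this sketch, a page that is a poor semantic match can still be retrieved if it is reachable through the graph and receives a high logical score, while pages unconnected to the seeds are never scored. The returned page indices would then be passed, as images, to whichever LVLM answers the question.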
For more details, please refer to the official GitHub repository.