Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization

Paper: https://arxiv.org/abs/2511.01588

This repository contains the official implementation of PDF-VLM2Vec, an efficient training framework for Vision-Language Model (VLM) embedding models.

Table of Contents

  - Installation
  - Pre-trained Models
  - Data
  - Training
  - Evaluation
  - Results
  - Citation

Installation

Our code has been tested with Python 3.10 and PyTorch 2.6.0.

  1. Create a Conda environment:

    conda create -n pdf_vlm2vec python=3.10
    conda activate pdf_vlm2vec
    
  2. Install dependencies:

    pip install -r requirements.txt
    
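Optionally, sanity-check that the environment matches the tested versions (the CUDA availability flag depends on your machine):

    python -c "import sys, torch; print(sys.version.split()[0], torch.__version__, torch.cuda.is_available())"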

Pre-trained Models

We provide our PDF-VLM2Vec models fine-tuned from Qwen2-VL. The checkpoints are hosted on the Hugging Face Hub (this card corresponds to BurgerCheng/PDF-VLM2Vec-Qwen2VL-7B).
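
As a sketch, a checkpoint can be fetched with the Hugging Face CLI. The 7B repository id is the one this card belongs to; a 2B repository, if released, would follow the same pattern:

    pip install -U "huggingface_hub[cli]"
    huggingface-cli download BurgerCheng/PDF-VLM2Vec-Qwen2VL-7B --local-dir ./checkpoints/PDF-VLM2Vec-Qwen2VL-7B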

Data

Our training and evaluation data are from the MMEB benchmark. For more details, please refer to the original VLM2Vec repository.

Download the datasets and place them in your preferred data directory.
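
For example, assuming the datasets are hosted on the Hugging Face Hub under the same ids the VLM2Vec repository uses (TIGER-Lab/MMEB-train and TIGER-Lab/MMEB-eval; verify against that repository), one way to fetch them is:

    huggingface-cli download TIGER-Lab/MMEB-train --repo-type dataset --local-dir ./data/MMEB-train
    huggingface-cli download TIGER-Lab/MMEB-eval --repo-type dataset --local-dir ./data/MMEB-eval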

Training

All training scripts are located in the scripts/train/ directory.

To train the PDF-VLM2Vec-Qwen2VL-2B model, follow these steps:

  1. Modify the script: Open scripts/train/train_qwen2vl_2b.sh.
  2. Update paths: Change the data path and the model-saving path to your local directories (see the sketch after these steps).
  3. Run the script:
    source scripts/train/train_qwen2vl_2b.sh
    
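The exact variable names differ by script; the ones below are hypothetical placeholders for the two paths that step 2 refers to:

    # Inside scripts/train/train_qwen2vl_2b.sh -- hypothetical variable names,
    # adapt them to whatever the actual script defines.
    DATA_DIR=/path/to/MMEB-train        # downloaded training data
    OUTPUT_DIR=/path/to/checkpoints     # where fine-tuned models are saved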

Evaluation

Evaluation scripts are available in the scripts/eval/ directory.

To evaluate a trained model (for example, the 2B model) on the MMEB benchmark:

  1. Modify the script: Open scripts/eval/eval_qwen2vl_2b.sh and update the MODEL_PATH variable to point to your trained model checkpoint (see the sketch after these steps).
  2. Run the evaluation:
    source scripts/eval/eval_qwen2vl_2b.sh
    
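A minimal sketch of the edit; MODEL_PATH is the variable named in step 1, and the checkpoint path is a placeholder:

    # Inside scripts/eval/eval_qwen2vl_2b.sh
    MODEL_PATH=/path/to/checkpoints/PDF-VLM2Vec-Qwen2VL-2B   # your trained checkpoint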

Results

For a comprehensive analysis, please refer to our paper.

Citation

If you find our work useful for your research, please consider citing our paper:

@article{wang2025explore,
  title={Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization},
  author={Wang, Zhicheng and Ju, Chen and Chen, Xu and Xiao, Shuai and Lan, Jinsong and Zhu, Xiaoyong and Chen, Ying and Cao, Zhiguo},
  journal={arXiv preprint arXiv:2511.01588},
  year={2025}
}