# Explore More, Learn Better: Parallel VLM Embeddings with Mutual Information Regularization

[![arXiv](https://img.shields.io/badge/arXiv-2511.01588-b31b1b.svg)](https://arxiv.org/abs/2511.01588)

This repository contains the official implementation of **PDF-VLM2Vec**, an efficient training framework for Vision-Language Model (VLM) embedding models.

## Table of Contents
- [Installation](#installation)
- [Pre-trained Models](#pre-trained-models)
- [Data](#data)
- [Training](#training)
- [Evaluation](#evaluation)
- [Results](#results)
- [Citation](#citation)

## Installation

Our code has been tested on Python 3.10 and PyTorch 2.6.0.

1. **Create a Conda environment:**
   ```bash
   conda create -n pdf_vlm2vec python=3.10
   conda activate pdf_vlm2vec
   ```

2. **Install dependencies** (from the repository root):
   ```bash
   pip install -r requirements.txt
   ```
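
Because the code was tested on Python 3.10 and PyTorch 2.6.0, it can help to confirm that your activated environment matches before training. Below is a minimal sketch; the `check_env` function is our own hypothetical helper, and it assumes that `python` on your `PATH` is the interpreter from the `pdf_vlm2vec` environment. A mismatch only means you are outside the tested setup, not that the code will necessarily fail.

```bash
# Hypothetical helper: compare the active interpreter and PyTorch against the
# versions this README reports as tested (Python 3.10, PyTorch 2.6.0).
check_env() {
  local want_py="3.10" want_torch="2.6.0" py_ver torch_ver
  py_ver=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || echo "missing")
  # Drop any local build suffix such as "+cu124" before comparing.
  torch_ver=$(python -c 'import torch; print(torch.__version__.split("+")[0])' 2>/dev/null || echo "missing")
  echo "Python: ${py_ver} (tested: ${want_py})"
  echo "PyTorch: ${torch_ver} (tested: ${want_torch})"
  # Exit status is non-zero if either version differs from the tested one.
  [ "$py_ver" = "$want_py" ] && [ "$torch_ver" = "$want_torch" ]
}

check_env || echo "Warning: versions differ from the tested setup." >&2
```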

## Pre-trained Models

We provide our PDF-VLM2Vec models fine-tuned from Qwen2-VL. You can download them from the following links:

- [**PDF-VLM2Vec-Qwen2VL-2B**](https://huggingface.co/BurgerCheng/PDF-VLM2Vec-Qwen2VL-2B)
- [**PDF-VLM2Vec-Qwen2VL-7B**](https://huggingface.co/BurgerCheng/PDF-VLM2Vec-Qwen2VL-7B)
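
To keep a local copy of a released checkpoint, the `huggingface-cli download` command from the `huggingface_hub` package can fetch a model repository. The sketch below is an assumption-laden convenience, not part of this repository: the `fetch_checkpoint` function and the `./checkpoints` default directory are hypothetical, and we have not verified that the downloaded folder can be used as-is as `MODEL_PATH` in the evaluation script. Setting `DRY_RUN=1` prints the command instead of running it.

```bash
# Hypothetical helper: download one released PDF-VLM2Vec checkpoint from the
# Hugging Face Hub. Requires huggingface_hub (pip install -U huggingface_hub).
fetch_checkpoint() {
  local size="${1:-2B}"                 # "2B" or "7B"
  local dest_root="${2:-./checkpoints}" # any local directory you choose
  case "$size" in
    2B|7B) ;;
    *) echo "unknown model size: ${size}" >&2; return 1 ;;
  esac
  local repo="BurgerCheng/PDF-VLM2Vec-Qwen2VL-${size}"
  # With DRY_RUN set, only print the command that would run.
  ${DRY_RUN:+echo} huggingface-cli download "$repo" --local-dir "${dest_root}/${repo##*/}"
}
```

For example, `fetch_checkpoint 2B` would place the 2B model in `./checkpoints/PDF-VLM2Vec-Qwen2VL-2B`.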

## Data

Our training and evaluation data come from the **MMEB** benchmark. For more details, please refer to the original [VLM2Vec repository](https://github.com/TIGER-AI-Lab/VLM2Vec).

- **Training data**: [TIGER-Lab/MMEB-train](https://huggingface.co/datasets/TIGER-Lab/MMEB-train) on Hugging Face Datasets
- **Evaluation data**: [TIGER-Lab/MMEB-eval](https://huggingface.co/datasets/TIGER-Lab/MMEB-eval) on Hugging Face Datasets

Download the datasets and place them in your preferred data directory.
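
One way to download both dataset repositories into a single data directory is sketched below with `huggingface-cli download` from the `huggingface_hub` package. The `fetch_mmeb` function and the `./data` default are hypothetical, and the training and evaluation scripts may expect a specific layout (for example, extracted image archives); check the VLM2Vec repository for details. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
# Hypothetical helper: download MMEB training and evaluation data from the
# Hugging Face Hub into one local directory (requires huggingface_hub).
fetch_mmeb() {
  local data_dir="${1:-./data}"   # your preferred data directory
  local name
  for name in MMEB-train MMEB-eval; do
    # --repo-type dataset is required because these are dataset repositories.
    ${DRY_RUN:+echo} huggingface-cli download "TIGER-Lab/${name}" \
      --repo-type dataset --local-dir "${data_dir}/${name}" || return 1
  done
}
```

After `fetch_mmeb /path/to/data`, the data paths to set in the training and evaluation scripts would live under `/path/to/data/MMEB-train` and `/path/to/data/MMEB-eval`.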
46
+
47
+ ## Training
48
+
49
+ All training scripts are located in the `scripts/train/` directory.
50
+
51
+ To train the **PDF-VLM2Vec-Qwen2VL-2B** model, follow these steps:
52
+
53
+ 1. **Modify the script:** Open `scripts/train/train_qwen2vl_2b.sh`.
54
+ 2. **Update paths:** Change the data path and model saving path to your local directories.
55
+ 3. **Run the script:**
56
+ ```bash
57
+ source scripts/train/train_qwen2vl_2b.sh
58
+ ```
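
Since training depends on the two paths set in step 2, a quick pre-flight check can catch a wrong data path before a long job starts. This is a hypothetical sketch: `preflight_train` does not read the training script, so you pass it the same data and output paths you wrote into `scripts/train/train_qwen2vl_2b.sh`.

```bash
# Hypothetical pre-flight check before launching training: confirm the data
# directory exists and is non-empty, then create the output directory.
preflight_train() {
  local data_dir="$1" output_dir="$2"
  if [ -z "$data_dir" ] || [ -z "$output_dir" ]; then
    echo "usage: preflight_train DATA_DIR OUTPUT_DIR" >&2
    return 2
  fi
  if [ ! -d "$data_dir" ] || [ -z "$(ls -A "$data_dir")" ]; then
    echo "data directory missing or empty: ${data_dir}" >&2
    return 1
  fi
  mkdir -p "$output_dir" || return 1
  echo "ok: data=${data_dir} output=${output_dir}"
}
```

For example: `preflight_train /path/to/data/MMEB-train /path/to/outputs && source scripts/train/train_qwen2vl_2b.sh`, where both paths are placeholders for your own.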

## Evaluation

Evaluation scripts are available in the `scripts/eval/` directory.

To evaluate a trained model on the MMEB benchmark (here, the 2B model):

1. **Modify the script:** Open `scripts/eval/eval_qwen2vl_2b.sh` and set the `MODEL_PATH` variable to your trained model checkpoint.
2. **Run the evaluation:**
   ```bash
   source scripts/eval/eval_qwen2vl_2b.sh
   ```
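
If you prefer to set `MODEL_PATH` from the command line instead of editing the script by hand, a `sed` one-liner works, provided the script assigns the variable on its own line as `MODEL_PATH=...`. That assignment style is an assumption we have not checked against `scripts/eval/eval_qwen2vl_2b.sh`, and `set_model_path` is a hypothetical helper; it refuses to edit a script that has no such line and keeps a `.bak` copy of the original.

```bash
# Hypothetical helper: point MODEL_PATH in an eval script at a checkpoint.
# Assumes a line of the form MODEL_PATH=... ; keeps the original as <script>.bak.
set_model_path() {
  local script="$1" ckpt="$2"
  if ! grep -q '^MODEL_PATH=' "$script"; then
    echo "no MODEL_PATH= line found in ${script}" >&2
    return 1
  fi
  # '|' is the sed delimiter and '&' is special in the replacement,
  # so the checkpoint path must not contain either character.
  sed -i.bak "s|^MODEL_PATH=.*|MODEL_PATH=\"${ckpt}\"|" "$script"
}
```

For example, `set_model_path scripts/eval/eval_qwen2vl_2b.sh /path/to/your/checkpoint`, followed by the `source` command above.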

## Results

For a comprehensive analysis, please refer to our [paper](https://arxiv.org/abs/2511.01588).

## Citation

If you find our work useful for your research, please consider citing our paper:

```bibtex
@article{wang2025explore,
  title={Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization},
  author={Wang, Zhicheng and Ju, Chen and Chen, Xu and Xiao, Shuai and Lan, Jinsong and Zhu, Xiaoyong and Chen, Ying and Cao, Zhiguo},
  journal={arXiv preprint arXiv:2511.01588},
  year={2025}
}
```