# [E5-V: Universal Embeddings with Multimodal Large Language Models](https://arxiv.org/abs/2407.12580)

E5-V is fine-tuned based on lmms-lab/llama3-llava-next-8b.

## Overview

We propose a framework, called E5-V, to adapt MLLMs for achieving multimodal embeddings. E5-V effectively bridges the modality gap between different types of inputs, demonstrating strong performance in multimodal embeddings even without fine-tuning. We also propose a single-modality training approach for E5-V, where the model is trained exclusively on text pairs, demonstrating better performance than multimodal training.
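To illustrate how such universal embeddings are used, the sketch below scores a pair of embeddings by cosine similarity after L2 normalization, which is the standard way to compare embeddings across modalities. The random tensors are purely hypothetical stand-ins; in practice the vectors would come from E5-V's hidden states for an image or text input.

``` python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical embeddings standing in for E5-V outputs.
text_emb = torch.randn(1, 4096)
image_emb = text_emb + 0.1 * torch.randn(1, 4096)  # a "matching" image

# L2-normalize, then score by cosine similarity (dot product of unit vectors).
text_emb = F.normalize(text_emb, dim=-1)
image_emb = F.normalize(image_emb, dim=-1)
score = (text_emb @ image_emb.T).item()  # close to 1.0 for a matching pair
```

Because both vectors are unit-normalized, the dot product directly gives a similarity in [-1, 1], so text-text, text-image, and image-image pairs can all be ranked on the same scale.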
More details can be found at https://github.com/kongds/E5-V

## Example

``` python
import torch