Update README.md
README.md
CHANGED
|
@@ -15,7 +15,7 @@ pipeline_tag: feature-extraction
|
|
| 15 |
**VisRAG** is a novel vision-language model (VLM)-based RAG pipeline. In this pipeline, instead of first parsing the document to obtain text, the document is embedded directly as an image using a VLM and then retrieved to enhance the generation of a VLM. Compared to traditional text-based RAG, **VisRAG** maximizes the retention and utilization of the information in the original documents, eliminating the information loss introduced during the parsing process.
|
| 16 |
<p align="center"><img width=800 src="https://github.com/openbmb/VisRAG/blob/master/assets/main_figure.png?raw=true"/></p>
|
| 17 |
|
| 18 |
-## VisRAG
|
| 19 |
|
| 20 |
### VisRAG-Ret
|
| 21 |
**VisRAG-Ret** is a document embedding model built on [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2), a vision-language model that integrates [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) as the vision encoder and [MiniCPM-2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) as the language model.
|
|
@@ -118,8 +118,4 @@ print(scores.tolist())
|
|
| 118 |
## Contact
|
| 119 |
|
| 120 |
- Shi Yu: yushi17@foxmail.com
|
| 121 |
-- Chaoyue Tang: tcy006@gmail.com
|
| 122 |
-
|
| 123 |
-## Citation
|
| 124 |
-
|
| 125 |
-If you use any datasets or models from this organization in your research, please cite the original dataset as follows:
|
|
|
|
| 15 |
**VisRAG** is a novel vision-language model (VLM)-based RAG pipeline. In this pipeline, instead of first parsing the document to obtain text, the document is embedded directly as an image using a VLM and then retrieved to enhance the generation of a VLM. Compared to traditional text-based RAG, **VisRAG** maximizes the retention and utilization of the information in the original documents, eliminating the information loss introduced during the parsing process.
|
| 16 |
<p align="center"><img width=800 src="https://github.com/openbmb/VisRAG/blob/master/assets/main_figure.png?raw=true"/></p>
|
| 17 |
|
| 18 |
+## VisRAG Pipeline
|
| 19 |
|
| 20 |
### VisRAG-Ret
|
| 21 |
**VisRAG-Ret** is a document embedding model built on [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2), a vision-language model that integrates [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) as the vision encoder and [MiniCPM-2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) as the language model.
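To make the retrieval step concrete, here is a minimal sketch of how page images could be ranked against a query once both have been embedded. It assumes the embeddings are plain vectors compared by cosine similarity; the toy vectors and the helper names `cosine` and `rank_pages` are hypothetical stand-ins and are not part of the VisRAG-Ret API. The actual usage code in this README may pool and score embeddings differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_pages(query_emb, page_embs):
    """Score every page embedding against the query and sort best-first.

    Returns (order, scores): `order` lists page indices from most to least
    similar, and `scores` holds one similarity per page in the original order.
    """
    scores = [cosine(query_emb, p) for p in page_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy stand-ins for embeddings (assumption: real ones would come from the
# embedding model, one vector per document page image and one per query).
page_embs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
query_emb = [0.0, 2.0, 0.0]

order, scores = rank_pages(query_emb, page_embs)
print(order)   # page indices, most similar first
print(scores)  # one similarity score per page
```

In a full pipeline, the top-ranked page images would then be passed to the generation VLM together with the query.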
|
|
|
|
| 118 |
## Contact
|
| 119 |
|
| 120 |
- Shi Yu: yushi17@foxmail.com
|
| 121 |
+- Chaoyue Tang: tcy006@gmail.com