Update README.md
README.md
CHANGED
@@ -15,7 +15,9 @@ pipeline_tag: image-feature-extraction
 <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/re658pVjHaJEnJerlmRco.webp" alt="Image Description" width="300" height="300">
 </p>

-\[
+[\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
+
+[\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#model-usage) [\[🌐 Community-hosted API\]](https://rapidapi.com/adushar1320/api/internvl-chat) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/675877376)

 We release our new InternViT weights as InternViT-6B-448px-V1-2. The continuous pre-training of the InternViT-6B model is involved in the [InternVL 1.2](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2) update. Specifically, we increased the resolution of InternViT-6B from 224 to 448 and integrated it with [Nous-Hermes-2-Yi-34B]((https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B).
 To equip the model with high-resolution processing and OCR capabilities, both the vision encoder and the MLP were activated for training, utilizing a mix of image captioning and OCR-specific datasets.
@@ -83,6 +85,12 @@ If you find this project useful in your research, please consider citing:
 journal={arXiv preprint arXiv:2312.14238},
 year={2023}
 }
+@article{chen2024far,
+title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
+author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
+journal={arXiv preprint arXiv:2404.16821},
+year={2024}
+}
 ```

