OpenGVLab
/

InternViT-6B-448px-V1-5

@@ -11,11 +11,20 @@ pipeline_tag: image-feature-extraction
 ---
 # Model Card for InternViT-6B-448px-V1-5
-<img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/AUE-3OBtfr9vDA7Elgkhd.webp" alt="Image Description" width="300" height="300">
 \[[Paper](https://arxiv.org/abs/2312.14238)\]  \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
 | Model                   | Date       | Download                                                               | Note                             |
 | ----------------------- | ---------- | ---------------------------------------------------------------------- | -------------------------------- |
 | InternViT-6B-448px-V1.5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | support dynamic resolution, super strong OCR (🔥new) |
@@ -24,6 +33,14 @@ pipeline_tag: image-feature-extraction
 | InternViT-6B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px)      | vision foundation model          |
 | InternVL-14B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px)      | vision-language foundation model |
 ## Model Details
 - **Model Type:** vision foundation model, feature backbone

 ---
 # Model Card for InternViT-6B-448px-V1-5
+<p align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/AUE-3OBtfr9vDA7Elgkhd.webp" alt="Image Description" width="300" height="300">
+</p>
 \[[Paper](https://arxiv.org/abs/2312.14238)\]  \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
+We develop InternViT-6B-448px-V1-5 by continuing the pre-training of the strong foundation of InternViT-6B-448px-V1.2. In this update, the resolution of training images is expanded from 448&times;448 to dynamic 448&times;448, where the basic tile size is 448&times;448 and the number of tiles ranges from 1 to 12.
+Additionally, we enhance the data scale, quality, and diversity of the pre-training dataset, resulting in the powerful robustness, OCR capability, and high-resolution processing capability of our
+1.5 version model.
+## Released Models
+### Vision Foundation model
 | Model                   | Date       | Download                                                               | Note                             |
 | ----------------------- | ---------- | ---------------------------------------------------------------------- | -------------------------------- |
 | InternViT-6B-448px-V1.5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | support dynamic resolution, super strong OCR (🔥new) |
 | InternViT-6B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px)      | vision foundation model          |
 | InternVL-14B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px)      | vision-language foundation model |
+### Multimodal Large Language Model (MLLM)
+| Model                   | Date       | Download                                                                    | Note                               |
+| ----------------------- | ---------- | --------------------------------------------------------------------------- | ---------------------------------- |
+| InternVL-Chat-V1.5      | 2024.04.18 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)            | support 4K image; super strong OCR; Approaching the performance of GPT-4V and Gemini Pro on various benchmarks like MMMU, DocVQA, ChartQA, MathVista, etc. (🔥new)|
+| InternVL-Chat-V1.2-Plus | 2024.02.21 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)       | more SFT data and stronger  |
+| InternVL-Chat-V1.2      | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)            | scaling up LLM to 34B       |
+| InternVL-Chat-V1.1      | 2024.01.24 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)            | support Chinese and stronger OCR   |
 ## Model Details
 - **Model Type:** vision foundation model, feature backbone