Update README.md
README.md
CHANGED
@@ -17,9 +17,7 @@ pipeline_tag: image-feature-extraction
 
 \[[InternVL 1.5 Technical Report](https://arxiv.org/abs/2404.16821)\] \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
 
-We develop InternViT-
-Additionally, we enhance the data scale, quality, and diversity of the pre-training dataset, resulting in the powerful robustness, OCR capability, and high-resolution processing capability of our
-1.5 version model.
+We develop InternViT-300M-448px based on the distillation of the strong foundation of [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5). This update primarily focuses on the parameter count. The resolution of training images is expanded from 448×448 to dynamic 448×448, where the basic tile size is 448×448 and the number of tiles ranges from 1 to 12. Additionally, it inherits the powerful robustness, OCR capability, and high-resolution processing capability from InternViT-6B-448px-V1-5.
 
 ## Model Details
 - **Model Type:** vision foundation model, feature backbone
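
The added README paragraph describes "dynamic 448×448" input: a basic tile of 448×448 and between 1 and 12 tiles per image. The sketch below illustrates one way such a tile grid could be chosen. Only the tile size and the 1 to 12 range come from the README text. The selection rule (closest aspect ratio, ties broken toward fewer tiles) and the names `candidate_grids`, `choose_grid`, and `resized_shape` are assumptions made for illustration, not the model's confirmed preprocessing.

```python
from typing import List, Tuple

TILE = 448       # basic tile size stated in the README text
MIN_TILES = 1    # lower bound on tile count stated in the README text
MAX_TILES = 12   # upper bound on tile count stated in the README text


def candidate_grids(min_tiles: int = MIN_TILES,
                    max_tiles: int = MAX_TILES) -> List[Tuple[int, int]]:
    """Return every (cols, rows) grid whose tile count is within the bounds."""
    return [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if min_tiles <= cols * rows <= max_tiles
    ]


def choose_grid(width: int, height: int) -> Tuple[int, int]:
    """Hypothetical rule: pick the grid whose aspect ratio best matches the image.

    Ties are broken in favour of the grid with fewer tiles.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    aspect = width / height
    return min(
        candidate_grids(),
        key=lambda g: (abs(aspect - g[0] / g[1]), g[0] * g[1]),
    )


def resized_shape(width: int, height: int) -> Tuple[int, int]:
    """Pixel size the image would be resized to before cutting it into tiles."""
    cols, rows = choose_grid(width, height)
    return cols * TILE, rows * TILE
```

Under these assumptions, a square image maps to a single 448×448 tile, while an image twice as wide as it is tall maps to a 2×1 grid (896×448 pixels).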