Update README.md
README.md
CHANGED
@@ -13,9 +13,9 @@ new_version: OpenGVLab/InternViT-300M-448px-V2_5
 
 # InternViT-300M-448px
 
-[\[GitHub\]](https://github.com/OpenGVLab/InternVL) [\[Blog\]](https://internvl.github.io/blog/) [\[InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[InternVL 1.5\]](https://arxiv.org/abs/2404.16821)
+[\[GitHub\]](https://github.com/OpenGVLab/InternVL) [\[Blog\]](https://internvl.github.io/blog/) [\[InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[InternVL 2.5\]](https://github.com/OpenGVLab/InternVL/blob/main/InternVL2_5_report.pdf)
 
-[\[Chat Demo\]](https://internvl.opengvlab.com/) [\[HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[Quick Start\]](#quick-start) [\[
+[\[Chat Demo\]](https://internvl.opengvlab.com/) [\[HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[Quick Start\]](#quick-start) [\[Documents\]](https://internvl.readthedocs.io/en/latest/)
 
 <div align="center">
 <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
@@ -32,7 +32,10 @@ This update primarily focuses on enhancing the efficiency of the vision foundati
 - **Pretrain Dataset:** LAION-en, LAION-zh, COYO, GRIT, COCO, TextCaps, Objects365, OpenImages, All-Seeing, Wukong-OCR, LaionCOCO-OCR, and other OCR-related datasets.
 To enhance the OCR capability of the model, we have incorporated additional OCR data alongside the general caption datasets. Specifically, we utilized PaddleOCR to perform Chinese OCR on images from Wukong and English OCR on images from LAION-COCO.
 
-##
+## Quick Start
+
+> \[!Warning\]
+> Note: In our experience, the InternViT V2.5 series is better suited for building MLLMs than traditional computer vision tasks.
 
 ```python
 import torch
@@ -55,6 +58,10 @@ pixel_values = pixel_values.to(torch.bfloat16).cuda()
 outputs = model(pixel_values)
 ```
 
+## License
+
+This project is released under the MIT License.
+
 ## Citation
 
 If you find this project useful in your research, please consider citing: