OpenGVLab/InternViT-300M-448px

czczup committed on Dec 8, 2024

Commit 64252c4 · verified · 1 parent: 4037281

Update README.md

Files changed (1)

README.md +10 -3

@@ -13,9 +13,9 @@ new_version: OpenGVLab/InternViT-300M-448px-V2_5
 # InternViT-300M-448px
-[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL)  [\[🆕 Blog\]](https://internvl.github.io/blog/)  [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238)  [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[📜 Mini-InternVL\]](https://arxiv.org/abs/2410.16261)
-[\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)  [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL)  [\[🚀 Quick Start\]](#quick-start)  [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971)  [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
 <div align="center">
   <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
@@ -32,7 +32,10 @@ This update primarily focuses on enhancing the efficiency of the vision foundati
 - **Pretrain Dataset:** LAION-en, LAION-zh, COYO, GRIT, COCO, TextCaps, Objects365, OpenImages, All-Seeing, Wukong-OCR, LaionCOCO-OCR, and other OCR-related datasets.
   To enhance the OCR capability of the model, we have incorporated additional OCR data alongside the general caption datasets. Specifically, we utilized PaddleOCR to perform Chinese OCR on images from Wukong and English OCR on images from LAION-COCO.
-## Model Usage (Image Embeddings)
 ```python
 import torch
@@ -55,6 +58,10 @@ pixel_values = pixel_values.to(torch.bfloat16).cuda()
 outputs = model(pixel_values)
 ```
 ## Citation
 If you find this project useful in your research, please consider citing:

 # InternViT-300M-448px
+[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL)  [\[🆕 Blog\]](https://internvl.github.io/blog/)  [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238)  [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821)  [\[📜 InternVL 2.5\]](https://github.com/OpenGVLab/InternVL/blob/main/InternVL2_5_report.pdf)
+[\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)  [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL)  [\[🚀 Quick Start\]](#quick-start)  [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
 <div align="center">
   <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
 - **Pretrain Dataset:** LAION-en, LAION-zh, COYO, GRIT, COCO, TextCaps, Objects365, OpenImages, All-Seeing, Wukong-OCR, LaionCOCO-OCR, and other OCR-related datasets.
   To enhance the OCR capability of the model, we have incorporated additional OCR data alongside the general caption datasets. Specifically, we utilized PaddleOCR to perform Chinese OCR on images from Wukong and English OCR on images from LAION-COCO.
+## Quick Start
+> \[!Warning\]
+> 🚨 Note: In our experience, the InternViT V2.5 series is better suited for building MLLMs than traditional computer vision tasks.
 ```python
 import torch
 outputs = model(pixel_values)
 ```
+## License
+This project is released under the MIT License.
 ## Citation
 If you find this project useful in your research, please consider citing: