OpenGVLab / InternViT-6B-224px · Image Feature Extraction


czczup committed on Feb 23, 2024 · Commit 31da47b (verified) · 1 parent: ab5bff1

Update README.md

Files changed (1)

README.md +2 -0

README.md CHANGED

@@ -27,6 +27,8 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
   - Params (M): 5903
   - Image size: 224 x 224
 - **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi
+- **Note:** This model has 48 blocks, and we found that using the output after the fourth-to-last block worked best for VLLM. Therefore, **please set mm_vision_select_layer=-4 when using this model to build VLLM.**
 ## Linear Probing Performance
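The added note relies on negative indexing into the vision tower's per-layer outputs. The sketch below shows why `mm_vision_select_layer=-4` picks the output of the fourth-to-last of the 48 blocks. It assumes that the VLLM code indexes `hidden_states[mm_vision_select_layer]`, where `hidden_states` follows the Hugging Face `output_hidden_states=True` convention: one leading entry for the embedding output, then one entry per block (49 entries in total). That convention is an assumption for InternViT's own code, and `selected_block` is a hypothetical helper written only for illustration.

```python
NUM_BLOCKS = 48  # block count stated in the model-card note


def selected_block(select_layer: int, num_blocks: int = NUM_BLOCKS) -> int:
    """Return which block's output hidden_states[select_layer] would hold.

    Assumption: hidden_states has num_blocks + 1 entries, where entry 0 is
    the embedding output and entry i (1..num_blocks) is the output of block i.
    A return value of 0 therefore means "embedding output, no block".
    """
    # Stand-in for the real tensors: entry i records "output after block i".
    hidden_states = list(range(num_blocks + 1))
    return hidden_states[select_layer]  # raises IndexError if out of range


print(selected_block(-1))  # output of the last block (48)
print(selected_block(-4))  # output of block 45, the fourth-to-last of 48
```

Under this convention, `-1` would take the final block's output, while `-4` skips the last three blocks. The recommended setting therefore feeds the language model features from block 45 instead.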