czczup committed on Dec 6, 2024

Commit f797792 (verified) · 1 Parent(s): ca1856d

Upload folder using huggingface_hub

Files changed (3)

README.md +15 -4
configuration_internvl_chat.py +2 -2
modeling_intern_vit.py +1 -0

README.md CHANGED

@@ -5,6 +5,7 @@ library_name: transformers
 base_model:
   - OpenGVLab/InternViT-6B-448px-V1-2
   - NousResearch/Nous-Hermes-2-Yi-34B
 base_model_relation: merge
 language:
   - multilingual
@@ -19,16 +20,20 @@ tags:
 # InternVL-Chat-V1-2
-[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL)  [\[🆕 Blog\]](https://internvl.github.io/blog/)  [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238)  [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
 [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)  [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL)  [\[🚀 Quick Start\]](#quick-start)  [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971)  [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
 ## Introduction
 We are excited to introduce [🤗 InternVL-Chat-V1-2](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2). Inspired by [LLaVA-NeXT-34B](https://llava-vl.github.io/blog/2024-01-30-llava-next/), we have also adopted [Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B) as the language model. Below is the pipeline.
 <p align="center">
-    <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/GIEKCvNc1Y5iMQqLv645p.png" style="width: 100%;">
 </p>
 From the experimental results, we've observed that **a stronger language model (34B) can better leverage the powerful capabilities of our vision foundation model.**
@@ -100,7 +105,7 @@ We provide an example code to run InternVL-Chat-V1-2 using `transformers`.
 We also welcome you to experience the InternVL2 series models in our [online demo](https://internvl.opengvlab.com/).
-> Please use transformers==4.37.2 to ensure the model works normally.
 ### Model Loading
@@ -455,7 +460,7 @@ print(f'User: {question}')
 print(f'Assistant: {response}')
 ```
-#### Streaming output
 Besides this method, you can also use the following code to get streamed output.
@@ -493,6 +498,12 @@ This project is released under the MIT license. Parts of this project contain co
 If you find this project useful in your research, please consider citing:
 ```BibTeX
 @article{chen2023internvl,
   title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
   author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},

 base_model:
   - OpenGVLab/InternViT-6B-448px-V1-2
   - NousResearch/Nous-Hermes-2-Yi-34B
+new_version: OpenGVLab/InternVL2_5-38B
 base_model_relation: merge
 language:
   - multilingual
 # InternVL-Chat-V1-2
+[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL)  [\[🆕 Blog\]](https://internvl.github.io/blog/)  [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238)  [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[📜 Mini-InternVL\]](https://arxiv.org/abs/2410.16261)
 [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)  [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL)  [\[🚀 Quick Start\]](#quick-start)  [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971)  [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
+<div align="center">
+  <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
+</div>
 ## Introduction
 We are excited to introduce [🤗 InternVL-Chat-V1-2](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2). Inspired by [LLaVA-NeXT-34B](https://llava-vl.github.io/blog/2024-01-30-llava-next/), we have also adopted [Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B) as the language model. Below is the pipeline.
 <p align="center">
+  <img width="600" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/GIEKCvNc1Y5iMQqLv645p.png">
 </p>
 From the experimental results, we've observed that **a stronger language model (34B) can better leverage the powerful capabilities of our vision foundation model.**
 We also welcome you to experience the InternVL2 series models in our [online demo](https://internvl.opengvlab.com/).
+> Please use transformers>=4.37.2 to ensure the model works normally.
 ### Model Loading
 print(f'Assistant: {response}')
 ```
+#### Streaming Output
 Besides this method, you can also use the following code to get streamed output.
 If you find this project useful in your research, please consider citing:
 ```BibTeX
+@article{gao2024mini,
+  title={Mini-internvl: A flexible-transfer pocket multimodal model with 5\% parameters and 90\% performance},
+  author={Gao, Zhangwei and Chen, Zhe and Cui, Erfei and Ren, Yiming and Wang, Weiyun and Zhu, Jinguo and Tian, Hao and Ye, Shenglong and He, Junjun and Zhu, Xizhou and others},
+  journal={arXiv preprint arXiv:2410.16261},
+  year={2024}
+}
 @article{chen2023internvl,
   title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
   author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
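
The README change above relaxes the requirement from `transformers==4.37.2` to `transformers>=4.37.2`. As a rough sketch of how a script might check that minimum before loading the model, the snippet below compares the installed version against it. The helper names `parse_release` and `meets_minimum` are hypothetical and not part of this repository, and the parser is a simplification that looks only at the leading numeric release segment, so a pre-release such as `4.37.2.dev0` is treated as `4.37.2`.

```python
import re
from importlib import metadata

# Minimum version taken from the updated README note (transformers>=4.37.2).
MIN_VERSION = (4, 37, 2)


def parse_release(version: str) -> tuple:
    """Hypothetical helper: keep only the leading numeric release segment."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: str, minimum: tuple = MIN_VERSION) -> bool:
    """Hypothetical helper: True if `version` is at least `minimum`."""
    return parse_release(version) >= minimum


try:
    installed = metadata.version("transformers")
    status = "OK" if meets_minimum(installed) else "too old"
    print(f"transformers {installed}: {status}")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```

A stricter comparison that follows full PEP 440 ordering would need a dedicated version-parsing library; the tuple comparison here is only meant to illustrate the `>=` constraint.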

configuration_internvl_chat.py CHANGED

@@ -38,11 +38,11 @@ class InternVLChatConfig(PretrainedConfig):
         super().__init__(**kwargs)
         if vision_config is None:
-            vision_config = {}
             logger.info('vision_config is None. Initializing the InternVisionConfig with default values.')
         if llm_config is None:
-            llm_config = {}
             logger.info('llm_config is None. Initializing the LlamaConfig config with default values (`LlamaConfig`).')
         self.vision_config = InternVisionConfig(**vision_config)

         super().__init__(**kwargs)
         if vision_config is None:
+            vision_config = {'architectures': ['InternVisionModel']}
             logger.info('vision_config is None. Initializing the InternVisionConfig with default values.')
         if llm_config is None:
+            llm_config = {'architectures': ['LlamaForCausalLM']}
             logger.info('llm_config is None. Initializing the LlamaConfig config with default values (`LlamaConfig`).')
         self.vision_config = InternVisionConfig(**vision_config)
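
The effect of this change can be illustrated with a minimal sketch. It assumes a stand-in class, `SubConfig`, in place of the real `InternVisionConfig` and `LlamaConfig`, and a hypothetical helper `build_sub_config`; neither exists in the repository, and the `hidden_size` value is purely illustrative. The point is only that a missing sub-config now falls back to a dict that already names an architecture rather than to an empty dict.

```python
from typing import Any, Dict, List, Optional


class SubConfig:
    """Hypothetical stand-in for InternVisionConfig / LlamaConfig."""

    def __init__(self, architectures: Optional[List[str]] = None, **kwargs: Any) -> None:
        self.architectures = architectures
        self.extra = kwargs  # any remaining keys, kept for inspection


def build_sub_config(config: Optional[Dict[str, Any]], default_arch: str) -> SubConfig:
    """Hypothetical helper mirroring the patched `if ... is None` branches."""
    if config is None:
        # After the patch the fallback carries an 'architectures' entry;
        # before the patch it was an empty dict.
        config = {'architectures': [default_arch]}
    return SubConfig(**config)


# No config supplied: the default architecture is filled in.
vision = build_sub_config(None, 'InternVisionModel')
print(vision.architectures)

# Explicit config supplied: it is passed through unchanged.
llm = build_sub_config(
    {'architectures': ['LlamaForCausalLM'], 'hidden_size': 1024},  # illustrative values
    'LlamaForCausalLM',
)
print(llm.architectures, llm.extra)
```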

modeling_intern_vit.py CHANGED

@@ -3,6 +3,7 @@
 # Copyright (c) 2024 OpenGVLab
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
 from typing import Optional, Tuple, Union
 import torch

 # Copyright (c) 2024 OpenGVLab
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
 from typing import Optional, Tuple, Union
 import torch