Upload folder using huggingface_hub
- README.md +23 -7
- configuration_intern_vit.py +1 -1
- configuration_internvl_chat.py +1 -1
- modeling_intern_vit.py +1 -1
README.md
CHANGED
@@ -11,7 +11,7 @@ pipeline_tag: image-text-to-text

 ## Introduction

-We are excited to announce the release of InternVL 2.0, the latest addition to the InternVL series of multimodal large language models. InternVL 2.0 features a variety of **instruction-tuned models**, ranging from
+We are excited to announce the release of InternVL 2.0, the latest addition to the InternVL series of multimodal large language models. InternVL 2.0 features a variety of **instruction-tuned models**, ranging from 1 billion to 108 billion parameters. This repository contains the instruction-tuned InternVL2-2B model.

 Compared to the state-of-the-art open-source multimodal large language models, InternVL 2.0 surpasses most open-source models. It demonstrates competitive performance on par with proprietary commercial models across various capabilities, including document and chart comprehension, infographics QA, scene text understanding and OCR tasks, scientific and mathematical problem solving, as well as cultural understanding and integrated multimodal capabilities.

@@ -60,8 +60,8 @@ InternVL 2.0 is a multimodal large language model series, featuring models of va
 | Model Size | 4B | 7B | 2.2B | 2.2B |
 | | | | | |
 | MVBench | 55.1 | 60.4 | 37.0 | 60.2 |
-| Video-MME<br>wo subs | - | 42.3 |
-| Video-MME<br>w/ subs | - | 54.6 |
+| Video-MME<br>wo subs | - | 42.3 | TODO | TODO |
+| Video-MME<br>w/ subs | - | 54.6 | TODO | TODO |

 - We evaluate our models on MVBench by extracting 16 frames from each video, and each frame was resized to a 448x448 image.

@@ -432,7 +432,7 @@ To deploy InternVL2 as an API, please configure the chat template config first.
 }
 ```

-LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup
+LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup:

 ```shell
 lmdeploy serve api_server OpenGVLab/InternVL2-2B --model-name InternVL2-2B --backend turbomind --server-port 23333 --chat-template chat_template.json
@@ -472,6 +472,14 @@ response = client.chat.completions.create(
 print(response)
 ```

+### vLLM
+
+TODO
+
+### Ollama
+
+TODO
+
 ## License

 This project is released under the MIT license, while InternLM is licensed under the Apache-2.0 license.
@@ -497,7 +505,7 @@ If you find this project useful in your research, please consider citing:

 ## Introduction

-We are pleased to announce the release of InternVL 2.0, the latest version of the InternVL series of multimodal large language models. InternVL 2.0 provides a variety of **instruction-tuned** models, with parameters ranging from
+We are pleased to announce the release of InternVL 2.0, the latest version of the InternVL series of multimodal large language models. InternVL 2.0 provides a variety of **instruction-tuned** models, with parameters ranging from 1 billion to 108 billion. This repository contains the instruction-tuned InternVL2-2B model.

 Compared with state-of-the-art open-source multimodal large language models, InternVL 2.0 surpasses most open-source models. Across a range of capabilities it is competitive with closed-source commercial models, including document and chart understanding, infographic question answering, scene text understanding and OCR tasks, scientific and mathematical problem solving, and cultural understanding and integrated multimodal capabilities.

@@ -546,8 +554,8 @@ InternVL 2.0 是一个多模态大语言模型系列,包含各种规模的模
 | Model Size | 4B | 7B | 2.2B | 2.2B |
 | | | | | |
 | MVBench | 55.1 | 60.4 | 37.0 | 60.2 |
-| Video-MME<br>wo subs | - | 42.3 |
-| Video-MME<br>w/ subs | - | 54.6 |
+| Video-MME<br>wo subs | - | 42.3 | TODO | TODO |
+| Video-MME<br>w/ subs | - | 54.6 | TODO | TODO |

 - We evaluate our models on MVBench by extracting 16 frames from each video, with each frame resized to a 448x448 image.

@@ -719,6 +727,14 @@ response = client.chat.completions.create(
 print(response)
 ```

+### vLLM
+
+TODO
+
+### Ollama
+
+TODO
+
 ## Open-Source License

 This project is released under the MIT license, while InternLM is licensed under the Apache-2.0 license.
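The README hunk at line 432 says that LMDeploy's `api_server` exposes RESTful APIs compatible with OpenAI's interfaces, and its startup command serves the model under the name `InternVL2-2B` on port 23333. As a minimal sketch under those assumptions, the request such a service would accept can be built and sent with the Python standard library alone. The helper names `build_chat_request` and `send_chat_request`, the `localhost` host, and the example image URL are hypothetical and not part of this commit; the `/v1/chat/completions` route is the usual OpenAI-style chat endpoint and is assumed to be what the server provides.

```python
import json
import urllib.request


def build_chat_request(model_name, prompt, image_url=None):
    """Build an OpenAI-style chat completion payload (hypothetical helper)."""
    # A user message is a list of typed parts: text first, then an optional image.
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": content}],
    }


def send_chat_request(base_url, payload, timeout=60):
    """POST the payload to <base_url>/v1/chat/completions and return parsed JSON."""
    request = urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# "InternVL2-2B" matches the --model-name flag in the startup command.
payload = build_chat_request(
    "InternVL2-2B",
    "Describe this image in one sentence.",
    image_url="https://example.com/image.jpg",  # placeholder URL, an assumption
)

# Sending requires a running server (assumed here to be on localhost:23333):
# result = send_chat_request("http://localhost:23333", payload)
# print(result["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the network call lets the request shape be checked without a GPU or a running server; only the commented-out lines depend on the service actually being up.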
configuration_intern_vit.py
CHANGED
@@ -1,6 +1,6 @@
 # --------------------------------------------------------
 # InternVL
-# Copyright (c)
+# Copyright (c) 2024 OpenGVLab
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
 import os
configuration_internvl_chat.py
CHANGED
@@ -1,6 +1,6 @@
 # --------------------------------------------------------
 # InternVL
-# Copyright (c)
+# Copyright (c) 2024 OpenGVLab
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------

modeling_intern_vit.py
CHANGED
@@ -1,6 +1,6 @@
 # --------------------------------------------------------
 # InternVL
-# Copyright (c)
+# Copyright (c) 2024 OpenGVLab
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
 from typing import Optional, Tuple, Union