device error when using 76B.

by puar-playground - opened Aug 21, 2024

Discussion

puar-playground

Aug 21, 2024

edited Aug 21, 2024

I tried to use the 76B model for inference on a single image, following the instructions for the custom device_map function:

device_map = split_model('InternVL2-Llama3-76B')
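For readers who have not seen it, the idea behind such a device_map helper can be sketched as below. This is a minimal illustration and not necessarily the exact function from the model card: the name build_device_map, its parameters, and the module names (vision_model, mlp1, language_model.model.layers.N, and so on) are assumptions about how the checkpoint is laid out. It spreads the language model's decoder layers across GPUs, treats GPU 0 as half a GPU because it also hosts the vision encoder, and pins the embeddings, final norm, output head, and last decoder layer to GPU 0.

```python
import math


def build_device_map(num_layers, num_gpus):
    """Hypothetical helper: map module names to GPU indices.

    num_layers is the number of decoder layers in the language model and
    num_gpus the number of visible GPUs. All module names used below are
    assumptions about the checkpoint layout.
    """
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")

    # GPU 0 also holds the vision encoder, so count it as half a GPU.
    per_gpu = math.ceil(num_layers / (num_gpus - 0.5))
    layers_per_gpu = [per_gpu] * num_gpus
    layers_per_gpu[0] = math.ceil(per_gpu * 0.5)

    device_map = {}
    layer_idx = 0
    for gpu, quota in enumerate(layers_per_gpu):
        for _ in range(quota):
            if layer_idx >= num_layers:  # never emit layers that do not exist
                break
            device_map[f"language_model.model.layers.{layer_idx}"] = gpu
            layer_idx += 1

    # Keep the input side, the output side, and the last layer together on GPU 0.
    for name in (
        "vision_model",
        "mlp1",
        "language_model.model.embed_tokens",
        "language_model.model.norm",
        "language_model.lm_head",
        f"language_model.model.layers.{num_layers - 1}",
    ):
        device_map[name] = 0
    return device_map


if __name__ == "__main__":
    # 80 decoder layers is an assumption based on the Llama-3-70B backbone.
    example = build_device_map(num_layers=80, num_gpus=4)
    print(example["language_model.model.layers.40"])
```

A dictionary like this would then be passed as the device_map argument when loading the model with from_pretrained.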

Then I got the error below. The same code works for the 8B model, so I'm not sure why it fails for 76B.

    model_answer, history = self.model.chat(self.tokenizer, pixel_values, question, generation_config,
  File "/home/user/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL2-Llama3-76B/cf7914905f78e9e3560ddbd6f5dfc39becac494f/modeling_internvl_chat.py", line 282, in chat
    generation_output = self.generate(
  File "/opt/conda/envs/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL2-Llama3-76B/cf7914905f78e9e3560ddbd6f5dfc39becac494f/modeling_internvl_chat.py", line 332, in generate
    outputs = self.language_model.generate(
  File "/opt/conda/envs/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/opt/conda/envs/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1189, in forward
    outputs = self.model(
  File "/opt/conda/envs/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 977, in forward
    position_embeddings = self.rotary_emb(hidden_states, position_ids)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 209, in forward
    freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)

czczup

OpenGVLab org Aug 22, 2024

Hello, thank you for your feedback. Could you tell me which version of transformers you are using?

SantoshVasa

Aug 22, 2024

I was using 4.44.0 and got the same error. Downgrading to 4.37.2 solves the issue :)

czczup

OpenGVLab org Aug 24, 2024

ok, thanks for your feedback~

solankibhargav

Sep 2, 2024

I can confirm that I can reproduce the error and that the downgrade solves the problem.

czczup

OpenGVLab org Sep 4, 2024

Hello, thank you for your feedback. There are indeed some compatibility issues when using version 4.44.
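To make the finding easy to check before loading the model, here is a small sketch that compares the installed transformers version against the two data points reported in this thread (4.37.2 works, 4.44 fails). The helper names parse_release and classify are made up for illustration, and any version not mentioned in the thread is labelled untested rather than guessed at.

```python
from importlib.metadata import PackageNotFoundError, version

KNOWN_GOOD = (4, 37, 2)    # reported as working in this thread
REPORTED_BROKEN = (4, 44)  # 4.44.x reported as failing in this thread


def parse_release(version_string):
    """Turn '4.44.0' into (4, 44, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def classify(version_string):
    """Label a transformers version using only the reports from this thread."""
    release = parse_release(version_string)
    if release == KNOWN_GOOD:
        return "known-good"
    if release[:2] == REPORTED_BROKEN:
        return "reported-broken"
    return "untested"


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        print(installed, "->", classify(installed))
```

If the result is reported-broken, pinning the package with pip install transformers==4.37.2 is the workaround confirmed above.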

czczup changed discussion status to closed Sep 4, 2024

roTripathi

Sep 21, 2024

This should remain an open issue so that future users can quickly find it and apply the fix.
