Instructions for using Qwen/QVQ-72B-Preview with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Qwen/QVQ-72B-Preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/QVQ-72B-Preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Qwen/QVQ-72B-Preview")
model = AutoModelForMultimodalLM.from_pretrained("Qwen/QVQ-72B-Preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Qwen/QVQ-72B-Preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Qwen/QVQ-72B-Preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/QVQ-72B-Preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
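
The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; `build_chat_request` and `send` are hypothetical helper names, and the response layout (`choices[0].message.content`) is assumed to follow the usual OpenAI chat completion shape.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, image_url):
    """Build an OpenAI-style chat request with one text part and one image part."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and return the decoded JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the previous step running, `send(build_chat_request("http://localhost:8000", "Qwen/QVQ-72B-Preview", "Describe this image in one sentence.", image_url))` should return the parsed reply, where `image_url` is any reachable image address.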

Use Docker

docker model run hf.co/Qwen/QVQ-72B-Preview

SGLang

How to use Qwen/QVQ-72B-Preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Qwen/QVQ-72B-Preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/QVQ-72B-Preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Qwen/QVQ-72B-Preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/QVQ-72B-Preview",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use Qwen/QVQ-72B-Preview with Docker Model Runner:
```
docker model run hf.co/Qwen/QVQ-72B-Preview
```

[ERROR] safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

by luckychao - opened Dec 25, 2024

Discussion

luckychao

Dec 25, 2024

Hi there :)

[root] Loading local model /mnt/petrelfs/share_data/quxiaoye/models/QVQ-72B-Preview
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46

Loading checkpoint shards:   0%|          | 0/38 [00:00<?, ?it/s]
Loading checkpoint shards:   3%|▎         | 1/38 [00:04<02:37,  4.26s/it]
Loading checkpoint shards:   3%|▎         | 1/38 [00:04<02:37,  4.26s/it]
Traceback (most recent call last):
...
  File "generate_response.py", line 57, in main
    model = qwen.Qwen_Model(args.model_path, temperature=args.temperature, max_tokens=args.max_tokens)
  File "qwen.py", line 58, in __init__
    self.model = Qwen2VLForConditionalGeneration.from_pretrained(self.model_path, torch_dtype=torch.bfloat16,
  File "/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4225, in from_pretrained
    ) = cls._load_pretrained_model(
  File "lib/python3.9/site-packages/transformers/modeling_utils.py", line 4706, in _load_pretrained_model
    state_dict = load_state_dict(
  File "lib/python3.9/site-packages/transformers/modeling_utils.py", line 555, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

The error above occurred when I loaded the QVQ-72B model with Transformers. The same code and environment load the qwen2-vl-72b model without any problem.

Environment:

python 3.9
torch 2.4.0+cu118
transformers 4.46.1
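
An environment list like the one above can be generated rather than typed by hand. This is a minimal sketch using only the standard library; `report_versions` is a hypothetical helper, and `safetensors` and `accelerate` are included as an assumption that they are also relevant to loading the checkpoint.

```python
import platform
from importlib import metadata


def report_versions(packages=("torch", "transformers", "safetensors", "accelerate")):
    """Return {name: version or None} for the interpreter and each listed package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```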

Thanks for the reply!

jeffwadsworth

Dec 27, 2024

Just put your error into Claude. Try this:

Based on the error message, it seems you're encountering an issue while loading the QVQ-72B-Preview model, specifically a SafetensorError related to an incomplete metadata buffer. Let me help you troubleshoot this.

There are a few potential causes and solutions:

Incomplete or corrupted download:
- The error suggests that the safetensors file might be corrupted or incompletely downloaded
- Try removing the downloaded model files and re-downloading them
- Verify the checksums of the downloaded files if they're provided

Memory issues:
- Loading a 72B parameter model requires significant memory
- Try increasing your system's swap space
- Consider using model sharding or loading in 8-bit precision if available

Version compatibility:
- The warning about Qwen2VLRotaryEmbedding suggests you might be using an older version of the library
- Try updating your transformers library:
```
pip install --upgrade transformers
```

File permissions:
- Check if your process has proper read permissions for the model files
- Verify the ownership and permissions of the files in /mnt/petrelfs/share_data/quxiaoye/models/QVQ-72B-Preview
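
The "incomplete or corrupted download" case can be tested directly, without loading the model. A safetensors file starts with an 8-byte little-endian header length, followed by a JSON header whose entries carry `data_offsets` into the tensor data that follows. The sketch below checks that each file's size matches what its header declares. `check_safetensors_file` and `check_model_dir` are hypothetical helper names, and the assumption is that `MetadataIncompleteBuffer` points at this kind of size mismatch, as a truncated shard would produce.

```python
import json
import struct
from pathlib import Path


def check_safetensors_file(path):
    """Return None if the file size matches its header, else a short problem description."""
    path = Path(path)
    file_size = path.stat().st_size
    with path.open("rb") as handle:
        prefix = handle.read(8)
        if len(prefix) < 8:
            return "file is shorter than the 8-byte length prefix"
        (header_len,) = struct.unpack("<Q", prefix)
        if 8 + header_len > file_size:
            return "header length exceeds file size"
        try:
            header = json.loads(handle.read(header_len).decode("utf-8"))
        except ValueError:  # covers invalid UTF-8 and invalid JSON
            return "header is not valid JSON"
    if not isinstance(header, dict):
        return "header is not a JSON object"
    # Every entry except the optional "__metadata__" describes one tensor.
    ends = [
        entry["data_offsets"][1]
        for name, entry in header.items()
        if name != "__metadata__"
    ]
    expected = 8 + header_len + max(ends, default=0)
    if expected != file_size:
        return f"expected {expected} bytes, found {file_size}"
    return None


def check_model_dir(model_dir):
    """Map each *.safetensors file name in model_dir to its problem (None means OK)."""
    return {
        shard.name: check_safetensors_file(shard)
        for shard in sorted(Path(model_dir).glob("*.safetensors"))
    }
```

Calling `check_model_dir` on the local model directory and printing the entries whose value is not `None` would show which shards, if any, are truncated. The traceback above fails on the second shard, so that file is the first one worth inspecting.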

Could you try these steps and let me know:

- Which version of transformers are you using?
- Do you have enough system memory available?
- Are all the model shard files present in the directory?
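
The last question can be answered mechanically. Sharded Transformers checkpoints ship a `model.safetensors.index.json` file whose `weight_map` maps every tensor name to the shard file that stores it. The sketch below lists shards that the index references but the directory lacks; `missing_shards` is a hypothetical helper, and it assumes the local copy includes that index file.

```python
import json
from pathlib import Path


def missing_shards(model_dir, index_name="model.safetensors.index.json"):
    """Return sorted shard file names that the index references but the directory lacks."""
    model_dir = Path(model_dir)
    index = json.loads((model_dir / index_name).read_text(encoding="utf-8"))
    # "weight_map" maps each tensor name to the shard file that stores it.
    referenced = set(index["weight_map"].values())
    return sorted(name for name in referenced if not (model_dir / name).is_file())
```

An empty list means every referenced shard is present. It says nothing about whether each shard is complete.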

Anyway, see if this helps.

luckychao

Dec 30, 2024 (edited Dec 30, 2024)

Thanks for your reply!!
I verified the checksums of the downloaded files, and they are all correct. Memory is also sufficient to load this model, since qwen2-vl-72b loads without any problem.
I suspect the problem is with the transformers version. Has anyone been able to load the model successfully? If so, could you share which transformers version you are using? Thanks a lot!!
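
For anyone repeating the checksum verification mentioned above, here is a minimal sketch that computes the SHA-256 of every shard in a directory. `sha256_of` and `shard_digests` are hypothetical helper names, and the reference values to compare against are assumed to come from wherever the files were downloaded, for example the hash shown on each file's page on the Hub.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shard_digests(model_dir):
    """Map each *.safetensors file name in model_dir to its hex SHA-256."""
    return {
        shard.name: sha256_of(shard)
        for shard in sorted(Path(model_dir).glob("*.safetensors"))
    }
```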

luckychao changed discussion status to closed Jan 21, 2025
