[ERROR] safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
Hi there :)
[root] Loading local model /mnt/petrelfs/share_data/quxiaoye/models/QVQ-72B-Preview
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 0%| | 0/38 [00:00<?, ?it/s]
Loading checkpoint shards: 3%|▎ | 1/38 [00:04<02:37, 4.26s/it]
Loading checkpoint shards: 3%|▎ | 1/38 [00:04<02:37, 4.26s/it]
Traceback (most recent call last):
...
File "generate_response.py", line 57, in main
model = qwen.Qwen_Model(args.model_path, temperature=args.temperature, max_tokens=args.max_tokens)
File "qwen.py", line 58, in __init__
self.model = Qwen2VLForConditionalGeneration.from_pretrained(self.model_path, torch_dtype=torch.bfloat16,
File "/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4225, in from_pretrained
) = cls._load_pretrained_model(
File "lib/python3.9/site-packages/transformers/modeling_utils.py", line 4706, in _load_pretrained_model
state_dict = load_state_dict(
File "lib/python3.9/site-packages/transformers/modeling_utils.py", line 555, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
The error above occurs when I load the QVQ-72B-Preview model with Transformers. The same code and environment load the qwen2-vl-72b model without any problem.
Environment:
python 3.9
torch 2.4.0+cu118
transformers 4.46.1
Thanks for the reply!
Just put your error into Claude. Try this:
Based on the error message, it seems you're encountering an issue while loading the QVQ-72B-Preview model, specifically a SafetensorError related to an incomplete metadata buffer. Let me help you troubleshoot this.
There are a few potential causes and solutions:
Incomplete or corrupted download:
- The error suggests that the safetensors file might be corrupted or incompletely downloaded
- Try removing the downloaded model files and re-downloading them
- Verify the checksums of the downloaded files if they're provided
Memory issues:
- Loading a 72B parameter model requires significant memory
- Try increasing your system's swap space
- Consider using model sharding or loading in 8-bit precision if available
Version compatibility:
- The warning about Qwen2VLRotaryEmbedding suggests you might be using an older version of the library
- Try updating your transformers library:
pip install --upgrade transformers
File permissions:
- Check if your process has proper read permissions for the model files
- Verify the ownership and permissions of the files in /mnt/petrelfs/share_data/quxiaoye/models/QVQ-72B-Preview
Could you try these steps and let me know:
- Which version of transformers are you using?
- Do you have enough system memory available?
- Are all the model shard files present in the directory?
Anyway, see if this helps.
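The "incomplete download" and "are all shards present" checks above can be done without loading any weights. The following is a minimal sketch, assuming the standard safetensors layout (an 8-byte little-endian header length, a JSON header, then the tensor data) and the usual model.safetensors.index.json shard index. As far as I understand, MetadataIncompleteBuffer is raised when a file's size does not match what its header declares, so that is what the sketch tests for. The function names check_safetensors_file and find_missing_shards are made up for illustration and are not part of any library.

```python
import json
import struct
import sys
from pathlib import Path


def check_safetensors_file(path):
    """Return (ok, message) for one .safetensors file, reading only its header."""
    path = Path(path)
    file_size = path.stat().st_size
    if file_size < 8:
        return False, "file is shorter than the 8-byte length prefix"
    with path.open("rb") as f:
        # The first 8 bytes hold the JSON header length as a little-endian u64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        if 8 + header_len > file_size:
            return False, "header is truncated"
        try:
            header = json.loads(f.read(header_len))
        except ValueError:  # covers both invalid JSON and invalid UTF-8
            return False, "header is not valid JSON"
    # Each tensor entry records [start, end) offsets into the data section.
    data_end = max(
        (entry["data_offsets"][1]
         for name, entry in header.items() if name != "__metadata__"),
        default=0,
    )
    expected = 8 + header_len + data_end
    if expected != file_size:
        return False, f"expected {expected} bytes, found {file_size}"
    return True, "size matches header"


def find_missing_shards(model_dir):
    """List shard files named in the index that are absent from model_dir."""
    model_dir = Path(model_dir)
    index_path = model_dir / "model.safetensors.index.json"
    if not index_path.exists():
        return []
    weight_map = json.loads(index_path.read_text())["weight_map"]
    return sorted({s for s in weight_map.values() if not (model_dir / s).exists()})


if __name__ == "__main__" and len(sys.argv) > 1:
    model_dir = Path(sys.argv[1])
    print("missing shards:", find_missing_shards(model_dir))
    for shard in sorted(model_dir.glob("*.safetensors")):
        ok, message = check_safetensors_file(shard)
        print("OK " if ok else "BAD", shard.name, "-", message)
```

Saved under a name of your choosing and run with the model directory as its only argument, it prints one line per shard. Since the traceback stops right after shard 1 of 38 loads, the second shard would be the first one to look at.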
Thanks for your reply!!
I verified the checksums of the downloaded files and they all match. Memory is also sufficient, since the same machine can load qwen2-vl-72b.
I suspect the problem is the transformers version. Has anyone been able to load the model successfully? If so, could you share the transformers version you are using? Thanks a lot!!
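For anyone comparing environments, a small sketch like the one below prints the installed versions of the relevant packages in one go. It uses only the standard library. The package list is an assumption about what matters here, and report_versions is a made-up helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Assumed list of packages involved in loading the checkpoint.
    for name, ver in report_versions(
        ["transformers", "safetensors", "torch", "accelerate"]
    ).items():
        print(f"{name}: {ver}")
```

Posting this output from a working and a failing setup makes it easier to see whether a version difference is involved.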