Tags: Image-Text-to-Text · Transformers · Safetensors · English · qwen2_5_vl · vllm · vision · w8a8 · conversational · text-generation-inference · 8-bit precision · compressed-tensors
Instructions for using RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8 with libraries, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8")
model = AutoModelForImageTextToText.from_pretrained("RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

A loading sketch for multi-GPU hosts follows the notebook links below.

- Notebooks
- Google Colab
- Kaggle
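As noted above, here is a loading sketch for multi-GPU hosts. At 72B parameters, even the INT8 (w8a8) weights occupy on the order of 70 GB, so the checkpoint will usually need to be sharded across several GPUs. This is a minimal sketch, assuming the accelerate and compressed-tensors packages are installed; `device_map="auto"` is an assumption about your deployment, not part of the upstream example.

```python
# Minimal multi-GPU loading sketch (assumptions: pip install accelerate
# compressed-tensors). device_map="auto" shards layers across all visible
# GPUs; torch_dtype="auto" keeps the dtypes stored in the checkpoint.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",   # respect the quantized checkpoint's dtypes
    device_map="auto",    # shard across available GPUs via accelerate
)
```

From here, generation proceeds exactly as in the "Load model directly" example above.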
- Local Apps
- vLLM
How to use RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8 with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker
```bash
docker model run hf.co/RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8
```
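Because the vLLM server speaks the OpenAI chat-completions protocol, you can also call it from Python instead of curl. Below is a minimal sketch using the openai client (pip install openai); the base_url and placeholder API key are assumptions matching the `vllm serve` defaults above (port 8000, no key required unless one was configured at startup). The same client works against the SGLang server further down if you change the port to 30000, and a plain-HTTP variant with requests is sketched at the end of this page.

```python
# Minimal sketch: query the vLLM server started above via its
# OpenAI-compatible API. base_url and api_key are assumptions tied to
# the default `vllm serve` setup; vLLM ignores the key by default.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```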
- SGLang
How to use RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8 with SGLang:
Install from pip and serve the model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8 with Docker Model Runner:
```bash
docker model run hf.co/RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8
```
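Finally, the OpenAI-compatible servers above (vLLM on port 8000, SGLang on port 30000) all expose the same /v1/chat/completions route, so the curl calls translate directly to plain HTTP from Python with no extra client dependency. The sketch below mirrors the SGLang curl payload verbatim; the port and timeout are assumptions matching the SGLang commands above, and port 8000 would target the vLLM server instead.

```python
# Minimal sketch: the same chat-completions call as the curl examples,
# sent as plain HTTP via requests (pip install requests). Port 30000
# assumes the SGLang server above; use 8000 for the vLLM server.
import requests

payload = {
    "model": "RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
            ],
        }
    ],
}

resp = requests.post("http://localhost:30000/v1/chat/completions", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```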