Libraries

How to use hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3")
model = AutoModelForImageTextToText.from_pretrained("hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
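
The last line of the snippet above decodes only the tokens after the prompt, because `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of that slicing with plain Python lists; the token IDs are made up for illustration and do not come from the real tokenizer:

```python
# Hypothetical token IDs; real values come from the processor and the model.
prompt_ids = [101, 7592, 2088]            # tokens that were fed to the model
output_ids = prompt_ids + [2023, 2003]    # generate() echoes the prompt, then appends new tokens

# Keep only what the model produced after the prompt.
new_ids = output_ids[len(prompt_ids):]
print(new_ids)
```

Without the slice, the decoded string would repeat the whole chat-formatted prompt before the answer.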

Local Apps

vLLM

How to use hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
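
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `send_chat_request` are hypothetical, and it assumes the server started above is reachable at `http://localhost:8000` (use port 30000 for the SGLang examples).

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same request body as the curl example (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(base_url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload to an OpenAI-compatible /v1/chat/completions endpoint."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(
    "hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# With a server running, uncomment to send the request:
# reply = send_chat_request("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])
```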

Use Docker

docker model run hf.co/hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3

SGLang

How to use hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3 with Docker Model Runner:
```
docker model run hf.co/hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3
```

Qwen2.5-VL-7B-Instruct-GPTQ-Int3

This is an UNOFFICIAL GPTQ-Int3 quantized version of the Qwen2.5-VL-7B-Instruct model, produced with the gptqmodel library.

The model is compatible with the latest transformers library (which can run non-quantized Qwen2.5-VL models).

Performance

| Model | Size (Disk) | ChartQA (test) | OCRBench |
|---|---|---|---|
| Qwen2.5-VL-3B-Instruct | 7.1 GB | 83.48 | 791 |
| Qwen2.5-VL-3B-Instruct-AWQ | 3.2 GB | 82.52 | 786 |
| Qwen2.5-VL-3B-Instruct-GPTQ-Int4 | 3.2 GB | 82.56 | 784 |
| Qwen2.5-VL-3B-Instruct-GPTQ-Int3 | 2.9 GB | 76.68 | 742 |
| Qwen2.5-VL-7B-Instruct | 16.0 GB | 83.2 | 846 |
| Qwen2.5-VL-7B-Instruct-AWQ | 6.5 GB | 79.68 | 837 |
| Qwen2.5-VL-7B-Instruct-GPTQ-Int4 | 6.5 GB | 81.48 | 845 |
| Qwen2.5-VL-7B-Instruct-GPTQ-Int3 | 5.8 GB | 78.56 | 823 |
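
To make the size/accuracy trade-off in the table easier to read, the sketch below computes each 7B variant's disk size relative to the unquantized model and its score drop on both benchmarks. The numbers are copied from the table; the `compare` helper is a hypothetical name used only for this illustration.

```python
# Rows copied from the table above: (disk size in GB, ChartQA test score, OCRBench score).
results = {
    "Qwen2.5-VL-7B-Instruct": (16.0, 83.2, 846),
    "Qwen2.5-VL-7B-Instruct-AWQ": (6.5, 79.68, 837),
    "Qwen2.5-VL-7B-Instruct-GPTQ-Int4": (6.5, 81.48, 845),
    "Qwen2.5-VL-7B-Instruct-GPTQ-Int3": (5.8, 78.56, 823),
}
base_size, base_chartqa, base_ocr = results["Qwen2.5-VL-7B-Instruct"]


def compare(name: str) -> dict:
    """Return size ratio and score drops relative to the unquantized 7B model."""
    size, chartqa, ocr = results[name]
    return {
        "size_ratio": size / base_size,
        "chartqa_drop": base_chartqa - chartqa,
        "ocrbench_drop": base_ocr - ocr,
    }


for name in results:
    row = compare(name)
    print(
        f"{name}: {row['size_ratio']:.0%} of base size, "
        f"ChartQA -{row['chartqa_drop']:.2f}, OCRBench -{row['ocrbench_drop']}"
    )
```

By these figures, the Int3 model takes a little over a third of the base model's disk space at a cost of roughly 4.6 ChartQA points and 23 OCRBench points.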

Note

Evaluations were performed using lmms-eval with default settings.
In these evaluations, GPTQ models were more efficient (lower VRAM usage, faster inference) than the AWQ series.
We recommend using gptqmodel instead of the autogptq library, as autogptq is no longer maintained.

Quick Tour

Install the required libraries:

pip install git+https://github.com/huggingface/transformers accelerate qwen-vl-utils
pip install git+https://github.com/huggingface/optimum.git
pip install gptqmodel

Optionally, you may need to install:

pip install tokenicer device_smi logbar
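
After installing, a quick check that the packages can be imported may save a failed model load later. This is a minimal sketch; it assumes the import names listed below match the pip packages above (`qwen_vl_utils` is taken from the sample code, the others are assumed to be identical to the package names), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names are assumptions derived from the pip package names above.
REQUIRED = ["transformers", "accelerate", "qwen_vl_utils", "optimum", "gptqmodel"]
OPTIONAL = ["tokenicer", "device_smi", "logbar"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


print("Missing required:", missing_modules(REQUIRED))
print("Missing optional:", missing_modules(OPTIONAL))
```

An empty list for the required modules means the installation step completed; it does not verify versions.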

Sample code:

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Note: this sample loads the 3B Int4 checkpoint. To run the model described
# on this page, use "hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3" in both calls below.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int4",
    attn_implementation="flash_attention_2",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int4")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://raw.githubusercontent.com/ymcui/Chinese-LLaMA-Alpaca-3/refs/heads/main/pics/banner.png"},
        # The prompt is in Chinese and means: "Please describe this image."
        {"type": "text", "text": "请你描述一下这张图片。"},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map="auto"

generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text[0])

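Response (translated from the Chinese output):

This image shows a logo in Chinese and English that reads “中文LLaMA & Alpaca大模型” and “Chinese LLaMA & Alpaca Large Language Models”. On the left of the logo are two cartoon characters, one an alpaca wearing a red scarf and the other an alpaca with white fur, set against green grass and a building with a red roof. On the right of the logo is the number 3, with some circuit patterns next to it. The overall design is clean and clear, using bright colors and cute cartoon characters to attract attention.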
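
The sample code requests `attn_implementation="flash_attention_2"`, which in transformers depends on the separately installed flash-attn package; the install commands above do not include it. A minimal sketch of a fallback, assuming PyTorch's built-in `"sdpa"` attention is acceptable when flash-attn is absent (`pick_attn_implementation` is a hypothetical helper):

```python
from importlib.util import find_spec


def pick_attn_implementation() -> str:
    """Prefer FlashAttention-2 when the flash_attn module is importable, else use SDPA."""
    return "flash_attention_2" if find_spec("flash_attn") is not None else "sdpa"


attn_implementation = pick_attn_implementation()
print(attn_implementation)
# Pass the result on: from_pretrained(..., attn_implementation=attn_implementation)
```

This only checks that the module can be found; it does not confirm that the installed build matches the GPU or the CUDA version.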

Disclaimer

This is NOT an official model by Qwen. Use at your own risk.
For detailed usage, please check Qwen2.5-VL's page.

Safetensors

Model size: 8B params
Tensor types: I32, BF16, F16

Model tree for hfl/Qwen2.5-VL-7B-Instruct-GPTQ-Int3

Base model: Qwen/Qwen2.5-VL-7B-Instruct
Quantized (138): this model