LoRA Finetuning Tool for Qwen2-VL-7B in Web UI (DPO updated)

by hiyouga - opened Aug 29, 2024

Discussion

hiyouga

Qwen org Aug 29, 2024

LLaMA Factory has integrated Qwen2-VL models for SFT and DPO. Try our training recipes and web UI 🚀

LoRA recipes: https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/train_lora/qwen2vl_lora_sft.yaml
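For anyone who wants to drive the recipe from a script, here is a minimal Python sketch. It assumes LLaMA Factory is installed from a clone of the repository, so that the `llamafactory-cli` entry point is on `PATH` and the relative YAML path resolves from the repository root. The helper names (`build_train_command`, `build_webui_command`, `run`) are hypothetical and only for illustration.

```python
import shlex
import shutil
import subprocess

# Path of the LoRA SFT recipe linked above, relative to the repository root.
LORA_SFT_RECIPE = "examples/train_lora/qwen2vl_lora_sft.yaml"


def build_train_command(recipe_path: str) -> list[str]:
    """Return the argv that trains with a YAML recipe."""
    return ["llamafactory-cli", "train", recipe_path]


def build_webui_command() -> list[str]:
    """Return the argv that starts the web UI."""
    return ["llamafactory-cli", "webui"]


def run(command: list[str], dry_run: bool = True) -> int:
    """Print the command; execute it only when dry_run is False."""
    print("$", shlex.join(command))
    if dry_run:
        return 0
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} is not on PATH")
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Dry run by default: nothing is launched, the commands are only printed.
    run(build_train_command(LORA_SFT_RECIPE))
    run(build_webui_command())
```

Passing `dry_run=False` launches the command for real, which needs a GPU environment with the model weights available.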

hiyouga changed discussion title from LoRA Finetuning Qwen2-VL-7B in Web UI to LoRA Finetuning Tool for Qwen2-VL-7B in Web UI Aug 29, 2024

YangJiassh

Aug 30, 2024

There are some issues in https://github.com/hiyouga/LLaMA-Factory: the full fine-tuning YAML file and the related modifications have not been carried over to the latest version.

hiyouga

Qwen org Aug 30, 2024

Hi @YangJiassh, we would appreciate it if you could provide some details.

YangJiassh

Aug 30, 2024

https://github.com/hiyouga/LLaMA-Factory/commit/727e1848401d306274fb60ba78f66fed577b7b55

These modifications have been removed in the latest version.

YangJiassh

Aug 30, 2024

examples/train_full/qwen2vl_full_sft.yaml

hiyouga

Qwen org Aug 30, 2024 • edited Aug 30, 2024

@YangJiassh Thanks! You can try this recipe for full tuning:

### model
model_name_or_path: Qwen/Qwen2-VL-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: mllm_demo
template: qwen2_vl
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2_vl-7b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
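As a rough sanity check on the numbers in this recipe, the sketch below estimates the effective batch size and the number of optimizer steps. It assumes the dataset really holds `max_samples` examples, that `val_size` is split off before training, and that the GPU count (`num_gpus`) is whatever you launch with. The exact rounding of steps per epoch depends on the trainer version, so treat the results as approximate. The function name `estimate_schedule` is hypothetical.

```python
import math


def estimate_schedule(
    max_samples: int = 1000,
    val_size: float = 0.1,
    per_device_batch: int = 1,
    grad_accum: int = 8,
    num_epochs: float = 3.0,
    warmup_ratio: float = 0.1,
    num_gpus: int = 1,  # assumption: not part of the recipe
) -> dict:
    """Approximate the training schedule implied by the recipe values."""
    eval_samples = round(max_samples * val_size)
    train_samples = max_samples - eval_samples
    # Samples consumed per optimizer step across all devices.
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(train_samples / effective_batch)
    total_steps = math.ceil(steps_per_epoch * num_epochs)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "train_samples": train_samples,
        "eval_samples": eval_samples,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


if __name__ == "__main__":
    for gpus in (1, 4, 8):
        print(gpus, estimate_schedule(num_gpus=gpus))
```

Under these assumptions a single-GPU run finishes in fewer than 500 optimizer steps, so the `save_steps: 500` and `eval_steps: 500` intervals would not be reached mid-run. Lower them if you want intermediate checkpoints or evaluations.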

YangJiassh

Aug 30, 2024

OK, thanks! I'll try it now.

nicolollo

Aug 31, 2024 • edited Aug 31, 2024

> LLaMA Factory has integrated Qwen2-VL models for SFT and DPO. Try our training recipes and web UI 🚀
>
> LoRA recipes: https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/train_lora/qwen2vl_lora_sft.yaml

Damn, you were fast XD. Is there an example for DPO, though? I don't see any.

CED6688

Aug 31, 2024

Both this tool and the model are amazing! SFT with multimedia has never been so easy, and it's interesting to see how well the image training translates to the video mode. Keep it up!

hiyouga

Qwen org Sep 1, 2024

@nicolollo Yeah! We have just added support for Qwen2-VL DPO training. Try this example: https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/train_lora/qwen2vl_lora_dpo.yaml
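For readers new to DPO, here is a minimal sketch of the standard per-example DPO objective in plain Python. It is not taken from LLaMA Factory's code. The inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model, and `beta=0.1` is only a commonly used value, not something stated in this thread.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: a common default, tune per task
) -> float:
    """Standard DPO loss for one preference pair."""
    # Implicit rewards: how much more likely each response is
    # under the policy than under the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy prefers the chosen answer more than the reference does: lower loss.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

The loss falls as the policy raises the chosen response and lowers the rejected one relative to the reference model, which is the behaviour a DPO recipe optimises for.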

hiyouga changed discussion title from LoRA Finetuning Tool for Qwen2-VL-7B in Web UI to LoRA Finetuning Tool for Qwen2-VL-7B in Web UI (DPO updated) Sep 1, 2024

Respair

Sep 4, 2024

@hiyouga
Amazing stuff.
Could you kindly add a config for Qwen2-Audio as well, for both full fine-tuning (FFT) and LoRA? This model is essential for those of us in the audio realm.

Rewatiramans

Oct 8, 2024

Can I fine-tune it on an L4 GPU?

hiyouga

Qwen org Oct 9, 2024

@Rewatiramans Sure.
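To put a rough number on this, the back-of-the-envelope sketch below compares LoRA and full fine-tuning against the 24 GB of memory on an L4. Everything in it is an assumption rather than a measurement: a nominal 7e9 parameters (the real count, including the vision encoder, is somewhat higher), about 16 bytes per trainable parameter for mixed-precision AdamW, 2 bytes per frozen bf16 parameter, a hypothetical 1% trainable fraction for LoRA, and no allowance for activations or image tokens.

```python
GB = 1e9
L4_MEMORY_GB = 24  # memory of a single NVIDIA L4

# Rule-of-thumb bytes per parameter (assumptions, not measurements).
BYTES_FROZEN_BF16 = 2      # frozen weights kept in bf16
BYTES_TRAINABLE_ADAMW = 16  # weights + gradients + AdamW states, mixed precision


def full_finetune_gb(num_params: float) -> float:
    """Estimated model-state memory when every parameter is trained."""
    return num_params * BYTES_TRAINABLE_ADAMW / GB


def lora_gb(num_params: float, trainable_fraction: float = 0.01) -> float:
    """Estimated model-state memory with frozen base weights plus small adapters."""
    frozen = num_params * BYTES_FROZEN_BF16
    adapters = num_params * trainable_fraction * BYTES_TRAINABLE_ADAMW
    return (frozen + adapters) / GB


if __name__ == "__main__":
    params = 7e9  # nominal size; hypothetical round figure
    for name, need in (("full", full_finetune_gb(params)), ("lora", lora_gb(params))):
        fits = "fits" if need < L4_MEMORY_GB else "does not fit"
        print(f"{name}: ~{need:.1f} GB of model state, {fits} in {L4_MEMORY_GB} GB")
```

By this estimate LoRA leaves some headroom on one L4 for activations, while full fine-tuning would need multiple GPUs or offloading, for example through the DeepSpeed ZeRO-3 config referenced in the full-tuning recipe above.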
