This model was converted to OpenVINO from Qwen/Qwen2-VL-2B-Instruct using optimum-intel via the export space.

Install packages (quote the extra so shells like zsh don't expand the brackets):

```shell
pip install "optimum[openvino]" transformers pillow torch
```

Sample code to analyze a local image file:

```python
from optimum.intel import OVModelForVisualCausalLM
from transformers import AutoProcessor
from PIL import Image

MODEL_ID = "TheAverageDetective/Qwen2-VL-2B-Instruct-openvino"
image_path = "test.png"

# Load model and processor (use device="CPU" if no GPU is available)
model = OVModelForVisualCausalLM.from_pretrained(MODEL_ID, device="GPU")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Load image
image = Image.open(image_path).convert("RGB")

# Prepare messages; the {"type": "image"} entry tells the chat template
# where to place the vision tokens
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

# Apply the chat template, then process text and image together
prompt_text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt_text], images=[image], return_tensors="pt")

# Generate and decode; the decoded string still contains the prompt,
# so keep only the text after the final "assistant\n" marker
output_ids = model.generate(**inputs, max_new_tokens=150)
result = processor.batch_decode(output_ids, skip_special_tokens=True)[0]

print("\n", result.split("assistant\n")[-1].strip())
```
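The string split above works but is brittle if the reply itself contains "assistant\n". A minimal sketch of an alternative: since the generated IDs begin with the prompt IDs, slice the prompt length off each sequence before decoding. `trim_prompt` is a hypothetical helper, shown here on plain lists standing in for token-id tensors:

```python
def trim_prompt(input_ids, output_ids):
    # Drop the leading prompt tokens from each generated sequence,
    # leaving only the newly generated token IDs for decoding.
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]

# toy example: prompt of 3 tokens, model emitted 2 new tokens
prompt = [[1, 2, 3]]
full = [[1, 2, 3, 10, 11]]
print(trim_prompt(prompt, full))  # [[10, 11]]
```

With real tensors you would pass `inputs["input_ids"]` and `output_ids`, then decode the trimmed sequences with `processor.batch_decode(..., skip_special_tokens=True)`.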

Tested working on an Intel Iris iGPU with 80 EUs and 16 GB of system RAM.
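If you are not sure whether a GPU plugin is available on your machine, you can choose the device programmatically. A minimal sketch, where `pick_device` is a hypothetical helper; in practice the `available` list would come from `openvino.Core().available_devices` (e.g. `["CPU", "GPU"]`):

```python
def pick_device(available, preference=("GPU", "CPU")):
    # Return the first preferred device present in the available list,
    # falling back to CPU. startswith handles names like "GPU.0".
    for dev in preference:
        if any(d.startswith(dev) for d in available):
            return dev
    return "CPU"

print(pick_device(["CPU", "GPU"]))  # GPU
print(pick_device(["CPU"]))         # CPU
```

The chosen string can then be passed as the `device` argument to `OVModelForVisualCausalLM.from_pretrained`.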
