# Qwen3-VL-2B-Instruct — OpenVINO NF4 (Intel NPU)
Qwen/Qwen3-VL-2B-Instruct exported to OpenVINO IR with NF4 channel-wise weight compression for Intel NPU inference.
> ⚠️ NF4 requires an Intel® Core™ Ultra Series 2 (Lunar Lake) NPU or newer.
> For older NPU hardware, use an INT4 channel-wise model instead.
## Quantization details
| Property | Value |
|---|---|
| Weight format | nf4 |
| Quantization mode | channel-wise (--group-size -1) |
| Symmetry | symmetric (--sym) |
| 4-bit ratio | 1.0 (100% of eligible layers) |
| Tool | optimum-intel + NNCF |
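As a rough illustration of what channel-wise symmetric NF4 compression does, the sketch below is plain NumPy (not OpenVINO/NNCF internals): each output channel is scaled by its absolute maximum (the symmetric, `--group-size -1` case, i.e. one scale per channel rather than per group), and the normalized weights are snapped to the nearest of the 16 fixed NF4 codebook levels.

```python
import numpy as np

# The 16 NF4 quantization levels (NormalFloat4 codebook from the QLoRA paper).
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
])

def nf4_channelwise(weights: np.ndarray):
    """Quantize a 2-D weight matrix to NF4 with one symmetric scale per
    output channel (row) — the channel-wise / --group-size -1 case."""
    # Symmetric scale: per-row absolute maximum maps each row into [-1, 1].
    scales = np.abs(weights).max(axis=1, keepdims=True)
    normalized = weights / scales
    # Snap each normalized weight to the nearest of the 16 NF4 levels.
    idx = np.abs(normalized[..., None] - NF4_LEVELS).argmin(axis=-1)
    return idx.astype(np.uint8), scales

def nf4_dequantize(idx: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights from codebook indices and scales."""
    return NF4_LEVELS[idx] * scales

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)
idx, scales = nf4_channelwise(w)
w_hat = nf4_dequantize(idx, scales)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

NNCF's actual implementation differs in storage layout and details, but the codebook lookup and per-channel scaling are the core idea: only 4 bits per weight plus one scale per channel.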
## NNCF bitwidth distribution
| Component | Mode |
|---|---|
| Language model backbone (196 layers) | nf4, per-channel |
| Embeddings / LM head (1 layer) | int8_asym, per-channel |
| Vision encoder (104 layers) | int8_sym, per-channel |
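The table mixes symmetric (`int8_sym`) and asymmetric (`int8_asym`) INT8 modes. As a rough NumPy sketch (not NNCF internals): symmetric quantization stores only a per-channel scale, while asymmetric adds a zero point so the quantized range need not be centered on zero, which suits skewed distributions such as embeddings.

```python
import numpy as np

def int8_sym(w: np.ndarray):
    """Symmetric per-channel INT8: scale only, values in [-127, 127]."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_asym(w: np.ndarray):
    """Asymmetric per-channel INT8: scale plus zero point, stored as
    uint8 in [0, 255]; the range [min, max] is used exactly."""
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 255.0
    zero_point = np.round(-lo / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

w = np.array([[0.1, 0.9, 2.0, 1.5]])   # skewed, all-positive row
q_s, s = int8_sym(w)
q_a, s_a, zp = int8_asym(w)
# Dequantize both and compare reconstruction error.
err_sym = np.abs(q_s * s - w).max()
err_asym = np.abs((q_a - zp) * s_a - w).max()
print(err_sym, err_asym)
```

For the all-positive row above, the symmetric quantizer wastes half its range on negative values it never uses, so the asymmetric reconstruction error comes out smaller.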
## Installation

```bash
pip install openvino-genai openvino openvino-tokenizers
```
## Usage

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("Qwen3-VL-2B-Instruct-ov-nf4", "NPU")
print(pipe.generate("Hello!", max_new_tokens=200))
```

Optionally, pass NPU runtime properties to tune for throughput:

```python
pipeline_config = {
    "MAX_PROMPT_LEN": 1024,       # longest prompt the static NPU graph accepts
    "MIN_RESPONSE_LEN": 256,
    "GENERATE_HINT": "BEST_PERF",
    "CACHE_DIR": ".npucache",     # cache compiled model blobs between runs
}
pipe = ov_genai.LLMPipeline("Qwen3-VL-2B-Instruct-ov-nf4", "NPU", pipeline_config)
```
## Export command

```bash
optimum-cli export openvino \
  --model Qwen/Qwen3-VL-2B-Instruct \
  --trust-remote-code \
  --weight-format nf4 \
  --sym \
  --ratio 1.0 \
  --group-size -1 \
  Qwen3-VL-2B-Instruct-ov-nf4
```