vllm version for inference of Qwen/Qwen3-VL-4B-Instruct-FP8 and Qwen/Qwen3-VL-4B-Instruct
#3 · opened by saiyanhuang
Hi,
Does anyone know which vLLM version is needed to run inference with Qwen/Qwen3-VL-4B-Instruct-FP8 and Qwen/Qwen3-VL-4B-Instruct? I'm using CUDA 12.6 and vLLM 0.11.0, but I get the error `CUDA error: the provided PTX was compiled with an unsupported toolchain`.
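Not an answer to the exact version question, but that PTX error usually means the NVIDIA driver is older than the CUDA toolkit the installed wheels were compiled against, rather than a problem with the model itself. A diagnostic sketch (the cu126 index URL is the standard PyTorch wheel index; whether a matching vLLM build exists for your setup is an assumption, so check the vLLM installation docs):

```shell
# 1. Check the maximum CUDA version your driver supports
#    (shown in the top-right corner of the output, e.g. "CUDA Version: 12.6")
nvidia-smi

# 2. Check which CUDA toolkit the installed torch wheel was built against;
#    if this prints a version newer than the driver supports, that mismatch
#    is the likely cause of the PTX error
python -c "import torch; print(torch.__version__, torch.version.cuda)"

# 3. Possible fixes (pick one):
#    a) upgrade the NVIDIA driver to one that supports the newer toolkit, or
#    b) reinstall torch built against CUDA 12.6 and then reinstall vLLM
#       on top of it (example index URL; verify against the install docs):
pip install torch --index-url https://download.pytorch.org/whl/cu126
```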