vllm version for inference of Qwen/Qwen3-VL-4B-Instruct-FP8 and Qwen/Qwen3-VL-4B-Instruct
#3 · opened by saiyanhuang
Hi,
Does anyone know which vLLM version is needed to run inference with Qwen/Qwen3-VL-4B-Instruct-FP8 and Qwen/Qwen3-VL-4B-Instruct? I'm using CUDA 12.6 and vLLM 0.11.0, but I get the error `CUDA error: the provided PTX was compiled with an unsupported toolchain`.
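Not an answer to the exact version question, but that PTX error usually means the NVIDIA driver is older than the CUDA toolkit the installed wheels were compiled against, rather than a problem with the model itself. A diagnostic sketch (the cu126 index URL is the standard PyTorch wheel index; whether a matching vLLM build exists for your setup is an assumption, so check the vLLM installation docs):

```shell
# 1. Check the maximum CUDA version your driver supports
#    (shown in the top-right corner of the output, e.g. "CUDA Version: 12.6")
nvidia-smi

# 2. Check which CUDA toolkit the installed torch wheel was built against;
#    if this prints a version newer than the driver supports, that mismatch
#    is the likely cause of the PTX error
python -c "import torch; print(torch.__version__, torch.version.cuda)"

# 3. Possible fixes (pick one):
#    a) upgrade the NVIDIA driver to one that supports the newer toolkit, or
#    b) reinstall torch built against CUDA 12.6 and then reinstall vLLM
#       on top of it (example index URL; verify against the install docs):
pip install torch --index-url https://download.pytorch.org/whl/cu126
```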