How to serve with vLLM

#1
by MTKBW - opened

As the title says,
could you please describe how to use vLLM with this checkpoint?

Hi @MTKBW ,

Sorry for the late response. Here is the `docker run` command to serve the model:

```shell
docker run \
  --gpus all \
  -p 8000:8000 \
  --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:v0.13.0 \
  --model OPENZEKA/Qwen3-Coder-30B-A3B-Instruct-NVFP4 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.90 \
  --max-num-seqs 1
```
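Once the container is up, vLLM exposes an OpenAI-compatible API on port 8000. A minimal client sketch (the prompt text and `max_tokens` value are illustrative; the model name matches the `--model` flag above):

```python
import json
from urllib.request import Request, urlopen

# Standard OpenAI-compatible chat-completions payload.
payload = {
    "model": "OPENZEKA/Qwen3-Coder-30B-A3B-Instruct-NVFP4",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "max_tokens": 256,  # illustrative limit, adjust as needed
}

req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same endpoint also works with the official `openai` Python client by pointing `base_url` at `http://localhost:8000/v1`.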
