How to serve with vLLM
#1 by MTKBW - opened
As the title says: could you please describe how to serve this checkpoint with vLLM?
Hi @MTKBW,
Sorry for the late response. Here is the `docker run` command to serve the model:
```shell
docker run \
  --gpus all \
  -p 8000:8000 \
  --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:v0.13.0 \
  --model OPENZEKA/Qwen3-Coder-30B-A3B-Instruct-NVFP4 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.90 \
  --max-num-seqs 1
```
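Once the container is up, the server speaks the standard OpenAI-compatible API that `vllm/vllm-openai` exposes, so you can query it at `http://localhost:8000/v1/chat/completions`. A minimal sketch of a client, using only the standard library; the helper names (`build_chat_request`, `send_chat_request`) are mine, not part of vLLM:

```python
import json
from urllib import request

MODEL = "OPENZEKA/Qwen3-Coder-30B-A3B-Instruct-NVFP4"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    # Standard OpenAI-compatible chat-completion payload.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def send_chat_request(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    # POST to the /v1/chat/completions route served by the container above.
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    payload = build_chat_request("Write a Python hello world.")
    print(json.dumps(payload, indent=2))
```

With the server running, `send_chat_request(build_chat_request("..."))` returns the usual completion JSON (`choices[0].message.content`).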