Example inference server usage with vLLM:
pip install "vllm>=0.17.0"
pip install "huggingface-hub>=1.6.0" "transformers>=5.3.0"
vllm serve hhzm/qwen3.5-4b-meow --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder
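Once the server is running, vLLM exposes an OpenAI-compatible API (by default at http://localhost:8000/v1). A minimal sketch of a chat-completion request payload for this model follows; the host, port, and prompt text here are assumptions, not part of the model card:

```python
import json

# Build an OpenAI-style chat-completion request body for the served model.
payload = {
    "model": "hhzm/qwen3.5-4b-meow",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128,
}
body = json.dumps(payload)

# Send it to vLLM's OpenAI-compatible endpoint, e.g. with the requests library:
#   requests.post("http://localhost:8000/v1/chat/completions",
#                 data=body, headers={"Content-Type": "application/json"})
print(body)
```

Any OpenAI-compatible client (such as the official `openai` Python package with `base_url` pointed at the server) can be used instead of hand-building the JSON.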