Working configuration for Nvidia Blackwell
#4
by luismiguelsaez - opened
Hi folks!
The working vLLM configuration posted by the author doesn't work on dual RTX 6000 Pro cards, so I'm leaving here what worked for me:
CUDA_VISIBLE_DEVICES=0,1 \
SAFETENSORS_FAST_GPU=1 \
NCCL_P2P_DISABLE=1 \
NCCL_DEBUG=INFO \
VLLM_LOGGING_LEVEL=INFO \
vllm serve lukealonso/MiniMax-M2.7-NVFP4 \
--trust-remote-code \
--enable-expert-parallel \
--tensor-parallel-size 2 \
--enable-auto-tool-choice \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
--disable-custom-all-reduce \
--kv-cache-dtype fp8 \
--max-num-seqs 2
Hope it's useful for someone!
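In case it helps anyone checking that the server actually came up: a quick smoke test against vLLM's OpenAI-compatible endpoint. This assumes the default port 8000; adjust the host/port if you pass `--port` or `--host` to `vllm serve`.

```shell
# Smoke-test the OpenAI-compatible chat endpoint
# (default port 8000; change if you pass --port to vllm serve)
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "lukealonso/MiniMax-M2.7-NVFP4",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```

If the server hung during initialization (as described below), this request will simply time out rather than return a completion.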
Looks similar to the vLLM config I settled on. I'm also running 2x RTX PRO 6000 Blackwell. I found the performance to be slightly slower than 2.5 with a similar setup on the same hardware. See my thread posted here yesterday. Another user posted a nice SGLang docker-compose that is BLAZING FAST.
Thanks, I'll have a look at the SGLang compose YAML. Regarding the configuration I used: I couldn't make it work without --disable-custom-all-reduce and the NCCL variables, because it got stuck during initialization otherwise.
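For anyone hitting the same hang: NCCL_P2P_DISABLE=1 makes NCCL fall back to shared-memory/host transfers instead of direct peer-to-peer GPU copies, which is a common workaround when init stalls on multi-GPU workstation boards. A quick way to see what interconnect path exists between the two cards (my reading of the symptoms, not an official diagnosis):

```shell
# Print the GPU interconnect topology as the driver sees it.
# The matrix entries (NV#, PIX, PXB, SYS, ...) show the P2P path
# between each GPU pair; SYS-only links are where P2P often misbehaves.
nvidia-smi topo -m
```

With NCCL_DEBUG=INFO already set in the command above, the NCCL init logs will also print which transport (P2P, SHM, NET) was actually selected, so you can confirm the fallback took effect.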