Multi-GPU

RTX PRO 4000 Blackwell workstation cards do not have NVLink, so all GPU-to-GPU traffic in multi-GPU inference goes over PCIe.

Measurement guidance:

Report gpu_count for every result.
State whether the backend used layer split, tensor parallelism, expert parallelism, pipeline parallelism, or independent model instances.
Keep single-GPU and multi-GPU rows separate.
Do not infer multi-GPU benefit from single-stream tok/s alone; measure concurrent aggregate throughput when that is the real workload.
For TP=2 or all-GPU runs, stop other GPU services first, and verify that they have restarted once the benchmark window ends.
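
The guidance above can be sketched as a small result record. This is a minimal illustration, not the format used by any particular benchmark tool: the names StreamResult, BenchmarkRow, summarize, and the parallelism labels are hypothetical, and the token counts in the usage lines are made-up numbers. It shows how per-stream tok/s and concurrent aggregate tok/s can differ for the same run, which is why both are kept alongside gpu_count and the parallelism mode.

```python
from dataclasses import dataclass

# Hypothetical labels for the parallelism modes listed in the guidance.
PARALLELISM_MODES = {
    "none",                   # single GPU, nothing to split
    "layer_split",
    "tensor_parallel",
    "expert_parallel",
    "pipeline_parallel",
    "independent_instances",
}


@dataclass(frozen=True)
class StreamResult:
    """One concurrent request stream within a benchmark window."""
    generated_tokens: int
    start_s: float
    end_s: float


@dataclass(frozen=True)
class BenchmarkRow:
    """One reported row; single-GPU and multi-GPU runs get separate rows."""
    gpu_count: int
    parallelism: str
    concurrency: int
    aggregate_tok_s: float
    mean_stream_tok_s: float


def summarize(gpu_count, parallelism, streams):
    """Build a row from the streams measured in one benchmark window."""
    if parallelism not in PARALLELISM_MODES:
        raise ValueError(f"unknown parallelism mode: {parallelism!r}")
    if not streams:
        raise ValueError("at least one stream is required")
    if any(s.end_s <= s.start_s for s in streams):
        raise ValueError("each stream must have end_s > start_s")

    # Aggregate throughput: all tokens divided by the wall-clock window
    # from the first stream start to the last stream end.
    window_s = max(s.end_s for s in streams) - min(s.start_s for s in streams)
    total_tokens = sum(s.generated_tokens for s in streams)

    # Per-stream throughput: what a single client observes on average.
    per_stream = [s.generated_tokens / (s.end_s - s.start_s) for s in streams]

    return BenchmarkRow(
        gpu_count=gpu_count,
        parallelism=parallelism,
        concurrency=len(streams),
        aggregate_tok_s=total_tokens / window_s,
        mean_stream_tok_s=sum(per_stream) / len(per_stream),
    )


# Illustrative numbers only: one stream on one GPU, four streams on two GPUs.
single = summarize(1, "none", [StreamResult(500, 0.0, 10.0)])
multi = summarize(2, "tensor_parallel", [StreamResult(400, 0.0, 10.0)] * 4)
print(single)
print(multi)
```

In this made-up example the multi-GPU row has a lower per-stream rate (40.0 tok/s against 50.0) but a higher aggregate rate (160.0 tok/s), so judging it on single-stream tok/s alone would hide the throughput that matters for a concurrent workload.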