Multi-GPU
RTX PRO 4000 Blackwell workstation cards do not have NVLink, so all inter-GPU traffic during multi-GPU inference goes over PCIe.
Measurement guidance:
- Report gpu_count.
- Say whether the backend used layer split, tensor parallelism, expert parallelism, pipeline parallelism, or independent model instances.
- Keep single-GPU and multi-GPU rows separate.
- Do not infer multi-GPU benefit from single-stream tok/s alone; measure concurrent aggregate throughput when that is the real workload.
- For TP=2 or all-GPU runs, stop other GPU services first and verify they restart after the benchmark window.
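One way to make the guidance above hard to skip is to encode it in the result record itself. The following is a minimal sketch, not a prescribed schema: only gpu_count comes from this document, while BenchRow, split_rows, the parallelism labels, and the remaining field names are hypothetical and chosen for illustration. It assumes aggregate throughput is defined as total output tokens across all concurrent streams divided by wall-clock time.

```python
from dataclasses import dataclass

# Hypothetical labels for the parallelism modes listed in the guidance.
PARALLELISM_MODES = {
    "none",                   # single-GPU run
    "layer_split",
    "tensor_parallel",
    "expert_parallel",
    "pipeline_parallel",
    "independent_instances",
}


@dataclass(frozen=True)
class BenchRow:
    gpu_count: int
    parallelism: str          # one of PARALLELISM_MODES
    concurrency: int          # number of simultaneous request streams
    total_output_tokens: int  # tokens generated across all streams
    wall_seconds: float       # wall-clock duration of the whole run

    def __post_init__(self):
        if self.gpu_count < 1:
            raise ValueError("gpu_count must be >= 1")
        if self.parallelism not in PARALLELISM_MODES:
            raise ValueError(f"unknown parallelism mode: {self.parallelism!r}")
        # A multi-GPU row must say how the GPUs were used, and vice versa.
        if (self.gpu_count == 1) != (self.parallelism == "none"):
            raise ValueError("parallelism must be 'none' exactly when gpu_count == 1")
        if self.concurrency < 1 or self.wall_seconds <= 0:
            raise ValueError("concurrency must be >= 1 and wall_seconds > 0")

    @property
    def aggregate_tok_s(self) -> float:
        """Throughput summed over all concurrent streams."""
        return self.total_output_tokens / self.wall_seconds

    @property
    def per_stream_tok_s(self) -> float:
        """Average throughput seen by one stream."""
        return self.aggregate_tok_s / self.concurrency


def split_rows(rows):
    """Return (single_gpu_rows, multi_gpu_rows) so they are reported separately."""
    single = [r for r in rows if r.gpu_count == 1]
    multi = [r for r in rows if r.gpu_count > 1]
    return single, multi
```

With this shape, a TP=2 run at 8 concurrent streams reports both numbers side by side, which makes it visible when aggregate throughput improves even though per-stream tok/s does not.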