# Multi-GPU

RTX PRO 4000 Blackwell workstation cards do not have NVLink. Multi-GPU inference
uses PCIe.

Measurement guidance:

- Report `gpu_count`.
- Say whether the backend used layer split, tensor parallelism, expert
  parallelism, pipeline parallelism, or independent model instances.
- Keep single-GPU and multi-GPU rows separate.
- Do not infer multi-GPU benefit from single-stream tok/s alone; measure
  concurrent aggregate throughput when that is the real workload.
- For TP=2 or all-GPU runs, stop other GPU services first and verify they restart
  after the benchmark window.