How many GPUs for 8 or higher concurrency using an RTX 3090 rig?

#9
by BiggestFox - opened

So I have 7 x RTX 3090s split across 2 servers.
I will need to buy at least 1 more GPU and a better motherboard (one that supports all 8 GPUs) just to trial this model.
However, I need to serve 4-5 concurrent users (software engineers) who will likely fire off concurrent requests.
So I have to calculate how many GPUs I need, and which motherboard, to serve at least that capacity.

Without CPU offloading, I suspect I will need around 12 GPUs, but I can likely get away with PCIe Gen 3.0 x4 speeds since nothing is offloaded to the CPU.
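A rough way to sanity-check a GPU count like this is to add up the VRAM needed for weights and KV cache and divide by the usable VRAM per card. The sketch below is only that: the function name `gpus_needed`, the 90% usable-VRAM fraction, and every input value are hypothetical placeholders, not measurements for this model.

```python
import math

def gpus_needed(weights_gb, kv_gb_per_request, concurrent_requests,
                vram_per_gpu_gb=24.0, usable_fraction=0.9):
    """Rough GPU count to hold weights plus KV cache entirely in VRAM.

    Ignores activation memory, fragmentation, and framework overhead
    beyond the usable_fraction margin (an assumption, not a measured value).
    """
    total_gb = weights_gb + kv_gb_per_request * concurrent_requests
    usable_per_gpu_gb = vram_per_gpu_gb * usable_fraction
    return math.ceil(total_gb / usable_per_gpu_gb)

# Hypothetical inputs: replace with the real quantized weight size and
# the KV cache size per request at your target context length.
print(gpus_needed(weights_gb=200, kv_gb_per_request=2.0, concurrent_requests=16))
```

The result is only as good as the inputs, so the real weight size of the chosen quantization and the KV cache footprint at the intended context length need to be plugged in before drawing conclusions.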

Conversely, I do have 512GB of DDR4 RAM (8 x Hynix 64GB 4DRx4 PC4-2400T LRDIMM DDR4-19200 ECC Load Reduced Server Memory RAM) or, alternatively, 768GB of DDR4 using RDIMM (not LRDIMM; I can't mix and match the two sets), with 24 x 16GB = 768GB of DDR4 RAM. That would allow me to run with just 8 GPUs and partial (minimal) CPU offload: KV cache on the GPUs, ~60-80% of the weights on GPU, and the rest on CPU. That is my best guesstimate.

So if I go with a higher-end EPYC Rome motherboard I could partially offload, I guess, but I need to make sure I get ~35 t/s for each concurrent request. Serving ~4-5 users likely means ~12-16 requests in parallel (so batch 16 at peak), and I don't know if that's possible with partial CPU offload.
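To put the target in aggregate terms, the per-request rate can be multiplied by the number of parallel requests. This is a minimal sketch of that arithmetic only; the helper name `aggregate_tokens_per_second` is made up for illustration, and it assumes every request sustains the same rate at the same time.

```python
def aggregate_tokens_per_second(per_request_tps, parallel_requests):
    """Total decode throughput the rig must sustain across all requests."""
    return per_request_tps * parallel_requests

# Figures from the post: ~35 t/s per request, 12-16 requests in parallel.
for parallel in (12, 16):
    total = aggregate_tokens_per_second(35, parallel)
    print(f"{parallel} parallel requests -> {total} t/s aggregate")
```

So the rig would need to sustain roughly 420-560 t/s of total decode throughput at peak.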

Before I shell out another $3K-$5K (mobo combo + 1-3 more GPUs), I need a better idea of what to expect.

Thanks guys,
Eddie.

Hey Eddie,

  • The model has quantization issues, so either don't invest based on this model (if you intend to use it for AI-assisted software development) or look for other models.
  • For concurrent requests, forget everything that is not GPU-based.
  • Offloading, even to the latest-gen EPYCs, Xeons, or Threadrippers, is a no-go performance-wise (unless you spend huge amounts).
  • vLLM / SGLang only accept GPU counts of 2, 4, 8, 16, ... for a good reason.
  • Spanning multiple nodes at the vLLM / SGLang layer requires far more than multi-gigabit Ethernet (think RoCE at 400 Gbit/s).

I am running an 8x RTX 3090 setup. If you're interested in the setup (quite stable), happy to assist.
