How many tokens/s are you seeing on those two RTX Pro 6000 Blackwells?
#6, opened by zelias
I've tried running this model on a single RTX Pro 6000, but the tokens/s are terrible with the latest vLLM image.
About 58-60 t/s from the vLLM console with AWQ 4-bit. I'm wondering if I should try this one for a meaningful increase.
GLM-4.5-Air 4-bit AWQ does 180 t/s, which is a pretty insane difference considering their respective sizes.
I grabbed a 2x RTX Pro 6000 Blackwell instance and loaded it up. I'm seeing ~106 tok/s running with the command on my model card. I'm actually surprised there's much of a difference between running on 2 GPUs vs. 1 with this model, since it should easily fit on a single RTX Pro 6000.
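For a quick sense of scale, here is a small sketch that puts the numbers from this thread side by side as speedups over the single-GPU AWQ run. It assumes the midpoint (59 t/s) of the reported 58-60 t/s range, and the `speedup` helper is just illustrative. The comparison mixes quantization, model, and GPU count, so it's not apples-to-apples.

```python
# Decode throughput figures reported in this thread (tokens/s).
# Assumption: the single-GPU AWQ figure uses the midpoint of the 58-60 t/s range.
reported = {
    "AWQ 4-bit, 1x RTX Pro 6000": 59.0,
    "model-card command, 2x RTX Pro 6000": 106.0,
    "GLM-4.5-Air AWQ 4-bit, 1x RTX Pro 6000": 180.0,
}


def speedup(new_tps: float, base_tps: float) -> float:
    """Return how many times faster new_tps is than base_tps."""
    if base_tps <= 0:
        raise ValueError("base throughput must be positive")
    return new_tps / base_tps


baseline = reported["AWQ 4-bit, 1x RTX Pro 6000"]
for label, tps in reported.items():
    # Print each figure with its ratio to the single-GPU AWQ baseline.
    print(f"{label}: {tps:.0f} t/s ({speedup(tps, baseline):.2f}x)")
```

With these figures the 2-GPU run comes out at roughly 1.80x the single-GPU AWQ number, and GLM-4.5-Air at roughly 3.05x.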